[Dspace-tech] how can I find out the collectionID?


Pan Family

Aug 24, 2015, 4:24:32 PM
to dspac...@lists.sourceforge.net
dsrun org.dspace.app.itemimport.ItemImport --add --eperson=j...@user.com  --collection=collectionID --source=items_dir --mapfile=mapfile

Hi,

The above command for batch import requires
the collectionID as input. I wonder how
I can find out this ID? Is it the string
that I used to name my collection, or an ID
that DSpace uses internally?

Thanks a lot!

-Pan

Dorothea Salo

Aug 24, 2015, 4:24:34 PM
to dspac...@lists.sourceforge.net
You can use the collection's handle for this; go to the collection's home page
and use the numbers after "handle/" in the URL.

If you should need the internal DSpace collection ID for some reason, though,
log in, surf to the collection page, and then use the "Edit" button under Admin
Tools. From there, choose "Collection's Authorizations," and DSpace will pop up
the "DB ID" in the title of the page.

(I hope there's an easier way to do this! There certainly should be.)

Dorothea

--
Dorothea Salo, Digital Repository Services Librarian
(703)993-3742 ds...@gmu.edu AIM: gmumars
MSN 2FL, Fenwick Library
George Mason University
4400 University Drive, Fairfax VA 22031

Pan Family

Aug 24, 2015, 4:24:35 PM
to Dorothea Salo, dspac...@lists.sourceforge.net
Hi Dorothea:

Thanks a lot for your help!
In my case, the handle is 123456789/2.
So I used the following command to add
a pdf file under /Users/pan/tmp, but somehow
the pdf file was not added into the collection
and the file test_map is empty.  No error
message was shown either.  I wonder what
I did wrong.  Could you give me some ideas
on how to debug?

Thanks again,

-Pan

bubba:~/dspace-1.4.1-source/bin pan$ dsrun org.dspace.app.itemimport.ItemImport --add --eperson=pan.f...@gmail.com --collection=123456789/2 --source=/Users/pan/tmp/ --mapfile=/Users/pan/tmp/test_map
Destination collections:
Owning  Collection: PODAAC collection
Adding items from directory: /Users/pan/tmp/
Generating mapfile: /Users/pan/tmp/test_map


_______________________________________________
DSpace-tech mailing list
DSpac...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech

Jayan Chirayath Kurian

Aug 24, 2015, 4:24:36 PM
to Pan Family, Dorothea Salo, dspac...@lists.sourceforge.net

Can you please try with source=/Users/pan/

I encountered the same problem on Windows. It was fixed by passing the parent folder to the import command. I assume that “pan” contains the subfolder “tmp”, which in fact contains the PDF file. Please let me know whether this works for you.

 

Thanks,

Jayan

 


Pan Family

Aug 24, 2015, 4:24:42 PM
to Jayan Chirayath Kurian, dspac...@lists.sourceforge.net, Dorothea Salo
Thanks for your help!

I am working on Mac OS X.  Yes, "pan" contains "tmp"

It seems that for me the dir that I give to source= cannot contain any
subdirs.  For example, if I give it "/Users/pan/" I got an error
complaining about the missing file ".fvwm/dublin_core.xml"
.fvwm is a subdir under "/Users/pan/"

If I give it "/Users/pan/tmp/"
then it complains about the same missing file under the subdirs
of "tmp" until I removed all the subdirs under "tmp"
But I still don't get the files under "tmp" imported to my collection,
even if no error shows after I removed all subdirs.

bubba:$ dsrun org.dspace.app.itemimport.ItemImport --add --eperson=pan.f...@gmail.com --collection=123456789/2 --source=/Users/pan/ --mapfile=/Users/pan/test_map --test
**Test Run** - not actually importing items.
Destination collections:
Owning  Collection: PODAAC collection
Adding items from directory: /Users/pan/
Generating mapfile: /Users/pan/test_map
Adding item from directory .fvwm
java.io.FileNotFoundException: /Users/pan/.fvwm/dublin_core.xml (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:106)
        at java.io.FileInputStream.<init>(FileInputStream.java:66)
        at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
        at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
        at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
        at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:172)
        at org.dspace.app.itemimport.ItemImport.loadXML(ItemImport.java:1269)
        at org.dspace.app.itemimport.ItemImport.loadDublinCore(ItemImport.java:795)
        at org.dspace.app.itemimport.ItemImport.loadMetadata(ItemImport.java:780)
        at org.dspace.app.itemimport.ItemImport.addItem(ItemImport.java:626)
        at org.dspace.app.itemimport.ItemImport.addItems(ItemImport.java:498)
        at org.dspace.app.itemimport.ItemImport.main(ItemImport.java:407)
java.io.FileNotFoundException: /Users/pan/.fvwm/dublin_core.xml (No such file or directory)
***End of Test Run***

Jayan Chirayath Kurian

Aug 24, 2015, 4:24:45 PM
to Pan Family, dspac...@lists.sourceforge.net, Dorothea Salo

I think the tmp directory should contain (1) the Dublin_core.XML, (2) the contents file and (3) the actual resource, with no further subdirectories. Can you try with source=/Users/pan/ after removing all subdirectories under tmp, so that it holds only the three files listed above? Hope it works.

 

My structure is src = C:\DSpace\bin\archive_directory

The archive_directory contains the directory Item_001

Item_001 contains (1) Dublin_core.XML (2) contents file and (3) actual resource.

There are no more subdirectories under Item_001.
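The layout described above can be checked before running the importer. The following is a minimal sketch, not part of DSpace: `check_archive` is a hypothetical helper, the lowercase file name `dublin_core.xml` is taken from the stack trace earlier in this thread, and the assumption that the file name is the first tab-separated field on each line of `contents` is mine.

```python
from pathlib import Path

# File names the importer looks for in each item folder
# (dublin_core.xml as it appears in the FileNotFoundException above).
REQUIRED = ("dublin_core.xml", "contents")


def check_archive(archive_dir):
    """Return a list of human-readable problems found under archive_dir.

    archive_dir is the directory passed to --source; every immediate
    subdirectory is treated as one item folder.
    """
    problems = []
    archive = Path(archive_dir)
    item_dirs = sorted(p for p in archive.iterdir() if p.is_dir())
    if not item_dirs:
        problems.append(f"{archive}: no item folders found")
    for item in item_dirs:
        for name in REQUIRED:
            if not (item / name).is_file():
                problems.append(f"{item.name}: missing {name}")
        contents = item / "contents"
        if contents.is_file():
            for line in contents.read_text().splitlines():
                # Assumption: the file name is the first tab-separated field.
                filename = line.split("\t")[0].strip()
                if filename and not (item / filename).is_file():
                    problems.append(
                        f"{item.name}: contents lists {filename}, which is missing"
                    )
    return problems
```

Calling `check_archive(r"C:\DSpace\bin\archive_directory")` on a correct layout should return an empty list. Pointing it at a home directory such as /Users/pan/ would instead report every subfolder (for example .fvwm) as missing both files, which matches the error seen earlier.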

 

Thanks,

Jayan

 


Jayan Chirayath Kurian

Aug 24, 2015, 4:24:46 PM
to Pan Family, dspac...@lists.sourceforge.net, Dorothea Salo

 

 


From: Pan Family [mailto:pan.f...@gmail.com]
Sent: Wednesday, January 31, 2007 1:15 PM
To: Jayan Chirayath Kurian
Cc: Dorothea Salo; dspac...@lists.sourceforge.net
Subject: Re: [Dspace-tech] how can I find out the collectionID?

 

> Ok.  I will give this a try.
>
> Still two questions:
> (1) Where can I get the file Dublin_core.XML?

Dublin_core.xml contains the metadata describing the resource (e.g. title, date published, etc.). You have to create the XML file yourself in a text editor such as Notepad.

> (2) Let's say I only want to index one file named: foo.pdf, and I put
>      it under /Users/pan/tmp/foo.pdf and pass src=/Users/pan to dsrun
>      Is foo.pdf considered the content file or the resource?  And which is
>      the third type of file?

foo.pdf is the resource (i.e. a PDF, PPT, JPEG, ...).

The contents file is a plain text file that just contains the name of the resource, i.e. foo.pdf.

> Thanks a lot!
>
> -Pan

Pan Family

Aug 24, 2015, 4:24:46 PM
to Jayan Chirayath Kurian, dspac...@lists.sourceforge.net, Dorothea Salo
Ok.  I will give this a try.

Still two questions:
(1) Where can I get the file Dublin_core.XML?
(2) Let's say I only want to index one file named: foo.pdf, and I put
     it under /Users/pan/tmp/foo.pdf and pass src=/Users/pan to dsrun
     Is foo.pdf considered the content file or the resource?  And which is
     the third type of file?

Thanks a lot!

-Pan

Pan Family

Aug 24, 2015, 4:24:51 PM
to Jayan Chirayath Kurian, dspac...@lists.sourceforge.net
Could you please kindly provide a sample Dublin_core.xml?

I assumed that dsrun would recursively go through the
directories and index all the files under them.  Apparently
I was wrong.  The requirement of Dublin_core.xml and
the content file makes the process much less automatic.
Is there a way around this?

Jayan Chirayath Kurian

Aug 24, 2015, 4:32:55 PM
to Pan Family, dspac...@lists.sourceforge.net

<?xml version="1.0" encoding="iso-8859-1" ?>
<!-- title of pdf AMIC_1984_10_CM_03.pdf -->
<dublin_core>
  <dcvalue element="creator" qualifier="conference">AMIC-Chiangmai University Refresher Course on Communication Research Methodology : Chiangmai, Oct 29-Nov 2, 1984.</dcvalue>
  <dcvalue element="title" qualifier="none">The Logic of Social Science Research.</dcvalue>
  <dcvalue element="contributor" qualifier="author">Atal, Yogesh.</dcvalue>
  <dcvalue element="date" qualifier="issued">1984-10-29</dcvalue>
</dublin_core>
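A file like the sample above does not have to be typed by hand. As a rough sketch, assuming Python is available on the import machine, the standard `xml.etree.ElementTree` module can write it and takes care of escaping characters such as `&`. The function name `write_dublin_core` and the `(element, qualifier, value)` tuple layout are illustrative choices, not anything DSpace defines.

```python
import xml.etree.ElementTree as ET


def write_dublin_core(path, fields):
    """Write a dublin_core.xml file in the shape of the sample above.

    fields is an iterable of (element, qualifier, value) tuples;
    a qualifier of None is written as "none".
    """
    root = ET.Element("dublin_core")
    for element, qualifier, value in fields:
        node = ET.SubElement(
            root,
            "dcvalue",
            {"element": element, "qualifier": qualifier or "none"},
        )
        node.text = value
    # ElementTree escapes &, < and > in the text for us.
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
```

For example, `write_dublin_core("dublin_core.xml", [("title", None, "The Logic of Social Science Research."), ("contributor", "author", "Atal, Yogesh."), ("date", "issued", "1984-10-29")])` would produce the title, author and date entries of the sample. The `creator`/`conference` pair is left out of this example on purpose, since an element/qualifier combination may need to exist in the repository's metadata registry before the importer accepts it.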

Pan Family

Aug 24, 2015, 4:37:06 PM
to Jayan Chirayath Kurian, dspac...@lists.sourceforge.net
Not yet.  I am still working on it.  I would like to avoid using
the GUI to submit.  Instead, I would like to be able to recursively
go through a dir and its sub-dirs and automatically crawl.
Has anybody done this before?

Thanks,

-Lei
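One way to approach the recursive crawl asked about here is to generate the item folders with a script and then hand the result to the importer. The sketch below is only an illustration under several assumptions: `build_archive` is a hypothetical helper, each PDF becomes its own item, the title is simply the file name without its extension, and folder names are derived from the relative path so that they stay the same from one run to the next.

```python
import shutil
from pathlib import Path
from xml.sax.saxutils import escape


def build_archive(source_root, archive_dir, pattern="*.pdf"):
    """Create one item folder per matching file found under source_root.

    Each item folder receives a copy of the document, a 'contents' file
    naming it, and a minimal dublin_core.xml holding only a title.
    Returns the list of item folders, in sorted order of the documents.
    """
    source_root = Path(source_root).resolve()
    archive_dir = Path(archive_dir).resolve()
    archive_dir.mkdir(parents=True, exist_ok=True)
    items = []
    for doc in sorted(source_root.rglob(pattern)):
        if archive_dir in doc.parents:
            continue  # do not re-crawl copies made by an earlier run
        # Stable folder name built from the path relative to source_root.
        rel = doc.relative_to(source_root).with_suffix("")
        item = archive_dir / "__".join(rel.parts).replace(" ", "_")
        item.mkdir(exist_ok=True)
        shutil.copy2(doc, item / doc.name)
        (item / "contents").write_text(doc.name + "\n", encoding="utf-8")
        (item / "dublin_core.xml").write_text(
            '<?xml version="1.0" encoding="utf-8"?>\n'
            "<dublin_core>\n"
            f'  <dcvalue element="title" qualifier="none">{escape(doc.stem)}</dcvalue>\n'
            "</dublin_core>\n",
            encoding="utf-8",
        )
        items.append(item)
    return items
```

After `build_archive("/Users/pan/docs", "/Users/pan/archive_directory")` (both paths are made up for the example), the second directory is what would be passed to `--source`. Real metadata would of course need more than a title derived from the file name.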


On 2/1/07, Jayan Chirayath Kurian <Ja...@ntu.edu.sg> wrote:

Have you solved your problem importing documents, or are you using the interface to upload documents into the repository?

 

Jayan

 


From: Pan Family [mailto:pan.f...@gmail.com]

Sent: Friday, February 02, 2007 5:19 AM
To: Jayan Chirayath Kurian

Subject: Re: [Dspace-tech] how can I find out the collectionID?

 

Thanks a lot!

-Pan


Pan Family

Aug 24, 2015, 4:37:07 PM
to Jayan Chirayath Kurian, dspac...@lists.sourceforge.net
Hi Jayan (or anyone who knows how to do batch submission):

I am still unable to do batch submission.  Here is what I did:
(1) Created a directory, /Users/pan/tmp and put 3 files under it:
Content (a text file, attached); Dublin_core.xml (attached); and
batch_import.pdf (the doc I wanted to submit to DSpace);
(2) Ran:
pan$ dsrun org.dspace.app.itemimport.ItemImport --add --eperson=pan.f...@gmail.com --collection=123456789/2 --source=/Users/pan/tmp --mapfile=/Users/pan/test_map
Destination collections:
Owning  Collection: PODAAC collection
Adding items from directory: /Users/pan/tmp
Generating mapfile: /Users/pan/test_map

No error message was shown, but the pdf file was not imported.
An empty test_map file was generated.  I also ran filter-media
and found that all bitstreams were skipped because no new
doc has been added.

I found out from the 1.4.1 beta 1 System Doc (p. 22) that
there are batch tools and that registration is an alternate means
to upload bitstreams, but no details or examples are provided.
Can you provide links to more details or examples please?


Thanks a lot for your help!

-Pan





Content
Dublin_core.xml

Jayan Chirayath Kurian

Aug 24, 2015, 4:37:09 PM
to Pan Family, dspac...@lists.sourceforge.net
I have DSpace 1.4.1 on Windows 2003.
 
(1)My directory structure is C:\DSpace\bin\archive_directory
(2)The "archive_directory" contains the folder Item_001
(3) The Item_001 folder contains (a) Dublin_core.XML, (b) the contents file and (c) test.pdf.
Please check the name of the contents file: it should be contents, not contents.txt.
To rename contents.txt to contents, I used REN contents.txt contents at the command prompt.
(4) dsrun org.dspace.app.itemimport.ItemImport -a -e=pan.f...@gmail.com -c=123456789/2 -s=C:\DSpace\bin\archive_directory -m=mapfile10
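With many item folders, the rename in step (3) can be scripted rather than done one folder at a time. This is a small sketch assuming the same layout as above (one folder per item directly under the archive directory); `fix_contents_names` is a made-up helper name.

```python
from pathlib import Path


def fix_contents_names(archive_dir):
    """Rename contents.txt to contents in every item folder.

    Returns the names of the item folders where a rename happened.
    An existing 'contents' file is never overwritten.
    """
    renamed = []
    for item in sorted(Path(archive_dir).iterdir()):
        wrong = item / "contents.txt"
        right = item / "contents"
        if item.is_dir() and wrong.is_file() and not right.exists():
            wrong.rename(right)
            renamed.append(item.name)
    return renamed
```

This does the same thing as `REN contents.txt contents`, repeated for each item folder.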
 
I hope this helps.
 
Jayan

 

From: Pan Family [mailto:pan.f...@gmail.com]
Sent: Sat 2/24/2007 11:02 AM
To: Jayan Chirayath Kurian
Cc: dspac...@lists.sourceforge.net; pan.f...@gmail.com

Pan Family

Aug 24, 2015, 4:37:11 PM
to Jayan Chirayath Kurian, dspac...@lists.sourceforge.net
Yes, it did help!!!

Still two problems:
(1) ... element="creator" qualifier="conference" or qualifier="email" ...
caused some exception until I changed qualifier="none"
But in your example, "conference" was the qualifier.
Where can I find more info. on how to write good Dublin_core.xml?
(2) what is this about?  Can I ignore it?
Processing handle file: handle
It appears there is no handle file -- generating one

Questions:
(1) A map file is generated, but what is it for?
(2) What if I have several documents, each is an item,
under one directory, say Items_001?  Do I prepare
multiple corresponding .xml files?  Do I list all the
file names in the file contents?

Thanks!

-Pan

Jayan Chirayath Kurian

Aug 24, 2015, 4:37:12 PM
to Pan Family, dspac...@lists.sourceforge.net
Is your import working now?

(1) It's fine if you have used none. I edited the metadata registry and added the conference qualifier for a second creator element. You can refer to w3schools.com for basic XML.
(2) No problem.

(1) The mapfile stores the details of the items imported by batch import. Note that if you ever need to remove those imported items, this mapfile is required.
(2) For each item we created its own directory under archive_directory, i.e. item_001, item_002, etc.

Are you using DSpace for individual use or for an organization?
 
Jayan


From: Pan Family [mailto:pan.f...@gmail.com]
Sent: Sat 2/24/2007 12:27 PM
To: Jayan Chirayath Kurian

Pan Family

Aug 24, 2015, 4:37:37 PM
to Jayan Chirayath Kurian, dspac...@lists.sourceforge.net
Yes, I can import items in batch mode now.  Thanks!
I have also tried to import two items under two directories,
item_001 and item_002, and DSpace imported them all
at once, which is what I wanted.  But DSpace does not
seem to know that the items are already in its database
and it will import them as many times as I asked it to.
So it looks like, for automatically importing only the delta
of a document collection spread out under directories and
sub-directories, I'll need to write some code.
Has anyone done this before?

FYI, I am using DSpace for a distributed data center
at JPL, a Caltech laboratory.

Thanks,

Stephen De Gabrielle

Aug 24, 2015, 4:37:40 PM
to Pan Family, dspac...@lists.sourceforge.net, Jayan Chirayath Kurian
Hi.

I think you can use the mapfile and --resume to import only items not
in the mapfile.

(mapfile is just a list of handle/folder pairs - one for each item imported)
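To see in advance which item folders a resumed import would still have to process, the mapfile can be compared against the archive directory. The sketch below assumes that each mapfile line holds the item folder name followed by the handle, separated by whitespace (for example `item_001 123456789/18`). Please check this against a real mapfile before relying on it. `read_mapfile` and `pending_items` are hypothetical helper names.

```python
from pathlib import Path


def read_mapfile(mapfile):
    """Return {item_folder_name: handle} parsed from a mapfile.

    Assumed line format: "<item folder> <handle>", whitespace separated.
    A missing mapfile is treated as "nothing imported yet".
    """
    mapping = {}
    path = Path(mapfile)
    if not path.exists():
        return mapping
    for line in path.read_text().splitlines():
        parts = line.split()
        if len(parts) >= 2:
            mapping[parts[0]] = parts[1]
    return mapping


def pending_items(archive_dir, mapfile):
    """List item folders under archive_dir that the mapfile does not mention."""
    done = read_mapfile(mapfile)
    return sorted(
        p.name
        for p in Path(archive_dir).iterdir()
        if p.is_dir() and p.name not in done
    )
```

`pending_items("archive_directory", "mapfile")` then returns the folders that have no handle recorded yet.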

--replace may also be useful for updating items

dsrun org.dspace.app.itemimport.ItemImport --replace
--eperson=j...@user.com --collection=collectID --source=items_dir
--mapfile=mapfile

"Replacing items uses the map file to replace the old items and still
retain their handles."
See http://dspace.org/technology/system-docs/application.html#itemimporter

I hope this helps.

Cheers,

Stephen
--
Stephen De Gabrielle

Pan Family

Aug 24, 2015, 4:37:52 PM
to Stephen De Gabrielle, dspac...@lists.sourceforge.net, Jayan Chirayath Kurian
Thanks, Stephen!

I used --add --resume and it worked: If the items under my archive_dir
are the same, nothing is added.  But if I add new items under
the archive_dir, only the new items are added.

I assume that I can use the same mapfile in this way, and as
I grow the number of items under the archive_dir, my mapfile
will have more and more items listed in the file.  Correct?

--replace did not work for me.  I got NullPointerException,
as shown below.  What is the right way of using --replace?

Thanks,

-Pan
--------  error from --replace -------------
 dsrun org.dspace.app.itemimport.ItemImport --replace --eperson=pan.f...@gmail.com --collection=123456789/2 --source=/Users/pan/tmp/ --mapfile=/Users/pan/matfile2.txt
Destination collections:
Owning  Collection: PODAAC collection
        Replacing:  123456789/18
java.lang.NullPointerException
        at org.dspace.app.itemimport.ItemImport.deleteItem(ItemImport.java:692)
        at org.dspace.app.itemimport.ItemImport.replaceItems(ItemImport.java:567)
        at org.dspace.app.itemimport.ItemImport.main(ItemImport.java:411)
java.lang.NullPointerException



Stephen De Gabrielle

Aug 24, 2015, 4:37:56 PM
to Pan Family, dspac...@lists.sourceforge.net
You might want to check the documentation page for the version of DSpace you are using, in CVS or Subversion at the DSpace SourceForge site;
the docs on the website are for DSpace 1.3.

The problem with using --resume that way is that you end up with two copies of all your files.
You might want to look at 'registering' your files instead. It's in the manual near import, but it doesn't make a copy of the file; it just uses the file where it is.

As for your error, I'd try looking at the mapfile with a text editor, and looking at the item in the system with the same handle. (It might not work if you have manually deleted the item, or if the access rights are wrong. You could try running it as root...)

Cheers,

Stephen




Jayan Chirayath Kurian

Aug 24, 2015, 4:38:07 PM
to Stephen De Gabrielle, Pan Family, dspac...@lists.sourceforge.net

Hi! Pan,

 

For replacing items in Dspace

 

Say, you have a jpg item in one of the folders from which you are importing. You want to replace that item with a modified version of the jpg file. Save the modified version of the jpg in the respective folder and issue the command. The item will be replaced. The same can be applied if you want to replace a jpg item with a pdf item.

 

Thanks,

Jayan

 

C:\DSpace\bin>dsrun org.dspace.app.itemimport.ItemImport -a --replace -e nack@ntu.edu.sg -c 123456789/153 -s c:\dspace\bin\archive_directory -m mapfile100

Using DSpace installation in: C:\DSpace
Destination collections:
Owning  Collection: First Collection
        Replacing:  123456789/174
Adding item from directory item_002
        Loading dublin core from c:\dspace\bin\archive_directory\item_002\dublin_core.xml
        Schema: dc Element: date Qualifier: issued Value: 1971
        Schema: dc Element: title Qualifier: none Value: Mass Communication In Pakistan
        Schema: dc Element: contributor Qualifier: author Value: Abdus Salam Khurshid
        Processing contents file: c:\dspace\bin\archive_directory\item_002\contents
        Bitstream: AMIC_1971_09_11.jpg
Processing handle file: handle
read handle: '123456789/174'

 

 

 

