[Dspace-tech] Dspace 1.5 cannot batch import


upload

Aug 25, 2015, 11:05:55 AM
to dspac...@lists.sourceforge.net

Hello,
I'm new to DSpace and am trying to upload items using batch import.
When I run the batch import, no errors are reported, but when I go to
the URL for the collection, it doesn't show any submissions.

The command I run is:
/dspace/bin/dsrun org.dspace.app.itemimport.ItemImport -a -e user@domain -c 123456789/10 -s /dspace/run -m /dspace/map/mapfile.txt

The contents file, dublin_core.xml, and the actual PDF document are located in
/dspace/run/

Upon running the command, I get:
Destination collections:
Owning Collection: Upload
Adding items from directory: /dspace/batch
Generating mapfile: /dspace/mapfiles/mapfile.txt

And the mapfile.txt is empty.

The log doesn't say much in /dspace/log/dspace.log:
2008-08-11 15:21:03,140 INFO org.dspace.core.ConfigurationManager @ Loading
from classloader: file:/data/dspace/config/dspace.cfg
2008-08-11 15:21:03,152 INFO org.dspace.core.ConfigurationManager @ Using
dspace provided log configuration (log.init.config)
2008-08-11 15:21:03,152 INFO org.dspace.core.ConfigurationManager @
Loading: /dspace/config/log4j.properties

The system is Redhat 5.
--
View this message in context: http://www.nabble.com/Dspace-1.5-cannot-batch-import-tp18947640p18947640.html
Sent from the DSpace - Tech mailing list archive at Nabble.com.


Dorothea Salo

Aug 25, 2015, 11:05:57 AM
to DSpace Tech-List
What looks odd to me is the contrast between -s here:

> The command I run is:
> /dspace/bin/dsrun org.dspace.app.itemimport.ItemImport -a -e user@domain -c
> 123456789/10 -s /dspace/run -m /dspace/map/mapfile.txt

and this line in the output:

> Adding items from directory: /dspace/batch

I don't know offhand what would cause that -- why would DSpace
override -s? -- but that's where I would start troubleshooting. The
empty mapfile just means that DSpace didn't import any items, which
you already knew. Good luck!

Dorothea

--
Dorothea Salo ds...@library.wisc.edu
Digital Repository Librarian AIM: mindsatuw
University of Wisconsin
Rm 218, Memorial Library
(608) 262-5493

Claudia Jürgen

Aug 25, 2015, 11:05:57 AM
to upload, dspac...@lists.sourceforge.net
Hello,

You need one directory per item, and the source directory given with the
-s option is the directory in which these item directories reside.

So if the elements of the item really are directly under /dspace/run,
your command should be:
/dspace/bin/dsrun org.dspace.app.itemimport.ItemImport -a -e user@domain -c 123456789/10 -s /dspace -m /dspace/map/mapfile.txt

Your information is inconsistent, though: your command says
"-s /dspace/run", but the output suggests that you actually used
"/dspace/batch".


Hope that helps

Claudia




Dorothea Salo

Aug 25, 2015, 11:05:59 AM
to dspac...@lists.sourceforge.net
On Tue, Aug 12, 2008 at 11:34 AM, upload <ki...@uci.edu> wrote:

> The command I run is:
> /dspace/bin/dsrun org.dspace.app.itemimport.ItemImport -a -e user@domain -c
> 123456789/10 -s /dspace/batch -m /dspace/mapfiles/mapfile.txt
>
> And nothing gets imported.

Well, following on Claudia's suggestion, let's look at your directory
structure. Your /dspace should have a folder "batch" in it something
like this:

batch
  item1
    contents
    dublin_core.xml
    item.pdf

Is that what you've got?

upload

Aug 25, 2015, 11:06:00 AM
to dspac...@lists.sourceforge.net

Thanks for the response.
That was a typo, sorry.

The command I run is:
/dspace/bin/dsrun org.dspace.app.itemimport.ItemImport -a -e user@domain -c 123456789/10 -s /dspace/batch -m /dspace/mapfiles/mapfile.txt

And nothing gets imported.


> -------------------------------------------------------------------------
> This SF.Net email is sponsored by the Moblin Your Move Developer's
> challenge
> Build the coolest Linux based applications with Moblin SDK & win great
> prizes
> Grand prize is a trip for two to an Open Source event anywhere in the
> world
> http://moblin-contest.org/redirect.php?banner_id=100&url=/
> _______________________________________________
> DSpace-tech mailing list
> DSpac...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>
>


Claudia Jürgen

Aug 25, 2015, 11:06:00 AM
to upload, dspac...@lists.sourceforge.net
Hi,

Does /dspace/batch contain

contents
dublin_core.xml
foo1.pdf
...
?
If so, and if this is just for testing, do

mkdir /dspace/testItemImport
cp -r /dspace/batch /dspace/testItemImport


and then run
/dspace/bin/dsrun org.dspace.app.itemimport.ItemImport -a -e user@domain -c 123456789/10 -s /dspace/testItemImport -m /dspace/mapfiles/mapfile.txt

Claudia


Dorothea Salo

Aug 25, 2015, 11:06:02 AM
to DSpace Tech-List
Wait, are you putting your items for upload into your DSpace source or
production directory? Or assetstore? I'm not sure, but it's sounding
like DSpace is getting very confused about file locations.

Probably not a good idea to do that, in any case. :) Items for upload
can go anywhere else on the server that the dspace user has read
access to (write access for the mapfiles directory). So, let's assume
that user kimsk has a directory on the server called /home/kimsk. Make
a directory "uploads" there, and make sure the dspace user can read
it. Should look like this:

/home/kimsk/uploads
  item1
    contents
    dublin_core.xml
    item.pdf

Then change the value of -s to /home/kimsk/uploads and let us know how it goes.

upload

Aug 25, 2015, 11:06:03 AM
to dspac...@lists.sourceforge.net


Thank you. Now I'm getting somewhere.
So I've changed the -s part of the command from -s /dspace/batch to -s
/dspace/ and it generated more output. Do the files (dublin_core.xml,
contents, and the actual file that's being uploaded) need to be in
/dspace/search and /dspace/lib?

Adding item from directory search
Loading dublin core from /dspace//search/dublin_core.xml

Adding item from directory lib
java.io.FileNotFoundException: /dspace/lib/dublin_core.xml (No such file or
directory)

java.io.FileNotFoundException: /dspace/lib/contents (No such file or
directory)
at java.io.FileInputStream.open(Native Method)


So I've copied the files (dublin_core.xml, contents, and the actual file
that's being uploaded) from /dspace/search/ to /dspace/lib so that they
exist in both directories, /dspace/search and /dspace/lib.
Then I ran the command again:

Adding item from directory search
Loading dublin core from /dspace//search/dublin_core.xml

Adding item from directory lib
Loading dublin core from /dspace//lib/dublin_core.xml


Adding item from directory etc.bak-20080701-153023
java.io.FileNotFoundException:
/dspace/etc.bak-20080701-153023/dublin_core.xml (No such file or directory)

java.io.FileNotFoundException:
/dspace/etc.bak-20080701-153023/dublin_core.xml (No such file or directory)



It's looking for /dspace/etc.bak-20080701-153023/dublin_core.xml...

Any suggestions?

Claudia Juergen

Aug 25, 2015, 11:06:03 AM
to upload, dspac...@lists.sourceforge.net
Hello,

Wait, STOP. Do not copy anything around in your DSpace installation
directory; it seems that /dspace is your DSpace installation directory.
Sorry, I did not think of this in my earlier example.


To explain it in a bit more detail: as the source for a batch import you
need a directory. In this directory you need one directory per item to be
imported and NOTHING else.

Make a directory for the import somewhere you can play around, e.g.
/dspaceTesting/itemimport/TestImport1
Create a directory item01 there.
Copy your contents file, dublin_core.xml and the bitstreams to be imported
there, so the structure looks like
/dspaceTesting/itemimport/TestImport1
/dspaceTesting/itemimport/TestImport1/item01
/dspaceTesting/itemimport/TestImport1/item01/contents
/dspaceTesting/itemimport/TestImport1/item01/dublin_core.xml
/dspaceTesting/itemimport/TestImport1/item01/foo1.pdf
/dspaceTesting/itemimport/TestImport1/item01/foo2.pdf

foo1.pdf and foo2.pdf are just example filenames. All files listed in
/dspaceTesting/itemimport/TestImport1/item01/contents should be there.

run your import command with
-s /dspaceTesting/itemimport/TestImport1

The item importer will go through
/dspaceTesting/itemimport/TestImport1 treating each directory as an item.

Claudia
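
The layout rule described above (one directory per item, each holding contents, dublin_core.xml and the files that contents lists, and nothing else beside the item directories) can be checked before running the importer. What follows is only a minimal Python sketch and not part of DSpace: the function name check_source_dir is hypothetical, and it assumes that each line of the contents file starts with the filename, optionally followed by a bundle: field.

```python
import re
import tempfile
from pathlib import Path


def check_source_dir(source):
    """Return a list of problems that would likely trip up an import of `source`."""
    problems = []
    for entry in sorted(Path(source).iterdir()):
        if not entry.is_dir():
            problems.append(f"{entry.name}: not an item directory")
            continue
        for required in ("contents", "dublin_core.xml"):
            if not (entry / required).is_file():
                problems.append(f"{entry.name}: missing {required}")
        contents = entry / "contents"
        if not contents.is_file():
            continue
        for line in contents.read_text().splitlines():
            if not line.strip():
                continue
            # Assumption: the filename is everything before an optional "bundle:" field.
            name = re.split(r"\s+bundle:", line, maxsplit=1)[0].strip()
            if not (entry / name).is_file():
                problems.append(f"{entry.name}: {name} is listed in contents but missing")
    return problems


# Demo in a throwaway directory: item01 is complete, "lib" is not an item at all.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp, "item01")
    good.mkdir()
    (good / "contents").write_text("foo1.pdf\tbundle:ORIGINAL\n")
    (good / "dublin_core.xml").write_text("<dublin_core/>\n")
    (good / "foo1.pdf").write_bytes(b"%PDF-1.4\n")
    Path(tmp, "lib").mkdir()
    for problem in check_source_dir(tmp):
        print(problem)
```

Pointing such a check at an installation directory like /dspace would flag every subdirectory that lacks the two required files, which is the situation the FileNotFoundException messages earlier in the thread describe.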

Thomas A McGee

Aug 25, 2015, 11:06:10 AM
to dspac...@lists.sourceforge.net

A couple of ideas: Can you copy and paste what's in your dublin_core.xml and contents files, and post them here? Also, a quick map of your directory structure, similar to what Dorothea has given as an example?

Have you tried running the batch import with the "--test" flag set? Put it at the end of the command line; it's two dashes followed by "test".


_____________________
Tom McGee
Seton Hall University TLTC
973 761 9000 x5021

upload

Aug 25, 2015, 11:06:10 AM
to dspac...@lists.sourceforge.net


Thanks Claudia and Dorothea! I got it to work now. It was the directory
structure that I was messing up on. The imported submissions all show up
now.

Is creating the contents and dublin_core.xml files manually the best way
to do mass importing? We have a lot of documents to import and were
wondering if other people use the same method. I wonder if there's a more
automated way of doing the importing...

Thanks!

upload

Aug 25, 2015, 11:06:16 AM
to dspac...@lists.sourceforge.net

Hello,

dublin_core.xml:
<dublin_core>
  <dcvalue element="contributor" qualifier="author">Tom</dcvalue>
  <dcvalue element="language" qualifier="iso">en</dcvalue>
  <dcvalue element="subject" qualifier="none">Professor Invites Students</dcvalue>
  <dcvalue element="title" qualifier="none">oldUniversity</dcvalue>
  <dcvalue element="type" qualifier="none">Vol.1 No.3</dcvalue>
</dublin_core>

contents:
U1_001.pdf bundle:ORIGINAL
U1_002.pdf bundle:ORIGINAL
U1_003.pdf bundle:ORIGINAL
U1_004.pdf bundle:ORIGINAL
license.txt bundle:LICENSE

Directory structure:
/home/uploads/item01
/home/uploads/item01/contents
/home/uploads/item01/dublin_core.xml
/home/uploads/item01/U1_001.pdf
/home/uploads/item01/U1_002.pdf
/home/uploads/item01/U1_003.pdf
/home/uploads/item01/U1_004.pdf

Running it with "--test":
Exception in thread "main" org.apache.commons.cli.MissingArgumentException:
no argument for:e
at org.apache.commons.cli.Parser.processArgs(Parser.java:239)
at org.apache.commons.cli.Parser.processOption(Parser.java:277)
at org.apache.commons.cli.Parser.parse(Parser.java:170)
at org.apache.commons.cli.Parser.parse(Parser.java:114)
at org.dspace.app.itemimport.ItemImport.main(ItemImport.java:171)

Claudia Jürgen

Aug 25, 2015, 11:06:21 AM
to upload, dspac...@lists.sourceforge.net
Hi,

It seems that this time there is an error in the command.
Even with --test you need to supply the other arguments, such as eperson,
source, target ...

Claudia
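
To make the point about --test concrete: the flag is added to a complete command and does not replace any of the other arguments. Below is a minimal Python sketch that assembles such a command. The helper name build_import_command is hypothetical, and the only flags used are the ones that already appear in this thread (-a, -e, -c, -s, -m, --test); the paths are the ones quoted by the posters and are assumptions about any other system.

```python
import shlex


def build_import_command(eperson, collection, source, mapfile, test=False,
                         dsrun="/dspace/bin/dsrun"):
    """Assemble the argument list for an add (-a) run of the item importer."""
    cmd = [dsrun, "org.dspace.app.itemimport.ItemImport", "-a",
           "-e", eperson, "-c", collection, "-s", source, "-m", mapfile]
    if test:
        # --test goes at the end, after all the usual arguments.
        cmd.append("--test")
    return cmd


cmd = build_import_command("user@domain", "123456789/10",
                           "/home/uploads", "/dspace/mapfiles/mapfile.txt",
                           test=True)
print(shlex.join(cmd))
```

Keeping the command as a list means every flag is followed by its own value, which is the condition the "no argument for:e" message complains about when it is not met.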



Claudia Jürgen

Aug 25, 2015, 11:06:34 AM
to upload, dspac...@lists.sourceforge.net
P.S. Here is a handout from the Texas Digital Library about batch import:

https://www.tdl.org/documents/DSpaceBatchImportFormat.pdf

Claudia



Claudia Jürgen

Aug 25, 2015, 11:06:36 AM
to upload, dspac...@lists.sourceforge.net
Hi,

Great that it works now.

Creating batch imports by hand is not very efficient.

DSpace basically works on crosswalks and packagers.
There are many ways of crosswalking content. It largely depends on your
use case and your data. One distinction is whether you need it regularly
or not.

If you're just migrating one big chunk from a proprietary format, you
might write your own script to transform (correct, enrich) the metadata
and create the appropriate import structure.

If you're regularly importing items (e.g. from a catalogue or database),
it might be worth creating an interface.

Hope that helps

Dorothea Salo

Aug 25, 2015, 11:06:37 AM
to dspac...@lists.sourceforge.net
On Wed, Aug 13, 2008 at 8:13 AM, Mark H. Wood <mw...@iupui.edu> wrote:

> Sad to say, there are probably as many automated ways of building
> batches as there are DSpace sites. What you do will depend on the
> form in which you can get the data.

This is my experience too. I wrote a tiny Python library of
DSpace-automation-stuff (with classes for building a contents file, a
dublin_core.xml file, a mapfile [yes, that's rare, but it has
happened], breaking up a namelist from a citation or from HTML, and
parsing a name) that I remix as needed for new projects. (Next on the
list to add to it: better file/folder management, because I'm so
error-prone when I write that stuff...)

I can see that I'll have to rewrite a lot of this to create SWORD
packages instead. So be it; I think SWORD is a better way and I'll be
able to do more with it. (I got to talking with some people long ago
about drop boxes for the repository, and it just plain broke my brain,
how hard that was going to be. SWORD makes it a good deal more
feasible to write drop boxes and hands-off gateways, I think.)

For my sins, I do a lot of HTML screenscraping -- back issues of
e-periodicals, mostly. That's all ad-hoc, as no two e-periodicals have
the same HTML. It tends to be an 80/20 problem (give or take 10% based
on HTML quality and consistency); I can whack out most of the metadata
with regular expressions and my namelist/name parsers, but not all of
it. Information is often lurking in PDFs, which means handwork.

I say all this to (I hope) help people understand what the bounds
around what's feasible look like for untalented scripters.
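
As an illustration of the namelist and name-parsing helpers mentioned above, here is a minimal Python sketch. It is not the library described in this message: the function names split_namelist and parse_name, the separators handled (semicolons, "and", "&"), and the "Last, First" output form are all assumptions chosen for the example, and real citation data will need more cases than this.

```python
import re


def split_namelist(namelist):
    """Split an 'A B; C D and E F' style list on semicolons, 'and' and '&'."""
    parts = re.split(r"\s*;\s*|\s+and\s+|\s*&\s*", namelist)
    return [part.strip() for part in parts if part.strip()]


def parse_name(name):
    """Return a name in 'Last, First' form; single-word names are returned as-is."""
    name = " ".join(name.split())  # collapse runs of whitespace
    if "," in name:
        # Already 'Last, First': just tidy the spacing.
        last, first = [piece.strip() for piece in name.split(",", 1)]
    else:
        # Treat the final word as the family name.
        first, _, last = name.rpartition(" ")
    return f"{last}, {first}" if first else last


for raw in split_namelist("Jane Q. Public; John Doe and Tom"):
    print(parse_name(raw))
```

The parsed names could then be written out as contributor/author values in dublin_core.xml; names that do not fit the simple "final word is the family name" rule are the part that still needs handwork.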

Mark H. Wood

Aug 25, 2015, 11:12:40 AM
to dspac...@lists.sourceforge.net
On Tue, Aug 12, 2008 at 11:59:40AM -0700, upload wrote:
> Is creating the contents and dublin_core.xml files manually the best way
> to do mass importing? We have a lot of documents to import and were
> wondering if other people use the same method. I wonder if there's a more
> automated way of doing the importing...

Sad to say, there are probably as many automated ways of building
batches as there are DSpace sites. What you do will depend on the
form in which you can get the data.

For example, one of our DSpace instances has been blessed with many
outside agencies who are willing to contribute their collections.
Each one has its own way of cataloging its holdings. We've been
fortunate that most have agreed to a somewhat standard way of
delivering metadata in a spreadsheet, which a Librarian works over for
quality control and then exports to me as flat TSV records. I've
built -- well, I continue to build :-) -- a Perl script that turns a
directory full of document files and a TSV metadata file into a batch
directory tree. But that may not be the way you should build your
batches; the best form for you to receive the material may be quite
different.

--
Mark H. Wood, Lead System Programmer mw...@IUPUI.Edu
Typically when a software vendor says that a product is "intuitive" he
means the exact opposite.
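
The approach described in this message, turning a directory of document files plus a TSV metadata file into a batch directory tree, can be sketched as follows. This is a Python illustration rather than the Perl script mentioned above, and every specific in it is an assumption: the TSV column names (filename, title, author), the COLUMN_MAP mapping, the itemNN directory naming, and the tab-separated bundle:ORIGINAL line written to contents.

```python
import csv
import shutil
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical mapping from TSV column name to (element, qualifier).
COLUMN_MAP = {
    "title": ("title", "none"),
    "author": ("contributor", "author"),
}


def build_batch(tsv_path, files_dir, batch_dir):
    """Create one item directory per TSV row; return the directories created."""
    created = []
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        rows = csv.DictReader(handle, delimiter="\t")
        for number, row in enumerate(rows, start=1):
            item_dir = Path(batch_dir) / f"item{number:02d}"
            item_dir.mkdir(parents=True)  # fails rather than overwrite an old batch
            root = ET.Element("dublin_core")
            for column, (element, qualifier) in COLUMN_MAP.items():
                if row.get(column):
                    node = ET.SubElement(root, "dcvalue",
                                         element=element, qualifier=qualifier)
                    node.text = row[column]
            xml_text = ET.tostring(root, encoding="unicode")
            (item_dir / "dublin_core.xml").write_text(xml_text, encoding="utf-8")
            shutil.copy(Path(files_dir) / row["filename"], item_dir)
            (item_dir / "contents").write_text(f"{row['filename']}\tbundle:ORIGINAL\n")
            created.append(item_dir)
    return created


# Demo in a throwaway directory with one document and one metadata row.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "files").mkdir()
    (base / "files" / "U1_001.pdf").write_bytes(b"%PDF-1.4\n")
    (base / "metadata.tsv").write_text(
        "filename\ttitle\tauthor\nU1_001.pdf\toldUniversity\tTom\n", encoding="utf-8")
    for item in build_batch(base / "metadata.tsv", base / "files", base / "batch"):
        print(item.name, sorted(p.name for p in item.iterdir()))
```

The resulting batch directory is what would be passed to the importer with -s. A spreadsheet with different columns would only need a different COLUMN_MAP, though, as the message says, the best input form will differ from site to site.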
