[Dspace-tech] Bulk Import using CSV files


Andy Kirkyla

Aug 26, 2015, 1:21:33 PM
to dspac...@lists.sourceforge.net
Folks,

     I trust all is well.

     I want to thank everyone on the list for their willingness to answer questions.

     I have a few questions concerning bulk imports. We would like to allow users to import data using Excel CSV files as templates, and I am wondering how to handle the following:

     1) How do you ensure that a valid item ID is used, or does DSpace handle it?
     2) How do I make sure that all fields are included? When I exported the data, I noticed that item type specific data was missing. How do I ensure that the data is properly imported?

     Thanks in advance for all your help.

Andy 

Terry Brady

Aug 26, 2015, 1:21:34 PM
to Andy Kirkyla, dspac...@lists.sourceforge.net
Andy,

I presume from your note that you are importing only metadata, not metadata and bitstreams.

The metadata import tool is very flexible and very permissive.  I believe you would need to write your own validation routine to enforce required fields or the format of specific fields.  Fortunately, the report that the metadata import tool presents is informative, highlighting exactly what will change.  I presume that the report would flag an invalid item ID.


We run a fair number of bulk imports (metadata + bitstreams), and we run a validation to check for the existence of a title and a creation date before submitting the metadata.
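A pre-submission check like the one described above might look like the following minimal sketch. The field names `dc.title` and `dc.date.created` are assumptions (the thread only says "a title and a creation date"), and `find_missing_required` and `base_field` are hypothetical helper names, not part of DSpace.

```python
import csv
import io

# Assumed field names: the thread only mentions "a title and a creation date".
REQUIRED_FIELDS = ("dc.title", "dc.date.created")


def base_field(header):
    """Strip an optional language suffix such as "[en_US]" from a column header."""
    return header.split("[", 1)[0].strip()


def find_missing_required(csv_text, required=REQUIRED_FIELDS):
    """Return (row_number, field) pairs for rows that lack a required value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = []
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for field in required:
            # A field counts as present if any column for it is non-empty,
            # regardless of the language suffix on the column header.
            has_value = any(
                (value or "").strip()
                for header, value in row.items()
                if header is not None and base_field(header) == field
            )
            if not has_value:
                problems.append((row_number, field))
    return problems
```

Running this over the CSV text before handing the file to the metadata import tool gives a list of rows to fix; an empty list means every row carries both fields.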

We have had some inconsistency in the way that we tag the language (e.g., "en_US" vs. "en") between manual ingest and bulk ingest.  This has minimal impact on the DSpace user interface, but it can be problematic when editing metadata in bulk.
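One way to spot this kind of inconsistency before a bulk edit is to look at the CSV header row. The sketch below assumes that language tags appear as a bracketed suffix on the column header (for example `dc.title[en_US]`); the function names are hypothetical.

```python
import csv
import io
import re
from collections import defaultdict

# Matches headers such as "dc.title[en_US]"; a header without a suffix has no tag.
HEADER_PATTERN = re.compile(r"^(?P<field>[^\[\]]+)(?:\[(?P<lang>[^\[\]]*)\])?$")


def language_tags_by_field(csv_text):
    """Map each metadata field to the set of language tags used in the header row."""
    header = next(csv.reader(io.StringIO(csv_text)))
    tags = defaultdict(set)
    for column in header:
        match = HEADER_PATTERN.match(column.strip())
        if match:
            tags[match.group("field")].add(match.group("lang") or "")
    return dict(tags)


def mixed_language_fields(csv_text):
    """Return the fields that appear under more than one language tag."""
    return {
        field: sorted(langs)
        for field, langs in language_tags_by_field(csv_text).items()
        if len(langs) > 1
    }
```

A field reported here (say, `dc.title` under both "en" and "en_US") is spread over two columns in the spreadsheet, which is what makes bulk editing awkward.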

Terry


_______________________________________________
DSpace-tech mailing list
DSpac...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette



--
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology

Hilton Gibson

Aug 26, 2015, 1:21:37 PM
to Terry Brady, dspac...@lists.sourceforge.net

Hilton Gibson
Ubuntu Linux Systems Administrator
JS Gericke Library
Room 1025D
Stellenbosch University
Private Bag X5036
Stellenbosch
7599
South Africa




Terry Brady

Aug 26, 2015, 1:21:39 PM
to Andy Kirkyla, dspac...@lists.sourceforge.net
I have not seen an error like that before, and I do not use JSPUI.

Generally, I only edit dc elements via the bulk metadata process.  If you run an import excluding the elements in your custom schema, does everything work OK?

Is it possible that your custom namespace is not being recognized on import?

Terry


On Mon, Jul 14, 2014 at 2:36 PM, Andy Kirkyla <an...@bridgit.com> wrote:

Andy Kirkyla

Aug 26, 2015, 1:21:39 PM
to Hilton Gibson, dspac...@lists.sourceforge.net
Dear Hilton and Terry,

       I thank you so much for your quick responses. 

       I have been able to import the data; however, when I try to view it I get an internal server error. I looked in the logs and saw the following error:

2014-07-14 17:22:05,431 WARN  org.dspace.app.webui.servlet.InternalErrorServlet @ :session_id=25DCE935E931FD060080E5C0E3D0D44B:internal_error:-- URL Was: http://localhost:8080/jspui/handle/123456789/38/simple-search?filterquery=123456789%2F43&filtername=author&filtertype=equals
-- Method: GET
-- Parameters were:
-- filtertype: "equals"
-- filtername: "author"
-- filterquery: "123456789/43"

org.apache.jasper.JasperException: java.lang.NullPointerException
at org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:549)
at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:470)
at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:390)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:334)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)

     I believe that the issue is that I have a schema called 'bridgitterms' that is causing all the dc terms to be shifted down. Should I place my schema-specific terms at the end of my file?

      Thanks again for all your help.

Andy
 


On Mon, Jul 14, 2014 at 4:46 PM, Hilton Gibson <hilton...@gmail.com> wrote:

helix84

Aug 26, 2015, 1:21:40 PM
to Andy Kirkyla, dspac...@lists.sourceforge.net
> 1) How do you ensure that a valid item ID is used, or does DSpace
> handle it?

You have to put "+" in the "id" column for new items. A value in the
"collection" column is mandatory (collection handle).

> 2) How do I make sure that all fields are included? When I exported the
> data, I noticed that item type specific data was missing. How do I ensure
> that the data is properly imported?

What exactly do you mean by item type specific data?

> I have been able to import the data; however, when I try to view it I
> get an internal server error. I looked in the logs and saw the following
> error:

We'll need the full trace up to and including the "Caused by:" line.

> I believe that the issue is that I have a schema called 'bridgitterms'
> that is causing all the dc terms to be shifted down. Should I place my
> schema specific terms at the end of my file?

A custom schema with CSV import works just like dc. The only
requirement is that the schema has to exist in the schema registry
before you attempt to import it.
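As a rough illustration of the points above, the following sketch writes a CSV in which every row is a new item ("+" in the "id" column) with the collection handle in the "collection" column. `build_new_item_csv` is a hypothetical helper, and `bridgitterms.example` in the usage note is a made-up element name; any custom-schema column would still need its schema registered in DSpace beforehand, as noted above.

```python
import csv
import io


def build_new_item_csv(collection_handle, items, fieldnames):
    """Return CSV text in which every row is marked as a new item with "+"."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["id", "collection", *fieldnames],
        lineterminator="\n",
    )
    writer.writeheader()
    for metadata in items:
        # "+" tells the importer to create a new item; the collection handle is mandatory.
        row = {"id": "+", "collection": collection_handle}
        row.update(metadata)
        writer.writerow(row)
    return buffer.getvalue()
```

For example, `build_new_item_csv("123456789/43", [{"dc.title": "First item", "bridgitterms.example": "x"}], ["dc.title", "bridgitterms.example"])` produces a header row followed by one "+" row. Letting the `csv` module do the quoting avoids hand-built lines that break when a value contains a comma.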


Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

Hilton Gibson

Aug 26, 2015, 1:21:41 PM
to Ivan Masár, dspac...@lists.sourceforge.net

On 15 July 2014 09:15, helix84 <hel...@centrum.sk> wrote:
You have to put "+" in the "id" column for new items. A value in the
"collection" column is mandatory (collection handle).

Are there any other "id instructors" besides "+"?

Cheers

helix84

Aug 26, 2015, 1:21:43 PM
to Hilton Gibson, dspac...@lists.sourceforge.net
On Tue, Jul 15, 2014 at 9:23 AM, Hilton Gibson <hilton...@gmail.com> wrote:
>
> On 15 July 2014 09:15, helix84 <hel...@centrum.sk> wrote:
>>
>> You have to put "+" in the "id" column for new items. A value in the
>> "collection" column is mandatory (collection handle).
>
>
> Are there any other "id instructors" besides "+"?

No [1], but you may want to look at the "action" column [2].

[1] https://github.com/DSpace/DSpace/blob/dspace-4_x/dspace-api/src/main/java/org/dspace/app/bulkedit/DSpaceCSV.java#L525
[2] https://wiki.duraspace.org/display/DSDOC4x/Batch+Metadata+Editing#BatchMetadataEditing-Performing'actions'onitems

Andy Kirkyla

Aug 26, 2015, 1:21:51 PM
to hel...@centrum.sk, dspac...@lists.sourceforge.net
Folks,

      I trust all is well. Thank you so much for all your help so far. 

      Per Terry's suggestion I have removed all custom schema details from the import file.

      I have been able to import the file; however, when I view the record, the data is displayed in the wrong fields. For example, the 'Author' field displays the id value ('+') and the Author field displays the Collection Value (123456789/43). Attached is the test CSV file that I am using. Is there a place where I need to set the order of the import fields?
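When values seem to land in the wrong columns, one thing worth ruling out (an assumption here, not a diagnosis confirmed in this thread) is that some rows have a different number of fields than the header, for instance because a value containing a comma was not quoted when the file was saved. A minimal check, with a hypothetical function name:

```python
import csv
import io


def rows_with_wrong_width(csv_text):
    """Return (record_number, field_count) for records whose width differs from the header's."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return [
        (record_number, len(row))
        for record_number, row in enumerate(reader, start=2)  # record 1 is the header
        if row and len(row) != len(header)  # blank lines are ignored
    ]
```

An empty result means every record lines up with the header, so the cause lies elsewhere.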

      Thanks again for all your help.

Andy      
      

test-import-short.csv

Hilton Gibson

Aug 26, 2015, 1:21:53 PM
to Andy Kirkyla, dspac...@lists.sourceforge.net
Hi Andy,

Try to import the attached.

Replace all instances of "collection" with the destination collection handle for the items.

Cheers

hg

Hilton Gibson
Ubuntu Linux Systems Administrator
JS Gericke Library
Room 1025D
Stellenbosch University
Private Bag X5036
Stellenbosch
7599
South Africa



ro-1999.csv

Terry Brady

Aug 26, 2015, 1:21:56 PM
to Hilton Gibson, dspac...@lists.sourceforge.net
Andy,

I imported your file with the import metadata tool.  I had to make the collection handle valid.

Once I updated the collection, your file imported successfully for me and the fields appear to have mapped properly.

Terry

Inline image 1

Inline image 2



Andy Kirkyla

Aug 26, 2015, 1:21:58 PM
to Terry Brady, dspac...@lists.sourceforge.net
Dear Terry and Hilton,

      Thank you so much for all your help so far.

      I am new to DSpace (as if you could not tell). Can you tell me what I need to do to make sure that I am using a proper collection ID?

      Thanks again for all your help.

Andy

Terry Brady

Aug 26, 2015, 1:21:59 PM
to Andy Kirkyla, dspac...@lists.sourceforge.net
In your browser, navigate to your collection.  The collection handle follows "handle/".

Example

The collection handle will be 99999/559401.

Note: community, collection, and item handles each have the same format.
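The same lookup can be scripted. This is a minimal sketch that pulls the handle out of a copied browser URL; `handle_from_url` is a hypothetical name, and the pattern assumes the handle is a prefix and a suffix separated by one slash.

```python
import re
from urllib.parse import urlparse

# A handle is "<prefix>/<suffix>"; this sketch assumes neither part contains a slash.
HANDLE_IN_PATH = re.compile(r"/handle/([^/]+/[^/]+)")


def handle_from_url(url):
    """Return the handle that follows "handle/" in a DSpace URL, or None if absent."""
    match = HANDLE_IN_PATH.search(urlparse(url).path)
    return match.group(1) if match else None
```

Because community, collection, and item handles share one format, this only extracts the handle; it cannot tell whether the handle actually belongs to a collection, so that still needs to be confirmed in the browser.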

Terry