Metadata Import of CSV file stops when a field in the CSV file with new data matches a record where the corresponding field in the DSpace record contains blank/null data


Bouchard, Kerry

Feb 10, 2021, 2:40:24 PM
to dspac...@googlegroups.com, Bouchard, Kerry

 

We’ve been having problems with the Metadata Import function sometimes either stopping after a few records or sending a “No changes detected” message. I think I’ve narrowed the problem to the following circumstance:

·         One or more records in the CSV file being imported contain a new value for a field that is present in the existing DSpace metadata record but holds a blank/null value.

·         For that record, this field is the only one whose value differs from the existing DSpace record.

·         If the record is the first row in the CSV file, we get a “No changes detected” message. If it occurs further down in the file, the rows above it process correctly, but processing stops at the problem row.

If the metadata field doesn’t exist in the DSpace record at all, then there’s not a problem – the import will show that it is Adding the new field and value, without Removing the original value (since it doesn’t exist). But if the field is present in the record but contains blank/null data, processing stops with that record.

 

I cannot find any ERROR lines in the DSpace log file that appear to correspond to this. Below are DEBUG lines that I *think* correspond to the last test I ran, where the row in the CSV file contains a new value (“test”) for the dc.rights.license field, which exists in the matching DSpace record but contains blank data:

2021-02-10 11:22:46,067 DEBUG org.dspace.storage.rdbms.DatabaseManager @ Running query "SELECT * FROM MetadataValue WHERE resource_id= ? and resource_type_id = ? ORDER BY metadata_field_id, place"  with parameters: 26629,2

2021-02-10 11:22:46,067 DEBUG org.dspace.app.bulkedit.MetadataImport @ k.bou...@tcu.edu:session_id=59CE8F97786E6F463ABBE0EBD1F95BCA:ip_addr=127.0.0.1:metadata_import:item_id=26629,fromCSV=test,

2021-02-10 11:22:46,067 DEBUG org.dspace.app.bulkedit.MetadataImport @ k.bou...@tcu.edu:session_id=59CE8F97786E6F463ABBE0EBD1F95BCA:ip_addr=127.0.0.1:metadata_import:item_id=26629,fromCSV=test,,looking_for_schema=dc,looking_for_element=rights,looking_for_qualifier=license,looking_for_language=null

2021-02-10 11:22:46,067 DEBUG org.dspace.app.bulkedit.MetadataImport @ k.bou...@tcu.edu:session_id=59CE8F97786E6F463ABBE0EBD1F95BCA:ip_addr=127.0.0.1:metadata_import:item_id=26629,fromCSV=test,,found=null

2021-02-10 11:22:46,067 DEBUG org.dspace.app.xmlui.aspect.administrative.FlowMetadataImportUtils @ k.bou...@tcu.edu:session_id=59CE8F97786E6F463ABBE0EBD1F95BCA:ip_addr=127.0.0.1:metadataimport:1 items with changes identified

 

(I am new to DSpace support, and the log in DEBUG mode outputs an overwhelming amount of data, so I could easily be missing something.)

 

Is this a known issue? I could not find any messages in the DSpace Issue Tracker that seemed to match this circumstance.

 

Thanks, Kerry

 


 

Kerry Bouchard

DIRECTOR OF LIBRARY SYSTEMS

TCU LIBRARY

TCU BOX 298400

FORT WORTH, TX 76129

817-257-6809

k.bou...@tcu.edu

 

 

Kelley Canon

Feb 10, 2021, 3:19:49 PM
to Bouchard, Kerry, dspac...@googlegroups.com
Hi Kerry -

I would recommend first looking at the column headers in row 1 of your CSV file.  Is ID the first column header in column A?  Is any column missing a valid metadata element name?  Do you perhaps have one or more blank columns included after your metadata columns, to the right of your data?
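The header checks above can be sketched in a few lines of Python. This is only a rough illustration, not part of DSpace: the function name `check_headers` is made up for this example, and it only looks at the first row (is `id` the first header, are any headers blank). It does not verify that each header is a valid metadata element name.

```python
import csv
import io


def check_headers(csv_text):
    """Return a list of problems found in the header row of a metadata CSV.

    Hypothetical helper for illustration; not a DSpace API.
    """
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader, [])
    problems = []

    # Strip a possible byte-order mark that some spreadsheet exports prepend.
    first = headers[0].lstrip("\ufeff").strip().lower() if headers else ""
    if first != "id":
        problems.append("first column header is not 'id'")

    # Blank headers usually come from empty columns to the right of the data.
    for position, name in enumerate(headers, start=1):
        if not name.strip():
            problems.append(f"column {position} has a blank header")

    return problems
```

Reading the file with `open(path, newline="", encoding="utf-8").read()` and passing the text to `check_headers` should be enough to try it on an export.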

If I'm understanding correctly what you describe in your first bullet point, I think DSpace will simply overwrite the null data with the new data included in your CSV.  Feel free to attach a sample if you're not able to resolve these issues.

From down the road in Dallas,
Kelley Canon
Language & Culture Archives
REAP Administrator
SIL International


--
All messages to this mailing list should adhere to the DuraSpace Code of Conduct: https://duraspace.org/about/policies/code-of-conduct/
---
You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dspace-tech...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dspace-tech/5c10d8751b444045b04eae2521b6f06e%40tcu.edu.

Bouchard, Kerry

Feb 12, 2021, 5:29:12 PM
to Kelley Canon, dspac...@googlegroups.com

           

Kelley,

Thanks. I would have thought that if any of the column headers were wrong, none of the records would import, but from what I can see the problem only occurs when one of the DSpace records has an empty field.

Attached is a spreadsheet with four records. In DSpace, the dc.rights.license field of the first, second, and fourth records has the value “Not blank”.

The third record has empty data in the dc.rights.license field.

When I try to import the CSV file, it sees the changes on the first and second records, but not the next two.

After that it just stops and doesn’t recognize that changes have been made in the two remaining spreadsheet records.

Thanks, Kerry

Bond_20210210_original.csv

Kelley Canon

Feb 15, 2021, 9:42:19 AM
to Bouchard, Kerry, dspac...@googlegroups.com
Kerry -

I looked at your data and I don't see any problems.  DSpace should just overwrite the empty field with what's in your import file unless there is a config parameter that keeps it from doing that.  I'm not aware of anything like that but am not intimately familiar with all the config options.

I'd be curious to know if it would simply create 4 NEW records for you.  A "+" in the id column will cause new items to be created, and you'd need to add a "collection" column. 

Or change the order of the items in your import file and see if you get different results.  Maybe put the one with the empty dc.rights.license field first and see what happens.
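For the first test above (importing the same rows as new items), a small Python sketch can rewrite an existing export so that every row has "+" in the id column and a "collection" column. The function name `as_new_items` and the handle `123456789/1` in the usage note are hypothetical; the sketch assumes the file already has an `id` column, and it overwrites any collection value already present.

```python
import csv
import io


def as_new_items(csv_text, collection_handle):
    """Rewrite a metadata CSV so each row is imported as a new item.

    Hypothetical helper for illustration; assumes an 'id' column exists.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])

    # Add a 'collection' column right after 'id' if the file lacks one.
    if "collection" not in fieldnames:
        fieldnames.insert(1, "collection")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["id"] = "+"  # "+" asks the importer to create a new item
        row["collection"] = collection_handle  # target collection for the new item
        writer.writerow(row)
    return out.getvalue()
```

For example, `as_new_items(text, "123456789/1")` would return the rewritten CSV text, with the handle replaced by that of a real test collection.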

Kelley Canon
Language & Culture Archives
REAP Administrator
SIL International

Sean Kalynuk

Feb 17, 2021, 11:43:30 AM
to Bouchard, Kerry, dspac...@googlegroups.com

Hi Kerry,

 

Are you only importing the CSV file from the user interface? I wonder what would happen if you tried importing the metadata using the command-line tool. Since it’s the same core code that’s used, I’m guessing that the merge logic won’t be any different, but the method for transferring the CSV file to the server via the user interface is different than if you transferred the file yourself and used the command-line tool.

 

For your earlier tests with debugging turned on, see if there are lines in the log files that have the keyword metadata_import. (Note that I’m basing this on my knowledge of DSpace 6.3. If you have a much older version, the log entries might be different.) That information might give you a clue as to what is going on.
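Since DEBUG output is very noisy, a short Python filter may help pull out just the relevant lines. This is a sketch under assumptions: the function name is invented, and the keywords are taken from the log excerpt earlier in this thread, where MetadataImport writes `metadata_import` and the XMLUI flow class writes `metadataimport` without the underscore. Other DSpace versions may log differently.

```python
def metadata_import_lines(log_lines, item_id=None):
    """Return log lines that mention the metadata import, optionally for one item.

    Hypothetical helper; keywords are based on the DSpace 5.x excerpt in this thread.
    """
    selected = []
    for line in log_lines:
        # Keep only lines carrying one of the two keyword spellings.
        if "metadata_import" not in line and "metadataimport" not in line:
            continue
        # When an item id is given, keep only lines tagged with that item.
        if item_id is not None and f"item_id={item_id}," not in line:
            continue
        selected.append(line.rstrip("\n"))
    return selected
```

It can be run over an open log file, e.g. `metadata_import_lines(open(path, encoding="utf-8"), item_id=26629)`, where `path` points at the DSpace log in question.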

 

--

Sean

 


Kerry Bouchard

Feb 22, 2021, 3:29:32 PM
to DSpace Technical Support
Thanks for the suggested tests. I've confirmed that the same problem happens when I try the import from the command line rather than the web interface. And rearranging the order of columns in the spreadsheets doesn't make any difference. I can import the same data as new records, but that just confirms that the problem is triggered by empty metadata fields in existing records. 

Is there a process for reporting an issue like this as a potential bug?

        Thanks, Kerry

Sean Kalynuk

Feb 22, 2021, 4:14:11 PM
to Kerry Bouchard, DSpace Technical Support

Hi Kerry,

 

Please consult the support page, specifically the part that describes what to do if you’ve found a bug.

 

https://wiki.lyrasis.org/display/DSPACE/Support

 

--

Sean

José Geraldo

Feb 23, 2021, 9:55:54 AM
to Sean Kalynuk, Kerry Bouchard, DSpace Technical Support
Hi,

Check the formatting of the file, such as spaces at the end of lines and lines left at the end of the file, as below:

image.png

I ran a test on the DSpace Demo 6.3 site (https://demo.dspace.org/jspui/) using the attached files, and it worked perfectly, both adding a value to an empty metadata field and changing an existing value.
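The formatting check mentioned above (spaces at the end of lines, lines left over at the end of the file) can be roughly automated. The Python below is only a sketch with an invented function name; it works line by line, so a quoted CSV field that legitimately contains a line break would be misreported.

```python
def formatting_problems(csv_text):
    """List lines that are blank or end with whitespace.

    Hypothetical helper; treats the text line by line, ignoring CSV quoting.
    """
    lines = csv_text.split("\n")
    # A single newline at the very end yields one empty final element; that is normal.
    if lines and lines[-1] == "":
        lines.pop()

    problems = []
    for number, line in enumerate(lines, start=1):
        content = line.rstrip("\r")  # tolerate Windows line endings
        if content.strip() == "":
            problems.append(f"line {number} is blank")
        elif content != content.rstrip():
            problems.append(f"line {number} ends with whitespace")
    return problems
```

An empty result means the file has neither of the two problems; it says nothing about the metadata values themselves.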



--
Regards,

José Geraldo

Bond_20210210_original.csv
dspace-import.csv

Bouchard, Kerry

Feb 23, 2021, 3:01:37 PM
to José Geraldo, Sean Kalynuk, DSpace Technical Support

 

Thank you. I had not tested on the DSpace Demo site earlier. I did that today. After importing records from the sample file, I edited them to blank out some of the metadata fields. Instead of saving the field to the record with empty data, DSpace removed the metadata field completely: it no longer appears in the detailed display or on the edit screen. I had no problem reimporting an edited spreadsheet that supplied data for fields that had been removed from the corresponding record. (On our DSpace v5.9 instance, we also don’t have a problem if the existing DSpace record doesn’t have the field attached at all; the problem only occurs if the field is present in the record but empty, per the source.uri and rights.license examples in the screen shot below.)

So it looks like the problem is not the metadata import function per se, but that our DSpace v5.9 instance has somehow ended up with records that have metadata fields attached with no data in them. It looks like the software isn’t designed to handle that circumstance.

So I’m thinking I should be able to clear this up on our site by deleting the rows in the Oracle [metadatavalue] table that contain blank or null data.
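Before deleting anything, it may be worth listing the candidate rows first. The Python below is a minimal sketch that uses an in-memory SQLite table as a stand-in for the real Oracle database. The column names `metadata_value_id` and `text_value` are assumptions about the DSpace 5 schema (the other columns appear in the logged query earlier in this thread), and the field ids in the sample rows are made up. Note that Oracle treats an empty string as NULL, so on Oracle the `IS NULL` test would cover both cases, and whitespace-only values would need something like `TRIM(text_value) IS NULL`.

```python
import sqlite3


def find_blank_values(conn):
    """Return (metadata_value_id, resource_id, metadata_field_id) for blank rows.

    Column names are assumed, not confirmed against a live DSpace 5 schema.
    """
    query = (
        "SELECT metadata_value_id, resource_id, metadata_field_id "
        "FROM metadatavalue "
        "WHERE text_value IS NULL OR TRIM(text_value) = '' "
        "ORDER BY metadata_value_id"
    )
    return conn.execute(query).fetchall()


def demo_connection():
    """Build a throwaway SQLite table with a few made-up rows for testing."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadatavalue ("
        "metadata_value_id INTEGER PRIMARY KEY, resource_id INTEGER, "
        "resource_type_id INTEGER, metadata_field_id INTEGER, "
        "text_value TEXT, place INTEGER)"
    )
    conn.executemany(
        "INSERT INTO metadatavalue VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 26629, 2, 64, "Some title", 1),
            (2, 26629, 2, 53, "", 1),       # empty string
            (3, 26629, 2, 54, None, 1),     # NULL
            (4, 26630, 2, 53, "Not blank", 1),
            (5, 26630, 2, 54, "   ", 1),    # whitespace only
        ],
    )
    return conn
```

Listing the rows first makes it possible to review them (and take a backup) before turning the same WHERE clause into a DELETE.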

        

Thanks for the help.

-Kerry

 

