Import errors with dc.subject metadata related to <srsc> assigned in submission form?

Julia Gilmore

May 6, 2025, 4:33:27 PM
to DSpace Technical Support
Hi all, 

We've received notice of failed import processes when attempting to do some metadata cleaning and we suspect it may be related to the use of <srsc> as the controlled vocabulary for the dc.subject field.

Here are the steps performed: 

1. Bulk metadata export from a collection through the UI
2. Metadata for dc.subject is cleaned (e.g., adding 'Organism biology')
3. CSV is reimported but the process fails

Error that displays in the Processes output log: 

2025-04-25 10:10:06.890 INFO metadata-import - 27 @ The script has started
2025-04-25 10:10:07.291 ERROR metadata-import - 27 @ For input string: "Organism biology"
2025-04-25 10:10:07.294 ERROR metadata-import - 27 @ java.lang.NumberFormatException: For input string: "Organism biology"
    at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.base/java.lang.Integer.parseInt(Integer.java:652)
    at java.base/java.lang.Integer.valueOf(Integer.java:983)
    at org.dspace.app.bulkedit.MetadataImport.compareAndUpdate(MetadataImport.java:748)
    at org.dspace.app.bulkedit.MetadataImport.runImport(MetadataImport.java:415)
    at org.dspace.app.bulkedit.MetadataImport.internalRun(MetadataImport.java:217)
    at org.dspace.scripts.DSpaceRunnable.run(DSpaceRunnable.java:150)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)

If we instead use the srsc format (NATURAL SCIENCES::Biology::Organism biology) in the CSV, the import completes, but only 'NATURAL SCIENCES' is added as a value for the field.

Do we need to remove <vocabulary>srsc</vocabulary> from the submission form in order to import a free-text value? If so, that differs from our expectation: we understood that submission forms do not affect metadata import processes, as long as the corresponding elements are defined in the metadata registry or in the DSpace configuration (e.g., embargo terms).

Any guidance would be appreciated :)

Thanks!

Julia

DSpace Technical Support

May 15, 2025, 5:35:17 PM
to DSpace Technical Support
Hi Julia,

I believe you are hitting this bug: https://github.com/DSpace/DSpace/issues/9896. It sounds like you are hitting that same issue on a different metadata field, in this case the subject.

Essentially, there's a known issue that if you include "::" (double colon) in a metadata field, then DSpace will try to parse that value as an Authority value, which can cause errors to occur if it's *not* an Authority value.
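
As a rough illustration of why the double colon matters, here is a minimal sketch that flags values in a single CSV cell containing "::" before an import is attempted. It assumes the default batch-edit separators ("||" between multiple values in a cell, "::" before an authority key). The class and method names are hypothetical and are not part of DSpace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SeparatorCheck {
    // Assumed default separators for batch metadata editing CSVs.
    static final String VALUE_SEPARATOR = "||";
    static final String AUTHORITY_SEPARATOR = "::";

    /** Returns the values in one CSV cell that contain the authority separator. */
    static List<String> flagValues(String cell) {
        List<String> flagged = new ArrayList<>();
        if (cell == null || cell.isEmpty()) {
            return flagged;
        }
        // Pattern.quote is required because "|" is a regex metacharacter.
        for (String value : cell.split(Pattern.quote(VALUE_SEPARATOR))) {
            if (value.contains(AUTHORITY_SEPARATOR)) {
                flagged.add(value);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        String cell = "Organism biology||NATURAL SCIENCES::Biology::Organism biology";
        // Prints only the values that would be treated as authority-controlled.
        System.out.println(flagValues(cell));
    }
}
```

Running a check like this over the dc.subject column would show which rows are likely to go down the authority-parsing path during import.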

I'm still looking for a volunteer to tackle that issue.

Tim

Julia Gilmore

May 30, 2025, 11:32:20 AM
to DSpace Technical Support
Hi Tim, 

Thanks for alerting us to this bug and this note about conditional behaviour with the double colon. 

In our first test, which just used the free text string 'Organism biology', we were also not able to import the metadata successfully, so it may be that we are encountering two different errors?

Thanks

Julia

DSpace Technical Support

May 30, 2025, 1:19:43 PM
to DSpace Technical Support
Hi Julia,

I'm a bit confused about how you would have gotten that error with the free text string "Organism biology". Based on the error stacktrace, it looks like that NumberFormatException occurs on line 748 of the MetadataImport class: https://github.com/DSpace/DSpace/blob/dspace-8_x/dspace-api/src/main/java/org/dspace/app/bulkedit/MetadataImport.java#L748

That line of code is *assuming* that the string passed into it is an integer *because* the string contained a double colon ("::").  The if/else clause just above it is checking for the existence of the "csv.getAuthoritySeparator()" which is that double colon.

So, based on my reading of that code, it seems like you should only be able to get that NumberFormatException if the original string included a double colon (and was not able to be parsed properly as an authority value).  That's why I said it sounds like the same error that was reported here: https://github.com/DSpace/DSpace/issues/9896
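
To make that reading concrete, here is a simplified sketch of the branching described above. It is not the actual MetadataImport source: the class, the `Parsed` holder, and the `DEFAULT_CONFIDENCE` placeholder are hypothetical, and the assumption that the segments map to value, authority, and confidence (in that order) is an illustration only.

```java
class AuthoritySplitSketch {
    static final String AUTHORITY_SEPARATOR = "::";
    // Placeholder only; the real default confidence constant is defined in DSpace.
    static final int DEFAULT_CONFIDENCE = -1;

    /** Hypothetical holder for the three pieces a CSV value may carry. */
    static final class Parsed {
        final String value;
        final String authority;
        final int confidence;

        Parsed(String value, String authority, int confidence) {
            this.value = value;
            this.authority = authority;
            this.confidence = confidence;
        }
    }

    static Parsed parse(String raw) {
        // No separator: the string is copied as a plain value and never parsed as a number.
        if (raw == null || !raw.contains(AUTHORITY_SEPARATOR)) {
            return new Parsed(raw, null, DEFAULT_CONFIDENCE);
        }
        // Separator present: assume value::authority::confidence.
        String[] parts = raw.split(AUTHORITY_SEPARATOR);
        String value = parts.length > 0 ? parts[0] : "";
        String authority = parts.length > 1 ? parts[1] : null;
        // Integer.valueOf throws NumberFormatException if the third segment is not numeric.
        int confidence = parts.length > 2 ? Integer.valueOf(parts[2]) : DEFAULT_CONFIDENCE;
        return new Parsed(value, authority, confidence);
    }

    public static void main(String[] args) {
        System.out.println(parse("Organism biology").value);
        try {
            parse("NATURAL SCIENCES::Biology::Organism biology");
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException: " + e.getMessage());
        }
    }
}
```

Under these assumptions, the plain string never reaches the integer parse, while a string with three "::"-separated segments fails on its last segment. Whether the real import behaves the same way is what re-testing would confirm.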

I'd recommend double checking that the error message *does* occur when you just use "Organism biology". I'm not sure that's the case, because that bit of code doesn't seem possible to reach unless you have a double colon in the string.

All that said, it's also *entirely possible* I'm missing something here.  I haven't tested this myself, and this analysis is just based on my reading the error message & looking at the code it says is throwing the error.  You could also try and see if you can reproduce this error on our demo site (https://demo.dspace.org), as that might give us more information as well.

Tim