Error when exporting site because of ID "bitstream_2" already exists


euler

Dec 12, 2015, 6:41:31 AM
to DSpace Technical Support
Dear All,

I am doing a backup of the entire site using the instructions here: https://wiki.duraspace.org/display/DSDOC5x/AIP+Backup+and+Restore#AIPBackupandRestore-ExportingEntireSite. The backup was not successful because of this error:

Exception: Error exporting METS for DSpace Object, type=ITEM, handle=10665.1/8822, dbID=8907
org.dspace.content.packager.PackageValidationException: Error exporting METS for DSpace Object, type=ITEM, handle=10665.1/8822, dbID=8907, Reason: edu.harvard.hul.ois.mets.helper.MetsException: ID "bitstream_2" already exists
    at org.dspace.content.packager.AbstractMETSDisseminator.disseminate(AbstractMETSDisseminator.java:280)

Searching for this error, I came across this thread: http://dspace.2283337.n4.nabble.com/Is-Bitstream-sequence-id-supposed-to-be-unique-per-Bundle-or-per-Item-tp3937626.html. When I located the problematic item, I found that it has two identical license bundles (see attachment). Deleting one of the license bundles and running the export again resolved the issue, until another problematic item came up with the same error.

Is there a way to find all the problematic items whose bundles contain bitstreams with duplicate sequence_id values? For the item in the error above, I ran the SQL query mentioned in the thread I referenced, and the result is below:

 bundle_id | bitstream_id | sequence_id
-----------+--------------+-------------
     39844 |        46401 |           4
     39844 |        46400 |           3
     42083 |        53559 |          42
     42083 |        53558 |          41
     33805 |        39798 |           2
     46843 |        57468 |          44
     46843 |        57467 |          43
     33806 |        39799 |           2
(8 rows)

Note the duplicate sequence_id=2. I would like to find all items that have such duplicate sequence_id values and delete the duplicates.

Thanks in advance and regards,
euler.
license.PNG

Bill T

Dec 14, 2015, 2:11:06 PM
to DSpace Technical Support
Here's one that I used to get a list of items containing duplicate bitstreams that seemed to do the trick:


SELECT h.handle, COUNT(b.sequence_id)
FROM handle h, item2bundle i2b, bundle2bitstream b2b, bitstream b
WHERE h.resource_type_id = 2
AND h.resource_id = i2b.item_id
AND i2b.bundle_id = b2b.bundle_id
AND b2b.bitstream_id = b.bitstream_id
GROUP BY h.handle, b.sequence_id
HAVING COUNT(b.sequence_id) > 1
ORDER BY h.handle;

    handle
--------------
 12345/108274
 12345/108275
 12345/108280
 12345/108299
... etc


See if that works for you as well!
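
For anyone who wants to try the query before running it against a live database, here is a minimal sketch that runs it against a throwaway in-memory SQLite database. The toy tables contain only the columns the query touches, not the real DSpace schema; the rows for handle 10665.1/8822 are the ones from euler's result above, while the second item (handle 10665.1/9999, item 9000, bundle 50000) is made up as a clean control. The function names are hypothetical, and the query additionally selects b.sequence_id so the output shows which sequence number is duplicated.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with only the columns the query uses."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE handle (handle TEXT, resource_type_id INTEGER, resource_id INTEGER);
        CREATE TABLE item2bundle (item_id INTEGER, bundle_id INTEGER);
        CREATE TABLE bundle2bitstream (bundle_id INTEGER, bitstream_id INTEGER);
        CREATE TABLE bitstream (bitstream_id INTEGER, sequence_id INTEGER);
    """)
    # resource_type_id = 2 marks an item, as in the query from this thread.
    conn.execute("INSERT INTO handle VALUES ('10665.1/8822', 2, 8907)")
    conn.execute("INSERT INTO handle VALUES ('10665.1/9999', 2, 9000)")  # made-up clean item

    # (item_id, bundle_id, bitstream_id, sequence_id)
    rows = [
        (8907, 39844, 46401, 4), (8907, 39844, 46400, 3),
        (8907, 42083, 53559, 42), (8907, 42083, 53558, 41),
        (8907, 33805, 39798, 2),
        (8907, 46843, 57468, 44), (8907, 46843, 57467, 43),
        (8907, 33806, 39799, 2),
        (9000, 50000, 60000, 1), (9000, 50000, 60001, 2),  # made-up rows
    ]
    seen_bundles = set()
    for item_id, bundle_id, bitstream_id, sequence_id in rows:
        if bundle_id not in seen_bundles:  # link each bundle to its item once
            conn.execute("INSERT INTO item2bundle VALUES (?, ?)", (item_id, bundle_id))
            seen_bundles.add(bundle_id)
        conn.execute("INSERT INTO bundle2bitstream VALUES (?, ?)", (bundle_id, bitstream_id))
        conn.execute("INSERT INTO bitstream VALUES (?, ?)", (bitstream_id, sequence_id))
    return conn


def find_duplicate_sequence_ids(conn):
    """Return (handle, sequence_id, count) for sequence_ids repeated within an item."""
    query = """
        SELECT h.handle, b.sequence_id, COUNT(b.sequence_id)
        FROM handle h, item2bundle i2b, bundle2bitstream b2b, bitstream b
        WHERE h.resource_type_id = 2
        AND h.resource_id = i2b.item_id
        AND i2b.bundle_id = b2b.bundle_id
        AND b2b.bitstream_id = b.bitstream_id
        GROUP BY h.handle, b.sequence_id
        HAVING COUNT(b.sequence_id) > 1
        ORDER BY h.handle, b.sequence_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for handle, sequence_id, count in find_duplicate_sequence_ids(build_toy_db()):
        print(handle, sequence_id, count)
```

On the toy data only 10665.1/8822 is reported, with sequence_id 2 appearing twice; the control item is not listed because its sequence_ids are unique within the item.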

Cheers!

Bill

euler

Dec 15, 2015, 4:15:49 AM
to DSpace Technical Support
Hi Bill,

Thank you so much for this. I can now locate and delete the duplicate bitstreams without having to run the export and wait for the error to appear.

Thanks again.

Cheers!

euler