I'm trying to retrieve the extracted-text bitstreams associated with items. Is there a way to get a list of them from the database?
So far, I've only been able to generate a list of all bitstreams with:
```sql
SELECT i.item_id, i.last_modified, i.owning_collection, b.internal_id, t.text_value AS title
FROM item i
JOIN item2bundle i2b
  ON i.item_id = i2b.item_id
JOIN bundle2bitstream b2b
  ON b2b.bundle_id = i2b.bundle_id
JOIN bitstream b
  ON b.bitstream_id = b2b.bitstream_id
JOIN metadatavalue d
  ON d.resource_id = i.item_id
 AND d.resource_type_id = 2   -- 2 = item; bundles and bitstreams also have metadatavalue rows
JOIN metadatavalue t
  ON t.resource_id = i.item_id
 AND t.resource_type_id = 2   -- without this, a bundle/bitstream title with the same ID can match
WHERE i.in_archive = 't' AND i.withdrawn = 'f' AND i.discoverable = 't'
  AND d.metadata_field_id = 11 AND d.text_value >= '2021-01' AND d.text_value < '2021-12'
  AND t.metadata_field_id = 64
ORDER BY i.owning_collection
```
That gives me a list including the internal_id, which I can use to determine where the file is in the assetstore:
```
77274565375792968793874045792320511138 = /dspace/assetstore/77/27/45/77274565375792968793874045792320511138
```
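The mapping above matches what appears to be DSpace's default assetstore layout: three directory levels, each named after the next two digits of the internal_id. A minimal sketch that reproduces it in SQL, assuming that default layout and the /dspace/assetstore root seen here (the real root is whatever `assetstore.dir` is set to in dspace.cfg):

```sql
-- Sketch: derive an assetstore path from an internal_id.
-- Assumes three levels of two-digit directories under /dspace/assetstore.
SELECT '/dspace/assetstore/'
       || substr(v.internal_id, 1, 2) || '/'
       || substr(v.internal_id, 3, 2) || '/'
       || substr(v.internal_id, 5, 2) || '/'
       || v.internal_id AS asset_path
FROM (SELECT '77274565375792968793874045792320511138' AS internal_id) AS v;
-- returns /dspace/assetstore/77/27/45/77274565375792968793874045792320511138
```

The same expression can be applied to `b.internal_id` in the bitstream query to get a path for every row directly.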
But I've noticed some gaps. For example, ID 4117 has both a PDF and an extracted-text bitstream, but the assetstore directory contains only the PDF:
```
$ ls /dspace/assetstore/77/27/45/
77274565375792968793874045792320511138
```
How can I determine the location of the associated extracted-text bitstream for that item?
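A possible explanation, offered tentatively: every bitstream has its own internal_id, so the extracted text would be stored under a directory derived from its own ID, not the PDF's. The sketch below lists only the extracted-text bitstreams for one item, together with their paths. It rests on assumptions worth checking against your schema: that the media filter writes to a bundle named `TEXT`, that in DSpace 5 bundle names are stored as dc.title in metadatavalue with resource_type_id = 1, that metadata_field_id 64 is dc.title (as in the query above), and that 4117 is an item_id.

```sql
-- Sketch: extracted-text bitstreams for one item, with assetstore paths.
-- Assumptions: bundle names live in metadatavalue (resource_type_id = 1 = bundle),
-- field 64 = dc.title, the media-filter bundle is named 'TEXT',
-- and 4117 is an item_id (hypothetical reading of "id 4117").
SELECT i.item_id,
       b.bitstream_id,
       b.internal_id,
       '/dspace/assetstore/'
         || substr(b.internal_id, 1, 2) || '/'
         || substr(b.internal_id, 3, 2) || '/'
         || substr(b.internal_id, 5, 2) || '/'
         || b.internal_id AS asset_path
FROM item i
JOIN item2bundle i2b
  ON i2b.item_id = i.item_id
JOIN metadatavalue bn                -- bundle name
  ON bn.resource_id = i2b.bundle_id
 AND bn.resource_type_id = 1         -- 1 = bundle
 AND bn.metadata_field_id = 64       -- dc.title
JOIN bundle2bitstream b2b
  ON b2b.bundle_id = i2b.bundle_id
JOIN bitstream b
  ON b.bitstream_id = b2b.bitstream_id
WHERE bn.text_value = 'TEXT'
  AND i.item_id = 4117;
```

Dropping the `i.item_id` condition (and adding back the date and title filters from the original query) should give the full list of extracted-text files.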
Sean
DSpace version: CRIS-5.10.0-SNAPSHOT
SCM revision: 67e7d010e7eda86925980b2a43581b9d4f4929a3
SCM branch: dspace-5_x_x-cris
OS: Linux(amd64) version 4.4.0-210-generic
Applications:
Discovery: enabled.
JRE: Private Build version 1.8.0_292
Ant version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018
Maven version: 3.3.9