Importing/registering item links as bitstreams in DSpace 7.3


M P

Oct 1, 2022, 3:47:48 AM
to DSpace Technical Support
Greetings!

I am having difficulty incorporating our thesis holdings into DSpace by harvesting from our existing Koha ILS catalogue. I am unsure of the best way to go about it, as DSpace seems to offer several different ways of adding items.

Here is what we have:
  1. All theses catalogued in Koha
  2. Thesis holdings are converted to DublinCore and available via OAI-PMH (https://library.sats.ac.za/cgi-bin/koha/oai.pl?verb=ListSets)
  3. DSpace successfully harvests the data from the above OAI server
  4. Items appear as they should in each of the configured collections (https://research.sats.ac.za/communities/5a2b008a-e0ec-4470-81a2-3f2c8d1277c0)
  5. The URL to the actual thesis shows up in DSpace as dc.identifier on the Full Item Page. These URLs point to the thesis PDFs stored in S3 (e.g. https://research.sats.ac.za/items/cb8ff81c-3a33-450f-a9a5-39101afeda70/full).
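Before choosing between harvest, import and registration, it can help to confirm which records in the feed from item 2 actually carry a PDF link. Below is a minimal sketch, assuming the Koha endpoint answers a standard OAI-PMH ListRecords request with metadataPrefix=oai_dc and that the thesis files are the dc:identifier values ending in ".pdf". The function names are hypothetical, and paging via resumption tokens is not handled.

```python
"""Sketch: list the PDF links a Koha OAI-PMH feed exposes as dc:identifier.

Assumptions (not from the thread): records are requested as oai_dc, and
thesis PDFs are the dc:identifier values that end in ".pdf".
"""
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional

# Standard OAI-PMH 2.0 and Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def fetch_list_records(base_url: str, set_spec: Optional[str] = None) -> bytes:
    """Fetch the first page of ListRecords (resumption tokens are not followed)."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if set_spec:
        params["set"] = set_spec
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def pdf_links(xml_bytes: bytes) -> dict[str, list[str]]:
    """Map each OAI record identifier to its dc:identifier values ending in .pdf."""
    root = ET.fromstring(xml_bytes)
    links: dict[str, list[str]] = {}
    for record in root.iterfind(".//oai:record", NS):
        oai_id = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        pdfs = [
            (el.text or "").strip()
            for el in record.iterfind(".//dc:identifier", NS)
            if (el.text or "").strip().lower().endswith(".pdf")
        ]
        if pdfs:
            links[oai_id] = pdfs
    return links
```

For example, pdf_links(fetch_list_records("https://library.sats.ac.za/cgi-bin/koha/oai.pl")) would map OAI identifiers to PDF URLs for the first page of results, provided the server behaves as assumed.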
MY QUESTION:
What is the best way to incorporate the stored thesis PDFs into DSpace? Do I harvest with ORE, do I "register", or do I import? I am unclear on how these differ and what the end result would be. I would like the link to the thesis to show on the primary item page and the thesis to be indexed by Solr so we can take advantage of full-text indexing. Furthermore, if the URL to the thesis is already available in the metadata, why can't DSpace simply incorporate it as part of the harvest process?

Thanks for the help!

Mark H. Wood

Oct 3, 2022, 10:43:00 AM
to dspac...@googlegroups.com

Just brainstorming:

I've never tried ORE so I can't comment on that, except to say that it
sounds like the simplest approach if it works for you.

To import an item, you need to assemble a complete ingestion packet,
with metadata and content, and provide it to DSpace. That sounds
duplicative if you have already harvested the metadata.
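For reference, the usual ingestion packet for the command-line importer is a Simple Archive Format (SAF) directory: one folder per item holding a dublin_core.xml metadata file, a contents file listing the bitstreams, and the bitstream files themselves. The sketch below writes one such folder; write_saf_item and its argument layout are illustrative only, not part of DSpace.

```python
"""Sketch: write one DSpace Simple Archive Format (SAF) item folder.

write_saf_item is a hypothetical helper; it only produces files on disk.
"""
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def write_saf_item(item_dir: Path, metadata: list[tuple[str, str, str]], pdf: Path) -> None:
    """Create item_dir with dublin_core.xml, a copy of the PDF, and a contents file.

    metadata holds (element, qualifier, value) triples; SAF uses the
    qualifier "none" for unqualified Dublin Core fields.
    """
    item_dir.mkdir(parents=True, exist_ok=True)

    # dublin_core.xml: <dublin_core><dcvalue element=".." qualifier="..">value</dcvalue>...
    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        dcvalue = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        dcvalue.text = value
    ET.ElementTree(root).write(item_dir / "dublin_core.xml", encoding="utf-8", xml_declaration=True)

    # Copy the bitstream next to the metadata and list it in the contents file.
    shutil.copy2(pdf, item_dir / pdf.name)
    (item_dir / "contents").write_text(pdf.name + "\n", encoding="utf-8")
```

The parent directory of such folders would then go to the importer, along the lines of [dspace]/bin/dspace import --add --eperson=<admin email> --collection=<collection handle> --source=<SAF directory> --mapfile=<mapfile>; check the option names against the documentation for your DSpace version. This still repeats metadata the OAI harvest already brought in.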

Registration is usually used for very large files that are already
available on the server. Otherwise it works like import; it just
avoids copying the content into the assetstore.
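With registration, each contents-file entry points at a file already inside a configured assetstore instead of naming a file to copy. A minimal sketch of building such entries, assuming the "-r -s <store number> -f <path relative to that store>" line format described in the DSpace SAF documentation. The store number and paths are placeholders taken from your own bitstore configuration, and whether an S3-backed store can be registered against is exactly the open question in the next paragraph.

```python
"""Sketch: build SAF contents-file lines that register existing files.

Assumption: the '-r -s <store> -f <path>' syntax from the DSpace SAF docs;
store numbers come from your bitstore configuration.
"""
from typing import Optional


def registration_line(store_number: int, relative_path: str, bundle: Optional[str] = None) -> str:
    """Return one contents-file line for a registered (not copied) bitstream."""
    # The contents file is line-based and tab-separated, so reject paths that would break it.
    if any(ch in relative_path for ch in "\t\r\n"):
        raise ValueError("path may not contain tabs or line breaks")
    line = f"-r -s {store_number} -f {relative_path}"
    if bundle:
        # Optional bundle assignment, tab-separated like ordinary contents entries.
        line += f"\tbundle:{bundle}"
    return line


def contents_text(paths: list[str], store_number: int) -> str:
    """Join several registration lines into the text of a contents file."""
    return "".join(registration_line(store_number, p) + "\n" for p in paths)
```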

Having your content in an S3 store raises an intriguing possibility
that I don't think is fully implemented. DSpace does have code to
talk to S3, but I don't think this was ever incorporated into the
registration code. Whether there is any value in this idea depends on
your interest in creating some glue code and whether you want DSpace
to have an independent copy of the content.

--
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu

M P

Oct 4, 2022, 8:39:13 AM
to DSpace Technical Support
Thanks for the rumination, Mark. Perhaps I have stumbled upon a use case the devs did not envision, but with so many institutions using both Koha and DSpace, it seems odd that others would not be doing something similar. I wouldn't think our use case is that novel.

We want our thesis holdings available in Koha so that they show up at the outset of the research journey, and so that institutional outputs can be used in future research without students having to search several platforms. To that end, we catalogue our theses in Koha with item data in 952$u or 856$u. These point to an S3 bucket where we store all theses. From there we use the conversion stylesheet so that the metadata is available on the OAI server, where DSpace can harvest it. Once it is set up, it works with minimal effort, although I haven't yet gotten the DSpace ORE client to work: two steps forward, one step back, as they say. I'll see if I can crack this ORE issue and then import the bitstream contents.
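If ORE does not pan out, one fallback for importing the bitstream contents is to pull each PDF from the URL already in dc.identifier into an SAF item folder before running the importer. A minimal sketch, assuming the S3 URLs are readable over plain HTTP(S) without credentials; download_bitstream is a hypothetical helper, not part of DSpace or Koha.

```python
"""Sketch: fetch the thesis PDF named in a harvested record into an SAF item folder.

Assumption: the S3 object URLs are publicly readable; no S3 SDK is used.
"""
import shutil
import urllib.parse
import urllib.request
from pathlib import Path


def download_bitstream(url: str, item_dir: Path) -> Path:
    """Save the file at url into item_dir and append its name to the SAF contents file."""
    # Derive the local file name from the last path segment of the URL.
    name = Path(urllib.parse.unquote(urllib.parse.urlparse(url).path)).name
    if not name:
        raise ValueError(f"cannot derive a file name from {url!r}")
    item_dir.mkdir(parents=True, exist_ok=True)
    target = item_dir / name

    # Stream the response to disk rather than holding the whole PDF in memory.
    with urllib.request.urlopen(url) as resp, open(target, "wb") as out:
        shutil.copyfileobj(resp, out)

    # Append, so several bitstreams can be listed for the same item.
    with open(item_dir / "contents", "a", encoding="utf-8") as contents:
        contents.write(name + "\n")
    return target
```

Pairing this with the metadata already harvested would still leave the question of matching each downloaded file to its existing DSpace item, which is the duplication Mark points out.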

Thanks again for the response!