Bagit Re-Ingest or Restore Script?

mcf...@grinnell.edu

Aug 9, 2017, 11:33:33 AM
to islandora
Forgive me for not conducting a thorough search before posting, but I wonder if anyone can recommend a script or process aimed at re-ingesting an object from a Bagit container back to a Fedora repository?  Basically I'm trying to craft a "restore" or "migrate" script to help with object recovery from cold storage where the object datastreams are in bags (bagit containers).  Thanks in advance for any suggestions.

-Mark M.

Mark Jordan

Aug 9, 2017, 12:02:47 PM
to isla...@googlegroups.com
Hi Mark,

I'm currently on vacation so will respond in more detail next week. There is a module that will batch ingest objects from Bags, but as far as I know it doesn't verify datastream checksums (which you would want to do to confirm that the files were ingested intact). I've been doing some work lately with Bags (https://github.com/mjordan/islandora_fetch_bags) and also with checksum verification on ingest (https://github.com/mjordan/islandora_rest_ingester) that might be relevant. Also, ingesting objects while retaining their PIDs and their relationships to other objects can get a bit complex. I'd very much like to hear more about your use cases and how you intend to use the capability to restore from Bags.
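For illustration only, here is a minimal Python sketch of the kind of checksum check meant here, run against a Bag before any ingest. It is not code from either repository linked above. It assumes the Bag carries a manifest-sha1.txt whose lines have the form "<checksum> <path relative to the bag root>", and the function names sha1_of and verify_bag are made up for this example. It ignores the percent-encoding of unusual characters in manifest paths, so treat it as a simplification.

```python
import hashlib
from pathlib import Path


def sha1_of(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bag(bag_dir, manifest_name="manifest-sha1.txt"):
    """Compare each payload file against its manifest entry.

    Returns a list of (relative_path, problem) tuples; an empty list
    means every listed file exists and matches its checksum.
    """
    bag = Path(bag_dir)
    problems = []
    manifest = (bag / manifest_name).read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        # Assumed line format: "<checksum> <path relative to bag root>"
        expected, rel_path = line.split(None, 1)
        rel_path = rel_path.strip()
        target = bag / rel_path
        if not target.is_file():
            problems.append((rel_path, "missing"))
        elif sha1_of(target) != expected.lower():
            problems.append((rel_path, "checksum mismatch"))
    return problems
```

Calling verify_bag() on a Bag directory and refusing to ingest unless it returns an empty list would catch damage that happened in storage. Confirming the files after they land in Fedora is a separate step that this sketch does not cover.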

Mark

--
For more information about using this group, please read our Listserv Guidelines: http://islandora.ca/content/welcome-islandora-listserv
---
You received this message because you are subscribed to the Google Groups "islandora" group.
To unsubscribe from this group and stop receiving emails from it, send an email to islandora+...@googlegroups.com.
Visit this group at https://groups.google.com/group/islandora.
To view this discussion on the web visit https://groups.google.com/d/msgid/islandora/5935ba7a-ae62-4661-92bf-ed8ffaa61e80%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

dp...@metro.org

Aug 9, 2017, 1:54:27 PM
to islandora
Mark, wow (you constantly have me in that "wow" state). I've been duplicating your efforts here. I'll clone your repo and see how I can help there. Thanks a lot for sharing.

Diego

mcf...@grinnell.edu

Oct 4, 2017, 11:25:09 AM
to islandora
Good morning all.  I'm back at looking into this (with some urgency this time) and I'm taking Mark's https://github.com/mjordan/islandora_rest_ingester for a spin.  Unfortunately I've had no luck thus far.  Just now I attempted to create a new object from a set of datastreams saved by Bagit.  The command and output follow...

~/islandora_rest_ingester$ php ingest.php -l mylog.log -m islandora:sp_pdf -p grinnell:student-scholarship -n test -o fedoraAdmin -u fedoraAdmin -t XXXXXXX /archive/grinnell_bags/Bag-grinnell_99

~/islandora_rest_ingester$ cat mylog.log
[2017-10-04 09:56:20] Ingest via REST.INFO: ingest.php (endpoint http://localhost/islandora/rest/v1) started at October 4, 2017, 9:56 am [] []
[2017-10-04 09:56:20] Ingest via REST.WARNING: /archive/grinnell_bags/Bag-grinnell_99/manifest-sha1.txt appears to be empty, skipping. [] []
[2017-10-04 09:56:20] Ingest via REST.WARNING: /archive/grinnell_bags/Bag-grinnell_99/bagit.txt appears to be empty, skipping. [] []
[2017-10-04 09:56:20] Ingest via REST.WARNING: /archive/grinnell_bags/Bag-grinnell_99/tagmanifest-sha1.txt appears to be empty, skipping. [] []
[2017-10-04 09:59:48] Ingest via REST.INFO: Object  ingested from /archive/grinnell_bags/Bag-grinnell_99/data [] []
[2017-10-04 09:59:49] Ingest via REST.INFO: Object  datastream ADMIN_COVERSHEET ingested from /archive/grinnell_bags/Bag-grinnell_99/data/ADMIN_COVERSHEET.html [] []

Watching this work in my debugger shows me that the REST request returns a status 200, which I presume is "OK", but the response_body has no 'pid', hence the empty/null PID reference in the log above.
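A guard like the following would turn that silent case (a success status with no 'pid') into a loud failure. This is a hypothetical Python sketch, not part of ingest.php. It assumes the response body is JSON carrying a 'pid' key, and the function name extract_pid is invented for the example.

```python
import json


def extract_pid(status_code, response_body):
    """Return the PID from an ingest response, or raise if there is none.

    Assumption: a successful response is a JSON object with a 'pid' key.
    """
    if status_code not in (200, 201):
        raise RuntimeError(f"ingest request failed with HTTP {status_code}")
    try:
        body = json.loads(response_body)
    except (TypeError, ValueError):
        # A 2xx status with a non-JSON body (for example an HTML page)
        # suggests the request never reached the REST endpoint.
        raise RuntimeError("HTTP success, but the body is not JSON") from None
    pid = body.get("pid") if isinstance(body, dict) else None
    if not pid:
        raise RuntimeError("HTTP success, but the response has no 'pid'")
    return pid
```

With a check of this sort, the run would stop at the first object instead of logging an empty PID and carrying on to the datastreams.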

I checked my catalina.out log and found only numerous instances of this...

ClientAbortException:  java.net.SocketException: Connection reset
    at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:369)
    at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:368)
    at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:392)
    at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:381)
    at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:89)
    at org.apache.cxf.helpers.IOUtils.copy(IOUtils.java:160)
    at org.apache.cxf.helpers.IOUtils.copy(IOUtils.java:104)

Can anyone suggest what/where to look for the root cause and a solution?  Thanks.

Mark Jordan

Oct 4, 2017, 11:33:57 AM
to isla...@googlegroups.com
Mark,

I'm currently at a workshop so can't focus on this, but I suspect that the Bagit manifests are confusing the ingester. Can you remove everything but the datastream files from the input directories and try that? I actually hadn't thought about having the REST ingester handle Bag ingest, but that's an awesome idea - I'm even generating checksums to confirm that the ingested datastream files are intact. Doh!

I assume that you are using the REST Authen module to control access? The user (-u) should be a user on your Drupal site, not in Fedora, e.g., 'admin'. But since you're getting a 200 back, that's probably not your problem.

Mark

McFate, Mark

Oct 4, 2017, 11:53:06 AM
to isla...@googlegroups.com
Thanks for the quick follow-up Mark, and for the suggestions.  Actually, I believe I am trying to run this using a Fedora admin account, not a Drupal user.  I’ll try changing that first, and will also clean up the source directory before I try again.  I’ll post results when I can.  If I’m successful I thought about forking your code and adding an option to make the process “bag friendly” by anticipating that the source directory may have a typical “bag structure”.
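As a rough idea of what "bag friendly" could mean, here is a small Python sketch (the ingester itself is PHP, so this only shows the logic). It assumes a directory counts as a bag when it has a bagit.txt file and a data/ subdirectory, and that the datastream files sit directly under data/. The names looks_like_bag and datastream_files are hypothetical.

```python
from pathlib import Path


def looks_like_bag(path):
    """True if the directory has a bagit.txt file and a data/ directory."""
    root = Path(path)
    return (root / "bagit.txt").is_file() and (root / "data").is_dir()


def datastream_files(path):
    """Return the files to ingest from an input directory.

    For a bag, only the payload under data/ is returned, so manifests
    and other tag files are never mistaken for datastreams. For a plain
    directory, its own files are returned unchanged.
    """
    root = Path(path)
    source = root / "data" if looks_like_bag(root) else root
    return sorted(p for p in source.iterdir() if p.is_file())
```

Pointing the existing ingest logic at the result of datastream_files() would let the same command work on both cleaned-up directories and untouched bags.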

Thanks and take care.

mcf...@grinnell.edu

Oct 4, 2017, 2:14:31 PM
to islandora
Got it working, very nicely too!  The problem was a little more sinister...  In a panic I failed to enable the Islandora_REST module.  Duh.

So looking at the code, there are a couple of enhancements I might attempt, like adding a "bag friendly" option, and if at all possible a "purge and replace datastreams" option that accepts a PID and replaces existing datastreams with copies from the source directory.  While I'm at it perhaps I will add some quick tests to verify that Islandora_REST and Islandora_REST_authen are installed and enabled.   

Thanks for another great module Mark!  Take care.

Mark Jordan

Oct 4, 2017, 2:48:58 PM
to isla...@googlegroups.com
Mark, thanks, good to hear it was user error =8^). I would have expected you to get a 404, not a 200, to that request if the REST module wasn't enabled.

I'm tickled that you are trying the REST ingester. Please open a GitHub issue for each of those new features before you open a PR (I say this in the CONTRIBUTING.md files in my repos, but I notice that Ingester doesn't have one - my bad). Those suggestions are great!

Also thanks for using REST Authen. I'd love some feedback on that module, later.

Mark

mcf...@grinnell.edu

Oct 5, 2017, 11:02:00 AM
to islandora
Hello Mark, et al.  I'm no longer in crisis mode so this morning I'm looking at completing a few code additions to give the REST ingester a bagit-friendly option.  But to be honest, I'm not exactly sure if my case is "typical".  So I have a bagit-related question and I've posted the details of it as an "issue" in the REST ingester github repo at https://github.com/mjordan/islandora_rest_ingester/issues/2.  Please have a look if/when you have some time.
