Islandora Datastream CRUD


Mark Jordan

Mar 28, 2016, 6:37:37 PM
to islandora, island...@googlegroups.com
Hello Islandora community,

A number of tools exist to perform batch operations on Islandora datastreams (some of them mentioned in this thread <https://groups.google.com/forum/#!searchin/islandora/update$20datastreams/islandora/oRgxMbPXWJM/G7yQtlnGAFUJ> and in this one <https://groups.google.com/forum/#!topic/islandora/cao0jVauKyc>). I'd like to announce another such tool that I've recently written, Islandora Datastream CRUD <https://github.com/mjordan/islandora_datastream_crud>. This module provides a set of Drush commands for "fetching" (downloading), "pushing" (replacing, creating), and "deleting" (purging) datastreams. It applies its commands to a simple list of object PIDs that can be hand-curated or generated using a built-in command that queries Solr.

The module itself doesn't modify or create datastream content. You need to do that. Islandora Datastream CRUD gives you the option (i.e., requires you) to use whatever tool is best for the particular task you need to perform on your datastreams. That said, the module does include two sample scripts that can modify datastreams: one that adds an XML element to a set of XML files (like MODS XML files) and one that puts a watermark or label on top of a set of image files (like thumbnails). You can no doubt accomplish these two tasks with oXygen and Photoshop respectively, and even though the two sample scripts are fully functional, they are also intended as examples of how you might modify a directory full of files corresponding to datastreams. Islandora Datastream CRUD's purpose is to help you get any datastream content out of Islandora and put the updated version back in, not to provide super-easy ways of updating obscure attributes in little-used MODS elements if another MODS element has a specific value.
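
[Editor's sketch] To illustrate the first kind of modification described above, here is a minimal Python sketch that appends an element to every XML file in a directory of fetched MODS datastreams. It is not one of the module's bundled sample scripts. The function names, the choice of a <note> element, and the assumption that all fetched files sit in one directory with an .xml extension are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

MODS_NS = "http://www.loc.gov/mods/v3"

# Serialize MODS elements with a default namespace instead of an "ns0:" prefix.
ET.register_namespace("", MODS_NS)


def add_note(xml_text: str, note_text: str) -> str:
    """Return the MODS document with a <note> appended to the root element.

    Caveat: the default ElementTree parser drops XML comments, and the
    returned string carries no XML declaration.
    """
    root = ET.fromstring(xml_text)
    note = ET.SubElement(root, f"{{{MODS_NS}}}note")
    note.text = note_text
    return ET.tostring(root, encoding="unicode")


def add_note_to_directory(directory: str, note_text: str) -> int:
    """Rewrite every *.xml file in the directory in place; return the file count."""
    count = 0
    for path in sorted(Path(directory).glob("*.xml")):
        updated = add_note(path.read_text(encoding="utf-8"), note_text)
        path.write_text(updated, encoding="utf-8")
        count += 1
    return count
```

Running add_note_to_directory() on a copy of the fetched files, rather than on the originals, keeps an untouched set to compare against before anything is pushed back.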

There's no graphical user interface for this module yet, but it would be really nice to have a way to let a user package up all the datastreams of interest in a zip file (maybe using a Solr query or a view), let the user download the zip, and then, after the user has modified the datastream files, upload them in another zip to replace the original datastreams. If you've got resources that you can apply to this feature, let's collaborate!

Mark




Nick Ruest

Mar 28, 2016, 7:53:24 PM
to island...@googlegroups.com, isla...@googlegroups.com
Hey Mark,

This is great!

Thank you for sharing it with the community!!!

-nruest

On 2016-03-28 06:37 PM, Mark Jordan wrote:
> Hello Islandora community,
>
> A number of tools exist to perform batch operations on Islandora
> datastreams (some of them mentioned in this thread
> <https://groups.google.com/forum/#!searchin/islandora/update$20datastreams/islandora/oRgxMbPXWJM/G7yQtlnGAFUJ>
> and in this one
> <https://groups.google.com/forum/#!topic/islandora/cao0jVauKyc>). I'd
> like to announce another such tool that I've recently written, Islandora
> Datastream CRUD <https://github.com/mjordan/islandora_datastream_crud>.
> --
> You received this message because you are subscribed to the Google
> Groups "islandora-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to islandora-de...@googlegroups.com
> <mailto:islandora-de...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/islandora-dev/1280989481.14875099.1459204656377.JavaMail.zimbra%40sfu.ca
> <https://groups.google.com/d/msgid/islandora-dev/1280989481.14875099.1459204656377.JavaMail.zimbra%40sfu.ca?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/d/optout.

Mark Jordan

Mar 28, 2016, 8:18:29 PM
to island...@googlegroups.com, isla...@googlegroups.com
Nick,

We come for the software, we stay for the community.

Mark



Diego Pino

Mar 29, 2016, 3:48:39 PM
to islandora-dev, isla...@googlegroups.com
Very nice, Mark.
I can imagine piping Solr search results into this using some type of toolbox/UI. Very nice indeed. Do you ever sleep?

Thanks a lot for sharing. Great work.

Mark Jordan

Mar 29, 2016, 3:58:58 PM
to island...@googlegroups.com, isla...@googlegroups.com
> Very nice, Mark.
> I can imagine piping Solr search results into this using some type of toolbox/UI. Very nice indeed. Do you ever sleep?

Not very much! Maybe once our migration is done I'll give it a try.

Mark

Aaron Krebeck

Sep 8, 2016, 10:50:42 AM
to islandora, island...@googlegroups.com

Thanks for this work. It is a very helpful module, especially when used in conjunction with Notepad++ regex searches or with simple Python scripts to find and replace across a large number of files.
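
[Editor's sketch] The kind of simple find-and-replace script mentioned above might look like the following. This is a hedged illustration, not the poster's actual script. The function name, the default "*.xml" glob, and the assumption that the fetched files are UTF-8 text are all hypothetical.

```python
import re
from pathlib import Path


def replace_in_directory(directory, pattern, replacement, glob="*.xml"):
    """Apply a regex substitution to every matching file in the directory.

    Only files whose content actually changes are rewritten. Returns the
    names of the changed files so they can be reviewed before a push.
    """
    regex = re.compile(pattern)
    changed = []
    for path in sorted(Path(directory).glob(glob)):
        original = path.read_text(encoding="utf-8")
        updated, hits = regex.subn(replacement, original)
        if hits:
            path.write_text(updated, encoding="utf-8")
            changed.append(path.name)
    return changed
```

Because the substitution works on raw text rather than parsed XML, a careless pattern can break well-formedness, so it is worth validating the changed files before pushing them.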

I can definitely see us using it to help groom our metadata for harvest into DPLA.  

One question: when I push with something like:

drush islandora_datastream_crud_push_datastreams --user=admin --datastreams_mimetype=image/jpeg --datastreams_source_directory=/tmp/imagemods_modified --datastreams_crud_log=/tmp/crud.log 

I'm not getting the expected result. Instead of becoming the most current version of the MODS datastream, the modified MODS.xml files are being added as a completely new datastream with the ID "MODS.xml" (instead of the existing "MODS"). The existing MODS datastream still shows the older version from the initial fetch. Am I missing an option in my push command? The datastream list is below, and you can see the new "MODS.xml" datastream at the bottom:

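ID        | Label               | Type       | MIME type        | Size       | Versions | Operations
----------|---------------------|------------|------------------|------------|----------|--------------------------------------
MODS      | MODS Datastream     | Managed    | application/xml  | 1.67 KiB   | 5        | replace, download, edit, delete
DC        | DC Record           | Managed    | application/xml  | 1.21 KiB   | 1        | replace, download, regenerate
ARCHIVAL0 | manifest.json       | Managed    | application/json | 145 B      | 1        | replace, download, delete
OBJ       | JCSL_4166.tif       | Managed    | image/tiff       | 22.36 MiB  | 1        | replace, download, delete
ARCHIVAL1 | dublin_core.xml     | Managed    | application/xml  | 1.65 KiB   | 1        | replace, download, delete
TECHMD    | TECHMD              | Managed    | application/xml  | 8.38 KiB   | 1        | replace, download, delete, regenerate
TN        | Thumbnail           | Managed    | image/jpeg       | 29.34 KiB  | 1        | replace, download, delete, regenerate
JPG       | Medium sized JPEG   | Managed    | image/jpeg       | 56.96 KiB  | 1        | replace, download, delete, regenerate
JP2       | JPEG 2000           | Managed    | image/jp2        | 477.45 KiB | 1        | replace, download, delete, regenerate
POLICY    | XACML Policy Stream | Inline XML | text/xml         | 3.73 KiB   | 1        | replace, download, delete
MODS.xml  |                     | Managed    | text/xml         | 1.71 KiB   | 1        | replace, download, delete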

Mark Jordan

Sep 8, 2016, 12:35:14 PM
to island...@googlegroups.com, islandora
Aaron,

Definitely not the intended result. I've been using the module to push datastreams quite a bit in the last couple of weeks while doing some post-migration cleanup and haven't seen this behavior, but we'll figure this out.

Would you mind filing an issue at https://github.com/mjordan/islandora_datastream_crud/issues? Linking to your post instead of repeating it is fine (https://groups.google.com/d/msg/islandora/q1udVKoh194/LELtH89uCAAJ). I'll look into the problem and have a fix by the end of the weekend at the latest. It would be helpful if you could include in the issue:

1) the exact command you ran that resulted in the extra MODS.xml datastreams (the command below includes --datastreams_mimetype=image/jpeg so I assume that's not the one you ran),
2) some sample filenames for the MODS datastreams you pushed, and
3) the git commit hash your copy is at.

You can recover from this push by deleting the unwanted MODS.xml datastreams, but you'll need a list of the affected PIDs. I can walk you through that in the issue comments if you'd like.

Mark



Aaron Krebeck

Sep 8, 2016, 12:49:46 PM
to islandora, island...@googlegroups.com
Thanks, Mark. I've filed the issue on GitHub with some additional information. I'm not too worried about recovering from the push at this time; deleting the unwanted datastream seems pretty straightforward. I just want to clear up the problem with my commands and help improve the documentation for this great module.