Islandora Datastream CRUD


Mark Jordan

Mar 28, 2016, 6:37:38 PM
to islandora, island...@googlegroups.com
Hello Islandora community,

A number of tools exist to perform batch operations on Islandora datastreams (some of them mentioned in this thread <https://groups.google.com/forum/#!searchin/islandora/update$20datastreams/islandora/oRgxMbPXWJM/G7yQtlnGAFUJ> and in this one <https://groups.google.com/forum/#!topic/islandora/cao0jVauKyc>). I'd like to announce another such tool that I've recently written, Islandora Datastream CRUD <https://github.com/mjordan/islandora_datastream_crud>. This module provides a set of Drush commands for "fetching" (downloading), "pushing" (replacing, creating), and "deleting" (purging) datastreams. It applies its commands to a simple list of object PIDs that can be hand-curated or generated using a built-in command that queries Solr.

The module itself doesn't modify or create datastream content. You need to do that. Islandora Datastream CRUD gives you the option (i.e., requires you) to use whatever tool is best for the particular task you need to perform on your datastreams. That said, the module does include two sample scripts that can modify datastreams, one that adds an XML element to a set of XML files (like MODS XML files) and one that puts a watermark or label on top of a set of image files (like thumbnails). You can no doubt accomplish these two tasks with oXygen and Photoshop respectively, and even though the two sample scripts are fully functional, they are also intended as examples of how you might modify a directory full of files corresponding to datastreams. Islandora Datastream CRUD's purpose is to help you get any datastream content out of Islandora and put the updated version back in, not to provide super-easy ways of updating obscure attributes in little-used MODS elements if another MODS element has a specific value.
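
To give a rough idea of what "modifying a directory full of files" can look like, here is a minimal Python sketch that appends a MODS <note> element to every XML file in a directory of fetched datastreams. It is not one of the module's bundled sample scripts. The function name add_note, the choice of a <note> element, and the assumption that every .xml file in the directory is a MODS record are all illustrative.

```python
import glob
import os
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
# Serialize MODS elements without a prefix (as the default namespace).
# Prefixes for other namespaces in the file may be renamed by ElementTree.
ET.register_namespace("", MODS_NS)


def add_note(directory, note_text):
    """Append a MODS <note> element to every .xml file in `directory`.

    Hypothetical helper: assumes each file's root element is <mods>.
    Returns the list of file paths that were rewritten.
    """
    modified = []
    for path in sorted(glob.glob(os.path.join(directory, "*.xml"))):
        tree = ET.parse(path)
        root = tree.getroot()
        # Add the new element as the last child of the root.
        note = ET.SubElement(root, "{%s}note" % MODS_NS)
        note.text = note_text
        # Overwrite the file in place so it can be pushed back afterwards.
        tree.write(path, encoding="utf-8", xml_declaration=True)
        modified.append(path)
    return modified
```

Run it against a copy of the fetched files first, and spot-check a few results before pushing anything back into Islandora.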

There's no graphical user interface for this module yet, but it would be really nice to have a way to let a user package up all the datastreams of interest in a zip file (maybe using a Solr query or a view), let the user download the zip, and then, after the user has modified the datastream files, upload them in another zip to replace the original datastreams. If you've got resources that you can apply to this feature, let's collaborate!

Mark




Nick Ruest

Mar 28, 2016, 7:53:24 PM
to island...@googlegroups.com, isla...@googlegroups.com
Hey Mark,

This is great!

Thank you for sharing it with the community!!!

-nruest

On 2016-03-28 06:37 PM, Mark Jordan wrote:
> Hello Islandora community,
>
> A number of tools exist to perform batch operations on Islandora
> datastreams (some of them mentioned in this thread
> <https://groups.google.com/forum/#!searchin/islandora/update$20datastreams/islandora/oRgxMbPXWJM/G7yQtlnGAFUJ>
> and in this one
> <https://groups.google.com/forum/#!topic/islandora/cao0jVauKyc>). I'd
> like to announce another such tool that I've recently written, Islandora
> Datastream CRUD <https://github.com/mjordan/islandora_datastream_crud>.
> --
> You received this message because you are subscribed to the Google
> Groups "islandora-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to islandora-de...@googlegroups.com
> <mailto:islandora-de...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/islandora-dev/1280989481.14875099.1459204656377.JavaMail.zimbra%40sfu.ca
> <https://groups.google.com/d/msgid/islandora-dev/1280989481.14875099.1459204656377.JavaMail.zimbra%40sfu.ca?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/d/optout.

Mark Jordan

Mar 28, 2016, 8:18:29 PM
to island...@googlegroups.com, isla...@googlegroups.com
Nick,

We come for the software, we stay for the community.

Mark



Diego Pino

Mar 29, 2016, 3:48:42 PM
to islandora-dev, isla...@googlegroups.com
Very nice, Mark.
I can imagine piping Solr search results into this using some type of toolbox/UI. Very nice indeed. Do you ever sleep?

Thanks a lot for sharing. Great work.

Mark Jordan

Mar 29, 2016, 3:59:00 PM
to island...@googlegroups.com, isla...@googlegroups.com
> Very nice, Mark.
> I can imagine piping Solr search results into this using some type of toolbox/UI. Very nice indeed. Do you ever sleep?

Not very much! Maybe once our migration is done I'll give it a try.

Mark

Aaron Krebeck

Sep 8, 2016, 10:50:40 AM
to islandora, island...@googlegroups.com

Thanks for this work - it is a very helpful module, especially when used in conjunction with Notepad++ regex searches or with simple Python scripts to find and replace across a large number of files.
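
As an illustration of the kind of simple find-and-replace script meant here, the sketch below applies one regular expression to every file in a directory of fetched datastreams. The function name replace_in_files, its parameters, and the default ".xml" suffix are assumptions made for the example, not part of the module.

```python
import pathlib
import re


def replace_in_files(directory, pattern, replacement, suffix=".xml"):
    """Apply a regex substitution to every file in `directory` ending in `suffix`.

    Files with no match are left untouched. Returns a dict mapping each
    changed file name to the number of substitutions made in it.
    """
    regex = re.compile(pattern)
    changed = {}
    for path in sorted(pathlib.Path(directory).glob("*" + suffix)):
        original = path.read_text(encoding="utf-8")
        updated, count = regex.subn(replacement, original)
        if count:
            path.write_text(updated, encoding="utf-8")
            changed[path.name] = count
    return changed
```

The returned dict doubles as a quick report of which files changed, which is handy to review before pushing the directory back.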

I can definitely see us using it to help groom our metadata for harvest into DPLA.  

One question, when pushing with something like: 

drush islandora_datastream_crud_push_datastreams --user=admin --datastreams_mimetype=image/jpeg --datastreams_source_directory=/tmp/imagemods_modified --datastreams_crud_log=/tmp/crud.log 

I'm not getting the expected result. Instead of adding these modified MODS.xml files as the most current version of the MODS datastream, they are being added as a completely new datastream with type ID "MODS.xml" (instead of the existing "MODS"). The existing MODS datastream still shows the older version from the initial fetch. Am I missing an option in my push command? The datastream list is below, and you can see the new "MODS.xml" stream at the bottom:

ID         Label                Type        MIME type         Size        Versions  Operations
MODS       MODS Datastream      Managed     application/xml   1.67 KiB    5         replace, download, edit, delete
DC         DC Record            Managed     application/xml   1.21 KiB    1         replace, download, regenerate
ARCHIVAL0  manifest.json        Managed     application/json  145 B       1         replace, download, delete
OBJ        JCSL_4166.tif        Managed     image/tiff        22.36 MiB   1         replace, download, delete
ARCHIVAL1  dublin_core.xml      Managed     application/xml   1.65 KiB    1         replace, download, delete
TECHMD     TECHMD               Managed     application/xml   8.38 KiB    1         replace, download, delete, regenerate
TN         Thumbnail            Managed     image/jpeg        29.34 KiB   1         replace, download, delete, regenerate
JPG        Medium sized JPEG    Managed     image/jpeg        56.96 KiB   1         replace, download, delete, regenerate
JP2        JPEG 2000            Managed     image/jp2         477.45 KiB  1         replace, download, delete, regenerate
POLICY     XACML Policy Stream  Inline XML  text/xml          3.73 KiB    1         replace, download, delete
MODS.xml   (no label)           Managed     text/xml          1.71 KiB    1         replace, download, delete

Mark Jordan

Sep 8, 2016, 12:35:15 PM
to island...@googlegroups.com, islandora
Aaron,

Definitely not the intended result. I've been using the module to push datastreams quite a bit in the last couple of weeks doing some post-migration cleanup and haven't seen this behavior, but we'll figure this out.

Would you mind filing an issue at https://github.com/mjordan/islandora_datastream_crud/issues? Linking to your post instead of repeating it is fine (https://groups.google.com/d/msg/islandora/q1udVKoh194/LELtH89uCAAJ). I'll look into the problem and have a fix by end of the weekend at the latest. It would be helpful if you could include in the issue:

1) the exact command you ran that resulted in the extra MODS.xml datastreams (the command below includes --datastreams_mimetype=image/jpeg so I assume that's not the one you ran),
2) some sample filenames for the MODS datastreams you pushed, and
3) the git commit hash your copy is at.

You can recover from this push by deleting the unwanted MODS.xml datastreams, but you'll need a list of the affected PIDs. I can walk you through that in the issue comments if you'd like.

Mark



Aaron Krebeck

Sep 8, 2016, 12:49:45 PM
to islandora, island...@googlegroups.com
Thanks, Mark. I've filed the issue on GitHub with some additional information. I'm not too worried about recovering from the push at this time. Deleting the unwanted datastream seems pretty straightforward. I just want to clear up the problem with my commands and help improve documentation on this great module.

Aaron Krebeck

Apr 6, 2018, 1:47:31 PM
to islandora
I've run into an issue where my ingest script occasionally generates 40 or 50 duplicate versions of derivative datastreams (usually TN). I may ask DGI for some custom code to solve this when I get a handle on how widespread the problem is, but in the meantime I remembered the CRUD module. We don't need any help filling up our storage volume with a bunch of old thumbnails. Could the CRUD module be used to, for example, delete all but the last 3 versions of a datastream?
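
To make the request concrete, the selection logic for "delete all but the last 3 versions" might look like the sketch below. It only decides which versions would be purged, given their creation timestamps. It does not talk to Fedora or to the CRUD module, and the function name versions_to_purge and its keep parameter are hypothetical.

```python
def versions_to_purge(created_dates, keep=3):
    """Return the version timestamps to purge, oldest first.

    `created_dates` is any iterable of sortable timestamps (for example
    datetime objects, or ISO 8601 strings that all share one format).
    The newest `keep` versions are retained; everything older is returned.
    """
    if keep < 0:
        raise ValueError("keep must be zero or greater")
    ordered = sorted(created_dates)
    # Number of versions beyond the ones we want to keep (never negative).
    surplus = max(len(ordered) - keep, 0)
    return ordered[:surplus]
```

A datastream with fewer versions than the keep value yields an empty list, so nothing would be purged from it.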

Mark Jordan

Apr 6, 2018, 1:54:51 PM
to isla...@googlegroups.com
Aaron, not in its current form, but that functionality would be super useful and consistent with Datastream CRUD's purpose in life: to help site admins manage datastreams in bulk. I don't have a lot of cycles right now to add large new features, but that functionality sounds so useful, there might be someone else out there who could take it on in the short term.

Mark
--
For more information about using this group, please read our Listserv Guidelines: http://islandora.ca/content/welcome-islandora-listserv
---
You received this message because you are subscribed to the Google Groups "islandora" group.
To unsubscribe from this group and stop receiving emails from it, send an email to islandora+...@googlegroups.com.
Visit this group at https://groups.google.com/group/islandora.
To view this discussion on the web visit https://groups.google.com/d/msgid/islandora/9937bf5a-8deb-4d19-a2e2-bb161c4fa3f4%40googlegroups.com.

Mark Jordan

Apr 6, 2018, 2:04:18 PM
to isla...@googlegroups.com
I've opened a placeholder issue: https://github.com/SFULibrary/islandora_datastream_crud/issues/53

Aaron Krebeck

Apr 6, 2018, 2:27:01 PM
to islandora
Thanks, Mark.

Patrick Dunlavey

Jun 4, 2019, 4:15:27 PM
to islandora
I created a PR to add a versions option to the islandora_datastream_crud_delete_datastreams command. Would love to have some folks test and comment!

https://github.com/SFULibrary/islandora_datastream_crud/pull/77

Jared Whiklo

Jun 4, 2019, 5:10:37 PM
to isla...@googlegroups.com
Sorry I'm coming to this very late.

While the deleting function will help you now, you can disable
versioning on your TN datastream so you don't create any new versions.

I think this is configurable in islandora_checksum, but if not I know it
is in Fedora.

cheers,
jared

On 2019-06-04 3:15 p.m., Patrick Dunlavey wrote:
> I created a PR to add a versions option to the
> islandora_datastream_crud_delete_datastreams command. Would love to have
> some folks test and comment!
>
> https://github.com/SFULibrary/islandora_datastream_crud/pull/77
>
> On Friday, April 6, 2018 at 2:27:01 PM UTC-4, Aaron Krebeck wrote:
>
> Thanks, Mark.
>
> On Friday, April 6, 2018 at 2:04:18 PM UTC-4, Mark Jordan wrote:
>
> I've opened a placeholder issue:
> https://github.com/SFULibrary/islandora_datastream_crud/issues/53

--
Jared Whiklo
Pronouns: he/him/his
jwh...@gmail.com
--------------------------------------------------
You know you're from Winnipeg when...Your grandparents drive at 100 km/h
through four meters of snow during a blizzard, without flinching.
