New module: Islandora Find & Replace


Mitchell MacKenzie

Apr 11, 2016, 5:27:42 PM
to islandora
Another week, another module. I've needed this one many times...

Islandora Find & Replace allows for simple find & replace of text in datastreams via an admin form. If Islandora Pretty Text Diff is enabled, previews of the find & replace can be viewed before submitting the find & replace operation. A log captures the datastream versions to show a diff after the update is complete.
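
To make the preview idea concrete, here is a minimal Python sketch of a literal find & replace with a diff preview. It is not the module's code (the module is a Drupal/PHP project and uses Islandora Pretty Text Diff for its previews); the function name `preview_replace` and the sample MODS fragment are made up for illustration.

```python
import difflib


def preview_replace(text, find, replace):
    """Return (new_text, diff_lines) for a literal find & replace.

    diff_lines is a unified diff of the text before and after the change,
    and is empty when the search string does not occur.
    """
    new_text = text.replace(find, replace)
    diff = list(
        difflib.unified_diff(
            text.splitlines(),
            new_text.splitlines(),
            fromfile="before",
            tofile="after",
            lineterm="",
        )
    )
    return new_text, diff


if __name__ == "__main__":
    # Hypothetical MODS fragment, used only to show the preview.
    mods = (
        "<mods>\n"
        "  <accessCondition>All rights reserved</accessCondition>\n"
        "</mods>"
    )
    updated, diff = preview_replace(mods, "All rights reserved", "Public domain")
    print("\n".join(diff))
```

Reviewing the diff before writing anything back is the same safety step the preview gives you in the form.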


It would be smart to test the updates on sample data before running the operation in production, and as always, make sure backups are up-to-date.

Mitch

Mark Jordan

Apr 12, 2016, 10:13:26 AM
to isla...@googlegroups.com
This will be popular! I've linked to your blog post from the https://github.com/mjordan/islandora_datastream_crud README since a lot of people will want to use (should use) your module instead.

Mark



Jennifer

Apr 12, 2016, 9:04:56 PM
to islandora
This is something that we will definitely try out.

Mark or anyone - for those of us who haven't tried the CRUD or this one and are not programmers, what are the differences between CRUD and this find/replace?

Also, has anyone tested this yet?

Jennifer

Mark Jordan

Apr 12, 2016, 9:22:46 PM
to isla...@googlegroups.com
Hi Jennifer,

Islandora Find & Replace offers a graphical user interface for finding a set of datastreams, and a useful web form for defining the 'find' string and the 'replace with' string. It then lets you perform the find and replace operation on the datastreams. It only works on text-based datastreams like XML files or OCR. It's friendlier to more users than Datastream CRUD, but by design it only does one (extremely useful) thing: search and replace text strings.

Islandora Datastream CRUD offers only a set of drush commands. It doesn't do the actual modification of datastream content; it just provides command-line tools for retrieving the datastream content, saving it as files, and then pushing the new versions of the datastream content files (modified by some other process or software) back up to your repository. It doesn't care whether the datastreams are text, image, video, etc. because it just retrieves and pushes files. It can also delete datastreams from your repository. It's not as user friendly as Find & Replace, but it's more of a tool that will let you do whatever you want to your datastreams, provided you have some additional tool to perform the actual modification of the datastreams.
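
As an illustration of the "some other process" step, here is a rough Python sketch that edits a directory of already-retrieved datastream files in place. It does not call Datastream CRUD or drush at all; the function name `replace_in_files`, the `*.xml` pattern, and the directory layout are assumptions for the example, so adapt them to however your files were saved.

```python
from pathlib import Path


def replace_in_files(directory, find, replace, pattern="*.xml"):
    """Apply a literal replacement to every matching file.

    Returns the list of paths that were actually changed, so you can
    review them before pushing anything back to the repository.
    """
    changed = []
    for path in sorted(Path(directory).glob(pattern)):
        original = path.read_text(encoding="utf-8")
        updated = original.replace(find, replace)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed


if __name__ == "__main__":
    # "mods_files" is a hypothetical directory of fetched datastream files.
    for path in replace_in_files("mods_files", "All rights reserved", "Public domain"):
        print("modified", path)
```

Run it against a copy of the files first; nothing here keeps a backup.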

They're both tools that help you manage datastreams, but they have different functionality and intended users. One thing they do have in common is that, if used without a lot of testing and caution, they can do extensive damage to your datastreams that would be difficult to fix if it affected a lot of objects.

Mitch may want to weigh in,

Mark


Mitchell MacKenzie

Apr 12, 2016, 10:47:38 PM
to islandora
Yep, Mark's outline of the differences between the modules is exactly how I see them.

One is for wholesale datastream changes from the command line and the other is for "quick" text replacements across a definable set of objects via a form.

And both should be considered "power tools" that require responsible use but can really help save time when modifying batches of objects.

I think both modules would be very helpful for post-migration data cleanups, though there's no technical reason to avoid using them on production datasets.

Mitch

Brandon Weigel

Apr 13, 2016, 4:35:01 PM
to islandora
This is going to be tremendously helpful to us - thanks, Mitchell!

I think I already know the answer, but I should ask anyway -- is there any chance that this module adheres to namespace restrictions? (particularly important in a multisite environment)

Brandon Weigel

Apr 13, 2016, 4:36:28 PM
to islandora
Whoops, never mind - should have read the blog post. The fact that it does respect namespace restrictions is tremendous. Fantastic work!

Brandon Weigel

May 2, 2016, 7:14:07 PM
to islandora
Hi Mitch,

I have one suggestion for your find/replace module, if you're open to it. (Please let me know if my suggestion already exists and is configurable somewhere; I haven't found it.)

It's great that the proper collection title is used in the Collection dropdown, but in a multisite environment (like ours) it's possible that several collections might have the same or similar titles. It would forestall any confusion if the collection PID was displayed too, perhaps in parentheses beside the title?

Thanks again for offering the module - it's already been very useful to us.

Cheers,

Brandon

Mitchell MacKenzie

May 3, 2016, 4:07:14 PM
to islandora
Thanks Brandon, that's definitely a good tweak that should be made. I will try to get it in place in the next few days.

Mitch

Jennifer Eustis

May 6, 2016, 2:53:19 PM
to islandora
Hi all,

I finally started playing around with this awesome new tool. My test was to find and replace rights statements. These statements tend to be rather long in some instances, definitely exceeding the 127 character limit of textfields. It would be great to be able to find and replace large portions of text. Another added piece of functionality to consider is to add keyword search in addition to selecting the content model and collection. Right now, the user has to select both a content model and collection. It would be awesome to be able to select one or the other. For example, rights statements go beyond both collection and content model. In that respect, what about a global find and replace? That might be really dangerous.

When I look at the Preview, it's blank. Not sure if this is something on my end. I'm testing with the latest VM on my local machine. I installed both modules, the pretty text diff and the find and replace. Also, the changes took effect in the MODS datastream, but when I looked at them in the Details, I didn't see any changes. I had to edit the MODS, or just click edit and then update, for the changes to appear in the Details.

Jennifer
previewInFindReplace.jpg

Mitchell MacKenzie

May 8, 2016, 6:49:44 PM
to islandora
Thanks for the feedback. I've changed the Search and Replace fields to textarea form fields, so the character limit shouldn't be a problem anymore.

Regarding the search conditions, only the content model is a required field. This is to limit the number of objects to search for the matching text in, since the module does not rely on Solr to do the search.

Regarding the preview issue, please create an issue in the project's GitHub with the values of the form fields that you entered, a sample of the MODS datastream, and the location where the jQuery.PrettyTextDiff library was placed in the Drupal installation.

Regarding the object display not updating after the operation, this is likely because the display is Dublin Core metadata. The module only updates the selected datastream. The ability to update derivative datastreams would be outside the scope of this module. It should be in an independent module, but that may cause conflicts with the XML Form Builder's updates of Dublin Core.

Mitch

Brandon Weigel

May 11, 2016, 7:02:16 PM
to islandora
Thanks for making those updates, Mitch. The addition of the PID to the collection title makes it a lot easier to work with.

I should note that we ran into a problem using this that others might also run into if their servers aren't powerful enough... Not related to the module itself, but just the effect of making mass changes in this way. It wreaked havoc on our Solr index.

From Nelson (who's providing our support):
We've narrowed the problem down to the MySQL database connections. Fedoragsearch (application that handles updating Solr) was set to multithreaded. We believe the multithreading was overwhelming the database during the full re-index and so the process was failing part way through. We've disabled the multithreading for now (and completed a re-index successfully) and may revisit the issue in the future if the amount of resources on your system changes.
...
The underlying issue is that MySQL can't handle bulk updates when all the Drupal filter connections are enabled in Fedora. The connections are required for Fedora to authenticate against Drupal; there is a connection for each multisite, and Fedora tests each connection for every request it receives.
We were hoping that disabling the Fedoragsearch multithreading would help, but it clearly still fails under mass updates.

Martha Tenney

May 15, 2017, 2:28:20 PM
to islandora
Hi all,

I'm reviving this old thread as the Islandora Collaboration Group (ICG) [1] is hoping to work on documentation and bug fixes for Islandora Find & Replace at an upcoming hack/doc event. Are the issues [2] on the github still desired fixes? Does anyone have additional issues or use cases that they'd like us to attempt to address? Mitch, is there any other work you'd been thinking about doing but hadn't gotten to? 

Feel free to contact me on this thread or off-list - mtenney at barnard.edu.

Thanks!
Martha

dp...@metro.org

May 22, 2017, 1:46:39 PM
to islandora
Hi Martha, one of the use cases we have (if there is still time) is:
A) an arbitrary list of PIDs as the source for the changes, instead of the cmodel/collection selection
B) using Solr searches/queries (pretty much A, but more specific, making use of the existing UI) as a way of defining where to search.

Best

Diego Pino 
Metro.org

PS: if you don't see any response on the issues, I guess you can fork and replicate them on your GitHub copy, making references to the original issues. Then you can make pull requests back to contentmath and try again; sometimes code is easier for the owner to manage than issues, which can be more abstract.

William Conlin

Nov 30, 2017, 2:06:53 PM
to islandora
Hey everyone,

I'm stuck on debugging some strange behavior in the islandora_find_and_replace module. With sets of items greater than about 600, the form submit seems to not go through. I haven't pinned the number down exactly, but we noticed it with sets > 1k. I've had the debugger plugged into the module at all kinds of different stages. 

The problem may have something to do with our system/build, but I can't imagine why... 
I'm stuck. Any advice/thoughts appreciated.

Cheers,
Will

dp...@metro.org

Dec 1, 2017, 5:34:55 PM
to islandora
Hey Will,

Guessing maybe you reached some PHP POST limits? [1]

Normally PHP ships with a relatively good number for the number of fields (vars) that can be submitted

php_value max_input_vars 1000

That number looks like your doomsday number.

Since the PIDs are submitted via a table select element, I guess Drupal internally converts them into many little vars.
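
A back-of-the-envelope way to check that guess, sketched in Python. The assumptions are mine: that each checked row submits one variable, and that the rest of the form adds a handful of other fields (the `other_fields=10` default is a placeholder, not a measured number). Only the 1000 limit comes from the setting above.

```python
def exceeds_max_input_vars(selected_pids, other_fields=10, max_input_vars=1000):
    """Return True if the form would submit more variables than PHP accepts.

    Assumes one submitted variable per selected PID plus `other_fields`
    variables for the rest of the form (a hypothetical count).
    """
    return len(selected_pids) + other_fields > max_input_vars


if __name__ == "__main__":
    for count in (600, 990, 1000, 1500):
        pids = ["demo:%d" % i for i in range(count)]  # made-up PIDs
        print(count, exceeds_max_input_vars(pids))
```

If the real form sends more than one variable per row, the threshold would be hit with fewer objects than this suggests.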

Give it a shot

Diego Pino
Metro.org

[1] https://www.willmaster.com/library/tutorials/php-form-submission-limits.php // not official PHP doc but I liked the narrative style!

William Conlin

Oct 2, 2018, 1:56:36 PM
to islandora
You were absolutely right Diego.

We made a pull request to contentmath with a fix for this. It posts an error if the form inputs exceed max_input_vars and adds a way to process a full set without submitting so many checkboxes:

https://github.com/contentmath/islandora_find_replace/pull/8

It looks like there are some similar ideas in our fork as well as in mnylc's and bondjimbond's forks (namespace filters, regular expressions, and CSV/Excel downloads).

We've developed a batch revert feature that lets the user reset the metadata to its state before the replacement, from the batch log page. It would be great to package these up into one great F&R module.
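
For anyone curious how a revert from a log can work in principle, here is a small Python sketch. It is not the code in our fork (that is Drupal/PHP); it uses an in-memory dict in place of real datastreams, and the names `apply_with_log` and `revert` are invented for the example.

```python
def apply_with_log(datastreams, find, replace):
    """Apply a literal replacement and log the before/after text per PID."""
    log = []
    for pid, text in datastreams.items():
        updated = text.replace(find, replace)
        if updated != text:
            log.append({"pid": pid, "before": text, "after": updated})
            datastreams[pid] = updated
    return log


def revert(datastreams, log):
    """Restore the logged 'before' text for each entry.

    An entry is skipped if the current content no longer matches the
    logged 'after' text, so later edits are not overwritten.
    """
    for entry in log:
        if datastreams.get(entry["pid"]) == entry["after"]:
            datastreams[entry["pid"]] = entry["before"]


if __name__ == "__main__":
    # Made-up PIDs and content.
    store = {"demo:1": "All rights reserved", "demo:2": "Public domain"}
    batch_log = apply_with_log(store, "All rights reserved", "In copyright")
    print(store)
    revert(store, batch_log)
    print(store)
```

The key design point is that the log has to keep the full previous content (or a reference to the previous datastream version), not just the search and replace strings, since a replacement cannot always be undone by swapping the two.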

dp...@metro.org

Oct 2, 2018, 2:21:32 PM
to islandora
Hi Will, if you want to make a pull request against ours, please feel free to do so; we can merge it ASAP. There is also a regex branch you can check out. We're thinking about officially maintaining it, but of course only if people are interested.

Cheers!

D