Multiple batch loads at once?

Mark Jordan

Apr 14, 2016, 11:17:51 AM
to islandora, island...@googlegroups.com
Hi,

Has anybody tried running multiple (say 2 or 3) Islandora Batch loads via drush at one time? Or would that be a Dumb Thing To Do? Would love to hear if anyone has any experience.

Mark

Brad Spry

Apr 15, 2016, 11:04:52 AM
to islandora, island...@googlegroups.com
Mark,

Yes, we experience a "collision" when trying to run multiple 'drush islandora_batch_ingest' commands simultaneously or near-simultaneously. The effect is usually ingest failure.

Ideally, you want to stage all your loads using 'drush islandora_batch_scan_preprocess' then run 'drush islandora_batch_ingest' once.
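A rough sketch of that stage-then-ingest-once pattern, for anyone who wants to script it. The drush options are the ones used in the script later in this thread; the DRUSH variable, the function names, and the --user/--uri values are placeholders I made up, so adjust them for your site:

```shell
#!/bin/bash
# Sketch only: preprocess several directories, then run a single ingest.
# DRUSH, stage_all, and ingest_once are hypothetical names, not part of Islandora.
DRUSH=${DRUSH:-/usr/local/bin/drush}

# stage_all NAMESPACE CONTENT_MODEL PARENT DIR...
# Runs islandora_batch_scan_preprocess once per directory; stops on the first failure.
stage_all() {
    local namespace=$1 content_model=$2 parent=$3
    shift 3
    local dir
    for dir in "$@"; do
        "$DRUSH" -v --user=user --uri=https://server \
            islandora_batch_scan_preprocess \
            --namespace="$namespace" --content_models="$content_model" \
            --parent="$parent" --parent_relationship_pred=isMemberOfCollection \
            --type=directory --target="$dir" || return 1
    done
}

# ingest_once: a single ingest run that picks up everything staged above.
ingest_once() {
    "$DRUSH" -v --user=user --uri=https://server islandora_batch_ingest
}
```

Usage would look something like `stage_all mynamespace islandora:bookCModel islandora:bookCollection /path/one /path/two && ingest_once`, where the namespace, PIDs, and paths are example values only.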

There is a more precise method possible (super kudos to Jared Whiklo), which I have not moved into production yet:
https://jira.duraspace.org/browse/ISLANDORA-1376?focusedCommentId=44324&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-44324

Because I haven't put it into production yet, I still don't know whether it results in a collision when run simultaneously or near-simultaneously. I intend to find out!


Brad



Brad Spry

Nov 18, 2016, 4:14:56 PM
to islandora, island...@googlegroups.com
Mark,

I've been working intensely on our server-side batch scripts this week and wanted to follow up with some new findings and strategies.

The "collision" I wrote about previously is still out there. My current working theory is that ingest issues happen when multiple instances of 'drush islandora_book_batch_preprocess' execute simultaneously or near-simultaneously.

I have one collision error documented:
<ASSERT>Datastream must have a datastream id. (foxml:datastream: value of ID is missing)</ASSERT>

...I also have been examining BLOBs generated during islandora_book_batch_preprocess.   Still haven't found the cause of the above error...

So, after much consideration, I decided to implement a locking mechanism and precision set ingest (--ingest_set=). Each of our ready-to-ingest objects passes through this script:

#!/bin/bash

# Random 10-19 second pause so competing jobs don't poll in lockstep.
sleeping_beauty=$(( RANDOM % 10 + 10 ))

# Wait until the lock directory is free, then take it.
until [ "$islandora_ingest_lock" == '0' ]
do
        sleep "$sleeping_beauty"
        if [ ! -d "/tmp/islandora_ingest.lock" ];
        then
                islandora_ingest_lock='0'
                mkdir "/tmp/islandora_ingest.lock"
        else
                islandora_ingest_lock='1'
        fi
done

# Wait until no other drush is running and nothing is still being written to the loading dock.
until [ "$upload_ready" == '1' ]
do
        # The grep process can count itself, so 0 or 1 both mean "no drush running".
        drush_ready=$(ps aux 2>/dev/null | grep drush 2>/dev/null | wc -l)
        # Count loading dock files that are open for write (w) or read/write (u).
        loadingdock_ready=$(/usr/bin/lsof /mnt/islandora-loadingdock | grep -e "[[:digit:]]\+[wu]\{1\}" | wc -l)
        # Parentheses added so the loading dock check applies in both drush cases
        # (&& binds tighter than ||).
        if (( (drush_ready == 0 || drush_ready == 1) && loadingdock_ready == 0 ))
        then
                upload_ready='1'
        else
                upload_ready='0'
        fi
done

#preprocess
# Keep only the number from the "SetId: N" line of the preprocess output.
batch_set_id=$(/usr/local/bin/drush -c /usr/local/drush/drushrc.php -v --user=webmaster --uri=https://island1.uncc.edu islandora_batch_scan_preprocess --namespace="$1" --content_models="$2" --parent="$3" --parent_relationship_pred=isMemberOfCollection --type=directory --target="$4" 2>&1 | sed -E '/^SetId:/! d; s/^SetId: ([0-9]+).*/\1/')

#ready_for_ingest
/usr/local/bin/drush -c /usr/local/drush/drushrc.php vset islandora_bagit_create_on_modify '0'
/usr/local/bin/drush -c /usr/local/drush/drushrc.php -v --user=user --uri=https://server islandora_batch_ingest --ingest_set="$batch_set_id" >> /mnt/islandora-loadingdock/ingest_log/ingest.log

#post_ingest
/usr/local/bin/drush -c /usr/local/drush/drushrc.php vset islandora_bagit_create_on_modify '1'
rmdir "/tmp/islandora_ingest.lock"


In summary, I'm not just locking during islandora_batch_ingest; I'm taking the lock before islandora_batch_scan_preprocess as well. I'm finding great success today with the locking mechanism and precision set ingest.
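One possible tightening of the lock step, offered as a sketch rather than something tested in production: mkdir fails when the directory already exists, so the separate "does it exist?" test can be dropped and two jobs can't both pass the check before either creates the directory. LOCK_DIR defaults to the path used in the script above; the function names are hypothetical.

```shell
#!/bin/bash
# Sketch only: directory-based lock that relies on mkdir failing if the directory exists.
LOCK_DIR=${LOCK_DIR:-/tmp/islandora_ingest.lock}

# try_lock: succeeds only if this call created the lock directory.
try_lock() {
    mkdir "$LOCK_DIR" 2>/dev/null
}

# wait_for_lock [SECONDS]: poll until the lock is ours.
# Default pause is a random 10-19 seconds, as in the script above.
wait_for_lock() {
    local pause=${1:-$(( RANDOM % 10 + 10 ))}
    until try_lock; do
        sleep "$pause"
    done
}

# release_lock: remove the lock directory.
release_lock() {
    rmdir "$LOCK_DIR" 2>/dev/null
}
```

In a script this would be `wait_for_lock` followed by `trap release_lock EXIT` and then the preprocess and ingest steps. The trap is there so the lock is removed if the script exits early; it won't help if the process is killed outright, so a stale lock directory is still possible.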

Brad Spry

Nov 18, 2016, 4:45:29 PM
to islandora, island...@googlegroups.com
Mark,

I've been working to implement a cool server-side book batch pre-processing workflow this week, so I've been working on our nifty ingest scripts.

I ran into the "collision" issue I wrote about previously. After wrestling with it for days, my current theory is that the issues are caused by multiple simultaneous or near-simultaneous executions of islandora_batch_scan_preprocess.

I have one error documented so far:

<ASSERT>Datastream must have a datastream id. (foxml:datastream: value of ID is missing)</ASSERT>

The cause of that error is still eluding me; I've even been disassembling BLOBs created by islandora_batch_scan_preprocess in search of answers :-)

I had to keep moving forward though, so I implemented a locking mechanism and precision set ingest.  All of my ingest-ready objects and related directories pass through here:



batch_set_id=$(/usr/local/bin/drush -c /usr/local/drush/drushrc.php -v --user=user --uri=https://server islandora_batch_scan_preprocess --namespace=$1 --content_models=$2 --parent=$3 --parent_relationship_pred=isMemberOfCollection --type=directory --target=$4 2>&1 | sed -E '/^SetId:/! d; s/^SetId: ([0-9]+).*/\1/')

#ready_for_ingest
/usr/local/bin/drush -c /usr/local/drush/drushrc.php vset islandora_bagit_create_on_modify '0'
/usr/local/bin/drush -c /usr/local/drush/drushrc.php -v --user=user --uri=https://server islandora_batch_ingest --ingest_set=$batch_set_id >> /mnt/islandora-loadingdock/ingest_log/ingest.log

#post_ingest
/usr/local/bin/drush -c /usr/local/drush/drushrc.php vset islandora_bagit_create_on_modify '1'
rmdir "/tmp/islandora_ingest.lock"


After implementing the locking mechanism and precision set ingest, I've seen no "collisions". My testbed has been 2-3 books, audio, and images all trying to ingest simultaneously. I no longer allow them to fight each other; objects now form a single-file line.
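Since the precision ingest depends entirely on the set ID being captured, here is a small sketch of the sed step pulled out on its own, with a check before ingest. It assumes the preprocess output contains a line starting with "SetId: " followed by a number, which is what the sed expression in the script expects; the function names are hypothetical.

```shell
#!/bin/bash
# Sketch only: isolate the SetId parsing and validate it before ingesting.

# extract_set_id: read preprocess output on stdin, print the number from the "SetId: N" line.
extract_set_id() {
    sed -E '/^SetId:/!d; s/^SetId: ([0-9]+).*/\1/'
}

# is_valid_set_id VALUE: true only for a non-empty string of digits.
is_valid_set_id() {
    [[ $1 =~ ^[0-9]+$ ]]
}

# Intended use (commented out so this block has no side effects):
#   batch_set_id=$(drush ... islandora_batch_scan_preprocess ... 2>&1 | extract_set_id)
#   if is_valid_set_id "$batch_set_id"; then
#       drush ... islandora_batch_ingest --ingest_set="$batch_set_id"
#   else
#       echo "No SetId found in preprocess output; skipping ingest" >&2
#   fi
```

The point of the check is that if preprocess fails, batch_set_id comes back empty and the ingest step would be called with a blank --ingest_set= value, which is probably not what you want.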

I intend to keep pushing it and see how it holds up!
