Mark,
I've been working intensely on our server-side batch scripts this week and wanted to follow up with some new findings and strategies.
The "collision" I wrote about previously is still out there... My current working theory is that ingest issues happen when multiple instances of 'drush islandora_book_batch_preprocess' execute simultaneously or nearly simultaneously.
I have one collision error documented:
<ASSERT>Datastream must have a datastream id. (foxml:datastream: value of ID is missing)</ASSERT>
...I have also been examining the BLOBs generated during islandora_book_batch_preprocess, but still haven't found the cause of the above error...
So, after much consideration, I decided to implement a locking mechanism and precision set ingest (--ingest_set=). Each of our ready-to-ingest objects passes through here:
#!/bin/bash
# Random 10-19 second retry interval so competing instances do not retry in lockstep.
sleeping_beauty=$(( RANDOM % 10 + 10 ))
# mkdir is atomic: it tests for and creates the lock directory in a single step,
# so two instances cannot both see "no lock" and then both create it.
until mkdir "/tmp/islandora_ingest.lock" 2>/dev/null
do
  sleep "$sleeping_beauty"
done
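until [ "$upload_ready" == '1' ]
do
  # Count lines mentioning drush. The grep process itself usually shows up here,
  # so a count of 1 generally means no drush is actually running.
  drush_ready=$(ps aux 2>/dev/null | grep drush 2>/dev/null | wc -l)
  # Count files on the loading dock that are still open for writing (fd mode w or u).
  loadingdock_ready=$(/usr/bin/lsof /mnt/islandora-loadingdock | grep -e "[[:digit:]]\+[wu]" | wc -l)
  # Proceed only when no other drush is running and nothing is still being written.
  # The inner parentheses matter: && binds tighter than || in shell arithmetic.
  if (( (drush_ready == 0 || drush_ready == 1) && loadingdock_ready == 0 ))
  then
    upload_ready='1'
  else
    upload_ready='0'
    # Brief pause so the loop does not spin while waiting.
    sleep 5
  fi
done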
#preprocess: scan the target directory and keep only the numeric set ID from the "SetId:" line
batch_set_id=$(/usr/local/bin/drush -c /usr/local/drush/drushrc.php -v --user=webmaster --uri=https://island1.uncc.edu islandora_batch_scan_preprocess --namespace="$1" --content_models="$2" --parent="$3" --parent_relationship_pred=isMemberOfCollection --type=directory --target="$4" 2>&1 | sed -E '/^SetId:/! d; s/^SetId: ([0-9]+).*/\1/')
#ready_for_ingest: switch off islandora_bagit_create_on_modify for the duration of the ingest
/usr/local/bin/drush -c /usr/local/drush/drushrc.php vset islandora_bagit_create_on_modify '0'
# Ingest only the set created by the preprocess step above.
/usr/local/bin/drush -c /usr/local/drush/drushrc.php -v --user=user --uri=https://server islandora_batch_ingest --ingest_set="$batch_set_id" >> /mnt/islandora-loadingdock/ingest_log/ingest.log
#post_ingest: switch islandora_bagit_create_on_modify back on and release the lock
/usr/local/bin/drush -c /usr/local/drush/drushrc.php vset islandora_bagit_create_on_modify '1'
rmdir "/tmp/islandora_ingest.lock"
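One caveat with the precision set ingest: if the preprocess step fails, the sed filter yields an empty string and --ingest_set= is passed with no value, which is probably not what is wanted. Below is a minimal sketch of a guard. Note that extract_set_id is a hypothetical helper name, and the sample line is only an assumption about the output shape implied by the sed pattern in the script, not captured drush output.

```shell
#!/bin/bash
# Hypothetical helper: read preprocess output on stdin, print only the numeric set ID.
extract_set_id() {
  sed -E '/^SetId:/!d; s/^SetId: ([0-9]+).*/\1/'
}

# Assumed sample line, shaped to match the sed pattern (not real drush output).
sample_output='SetId: 42'
batch_set_id=$(printf '%s\n' "$sample_output" | extract_set_id)

# Continue only when the ID is a non-empty string of digits.
case "$batch_set_id" in
  ''|*[!0-9]*)
    echo "no SetId found; skipping ingest" >&2
    ;;
  *)
    echo "ingesting set $batch_set_id"
    ;;
esac
```

In the real script the ingest command would go in the second branch, and the first branch would release the lock and exit.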
In summary, I'm not just locking during islandora_batch_ingest; the lock is taken before islandora_batch_scan_preprocess and held until the ingest finishes. I'm finding great success today with the locking mechanism and precision set ingest.
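One weakness of a directory lock like this is that a run which dies partway through leaves the lock behind, and every later run then waits forever. Here is a minimal sketch of one way to handle that, assuming an EXIT trap is acceptable in our setup. The run_locked wrapper and the throwaway lock path are hypothetical names used only for illustration; this is not what the production script currently does.

```shell
#!/bin/bash
# Assumption: a throwaway lock path, so the sketch never touches the real lock.
lock_dir="${TMPDIR:-/tmp}/islandora_ingest_demo.$$.lock"

# Hypothetical wrapper: run a command while holding the lock.
run_locked() {
  (
    # mkdir is atomic, so only one caller at a time gets past this loop.
    until mkdir "$lock_dir" 2>/dev/null
    do
      sleep 1
    done
    # Release the lock when this subshell exits, even after a failure.
    trap 'rmdir "$lock_dir"' EXIT
    "$@"
  )
}

# One command that succeeds and one that fails; both should release the lock.
ok_status=0
run_locked true || ok_status=$?
fail_status=0
run_locked false || fail_status=$?
echo "success run: $ok_status, failed run: $fail_status"
```

The second call only gets the lock because the first one released it on exit, and the lock directory is gone again after the failing command. A trap does not help when the process is killed with SIGKILL or the machine goes down, so a stale lock may still need to be removed by hand in those cases.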