Jobs Issues and Query regarding OAI-PMH


Vicky Phillips

Apr 8, 2022, 5:23:38 AM
to AtoM Users
Hi
We've recently upgraded from version 2.4 of AtoM to version 2.6.1.

We were unable to switch on the "Cache description XML exports upon creation/modification" setting in the previous version, as it put too much pressure on our site and caused it to go down. During testing of the upgraded site we switched this functionality back on and did some load testing, and it appeared to manage fine, with jobs being queued up nicely in Gearman. However, when we went live we soon started experiencing issues: the number of jobs queuing for arXmlExportSingleFileJob kept growing and never seemed to reach the end of the list, although we could see from the Gearman monitor page that it still appeared to be processing things. This job also appeared to have an impact on other jobs, e.g. arObjectMoveJob and arFindingAidJob, which had items in the queue but no longer appeared to be processing anything. We therefore switched the setting off again to see if this was the cause of the problem. All seemed to be good for a few days, but now we appear to have hit the same sort of issue again, this time with arUpdateEsIoDocumentsJob: it still seems to be running, but arObjectMoveJob and arFindingAidJob don't seem to be processing at all, just queuing everything up. Any ideas why a job which is being heavily used is affecting other jobs? Is there something we can do to prevent this from happening?

What does the arUpdateEsIoDocumentsJob relate to?  An archivist spotted that this was running and updating descendants after he'd amended an archival description.  When I did some testing I noticed that this job appeared when editing a record that had the Name of creator field populated.  The archivist hadn't touched that field when editing, so does this job run whether you've altered the Name of creator field or not?

Another thing I noticed, having switched the Cache description XML exports upon creation/modification off, was that the DC in the OAI-PMH was still being updated (which was a bit of a surprise to me; I didn't think OAI-PMH would be updated with Cache descriptions switched off). However, the descriptive information in the corresponding EAD OAI record wasn't updated, even though the datestamp on the OAI record had changed. Any idea as to why DC is being updated but not the EAD? When I manually re-cache the description via the command line it then updates the EAD.

Ideally we would like both EAD and DC OAI updated automatically without human intervention so that the DC can be harvested by our main library discovery interface Primo in order to keep that in sync with AtoM and also the EAD so that the Archives Hub can harvest that automatically to keep our records up-to-date in their system.  Therefore any help you can provide us with here would be very much appreciated.

Update: It appears that overnight the jobs did all finally complete, with only a few failures on the arObjectMoveJob. I'm not sure if this was an AtoM issue or possibly archivists trying to move the same object twice when they didn't see it move initially. The biggest problem at the moment is that some of these tasks took 9 hours to complete. This is OK for some tasks but not for an arObjectMoveJob task. As mentioned above, I was surprised that some jobs appeared to be impacted by another job; I thought each job had its own resources. Is this how it should be behaving? If so, is there a way we can configure this better?

Thanks,
Vicky
Digital Standards Manager
National Library of Wales

Dan Gillean

Apr 11, 2022, 11:11:40 AM
to ICA-AtoM Users
Hi Vicky, 

Any ideas why a job which is being heavily used is affecting other jobs? Is there something we can do to prevent this from happening?

With the default setup included in the AtoM installation documentation, there's only one worker set up, so if a job is already running, any additional jobs will just be queued until the current job completes. 
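
To illustrate why a single worker makes a short job wait behind a long backlog, here is a minimal Python sketch. It is not AtoM or Gearman code: it assumes strict first-in-first-out ordering across all job types (a simplification of how Gearman actually dispatches work), and the job names, counts, and durations are made up for the example.

```python
import heapq


def finish_times(jobs, workers=1):
    """Return {job_name: finish_time_in_seconds} for jobs taken in queue order.

    jobs: list of (name, duration_seconds) tuples, oldest first.
    workers: number of identical workers pulling from the same queue.
    """
    # Each heap entry is the time at which one worker next becomes free.
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    finished = {}
    for name, duration in jobs:
        start = heapq.heappop(free_at)  # earliest available worker takes the job
        end = start + duration
        finished[name] = end
        heapq.heappush(free_at, end)
    return finished


# Hypothetical backlog: 100 export jobs of 60 s each, then one 5 s move job.
backlog = [(f"export-{i}", 60) for i in range(100)] + [("move-1", 5)]
one = finish_times(backlog, workers=1)
four = finish_times(backlog, workers=4)
print("move job finishes after", one["move-1"], "s with 1 worker")
print("move job finishes after", four["move-1"], "s with 4 workers")
```

In this toy model the move job itself is quick, but its completion time is dominated by whatever was queued ahead of it, which matches the kind of delay described above.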

It is possible with Gearman to set up multiple workers for parallelization. We haven't experimented with that much in AtoM yet, so you'll need to search the Gearman documentation and other online resources for tips. However, you should just be able to configure multiple workers in the configuration file, and Gearman should handle the rest. 

Keep in mind that there are some jobs that should never run in parallel, since they could lead to conflicts if changes are taking place in other jobs or via other user actions. We've tried to account for that in the code, and there are currently two jobs configured not to run in parallel: the arObjectMoveJob and the arFileImportJob.


The following comments can be found in the code next to this: 

    /*
     * Parallel execution and retry time:
     *
     * In instances where two or more workers are set up, multiple jobs could run in parallel.
     * We want to avoid that in jobs that make sensitive changes to the nested set, like the
     * arObjectMoveJob and the import jobs. The Gearman job server doesn't include a built in
     * system to postpone/schedule jobs so, if multiple jobs from the $avoidParallelExecutionJobs
     * variable bellow are executed at the same time, the late ones will wait, retrying after
     * the amount of seconds indicated in $waitForRetryTime, until the previous ones are finished
     * or the maximun amount of tries ($maxTries) is reached. Due to the limitations of the Gearman
     * job server, the waiting jobs will block the workers executing them until they are ended.
     */
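
The wait-and-retry behaviour that comment describes can be sketched roughly as follows. This is a hedged Python illustration of the general pattern only, not AtoM's PHP implementation: the function name `run_when_clear`, the `conflicting_job_running` callback, and the default values are all hypothetical, with `wait_for_retry_time` and `max_tries` standing in for `$waitForRetryTime` and `$maxTries`.

```python
import time


def run_when_clear(job, conflicting_job_running, wait_for_retry_time=10,
                   max_tries=5, sleep=time.sleep):
    """Run job() once no conflicting job is running.

    The caller (the worker) is blocked while waiting, which mirrors the
    limitation noted in the comment: a waiting job ties up its worker.
    Raises RuntimeError if the conflict has not cleared after max_tries checks.
    """
    for attempt in range(1, max_tries + 1):
        if not conflicting_job_running():
            return job()
        if attempt < max_tries:
            # Nothing else can use this worker during the wait.
            sleep(wait_for_retry_time)
    raise RuntimeError(f"gave up after {max_tries} tries")
```

Because the waiting happens inside the worker, adding more jobs to the "avoid parallel execution" list trades conflicts for blocked workers, which is worth keeping in mind when experimenting.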


In a highly active site, it's possible there could be further conflicts to consider - one of the things we'd need to investigate further before officially supporting a multi-worker parallelization option in the official documentation. If you do experiment with this, it should be relatively easy to add additional jobs to this list where conflicts are found. 

Additionally, keep in mind that multiple workers set up on the same machine will be using the same resources - so if the bottleneck is memory for example, then adding additional workers may not resolve the issue (and may in fact increase the problem if one job is already consuming all available memory, and then a parallel job starts). 

What does the arUpdateEsIoDocumentsJob relate to?  An archivist spotted that this was running and updating descendants after he'd amended an archival description.  When I did some testing I noticed that this job appeared when editing a record that had the Name of creator field populated.  The archivist hadn't touched that field when editing, so does this job run whether you've altered the Name of creator field or not?

Yes, currently this job is executed any time a save operation executes. At present, AtoM doesn't check which fields, if any, were updated - so going into the edit template, changing nothing, and hitting save would still trigger this job. 

It's a known issue that causes a lot of unnecessary churn for the job scheduler. There's an issue filed here: 
Unfortunately, it seems that properly detecting per-field changes, and correlating that to what information objects (descriptions) require an index update, can get surprisingly complex. Our team has started looking into this and made some progress with repository name updates (which are simpler), but because we also need to factor in actor histories and dates of existence (which can display on linked descriptions), relation to repository records (as a maintainer), relationships to access points (which are now browsable from a terms view page) and more, there's not an easy and efficient single method for handling this. Nonetheless, this is an issue we will continue to explore in search of an efficient solution. 

In the meantime, there are some related issues we're looking into as well, to help with job scheduler resource management and how ES updates are performed: 

Another thing I noticed, having switched the Cache description XML exports upon creation/modification off, was that the DC in the OAI-PMH was still being updated (which was a bit of a surprise to me; I didn't think OAI-PMH would be updated with Cache descriptions switched off). However, the descriptive information in the corresponding EAD OAI record wasn't updated, even though the datestamp on the OAI record had changed. Any idea as to why DC is being updated but not the EAD? When I manually re-cache the description via the command line it then updates the EAD.

I will have to do a bit of testing to see if I can reproduce this. However, with caching switched off, both the EAD and the DC XML should be generated on demand, synchronously in the browser (rather than loading a pre-generated version) - so both *should* be updating when caching is off. It's possible there's a bug where, if a cached EAD 2002 XML version is already available, AtoM continues to use it even when the setting is disabled. Either way, the behavior should be consistent for all XML, so I'll investigate. 

If you want to ensure that it's generated on the fly, you could manually delete the cached versions from the downloads directory. For more on where to find these, see: 

However, as you know, the caching was introduced as a workaround for very large hierarchies that time out in the browser before completion when attempting to generate a full hierarchical archival unit in EAD. I suspect the synchronous approach will not work for your institution, so hopefully we can get things working so that the automatic caching updates work when a previously cached description is modified, or a new description created. 

Hope this helps, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him



Vicky Phillips

Apr 13, 2022, 7:00:46 PM
to AtoM Users
Thanks Dan. I'm on leave at the moment but will speak with our developer when I'm back to see if we can test setting up multiple workers.

Thanks for your help,
Vicky