500 error with Gearman: arUpdateEsIoDocumentsJob


Steve Lapommeray

Nov 29, 2021, 2:02:11 PM
to AtoM Users
Hello,

When trying to update an authority record, this error occurs: "No Gearman worker available that can handle the job arUpdateEsIoDocumentsJob".

Restarting the atom-worker does not resolve the issue. The gearmand and memcached services are running.

Deleting an authority record works successfully, but editing doesn't. This is on AtoM 2.6.4 under RHEL 7 and PHP 7.4.25.

Any idea what would cause this? Please let me know if you need more information.

Thanks,
Steve

Dan Gillean

Dec 1, 2021, 9:37:43 AM
to ICA-AtoM Users
Hi Steve, 

FYI, we don't test CentOS or RHEL installations, and there are some known issues with PHP 7.4 in AtoM 2.6 (which we've addressed for 2.7), so it is possible there is an installation dependency issue at play. I'm hoping it's something simpler than that, though, so let's try to rule out some of the easier solutions first.

My first guess would be that the Elasticsearch service is unavailable. Can you try restarting it?
  • sudo systemctl restart elasticsearch
After that, you can try using the search:status task to check on the current status of your ES instance: 
If any of the entities are not fully indexed, try manually repopulating the search index, with: 
The job scheduler does have its own error log you can check - in Ubuntu the following command can be used; I'm not sure offhand if this will work the same in RHEL: 
Another thing to note in the section linked above: if you have followed our recommended installation instructions, the job scheduler has a fail counter to prevent it from being caught in an infinite loop of failing and trying to restart. This fail counter limits restart attempts to 3 times in 24 hours, before you must run the additional command to reset this counter:
  • sudo systemctl reset-failed atom-worker
So, if you've attempted to restart the atom-worker without checking its status, it's possible it hasn't actually restarted if the fail counter had reached its limit! 
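
The checks above can be strung together roughly as follows. This is a minimal sketch, not an official procedure: it assumes AtoM is installed under /usr/share/nginx/atom (the ATOM_DIR default here is an assumption, not something stated in this thread), that the search tasks are run as `php symfony <task>` from that directory, and that the worker logs go to the systemd journal. The `run` and `atom_task` helpers are hypothetical wrappers that only print each command unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch only. Prints each command by default; set APPLY=1 to execute.
# ATOM_DIR is an assumed install path, not taken from the thread.
ATOM_DIR="${ATOM_DIR:-/usr/share/nginx/atom}"
APPLY="${APPLY:-0}"

# Run a command, or just print it in dry-run mode (hypothetical helper).
run() {
  if [ "$APPLY" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Run an AtoM symfony task from the AtoM root directory (hypothetical helper).
atom_task() {
  if [ "$APPLY" = "1" ]; then
    (cd "$ATOM_DIR" && php symfony "$@")
  else
    echo "+ cd $ATOM_DIR && php symfony $*"
  fi
}

check_search_index() {
  run sudo systemctl restart elasticsearch
  atom_task search:status      # report the indexing status of each entity
  atom_task search:populate    # only needed if something is not fully indexed
}

restart_worker() {
  run sudo systemctl reset-failed atom-worker          # clear the fail counter first
  run sudo systemctl restart atom-worker
  run systemctl is-active atom-worker                  # confirm it actually came back
  run sudo journalctl -u atom-worker -n 50 --no-pager  # recent worker log lines
}

check_search_index
restart_worker
```

Resetting the fail counter before the restart means the restart is not silently refused, and the `is-active` check afterwards confirms the worker really is running rather than assuming it.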

Other standard maintenance tasks you could try that might help if nothing above has: 

Ensure the filesystem permissions are properly set: 
Rebuild the nested set to ensure that hierarchical relationships are properly stored in the relational database. When an actor is updated, a background job is automatically run to ensure that all related descriptions are updated and properly indexed. This could possibly fail if the nested set is corrupted. You can rebuild it with: 
Generate slugs - if one of your records timed out during the save process, then it's possible it was left without a slug in the database, which can cause unexpected failures. Running the following task with no additional option parameters will only generate slugs for records missing them: 
Clear the application cache and restart PHP-FPM - ensure you're seeing the latest and not a cached version of a record: 
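
As a rough sketch, the maintenance tasks listed above could be run like this. Several values are assumptions rather than anything confirmed in this thread: the install path (ATOM_DIR), the web server user (WEB_USER, set to nginx here as a guess for a RHEL install; Ubuntu installs typically use www-data), and the PHP-FPM service name (PHP_FPM_SERVICE), which varies by distribution and PHP packaging. The `run` and `atom_task` helpers are hypothetical wrappers that only print each command unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch only. Prints each command by default; set APPLY=1 to execute.
# All three defaults below are assumptions; adjust them for your server.
ATOM_DIR="${ATOM_DIR:-/usr/share/nginx/atom}"
WEB_USER="${WEB_USER:-nginx}"
PHP_FPM_SERVICE="${PHP_FPM_SERVICE:-php-fpm}"
APPLY="${APPLY:-0}"

# Run a command, or just print it in dry-run mode (hypothetical helper).
run() {
  if [ "$APPLY" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Run an AtoM symfony task from the AtoM root directory (hypothetical helper).
atom_task() {
  if [ "$APPLY" = "1" ]; then
    (cd "$ATOM_DIR" && php symfony "$@")
  else
    echo "+ cd $ATOM_DIR && php symfony $*"
  fi
}

maintenance() {
  run sudo chown -R "$WEB_USER:$WEB_USER" "$ATOM_DIR"  # filesystem permissions
  atom_task propel:build-nested-set                    # rebuild the nested set
  atom_task propel:generate-slugs                      # only fills in missing slugs
  atom_task cc                                         # clear the application cache
  run sudo systemctl restart "$PHP_FPM_SERVICE"        # restart PHP-FPM
}

maintenance
```

Run it once without APPLY=1 to review the printed commands before executing anything.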
It's always good practice when trying to troubleshoot an error to clear your browser cache as well, or else test in an incognito / private browser window. 

Let us know if any of that helps! And if not, please share any relevant information you find in the atom-worker logs. 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him



Steve Lapommeray

Dec 7, 2021, 2:51:31 PM
to AtoM Users
Hi Dan,

Thank you for your response. I'm working on getting permission to run the journalctl command on our server, but I did get some error messages while populating the search index.

"Couldn't find information object (id: XXXXXX)
Please, contact an administrator."

There were 5 of those messages. Are they anything to worry about? Is there something that I need to do to fix this?

Thanks,
Steve

Dan Gillean

Dec 7, 2021, 4:19:35 PM
to ICA-AtoM Users
Hi Steve, 

Unfortunately, yes - this may indicate that there is some data corruption affecting these records that you'll need to address. See: 
In our Troubleshooting guide, we have some queries that can be used to identify the most common forms of data corruption in descriptions, and suggestions on how to resolve the issue. See: 
As always, I strongly recommend making a backup of your database before trying any of the INSERT queries we recommend to resolve some of the issues, depending on what you find! 
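
To illustrate the kind of read-only checks involved, here is a hedged sketch that takes a backup first and then looks for descriptions missing a parent, a slug, or a status row. The table and column names are written from memory of the AtoM schema and should be checked against the Troubleshooting guide before use; the database name and user (DB_NAME, DB_USER) and the backup filename are assumptions, and `backup_db` and `query` are hypothetical helpers that only print each command unless APPLY=1 is set. Nothing here modifies data.

```shell
#!/bin/sh
# Sketch only. Prints each command by default; set APPLY=1 to execute.
# DB_NAME and DB_USER are assumed values, not taken from the thread.
DB_NAME="${DB_NAME:-atom}"
DB_USER="${DB_USER:-atom}"
APPLY="${APPLY:-0}"

# Dump the database to a file before touching anything (hypothetical helper).
backup_db() {
  if [ "$APPLY" = "1" ]; then
    mysqldump -u "$DB_USER" -p "$DB_NAME" > "$1"
  else
    echo "+ mysqldump -u $DB_USER -p $DB_NAME > $1"
  fi
}

# Run one SELECT statement against the AtoM database (hypothetical helper).
query() {
  if [ "$APPLY" = "1" ]; then
    mysql -u "$DB_USER" -p "$DB_NAME" -e "$1"
  else
    echo "+ mysql -u $DB_USER -p $DB_NAME -e \"$1\""
  fi
}

check_corruption() {
  # Descriptions with no parent; normally only the root object should appear.
  query "SELECT id FROM information_object WHERE parent_id IS NULL;"
  # Descriptions with no slug.
  query "SELECT id FROM object WHERE class_name='QubitInformationObject' AND id NOT IN (SELECT object_id FROM slug);"
  # Descriptions with no row in the status table.
  query "SELECT id FROM information_object WHERE id NOT IN (SELECT object_id FROM status);"
}

backup_db "atom-backup.sql"
check_corruption
```

Any ids these queries return are candidates to compare against the ids reported during the search index population.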

In all, there may still be more than one issue occurring here - fixing these corrupted information objects may not automatically fix the arUpdateEsIoDocumentsJob issue that started this post. However, addressing this now will certainly help you avoid problems in the future. Let me know what you find and how things go (and if you get access to the atom-worker journal output), and we can go from there. 

Cheers, 


Steve Lapommeray

Dec 8, 2021, 11:37:36 AM
to AtoM Users
Hi Dan,

Here is the atom-worker journal output:

-- Logs begin at Thu 2021-11-25 05:24:22 EST. --
Dec 06 16:42:25 <server> systemd[1]: Started Gearman worker for AtoM.
Dec 06 16:42:25 <server> su[85641]: (to nginx) root on none
Dec 07 13:41:17 <server> systemd[1]: Stopping Gearman worker for AtoM...
Dec 07 13:41:17 <server> atom-worker[127730]: kill: sending signal to 85643 failed: No such process
Dec 07 13:41:17 <server> systemd[1]: atom-worker.service: control process exited, code=exited status=1
Dec 07 13:41:17 <server> systemd[1]: Stopped Gearman worker for AtoM.
Dec 07 13:41:17 <server> systemd[1]: Unit atom-worker.service entered failed state.
Dec 07 13:41:17 <server> systemd[1]: atom-worker.service failed.
Dec 07 13:41:20 <server> systemd[1]: Started Gearman worker for AtoM.
Dec 07 13:41:20 <server> su[127751]: (to nginx) root on none

Do you see anything there?

Thanks,
Steve

Steve Lapommeray

Dec 8, 2021, 12:06:03 PM
to AtoM Users
Hi Dan,

The troubleshooting steps resolved the issue. Thank you very much!

Steve
