500 error with Gearman: arUpdateEsIoDocumentsJob

Steve Lapommeray

Nov 29, 2021, 2:02:11 PM

to AtoM Users

Hello,

When trying to update an authority record, this error occurs: "No Gearman worker available that can handle the job arUpdateEsIoDocumentsJob"

Restarting the atom-worker does not resolve the issue. The gearmand and memcached services are running.

Deleting an authority record works successfully, but editing doesn't. This is on AtoM 2.6.4 under RHEL 7 and PHP 7.4.25.

Any idea what would cause this? Please let me know if you need more information.

Thanks,

Steve

Dan Gillean

Dec 1, 2021, 9:37:43 AM

to ICA-AtoM Users

Hi Steve,

FYI, we don't test CentOS or RHEL installations, and there are some known issues with PHP 7.4 in AtoM 2.6 (which we've addressed for 2.7), so it is possible that an installation dependency issue is at play. I'm hoping it's something simpler than that, though, so let's try to rule out some of the easier solutions first.

My first guess would be that the Elasticsearch service is unavailable. Can you try restarting it?

sudo systemctl restart elasticsearch

After that, you can try using the search:status task to check on the current status of your ES instance:

php symfony search:status
See: https://www.accesstomemory.org/docs/latest/admin-manual/maintenance/cli-tools/#check-the-status-of-your-elasticsearch-index

If any of the entities are not fully indexed, try manually repopulating the search index, with:

php symfony search:populate
See: https://www.accesstomemory.org/docs/latest/admin-manual/maintenance/populate-search-index/

The job scheduler does have its own error log you can check - in Ubuntu the following command can be used; I'm not sure offhand if this will work the same in RHEL:

sudo journalctl -f -u atom-worker
See the bottom part of this section: https://www.accesstomemory.org/docs/latest/admin-manual/installation/asynchronous-jobs/#systemd-ubuntu-18-04
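
If following the log live with -f isn't practical, a small filter can pull out just the lines that hint at a failure. This is only a sketch: the function name worker_failures and the keyword list are my own assumptions, not anything AtoM provides, while --since and --no-pager are standard journalctl options.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read journal text on stdin and keep only the lines
# that mention a failure. The keyword list is an assumption; extend as needed.
worker_failures() {
  grep -Ei 'fail|error|exited'
}

# Example usage (needs permission to read the journal):
#   sudo journalctl -u atom-worker --since "1 hour ago" --no-pager | worker_failures
```

An empty result only means none of those keywords appeared, so it's still worth reading the full output if the problem persists.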

Another thing to note in the section linked above: if you have followed our recommended installation instructions, the job scheduler has a fail counter to prevent it from being caught in an infinite loop of failing and trying to restart. This fail counter limits restart attempts to 3 times in 24 hours, before you must run the additional command to reset this counter:

sudo systemctl reset-failed atom-worker

So, if you've attempted to restart the atom-worker without checking its status, it's possible it hasn't actually restarted if the fail counter had reached its limit!
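
To make that concrete, here is a minimal sketch of a check-then-restart sequence, assuming the unit is named atom-worker as in our recommended installation instructions. The function name restart_atom_worker is hypothetical and not part of AtoM; it needs to be run as root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restart the AtoM job worker, clearing systemd's
# failure state first so the restart is not refused by the fail counter.
# Assumes the unit is named "atom-worker" (override with the first argument).
restart_atom_worker() {
  local unit="${1:-atom-worker}"

  # "is-failed" exits 0 only when the unit is in the failed state.
  if systemctl is-failed --quiet "$unit"; then
    echo "$unit is in a failed state; resetting the fail counter"
    systemctl reset-failed "$unit"
  fi

  systemctl restart "$unit"

  # "is-active" exits 0 only when the unit is actually running.
  if systemctl is-active --quiet "$unit"; then
    echo "$unit is running"
  else
    echo "$unit did not start; check: journalctl -u $unit" >&2
    return 1
  fi
}
```

The point of the final is-active check is that a restart command returning without complaint doesn't by itself prove the worker is up.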

Other standard maintenance tasks you could try that might help if nothing above has:

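Ensure the filesystem permissions are properly set. The command below uses www-data, the Ubuntu web server user; on RHEL, substitute the user your web server runs as: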

sudo chown -R www-data:www-data /usr/share/nginx/atom
See: https://www.accesstomemory.org/docs/latest/admin-manual/installation/linux/ubuntu-bionic/#filesystem-permissions

Rebuild the nested set to ensure that hierarchical relationships are properly stored in the relational database. When an actor is updated, a background job is automatically run to ensure that all related descriptions are updated and properly indexed. This could possibly fail if the nested set is corrupted. You can rebuild it with:

php symfony propel:build-nested-set
See: https://www.accesstomemory.org/docs/latest/admin-manual/maintenance/cli-tools/#rebuild-the-nested-set

Generate slugs - if one of your records timed out during the save process, then it's possible it was left without a slug in the database, which can cause unexpected failures. Running the following task with no additional option parameters will only generate slugs for records missing them:

php symfony propel:generate-slugs
See: https://www.accesstomemory.org/docs/latest/admin-manual/maintenance/cli-tools/#generate-slugs

Clear the application cache and restart PHP-FPM to ensure you're seeing the latest version of a record and not a cached one.
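
A rough sketch of those two steps, with some assumptions: AtoM is installed in /usr/share/nginx/atom, and the PHP-FPM unit is called php-fpm (the unit name varies by distribution and PHP version, so check yours). The function name clear_atom_cache is hypothetical; run it as a user allowed to restart services.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clear AtoM's Symfony cache, then restart PHP-FPM.
# Assumptions: AtoM lives in /usr/share/nginx/atom and the PHP-FPM unit is
# called "php-fpm"; pass different values as arguments if yours differ.
clear_atom_cache() {
  local atom_dir="${1:-/usr/share/nginx/atom}"
  local fpm_unit="${2:-php-fpm}"

  # "cc" is the Symfony cache-clear task; it must run from the AtoM root.
  (cd "$atom_dir" && php symfony cc) || return 1

  systemctl restart "$fpm_unit"
}
```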

It's always good practice when trying to troubleshoot an error to clear your browser cache as well, or else test in an incognito / private browser window.

Let us know if any of that helps! And if not, please share any relevant information you find in the atom-worker logs.

Cheers,

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056

@accesstomemory

he / him


Steve Lapommeray

Dec 7, 2021, 2:51:31 PM

to AtoM Users

Hi Dan,

Thank you for your response. I'm working on getting permission to run the journalctl command on our server, but I did get some error messages while populating the search index.

"Couldn't find information object (id: XXXXXX)
Please, contact an administrator."

There were 5 of those messages. Are they anything to worry about? Is there something that I need to do to fix this?

Thanks,

Steve

Dan Gillean

Dec 7, 2021, 4:19:35 PM

to ICA-AtoM Users

Hi Steve,

Unfortunately, yes - this may indicate that there is some data corruption affecting these records that you'll need to address. See:

https://www.accesstomemory.org/en/docs/2.6/admin-manual/maintenance/troubleshooting/#why-do-i-get-warnings-when-populating-the-search-index

In our Troubleshooting guide, we have some queries that can be used to identify the most common forms of data corruption in descriptions, and suggestions on how to resolve the issue. See:

https://www.accesstomemory.org/docs/latest/admin-manual/maintenance/troubleshooting/#troubleshooting-data-corruption

As always, I strongly recommend making a backup of your database before trying any of the INSERT queries we recommend to resolve some of the issues, depending on what you find!
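
For the backup itself, a minimal sketch using mysqldump, assuming a MySQL database named atom and credentials supplied via ~/.my.cnf (both are assumptions; adjust to your installation). The function name backup_atom_db is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: dump the AtoM database to a timestamped file.
# Assumptions: MySQL database named "atom", credentials read from
# ~/.my.cnf; adjust both to match your installation.
backup_atom_db() {
  local db="${1:-atom}"
  local out="${2:-${db}_$(date +%Y%m%d_%H%M%S).sql}"

  # --single-transaction takes a consistent snapshot of InnoDB tables
  # without locking them for the duration of the dump.
  mysqldump --single-transaction "$db" > "$out" || return 1

  # Print the file name so the caller knows where the dump went.
  echo "$out"
}
```

It's worth confirming the dump file is non-empty before running any INSERT query.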

In all, there may still be more than one issue occurring here - fixing these corrupted information objects may not automatically fix the arUpdateEsIoDocumentsJob issue that started this post. However, addressing this now will certainly help you avoid problems in the future. Let me know what you find and how things go (and if you get access to the atom-worker journal output), and we can go from there.

Cheers,

Dan Gillean, MAS, MLIS

Steve Lapommeray

Dec 8, 2021, 11:37:36 AM

to AtoM Users

Hi Dan,

Here is the atom-worker journal output:

-- Logs begin at Thu 2021-11-25 05:24:22 EST. --
Dec 06 16:42:25 <server> systemd[1]: Started Gearman worker for AtoM.
Dec 06 16:42:25 <server> su[85641]: (to nginx) root on none
Dec 07 13:41:17 <server> systemd[1]: Stopping Gearman worker for AtoM...
Dec 07 13:41:17 <server> atom-worker[127730]: kill: sending signal to 85643 failed: No such process
Dec 07 13:41:17 <server> systemd[1]: atom-worker.service: control process exited, code=exited status=1
Dec 07 13:41:17 <server> systemd[1]: Stopped Gearman worker for AtoM.
Dec 07 13:41:17 <server> systemd[1]: Unit atom-worker.service entered failed state.
Dec 07 13:41:17 <server> systemd[1]: atom-worker.service failed.
Dec 07 13:41:20 <server> systemd[1]: Started Gearman worker for AtoM.
Dec 07 13:41:20 <server> su[127751]: (to nginx) root on none

Do you see anything there?

Thanks,

Steve

Steve Lapommeray

Dec 8, 2021, 12:06:03 PM

to AtoM Users

Hi Dan,

The troubleshooting steps resolved the issue. Thank you very much!

Steve