Hi Isabel,
It sounds like one of the jobs has exhausted all available resources and stalled. There are two ways you can deal with this.
Option 1 - kill a specific stalled job; keep the queue
If you want to keep all the other jobs in the queue that are waiting behind the stalled one, then you will need to use SQL to kill only the specific problem job. This involves accessing the MySQL command prompt, which in turn means knowing the credentials used during installation. The following link has information on how to find those credentials again if needed, as well as how to access the MySQL command prompt:
From there, you can now use SQL to first look up the ID of the job that has stalled, and then use that ID to kill the specific job:
You will likely want to reset the atom-worker fail counter, and then restart the worker after killing the stalled job - instructions for that are included at the end of the section linked above, as well as here:
More explanation of what the fail counter is, and how to check the status of the atom-worker, is also included below in a separate section, for reference.
At this point, the job scheduler *should* pick up the other jobs that were in the queue and continue with them. Keep in mind that you may still run into issues - for example, if one of the queued jobs was meant to publish a series that has since been deleted. In that case, depending on what the issue is, you can either try restarting the job scheduler again or (as in the example I gave, where the job points to records that no longer exist) use the same process as above to kill the problem job.
Option 2 - clear the entire queue
The second option is much easier, but will also clear ALL jobs - not just those unfinished jobs in the queue, but also the history of completed jobs shown in the user interface. This means that you would need to manually re-run whatever process initiated the previously queued jobs for them to run again - for example, if one of them was a job to update the publication status of a bunch of descendant records, then you'd likely need to go update the publication status of the parent series or fonds again.
If you want to preserve the history of the previously completed jobs, AtoM does have a CSV export for job history in the user interface that you can use first - see:
To clear all jobs:
Run the following command from AtoM's root installation directory:
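In case it helps to see it spelled out: as far as I know, the task for this in AtoM's command-line tools is jobs:clear. Here is a minimal sketch of running it, with a couple of assumptions on my part. The function name, ATOM_ROOT and DRY_RUN are just illustrative names I've made up (they are not AtoM settings), and /usr/share/nginx/atom is an assumed installation path, so adjust it to wherever your AtoM root actually is.

```shell
#!/usr/bin/env bash
# Sketch only. clear_atom_jobs, ATOM_ROOT and DRY_RUN are illustrative names,
# not part of AtoM. /usr/share/nginx/atom is an assumed install path.

clear_atom_jobs() {
  local root="${ATOM_ROOT:-/usr/share/nginx/atom}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Dry run: print what would be executed instead of running it.
    echo "cd $root && php symfony jobs:clear"
    return 0
  fi
  # Run the task from the AtoM root so that the symfony script is found.
  # The subshell keeps the caller's working directory unchanged.
  ( cd "$root" && php symfony jobs:clear )
}
```

Calling it as DRY_RUN=1 clear_atom_jobs only prints the command, which is a safe way to double-check the path first. Without DRY_RUN it really does clear everything, including the job history, so export the CSV beforehand if you want to keep it.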
Fail counter and checking the worker status
You can always check the status of the atom-worker and make sure it's running properly with the following command:
- sudo systemctl status atom-worker
Restarting is similar:
- sudo systemctl restart atom-worker
If the worker isn't running after a restart, the fail counter limit has probably been reached. The atom-worker will automatically try to restart and then repeat a job after a failure, but to prevent the worker from getting caught in an infinite loop when an issue can't be resolved this way, a limit is set in the configuration: after 3 attempts within 24 hours, the fail limit is reached, and the fail counter needs to be reset to zero before restarting will work:
- sudo systemctl reset-failed atom-worker
This command sets the internal fail counter back to zero, so restarting the atom-worker should work again after running this.
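If you find yourself doing this often, the three commands above can be chained in the order you would normally run them: reset the counter, restart, then check the status. This is only a rough sketch; the function name and the UNIT and DRY_RUN variables are names I've made up for illustration, and it assumes the service is called atom-worker as above.

```shell
#!/usr/bin/env bash
# Sketch only. recover_atom_worker, UNIT and DRY_RUN are illustrative names,
# not part of AtoM or systemd.

recover_atom_worker() {
  local unit="${UNIT:-atom-worker}"
  local step
  # Order matters: the fail counter must be reset before a restart can work.
  for step in reset-failed restart status; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Dry run: print the command instead of running it.
      echo "sudo systemctl $step $unit"
    else
      # Stop at the first step that fails. Note that "status" also exits
      # non-zero when the worker is not active.
      sudo systemctl "$step" "$unit" || return 1
    fi
  done
}
```

Running DRY_RUN=1 recover_atom_worker just prints the three commands so you can see what it would do; running it without DRY_RUN executes them for real.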
Hope this helps!