Hi Carolyn,
Wow! No job in AtoM should run for that long. When the job scheduler runs out of resources (such as memory), it can freeze in the "running" state. Usually that doesn't mean the job will ever finish if left as-is, and it often means that other queued jobs won't run either. I suspect part of the problem may come from leaving the job scheduler in this state for so long while continuing to use the application. Given how many long-running processes rely on the job scheduler to complete, I'm a bit worried about what else might not have terminated correctly, so I'll also suggest some checks for that below.
In any case:
The actual issue sounds to me like a filesystem permissions problem. If your permissions are already set correctly, you may just need to be explicit about which user runs the command. During installation, we set the permissions like so:
This ensures that the www-data user owns and can write to all files below the root AtoM installation directory.
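If you need to re-apply that ownership, here is a minimal sketch. It assumes AtoM is installed in /usr/share/nginx/atom (the path used in the AtoM installation docs); the ATOM_ROOT and APPLY variables and the function names are hypothetical helpers for this example. By default it only prints the command so you can check it first.

```shell
#!/usr/bin/env bash
# Sketch: restore www-data ownership of an AtoM install.
# ASSUMPTION: AtoM lives in /usr/share/nginx/atom; set ATOM_ROOT if not.
# Prints the command by default; set APPLY=1 to actually run it.
set -euo pipefail

ATOM_ROOT="${ATOM_ROOT:-/usr/share/nginx/atom}"

# Echo a command, and execute it only when APPLY=1.
run() {
  echo "+ $*"
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  fi
}

fix_atom_permissions() {
  # Recursively make www-data the owner (user and group) of everything under ATOM_ROOT.
  run sudo chown -R www-data:www-data "$ATOM_ROOT"
}

fix_atom_permissions
```

Running it plainly previews the command; `APPLY=1 bash script.sh` (or `ATOM_ROOT=/your/path APPLY=1 bash script.sh` for a non-default install) applies it.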
The first thing to try is running the jobs:clear command again, this time explicitly specifying the www-data user, like so:
- sudo -u www-data php symfony jobs:clear
If that works, great! Remember it, and try the same approach in the future whenever a common command doesn't work as expected, particularly when you see "could not open" or "could not write to" in the related error message.
If not, try re-running the permissions command above, then repeat the jobs:clear command exactly as given and see whether that resolves things.
If that does clear the queue, the next step is resetting the job scheduler:
In AtoM 2.5 and later, we added a restart limit to the atom-worker configuration. By default, if the atom-worker runs into a problem that halts it, it will attempt to restart itself. However, to keep this from becoming an endless loop that consumes all resources when a restart won't fix the underlying problem, the configuration allows at most 3 restart attempts in 24 hours; once that limit is reached, it must be reset manually. So, let's reset that fail counter, and THEN restart the job scheduler:
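As a sketch, assuming the worker runs as the systemd unit atom-worker (as in AtoM 2.5+ installs), the two steps map onto standard systemctl commands: `reset-failed` clears the unit's failed state and its start-rate-limit counter, and `restart` starts it again. The APPLY variable and function names are hypothetical; by default the script only prints what it would run.

```shell
#!/usr/bin/env bash
# Sketch: clear the atom-worker restart-limit counter, then restart it.
# ASSUMPTION: the job scheduler is the systemd unit "atom-worker".
# Prints the commands by default; set APPLY=1 to execute them.
set -euo pipefail

# Echo a command, and execute it only when APPLY=1.
run() {
  echo "+ $*"
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  fi
}

reset_atom_worker() {
  # Reset the failed state and start-limit counter for the unit...
  run sudo systemctl reset-failed atom-worker
  # ...and only then start the worker again.
  run sudo systemctl restart atom-worker
}

reset_atom_worker
```

Afterwards, `sudo systemctl status atom-worker` should report the worker as active (running).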
Finally, in terms of general maintenance and health checking:
I'd strongly suggest running most of AtoM's general maintenance tasks, as these may help resolve any as-yet undiagnosed issues caused by the job scheduler being frozen for a year. These include:
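For illustration, here is a sketch that runs a few commonly used AtoM maintenance tasks as www-data from the AtoM root: rebuilding the nested set, regenerating slugs, clearing the cache, and repopulating the search index. The selection, the install path /usr/share/nginx/atom, and the ATOM_ROOT/APPLY helpers are assumptions, so check AtoM's maintenance documentation for the full list and any task options.

```shell
#!/usr/bin/env bash
# Sketch: run a selection of AtoM maintenance tasks (not an exhaustive list).
# ASSUMPTION: AtoM lives in /usr/share/nginx/atom; set ATOM_ROOT if not.
# Prints the commands by default; set APPLY=1 to execute them.
set -euo pipefail

ATOM_ROOT="${ATOM_ROOT:-/usr/share/nginx/atom}"

# Echo a command, and execute it only when APPLY=1.
run() {
  echo "+ $*"
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  fi
}

atom_maintenance() {
  # The symfony CLI is run from the AtoM root directory.
  run cd "$ATOM_ROOT"
  local task
  # Search index is repopulated last, after the other tasks have run.
  for task in propel:build-nested-set propel:generate-slugs cc search:populate; do
    run sudo -u www-data php symfony "$task"
  done
}

atom_maintenance
```

AtoM's docs also suggest restarting PHP-FPM after clearing the cache; the service name depends on your PHP version, so it is left out of this sketch.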
I'd also suggest running the SQL query we recommend for checking common forms of data corruption, in case any important long-running background operation that depended on the job scheduler didn't complete properly in that time. See:
Hopefully nothing will be amiss! But better to check now than to wait for an issue to surface.
Good luck, and let us know how it goes!