500 error on php symfony jobs:clear

Carolyn Sullivan

Oct 24, 2022, 3:09:48 PM
to AtoM Users
Hello,

In my institution's AtoM instance, we have jobs from about a year ago that are STILL running, which I'm fairly certain is why we are currently getting 500-type errors.

I tried restarting the atom-worker and running php symfony jobs:clear  
When I do the latter, I get the following error:

pam_library@CWS-P125:/usr/share/nginx/atom$ php symfony jobs:clear

Unable to open the log file "/usr/share/nginx/atom/log/qubit_cli.log" for writing.
<!DOCTYPE html>
<html>
  <head>
    <title>Error</title>
    <link rel="stylesheet" type="text/css" href="symfony/plugins/arDominionPlugin/css/main.css"/>
  </head>
  <body class="yui-skin-sam admin error">

    <div id="wrapper" class="container">

      <section class="admin-message" id="error-404">

        <h2>
          <img alt="" src="symfony/images/logo.png"/>
          Oops! An Error Occurred
        </h2>

        <p>
          Sorry, something went wrong.<br />
          The server returned a <code>500 Internal Server Error</code>.
        </p>

        <div class="tips">
          <p>
            Try again a little later or ask in the <a href="http://groups.google.ca/group/ica-atom-users">discussion group</a>.<br />
            <a href="javascript:history.go(-1)">Back to previous page.</a>
          </p>
        </div>

      </section>

    </div>

  </body>
</html>

Can you please help me understand why this is happening and how to get rid of the error?

Thanks,
Carolyn.

Dan Gillean

Oct 24, 2022, 3:47:47 PM
to ica-ato...@googlegroups.com
Hi Carolyn, 

Wow! No job in AtoM should run for that long. When the job scheduler runs out of resources (such as memory), it can freeze with a job stuck in the "running" state. A job in that state will usually never finish if left as-is, and other queued jobs often won't run either. I suspect part of the problem comes from the job scheduler being left in this state for so long while the application continued to be used. Given how many long-running processes rely on the job scheduler to complete, I am a bit worried about what else might not have terminated correctly, so I will also suggest some checks for that below.
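
A job frozen in the "running" state is easiest to spot by its age. The Python sketch below is a hypothetical illustration, not AtoM code: the Job record, its field names and status strings, and the 12-hour threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Job:
    """Hypothetical job record; the fields are illustrative, not AtoM's database schema."""
    job_id: int
    name: str
    status: str  # assumed values such as "running", "completed", "error"
    started_at: datetime


def stale_running_jobs(jobs, now, max_age=timedelta(hours=12)):
    """Return jobs still marked "running" that started more than `max_age` before `now`.

    The 12-hour default is an arbitrary assumption; adjust it to your longest
    legitimate job.
    """
    return [
        job for job in jobs
        if job.status == "running" and now - job.started_at > max_age
    ]
```

Mapping rows from wherever your job history is stored into Job records is left to you; once that is done, jobs that have been "running" for a year would show up immediately.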

In any case: 

The actual issue at hand sounds to me like a filesystem permissions issue. If your permissions are set correctly, then you may just need to be explicit about which user runs the command. During installation, we set the permissions like so: 
This ensures that the www-data user owns and can write to all files below the root AtoM installation directory. 
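
One way to confirm that this ownership is actually in place is to walk the installation directory and list anything not owned by the web server user. The Python sketch below assumes a POSIX system (it relies on os.lstat and the Unix-only pwd module); uid_of and paths_not_owned_by are hypothetical helper names, not part of AtoM. It checks ownership only, not permission bits.

```python
import os
import pwd


def uid_of(user):
    """Look up a user's numeric uid (raises KeyError if the account does not exist)."""
    return pwd.getpwnam(user).pw_uid


def paths_not_owned_by(root, uid):
    """Return root and every path below it whose owner uid differs from `uid`.

    os.lstat is used so a symlink is judged by its own owner rather than its
    target; directories that os.walk cannot read are skipped silently.
    """
    mismatched = []
    if os.lstat(root).st_uid != uid:
        mismatched.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid != uid:
                mismatched.append(path)
    return sorted(mismatched)
```

For example, paths_not_owned_by("/usr/share/nginx/atom", uid_of("www-data")) should return an empty list if ownership matches the installation setup. If something like log/qubit_cli.log shows up, running jobs:clear as www-data may still fail to write there, which is when re-running the permissions command becomes necessary.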

The first thing to try is running the jobs:clear command again, explicitly as the www-data user, like so: 
  • sudo -u www-data php symfony jobs:clear
If that works, great! Remember it, and try the same approach in the future whenever a common command is not working as expected, particularly when the related error message says something like "could not open" or "could not write to". 

If not, I would suggest re-running the permissions command above, then repeating the jobs:clear command exactly as I have supplied it to see if that resolves things. 

If that does help to clear the queue, then in terms of resetting the job scheduler: 

In AtoM 2.5 and later, we have added a restart limit to the atom-worker configuration. By default, if the atom-worker runs into a problem that halts it, it will attempt to restart itself. However, to prevent an endless loop that consumes all resources when the worker hits a problem a restart won't fix, the configuration limits it to 3 restart attempts within 24 hours; once that limit is reached, it must be reset manually. So, let's reset that fail counter, and THEN restart the job scheduler: 
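
The restart limit described above behaves like a sliding-window rate limit with a manual reset. The Python sketch below models that idea only; RestartLimiter is a hypothetical class, not the atom-worker's implementation. systemd units commonly express such a limit with the StartLimitBurst= and StartLimitIntervalSec= directives, and systemctl reset-failed clears a unit's failed state, but whether the atom-worker unit on a given server is configured that way is an assumption worth checking. Following the description above, this model stays blocked until reset_failed() is called.

```python
from collections import deque


class RestartLimiter:
    """Allow at most `burst` restarts within any sliding window of `interval` seconds.

    Hypothetical model of the "3 retries in 24 hours" rule: once the limit is hit,
    further restarts are refused until reset_failed() is called.
    """

    def __init__(self, burst=3, interval=24 * 60 * 60):
        self.burst = burst
        self.interval = interval
        self.starts = deque()  # timestamps (seconds) of recent restarts, oldest first
        self.failed = False    # latched once the limit is exceeded

    def try_restart(self, now):
        """Record a restart at time `now` and return True, or return False if refused."""
        if self.failed:
            return False
        # Drop restarts that have slid out of the window.
        while self.starts and now - self.starts[0] >= self.interval:
            self.starts.popleft()
        if len(self.starts) >= self.burst:
            self.failed = True
            return False
        self.starts.append(now)
        return True

    def reset_failed(self):
        """Manual reset: clear the failed latch and forget earlier restarts."""
        self.failed = False
        self.starts.clear()
```

With the defaults, a worker that keeps crashing gets three restarts; the fourth attempt within 24 hours is refused, and nothing restarts it again until the counter is reset by hand.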
Finally, in terms of general maintenance and health checking: 

I would strongly suggest running most of AtoM's general maintenance tasks, as these may help resolve any as-yet undiagnosed issues caused by your job scheduler being frozen for a year. These include: 
I would also suggest that someone run the SQL query we recommend for checking common forms of data corruption, in case any important long-running background operation that depended on the job scheduler did not complete properly in that time. See: 
Hopefully nothing will be amiss! But better to check now than to wait for an issue to surface. 

Good luck, and let us know how it goes! 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him


Carolyn Sullivan

Oct 24, 2022, 3:59:58 PM
to AtoM Users
Oh gosh, thanks for all the advice; I'm definitely going to follow it on the maintenance! I actually went into the SQL directory and killed the 26 idling jobs one by one, and it's functioning now, huzzah... but I definitely need to learn more about running this application! Thanks again!