atom-worker systemd service (Gearman?) errors


Patrick Goetz

Jul 22, 2019, 9:09:52 AM
to AtoM Users

I set up the systemd unit file for what is presumably the Gearman worker manager, following the instructions provided here: https://www.accesstomemory.org/en/docs/2.5/admin-manual/installation/asynchronous-jobs/#installation-asynchronous-jobs

I say presumably because there's no mention of Gearman in the unit file, so I'm not quite sure how this works; I just copied the text from the page above.

This worked at first, but a few days ago the editor configuring the site complained that she was getting internal server errors again when attempting to edit documents.

Upon running
# systemctl status atom-worker

I saw these error messages:
Jul 10 20:02:50 atom php[22284]: 2019-07-10 13:02:50 > Updating "Papers of Emeline Bowne 1902-1986 1922-1951" and descendant
Jul 10 20:02:50 atom php[22284]: 2019-07-10 13:02:50 > Job finished.
Jul 11 06:13:22 atom systemd[1]: atom-worker.service: Failed to reset devices.list: Operation not permitted
Jul 11 06:13:22 atom systemd[1]: atom-worker.service: Failed to reset devices.list: Operation not permitted
Jul 11 06:13:22 atom systemd[1]: atom-worker.service: Failed to reset devices.list: Operation not permitted
Jul 11 06:13:22 atom systemd[1]: atom-worker.service: Failed to reset devices.list: Operation not permitted
Jul 11 06:13:23 atom systemd[1]: atom-worker.service: Failed to reset devices.list: Operation not permitted
Jul 11 06:13:23 atom systemd[1]: atom-worker.service: Failed to reset devices.list: Operation not permitted
Jul 11 06:13:23 atom systemd[1]: atom-worker.service: Failed to reset devices.list: Operation not permitted
Jul 22 12:33:26 atom systemd[1]: atom-worker.service: Failed to reset devices.list: Operation not permitted

I'm not sure what's going on here, and Google didn't help. We are running AtoM in an LXD container, and the only reference I can find has to do with unprivileged containers attempting to modify the devices cgroup configuration, but I'm not sure why Gearman would be attempting to do something like this.
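One way to check from inside the container whether it is unprivileged, i.e. whether container root is mapped to a non-root host UID, is to read /proc/self/uid_map. This is a minimal sketch assuming a Linux kernel with user namespaces; the function name is hypothetical. If root turns out to be remapped, that would be consistent with systemd being refused when it tries to reset the devices cgroup, making these messages noise rather than the cause of the worker failures; that reading is an assumption, not something confirmed in this thread.

```shell
#!/bin/sh
# Sketch: detect whether root in this container is remapped to a non-root
# host UID, as in an unprivileged LXD container. The path argument exists
# only so the check can be tried against a sample file.
is_unprivileged_container() {
    map="${1:-/proc/self/uid_map}"
    [ -r "$map" ] || return 2    # cannot tell
    # The host namespace (and privileged containers) map the whole 32-bit
    # UID range onto itself, which uid_map shows as "0 0 4294967295".
    if awk '$1 == 0 && $2 == 0 && $3 == 4294967295 { found = 1 }
            END { exit !found }' "$map"; then
        return 1    # identity mapping: not remapped
    fi
    return 0        # root is remapped: unprivileged container
}

if is_unprivileged_container; then
    echo "container root is remapped (unprivileged container)"
else
    echo "no root remapping detected (or uid_map unreadable)"
fi
```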

Restarting the service:
# systemctl restart atom-worker


Seems to resolve the issue, but I'd prefer not to have to continuously restart the service for the editors.


José Raddaoui

Jul 23, 2019, 12:20:11 PM
to AtoM Users
Hi Patrick,

We're currently experiencing some issues with the standard AtoM worker configuration, which you can follow in this Redmine ticket.

I don't know much about LXD containers, but I'm not sure the "Failed to reset devices.list: Operation not permitted" error messages are actually what causes the worker to crash. You may have already seen them, but I found some useful information in these links:


Best regards.

Patrick Goetz

Jul 24, 2019, 12:29:37 PM
to AtoM Users

Hi José -

Thanks for responding. The LXD container installation seems to be working most of the time, so I don't think this is a systemic problem related to that. I'm not terribly familiar with Symfony, so I couldn't follow all the details. For example, when I try to run the command referenced at the top of the ticket I get:

root@atom:~# su - www-data
www-data@atom:~$ php symfony jobs:worker
Could not open input file: symfony

But in any case, as mentioned in the bug report, there's nothing at all in /var/log/gearman-job-server/gearmand.log, and, even more oddly, when I look in the nginx logs it appears that the failed edits succeeded even though the editor got an internal server error when making them:

<snip>
127.0.0.1 - - [11/Jul/2019:20:37:13 +0000] "POST /index.php/bowne-emeline-1896-1993/edit HTTP/1.1" 500 878 "http://catalog.episcopalarchives.org/index.php/bowne-emeline-1896-1993/edit" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
127.0.0.1 - - [11/Jul/2019:20:37:15 +0000] "GET /index.php/bowne-emeline-1896-1993/edit HTTP/1.1" 200 19049 "http://catalog.episcopalarchives.org/index.php/bowne-emeline-1896-1993" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
127.0.0.1 - - [11/Jul/2019:20:37:16 +0000] "GET /index.php/term/add?taxonomy=%2Findex.php%2Factor-occupations&linkExisting=true HTTP/1.1" 200 5037 "http://catalog.episcopalarchives.org/index.php/bowne-emeline-1896-1993/edit" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
127.0.0.1 - - [11/Jul/2019:20:37:16 +0000] "GET /index.php/repository/add?linkExisting=true HTTP/1.1" 200 20351 "http://catalog.episcopalarchives.org/index.php/bowne-emeline-1896-1993/edit" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
127.0.0.1 - - [11/Jul/2019:20:37:16 +0000] "POST /index.php/user/clipboardStatus HTTP/1.1" 200 77 "http://catalog.episcopalarchives.org/index.php/bowne-emeline-1896-1993/edit" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
127.0.0.1 - - [11/Jul/2019:20:37:17 +0000] "GET /index.php/term/add?taxonomy=%2Findex.php%2Fm9rb-afgb-8r9h&linkExisting=true HTTP/1.1" 200 5016 "http://catalog.episcopalarchives.org/index.php/term/add?taxonomy=%2Findex.php%2Factor-occupations&linkExisting=true" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
127.0.0.1 - - [11/Jul/2019:20:37:17 +0000] "POST /index.php/user/clipboardStatus HTTP/1.1" 200 77 "http://catalog.episcopalarchives.org/index.php/term/add?taxonomy=%2Findex.php%2Factor-occupations&linkExisting=true" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
127.0.0.1 - - [11/Jul/2019:20:37:17 +0000] "POST /index.php/user/clipboardStatus HTTP/1.1" 200 77 "http://catalog.episcopalarchives.org/index.php/repository/add?linkExisting=true" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
127.0.0.1 - - [11/Jul/2019:20:37:18 +0000] "GET /index.php/term/add?taxonomy=%2Findex.php%2Fm9rb-afgb-8r9h&linkExisting=true HTTP/1.1" 200 5016 "http://catalog.episcopalarchives.org/index.php/term/add?taxonomy=%2Findex.php%2Fm9rb-afgb-8r9h&linkExisting=true" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
<snip>
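A sketch for pulling out only the failed form submissions from a log like this, assuming nginx's default combined log format (request method in field 6 with its leading quote, status code in field 9) and the usual Debian/Ubuntu log path; `failed_posts` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: list POST requests answered with HTTP 500 in an nginx access log.
# Assumes the default "combined" format: field 6 is the method with a
# leading quote, field 7 the path, field 9 the status code.
failed_posts() {
    # Print the bracketed timestamp (fields 4-5) and the request path.
    awk '$6 == "\"POST" && $9 == 500 { print $4, $5, $7 }' \
        "${1:-/var/log/nginx/access.log}"
}
```

For example, `failed_posts /var/log/nginx/access.log` would print the time and path of each POST that got a 500, which can then be lined up against the worker's journal entries for the same moment.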

I'm going to go ahead and upgrade from 2.5 to 2.5.1, although it sounds like that won't affect this issue.

Dan Gillean

Jul 24, 2019, 12:39:19 PM
to ICA-AtoM Users
Hi Patrick, 

Hopefully Radda can follow up with some further specific suggestions for you, but first a brief suggestion: 

Generally, whenever I see the Could not open input file: symfony error reported by users, it is because the command is not being run from AtoM's root installation directory. In an installation following our recommended instructions, this is typically /usr/share/nginx/atom.

All commands that begin with php symfony must be run from AtoM's root directory, where the Symfony code lives. If you are not already there, try changing to that directory and running the command again.
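A minimal sketch of that advice, assuming the recommended install path and that AtoM runs as www-data; the `atom_symfony` helper and the `ATOM_ROOT` variable are hypothetical names, not part of AtoM. In the transcript above, `su - www-data` starts a login shell in www-data's home directory rather than in AtoM's root, which would explain why php could not find the symfony script.

```shell
#!/bin/sh
# Sketch: run a "php symfony" task from AtoM's root directory as www-data.
# ATOM_ROOT defaults to the recommended install path; override if needed.
ATOM_ROOT="${ATOM_ROOT:-/usr/share/nginx/atom}"

atom_symfony() {
    # "Could not open input file: symfony" means php was started in a
    # directory without the symfony script, so check for it first.
    if [ ! -f "$ATOM_ROOT/symfony" ]; then
        echo "no symfony script in $ATOM_ROOT" >&2
        return 1
    fi
    # Subshell: the caller's working directory is left unchanged.
    (cd "$ATOM_ROOT" && sudo -u www-data php symfony "$@")
}
```

After sourcing this as root, `atom_symfony jobs:worker` would start a worker in the foreground until it is stopped with Ctrl+C.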

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory


--
You received this message because you are subscribed to the Google Groups "AtoM Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ica-atom-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ica-atom-users/0990a6fc-8771-4898-bce2-720ada357a92%40googlegroups.com.

José Raddaoui

Jul 24, 2019, 1:22:13 PM
to AtoM Users
Hi Patrick,

I can see the 500 error in the logs for the first POST request to "/index.php/bowne-emeline-1896-1993/edit". As you said, restarting the worker fixed the problem, so it was probably down at that moment. We're working on improving the AtoM worker service configuration in the Redmine ticket I linked. In the meantime, you could improve that configuration yourself by adding restart rules to the service file located at "/usr/lib/systemd/system/atom-worker.service" and restarting the service.
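A sketch of such restart rules, written as a systemd drop-in so the existing unit file stays untouched. `Restart=` and `RestartSec=` are standard systemd service options; the 30-second delay, the file name restart.conf and the `write_restart_dropin` helper are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: add restart rules to atom-worker through a systemd drop-in, leaving
# the existing unit file untouched. Values and file name are illustrative.
# The directory argument exists only so the helper can be tried elsewhere.
write_restart_dropin() {
    dir="${1:-/etc/systemd/system/atom-worker.service.d}"
    mkdir -p "$dir" || return 1
    # Restart=always restarts the worker whenever it exits; RestartSec=30
    # waits 30 s between attempts instead of systemd's 100 ms default.
    cat > "$dir/restart.conf" <<'EOF'
[Service]
Restart=always
RestartSec=30
EOF
}
```

After writing the file as root, `systemctl daemon-reload` followed by `systemctl restart atom-worker` should apply it. Running `systemctl edit atom-worker` and pasting the same `[Service]` lines is an equivalent interactive route.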

As Dan mentions, to run any AtoM task you need to execute it from the AtoM folder. However, the task you're trying to run will actually register a separate (temporary) worker with the Gearman server. That worker can take jobs while it's running, but it won't fix the issue with the other worker that runs as a service.

You can also find more information about why the worker failed by running "sudo journalctl -u atom-worker" and paging to the end of the output.
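Because the repeated devices.list messages can bury the lines that matter, a filter like the following may help. The function names and the 50-line default are arbitrary choices; it assumes journalctl is available and that the caller may read the system journal (for example as root).

```shell
#!/bin/sh
# Sketch: read atom-worker's journal without the repeated
# "Failed to reset devices.list" warnings, so exit and failure lines stand out.

# Filter stdin, dropping the devices.list warnings.
drop_devices_noise() {
    grep -v 'Failed to reset devices\.list'
}

# Print the last N (default 50) remaining lines of the worker's journal.
worker_journal() {
    journalctl -u atom-worker --no-pager | drop_devices_noise | tail -n "${1:-50}"
}
```

Alternatively, `sudo journalctl -u atom-worker -e` opens the pager already positioned at the end of the log.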

Best regards.

Karl Goetz

Jul 24, 2019, 8:38:38 PM
to José Raddaoui, ica-ato...@googlegroups.com
On Wed, 24 Jul 2019 10:22:12 -0700 (PDT)
José Raddaoui <jrad...@artefactual.com> wrote:

> Hi Patrick,
>
> I can see the 500 error in the logs from the first POST request to
> "/index.php/bowne-emeline-1896-1993/edit". As you said, restarting
> the worker fixed the problem so it was probably down at that moment.
> We're working on improving the AtoM worker service configuration in
> the Redmine ticket I linked, you could try to improve that
> configuration by your self in the meantime by adding restart rules to
> the service file located in
> "/usr/lib/systemd/system/atom-worker.service" and restarting the
> service.

Hi Jose,

I assume "adding restart rules" means changing the service
file from Restart=no to some other value?
What is the drawback which would make that a bad default?

The main reason I can come up with is a possible DoS on the server;
perhaps that could be mitigated by adding RestartSec.

Karl.


--
Karl Goetz
Technical Services Officer - eResearch, Information Technology Services
University of Tasmania & Tasmanian Partnership for Advanced Computing

Mail: University of Tasmania, Private Bag 69, Hobart, Tasmania 7001
Delivery: TT Flynn Street, Sandy Bay, Tasmania 7005




José Raddaoui

Jul 25, 2019, 11:40:17 AM
to AtoM Users
Hi Karl,

Yes, that's correct. I'm not an expert on the matter, but I can't see any downside to setting Restart to `always` with an interval. The log may get big if the restart process gets into an infinite loop, but I think you could avoid that with a limit in the service configuration, or by configuring journald to limit the log size. The configuration proposed in this thread seems like a good solution to me:


We also had respawn rules for the Upstart service:


I don't know why they were removed from the systemd service, or why the issue is happening more often in this new version, but the latter may be related to the new dependencies recommended in this release (Bionic, MySQL 5.7, etc.).
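To address the log-growth concern, here is a sketch of two systemd drop-ins: a start limit for the worker and a size cap for the journal. `StartLimitIntervalSec=`/`StartLimitBurst=` (in `[Unit]`) and journald's `SystemMaxUse=` are real systemd settings, but the numbers, file names and the `write_limit_dropins` helper are assumptions. `SystemMaxUse=` only governs persistent storage under /var/log/journal; a volatile journal is capped by `RuntimeMaxUse=` instead.

```shell
#!/bin/sh
# Sketch: cap restarts of atom-worker and cap journal disk usage with two
# systemd drop-ins. Numbers and file names are illustrative assumptions;
# the directory arguments exist only so the helper can be tried elsewhere.
write_limit_dropins() {
    unit_dir="${1:-/etc/systemd/system/atom-worker.service.d}"
    journald_dir="${2:-/etc/systemd/journald.conf.d}"
    mkdir -p "$unit_dir" "$journald_dir" || return 1

    # Refuse further starts once the unit has been started more than 3 times
    # within 24 h; "systemctl reset-failed atom-worker" clears the counter.
    cat > "$unit_dir/start-limit.conf" <<'EOF'
[Unit]
StartLimitIntervalSec=24h
StartLimitBurst=3
EOF

    # Keep the persistent journal (/var/log/journal) under 200 MB.
    cat > "$journald_dir/size.conf" <<'EOF'
[Journal]
SystemMaxUse=200M
EOF
}
```

Applying it would take `systemctl daemon-reload` and `systemctl restart systemd-journald`. The trade-off is that once the start limit is hit the worker stays down until someone intervenes, so the burst value should leave room for occasional genuine crashes.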

Best regards.


Karl Goetz

Jul 25, 2019, 11:44:05 PM
to ica-ato...@googlegroups.com
On Thu, 25 Jul 2019 08:40:16 -0700 (PDT)
José Raddaoui <jrad...@artefactual.com> wrote:

> Hi Karl,
>
> Yes, that's correct. I'm not an expert on the matter, but I can't see
> any downside to set restart to `always` with an interval. The log may
> get big if the restart process gets into an infinite loop, but I
> think you could avoid that in the service configuration with a limit
> or configure journalctl to limit that log size. The proposed
> configuration from this thread seems like a good solution to me:

[...]

Hi Jose,
I've reported this as a bug in the AtoM bug tracker; hopefully it helps
give the issue some focus.

https://github.com/artefactual/atom/issues/933

thanks,
Karl.



raddao...@gmail.com

Jul 26, 2019, 6:45:30 AM
to AtoM Users
Thanks Karl!

I've linked it to the Redmine ticket and I'll keep both updated.

Best regards.

Karl Goetz

Jul 28, 2019, 9:04:30 PM
to raddao...@gmail.com, ica-ato...@googlegroups.com

Thanks!