3.18.1: duplicate cf-exexd processes

44 views
Skip to first unread message

Xander Cage

unread,
Feb 7, 2022, 8:11:01 AM2/7/22
to help-cfengine

hi,

since upgrading to 3.18.1 aix watchdog triggers every 5 minutes on duplicate cf-execd
processes. this is an all upgraded systems.

root@aixgkvdbp01c1: /var/cfengine # cat watchdog.log
Mon Feb  7 14:04:00 CET 2022 Initiating watchdog 37355794
Mon Feb  7 14:04:00 CET 2022 Found        2 cf-execd processes running
Mon Feb  7 14:04:00 CET 2022 Found 1 symptoms, threshold (0) breached.
Mon Feb  7 14:04:01 CET 2022 Initiating apoptosis
Mon Feb  7 14:04:02 CET 2022 Initiating anastasis
Mon Feb  7 14:05:00 CET 2022 Initiating watchdog 42205442
Mon Feb  7 14:05:00 CET 2022 Aborting execution of watchdog 42205442, existing watchdog process 37355794 running
Mon Feb  7 14:05:07 CET 2022 Collected artifacts stored in /var/cfengine/watchdog-archives/2022-02-07_1644239107.tar.gz
Mon Feb  7 14:05:07 CET 2022 DONE watchdog 37355794

root@aixgkvdbp01c1: /var/cfengine/watchdog-archives # ls -ltr
total 119120
-rw-r--r--    1 root     system       186340 Feb 07 13:30 2022-02-07_1644237011.tar.gz
-rw-r--r--    1 root     system       132066 Feb 07 13:34 2022-02-07_1644237252.tar.gz
-rw-r--r--    1 root     system       132231 Feb 07 13:40 2022-02-07_1644237608.tar.gz
-rw-r--r--    1 root     system       129700 Feb 07 13:44 2022-02-07_1644237853.tar.gz
-rw-r--r--    1 root     system       130020 Feb 07 13:49 2022-02-07_1644238151.tar.gz
-rw-r--r--    1 root     system       130122 Feb 07 13:54 2022-02-07_1644238453.tar.gz
-rw-r--r--    1 root     system       128401 Feb 07 13:59 2022-02-07_1644238749.tar.gz
-rw-r--r--    1 root     system       134926 Feb 07 14:05 2022-02-07_1644239107.tar.gz

any idea?

wbr

chris

Nicolas Charles

unread,
Feb 7, 2022, 8:16:26 AM2/7/22
to Xander Cage, help-cfengine

Hi Xander,

I've noticed that as well, it is introduced by the following change in 3.18

 cf-execd now runs cf-agent from a child process instead of a
      thread on POSIX systems (ENT-6182)
Regards
Nicolas
--
You received this message because you are subscribed to the Google Groups "help-cfengine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to help-cfengin...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/help-cfengine/d9f6431e-c269-41d5-b60f-a1054c208b5dn%40googlegroups.com.

Xander Cage

unread,
Feb 7, 2022, 8:21:19 AM2/7/22
to help-cfengine
ah...ok,  a classic *g*, will keep the watchdog disabled until someone from the cfengine team replies.

Xander Cage

unread,
Feb 8, 2022, 2:30:17 AM2/8/22
to help-cfengine

vratislav...@northern.tech

unread,
Feb 8, 2022, 8:30:31 AM2/8/22
to help-c...@googlegroups.com
On Mon, 2022-02-07 at 23:30 -0800, Xander Cage wrote:
> ticket created:
>
> https://tracker.mender.io/browse/CFE-3915
Thanks!

--
Vratislav
> https://groups.google.com/d/msgid/help-cfengine/026a4442-1f51-488d-bc4f-75e35c69ae3bn%40googlegroups.com
> .

Nick Anderson

unread,
Feb 9, 2022, 10:48:51 AM2/9/22
to help-cfengine

Marco Marongiu

unread,
Mar 13, 2022, 4:57:00 PM3/13/22
to help-c...@googlegroups.com

Hello there

I am hitting a related problem in Debian 11.

I have just ported by policies from 3.15.5 to 3.18.1. I got both CFEngine and systemd complaining about the duplicated process so I applied Nick's patch, and now I have a different problem.

I have not completely understood what's going on. Is it cf-agent that starts a new cf-execd? Or is it cf-execd that clones itself and then spawns cf-agent, but then doesn't terminate? Whatever it is, when the new agent run starts, one cf-execd is killed (by what?), then the agent starts, and then systemd complains again about the duplicated cf-execd... I am trying to figure out, but my brain is already in bed tonight.

Anyone who knows better about the changes around cf-execd in 3.18 and understands what the heck is going on here? And, especially, how to fix this?


Mar 13 21:43:54 minardi systemd[1]: cf-execd.service: Main process exited, code=killed, status=9/KILL Mar 13 21:43:54 minardi systemd[1]: cf-execd.service: Failed with result 'signal'. Mar 13 21:43:54 minardi systemd[1]: cf-execd.service: Unit process 60482 (cf-execd) remains running after unit stopped. Mar 13 21:43:54 minardi systemd[1]: cf-execd.service: Unit process 60483 (sh) remains running after unit stopped. Mar 13 21:43:54 minardi systemd[1]: cf-execd.service: Unit process 60503 (cf-agent) remains running after unit stopped. Mar 13 21:43:54 minardi systemd[1]: cf-execd.service: Consumed 1.083s CPU time. Mar 13 21:43:54 minardi cf-agent[60503]: CFEngine(agent)  R: [INFO abort_agent_if_disabled/methods] The agent is starting. To disable, create flag file /etc/cfengine/disable Mar 13 21:43:54 minardi cf-agent[60503]: CFEngine(agent)  R: [INFO main/methods] Promises version 5.28.2 from branch master Mar 13 21:43:54 minardi cf-agent[60503]: CFEngine(agent)  R: [INFO main/methods] Our policy server is 52.211.82.124 Mar 13 21:43:55 minardi cf-agent[60503]: CFEngine(agent)  R: [INFO apply_roles/methods] Applying roles: standard_node, cfdevnode, home_pc. Mar 13 21:44:00 minardi cf-agent[60503]: CFEngine(agent)  R: [INFO main/methods] Custom configuration completed Mar 13 21:44:04 minardi systemd[1]: cf-execd.service: Scheduled restart job, restart counter is at 2. Mar 13 21:44:04 minardi systemd[1]: Stopped CFEngine Execution Scheduler. Mar 13 21:44:04 minardi systemd[1]: cf-execd.service: Consumed 4.924s CPU time. Mar 13 21:44:04 minardi systemd[1]: cf-execd.service: Found left-over process 60518 (cf-execd) in control group while starting unit. Ignoring. Mar 13 21:44:04 minardi systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. Mar 13 21:44:04 minardi systemd[1]: Started CFEngine Execution Scheduler.

Ciao!
-- bronto


Xander Cage

unread,
Mar 14, 2022, 4:26:21 AM3/14/22
to help-cfengine
before 3.18 cf-execd runs the agent in an thread, now cf-execd replicates itself in a new process (hence duplicate processes) and starts the agent from this process, at least in my understanding.

vratislav...@northern.tech

unread,
Mar 15, 2022, 7:08:30 AM3/15/22
to help-c...@googlegroups.com
On Mon, 2022-03-14 at 01:26 -0700, Xander Cage wrote:
> before 3.18 cf-execd runs the agent in an thread, now cf-execd replicates itself in a new process
> (hence duplicate processes) and starts the agent from this process, at least in my understanding.
And it is correct. cf-execd forks (duplicates) itself and the forked (child) process spawns cf-agent
and terminates. However, terminated processes need to be reaped by their parent processes and the
original (parent) cf-execd process only does that once per minute. So every time cf-execd spawns
cf-agent, there's a dead/inactive/zombie/defunct cf-execd for a minute.

Hope the bit of extra explanation helps.

--
Vratislav
> --
> You received this message because you are subscribed to the Google Groups "help-cfengine" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to
> help-cfengin...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/help-cfengine/fd7a9d2e-dca5-4f48-b5c8-17d070e3dec1n%40googlegroups.com
> .

Marco Marongiu

unread,
Mar 15, 2022, 8:26:11 AM3/15/22
to Vratislav Podzimek, help-cfengine
Hi Vratislav, all


Hope the bit of extra explanation helps.


yes, it helps, thank you.

But...

As you know, systemd is a bit bossy and likes to have everything under its control. This unexpected process that pops up and later disappears upsets it a tiny bit, but enough that it feels compelled to spew log messages all around about this process that, according to its opinion, should not be there.

But it may not be just about spewing log messages. In fact, I am not sure that systemd is just content of "observing" that there is a process that should not be there. If you look at the logs I posted, you will see that systemd forces a restart of cf-execd, probably to try and rectify a situation that doesn't match systemd's own desired state. On my laptop, cf-execd was restarted tens of times in a few hours. Although harmless (or is it?), that's not a desirable behaviour.

I would suggest that this change also requires some tweaks in the systemd unit files, what do you think?

Ciao
-- bronto

vratislav...@northern.tech

unread,
Mar 15, 2022, 2:20:26 PM3/15/22
to help-cfengine
On Tue, 2022-03-15 at 13:25 +0100, Marco Marongiu wrote:
>
> But...
>
> As you know, systemd is a bit bossy and likes to have everything under its control. This
> unexpected process that pops up and later disappears upsets it a tiny bit, but enough that it
> feels compelled to spew log messages all around about this process that, according to its opinion,
> should not be there.
>
> But it may not be just about spewing log messages. In fact, I am not sure that systemd is just
> content of "observing" that there is a process that should not be there. If you look at the logs I
> posted, you will see that systemd forces a restart of cf-execd, probably to try and rectify a
> situation that doesn't match systemd's own desired state. On my laptop, cf-execd was restarted
> tens of times in a few hours. Although harmless (or is it?), that's not a desirable behaviour.
>
> I would suggest that this change also requires some tweaks in the systemd unit files, what do you
> think?
I tried installing and bootstrapping a Debian 11 host with CFEngine 3.18.1 (Enterprise) and I don't
see any problem:

root@ip-172-31-23-195:~# systemctl status cf-execd
● cf-execd.service - CFEngine Execution Scheduler
Loaded: loaded (/lib/systemd/system/cf-execd.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2022-03-15 17:42:53 UTC; 26min ago
Main PID: 931 (cf-execd)
Tasks: 1 (limit: 17979)
Memory: 6.6M
CPU: 7.314s
CGroup: /system.slice/cf-execd.service
└─931 /var/cfengine/bin/cf-execd --no-fork

Mar 15 17:42:53 ip-172-31-23-195 systemd[1]: Started CFEngine Execution Scheduler.

root@ip-172-31-23-195:~# journalctl -u cf-execd.service
-- Journal begins at Tue 2022-03-15 17:38:48 UTC, ends at Tue 2022-03-15 18:08:53 UTC. --
Mar 15 17:42:53 ip-172-31-23-195 systemd[1]: Started CFEngine Execution Scheduler.

systemd is actually fine with daemons forking and reaping their child processes. The
cf-execd.service says:

[Service]
Type=simple
ExecStart=/var/cfengine/bin/cf-execd --no-fork
Restart=always
RestartSec=10
KillMode=process

so systemd knows it should care about the /var/cfengine/bin/cf-execd process itself and cf-execd is
told to NOT fork when it starts (i.e. to run in foreground).

What I've seen in your logs was that something else was force-killing cf-execd and then systemd
complained there was a child process left over. That's because cf-execd was force-killed and could
not reap its child process. My bet would be that it was cf-agent killing the cf-execd process
because it was running with a policy saying that there should only be a single cf-execd process
running. With 3.18.1 MPF policy, cf-agent runs expect 1 or 2 cf-execd processes, systemd expects the
main process to be running (and forking) and the service is just running fine (as can be seen
above).

--
Vratislav
Reply all
Reply to author
Forward
0 new messages