standard_services not idempotent on Linux with systemd?

Marco Marongiu

Jul 3, 2020, 8:52:01 AM
to help-cfengine

Hello there

TL;DR: I have encountered a weird behaviour of services promises on Debian Linux 10, and was wondering whether it's a general problem on all systemd-based Linux distributions or rather peculiar to Debian 10 or my setup. What I see is that services promises with policy start, stop, enable or disable are not idempotent: the corresponding systemctl commands are issued over and over at every agent run.

Those who are interested in the details of my findings and investigations can continue reading.

Ciao
bronto



I have been playing with services promises lately on Debian 10 systems (systemd), focusing on which services should absolutely be running or not running (systemctl start/stop), and which should or shouldn't be activated at boot (enable/disable). What all these states have in common is that you expect them to be idempotent: if a service is enabled, you don't want CFEngine to issue commands to enable it at every agent run; if a service is running, you don't want CFEngine to issue a command to start it. You get the idea.

Initially, I was just using the standard services promises/bundles as provided by CFEngine. To my surprise, they were not idempotent: e.g. the ssh service was enabled and running, but the services promise would still hammer the system at every run with systemctl start ssh and systemctl enable ssh.

At first I didn't investigate this behaviour in depth, believing that CFEngine was issuing the commands at every run to avoid an easy race condition(*), so I started implementing my own checks using systemctl is-active/is-enabled and applying the services promise only if the service was not already in the desired state. It started to become something of a rabbit hole, and I realised that I was better off creating a custom service_method for systemd services. To do that, I decided to build upon the code I had already written and the code of bundle agent standard_services.
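A minimal sketch of that check-then-act idea in shell, assuming systemctl is on PATH. This is a hypothetical illustration, not the systemd_services bundle mentioned here: the function names `service_state_ok` and `ensure_service` are made up, and each policy deliberately maps to one state check and one systemctl command, so `start` never implies `enable`.

```sh
#!/bin/sh
# Hypothetical sketch: check-then-act service management with systemctl.
# Each policy maps to exactly one state check and one command, so "start"
# does not imply "enable" and "stop" does not imply "disable".

# Exit 0 when the unit already satisfies the requested policy.
service_state_ok() {
    local policy="$1" svc="$2"
    case "$policy" in
        start)   systemctl --quiet is-active "$svc" ;;
        stop)    ! systemctl --quiet is-active "$svc" ;;
        enable)  systemctl --quiet is-enabled "$svc" ;;
        disable) ! systemctl --quiet is-enabled "$svc" ;;
        *)       return 2 ;;
    esac
}

# Run the systemctl command for the policy only if the check fails.
ensure_service() {
    local policy="$1" svc="$2"
    case "$policy" in
        start|stop|enable|disable) ;;
        *) echo "unknown policy: $policy" >&2; return 2 ;;
    esac
    if service_state_ok "$policy" "$svc"; then
        echo "kept: $policy $svc"
    elif systemctl --quiet "$policy" "$svc"; then
        echo "repaired: $policy $svc"
    else
        echo "failed: $policy $svc" >&2
        return 1
    fi
}
```

For example, on a host where ssh is already enabled, `ensure_service enable ssh` prints `kept: enable ssh` and issues no `systemctl enable`. One caveat: systemctl(1) documents that `is-enabled` also exits 0 for states such as `static`, so a `disable` policy on a static unit would be retried on every run; a real implementation would need to handle that.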

I started looking into bundle agent standard_services, and noticed some interesting facts:

  • standard_services actually tries to be idempotent, checking the state of a service through systemctl show before taking any action, but something doesn't work in there and systemctl start/stop/enable/disable commands are issued anyway;
  • standard_services doesn't use is-active/is-enabled to determine the current state of the system; only is-enabled is used as a fallback, and the promise refers to CFE-2923; the issue in Jira is not accessible to a user with reduced privileges (like me), so I can't look into the details there;
  • standard_services focuses on start, stop, reload and restart, while I am focusing on start, stop, enable and disable;
  • start/stop implicitly trigger enable/disable respectively; I can understand where this comes from, but I'd rather the promises did exactly what they say, with no implicit side effects: start must mean start, not start and enable; if I want to start and enable, I'd rather promise both things explicitly.

So I am running a custom-built bundle agent systemd_services now, but I'm really interested in understanding whether the behaviour I am seeing in standard_services is due to a bug and, if so, whether the bug is Debian-specific, or if something else is wrong. I can't see that the systemd part of the code of standard_services has changed a lot compared to CFEngine 3.7, so could it be that the output of the systemctl show command has changed significantly enough that the bundle no longer works correctly and needs an update?

(*) if you check whether an action should be performed before actually performing it, there is an interval of time during which the state of the service could change; in these cases, you're better off skipping the check and performing the action regardless, if it's safe to do so.


craig.c...@northern.tech

Jul 6, 2020, 11:20:08 AM
to help-cfengine
Hello,

Which version are you working with?

Replying to your questions:
  • standard_services actually tries to be idempotent, checking the state of a service through systemctl show before taking any action, but something doesn't work in there and systemctl start/stop/enable/disable commands are issued anyway;
Can you re-run with --debug mode and check the output? Here is an example for a service that seems to work well for me on Debian 10:

```
   debug: Evaluating function: execresult("$(call_systemctl) $(systemd_properties) show $(service)","noshell")
   debug: GetExecOutput got 'LoadState=loaded
ActiveState=active
UnitFileState=enabled
CanStart=yes
CanStop=yes
CanReload=no'
```
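A hypothetical shell sketch of the kind of decision this output feeds. It is not the standard_services code; `show_property` and `decide_action` are made-up names. It shows one property of checks keyed on exact property values: if a property is missing or renamed, its value comes back empty and the check fails open, so the action is requested on every run.

```sh
#!/bin/sh
# Hypothetical sketch: decide what to do from `systemctl show` output.

# Print the value of one property read from `systemctl show` output on stdin.
show_property() {
    sed -n "s/^$1=//p" | head -n 1
}

# Print the action needed for a policy given `systemctl show` output:
# "none" when the state already matches, else the systemctl verb to run.
# A missing property yields an empty value, which makes the check fail open,
# i.e. the action is requested again on every run.
decide_action() {
    local policy="$1" props="$2" active unitfile
    active=$(printf '%s\n' "$props" | show_property ActiveState)
    unitfile=$(printf '%s\n' "$props" | show_property UnitFileState)
    case "$policy:$active:$unitfile" in
        start:active:*)    echo none ;;
        start:*)           echo start ;;
        stop:active:*)     echo stop ;;
        stop:*)            echo none ;;
        enable:*:enabled)  echo none ;;
        enable:*)          echo enable ;;
        disable:*:enabled) echo disable ;;
        disable:*)         echo none ;;
        *)                 echo unknown ;;
    esac
}
```

Usage: `decide_action enable "$(systemctl show -pActiveState,UnitFileState ssh)"` prints `none` for an enabled unit. States such as `activating` or `reloading` are deliberately not treated as active in this simplified sketch.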
  • standard_services doesn't use is-active/is-enabled to determine the current state of the system; only is-enabled is used as a fallback, and the promise refers to CFE-2923; the issue in Jira is not accessible to a user with reduced privileges (like me), so I can't look into the details there;
The ticket you mention, CFE-2923, was migrated to a private ticket, sorry about that. The gist of that is that non-native services don't work quite right and so `systemctl is-enabled <service>` must be used in the policy instead of the existing `systemctl show <service>`. So if you are working with non-native services this may be an issue to fix.
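Since the CFE-2923 change itself is not public, the following is only a hypothetical sketch of the fallback idea: prefer `UnitFileState` from `systemctl show`, and ask `systemctl is-enabled` when that property is empty or holds a value the sketch does not recognise. Treating that case as "non-native" is an assumption, and `unit_enabled_state` is a made-up name.

```sh
#!/bin/sh
# Hypothetical sketch: report a unit's enablement state, falling back to
# `systemctl is-enabled` when `systemctl show` gives no usable UnitFileState.
# Assumption: an empty/unrecognised UnitFileState indicates a non-native
# (SysV init script) service.
unit_enabled_state() {
    local svc="$1" state
    state=$(systemctl show -p UnitFileState "$svc" | sed -n 's/^UnitFileState=//p')
    case "$state" in
        enabled|disabled|static|masked)
            echo "$state" ;;
        *)
            # is-enabled prints the state on stdout and exits non-zero for
            # e.g. "disabled"; only the printed state matters here.
            systemctl is-enabled "$svc" 2>/dev/null || true ;;
    esac
}
```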
  • standard_services focuses on start, stop, reload and restart, while I am focusing on start, stop, enable and disable;
  • start/stop implicitly trigger enable/disable respectively; I can understand where this comes from, but I'd rather the promises did exactly what they say, with no implicit side effects: start must mean start, not start and enable; if I want to start and enable, I'd rather promise both things explicitly.
Right, start/stop are linked to enable/disable. This is something that could be changed if desired. It would be great if you could create a ticket to discuss the change.



Can you include some logs of your initial policy and results?

I tried the following autorun bundle and it seemed to properly detect the "active" state and not try to restart the service at every agent run.

```
bundle agent my_services
{
  meta:
    "tags" slist => { "autorun" };

  vars:
    "services" slist => { "chrony", "snmpd" };
  services:
    "$(services)"
      service_policy => "start";
}
```

Some reports promises are visible when running the agent in `--verbose` mode.

If I stop snmpd and run the agent:

```
# systemctl stop snmpd
# cf-agent -K --verbose > ~/log
# grep snmp ~/log
 verbose: P:    Promiser/affected object: 'snmpd'
 verbose: P:    Stack path: /default/autorun/methods/'autorun'/default/services/services/'snmpd'[2]
 verbose: P:    Promiser/affected object: 'snmpd'
 verbose: P:    Stack path: /default/autorun/methods/'autorun'/default/services/services/'snmpd'[2]
 verbose: B: BEGIN bundle standard_services( {"snmpd","start"})
 verbose: execresult ran '/bin/systemctl --no-ask-password --global --system -pLoadState,CanStop,UnitFileState,ActiveState,LoadState,CanStart,CanReload show snmpd' successfully
 verbose: P:    From parameterized bundle: standard_services( {"snmpd","start"})
 verbose: P:    Stack path: /default/autorun/methods/'autorun'/default/services/services/'snmpd'/default/standard_services/commands/'/bin/systemctl --no-ask-password --global --system -q start snmpd'[1]
    info: Executing 'no timeout' ... '/bin/systemctl --no-ask-password --global --system -q start snmpd'
 verbose: Finished command related to promiser '/bin/systemctl --no-ask-password --global --system -q start snmpd' -- succeeded
    info: Completed execution of '/bin/systemctl --no-ask-password --global --system -q start snmpd'
 verbose: P:    From parameterized bundle: standard_services( {"snmpd","start"})
 verbose: P:    Stack path: /default/autorun/methods/'autorun'/default/services/services/'snmpd'/default/standard_services/reports/'standard_services: using systemd layer to start snmpd'[1]
R: standard_services: using systemd layer to start snmpd
 verbose: P: END services promise (snmpd)
```

A second run after that shows that the policy doesn't try to start the already running service.

```
# cf-agent -K --verbose > ~/log
# grep snmp ~/log
 verbose: P:    Promiser/affected object: 'snmpd'
 verbose: P:    Stack path: /default/autorun/methods/'autorun'/default/services/services/'snmpd'[2]
 verbose: P:    Promiser/affected object: 'snmpd'
 verbose: P:    Stack path: /default/autorun/methods/'autorun'/default/services/services/'snmpd'[2]
 verbose: B: BEGIN bundle standard_services( {"snmpd","start"})
 verbose: execresult ran '/bin/systemctl --no-ask-password --global --system -pLoadState,CanStop,UnitFileState,ActiveState,LoadState,CanStart,CanReload show snmpd' successfully
 verbose: P:    From parameterized bundle: standard_services( {"snmpd","start"})
 verbose: P:    Stack path: /default/autorun/methods/'autorun'/default/services/services/'snmpd'/default/standard_services/reports/'standard_services: using systemd layer to start snmpd'[1]
R: standard_services: using systemd layer to start snmpd
 verbose: P: END services promise (snmpd)
```

Notice that the report in the second run still says "standard_services: using systemd layer to start snmpd". This is just an informational message about which mechanism would be used to manage the service; it does not mean that systemctl was actually run to start it.
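To tell the two apart mechanically, a hypothetical helper like the one below lists only the systemctl commands that cf-agent logged as executed (the `info: Executing ...` lines above) for a given service, ignoring the `R:` report lines. It relies on the log format shown in this thread; other CFEngine versions may word these lines differently, and the service name is used as a regular expression.

```sh
#!/bin/sh
# Hypothetical helper: print the systemctl commands cf-agent actually ran for
# a service, from a saved agent log. Only "info: Executing" lines count;
# "R: standard_services: using systemd layer to ..." reports are ignored.
executed_systemctl_commands() {
    local svc="$1" log="$2"
    grep -F "info: Executing" "$log" | grep -F "systemctl" | grep -e " $svc'\$"
}
```

With the two logs above, `executed_systemctl_commands snmpd ~/log` would print the single `systemctl ... start snmpd` line for the first run and nothing for the second.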

I would be curious if you get similar results with the same or similar policy.

Maybe the services you are managing have different behavior?

Thanks,
Craig

Nick Anderson

Jul 8, 2020, 2:04:02 PM
to help-cfengine
Hey Marco,

You might be interested in this pull request which adjusts the behavior slightly.

Marco Marongiu

Jul 10, 2020, 4:48:47 PM
to craig.c...@northern.tech, help-cfengine
Hi Craig

I need some time to go through your questions and to set up a simple
test policy that shows the problem. Meanwhile, I'll answer what I can
already answer here, see inline:

> Which version are you working with?

3.15.2 LTS

> Replying to your questions:
>
> standard_services actually tries to be idempotent, checking the state of a service through systemctl show before taking any action, but something doesn't work in there and systemctl start/stop/enable/disable commands are issued anyway;
>
> Can you re-run with --debug mode and check the output? Here is an example for a service that seems to work well for me on debian10:
>
> ```
> debug: Evaluating function: execresult("$(call_systemctl) $(systemd_properties) show $(service)","noshell")
> debug: GetExecOutput got 'LoadState=loaded
> ActiveState=active
> UnitFileState=enabled
> CanStart=yes
> CanStop=yes
> CanReload=no'
> ```

I believe this is run and there is no problem, but since the service
was started/enabled over and over, then maybe the right classes are
not set? E.g.: the output of systemctl has changed since the policy
was written?

> The ticket you mention, CFE-2923, was migrated to a private ticket, sorry about that. The gist of that is that non-native services don't work quite right and so `systemctl is-enabled <service>` must be used in the policy instead of the existing `systemctl show <service>`. So if you are working with non-native services this may be an issue to fix.

Can you please define non-native? Just to be sure we have a common
understanding.

> Right, start/stop are linked to enable/disable. This is something that could be changed as desired. Would be great if you want to create a ticket and discuss the change.

I could do that. Just need one quiet day, which hasn't materialised
since I came back from vacation :(


> Can you include some logs of your initial policy and results?

Will try to put together a policy that shows the problem.


> A second run after that shows that the policy doesn't try to start the already running service.

I was working with ssh: I wanted to ensure it was both enabled and
running, and systemctl enable/start was triggered at every agent run.

Ciao
-- bronto

Marco Marongiu

Jul 10, 2020, 4:50:36 PM
to Nick Anderson, help-cfengine
> You might be interested in this pull request which adjusts the behavior slightly.
>
> https://github.com/cfengine/masterfiles/pull/1798/files

Roger that.

-- M

craig.c...@northern.tech

Jul 10, 2020, 5:07:48 PM
to help-cfengine
Hello,

> I believe this is run and there is no problem, but since the service
> was started/enabled over and over, then maybe the right classes are
> not set? E.g.: the output of systemctl has changed since the policy
> was written?

If you can double-check the output of the systemctl command and check that against the policy that would be a good test.
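One way to do that double-check is a hypothetical diagnostic like the one below (`compare_states` is a made-up name): it prints what `systemctl show` reports next to what `is-active`/`is-enabled` report, and returns non-zero if they disagree. The assumption is that for native units the two usually match, so a mismatch would point at the kind of parsing problem discussed here.

```sh
#!/bin/sh
# Hypothetical diagnostic: compare `systemctl show` properties with the
# answers of `systemctl is-active` / `systemctl is-enabled` for one unit.
# Returns 0 when both pairs agree, 1 otherwise.
compare_states() {
    local svc="$1" show_active show_enabled is_active is_enabled
    show_active=$(systemctl show -p ActiveState "$svc" | sed -n 's/^ActiveState=//p')
    show_enabled=$(systemctl show -p UnitFileState "$svc" | sed -n 's/^UnitFileState=//p')
    # Both commands print the state and use the exit code for yes/no; keep the text.
    is_active=$(systemctl is-active "$svc" 2>/dev/null || true)
    is_enabled=$(systemctl is-enabled "$svc" 2>/dev/null || true)
    echo "ActiveState=$show_active is-active=$is_active"
    echo "UnitFileState=$show_enabled is-enabled=$is_enabled"
    [ "$show_active" = "$is_active" ] && [ "$show_enabled" = "$is_enabled" ]
}
```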
 
> Can you please define non-native? Just to be sure we have a common
> understanding.

There are many mentions of non-native systemd services. The gist, as I understand it, is that they are services without a proper unit file: it is the way systemctl can manage SysV-style init scripts.

Incidentally, through another post I found the line of code that checks whether the unit file exists and, if it doesn't (and systemctl is not in quiet mode), logs an info message saying that you are dealing with a non-native service.
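A rough, hypothetical approximation of that check in shell: treat a service as non-native when it has an executable init script in /etc/init.d but no `<name>.service` file in the usual unit directories. The directory list is an assumption based on a Debian-style layout (systemd's real search path is longer), and `ROOT` is a made-up variable that lets the check run against a test tree.

```sh
#!/bin/sh
# Hypothetical heuristic: a service is "non-native" if it has an executable
# SysV init script but no unit file in the directories listed below.
# ROOT (made up for testing) prefixes every path; it defaults to empty, i.e. "/".
is_non_native() {
    local svc="$1" root="${ROOT:-}" dir
    [ -x "$root/etc/init.d/$svc" ] || return 1
    for dir in /etc/systemd/system /run/systemd/system /lib/systemd/system /usr/lib/systemd/system; do
        if [ -e "$root$dir/$svc.service" ]; then
            return 1
        fi
    done
    return 0
}
```

Under this heuristic a service that ships both an init script and a unit file counts as native.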



> I could do that. Just need one quiet day, which hasn't materialised
> since I came back from vacation :(

Glad you had some vacation! :+1:

> I was working with ssh: I wanted to ensure it was both enabled and
> running, and systemctl enable/start was triggered at every agent run.

So is ssh for you a SysV init script or a proper systemd unit? I assume stock Debian 10 and not a custom ssh install?

Cheers,
Craig

Marco Marongiu

Jul 21, 2020, 5:28:53 PM
to craig.c...@northern.tech, help-cfengine
Hello Craig, all.

I finally had the time to put together an almost self-contained test file. "Almost" because it loads the standard library from /var/cfengine/inputs/lib/stdlib.cf but, that aside, it's fully stand-alone.

The test policy includes:
- my service method for systemd services: it's a stripped down version of standard_services, with a few changes;
- two separate bundles that enable the ssh service in Debian, one using standard_services and one using my systemd_services

Tests:
- have a Debian 10 with ssh installed, enabled and running
- run the test_systemd_services bundle as many times as you like: the command systemctl enable ssh will not be run
- run the test_standard_services bundle as many times as you like: the command systemctl enable ssh will run every time
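The run-it-again procedure above can be automated with a hypothetical wrapper like this one (`check_idempotent` is a made-up name): it runs a command twice and reports a problem if the second run still shows `info: Executing` lines. It assumes the agent runs in inform mode (-I), as in the session below, so that executed commands are logged at info level.

```sh
#!/bin/sh
# Hypothetical wrapper: run a command twice; the second run should execute
# nothing. Relies on cf-agent printing "info: Executing ..." (inform mode, -I)
# whenever a commands promise actually runs.
check_idempotent() {
    local second
    "$@" >/dev/null 2>&1 || true      # first run: converge, output discarded
    second=$("$@" 2>&1 || true)       # second run: should only keep promises
    if printf '%s\n' "$second" | grep -q "info: Executing"; then
        echo "NOT idempotent; second run executed:"
        printf '%s\n' "$second" | grep "info: Executing"
        return 1
    fi
    echo "idempotent"
}
```

Usage: `check_idempotent cf-agent -KIC -f /var/cfengine/inputs/services/test.cf -b test_standard_services`. Given the results below, this would be expected to flag the repeated `systemctl ... enable ssh`.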

Example:
```
root@cfengine-client:~# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 10 (buster)
Release: 10
Codename: buster
root@cfengine-client:~# systemctl status ssh
* ssh.service - OpenBSD Secure Shell server
   Loaded: loaded (/lib/systemd/system/ssh.service; enabled; vendor preset: enabled)
   Active: active (running) since Sun 2020-05-24 16:09:04 UTC; 1 months 27 days ago
     Docs: man:sshd(8)
           man:sshd_config(5)
 Main PID: 549 (sshd)
    Tasks: 1 (limit: 536)
   Memory: 6.9M
   CGroup: /system.slice/ssh.service
           `-549 /usr/sbin/sshd -D


root@cfengine-client:~# cf-agent -KIC -f /var/cfengine/inputs/services/test.cf -b test_systemd_services
    info: Using command line specified bundlesequence
R: test_systemd_services: Service ssh was kept
root@cfengine-client:~# cf-agent -KIC -f /var/cfengine/inputs/services/test.cf -b test_systemd_services
    info: Using command line specified bundlesequence
R: test_systemd_services: Service ssh was kept
root@cfengine-client:~#
root@cfengine-client:~#
root@cfengine-client:~#
root@cfengine-client:~# cf-agent -KIC -f /var/cfengine/inputs/services/test.cf -b test_standard_services
    info: Using command line specified bundlesequence
    info: Executing 'no timeout' ... '/bin/systemctl --no-ask-password --global --system enable ssh'
  notice: Q: ".../systemctl --no": Synchronizing state of ssh.service with SysV service script with /lib/systemd/systemd-sysv-install.
Q: ".../systemctl --no": Executing: /lib/systemd/systemd-sysv-install enable ssh
    info: Last 2 quoted lines were generated by promiser '/bin/systemctl --no-ask-password --global --system enable ssh'
    info: Completed execution of '/bin/systemctl --no-ask-password --global --system enable ssh'
R: test_standard_services: Service ssh was repaired
root@cfengine-client:~# cf-agent -KIC -f /var/cfengine/inputs/services/test.cf -b test_standard_services
    info: Using command line specified bundlesequence
    info: Executing 'no timeout' ... '/bin/systemctl --no-ask-password --global --system enable ssh'
  notice: Q: ".../systemctl --no": Synchronizing state of ssh.service with SysV service script with /lib/systemd/systemd-sysv-install.
Q: ".../systemctl --no": Executing: /lib/systemd/systemd-sysv-install enable ssh
    info: Last 2 quoted lines were generated by promiser '/bin/systemctl --no-ask-password --global --system enable ssh'
    info: Completed execution of '/bin/systemctl --no-ask-password --global --system enable ssh'
R: test_standard_services: Service ssh was repaired
root@cfengine-client:~#
```


That's all I can do for today. I hope it helps.

Good night, ciao!
-- bronto


test.cf

craig.c...@northern.tech

Jul 23, 2020, 4:21:39 PM
to help-cfengine
Thanks for the simplified test! I tried it out and reproduced the results you got.

We should be able to dig into this more soon. Just wanted to let you know we appreciate your effort in providing a test case! :)

-Craig

Marco Marongiu

Jul 23, 2020, 5:31:18 PM
to craig.c...@northern.tech, help-cfengine
Great stuff, thanks Craig!

Ciao
-- bronto 


Marco Marongiu

Feb 13, 2021, 9:03:37 AM
to craig.c...@northern.tech, help-cfengine
Hello

I am waking up this old thread just to let you know that I have opened bug CFE-3584 about the enable/disable actions that CFEngine implicitly triggers upon start/stop.

Ciao
-- bronto
