The trap or challenge of convergence


Christian Linden

Jun 22, 2016, 7:13:43 AM
to help-cfengine
Hi,

I want to force a reboot by CFEngine after the converging patch process =)
The patch promise does the smt-registration, repository creation and then it needs two or three agent runs to put all patches on the system.

Then I need to reboot. How do I do that? Grepping the syslog for "Reboot as soon as possible." (zypper tells that) ?!

Thanks!
Chris

Christian Linden

Jun 22, 2016, 8:14:33 AM
to help-cfengine
Can I do that by using: (https://docs.cfengine.com/latest/reference-standard-library-monitor.html)


body file control
{
    inputs => { "monitor.cf" };
}

+

body match_value scan_log(line)
{
      select_line_matching => "$(line)";
      track_growing_file => "true";
}

?

How can I set a class if the line shows up?

Thanks!
Chris


Nick Anderson

Jun 22, 2016, 8:55:09 AM
to help-c...@googlegroups.com
On 06/22/2016 07:14 AM, Christian Linden wrote:
> How can I set a class if the line shows up?

You can use `regline()` to see if a given regular expression matches a
line in the file.

https://docs.cfengine.com/lts/reference-functions-regline.html
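
A minimal sketch of that suggestion, in case it helps. `regline()` takes a regular expression and a file name, and the expression is anchored, so it has to cover the whole line (hence the leading and trailing `.*`). The log path `/var/log/messages`, the bundle name and the class name are assumptions for illustration only.

```cf3
bundle agent reboot_message_check
{
  classes:
      # The regex is anchored, so it must match the complete line.
      # /var/log/messages is an assumed location for the zypper message.
      "reboot_message_seen"
        expression => regline(".*Reboot as soon as possible\..*", "/var/log/messages");

  reports:
    reboot_message_seen::
      "The reboot message from zypper is present in the log";
}
```

The class only says that the line exists somewhere in the file; it says nothing about when it was written.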


Nick Anderson

Jun 22, 2016, 9:02:55 AM
to Christian Linden, help-cfengine
On 06/22/2016 06:13 AM, Christian Linden wrote:
> The patch promise does the smt-registration, repository creation and
> then it needs two or three agent runs to put all patches on the system.

> Then I need to reboot. How do I do that? Grepping the syslog for "Reboot
> as soon as possible." (zypper tells that) ?!

You need to be able to identify the condition correctly first. After a
reboot your syslog will likely still contain that same string, and you
could send yourself into a reboot loop.

Maybe there are some magical zypper commands that can help identify it.
Maybe you can parse your syslog entries and get the timestamp of the
last "Reboot as soon as possible" message and see if that timestamp is
newer than pid 1 (indicating it was emitted after a reboot).

A module might be your friend.
strftime() might be useful



Aleksey Tsalolikhin

Jun 22, 2016, 11:05:51 AM
to Nick Anderson, Christian Linden, help-cfengine
You can also set a persistent class after rebooting -- the presence of the persistent class would indicate that the reboot has already been done.

Or, you could touch a file on the filesystem (say, in /var/cfengine/state) to indicate that the reboot has already been done.  

You'd  need to clear the persistent class or the file flag before the next patch/reboot cycle.
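
A rough sketch of the flag-file variant, under a few assumptions: the file name `reboot_done`, the class `reboot_wanted` (set elsewhere, for example by the patch promise) and the five-minute delay are all made up for illustration, and the flag is written just before the reboot is scheduled rather than after it has completed.

```cf3
body file control
{
      inputs => { "$(sys.libdir)/stdlib.cf" };
}

bundle agent reboot_once
{
  vars:
      # Hypothetical flag file name
      "done_flag" string => "/var/cfengine/state/reboot_done";

  classes:
      "reboot_already_done" expression => fileexists("$(done_flag)");

  files:
    # reboot_wanted is a hypothetical class defined by other policy
    reboot_wanted.!reboot_already_done::
      "$(done_flag)"
        create  => "true",
        classes => if_repaired("flag_written");

  commands:
    # Only schedule the reboot in the run that wrote the flag
    flag_written::
      "/sbin/shutdown -r +5";
}
```

Once the flag exists, later runs skip the `files` and `commands` promises, so the flag has to be deleted before the next patch cycle.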



--
You received this message because you are subscribed to the Google Groups "help-cfengine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to help-cfengin...@googlegroups.com.
To post to this group, send email to help-c...@googlegroups.com.
Visit this group at https://groups.google.com/group/help-cfengine.
For more options, visit https://groups.google.com/d/optout.



--
Aleksey Tsalolikhin
Founder and Chief Trainer

Christian Linden

Jun 22, 2016, 11:59:08 AM
to Nick Anderson, help-cfengine

> After a reboot your syslog will likely still contain that same string, and you
> could send yourself into a reboot loop.

jep..

Will check out your hints next week, thanks.


Christian Linden

Jun 22, 2016, 12:01:10 PM
to Aleksey Tsalolikhin, Nick Anderson, help-cfengine
But the persistent class will be gone if cfengine3 is restarted, right?

Alex Georgopoulos

Jun 22, 2016, 12:02:55 PM
to help-cfengine, nick.a...@cfengine.com
I'm going to suggest you don't go down this road.  This could get extremely painful if you have a bug.

Christian Linden

Jun 22, 2016, 12:15:45 PM
to Alex Georgopoulos, help-cfengine, nick.a...@cfengine.com
I’m sure it could! But which road?

c

Neil Watson

Jun 22, 2016, 12:24:47 PM
to help-cfengine
If you search this group you'll find a discussion about this started by
Martin in the past year. Some distros can leave a flag of some kind that
a reboot is required. I suggest researching your OS to see what it offers;
once you know that, you'll know how to make CFEngine handle it.

--
Neil H Watson
CFEngine reporting: https://github.com/neilhwatson/delta_reporting
CFEngine policy: https://github.com/neilhwatson/evolve_cfengine_freelib
CFEngine and vim: https://github.com/neilhwatson/vim_cf3

Alex Georgopoulos

Jun 22, 2016, 12:26:51 PM
to Christian Linden, help-cfengine, Nick Anderson

Automating reboots is not worth it in my opinion.  If things go wrong you will have quite a hard time troubleshooting.  I would set a flag/check on your monitoring server to let you know which machines need to be rebooted.  When you know all your machines are patched you can use something like clusterssh to reboot them with a few keystrokes.  You don't want to explain to your boss why the servers keep randomly rebooting.  Rebooting is not always guaranteed to work as expected.  I've seen more issues from issuing a simple reboot command than from any other command I have run.  You want to be mindful of the machines you are rebooting and make sure they come back.  You don't want machines getting stuck on reboot when you are not watching.  You will be fighting fires all the time.

Natxo Asenjo

Jun 22, 2016, 12:40:29 PM
to help-cfengine
hi Alex,

On Wed, Jun 22, 2016 at 6:26 PM, Alex Georgopoulos <ageo...@gmail.com> wrote:

> Automating reboots is not worth it in my opinion.  If things go wrong you will have quite a hard time troubleshooting.  I would set a flag/check on your monitoring server to let you know which machines need to be rebooted.  When you know all your machines are patched you can use something like clusterssh to reboot them with a few keystrokes.  You don't want to explain to your boss why the servers keep randomly rebooting.  Rebooting is not always guaranteed to work as expected.  I've seen more issues from issuing a simple reboot command than from any other command I have run.  You want to be mindful of the machines you are rebooting and make sure they come back.  You don't want machines getting stuck on reboot when you are not watching.  You will be fighting fires all the time.

I agree that rebooting is a potentially disruptive operation, but I wonder how people with thousands of nodes cope with this problem. You surely cannot do it the way you describe at that scale (unless you have a team just for patching and rebooting, which I do not think is very likely).

If you only have a few hundred nodes, that approach is doable (I know because that is what I do: I use a nagios check that warns when the installed kernel is newer than the running one on CentOS), but that can be quite a pain as well when a few hundred nodes decide they have a critical notification and the monitoring console turns all red.

So I am quite interested in knowing how other cfengineers are solving this issue.

Thanks.
 
--
Groeten,
natxo

Neil Watson

Jun 22, 2016, 1:08:04 PM
to help-cfengine
I think in large scale you monitor capacity and not host up time. Thus
if you reboot 300 hosts and capacity is still good then no alarms.

Alex Georgopoulos

Jun 22, 2016, 1:55:59 PM
to help-cfengine
Writing a script to reboot 1 or 10000 machines is the same amount of work.  I used to reboot 100 machines at a time in batches by myself.  I could have done more, but there were always a few that didn't come back.

One thing that matters is how resilient your environment is to a host being down.  If losing 2 or 3 hosts out of a large batch is not a critical event and you can wait until the morning to address it, then automating reboots becomes a more viable solution.

er...@oco.nnor.org

Jun 22, 2016, 2:32:50 PM
to help-cfengine, lindo...@gmail.com
If you have a tmpfs mounted (or can afford to mount a small one),
touching a file there and checking for its existence is guaranteed to
avoid reboot loops (so long as the condition that places it there is not
triggered erroneously).
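
One way to read this as policy, as a sketch only: the flag is written to a tmpfs when the patch promise repairs something, and it disappears by itself at the next boot. It assumes `/dev/shm` is a tmpfs on the host and that a namespace-scoped class `patched_repaired` is set by the patch promise; the flag name is hypothetical.

```cf3
bundle agent tmpfs_reboot_flag
{
  vars:
      # Assumption: /dev/shm is a tmpfs mount, so the file is lost on reboot
      "flag" string => "/dev/shm/cfengine_reboot_pending";

  classes:
      "reboot_pending" expression => fileexists("$(flag)");

  files:
    # patched_repaired is assumed to be defined by the patch promise
    patched_repaired::
      "$(flag)"
        create => "true";

  reports:
    reboot_pending::
      "Reboot pending on $(sys.fqhost): $(flag) exists";
}
```

Because nothing has to remove the flag after the reboot, there is no cleanup step that could be skipped by accident.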

Any time based check will be dangerous, unless you never run ntp and
your hardware clock is perfect.

Eric

Aleksey Tsalolikhin

Jun 22, 2016, 4:18:36 PM
to Alex Georgopoulos, Christian Linden, help-cfengine, Nick Anderson
Yeah, I was going to mention "you can reboot with Ansible" but wasn't sure if that would be appropriate.  =)  You might want to have more control the first time, and then, as you get more confident with the procedure, you can give more and more control to the automation...

There is a diagram on this in Mark's http://markburgess.org/blog_cyborg.html


Christian Linden

Jun 22, 2016, 4:19:46 PM
to er...@oco.nnor.org, help-cfengine
Lovely inputs based on experiences =)
I like! 

Chris

Neil Watson

Jun 22, 2016, 4:24:27 PM
to help-cfengine
On Wed, Jun 22, 2016 at 11:32:50AM -0700, er...@oco.nnor.org wrote:
> If you have a tmpfs mounted (or can afford to mount a small one),
> touching a file there and checking for its existence is guaranteed to

Clever idea ++

Ted Zlatanov

Jun 23, 2016, 8:57:28 AM
to help-c...@googlegroups.com
On Wed, 22 Jun 2016 18:40:27 +0200 Natxo Asenjo <natxo....@gmail.com> wrote:

NA> I agree that rebooting is a potentially disrupting operation, but I wonder
NA> how are guys with thousand of nodes coping with this problem. You surely
NA> cannot do this like you describe on that scale (unless you have a team just
NA> for patching and rebooting, which I do not think is very likely).

I'd look at Google's SRE book http://shop.oreilly.com/product/0636920041528.do

Automating the node lifecycle touches several processes:

* provisioning (including service registration)
* hardening
* monitoring and backups
* rebooting (a tiny piece of the puzzle)
* decommissioning

It usually makes sense to think of all these holistically, rather than
attack each one separately. For instance, rebooting a machine should not
trigger alerts, and should temporarily unregister its services.

In addition, treating nodes as disposable infrastructure and thinking at
the cluster level, possibly with the help of container technology,
simplifies these processes immensely and tends to cut business costs.

Ted

Nick Anderson

Jun 23, 2016, 5:25:41 PM
to Christian Linden, Aleksey Tsalolikhin, Nick Anderson, help-cfengine
On 06/22/2016 11:01 AM, Christian Linden wrote:
> but the persistent class will be gone if cfengine3 will be restarted, right?

No, persistent classes persist across reboots. That information is
stored locally, and anything you can't inspect directly to answer the
question leaves room for uncertainty about its provenance.

I would start with simply defining a class to indicate you need a
reboot. Go through some cycles where you manually reboot using whatever
tools but just for hosts that you know have that reboot class defined.
The more you do it the more your confidence will grow and then you can
start evaluating doing automatic reboots.
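
As a sketch of that first step, a class promise can carry a `persistence` timer (in minutes), so the "needs reboot" state survives across agent runs while nothing is rebooted automatically. The class `patched_repaired` is assumed to come from the patch promise, and the one-week timer is an arbitrary choice.

```cf3
bundle agent reboot_needed
{
  classes:
      # patched_repaired is assumed to be set by the patch promise.
      # persistence is in minutes; 10080 (one week) is an arbitrary value.
      "reboot_needed"
        expression  => "patched_repaired",
        persistence => "10080";

  reports:
    reboot_needed::
      "$(sys.fqhost) has been patched and is waiting for a manual reboot";
}
```

The class stays defined until the timer runs out or something cancels it, so a manual reboot does not clear it by itself.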


Natxo Asenjo

Jun 23, 2016, 5:58:05 PM
to help-cfengine

hi Ted,

On Thu, Jun 23, 2016 at 2:56 PM, Ted Zlatanov <t...@lifelogs.com> wrote:
> On Wed, 22 Jun 2016 18:40:27 +0200 Natxo Asenjo <natxo....@gmail.com> wrote:
>
> NA> I agree that rebooting is a potentially disrupting operation, but I wonder
> NA> how are guys with thousand of nodes coping with this problem. You surely
> NA> cannot do this like you describe on that scale (unless you have a team just
> NA> for patching and rebooting, which I do not think is very likely).
>
> I'd look at Google's SRE book http://shop.oreilly.com/product/0636920041528.do
>
> Automating the node lifecycle touches several processes:
>
> * provisioning (including service registration)
> * hardening
> * monitoring and backups
> * rebooting (a tiny piece of the puzzle)
> * decommissioning

Very good points, and interesting reading.

I am looking at ways of automatically adding hosts to our monitoring solution using its API. The reboots could then automatically schedule a notification downtime.

Thanks for the tips.
 
--
Groeten,
natxo

Christian Linden

Jun 27, 2016, 11:56:19 AM
to help-cfengine, lindo...@gmail.com, ale...@verticalsysadmin.com, nick.a...@cfengine.com

> No, persistent classes persist across reboots. That information is
> stored locally,

That's good to know!
The problem is this one: "/usr/bin/zypper --non-interactive patch --auto-agree-with-licenses --with-interactive" classes => default:results("namespace", "patched");
The fulfillment takes a couple of agent runs and I don't know which one is the last. I'm just thinking about counting the occurrences in the syslog, writing "1", "2", "3" into my
count.file, and triggering the reboot once 3 (or 4, I have to check) appears in the file =)

c

Nick Anderson

Jun 27, 2016, 12:08:51 PM
to Christian Linden, help-cfengine, ale...@verticalsysadmin.com, nick.a...@cfengine.com
On 06/27/2016 10:56 AM, Christian Linden wrote:
> That's good to know!
> The problem is this one: "/usr/bin/zypper --non-interactive patch
> --auto-agree-with-licenses --with-interactive" classes =>
> default:results("namespace", "patched");
> The fulfillment takes a couple of agent runs and I don't know which the
> last one is. Just thinking about counting the occurence in the syslog,
> writing "1", "2", "3" into my
> count.file and after 3 (or 4 I've to check) occurs in the file the
> reboot can be triggered =)

I think counting a specific number of executions is a bad way to
identify the state.

It would be better, I think, to query for information that indicates you're
done, perhaps by inspecting the output of something like
zypper patch-check.
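
A sketch of what that query could look like using the exit code instead of the output. It assumes, as I understand zypper's documented exit codes, that `zypper patch-check` exits 0 only when no patches are needed and returns 100 or 101 when patches (or security patches) are pending. Any other failure, such as a locked package database, also leaves the class undefined. The class and bundle names are made up.

```cf3
bundle agent patch_state
{
  classes:
      # Assumption: exit code 0 means "no patches needed";
      # 100/101 mean patches are still pending.
      "no_patches_pending"
        expression => returnszero("/usr/bin/zypper --non-interactive patch-check", "noshell");

  reports:
    no_patches_pending::
      "$(sys.fqhost): zypper reports no pending patches";

    !no_patches_pending::
      "$(sys.fqhost): patches still pending, or the check itself failed";
}
```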




Christian Linden

Jun 28, 2016, 6:35:54 AM
to help-cfengine, lindo...@gmail.com, ale...@verticalsysadmin.com, nick.a...@cfengine.com
good idea, thanks!

c

Christian Linden

Jul 5, 2016, 7:21:50 AM
to help-cfengine, lindo...@gmail.com, ale...@verticalsysadmin.com, nick.a...@cfengine.com
Hi,

Just to share my solution and maybe get some QA on it.

I see it working and have no doubts about using it. Should I?

Chris




mike.w...@verticalsysadmin.com

Jul 5, 2016, 10:51:31 PM
to help-cfengine, lindo...@gmail.com, ale...@verticalsysadmin.com, nick.a...@cfengine.com
Chris, just a warning: I see you using "execresult()" to populate a variable. Are you aware that the command given in there is run at least three times (and usually more) for each single run of the agent?

It's even run during cf-promises.

I get three times during cf-promises and four times during cf-agent (even without -K) using the following test bundle.

Try:

bundle agent main {
  vars:
    "somevar"
      string => execresult( "/bin/date | tee -a /tmp/testfile.txt", "useshell" );
}

Wherever possible I avoid "execresult" and if I must use it, I do so with a class guard.  Just a tip, when you start being concerned about performance.  :)
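
For illustration, a class-guarded version of the test bundle above could look like the sketch below. The guard uses the hard time classes `Hr03` and `Min00_05`, which are an arbitrary choice here; any class that is only defined when the command is really wanted would do.

```cf3
bundle agent main
{
  vars:
    # Hr03.Min00_05 is only true during the first five minutes after 03:00,
    # so outside that window the function is not evaluated at all.
    Hr03.Min00_05::
      "somevar"
        string => execresult( "/bin/date | tee -a /tmp/testfile.txt", "useshell" );
}
```

Inside the window the command may still run more than once per agent run; the guard only limits when that can happen.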

--Mike Weilgart

Christian Linden

Jul 6, 2016, 7:16:23 AM
to help-cfengine, lindo...@gmail.com, ale...@verticalsysadmin.com, nick.a...@cfengine.com
Thanks a lot for that hint, Mike!
Maybe it's avoidable with ifelapsed as well; I tried to check that, but:

bundle agent execr
{
vars:
    "somevar"
      string => execresult( "/bin/date | tee -a /tmp/execresult_testfile.txt", "useshell" ),
        action => ifelapsed("30");
}

returns: error: Undefined body ifelapsed with type action 
and if I add it:
bundle agent execr
{
vars:
    "somevar"
      string => execresult( "/bin/date | tee -a /tmp/execresult_testfile.txt", "useshell" ),
        action => ifelapsed("30");
}

body action if_elapsed(x)
{
      ifelapsed => "$(x)";
      expireafter => "$(x)";
}

it returns:  Duplicate definition of body if_elapsed with type action
=(
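
A note on the two errors, with a sketch. The promise refers to a body called `ifelapsed`, while the body being defined is called `if_elapsed`, which would explain the first message. The "Duplicate definition" message suggests that a body named `if_elapsed` is already loaded, as it would be if the standard library is among the inputs. The sketch below loads the standard library and uses its `if_elapsed` and `in_shell` bodies on a `commands` promise instead of a `vars` promise; moving the command there is my assumption about what is wanted, not something tested on 3.7.2.

```cf3
body file control
{
      inputs => { "$(sys.libdir)/stdlib.cf" };
}

bundle agent execr
{
  commands:
      # if_elapsed and in_shell come from the standard library;
      # the promise is skipped if it ran less than 30 minutes ago.
      "/bin/date >> /tmp/execresult_testfile.txt"
        contain => in_shell,
        action  => if_elapsed("30");
}
```

It can be tried on its own with something like `cf-agent -f ./execr.cf -b execr`.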

I get on 3.7.2 core three evaluations as well:
 cf-promises -v execr.cf |grep execresult
 verbose: execresult ran '/bin/date | tee -a /tmp/execresult_testfile.txt' successfully
 verbose: Caching result for function 'execresult(/bin/date | tee -a /tmp/execresult_testfile.txt,useshell)'
 verbose: execresult ran '/bin/date | tee -a /tmp/execresult_testfile.txt' successfully
 verbose: Caching result for function 'execresult(/bin/date | tee -a /tmp/execresult_testfile.txt,useshell)'
 verbose: execresult ran '/bin/date | tee -a /tmp/execresult_testfile.txt' successfully
 verbose: Caching result for function 'execresult(/bin/date | tee -a /tmp/execresult_testfile.txt,useshell)'

and they seem to get cached =)

Thanks again!
Chris

Alex Georgopoulos

Jul 6, 2016, 6:45:06 PM
to help-cfengine, lindo...@gmail.com, ale...@verticalsysadmin.com, nick.a...@cfengine.com
That looks like a very heavy policy.  You will be checking your patch level every 5 minutes?  You will be doing a regline to a file every 5 minutes?  Also the class expression that compares to uptime seems weird to me. I could also see a potential race condition where your rebooted_repaired class does not get set because the machine reboots too quickly and doesn't have time for the lmdb to update.  I would not put that into production as is.

Christian Linden

Jul 7, 2016, 7:01:25 AM
to help-cfengine, lindo...@gmail.com, ale...@verticalsysadmin.com, nick.a...@cfengine.com
Thank you for qa'ing, Alex.
- Putting class/time guards ahead of the check and the regline makes sense in terms of performance and usefulness indeed.
- "time2reboot" expression => islessthan("30","$(sys.uptime)"); looks fine to me and works as desired.
- Race condition: jep, I thought about it and was kind of surprised that the class was set in all my tests. But I will run a script with a sleep 10 up front; that should provide the required time, right?
The rebooted_repaired class will be set as soon as the script is found and run, right?

Thanks again!
Chris

Alex Georgopoulos

Jul 7, 2016, 2:12:56 PM
to help-cfengine, lindo...@gmail.com, ale...@verticalsysadmin.com, nick.a...@cfengine.com
So time2reboot is basically always set unless it just rebooted, right?  I would rather have a class that said recently_rebooted and not_recently_rebooted.

vars:
    "reboot_time" string => "60";

classes:
    "not_recently_rebooted" expression => islessthan("$(reboot_time)","$(sys.uptime)");
    "recently_rebooted"     expression => isgreaterthan("$(reboot_time)","$(sys.uptime)");



When you call reboot, things are just going to start getting killed by the init system.  Your cfengine run may get killed for all you know.  It seems to me you would want to ensure things finish up before you reboot.  I would use 'shutdown -r 30' or something similar to give a little time for things to finish up; it also gives you a chance to cancel that reboot if your policy goes haywire.

If it was me I would use returnszero for your patch check.  This, to me, gives you more visibility and control over what is going on.  You can use that class to help gate your reboot command.

classes:

"checked_patches" expression => returnszero("/usr/bin/zypper lp >> /var/cfengine/state/patch.check","useshell");

I'm sure there are other things that could be tweaked but I still stress being very careful before releasing this.  I would probably log a potential reboot to syslog or to reports before I actually went live and watched things for a few weeks to make sure it would do the right thing.

Christian Linden

Jul 8, 2016, 5:36:40 AM
to help-cfengine, lindo...@gmail.com, ale...@verticalsysadmin.com, nick.a...@cfengine.com
> so time2reboot is basically always set unless it just rebooted right?

time2reboot is set as soon as the server has been up for 30 minutes. I will add if_elapsed, as this doesn't need to be evaluated on each agent run.

> I would rather have a class that said recently_rebooted and not_recently_rebooted.
> vars:
> "reboot_time" string => "60";
> classes:
> "not_recently_rebooted" expression => islessthan("$(reboot_time)","$(sys.uptime)");
> "recently_rebooted" expression=>isgreaterthan("$(reboot_time)","$(sys.uptime)");

That's nice, but what do you need both classes for?

> I would use 'shutdown -r 30'

Absolutely better!!

> If it was me I would use returnszero for your patch check.   This, to me, give you more viability and control over what is going on.   You can use that class to help gate your reboot command.
>
> classes:
>
> "checked_patches" expression => returnszero("/usr/bin/zypper lp >> /var/cfengine/state/patch.check","useshell");

This won't work, as zypper lp returns 0 whether or not there are patches available. I need the output to check for "No updates found." in there.

Luckily I can test in a testlab for a couple of weeks, thanks, Alex!

Chris

Alex Georgopoulos

Jul 8, 2016, 1:48:28 PM
to help-cfengine, lindo...@gmail.com, ale...@verticalsysadmin.com, nick.a...@cfengine.com


On Friday, July 8, 2016 at 2:36:40 AM UTC-7, Christian Linden wrote:
>> so time2reboot is basically always set unless it just rebooted right?
>
> time2reboot is set as soon as the server is up for 30 minutes. I will add if_elapsed as this doesn't need to be evaluated on each agent run.
>
>> I would rather have a class that said recently_rebooted and not_recently_rebooted.
>> vars:
>> "reboot_time" string => "60";
>> classes:
>> "not_recently_rebooted" expression => islessthan("$(reboot_time)","$(sys.uptime)");
>> "recently_rebooted" expression=>isgreaterthan("$(reboot_time)","$(sys.uptime)");
>
> That's nice but whatfor do you need both classes?

So all your machines, after 30 minutes, are going to report time2reboot all the time, even when there are no patches?  From experience, this will lead people to the wrong conclusion about your systems and they will say, "hey, you need to reboot".  Then you will have to explain why it's doing that.  It's too specific to the bundle and doesn't describe the system state.  I use classes to describe state, then use that to take action.  It's just clearer, to me, when you use them in your class expressions.  In addition, having two classes properly enumerates what state the system is in without having to worry about negative knowledge.  !recently_rebooted will always be true until it's evaluated, for example.

 
>> I would use 'shutdown -r 30'
>
> absolutely better!!
>
>> If it was me I would use returnszero for your patch check.   This, to me, give you more viability and control over what is going on.   You can use that class to help gate your reboot command.
>>
>> classes:
>>
>> "checked_patches" expression => returnszero("/usr/bin/zypper lp >> /var/cfengine/state/patch.check","useshell");
>
> This won't work as wether or not there are patches available zypper lp will return 0. I need the output to check for "No updates found." in there.
>
> Luckily I can test in a testlab for a couple of weeks, thanks, Alex!
>
> Chris


What happens if zypper is being used by another process?  Regardless, it will still work, because instead of a report writing to the file, your check command is writing to the file using the >>.  It could be made a little more robust by piping the output to a staging file and then tee-ing it into place if the "checked_patches" class is set.

Christian Linden

Jul 11, 2016, 12:47:00 PM
to Alex Georgopoulos, help-cfengine, Aleksey Tsalolikhin, nick.a...@cfengine.com
> So all your machines, after 30 minutes, are going to report time2reboot all the time even when there are no patches?

Jep, time2reboot is set independently.

> From experience this will lead people to the wrong conclusion about your systems and say, "hey you need to reboot". Then you will have to explain why it's doing that. It's too specific to the bundle and doesn't describe the system state.

That’s correct.

> I use classes to describe state then use that to take action. It's just clearer, to me, when you use them in your class expressions. In addition having two classes properly enumerates what the system state is in without having to worry about negative knowledge. !recently_rebooted will always be true until it's evaluated for example.

Makes sense.
>
> What happens if zipper is being used by another process? Regardless, it will still work because instead of a report writing to the file your check command is writing to the file using the >>. It could be made a little more robust by piping the output to a staging file then tee it into place if the 'checked_patches" class is set.

Does that mean you were not going to use the "checked_patches" class itself, but use it this way to get more robust output from the command?

Thanks a lot!
Chris




Alex Georgopoulos

Jul 11, 2016, 5:00:51 PM
to help-cfengine, ageo...@gmail.com, ale...@verticalsysadmin.com, nick.a...@cfengine.com
The checked_patches class lets me know I ran that command, for one, and then I could use that class to gate the regline check.  Or I would probably use that to move/tee the output of that command to the actual place where the regline check reads the file.

If this were me I would probably do something like the following class progression.

checked_patches > appended_patch_state_file > checked_patch_state_file > (patches_applied|no_patches_applied) kind of thing

Then you could have something like the following.

classes:
    "rebooted_node"     expression => fileexists("/var/cfengine/state/rebooted_node"), scope => "bundle";
    "not_rebooted_node" not => fileexists("/var/cfengine/state/rebooted_node"), scope => "bundle";

commands:
  patches_applied.not_recently_rebooted.not_rebooted_node.!shutdown_in_progress::
    "/sbin/shutdown -r 30"
      classes => if_repaired("shutdown_in_progress");

mike.w...@verticalsysadmin.com

Jul 11, 2016, 6:32:13 PM
to help-cfengine, ageo...@gmail.com, ale...@verticalsysadmin.com, nick.a...@cfengine.com
So Alex, I'm curious: When would the file "/var/cfengine/state/rebooted_node" be removed?

Or is this a one-shot-only policy?

Trying to get reboots into a system designed for convergence is conceptually tricky.

I might go at it from a conceptual approach like so:

/var/cfengine/state/patches_installed promises to be older than the system uptime.  If it is not, the repair method is to restart the box.

Packages (your package manager) promise to be fully patched.  If they are not, the repair method is to install patches and touch /var/cfengine/state/patches_installed.

If there were a cheap test to check for the timestamp when patches were last installed (directly from the package management database rather than the bespoke patches_installed file), that might be even better.  Or it might not.

What do you think of that?
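
To make the first promise a bit more concrete, here is a report-only sketch of the comparison. It derives the boot time from `now()` and `$(sys.uptime)` (which is in minutes, so the result is only accurate to about a minute) and compares it with the mtime of the flag file. The bundle and class names are hypothetical, I have not run this, and it is a time-based check with the caveats already mentioned in this thread.

```cf3
bundle agent patched_since_boot
{
  vars:
      "flag"       string => "/var/cfengine/state/patches_installed";
      "now_epoch"  int    => now();

      # Boot time in epoch seconds, derived from uptime in minutes
      "boot_epoch" string => eval("$(now_epoch) - (60 * $(sys.uptime))", "math", "infix");

      # Stays undefined if the flag file does not exist
      "flag_mtime" string => filestat("$(flag)", "mtime");

  classes:
      # True when patches were installed after the last boot
      "patched_since_boot"
        expression => isgreaterthan("$(flag_mtime)", "$(boot_epoch)");

  reports:
    patched_since_boot::
      "$(sys.fqhost): $(flag) is newer than the last boot, reboot outstanding";
}
```

Replacing the report with a guarded `commands` promise would turn this into the repair step described above.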

--Mike Weilgart

Alex Georgopoulos

Jul 11, 2016, 8:52:11 PM
to help-cfengine, ageo...@gmail.com, ale...@verticalsysadmin.com, nick.a...@cfengine.com
Good question, as I didn't write the original policy I would guess it's some manual thing to remove.  I guess you could look at the file age of that file and remove it after N days if you wanted.  I'm guessing this policy is meant for newly provisioned hosts but I would have to ask Christian.  

I know ubuntu/debian has some way as it lets me know when I log in what is going on.

Welcome to Ubuntu 14.04.4 LTS (GNU/Linux 3.13.0-48-generic x86_64)

 * Documentation:  https://help.ubuntu.com/

  System information as of Mon Jul 11 22:23:58 UTC 2016

  System load:  0.04               Processes:           143
  Usage of /:   32.0% of 19.55GB   Users logged in:     0
  Memory usage: 8%                 IP address for eth0: 10.0.0.16
  Swap usage:   0%

  Graph this data and manage this system at:

  Get cloud support with Ubuntu Advantage Cloud Guest:

41 packages can be updated.
11 updates are security updates.


*** System restart required ***


I'm guessing there are some APIs there or some way to know.  I cannot speak for SuSE or RedHat as I'm a little out of touch with those distros lately.
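
On Debian and Ubuntu that login message is normally driven by a flag file, /var/run/reboot-required, which packages create when they need a restart. A small sketch of picking that up as a class (the class and bundle names are made up):

```cf3
bundle agent distro_reboot_flag
{
  classes:
    # The debian hard class is also defined on Ubuntu hosts
    debian::
      "reboot_required"
        expression => fileexists("/var/run/reboot-required");

  reports:
    reboot_required::
      "$(sys.fqhost): the distribution reports that a restart is required";
}
```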

Christian Linden

Jul 12, 2016, 6:54:11 AM
to Alex Georgopoulos, help-cfengine, Aleksey Tsalolikhin, nick.a...@cfengine.com
> The checked_patches class lets me know I ran that command for one and then I could use that class to gate the regline check.

OK.

> Or I would probably use that to move/tee the output of that command to the actual placewhere the regline check reads the file.

It’s redirected in the returnszero command, so there’s no more need for any move/tee.


>
> If this were me I would probably do something like the following class progression.
>
> checked_patches > appended_patch_state_file > checked_patch_state_file > (patches_applied|no_patches_applied) kind of thing
>
> Then you could have something like the following.
>
> classes:
> "rebooted_node" expression => fileexists("/var/cfengine/state/rebooted_node"), scope => "bundle";
> "not_rebooted_node" not => fileexists("/var/cfengine/state/rebooted_node"), scope => "bundle";
>
> commands:
>
> patches_applied.not_recently_rebooted.not_rebooted_node.!shutdown_in_progress::
> "/sbin/shutdown -r 30",
> if_repaired("shutdown_in_progress");

Sounds reasonable, but what about having both states: shutdown_in_progress AND rebooted_node?
Once the shutdown is fired there is still $(reboot_time) left for further agent runs (or evaluations in the same run), and the "rebooted_node" file will be created as soon as shutdown_in_progress is set. That state would be haywire, wouldn’t it? The rebooted_node class would be set before the reboot has actually happened.

Thanks so much, this is great help! =)

Chris

Christian Linden

Jul 12, 2016, 7:06:23 AM
to Alex Georgopoulos, help-cfengine, Aleksey Tsalolikhin, nick.a...@cfengine.com
Hi Alex, hi Mike,


> So Alex, I'm curious: When would the file "/var/cfengine/state/rebooted_node" be removed?

I haven’t planned that yet, as I don’t see any case where it will hurt.
As Alex says, this is a policy for newly provisioned hosts.
The file could be removed after the period defined by the company’s policy on patch cycles, because then new patches will be installed and a reboot may make sense again. That is a problem for the operations team later =) It’s actually not my job, though I need to think about it as well to be able to consult later.

> Or is this a one-shot-only policy?

What’s that exactly? How often will a one-shot-only policy be run by the agent?

Thanks a lot for your hint regarding the time comparison in either way! 
On redhat one can get it via grep Updated /var/log/yum.log | tail -1 | cut -d' ' -f 1-2


Chris

Alex Georgopoulos

Jul 12, 2016, 2:07:18 PM
to help-cfengine, ageo...@gmail.com, ale...@verticalsysadmin.com, nick.a...@cfengine.com
 I would say to use the $(reboot_time) to set the shutdown_in_progress as a persistent class with time of $(reboot_time).  Then it will be active during the delay.  

Having shutdown_in_progress AND rebooted_node shouldn't trigger the reboot line again.  I would use those shutdown_in_progress and rebooted_node classes as a way to no longer evaluate the bundles.
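
Roughly what I mean by the persistent class, as a sketch: a classes body that sets the class when the command is repaired and keeps it for the length of the shutdown delay. The body name `persist_if_repaired`, the variable `reboot_delay` and the class `patches_applied` are placeholders, and the delay is written in the `+m` form of shutdown.

```cf3
body classes persist_if_repaired(class, minutes)
{
      # Define the class when the promise is repaired and keep it
      # for the given number of minutes.
      promise_repaired => { "$(class)" };
      persistence      => "$(minutes)";
}

bundle agent schedule_reboot
{
  vars:
      # Hypothetical delay in minutes before the reboot happens
      "reboot_delay" string => "30";

  commands:
    # patches_applied is assumed to be set elsewhere in the policy
    patches_applied.!shutdown_in_progress::
      "/sbin/shutdown -r +$(reboot_delay)"
        classes => persist_if_repaired("shutdown_in_progress", "$(reboot_delay)");
}
```

While the class is active, the `!shutdown_in_progress` guard keeps further agent runs from scheduling a second shutdown.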

Christian Linden

Jul 12, 2016, 4:17:46 PM
to Alex Georgopoulos, help-cfengine, Aleksey Tsalolikhin, nick.a...@cfengine.com

> Having shutdown_in_progress AND rebooted_node shouldn't trigger the reboot line again. I would use that shutdown_in_progress and rebooted_node classes as a way to no longer evaluate the bundles.

Ok, but that’s a bad (description of a) state: class is set that the box is rebooted but it’s not yet. It’s an untrue state.

c

Alex Georgopoulos

Jul 12, 2016, 4:46:41 PM
to help-cfengine, ageo...@gmail.com, ale...@verticalsysadmin.com, nick.a...@cfengine.com
so pick a new class name something like reboot_trigger_file_present  

Christian Linden

Jul 12, 2016, 5:18:22 PM
to Alex Georgopoulos, help-cfengine, Aleksey Tsalolikhin, nick.a...@cfengine.com
=)
