--
You received this message because you are subscribed to the Google Groups "help-cfengine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to help-cfengin...@googlegroups.com.
To post to this group, send email to help-c...@googlegroups.com.
Visit this group at https://groups.google.com/group/help-cfengine.
For more options, visit https://groups.google.com/d/optout.
Automating reboots is not worth it, in my opinion. If things go wrong, you will have quite a hard time troubleshooting them. I would set a flag/check on your monitoring server that tells you which machines need to be rebooted. Once you know all your machines are patched, you can use something like clusterssh to reboot them with a few keystrokes. You don't want to explain to your boss why the servers keep rebooting at random. Rebooting is not always guaranteed to work as expected: I've seen more issues from issuing a simple reboot command than from any other command I have run. You want to be mindful of the machines you are rebooting and make sure they come back. You don't want machines getting stuck on reboot when you are not watching, or you will be fighting fires all the time.
On Wed, 22 Jun 2016 18:40:27 +0200 Natxo Asenjo <natxo....@gmail.com> wrote:
NA> I agree that rebooting is a potentially disruptive operation, but I wonder
NA> how people with thousands of nodes cope with this problem. You surely
NA> cannot do it the way you describe at that scale (unless you have a team
NA> just for patching and rebooting, which I do not think is very likely).
I'd look at Google's SRE book http://shop.oreilly.com/product/0636920041528.do
Automating the node lifecycle touches several processes:
* provisioning (including service registration)
* hardening
* monitoring and backups
* rebooting (a tiny piece of the puzzle)
* decommissioning
No, persistent classes persist across reboots. That information is stored locally.
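A minimal sketch of such a persistent class in CFEngine 3 policy, assuming a hypothetical patch script /usr/local/sbin/apply_patches and a hypothetical class name patched_needs_reboot. The persist_time attribute of a classes body keeps the class defined for the given number of minutes (here 1440, one day); because the agent records it in its local state, it survives later agent runs and reboots.

```cfengine
body common control
{
      bundlesequence => { "flag_patched_node" };
}

bundle agent flag_patched_node
{
  commands:
      # Hypothetical patch script; by default, exit code 0 counts as
      # "repaired", which triggers the classes body below.
      "/usr/local/sbin/apply_patches"
        classes => persist_on_repair("patched_needs_reboot", "1440");

  reports:
    patched_needs_reboot::
      "patched_needs_reboot is set; it persists for up to 1440 minutes (one day).";
}

# Hypothetical helper body: define the class when the promise is repaired and
# keep it for the given number of minutes, across agent runs and reboots.
body classes persist_on_repair(name, minutes)
{
      promise_repaired => { "$(name)" };
      persist_time     => "$(minutes)";
}
```

In real use you would also throttle the patch command itself (the thread mentions if_elapsed for this) instead of running it on every agent run.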
So time2reboot is basically always set unless it just rebooted, right?
I would rather have classes that say recently_rebooted and not_recently_rebooted.
  vars:
      "reboot_time" string => "60";   # minutes; sys.uptime is also in minutes

  classes:
      "not_recently_rebooted" expression => islessthan("$(reboot_time)", "$(sys.uptime)");
      "recently_rebooted" expression => isgreaterthan("$(reboot_time)", "$(sys.uptime)");
I would use 'shutdown -r 30'.
If it were me, I would use returnszero for your patch check. To me, this gives you more visibility and control over what is going on. You can use that class to help gate your reboot command.
  classes:
      "checked_patches" expression => returnszero("/usr/bin/zypper lp >> /var/cfengine/state/patch.check", "useshell");
> So time2reboot is basically always set unless it just rebooted, right?

time2reboot is set as soon as the server has been up for 30 minutes. I will add if_elapsed, as this doesn't need to be evaluated on every agent run.

> I would rather have classes that say recently_rebooted and not_recently_rebooted.
> vars:
> "reboot_time" string => "60";
> classes:
> "not_recently_rebooted" expression => islessthan("$(reboot_time)","$(sys.uptime)");
> "recently_rebooted" expression=>isgreaterthan("$(reboot_time)","$(sys.uptime)");

That's nice, but what do you need both classes for?

> I would use 'shutdown -r 30'.

Absolutely better!!

> If it were me, I would use returnszero for your patch check. To me, this gives you more visibility and control over what is going on. You can use that class to help gate your reboot command.
> classes:
> "checked_patches" expression => returnszero("/usr/bin/zypper lp >> /var/cfengine/state/patch.check","useshell");

This won't work: zypper lp returns 0 whether or not patches are available. I need to check its output for "No updates found." instead.

Luckily I can test this in a test lab for a couple of weeks. Thanks, Alex!

Chris
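A sketch addressing both points in this reply, under stated assumptions: the second uptime comparison is replaced by the not attribute of a classes promise, so exactly one of the two classes is always defined (with two separate comparisons, an uptime of exactly 60 minutes defines neither), and the patch check tests zypper's output instead of its exit code. The bundle and class names (reboot_precheck, no_patches_pending, patches_pending), the LC_ALL=C prefix and the /usr/bin/grep path are assumptions; the "No updates found." string is taken from the message above, not checked against every zypper version.

```cfengine
body common control
{
      bundlesequence => { "reboot_precheck" };
}

bundle agent reboot_precheck
{
  vars:
      "reboot_time" int => "60";   # minutes, same unit as sys.uptime

  classes:
      # One comparison only; the opposite class is derived with "not",
      # so exactly one of the two is defined, including at uptime == 60.
      "not_recently_rebooted" expression => islessthan("$(reboot_time)", "$(sys.uptime)");
      "recently_rebooted" not => "not_recently_rebooted";

      # zypper lp exits 0 either way, so test its output instead: a shell
      # pipeline returns grep's exit status, and grep -qF exits 0 only if the
      # fixed string is present. Assumptions: this exact message (English,
      # forced via LC_ALL=C) and grep at /usr/bin/grep. If zypper itself
      # fails, the string is missing and the host counts as having patches.
      "no_patches_pending"
        expression => returnszero("LC_ALL=C /usr/bin/zypper lp | /usr/bin/grep -qF 'No updates found.'", "useshell");
      "patches_pending" not => "no_patches_pending";

  reports:
    recently_rebooted::
      "Up $(sys.uptime) minutes: rebooted within the last $(reboot_time) minutes.";
    patches_pending.not_recently_rebooted::
      "Patches pending and no recent reboot: a candidate for patching.";
}
```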
So Alex, I'm curious: when would the file "/var/cfengine/state/rebooted_node" be removed?
Or is this a one-shot-only policy?
Thanks a lot for your hint regarding the time comparison, either way!
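Alex's policy isn't shown in the thread, so this is only one possible pattern, not his method: a hypothetical flag file, /var/cfengine/state/reboot_needed, is deleted as soon as the reboot has been scheduled. The shutdown command sets the hypothetical class reboot_scheduled when it exits 0, and the files promise deletes the flag in a later evaluation pass of the same agent run, so the next run doesn't schedule another reboot. The /sbin/shutdown path is an assumption; "+30" is the "minutes from now" time syntax of shutdown(8).

```cfengine
body common control
{
      bundlesequence => { "reboot_flag_lifecycle" };
}

bundle agent reboot_flag_lifecycle
{
  vars:
      # Hypothetical flag file, created by whatever applies the patches.
      "flag" string => "/var/cfengine/state/reboot_needed";

  classes:
      "reboot_needed" expression => fileexists("$(flag)");

  files:
    reboot_scheduled::
      # files promises are evaluated before commands, so this only fires in a
      # later pass of the same agent run, after the reboot has been scheduled.
      "$(flag)"
        delete => remove_flag;

  commands:
    reboot_needed::
      # Assumed path; "+30" means 30 minutes from now (shutdown(8) syntax).
      "/sbin/shutdown -r +30"
        classes => on_repair("reboot_scheduled");
}

# Define the given class when the promise is repaired; for commands promises,
# exit code 0 counts as repaired by default.
body classes on_repair(name)
{
      promise_repaired => { "$(name)" };
}

# Delete a plain file; never remove directories.
body delete remove_flag
{
      dirlinks => "delete";
      rmdirs   => "false";
}
```

One consequence of this design: if the scheduled reboot is cancelled (for example with shutdown -c), the flag is already gone and has to be recreated by hand.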