"locking" puppet runs


Jonathan Gazeley

May 8, 2014, 1:34:08 PM5/8/14
to puppet...@googlegroups.com
We run our puppet agents as daemons with runinterval set to 30m. This
works fine for the most part, but recently we took on three MariaDB
Galera cluster nodes. For the cluster to survive, at least one node
must be running at all times, i.e. you must not restart the MariaDB
service on all three nodes at the same time, e.g. by pushing a change
to the config file with puppet.

With the runinterval set to 30m it is unlikely that all three nodes
will restart their MariaDB service at the same time, but it is not
impossible. It would be a Bad Thing if I pushed a config change, all
three nodes did their puppet run at the same time, and the database
cluster disappeared.

Is there a way of incorporating some sort of locking to ensure that
only one of the three nodes can run the puppet agent at a time? I have
half an idea based around PuppetDB: something in a run stage executed
before main could check a variable and abort the run if another node
has locked a row.

I suppose it would be possible to disable the agent from running as a
daemon and use cron, and the cron job could easily use a MySQL handle as
a locking device. But it still doesn't stop me from simply sshing to
each of the nodes and forcing a puppet run, and breaking the cluster.
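
Something along these lines is roughly what I have in mind for the
cron job (the database, table and lock names are made up, and I'm
assuming credentials come from root's .my.cnf):

    #!/bin/sh
    # Hypothetical cron wrapper: only run the agent if we can claim a
    # "lock row" in MariaDB first. A committed row is replicated by
    # Galera, so any node can take the lock against its local server
    # (unlike GET_LOCK(), which is local to a single mysqld).
    #
    # Assumes something like:
    #   CREATE TABLE agent_lock (name VARCHAR(64) PRIMARY KEY,
    #                            holder VARCHAR(255) NOT NULL);
    NODE=$(hostname -f)

    # Take the lock; the INSERT (and hence the mysql client) fails if
    # another node already holds it.
    if ! mysql -D puppet_lock -e \
        "INSERT INTO agent_lock (name, holder) VALUES ('run', '$NODE')"
    then
        echo "another node holds the puppet lock, skipping this run" >&2
        exit 0
    fi

    puppet agent --onetime --no-daemonize

    # Release the lock. A node that dies mid-run leaves a stale row
    # behind, so a real wrapper would also record a timestamp and do
    # some kind of staleness check.
    mysql -D puppet_lock -e \
        "DELETE FROM agent_lock WHERE name = 'run' AND holder = '$NODE'"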

Has anyone done anything like this before? Hope to hear some interesting
ideas from you all :)

Cheers,
Jonathan

Felix Frank

May 8, 2014, 1:40:27 PM5/8/14
to puppet...@googlegroups.com
Hi,

For this and other reasons, I have found cron to be more powerful than
the background agent. It gives you fine-grained control over when to
run and when not to.

For your specific problem, would it make sense to create a schedule
for all of your MySQL configuration items, and make that schedule
different on each cluster node, so that the possible times of
application are disjoint?
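
Roughly what I mean, as a sketch (hostnames, file paths and time
windows are all made up):

    # One maintenance window per cluster node, e.g. keyed off $::hostname.
    schedule { 'mariadb_window':
      period => daily,
      repeat => 1,
      range  => $::hostname ? {
        'galera1' => '02:00 - 02:30',
        'galera2' => '03:00 - 03:30',
        default   => '04:00 - 04:30'
      },
    }

    # Attach the schedule to everything that can restart MariaDB.
    file { '/etc/my.cnf.d/galera.cnf':
      source   => 'puppet:///modules/galera/galera.cnf',
      notify   => Service['mariadb'],
      schedule => 'mariadb_window',
    }

    service { 'mariadb':
      ensure   => running,
      enable   => true,
      schedule => 'mariadb_window',
    }

The agent still has to actually run inside each node's window, of
course, so the windows need to be comfortably wider than the run
interval.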

Note that this may make it Very Hard to deploy a config change to all
nodes at once using --test if you really have to. If the general
approach sounds appealing to you, I figure this is solvable using a
special environment on your master that overrides the schedules for
that special occasion.

HTH,
Felix

Dirk Heinrichs

May 8, 2014, 1:58:52 PM5/8/14
to puppet...@googlegroups.com
On 08.05.2014 15:34, Jonathan Gazeley wrote:

I suppose it would be possible to disable the agent from running as a daemon and use cron, and the cron job could easily use a MySQL handle as a locking device. But it still doesn't stop me from simply sshing to each of the nodes and forcing a puppet run, and breaking the cluster.

Has anyone done anything like this before? Hope to hear some interesting ideas from you all :)

You could
  1. enable splay on the client node
  2. use mcollective to orchestrate the agent runs. For example: "Update config file on all MariaDB servers, but only one at a time."
See http://www.slideshare.net/PuppetLabs/presentation-16281121 for some more information.
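
A couple of untested invocations to illustrate, assuming the
mcollective puppet agent plugin and some fact that identifies the
Galera nodes (the fact name here is a placeholder):

    # trigger a run on one matching node at a time, waiting 10 minutes between them
    mco puppet runonce --batch 1 --batch-sleep 600 -W galera_node=true

    # or let the plugin keep at most one run in flight across the matching nodes
    mco puppet runall 1 -W galera_node=true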

HTH...

    Dirk
--

Dirk Heinrichs, Senior Systems Engineer, Engineering Solutions
Recommind GmbH, Von-Liebig-Straße 1, 53359 Rheinbach
Tel: +49 2226 1596666 (Ansage) 1149
Email: d...@recommind.com
Skype: dirk.heinrichs.recommind
www.recommind.com

jcbollinger

May 9, 2014, 7:33:57 PM5/9/14
to puppet...@googlegroups.com


On Thursday, May 8, 2014 8:58:52 AM UTC-5, Dirk Heinrichs wrote:
On 08.05.2014 15:34, Jonathan Gazeley wrote:

I suppose it would be possible to disable the agent from running as a daemon and use cron, and the cron job could easily use a MySQL handle as a locking device. But it still doesn't stop me from simply sshing to each of the nodes and forcing a puppet run, and breaking the cluster.

Has anyone done anything like this before? Hope to hear some interesting ideas from you all :)

You could
  1. enable splay on the client node

No, that would at best be unhelpful.  Splay will tend to spread out the client load seen by the master over time, but it contributes nothing to keeping runs on specific clients from coinciding.  In fact, it might increase the likelihood of specific pairs (or triples) of clients' runs coinciding.
 
  2. use mcollective to orchestrate the agent runs. For example: "Update config file on all MariaDB servers, but only one at a time."

That, on the other hand, could be just the ticket, provided that it is acceptable to run the agent only via that mechanism (which itself could perhaps be triggered via cron to give automation).  Even that, however, would not actively prevent client runs from coinciding if someone manually ran the agent on one of the sensitive systems.


Consider this, however: puppet already employs a lock file to prevent multiple catalog runs from overlapping on the same system.  What Jonathan asks for is simply an extension of that mechanism.  It could be achieved relatively easily if the systems in question shared the same lock file, and it turns out that the lock file's name and location are configurable.  If the configured location were on a shared filesystem accessible to all the machines involved, then I think the requested behavior would fall out pretty naturally.
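
For what it's worth, the relevant setting appears to be agent_catalog_run_lockfile, so a minimal sketch for each node's puppet.conf would be (the shared path is purely illustrative):

    [agent]
        # must point at a filesystem mounted on all three cluster nodes
        agent_catalog_run_lockfile = /srv/puppet-shared/agent_catalog_run.lock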

Note, however, that nothing is foolproof.  A sufficiently authorized person could still override the lock file (simply by deleting it, for example) to allow multiple catalog runs to proceed simultaneously.  At some point you just have to decide that your safeguards are good enough.


John

Jonathan Gazeley

May 12, 2014, 8:55:57 AM5/12/14
to puppet...@googlegroups.com
On 09/05/14 20:33, jcbollinger wrote:


On Thursday, May 8, 2014 8:58:52 AM UTC-5, Dirk Heinrichs wrote:
On 08.05.2014 15:34, Jonathan Gazeley wrote:

I suppose it would be possible to disable the agent from running as a daemon and use cron, and the cron job could easily use a MySQL handle as a locking device. But it still doesn't stop me from simply sshing to each of the nodes and forcing a puppet run, and breaking the cluster.

Has anyone done anything like this before? Hope to hear some interesting ideas from you all :)

You could
  1. enable splay on the client node

No, that would at best be unhelpful.  Splay will tend to spread out the client load seen by the master over time, but it contributes nothing to keeping runs on specific clients from coinciding.  In fact, it might increase the likelihood of specific pairs (or triples) of clients' runs coinciding.

Yes, I read about the splay option and it seems like it wouldn't help in this case.


 
  2. use mcollective to orchestrate the agent runs. For example: "Update config file on all MariaDB servers, but only one at a time."

That, on the other hand, could be just the ticket, provided that it is acceptable to run the agent only via that mechanism (which itself could perhaps be triggered via cron to give automation).  Even that, however, would not actively prevent client runs from coinciding if someone manually ran the agent on one of the sensitive systems.

We are in the very early stages of looking at what MCollective can do for us. It looks promising but at the time of writing we have basically zero experience with it :)



Consider this, however: puppet already employs a lock file to prevent multiple catalog runs from overlapping on the same system.  What Jonathan asks for is simply an extension of that mechanism.  It could be achieved relatively easily if the systems in question shared the same lock file, and it turns out that the lock file's name and location are configurable.  If the configured location were on a shared filesystem accessible to all the machines involved, then I think the requested behavior would fall out pretty naturally.

This crossed my mind. NFS-mounted lock files seem like a disaster waiting to happen, though. I suppose the puppetmaster could host the NFS share, and then if the puppetmaster or the network is down the node wouldn't be able to check in anyway.

I was also wondering whether PuppetDB could be used for this, or even a general-purpose database with lock rows.


Note, however, that nothing is foolproof.  A sufficiently authorized person could still override the lock file (simply by deleting it, for example) to allow multiple catalog runs to proceed simultaneously.  At some point you just have to decide that your safeguards are good enough.

Indeed. There are only two of us who work full-time on the puppet infrastructure, and neither of us would delete a lock file without good reason. It just has to prevent someone starting a run by mistake, or cron/puppetd starting one automatically. There's a lot to be said for "good enough" in the world of IT operations.

Cheers,
Jonathan



John


jcbollinger

May 12, 2014, 1:44:18 PM5/12/14
to puppet...@googlegroups.com


On Monday, May 12, 2014 3:55:57 AM UTC-5, Jonathan Gazeley wrote:

This crossed my mind. NFS-mounted lock files seem like a disaster waiting to happen, though. I suppose the puppetmaster could host the NFS share, and then if the puppetmaster or the network is down the node wouldn't be able to check in anyway.

I was also wondering whether PuppetDB could be used for this, or even a general-purpose database with lock rows.



For what it's worth, Puppet appears to use straight O_EXCL locking, which, indeed, does not work reliably on NFS file systems.  It might be amusing to file an RFE ticket for this, but that won't get you a solution in time to serve your present need.

I'd be surprised if PuppetDB would serve this purpose.  It's an interesting problem.


John
