Puppet Actions in Parallel?


Jon Forrest

Mar 4, 2012, 10:53:23 PM
to puppet...@googlegroups.com
Like many people learning Puppet for the first time, I was
struck by the fact that the order of actions is undefined
unless specific metaparameters like 'require' are used. Fine.

This got me to thinking. The GNU make program has the "-j"
option, which allows make to start more than one action
in parallel if the actions are at the same dependency level.
I've used this option on a 48-core machine to great benefit.

So, why can't there be a similar option in the puppet agent?
I can easily imagine how this could substantially reduce the
length of time for a puppet run.

(The make "-j" option takes an optional numeric value which, if
given, is the maximum number of actions that can be run
in parallel. If no number is given, then there's no limit
on the number of parallel actions.)

I did a quick review of the Puppet manual but I didn't see
anything like this. Am I missing something? Is this a good
idea?

Cordially,
Jon Forrest

Brian Troutwine

Mar 4, 2012, 11:12:57 PM
to puppet...@googlegroups.com
An interesting thought. If you enable graph output in your puppet.conf

http://docs.puppetlabs.com/references/stable/configuration.html#graph

you can identify those sub-graphs that could run in parallel pretty
easily, just by eyeballing it. Parallel execution is wonderful for an
optimizing compiler because you're strongly CPU limited. I'd be
willing to bet that a parallel puppet agent would be of less use: once
the catalog is compiled on the master, puppet's execution time is
going to be limited by network and disk IO. Probably. Running puppet
agent through DTrace/SystemTap would be instructive.
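
Something like this in puppet.conf enables the .dot output (the
graphdir shown is a typical default for Puppet of this era; check
your platform):

    [agent]
      graph    = true
      graphdir = /var/lib/puppet/state/graphs

Any of the generated files can then be rendered with Graphviz, e.g.
"dot -Tpng expanded_relationships.dot -o relationships.png".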

Also, some resources are implicitly exclusive: two Packages with 'apt'
providers cannot be run in parallel because Debian/Ubuntu keeps a
global lock on... hmm, I've forgotten what, exactly. It's there,
though. Ensuring that such implicitly exclusive resources block one
another would increase the implementation complexity of Puppet, of
end-user modules or, as is likely, both.

Aside from all other considerations, multi-core parallelism in MRI is
not so great. JRuby's better, being hosted on the JVM, but, well,
MRI's RAM consumption is already bad enough.


--
Brian L. Troutwine

Jon Forrest

Mar 5, 2012, 12:17:02 AM
to puppet...@googlegroups.com
On 3/4/2012 8:12 PM, Brian Troutwine wrote:

(First of all, sorry for the jumbled first paragraph. I was
trying to do two things at once).

> An interesting thought. If you enable graph output in your puppet.conf
>
> http://docs.puppetlabs.com/references/stable/configuration.html#graph
>
> you can identify those sub-graphs that could run in parallel pretty
> easily, just by eyeballing it. Parallel execution is wonderful for an
> optimizing compiler because you're strongly CPU limited. I'd be
> willing to bet that a parallel puppet agent would be of less use: once
> the catalog is compiled on the master, puppet's execution time is
> going to be limited by network and disk IO. Probably. Running puppet
> agent through DTrace/SystemTap would be instructive.

My intuition, albeit that of a novice, makes me question this.
Just look at how much parallel make improves things.
There would probably be a point of diminishing returns,
or even negative returns, but that's true for "make -j" too.
Even "puppet -j2" could yield enough of a benefit to
make this approach worthwhile.

> Also, some resources are implicitly exclusive: two Packages with 'apt'
> providers cannot be run in parallel because Debian/Ubuntu keeps a
> global lock on... hmm, I've forgotten what, exactly. It's there,
> though. Ensuring that implicitly parallel resources block one another
> would increase the implementation complexity of Puppet, of end-user
> modules or, as is likely, both.

I see what you mean. The same is true for yum, which is what
I'm most familiar with. Maybe Puppet could do something so
that these kinds of inherently sequential operations in yum/apt (and
anything else) could be reflected in the graph.

> Aside from all other considerations, multi-core parallelism in MRI is
> not so great. JRuby's better, being hosted on the JVM, but, well,
> MRI's RAM consumption is already bad enough.

I'm talking about process-level parallelism, so I don't think
the virtual machine implementation matters here.

Jon

Andrew Pennebaker

Feb 14, 2014, 11:52:59 PM
to puppet...@googlegroups.com, nob...@gmail.com
This is a fantastic idea! Any progress on this?

Trevor Vaughan

Feb 15, 2014, 11:00:30 AM
to puppet...@googlegroups.com, nob...@gmail.com
Wow, that's quite some dead-thread resurrection!

I remember this being discussed in the past, and part of the issue is that so much of what Puppet does is I/O bound, so I'm not really sure what parallelism would gain you outside of destroying your I/O channels.

That said, I think it would be nice to have this supported for those who do have systems that can take advantage of it. We may even find that a -j2 gives just enough balance between I/O destruction and system acceleration.

Trevor









--
Trevor Vaughan
Vice President, Onyx Point, Inc
(410) 541-6699
tvau...@onyxpoint.com

-- This account not approved for unencrypted proprietary information --

Jon Forrest

Feb 15, 2014, 12:35:56 PM
to puppet...@googlegroups.com, nob...@gmail.com


On Friday, February 14, 2014 8:52:59 PM UTC-8, Andrew Pennebaker wrote:
> This is a fantastic idea! Any progress on this?

I'm the original poster. It's been a while since I posted this.
It didn't generate nearly as much interest as
I was hoping. There was some concern that even though
various actions were at the same level in the dependency graph,
they would have side effects that required serial execution.
One example of this is that there can only be one apt-get install at
a time. There are probably others.

So it wouldn't be easy for Puppet to know which actions
it could safely do in parallel.

If any of the experts have anything to add, I'd love
to hear it.

Jon Forrest
 

Trevor Vaughan

Feb 16, 2014, 11:43:53 AM
to puppet...@googlegroups.com, nob...@gmail.com
It seems like that should be a relatively easy issue to overcome, conceptually anyway.

* Add a metaparameter 'parallel' that defaults to 'true' or 'false' depending on the specific native type being managed.
** For instance, YUM and DEB packages (probably most packages) would be mandatory 'false' and non-overridable.
** File types would default to 'true' and be overridable.

In this way, explicit ordering would always work and items that don't need explicit ordering could be done in parallel.
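
Sketched as a manifest, purely hypothetically -- no 'parallel'
metaparameter actually exists in Puppet today:

    file { '/etc/motd':
      content  => "managed by puppet\n",
      parallel => true,   # hypothetical: no shared state, safe to apply concurrently
    }

    file { '/etc/issue':
      content  => "managed by puppet\n",
      parallel => true,   # hypothetical
    }

    package { 'httpd':
      ensure   => installed,
      parallel => false,  # hypothetical and non-overridable: rpm/yum hold a global lock
    }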

This partially comes back to some of the subgraph discussions that we had back in 2010. Once you boil your tree down to a bunch of independent subgraphs, you can execute all of them in parallel so long as all of your resource ordering is correct and internal consistency is met.

In the Red Hat family case: yumrepo -> yum (yum serial) -> explicit dependencies.
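
Expressed with today's real metaparameters, that chain is just (repo
name and URL invented for illustration):

    yumrepo { 'internal':
      baseurl => 'http://repo.example.com/el7',
      enabled => '1',
    }

    package { 'myapp':
      ensure  => installed,
      require => Yumrepo['internal'],  # repo first; yum itself serializes installs
    }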

Could it cause issues? Yes! But that's why these types of things are off by default. For those of us who have the power and want the capability, let's get the gains; those who need a more risk-averse environment simply don't enable it.

I fear that, with the move toward making everything more procedural by default, this may not happen but it would still be nice to have.

Trevor



jcbollinger

Feb 17, 2014, 9:20:37 AM
to puppet...@googlegroups.com


On Sunday, February 16, 2014 10:43:53 AM UTC-6, Trevor Vaughan wrote:

> [...] this may not happen but it would still be nice to have.




Well, I think the question that killed this thread the first time boils down to "would it really?".  The speculation at the time was that parallel execution would produce disappointing wall-time gains, based on the assertion that the catalog application process is largely I/O bound.  There were also some assertions that Ruby doesn't do shared-memory parallelism very well.  Nobody reported any actual analysis of any of that, though.

Whatever benefit there might be needs to be weighed against the costs, which include not just the direct costs of developing the feature, but also the ongoing costs of added code complexity and increased maintenance burden.

Were I PL, I would be very hesitant to devote resources to such a speculative project as I think this would be. Were I a user interested in such a feature and having time available, I might consider having a go at the project myself. Working code trumps predictive analysis every time.


John

Deepak Giridharagopal

Feb 17, 2014, 9:35:57 AM
to puppet...@googlegroups.com, nob...@gmail.com
On Sat, Feb 15, 2014 at 9:00 AM, Trevor Vaughan <tvau...@onyxpoint.com> wrote:
> Wow, that's quite some dead-thread resurrection!
>
> [...]

There was a recent, pretty in-depth discussion in puppet-dev about a related concept: batched application of resources:

 

Jon Forrest

Feb 18, 2014, 12:02:11 PM
to puppet-users
On Mon, Feb 17, 2014 at 6:20 AM, jcbollinger <John.Bo...@stjude.org> wrote:

> Well, I think the question that killed this thread the first time boils down
> to "would it really?". The speculation at the time was that parallel
> execution would produce disappointing wall-time gains, based on the
> assertion that the catalog application process is largely I/O bound. There
> were also some assertions that Ruby doesn't do shared-memory parallelism
> very well. Nobody reported any actual analysis of any of that, though.

Right. Without such analysis it's hard to know if this idea is worth
following up on today.

But one thing to keep in mind is that systems are always changing. An
I/O-bound system of today might not be I/O bound tomorrow as technological
improvements appear. Having a computer whose available resources can't be
applied to a Puppet run (or anything else) is wasteful. In time, the lack
of client parallelization could become a competitive weakness as Puppet
competes in the marketplace. (I don't know what the status of client
parallelization is in the competition right now.)

Jon Forrest

Michael Shaw

Jan 30, 2017, 7:40:24 AM
to Puppet Users, jlfo...@berkeley.edu
The very clear use case for me is in the puppet management of infrastructure.

Take the case of puppet being used to automate the creation of a database server. The command to initiate the creation takes around 2 seconds to complete. The process to actually create the database can take 5 minutes. This is a problem when you want to write a suitable provider.

You can treat the resource as complete as soon as the creation command returns. Nice and quick. However, you then cannot have anything else in your puppet manifest that depends on this database server actually existing and being accessible.

Or you could block on the creation, polling every 5 seconds to confirm the current state of the creation task. This is robust, and ensures that when the creation task completes, the database is available for other puppet actions. This is great if you are only managing 1 database server. What if there are 5 or 6? Their creation happens sequentially, making for a very long-running script.
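
For concreteness, the polling variant can be approximated today with
exec's built-in retry knobs ('mydb-cli' and its subcommands are
invented for illustration):

    exec { 'create-db-app1':
      command => '/usr/local/bin/mydb-cli create --name app1',
      unless  => '/usr/local/bin/mydb-cli exists --name app1',
    }

    exec { 'wait-for-db-app1':
      command   => '/usr/local/bin/mydb-cli status --name app1 | grep -q READY',
      provider  => shell,               # the pipe needs a shell
      path      => ['/bin', '/usr/bin'],
      tries     => 60,                  # poll up to 60 times...
      try_sleep => 5,                   # ...5 seconds apart, ~5 minutes max
      require   => Exec['create-db-app1'],
    }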

The best outcome would be a way to write a provider that declares all instances of this type independent, so that we can run X of them in parallel.

Kind regards,

Michael

Henrik Lindberg

Jan 30, 2017, 12:06:40 PM
to puppet...@googlegroups.com
On 30/01/17 12:58, Michael Shaw wrote:
> [...]

We are very much aware of use cases like this, where the parallelism is
mostly waiting from puppet's perspective, and waiting is by far
faster to do in parallel :-)

It is, however, very difficult to implement given the current way the
types and providers framework is built. To start with, it is
complex to make different providers talk to each other and communicate
about a dynamic (evolving) intermediate state. To make it work you would
need to model the different steps in the workflow as separate resources,
thus enabling waiting on an earlier step to finish (a join point) if you
need that.
You could also implement the provider as an external service that does
the actual work, and have providers in puppet talk to this
asynchronous service (a *lot* of work).

A complicating factor is that Puppet is inherently single-threaded.
There are many constructs that are not safe for multithreading.

I would very much like us to have a types & providers framework that can
do this but it requires a major new take and a new implementation.

- henrik


--

Visit my Blog "Puppet on the Edge"
http://puppet-on-the-edge.blogspot.se/
