puppet catalog compilation job queue idea


Patrick Hemmer

Jan 9, 2014, 7:23:23 PM
to puppet...@googlegroups.com
There's been an idea floating in my mind for quite a while now about using a job queue for compiling puppet catalogs. I just mentioned the idea on IRC and a few people really liked the idea, so I thought I'd bring it up here and get other thoughts on it.

The idea is that instead of puppet masters compiling catalogs on demand, they would operate out of a job queue (a rough sketch follows the list):
  • When a node's cert is signed, the compilation queue gets a job for that node.
  • When a compile job finishes, that node gets added to the queue at the very end. This results in the puppet master constantly compiling catalogs.
  • When a catalog changes, the puppet master notifies the node that it needs to fetch the updated catalog and do a puppet run.
  • When an exported resource changes, any nodes which collect that resource should get their compilation jobs moved to the front of the queue. (This should be optional, as the behavior might not be desired)
  • Nodes would still run puppet out of a cron job, but they would use the cached catalog, not request one from the master.
  • If any of the facts used in the catalog change, the node would notify the master and the compilation job would move to the front of the queue.
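To make the queue behaviour concrete, here is a minimal sketch of the cycle described above (Python, purely illustrative; the queue backend, the notify_agent callback, and wrapping the compiler as `puppet master --compile <node>` are assumptions of mine, not existing Puppet features):

    # Illustrative sketch of the proposed compile queue (not part of Puppet).
    import collections
    import hashlib
    import subprocess

    queue = collections.deque()   # node names; the oldest job sits at the left
    cached = {}                   # node -> sha256 of the last compiled catalog

    def compile_catalog(node):
        # Stand-in for the master's compiler; the original post suggests
        # shelling out to something like `puppet master compile`.
        result = subprocess.run(["puppet", "master", "--compile", node],
                                capture_output=True, text=True, check=True)
        return result.stdout

    def notify_agent(node):
        print(f"notify {node}: fetch the new catalog and do a puppet run")  # placeholder

    def on_cert_signed(node):     # a newly signed node enters the cycle
        queue.append(node)

    def on_priority_event(node):  # fact change or exported-resource change
        if node in queue:
            queue.remove(node)
        queue.appendleft(node)    # jump to the front of the queue

    def work_one_job():
        node = queue.popleft()
        catalog = compile_catalog(node)
        digest = hashlib.sha256(catalog.encode()).hexdigest()
        if cached.get(node) != digest:   # the catalog changed since the last cycle
            cached[node] = digest
            notify_agent(node)
        queue.append(node)               # re-enqueue at the very end

A real implementation would persist the queue in a message broker and run several such workers in parallel, but the cycle -- pop, compile, compare, notify, re-enqueue -- is the core of the idea.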
In the world of cloud computing, this becomes extremely beneficial, though it still is advantageous for traditional datacenter environments:
  • Puppet runs are computationally expensive for the client. With this design, the client could have its cron job set to a very infrequent value (once per hour or so). This way small instances without many available resources won't waste them on puppet runs that don't change anything.
  • The masters could be very beefy instances with a lot of CPU. By constantly generating catalogs, you ensure that the system's resources aren't sitting idle and being wasted.
  • By using a queuing mechanism, you can determine how loaded the masters are. If the oldest job in the queue is X minutes old, you can spin up another master.
  • With the current method of generating catalogs on demand, if a lot of nodes request a catalog at the same time, it can cause compilations to go very slowly. If they go too slowly, the client will get a timeout. The client will then go back to the master to request another catalog even though the first compilation completed fine; it just took a while. With queuing, the master can compile exactly X catalogs at a time. It can even be configured to only start a new compilation job if the system load is less than X (see the sketch after this list).
  • Since puppet has to serve up both files and catalogs, if all the available processes are used compiling catalogs, requests for files end up hanging. By moving catalog compilations to a background worker, file requests will be faster.
    (you can implement a workaround for this: I have 2 puppet master pools, catalog requests get routed to one pool, everything else to the other pool, but this isn't a simple or standard design)
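As a rough, self-contained sketch of the load-gating and queue-age ideas above (hypothetical; the thresholds are arbitrary and `os.getloadavg` is just the standard Unix load average):

    # Illustrative only: gate new compile jobs on system load, and use the age
    # of the oldest queued job as the signal to add another master.
    import collections
    import os
    import time

    MAX_LOAD = 8.0               # "only start a new compilation if load is less than X"
    SCALE_UP_AGE = 15 * 60       # "if the oldest job is X minutes old, spin up another master"

    queue = collections.deque()  # entries are (node_name, time_enqueued)

    def enqueue(node):
        queue.append((node, time.time()))

    def compile_catalog(node):   # stand-in for the real compile step
        print(f"compiling catalog for {node}")

    def worker_loop():
        while True:
            if queue and time.time() - queue[0][1] > SCALE_UP_AGE:
                print("queue is falling behind; tell the autoscaler to add a master")
            if os.getloadavg()[0] >= MAX_LOAD:
                time.sleep(5)    # the master is already busy; hold off on new compiles
                continue
            if not queue:
                time.sleep(1)
                continue
            node, _ = queue.popleft()
            compile_catalog(node)
            enqueue(node)        # back to the end of the queue, as in the proposal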
Now most of this could be done on top of puppet. You could create a system which reads from a queue, performs a `puppet master compile`, notifies the client if the catalog changes, etc. But there are a few sticky points it wouldn't be able to handle (these are just missing features; none would prevent the system from working):
  • There is no way of determining which facts were used in a catalog compilation. A client could issue a catalog compilation request when any fact changes, but facts like `uptime_seconds` always change, so it would always result in a catalog compilation. The proper way to handle this would be to monitor when a fact variable is read, then add a list of "facts used" and their values to the resulting catalog. The puppet agent could then use that list to see if the facts used have changed (a rough sketch of how this might work appears after this list).
    This may not be a significant issue though. If the `puppet agent` cron job is set to something very infrequent, it won't be requesting catalog compilations very often.
  • The catalog doesn't indicate whether a resource was collected from another node. So when an exported resource changes, you wouldn't be able to find nodes which collect those exported resources and recompile those catalogs. Now this isn't a big deal since the current method of using cron jobs can't even do this.
    The catalog does indicate that a resource is exported, so you could look for nodes with resources of the same type & title as the exported resource. But it's possible for a normal resource to have the same type & title as an exported resource, as long as the exported resource isn't collected during the same catalog compilation, so this method might end up recompiling catalogs which don't actually use the exported resource.
Puppet could probably be monkey patched to address these and add the data to the catalog. The script system could then extract the info, while the puppet agent would just ignore the data.
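To illustrate the "facts used" idea (entirely hypothetical; puppet does not record this today, which is the whole point above): the compiler would note each fact it actually reads, embed those names and values in the catalog, and the agent would compare them against its current facts before requesting a recompile. A toy sketch:

    # Hypothetical sketch: record which facts a compilation reads, then let the
    # agent decide whether a recompile is even worth requesting.
    class FactRecorder:
        """Wraps the fact hash and remembers which keys were actually read."""
        def __init__(self, facts):
            self._facts = facts
            self.used = {}

        def __getitem__(self, name):
            value = self._facts[name]
            self.used[name] = value   # e.g. 'osfamily', but never 'uptime_seconds'
            return value

    def compile_catalog(facts):
        """Toy 'compilation' that only consults one fact."""
        recorder = FactRecorder(facts)
        resources = {"package[ntp]": {"ensure": "present"}}
        if recorder["osfamily"] == "Debian":
            resources["service[ntp]"] = {"ensure": "running"}
        # The catalog carries the facts (and values) it depended on.
        return {"resources": resources, "facts_used": recorder.used}

    def agent_needs_recompile(cached_catalog, current_facts):
        """Agent-side check: only the facts the catalog actually used matter."""
        return any(current_facts.get(name) != value
                   for name, value in cached_catalog["facts_used"].items())

    catalog = compile_catalog({"osfamily": "Debian", "uptime_seconds": 12345})
    print(agent_needs_recompile(catalog, {"osfamily": "Debian", "uptime_seconds": 99999}))  # False
    print(agent_needs_recompile(catalog, {"osfamily": "RedHat", "uptime_seconds": 99999}))  # True

Since `uptime_seconds` is never read, it never ends up in `facts_used` and so never triggers a pointless recompilation.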


Thoughts? I might be up for coding this, but it would be low on my priority list and I wouldn't get to it for a long time.


-Patrick

jcbollinger

Jan 10, 2014, 3:57:03 PM
to puppet...@googlegroups.com


The key idea here seems to be to improve the master's average response time for catalog requests by pre-compiling and caching catalogs, then serving those cached catalogs on demand.  There is a secondary idea of the master automatically keeping its cache fresh, and a tertiary idea of the master hinting to agents when it thinks their most recently-retrieved catalog may be out of date.  There is a separate, but coupled idea that the agent might keep track of when significant facts change, and thereby be better informed from its side whether its catalog is out of date.

Fundamentally, this is really about catalog caching.  The specific suggestion to implement the master's part of it via a continuously cycling queue of compilation jobs is mostly an implementation detail.  Although it's hard to argue with the high-level objectives, I'm not enthusiastic about the proposed particulars.

As far as I can determine, the main point of a continuously-cycling queue of compile jobs is to enable the master to detect changes in the compiled results.  That's awfully brute-force.  Supposing that manifests, data, and relevant node facts change infrequently, compilation results also will rarely change, and therefore continuously recompiling catalogs would mostly be a waste of resources.  The master could approach the problem a lot more intelligently by monitoring the local artifacts that contribute to catalog compilation -- ideally on a per-catalog basis -- to inform it when to invalidate cached catalogs.  Exported resource collections present a bit of a challenge in that regard, but one that should be surmountable.
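For illustration, monitoring the contributing artifacts could be as simple as remembering, per catalog, the mtimes of the manifest and data files the compilation touched (a hypothetical sketch; actually capturing that file list from the compiler is the hard part and is simply assumed here):

    # Hypothetical sketch: per-catalog cache invalidation based on the mtimes of
    # the manifest/data files that went into the compilation.
    import os

    class CatalogCache:
        def __init__(self):
            self._entries = {}   # node -> (catalog, {path: mtime at compile time})

        def store(self, node, catalog, source_files):
            mtimes = {path: os.path.getmtime(path) for path in source_files}
            self._entries[node] = (catalog, mtimes)

        def fetch_if_fresh(self, node):
            entry = self._entries.get(node)
            if entry is None:
                return None
            catalog, mtimes = entry
            for path, mtime in mtimes.items():
                try:
                    if os.path.getmtime(path) != mtime:
                        return None      # a contributing file changed: stale
                except FileNotFoundError:
                    return None          # a contributing file disappeared: stale
            return catalog               # nothing changed; serve from cache

Facts, exported-resource collections, and arbitrary function results are not covered by a file-based check like this, which is exactly where it stops being simple.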

Furthermore, if we accept that master-side catalog caching will be successful in the first place, then it is distinctly unclear that compiling catalogs prospectively would provide a net advantage over compiling them on demand.  Prospective compilation's only real advantage over on-demand compilation is better load leveling in the face of clumped catalog requests (but only where multiple requests in the clump cannot be served from cache).  On the other hand, prospective compilation will at least occasionally yield slower catalog service as a catalog request has to wait on other catalog compilations.  Also, prospective compilation will sometimes devote resources to compiling catalogs that ultimately go unused.


John

Patrick

Jan 12, 2014, 8:31:03 PM
to puppet...@googlegroups.com
The key idea is also about not wasting resources. The clients waste a lot of resources doing puppet runs that change nothing. The puppet master is generally a box meant to compile catalogs. It's going to have the resources, and so it should spend its time doing just that so the clients don't waste their time.
You can have a hundred clients frequently doing puppet runs because they can't know if the catalog has been updated (which also uses the master's resources), or you can have a few masters chewing through the catalogs and notifying the clients when they need to run. Which do you think is more efficient?
 
Fundamentally, this is really about catalog caching.  The specific suggestion to implement the master's part of it via a continuously cycling queue of compilation jobs is mostly an implementation detail.  Although it's hard to argue with the high-level objectives, I'm not enthusiastic about the proposed particulars.

As far as I can determine, the main point of a continuously-cycling queue of compile jobs is to enable the master to detect changes in the compiled results.  That's awfully brute-force.  Supposing that manifests, data, and relevant node facts change infrequently, compilation results also will rarely change, and therefore continuously recompiling catalogs would mostly be a waste of resources.  The master could approach the problem a lot more intelligently by monitoring the local artifacts that contribute to catalog compilation -- ideally on a per-catalog basis -- to inform it when to invalidate cached catalogs.  Exported resource collections present a bit of a challenge in that regard, but one that should be surmountable.
There is no way around it. You can't just rely upon facts or exported resources to change before re-compiling a catalog. Let's say you have a function that generates different results based on some external variable. No fact or exported resource has changed, but the catalog will be different. While this may not be common, I think it would be a bad idea to build an implementation that can't handle it.
However it would be trivial to change it so that instead of constantly compiling catalogs, it re-compiles any catalog that is more than X minutes old. This would be equivalent to the puppet agent cron job on a client.
 

Furthermore, if we accept that master-side catalog caching will be successful in the first place, then it is distinctly unclear that compiling catalogs prospectively would provide a net advantage over compiling them on demand.  Prospective compilation's only real advantage over on-demand compilation is better load leveling in the face of clumped catalog requests (but only where multiple requests in the clump cannot be served from cache).  On the other hand, prospective compilation will at least occasionally yield slower catalog service as a catalog request has to wait on other catalog compilations. 
I completely lost you. Why would it be slower? Pre-compiling catalogs means the client doesn't have to wait at all. Instead of having to wait 30 seconds for a catalog to be compiled, it'll take less than 1 second, whatever the response time is to fetch it from the cache and transfer it over the network.
 
Also, prospective compilation will sometimes devote resources to compiling catalogs that ultimately go unused.
That's the goal. If resources are unused, then they're being wasted.
 
 

John

jcbollinger

Jan 13, 2014, 11:17:00 AM
to puppet...@googlegroups.com


On Sunday, January 12, 2014 7:31:03 PM UTC-6, Patrick wrote:


On Friday, January 10, 2014 3:57:03 PM UTC-5, jcbollinger wrote:

The key idea here seems to be to improve the master's average response time for catalog requests by pre-compiling and caching catalogs, then serving those cached catalogs on demand.  There is a secondary idea of the master automatically keeping its cache fresh, and a tertiary idea of the master hinting to agents when it thinks their most recently-retrieved catalog may be out of date.  There is a separate, but coupled idea that the agent might keep track of when significant facts change, and thereby be better informed from its side whether its catalog is out of date.

The key idea is also about not wasting resources. The clients waste a lot of resources doing puppet runs that change nothing. The puppet master is generally a box meant to compile catalogs. It's going to have the resources, and so it should spend its time doing just that so the clients don't waste their time.
You can have a hundred clients frequently doing puppet runs because they can't know if the catalog has been updated (which also uses the master's resources), or you can have a few masters chewing through the catalogs and notifying the clients when they need to run. Which do you think is more efficient?
 


Continuously recompiling catalogs on the master is not a good way to approach the issue of client-side resource consumption.

For one thing, the client already caches a copy of its most recently applied catalog.  It would be straightforward to have the client check its own cached catalog against the new one to see whether anything changed.  That would be simpler to implement, and would spread out the workload more evenly over the whole site.
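For illustration only, that client-side check could be a hash over a normalized form of the catalog, ignoring volatile metadata (the field names below are assumptions, not the actual catalog format):

    # Hypothetical sketch: the agent decides locally whether the newly fetched
    # catalog differs from the one it last applied.
    import hashlib
    import json

    def catalog_fingerprint(catalog):
        # Drop volatile metadata and hash the rest in key-sorted form, so that
        # semantically identical catalogs compare equal.
        stable = {k: v for k, v in catalog.items() if k not in ("version", "timestamp")}
        return hashlib.sha256(json.dumps(stable, sort_keys=True).encode()).hexdigest()

    def catalog_changed(cached_catalog, new_catalog):
        return catalog_fingerprint(cached_catalog) != catalog_fingerprint(new_catalog)

    old = {"version": 1389650000, "resources": {"package[ntp]": {"ensure": "present"}}}
    new = {"version": 1389736400, "resources": {"package[ntp]": {"ensure": "present"}}}
    print(catalog_changed(old, new))   # False: only the volatile version field differs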

More importantly, however, a change in catalog is by no means the only thing that requires the agent to act.  One of its key responsibilities is to maintain the node's target state in the face of unwanted changes applied by other processes.  Checking whether it needs to do that for any declared resource is an essential component of each catalog run -- indeed, that is the only component in the common case that nothing does need to be changed.  In other words, that's usually what consumes the bulk of the agent's runtime.  There are only two ways to reduce that, both of which are largely orthogonal to catalog changes:
  1. Improve nodes' catalogs to be less burdensome, and
  2. Increase the agent's run interval.
Note in particular that (2) is not much related to the timeliness of obtaining changed catalogs.  If rapid response to catalog changes is essential then you already need some means to trigger catalog runs outside of or instead of scheduled periodic runs, and in that case the scheduled run interval isn't even relevant to catalog freshness.  On the other hand, if rapid response to catalog changes is not essential then that's not a big factor in choosing a run interval.

Moreover, where catalog runtimes are large, there are usually a lot of gains available via improving catalogs (item 1).  That's usually the best solution to excessive agent runtimes.

 
Fundamentally, this is really about catalog caching.  The specific suggestion to implement the master's part of it via a continuously cycling queue of compilation jobs is mostly an implementation detail.  Although it's hard to argue with the high-level objectives, I'm not enthusiastic about the proposed particulars.

As far as I can determine, the main point of a continuously-cycling queue of compile jobs is to enable the master to detect changes in the compiled results.  That's awfully brute-force.  Supposing that manifests, data, and relevant node facts change infrequently, compilation results also will rarely change, and therefore continuously recompiling catalogs would mostly be a waste of resources.  The master could approach the problem a lot more intelligently by monitoring the local artifacts that contribute to catalog compilation -- ideally on a per-catalog basis -- to inform it when to invalidate cached catalogs.  Exported resource collections present a bit of a challenge in that regard, but one that should be surmountable.
There is no way around it. You can't just rely upon facts or exported resources to change before re-compiling a catalog. Let's say you have a function that generates different results based on some external variable. No fact or exported resource has changed, but the catalog will be different. While this may not be common, I think it would be a bad idea to build an implementation that can't handle it.


It would be possible to keep a digest for each catalog compilation of everything that could possibly change the result, even function calls.  At some point it would cease to provide much performance advantage to do so, however, and before that it would probably become prohibitive to develop and maintain.  So I'll accept that it's not practical for the master to determine whether a cached catalog is stale, except by recompiling.

But how, then, is it any more reasonable to serve a cached catalog that was built via a continuously-cycling catalog compiler?  The master still cannot guarantee that it is up to date.  We have just established that demand-built catalogs are the only ones we can be confident are up to date as of the time of the request.

 
However it would be trivial to change it so that instead of constantly compiling catalogs, it re-compiles any catalog that is more than X minutes old. This would be equivalent to the puppet agent cron job on a client.
 


Roughly.  It would also compile catalogs for nodes that have been taken out of service, nodes with an intentionally longer run interval (or that run the agent itself only on demand), and probably for other nodes I haven't thought of that otherwise would not have catalogs compiled for them.  And the master still could not serve those catalogs with confidence that they were up to date.

 

Furthermore, if we accept that master-side catalog caching will be successful in the first place, then it is distinctly unclear that compiling catalogs prospectively would provide a net advantage over compiling them on demand.  Prospective compilation's only real advantage over on-demand compilation is better load leveling in the face of clumped catalog requests (but only where multiple requests in the clump cannot be served from cache).  On the other hand, prospective compilation will at least occasionally yield slower catalog service as a catalog request has to wait on other catalog compilations. 
I completely lost you. Why would it be slower? Pre-compiling catalogs means the client doesn't have to wait at all. Instead of having to wait 30 seconds for a catalog to be compiled, it'll take less than 1 second, whatever the response time is to fetch it from the cache and transfer it over the network.


The client has to wait if its facts have changed, unless it's willing to accept a known-stale catalog.  It also has to wait (or should) if the master knows its cached catalog is stale, but the recompilation job hasn't yet reached the front of the queue. Even if the request gets pushed to the front of the queue in such cases (which doesn't necessarily work well when the master is hit by a bunch of requests at nearly the same time), the continuously-running compiler almost certainly is initially busy compiling some other node's catalog.  The client therefore has to wait longer than it otherwise would.

And of course, that all assumes that the master actually can tell whether a node's cached catalog is stale, which I've agreed is impractical.
 
 
Also, prospective compilation will sometimes devote resources to compiling catalogs that ultimately go unused.
That's the goal. If resources are unused, then they're being wasted.
 


If resources go to building something that goes unused then they are certainly wasted.  Some resources, however, are conserved for future use if they go unused now, or can be used for other purposes if not allocated to catalog compilation, or carry a monetary, environmental, or other cost to use that would be better avoided where possible.


John

Felix Frank

Jan 19, 2014, 5:48:29 PM
to puppet...@googlegroups.com
Hi,

you both raise a couple of good points.

All things considered, I lean towards John's point of view. There's much
to say for on-demand compilation.

- resource use scales with number of agents
- scaling can be influenced via intervals
- admins can predict the need for recompilation and trivially schedule
it by having agents check in

Yes, there can be peak load when many agents check in during a short
interval. I can't really see a way around that. I would consider it more
problematic to make it impossible to trigger a couple of quick
recompilations for a large number of nodes (think ad hoc changes to a
larger cluster or cloud).

Cheers,
Felix

SG Madurai

Mar 6, 2016, 8:50:32 PM
to Puppet Users
Hi, 

Did we further explore options/alternatives with respect to pre-caching of catalogs by the puppet master to improve agent run times?
Or was this never looked at again after this discussion?

I see an accepted ticket from Robin Bowes: https://projects.puppetlabs.com/issues/4486
and am trying to find the current status of this 'Accepted' feature request.

jcbollinger

Mar 7, 2016, 9:29:05 AM
to Puppet Users


On Sunday, March 6, 2016 at 7:50:32 PM UTC-6, SG Madurai wrote:
Hi, 

Did we further explore options/alternatives with respect to pre-caching of catalogs by the puppet master to improve agent run times?
Or was this never looked at again after this discussion?

I see an accepted ticket from Robin Bowes: https://projects.puppetlabs.com/issues/4486
and am trying to find the current status of this 'Accepted' feature request.



As far as I can tell, ticket 4486 was never transitioned to the new ticket system.  I have no explanation for that.  Meanwhile, however, Puppet has gone through two major version number increments and substantial concomitant architectural changes, with a great deal of attention to the performance of the master.

As far as I am aware, the master still does not serve catalogs from a cache on its side, but I don't find that particularly surprising, because my conclusion from the previous round of discussion was that doing so is not practical -- at least, not if one wants to be confident of serving catalogs equivalent to demand-built ones.

In any event, catalog compilation is rarely the determining factor in agent runtimes.  This may be another reason why the feature request you referenced got little love, despite being accepted.


John

SG Madurai

Mar 7, 2016, 8:57:33 PM
to Puppet Users
Hi John, thank you for the update.

Pardon me if I am asking about things that have been clarified/settled already.

From what I understand, agent run times are primarily determined by
- catalog compilation time at the master
- the time for the agent to apply the catalog on its node

So I was basically wondering if there is an option to separate these two functions and manage them independently of each other (at times convenient for each of these activities).

If these concerns shouldn't arise when running multiple puppet masters w/ puppet db (or by simply upgrading... we are on v3.8 btw), then I will explore that option first.

I couldn't be sure whether these configuration options (multiple puppet masters w/ puppet db) by themselves can take care of the issues we are facing with agent runs in our environment
(timeouts, slowness...).

We have one puppet master (v3.8) managing 150-200 nodes in an environment.

BEFORE actually implementing this setup (multiple puppet masters w/ puppet db) in our environment, I wanted to see if this is all there is to do to fix these timeouts/delays we see in our agent runs.


PS:
I did read through http://www.aosabook.org/en/puppet.html but I am still new to puppet.

I am looking into the right way to configure puppet for our environment, and have yet to find performance tuning guides/tips for a puppet-managed environment.
Something insightful about deployment scenarios/options, and their relative advantages, would help.

I understand this is open source to start with, and I can't expect *everything* from the community, but I am just looking to skip some trial-and-error cycles where community insight already exists.

jcbollinger

Mar 8, 2016, 9:51:31 AM
to Puppet Users


On Monday, March 7, 2016 at 7:57:33 PM UTC-6, SG Madurai wrote:
Hi John, thank you for the update.

Pardon me if I am asking about things that have been clarified/settled already.

From what I understand, agent run times are primarily determined by
- catalog compilation time at the master
- the time for the agent to apply the catalog on its node



Both of those are contributors.  The former is rarely a major one.  There is also time spent by the agent computing facts, which is usually even less, but it can be significant if expensive custom facts are installed.

Also, catalog application often is not an agent-only activity, as it commonly involves the agent obtaining files from the master's file server.  This can be very expensive for both the agent and the master.

 
So I was basically wondering if there is an option to separate these two functions and manage them independently of each other (at times convenient for each of these activities).



Nodes have as much control as they want to exercise over when and how often they perform catalog runs.  If they run the agent in daemon mode then they can configure the run interval, but they also have the option of running it at the times they choose via a scheduler, such as cron, or on demand, either manually or via a remote-control system such as MCollective.

The master does perform some caching to speed catalog building, but as I already said, it is impractical for it to cache whole catalogs for direct service to clients.  The problem here lies in determining accurately and efficiently when cached catalogs are stale.

 
If these concerns shouldn't arise when running multiple puppet masters w/ puppet db (or by simply upgrading... we are on v3.8 btw), then I will explore that option first.


If your master(s) do not adequately serve the catalog request load, then the quickest solution is often to empower them by running more puppetmaster threads, adding CPU, adding RAM, increasing network bandwidth, and/or shutting down other services.  "Shutting down other services" might include moving PuppetDB to a separate machine.  Do also attend to the possibility of uneven load: some kinds of site configurations lend themselves to highly uneven load on the master, such that it sometimes gets transiently overloaded even though it has sufficient capacity for its average load.

If individual catalog compilations are taking a long time, then it is probably worthwhile investigating why that is.  It may well be the case that you can realize substantial improvements by modifying your manifest set.  If the master is bogged down at the file server then you are probably managing either large numbers of files or very large files, or both, in an inefficient way; this is an area where it is relatively easy to shoot yourself in the foot.

If none of those alternatives yield the catalog service bandwidth you need, then the next logical step is multiple masters.
 

I couldn't be sure whether these configuration options (multiple puppet masters w/ puppet db) by themselves can take care of the issues we are facing with agent runs in our environment
(timeouts, slowness...).

We have one puppet master (v3.8) managing 150-200 nodes in an environment.


That's a fairly substantial load for a single master, but whether it's at or beyond the capacity you should expect depends greatly on your manifests, data, and nodes.

In any event, you started off in the wrong direction by asking about agent run times.  If agents' catalog requests are being serviced slowly, and especially if they sometimes time out, then your problem is likely to be an overtaxed master.  Long catalog-building times can contribute to that, but so can the overall number of requests, uneven load, competing jobs, and other factors.

 

BEFORE actually implementing this setup (multiple puppet masters w/ puppet db) in our environment, I wanted to see if this is all there is to do to fix these timeouts/delays we see in our agent runs.



As with any optimization problem, you are best off proceeding in a manner that is informed by data about what parts of the process are slowest.  To that end, you could consider enabling the built-in profiler on the master.  You should also look at the overall load on the machine -- are you maxing out your available CPU? your network bandwidth?  your physical RAM?

Also, multiple masters and PuppetDB are separate considerations.  If you are not using PuppetDB then you probably should be using it, especially if you rely on exported resources, regardless of whether you have multiple masters.  Also, if you are still running the master as a Rack application -- for example, under Apache / Passenger -- then you should consider running it under Puppet Server instead.


John

R.I.Pienaar

Mar 8, 2016, 10:02:40 AM
to puppet-users
I believe the thing that's happening here is called Direct Puppet; there
were some PuppetConf talks about this, so you might want to look at the videos.

But it's about reworking the compile flow so you can pre-compile things, re-run
earlier compiled things, redo the static catalogs, and even rewrite the compiler
in C++.

There is stuff happening on Jira at the moment, but I'd guess lots of this will
be PE-only if recent blogs are anything to go by.
> that is informed by *data* about what parts of the process are slowest. To
> that end, you could consider enabling the built-in profiler on the master
> <https://puppetlabs.com/blog/tune-puppet-performance-profiler>. You should
> also look at the overall load on the machine -- are you maxing out your
> available CPU? your network bandwidth? your physical RAM?
>
> Also, multiple masters and PuppetDB are separate considerations. If you
> are not using PuppetDB then you probably should be using it, especially if
> you rely on exported resources, regardless of whether you have multiple
> masters. Also, if you are still running the master as a Rack application
> -- for example, under Apache / Passenger -- then you should consider
> running it under Puppet Server
> <http://docs.puppetlabs.com/puppetserver/1.1/> instead.
>
>
> John
>

SG Madurai

Mar 8, 2016, 8:20:51 PM
to Puppet Users
Hi John, thank you for the explanation, these are very useful insights!

Ok, let me follow this through, see where I get with it, and update you all here.

SG Madurai

Mar 14, 2016, 5:03:14 AM
to Puppet Users


Hi Pienaar, thank you for noting that; I will check this out a little later. Direct Puppet does sound very interesting
(but as of now we are on OSS, so let's see).

Eric Sorenson

Mar 15, 2016, 1:53:43 PM
to Puppet Users
The first and most significant chunk of the Direct Puppet work, namely a production-ready version of "static catalogs", is going out in Puppet 4.4.0. You can preview the documentation for it here: https://github.com/puppetlabs/puppet-docs/blob/master/source/puppet/4.4/reference/static_catalogs.md

Future work is around the other stuff you mentioned - especially precompilation. 

--eric0

jcbollinger

Mar 16, 2016, 11:08:17 AM
to Puppet Users


On Tuesday, March 15, 2016 at 12:53:43 PM UTC-5, Eric Sorenson wrote:
The first and most significant chunk of the Direct Puppet work, namely a production-ready version of "static catalogs", is going out in Puppet 4.4.0. You can preview the documentation for it here: https://github.com/puppetlabs/puppet-docs/blob/master/source/puppet/4.4/reference/static_catalogs.md



I was a bit skeptical, as you might guess from my other comments on this thread, but the static catalogs piece looks pretty cool.  It's not quite what I expected from the name, but it makes a lot of sense, and it looks useful.  I can see some directions one might go from there.

 
Future work is around the other stuff you mentioned - especially precompilation. 



I guess I'll need to go read up on Direct Puppet.  I see PUP-4889, but are there any flat(ter) documents for the pieces other than static catalogs?  I'm particularly keen to know how the system is going to address the problem of catalog freshness, which seems like it would be in the "Catalog Metadata" regime.


Thanks,
John
