Could use another set of eyes to assist

Jeremy

Sep 27, 2012, 12:56:28 PM

to puppet...@googlegroups.com

I've got a puppet module I've written to support deploying a custom PHP web application that a client has developed. The actual application and the fact that it's deployed within AWS is not the problem or important to the issue. I'm looking to see if someone else can think of a better way to implement what I have done that is more efficient and improves the catalog rendering time.

I've placed the module into a GitHub repository at (https://github.com/jbouse/puppet-app1). The app is deployed based on the contents of a YAML file that is retrieved by the class. Initially deploying the sites defined was not an issue and was actually very fast. The problems got introduced in handling the dependent components as defined in the YAML file. I wrote several parser functions in an effort to identify only the unique components and versions needed and generate titles that would not conflict. I fully admit and consider what I've done a hack and I'm really open to hearing some suggestions on how to make it less of one.

jcbollinger

Sep 28, 2012, 11:38:03 AM

to puppet...@googlegroups.com

I have been looking over your module, and so far I don't see any smoking gun that would clearly explain why catalog compilation takes so long, but I do have a guess (see below). The code seems generally clean, well-organized, and nicely documented, but I have a few comments that perhaps you will find helpful. In no particular order:

- You use a lot of constructs of this form: $somedir = inline_template("<%= File.join('${parent}', 'foo') %>"). That's a lot uglier and heavier than $somedir = "${parent}/foo", and it gains you nothing: the template is always going to be evaluated on the master, which cannot run on Windows, so the file separator is always going to be '/'.
- It is harmless, but unnecessary, for a File to explicitly 'require' other File resources representing its parent or other ancestor directory. Puppet will generate these relationships automatically if you do not specify them explicitly.
- You evidently have a separate module "common" containing a definition named "common::archive::tar-gz". Names in your Puppet manifests should not contain the hyphen (-) -- it works in some places, in some versions, but not in others. You would be wise to avoid it altogether, perhaps by replacing it with an underscore (_).
- If there is one thing to be most suspicious of, it would be your app1_deployment function's use of "YAML.load(open(args[0]))". Some of the other code in your module leads me to suspect that the file it's opening may not be local. (And if it were local, then you would probably just put the data in the hiera store you are using for other data.) If you are indeed retrieving that file over the network then the time to do so could easily dominate your compilation times, and network slowness or outage could make your compilations time out or simply fail.
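
One way to check that last point is to time the fetch and the parse as separate steps. What follows is only a minimal Ruby sketch, not code from the module: timed_load and slow_reader are hypothetical names, and the reader lambda stands in for whatever open() call the function really makes (here it just sleeps to simulate a slow remote source).

```ruby
require 'yaml'

# Hypothetical helper, not part of the module: runs the fetch and the parse
# as separate steps and reports how long each one took.
# `reader` is any callable that returns the raw YAML text.
def timed_load(reader)
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }

  started = clock.call
  raw = reader.call              # I/O: local read or network fetch
  fetched = clock.call
  data = YAML.safe_load(raw)     # CPU: parsing only
  parsed = clock.call

  { data: data, fetch_seconds: fetched - started, parse_seconds: parsed - fetched }
end

# Stand-in reader that simulates a slow remote source (assumption: the real
# one would read from the path or URL passed to the parser function).
slow_reader = lambda do
  sleep 0.2
  "sites:\n  www:\n    components:\n      core: '1.2'\n"
end

result = timed_load(slow_reader)
printf("fetch %.3fs, parse %.3fs\n", result[:fetch_seconds], result[:parse_seconds])
```

If the fetch figure dwarfs the parse figure against the real source, the network is the place to look first.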

Good luck,

John

Jeremy T. Bouse

Sep 28, 2012, 11:53:32 AM

to puppet...@googlegroups.com

John,

Your observations were pretty much on target. The common module does have the define that handles retrieving and extracting the tarballs to a target directory and has worked perfectly for quite some time. I was re-designing the module that deploys the web app from their old single tarball to a multi-tarball deployment model so now it just gets called more. I hadn't heard of the issue with the hyphen but I'll take it under advisement and adjust.

The use of the "YAML.load(open(args[0]))" call was in fact to support both local and network files. In this case I'm actually giving it an authenticated S3 bucket URL to retrieve the file, as the engineers releasing the code also upload the deployment YAML file to the S3 bucket. The tarballs that are deployed are also in the S3 bucket, and are also passed as authenticated URLs in the catalog with an expiration equal to the catalog expiration time. I'd like to eventually modify it to retrieve the deployment file and store it locally only when it's been modified, but I want to keep it in S3 as that allows my Puppet master to operate as a black box that engineers have no access to. If I control the deployment file locally, they claim I'm the bottleneck slowing them down. So as long as I give them the means to update it, the process flow is error free, and the only problems encountered are when they screw up the deployment file contents, accountability is maintainable.

My thought on the "smoking gun" is in having to make the parser function calls to try and determine the unique components and unique versions in the case of a component with multiple versions needing to be deployed. This was the quickest way I could find to get the deployment file format converted and ensure that I only defined a resource once avoiding the duplicate resource definition errors. As a result I'm calling the 2 functions which have to iterate through the entire YAML content merging then sorting for unique values separately.
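
For illustration, the de-duplication can be done in a single pass over the data rather than two. This is a rough Ruby sketch under loud assumptions: the YAML layout (sites mapping component names to versions) is invented for the example and is not the real deployment file format, and unique_components and component_titles are hypothetical names, not the module's parser functions.

```ruby
require 'yaml'
require 'set'

# Hypothetical layout: each site maps component names to the version it needs.
DEPLOYMENT = YAML.safe_load(<<~DOC)
  sites:
    www:
      components: { core: '1.2', theme: '3.0' }
    admin:
      components: { core: '1.2', theme: '3.1' }
    api:
      components: { core: '1.2' }
DOC

# Walk the data once and collect every distinct (component, version) pair.
def unique_components(deployment)
  pairs = Set.new
  deployment.fetch('sites', {}).each_value do |site|
    site.fetch('components', {}).each { |name, version| pairs << [name, version.to_s] }
  end
  pairs.sort
end

# Build one title per pair so that no two resources share a title.
def component_titles(deployment)
  unique_components(deployment).map { |name, version| "#{name}_#{version}" }
end

puts component_titles(DEPLOYMENT)
```

Because each pair yields exactly one title, an array like this could be handed to a defined type as its list of titles without tripping duplicate definition errors.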

jcbollinger

Sep 28, 2012, 5:37:43 PM

to puppet...@googlegroups.com

On Friday, September 28, 2012 10:53:45 AM UTC-5, Jeremy wrote:

> The use of the "YAML.load(open(args[0]))" call was in fact to support both local and network files. In this case I'm actually giving it an authenticated S3 bucket URL to retrieve the file, as the engineers releasing the code also upload the deployment YAML file to the S3 bucket. The tarballs that are deployed are also in the S3 bucket, and are also passed as authenticated URLs in the catalog with an expiration equal to the catalog expiration time. I'd like to eventually modify it to retrieve the deployment file and store it locally only when it's been modified, but I want to keep it in S3 as that allows my Puppet master to operate as a black box that engineers have no access to. If I control the deployment file locally, they claim I'm the bottleneck slowing them down. So as long as I give them the means to update it, the process flow is error free, and the only problems encountered are when they screw up the deployment file contents, accountability is maintainable.

Given your target environment I can imagine why S3 may be attractive, but if you have not already done so then you should investigate whether it provides the performance guarantees (and real-life performance) necessary for the use to which you're putting it.

Have you considered pulling over the deployment file to the master on a periodic basis (such as via cron) so that it can always be local for your module?
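
A cron-driven refresh along those lines might look roughly like the Ruby sketch below. It is an illustration only: refresh_cache is a hypothetical name, and the fetch lambda is a placeholder for however the file would really be downloaded from S3. The sketch replaces the local copy only when the content has changed and still parses as YAML, so a broken upload cannot clobber a good local file.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical helper: keep a local copy of the deployment file current.
# `fetch` is any callable returning the remote file body as a String.
def refresh_cache(path, fetch)
  body = fetch.call
  YAML.safe_load(body) # raises Psych::SyntaxError on malformed YAML

  return :unchanged if File.exist?(path) && File.binread(path) == body.b

  tmp = "#{path}.tmp.#{Process.pid}"
  File.binwrite(tmp, body)
  File.rename(tmp, path) # atomic replace when tmp and path share a filesystem
  :updated
end

# Demonstration against a temporary directory with a canned "remote" body.
Dir.mktmpdir do |dir|
  cache = File.join(dir, 'deployment.yaml')
  remote = "sites:\n  www:\n    components:\n      core: '1.2'\n"
  puts refresh_cache(cache, -> { remote }) # first run writes the file
  puts refresh_cache(cache, -> { remote }) # same content, nothing rewritten
end
```

The parser function would then read only the local path, which takes the network out of catalog compilation entirely.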

> My thought on the "smoking gun" is in having to make the parser function calls to try and determine the unique components and unique versions in the case of a component with multiple versions needing to be deployed. This was the quickest way I could find to get the deployment file format converted and ensure that I only defined a resource once avoiding the duplicate resource definition errors. As a result I'm calling the 2 functions which have to iterate through the entire YAML content merging then sorting for unique values separately.

How big are the real deployment files? I wouldn't think that parsing and processing even moderately large YAML files would be prohibitively expensive in itself, especially when compared to the work the master must perform to compile all the DSL code. In any case, you should be able to test that against real data by wrapping a test harness around the innards of your function.
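
Such a harness could be as small as the following Ruby sketch. Everything in it is a stand-in: synthetic_deployment generates made-up data in an assumed layout, and process only approximates the parse-then-deduplicate work, so the real function body and a real deployment file should be substituted before drawing conclusions.

```ruby
require 'yaml'
require 'benchmark'

# Build synthetic YAML text with `sites` sites of `per_site` components each.
# The layout is an assumption, not the module's real deployment format.
def synthetic_deployment(sites, per_site)
  lines = ['sites:']
  sites.times do |s|
    lines << "  site#{s}:" << '    components:'
    per_site.times { |c| lines << "      component#{c}: '1.#{(s + c) % 5}'" }
  end
  lines.join("\n") + "\n"
end

# Stand-in for the function's innards: parse, then reduce to unique pairs.
def process(yaml_text)
  parsed = YAML.safe_load(yaml_text)
  parsed['sites'].values.flat_map { |site| site['components'].to_a }.uniq.sort
end

# Time the same work at a few input sizes to see how it scales.
[10, 100, 1000].each do |sites|
  text = synthetic_deployment(sites, 20)
  seconds = Benchmark.realtime { process(text) }
  printf("%5d sites, %7d bytes: %.4fs\n", sites, text.bytesize, seconds)
end
```

Comparing these figures with the overall compilation time should show whether the YAML processing is a meaningful share of it.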

Cheers,

John

Jeremy T. Bouse

Sep 29, 2012, 1:03:20 AM

to puppet...@googlegroups.com

On Fri, Sep 28, 2012 at 5:37 PM, jcbollinger <John.Bo...@stjude.org> wrote:

> On Friday, September 28, 2012 10:53:45 AM UTC-5, Jeremy wrote:

> > The use of the "YAML.load(open(args[0]))" call was in fact to support both local and network files. In this case I'm actually giving it an authenticated S3 bucket URL to retrieve the file, as the engineers releasing the code also upload the deployment YAML file to the S3 bucket. The tarballs that are deployed are also in the S3 bucket, and are also passed as authenticated URLs in the catalog with an expiration equal to the catalog expiration time. I'd like to eventually modify it to retrieve the deployment file and store it locally only when it's been modified, but I want to keep it in S3 as that allows my Puppet master to operate as a black box that engineers have no access to. If I control the deployment file locally, they claim I'm the bottleneck slowing them down. So as long as I give them the means to update it, the process flow is error free, and the only problems encountered are when they screw up the deployment file contents, accountability is maintainable.

> Given your target environment I can imagine why S3 may be attractive, but if you have not already done so then you should investigate whether it provides the performance guarantees (and real-life performance) necessary for the use to which you're putting it.

> Have you considered pulling over the deployment file to the master on a periodic basis (such as via cron) so that it can always be local for your module?

As the puppet master is also a client, I've thought about setting it up as a file to retrieve and store locally. Though I could probably write a script to check for updates and have it run from cron more frequently, to ensure that the web app server isn't dependent on the master having run its update. Don't want to give engineers even more reason to say the process takes too long.

> > My thought on the "smoking gun" is in having to make the parser function calls to try and determine the unique components and unique versions in the case of a component with multiple versions needing to be deployed. This was the quickest way I could find to get the deployment file format converted and ensure that I only defined a resource once avoiding the duplicate resource definition errors. As a result I'm calling the 2 functions which have to iterate through the entire YAML content merging then sorting for unique values separately.

> How big are the real deployment files? I wouldn't think that parsing and processing even moderately large YAML files would be prohibitively expensive in itself, especially when compared to the work the master must perform to compile all the DSL code. In any case, you should be able to test that against real data by wrapping a test harness around the innards of your function.

Looking at the report metrics, I can see that successful runs show config retrieval taking up to 130 seconds, but around 110 seconds is most common, so not much difference between them. When it fails, it usually fails with "Could not retrieve catalog from remote server: execution expired" and "Could not retrieve catalog; skipping run" error messages, and then proceeds with the cached catalog. Currently the catalog has 370-390 resources defined, with a change usually involving 170-180 resources.

> Cheers,

> John

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To view this discussion on the web visit https://groups.google.com/d/msg/puppet-users/-/AHoMhpuyhGkJ.

To post to this group, send email to puppet...@googlegroups.com.
To unsubscribe from this group, send email to puppet-users...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.

jcbollinger

Oct 1, 2012, 12:13:24 PM

to puppet...@googlegroups.com

On Saturday, September 29, 2012 12:03:33 AM UTC-5, Jeremy wrote:

> On Fri, Sep 28, 2012 at 5:37 PM, jcbollinger <John.Bo...@stjude.org> wrote:

> > [...]

> > How big are the real deployment files? I wouldn't think that parsing and processing even moderately large YAML files would be prohibitively expensive in itself, especially when compared to the work the master must perform to compile all the DSL code. In any case, you should be able to test that against real data by wrapping a test harness around the innards of your function.

> Looking at the report metrics, I can see that successful runs show config retrieval taking up to 130 seconds, but around 110 seconds is most common, so not much difference between them. When it fails, it usually fails with "Could not retrieve catalog from remote server: execution expired" and "Could not retrieve catalog; skipping run" error messages, and then proceeds with the cached catalog. Currently the catalog has 370-390 resources defined, with a change usually involving 170-180 resources.

370-390 resources is not unreasonably large. It's somewhat surprising that so many changes happen each run (after the first), but that doesn't factor into catalog compilation time.

The timings you report are potentially important, however, because they're running right about at the default client-side timeout for catalog requests (120s). You could try setting the "configtimeout" configuration parameter to something a bit larger, say 150 (in the agent section). That doesn't answer the question of what is causing compilation to take that long, but it probably gets you a lot fewer timeouts.

I still maintain that loading a file over the network is a pretty likely performance-killer. I/O is in general far, far slower than computation, and network I/O is typically both slower and less consistent than local I/O. As with anything performance-related, however, there is no alternative to testing for determining reliable performance characteristics.

You may also want to check whether your master is under-resourced. The master typically consumes 100s of MB, and if it has to swap parts of that back and forth between physical and virtual memory then that will slow everything down. Also, if you're using the built-in "webrick" server then you should be aware that it doesn't scale especially well, especially for medium-large catalogs. It is single-threaded, so if two nodes request catalogs at the same time, then one has to wait for the master to serve the other first. The usual advice for that situation is to run the master via passenger.

John

Jeremy T. Bouse

Oct 1, 2012, 12:30:06 PM

to puppet...@googlegroups.com

On Mon, Oct 1, 2012 at 12:13 PM, jcbollinger <John.Bo...@stjude.org> wrote:

> On Saturday, September 29, 2012 12:03:33 AM UTC-5, Jeremy wrote:

> > On Fri, Sep 28, 2012 at 5:37 PM, jcbollinger <John.Bo...@stjude.org> wrote:

> > > [...]

> > > How big are the real deployment files? I wouldn't think that parsing and processing even moderately large YAML files would be prohibitively expensive in itself, especially when compared to the work the master must perform to compile all the DSL code. In any case, you should be able to test that against real data by wrapping a test harness around the innards of your function.

> > Looking at the report metrics, I can see that successful runs show config retrieval taking up to 130 seconds, but around 110 seconds is most common, so not much difference between them. When it fails, it usually fails with "Could not retrieve catalog from remote server: execution expired" and "Could not retrieve catalog; skipping run" error messages, and then proceeds with the cached catalog. Currently the catalog has 370-390 resources defined, with a change usually involving 170-180 resources.

> 370-390 resources is not unreasonably large. It's somewhat surprising that so many changes happen each run (after the first), but that doesn't factor into catalog compilation time.

> The timings you report are potentially important, however, because they're running right about at the default client-side timeout for catalog requests (120s). You could try setting the "configtimeout" configuration parameter to something a bit larger, say 150 (in the agent section). That doesn't answer the question of what is causing compilation to take that long, but it probably gets you a lot fewer timeouts.

I've taken the suggestion and increased the agent configtimeout on the client machines to see if this helps decrease the execution timeouts that the engineer is seeing and complaining about.

> I still maintain that loading a file over the network is a pretty likely performance-killer. I/O is in general far, far slower than computation, and network I/O is typically both slower and less consistent than local I/O. As with anything performance-related, however, there is no alternative to testing for determining reliable performance characteristics.

I'm working on a process to retrieve the deployment configuration file from the S3 bucket outside of Puppet control so I can process it locally and see if that improves the config generation time.

> You may also want to check whether your master is under-resourced. The master typically consumes 100s of MB, and if it has to swap parts of that back and forth between physical and virtual memory then that will slow everything down. Also, if you're using the built-in "webrick" server then you should be aware that it doesn't scale especially well, especially for medium-large catalogs. It is single-threaded, so if two nodes request catalogs at the same time, then one has to wait for the master to serve the other first. The usual advice for that situation is to run the master via passenger.

This is a relatively small installation with only a handful of clients. Still, the master is running Apache with Passenger instead of Webrick and utilizing async queuing.

> John
