Some analysis of profiling information


Andy Parker

Oct 13, 2014, 1:08:24 PM
to puppe...@googlegroups.com
Last week we got our first bit of data from profiling someone's master. Three different masters were profiled, and over the weekend I analyzed one of them. Here are the results of that analysis:

=====================================================
8 processors

~18 hour time period (Wed Oct 08 16:04:51 to Thu Oct 09 10:39:02)
258816 requests
~3.9 requests/second
~39.6 hours processing requests (wall-clock time)

Request ratios:
  16.12 file_metadata requests/catalog request
  3.85 file_metadatas requests/catalog request
  0.99 node requests/catalog request
  0.004 file_content requests/catalog request

By request type:
  35.4 hours spent generating catalogs
  2 hours spent responding to file_metadata
  1.4 hours spent responding to file_metadatas
  0.7 hours spent responding to node

By request pipeline activity:
  1   hour spent finding facts
  0.9 hours spent finishing catalog
  1.9 hours spent filtering the catalog
  1.8 hours spent serializing to PSON
  0.3 hours spent in non-include functions

Function time:
  0.14 hours spent evaluating ERB (template + inline_template)
  0.08 hours spent in hiera()
======================================================

It appears that the server is likely not very heavily taxed. The time period analyzed appears to be mostly a steady state. It looks like one file changed across about 10% of the nodes (42 file_content requests against a reported fleet of 400 nodes). A catalog request takes 11 seconds on average, and every other request type averages less than a second. We would get about the same capacity improvement from optimizing the "filter catalog" step as from optimizing file_metadata requests. I suspect optimizing "filter catalog" is the better choice, since it likely generates a large amount of garbage (thereby increasing the need for GC), whereas file_metadata is IO bound.
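For anyone who wants to check my math, here is roughly the irb arithmetic behind the ~11 second figure (plain Ruby, using only the numbers from the summary above):

  total_requests = 258_816
  ratios = { 'catalog' => 1.0, 'file_metadata' => 16.12,
             'file_metadatas' => 3.85, 'node' => 0.99, 'file_content' => 0.004 }

  # every catalog request drags along ~21 other requests
  requests_per_catalog = ratios.values.inject(0.0) { |sum, r| sum + r }  # ~21.96
  catalog_requests     = total_requests / requests_per_catalog          # ~11784

  # 35.4 hours of catalog time spread over those requests
  avg_compile = (35.4 * 3600) / catalog_requests                        # ~10.8 s
  puts '%.0f catalog requests, %.1f s average compile' % [catalog_requests, avg_compile]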

An interesting thing is that there weren't any report submission requests in the data. I suspect that they are using a separate report server.

In this case, the time spent really "compiling" a catalog is ~83% of the catalog request time. The other ~17% is "finding facts", "finishing catalog", "serialization", and a few other miscellaneous tasks.

PSON serialization takes almost as much time as file_metadata requests.

There were no functions that stood out as absurdly expensive, although hiera and ERB are up there. Analysis of include() is much more complicated because its calls nest: one include evaluates code that might call other includes, so a simple sum of the include times double counts most of the time. I haven't tried to correct for that, so I am not reporting include times here. The fix would be to compute "exclusive" time, as sketched below.
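A minimal sketch of that correction, assuming each profiled call can be boiled down to an id, a parent id, and a duration (that shape is my assumption, not what --profile emits directly): subtract each call's children from its own time and sum the remainders.

  # Each entry: this call's id, the id of the call that contains it, wall time.
  calls = [
    { :id => 1, :parent => nil, :seconds => 4.0 },  # top-level include
    { :id => 2, :parent => 1,   :seconds => 2.5 },  # include reached from 1
    { :id => 3, :parent => 2,   :seconds => 1.0 },  # include reached from 2
  ]

  # Total time of each call's direct children.
  child_time = Hash.new(0.0)
  calls.each { |c| child_time[c[:parent]] += c[:seconds] if c[:parent] }

  naive     = calls.inject(0.0) { |sum, c| sum + c[:seconds] }                      # 7.5
  corrected = calls.inject(0.0) { |sum, c| sum + c[:seconds] - child_time[c[:id]] } # 4.0
  puts "naive sum: #{naive}s, with double counting removed: #{corrected}s"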

The server is running 3.7.1 on Ruby 1.8.7. I think it would get more of a performance boost from moving to Ruby 2.1 than from many of the optimizations we could perform. That isn't to say that there aren't things to tune, as Thomas found. It seems like we should concentrate on the basic evaluator code, try to remove catalog copies (there are at least two being made; "filtering" is one), work toward switching from PSON to JSON, and get Ruby 2.1 available everywhere (which is probably part of dropping Ruby 1.8.7 support).
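To get a rough idea of what the PSON-to-JSON switch might buy, a micro-benchmark along these lines would do. This is only a sketch: it assumes a Puppet 3.x install on the load path (for PSON) and uses a made-up catalog-shaped hash rather than a real catalog.

  require 'benchmark'
  require 'json'
  require 'puppet'  # pulls in the pure-Ruby PSON serializer

  # A catalog-shaped stand-in, not a real catalog.
  data = { 'resources' => (0...5000).map { |i|
    { 'type' => 'File', 'title' => "/tmp/f#{i}", 'parameters' => { 'ensure' => 'file' } }
  } }

  Benchmark.bm(5) do |bm|
    bm.report('PSON') { 10.times { PSON.dump(data) } }
    bm.report('JSON') { 10.times { JSON.dump(data) } }
  end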

--
Andrew Parker
Freenode: zaphod42
Twitter: @aparker42
Software Developer

Join us at PuppetConf 2014, September 20-24 in San Francisco - www.puppetconf.com 
profilep04.analysis

R.I.Pienaar

Oct 13, 2014, 1:39:55 PM
to puppet-dev
This is interesting; I'm working on a talk about profiling catalog compiles. What tool did you use to generate the report?

Andy Parker

Oct 13, 2014, 3:04:34 PM
to puppe...@googlegroups.com
On Mon, Oct 13, 2014 at 10:39 AM, R.I.Pienaar <r...@devco.net> wrote:
This is interesting; I'm working on a talk about profiling catalog compiles. What tool did you use to generate the report?


The data is from --profile on a master. I wrote a little script on Saturday that I used to generate the attached file, and then I just had irb open and started doing some math to write up the analysis in the email. I hope that as we do this more, we can build tools that make the analysis easier.
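(For anyone who wants to collect the same data: profiling is just Puppet's own profile setting, e.g.

  # puppet.conf on the master; the profiling output then shows up in the
  # master's logs (you may need debug-level logging to see it)
  [master]
  profile = true

and then harvest the PROFILE lines from the logs.)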

I should have looked at Adrien's puppet-profile-parser[1], but got caught up in hacking on my own thing :) I've attached the script that I put together. You give it the name of the yaml file that Britt's data collection script[2] produces. We need to change that script, though: the logs this data came from were about 1GB in size, which is a lot of YAML to try to load into memory at once.
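One way out, assuming the collection script keeps writing a stream of "---"-separated YAML documents (that format is my assumption; I haven't changed anything yet), is Psych's block form of load_stream, which hands you one document at a time instead of materializing the whole file:

  require 'yaml'

  totals = Hash.new(0.0)
  File.open(ARGV[0]) do |io|
    YAML.load_stream(io) do |event|
      # 'activity' and 'seconds' are made-up field names; adjust to the
      # real event shape. Aggregate as we go instead of keeping every event.
      totals[event['activity']] += event['seconds'] if event.is_a?(Hash)
    end
  end

  totals.sort_by { |_, secs| -secs }.each do |activity, secs|
    puts '%8.1fs  %s' % [secs, activity]
  end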

--
Andrew Parker
Freenode: zaphod42
Twitter: @aparker42
Software Developer

Join us at PuppetConf 2014, September 20-24 in San Francisco - www.puppetconf.com 
extract.rb