Adding metrics in agent report - PUP-4634


Romain F.

May 22, 2015, 7:26:48 AM
to puppe...@googlegroups.com
Hi everyone,
Following the suggestion of Henrik L. we'll continue the discussion of PUP-4634 here.
It began with my suggestion to add some timings to the agent code, in order to measure steps that can take a while and so be able to troubleshoot pretty ugly things like this:

Info: Caching catalog for bulbi1.c-bulbi.tgcc.ccc.cea.fr in 6.48 seconds
Info: Convert catalog in 6.18 seconds
Info: Applying configuration version '1432224909'
...
Notice: Finished catalog run in 88.45 seconds
Notice: Send report in 11.17 seconds



These 3 metrics were added using the Puppet::Util.benchmark tool, so they are only visible on the agent side; nothing is transferred back in reports.
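
For reference, here is a minimal sketch of the kind of timing wrapper involved. It uses Ruby's stdlib Benchmark rather than Puppet itself; `timed_step` and its `logger` parameter are hypothetical names for illustration, and the real Puppet::Util.benchmark may differ in signature and behaviour:

```ruby
require 'benchmark'

# Hypothetical stand-in for a benchmark helper: time a block, then log
# "<message> in N.NN seconds". `logger` is anything that responds to #call.
def timed_step(message, logger: ->(line) { puts line })
  result = nil
  seconds = Benchmark.realtime { result = yield }
  logger.call(format('%s in %0.2f seconds', message, seconds))
  result # hand back the block's value so the caller can keep using it
end

# Placeholder block; in the agent this would be the actual conversion step.
catalog = timed_step('Info: Convert catalog') { :converted_catalog }
```

The elapsed time only reaches the log line here, which is why nothing ends up in the report.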

As R. I. Pienaar suggested (and he's probably right), it would be a big win to put those metrics in reports, and then add a way to show them in the agent logs (like --evaltrace).

So here are my questions: Which metrics should we gather? Does PuppetDB need to be aware of those new metrics?

Any feedback appreciated.

Cheers,

---
Romain F.

R.I.Pienaar

May 22, 2015, 8:53:10 AM
to puppet-dev
hello,

----- Original Message -----
> From: "Romain F." <romain...@gmail.com>
> To: "puppet-dev" <puppe...@googlegroups.com>
> Sent: Friday, May 22, 2015 8:20:16 AM
> Subject: [Puppet-dev] Adding metrics in agent report - PUP-4634

> Hi everyone,
> Following the suggestion of Henrik L. we'll continue the discussion of
> PUP-4634 here.
> It began with my suggestion of adding some timings in agent code in order
> to check steps which can take some time. And, thus, being able to
> troubleshoot pretty ugly things like this :
>
> Info: Caching catalog for bulbi1.c-bulbi.tgcc.ccc.cea.fr in 6.48 seconds
> Info: Convert catalog in 6.18 seconds
> Info: Applying configuration version '1432224909'
> ...
> Notice: Finished catalog run in 88.45 seconds

These would be good to add to both last_run_summary.yaml and the reports indeed.
I mentioned this in the ticket, but the report already contains a wealth of perf
data. I have a report parser you can use on the CLI that produces output like this:

https://github.com/ripienaar/puppet-reportprint/blob/master/SAMPLE.txt

So you already have config retrieval there, for example. Adding more metrics around
the saving and re-loading of the catalog would be handy, as would firming up
the docs around these and explaining exactly what they are. This might exist now, but
last time I checked it didn't, and it was a bit of guesswork what they all do and
mean, especially some of those in last_run_summary.yaml.


> Notice: Send report in 11.17 seconds

Obviously this could not be in the report, but it would be handy in last_run_summary.yaml.

It would be ideal if, like the existing agent perf data, this is always collected
and always stored in reports, but only optionally shown on the console.
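
As a rough sketch of the last_run_summary.yaml side: the helper below merges extra timings into a summary-style hash and dumps it as YAML. `merge_timings` and the "send_report" key are hypothetical names, the existing values are made-up samples, and the real summary file may be laid out differently:

```ruby
require 'benchmark'
require 'yaml'

# Hypothetical helper: merge extra timings into the "time" section of a
# last_run_summary-style hash without mutating the original.
def merge_timings(summary, timings)
  merged = summary.dup
  merged['time'] = (summary['time'] || {}).merge(timings)
  merged
end

summary = { 'time' => { 'config_retrieval' => 1.0 } } # sample value only
send_seconds = Benchmark.realtime { sleep 0.01 }      # placeholder for sending the report
summary = merge_timings(summary, { 'send_report' => send_seconds })
puts summary.to_yaml
```

Because the send time is only known after the report has gone out, it can be written to the local summary but not to the report itself.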

I can't think of specific additions right now, but whoever implements it should just
go through every major life cycle event and make sure it's reported. One thing that
might be handy is some detail on the number of HTTP requests being made,
to measure and observe the impact of things like the HTTP connection pool.
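
One way such a counter could look, purely as an illustration: a wrapper that counts calls made through a client object. `CountingClient` and `StubClient` are hypothetical and not part of Puppet's HTTP code; the stub only exists so the sketch runs without a network:

```ruby
# Hypothetical wrapper: counts requests made through any client object that
# responds to #get, then delegates to it.
class CountingClient
  attr_reader :request_count

  def initialize(client)
    @client = client
    @request_count = 0
  end

  def get(path)
    @request_count += 1
    @client.get(path)
  end
end

# Stub client standing in for a real HTTP connection.
class StubClient
  def get(path)
    "response for #{path}"
  end
end

client = CountingClient.new(StubClient.new)
client.get('/catalog')
client.get('/file_metadata')
puts "http_requests: #{client.request_count}"
```

The resulting count could then be stored alongside the timings as one more metric.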

Romain F.

May 22, 2015, 9:17:47 AM
to puppe...@googlegroups.com

On Friday, May 22, 2015 at 14:53:10 UTC+2, R.I. Pienaar wrote:
> these would be good to add to both last_run_summary.yaml and the reports indeed,
> I mentioned this in the ticket but the report already contains a wealth of perf
> data, I have a report parser you can use on the CLI that produce output like this:
>
> https://github.com/ripienaar/puppet-reportprint/blob/master/SAMPLE.txt
>
> So already there you have config retrieval for example, adding more metrics around
> the saving and re-loading of the catalog would be handy.  As well as firming up
> the docs around these and explain exactly what they are - this might exist now but
> last time I checked it didnt and it was a bit of guess work what they all do and
> mean esp some in last_run_summary.yaml.

 
It sounds like it shows metrics for the catalog application, not really for catalog manipulation, Facter or indirection caching. Those steps can take a while,
and they are what we wanted to monitor originally.


>> Notice: Send report in 11.17 seconds
>
> obv this could not be in the report but be handy in last_run_summary.yaml
>
> It would be ideal if like the existing agent perf data this is always collected
> and always stored in reports but optionally shown to the console.

 
Adding a bunch of report.add_times(:step_to_report, thinmark { block_of_something }) calls would do the job, right?
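
A minimal sketch of that idea, runnable without Puppet. `SketchReport` and the local `thinmark` below are stand-ins I am assuming for illustration, and the step names are made up; the real report class and helper may behave differently:

```ruby
require 'benchmark'

# Stand-in for the agent report; only the add_times idea is modelled here.
class SketchReport
  attr_reader :times

  def initialize
    @times = {}
  end

  # Record the elapsed seconds for a named step.
  def add_times(name, seconds)
    @times[name] = seconds
  end
end

# Stand-in for thinmark: run the block and return the elapsed seconds.
def thinmark
  Benchmark.realtime { yield }
end

report = SketchReport.new
report.add_times(:convert_catalog, thinmark { sleep 0.01 }) # placeholder work
report.add_times(:cache_catalog, thinmark { sleep 0.01 })   # placeholder work
report.times.each { |step, secs| puts format('%s: %0.2f seconds', step, secs) }
```

With the timings stored on the report object, printing them to the console becomes a separate, optional step.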
 
> I can't right now think of specific additions but whoever implements it should just
> go through every major life cycle event and make sure its reported.  One thing that
> might be handy is some details around the number of HTTP requests that are being made
> to measure and observe the impact of things like the HTTP connection pool.

+1
Too many handshakes can put heavy pressure on masters; this would give us a way to monitor that.

R.I.Pienaar

May 22, 2015, 9:20:09 AM
to puppet-dev


----- Original Message -----
> From: "Romain F." <romain...@gmail.com>
> To: "puppet-dev" <puppe...@googlegroups.com>
> This is what we wanted to monitor originally.

Yes, it's not complete at the moment; I am just saying that more in the same way
would be good.

>> > Notice: Send report in 11.17 seconds
>>
>> obv this could not be in the report but be handy in last_run_summary.yaml
>>
>> It would be ideal if like the existing agent perf data this is always
>> collected
>> and always stored in reports but optionally shown to the console.
>>
>>
> Adding a bunch of report.add_times(:step_to_report, thinmark
> {block_of_something}) would do the job right ?
>
>
>> I can't right now think of specific additions but whoever implements it
>> should just
>> go through every major life cycle event and make sure its reported. One
>> thing that
>> might be handy is some details around the number of HTTP requests that are
>> being made
>> to measure and observe the impact of things like the HTTP connection pool.
>>
>
> +1
> Too much handshakes can put heavy pressure on masters, this would implement
> a way to monitor this.