Catalog deserialization performance


Romain F.

Jun 29, 2015, 10:48:05 AM
to puppe...@googlegroups.com
Hi everyone,

I'm trying to optimize our Puppet runs by running some benchmarks and patching Puppet core (where possible), but I'm having some difficulties around catalog serialization/deserialization.

In 3.7.5 and 3.8.x, config retrieval takes roughly 7 secs, of which only 4 secs are on the master side. The same is true in 4.2, but with 9 secs of config retrieval and still 4 secs on the master side.

My first thought was "Okay, time to try MsgPack". No improvement.

I've instrumented the code in the master branch a bit around this, and found that, of my 9 secs of config retrieval, 3.61 secs are lost in catalog deserialization and 2 secs in catalog conversion. But it's not the "real" deserialization (PSON to Hash) that takes ages, it's the creation of the Catalog object itself (Hash to Catalog). Benchmarks show that the time to deserialize MsgPack (or PSON) is negligible compared to the catalog deserialization time.
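
Something like this minimal sketch separates the two steps (not my exact instrumentation; the saved-catalog path is made up, and depending on the Puppet version you may need to unwrap the "data" envelope of the HTTP response and call Puppet.initialize_settings first):

    require 'benchmark'
    require 'puppet'

    # Hypothetical path to a catalog saved from a previous agent run.
    raw = File.read('/tmp/catalog.pson')

    hash = nil
    parse_time = Benchmark.realtime { hash = PSON.parse(raw) }

    catalog = nil
    build_time = Benchmark.realtime do
      catalog = Puppet::Resource::Catalog.from_data_hash(hash)
    end

    puts "PSON -> Hash:    #{parse_time.round(3)}s"
    puts "Hash -> Catalog: #{build_time.round(3)}s"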

So here is my question: is this a known issue? Is there any reason for the regression in 4.x (future parser creating more objects, ...)?

Cheers,

Henrik Lindberg

Jun 29, 2015, 4:36:01 PM
to puppe...@googlegroups.com
The parser=future setting only makes a difference when compiling the
catalog - the catalog itself does not contain more or different data
(except possibly using numbers instead of strings for some attributes).

The best way to optimize this is to write a benchmark using the
benchmark framework and measure the time it takes to deserialize a given
catalog. Then run that benchmark with Ruby profiling turned on.

There are quite a few things going on at the agent side in addition to
taking the catalog PSON and turning it into a catalog that it can apply
(loading types, resolving providers, etc). Make sure to benchmark these
separately if possible.
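
A sketch of what such a benchmark could look like, in the shape of the ones in the benchmarks/ directory (the file layout, the method contract, and the saved-catalog fixture here are assumptions - check an existing benchmark for the exact conventions):

    require 'puppet'

    # Hypothetical benchmarks/catalog_deserialization/benchmarker.rb
    class Benchmarker
      def initialize(target, size)
        @target = target  # scratch directory handed in by the harness
        @size   = size    # size/iterations handed in by the harness
      end

      def generate
        # Nothing to generate when a pre-built catalog fixture is used.
      end

      def setup
        # Parse once, outside the timed run, so only Hash -> Catalog is measured.
        @data = PSON.parse(File.read(File.join(@target, 'catalog.pson')))
      end

      def run(args = nil)
        Puppet::Resource::Catalog.from_data_hash(@data)
      end
    end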

Regards
- henrik



--

Visit my Blog "Puppet on the Edge"
http://puppet-on-the-edge.blogspot.se/

Trevor Vaughan

Jun 29, 2015, 4:41:53 PM
to puppe...@googlegroups.com
If you get a profiling suite together (a.k.a. a bunch of random patches), could you release it?

I've been curious about this for quite some time but never quite got around to dealing with it.

My concern is very much client-side performance, since the more you manage on a client, the less the client gets to do its actual job.

Thanks,

Trevor




--
Trevor Vaughan
Vice President, Onyx Point, Inc
(410) 541-6699

-- This account not approved for unencrypted proprietary information --

Henrik Lindberg

Jun 29, 2015, 10:23:42 PM
to puppe...@googlegroups.com
On 2015-29-06 22:41, Trevor Vaughan wrote:
> If you get a profiling suite together (a.k.a. a bunch of random patches),
> could you release it?
>

It is not difficult actually. Look at the benchmarks in the puppet code
base. Many of them are suitable for profiling with a ruby profiler.
I don't think we have any benchmarks targeting the agent side though, so
the first thing to do (for someone) is to write one.

What is more difficult is coming up with a benchmark that does not
involve real/complex resources - but everything from deserialization up
to actually applying should be possible to work with in a simple way.

Profiling is then just running that benchmark with the ruby profiler
turned on and analyzing the result, making changes, and running again
(repeat until happy).
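
For example with the ruby-prof gem (just one way to do it; the saved-catalog fixture is an assumption):

    require 'ruby-prof'
    require 'puppet'

    hash = PSON.parse(File.read('/tmp/catalog.pson'))  # hypothetical fixture

    result = RubyProf.profile do
      20.times { Puppet::Resource::Catalog.from_data_hash(hash) }
    end

    # Flat profile sorted by self time - shows what lights up.
    RubyProf::FlatPrinter.new(result).print($stdout, min_percent: 1)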

- henrik



Romain F.

Jun 30, 2015, 10:17:17 AM
to puppe...@googlegroups.com
I've already benchmarked and profiled Catalog's from_data_hash and to_data_hash methods using the benchmark framework.
Most of the time is spent in from_data_hash (we already knew that), but there are no big hotspots where Ruby loses its time.

My callgrind file shows that the top 5 (by self time) are:
- Array.flatten (55000 calls)
- Array.each (115089 calls)
- Puppet::Resource.initialize (15000 calls)
- String.=~ (65045 calls)
- Hash[]= (115084 calls)

This top 5 takes ~30% of the total time.

As you can see, this can be difficult to optimize. IMHO, the "benchmark -> tweak -> benchmark" way of optimizing is not sufficient here. I think the way a catalog is (de)serialized needs a deep refactor.
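
(If anyone wants to reproduce: the callgrind file comes from ruby-prof's CallTreePrinter, roughly as below - the exact printer API depends on your ruby-prof version - and is then opened in KCachegrind/QCachegrind.)

    require 'ruby-prof'
    require 'puppet'

    hash = PSON.parse(File.read('/tmp/catalog.pson'))  # saved catalog fixture

    result = RubyProf.profile do
      Puppet::Resource::Catalog.from_data_hash(hash)
    end

    # Write a callgrind-format file for KCachegrind/QCachegrind.
    File.open('callgrind.out.from_data_hash', 'w') do |f|
      RubyProf::CallTreePrinter.new(result).print(f)
    end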

Cheers,

Henrik Lindberg

Jun 30, 2015, 6:04:55 PM
to puppe...@googlegroups.com
On 2015-30-06 16:17, Romain F. wrote:
> I've already benchmarked and profiled Catalog's from_data_hash and
> to_data_hash methods using the benchmark framework.
> Most of the time is spent in from_data_hash (we already knew that), but
> there are no big hotspots where Ruby loses its time.
>
> My callgrind file shows that the top 5 (by self time) are:
> - Array.flatten (55000 calls)
> - Array.each (115089 calls)
> - Puppet::Resource.initialize (15000 calls)
> - String.=~ (65045 calls)
> - Hash[]= (115084 calls)
>
> This top 5 takes ~30% of the total time.
>
> As you can see, this can be difficult to optimize. IMHO, the
> "benchmark -> tweak -> benchmark" way of optimizing is not sufficient
> here. I think the way a catalog is (de)serialized needs a deep refactor.
>

There is probably lots of duplicated work going on at the levels above
and those are causing those generic methods to light up (except
Puppet::Resource.initialize).

There is the deserialization process as such to optimize, but also the
Resource implementation itself, which is far from optimal.

The next thing would be to focus on Resource.initialize/from_data_hash

I think it is also relevant to establish some kind of "world record" -
say serializing and deserializing a hash using MsgPack; a hash of data
cannot be transported faster across the wire than that (unless also not
using Ruby objects to represent the data - with a lot of extra complexity).

I mean, a hash of some complexity will always consume quite a bit of
processing and memory to get across the wire. Is it close enough to that
"world record"?

- henrik


Romain F.

Jul 1, 2015, 3:05:08 AM
to puppe...@googlegroups.com


On Wednesday, July 1, 2015 at 00:04:55 UTC+2, henrik lindberg wrote:
> There is probably lots of duplicated work going on at the levels above
> and those are causing those generic methods to light up (except
> Puppet::Resource.initialize).
>
> There is the deserialization process as such to optimize, but also the
> Resource implementation itself, which is far from optimal.
>
> The next thing would be to focus on Resource.initialize/from_data_hash

Agreed, but I can't do that on my own in a timely fashion. We just want the Puppet devs to be aware that Puppet::Resource keeps gaining features while its performance is not keeping up, and that it's getting worse in Puppet 4. I don't really know how it can be addressed.

> I think it is also relevant to establish some kind of "world record" -
> say serializing and deserializing a hash using MsgPack; a hash of data
> cannot be transported faster across the wire than that (unless also not
> using Ruby objects to represent the data - with a lot of extra complexity).
>
> I mean, a hash of some complexity will always consume quite a bit of
> processing and memory to get across the wire. Is it close enough to that
> "world record"?

That's not the point; like I said, this performance gap comes from the creation of the graph itself, not from the deserialization of PSON or MsgPack. In my case, of the 4 secs of deserialization, 3.5 secs is from_data_hash.


Romain

Henrik Lindberg

Jul 2, 2015, 6:09:45 PM
to puppe...@googlegroups.com
On 2015-01-07 9:05, Romain F. wrote:
>
> There is probably lots of duplicated work going on at the levels above
> and those are causing those generic methods to light up (except
> Puppet::Resource.initialize).
>
> There is the deserialization process as such to optimize, but also the
> Resource implementation itself, which is far from optimal.
>
> The next thing would be to focus on Resource.initialize/from_data_hash
>
>
> Agreed, but I can't do that on my own in a timely fashion. We just want
> the Puppet devs to be aware that Puppet::Resource keeps gaining features
> while its performance is not keeping up, and that it's getting worse in
> Puppet 4. I don't really know how it can be addressed.
>
>
We recently discussed how to keep an eye on performance regressions like
this one. We paid close attention to the 4.x "future parser" changes and
made sure they performed well - I guess that work overshadowed
performance regressions in other areas.

I think the problem should be logged as a ticket with the findings so
far (a big thank you for taking the time to do the detective work and
profiling!)


> I think it is also relevant to establish some kind of "world record" -
> say serializing and deserializing a hash using MsgPack; a hash of data
> cannot be transported faster across the wire than that (unless also not
> using Ruby objects to represent the data - with a lot of extra
> complexity).
>
> I mean, a hash of some complexity will always consume quite a bit of
> processing and memory to get across the wire. Is it close enough to that
> "world record"?
>
> That's not the point; like I said, this performance gap comes from the
> creation of the graph itself, not from the deserialization of PSON or
> MsgPack. In my case, of the 4 secs of deserialization, 3.5 secs is
> from_data_hash.
>
ok, point taken, you already established that.