Puppet 5 upgrade performance issues


Chris Smith

Jan 9, 2018, 3:31:31 AM
to Puppet Users
Hi there,

I recently upgraded from puppetserver 2.7.2 to puppetserver 5.0, and performance has bottomed out pretty badly. Agents and puppetdb also got updated. Catalog compiles on the server used to take 10-20 seconds; now they are taking 90-150 seconds, and agent runs are taking 30+ minutes (they used to take a couple of minutes).

The architecture is distributed, with:
 * a central CA, running puppetserver, puppetdb, and postgres 9.6
 * separate puppetservers replicated in other datacentres. These also run puppetdb, pointing writes to the central CA while reads go to a locally replicated database (a rough sketch of the config is below)

Other servers (agents) point to the replicated puppetservers to do all of the work.
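
For context, the read/write split on the replicated puppetdbs is roughly the following, in database.ini (the hostnames here are placeholders, not the real ones):

[database]
# writes go to the central postgres behind the CA
subname = //central-ca.example.com:5432/puppetdb
username = puppetdb
password = ...

[read-database]
# reads come from the local replica
subname = //localhost:5432/puppetdb
username = puppetdb
password = ...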

The puppetservers were tuned (upped jvm memory, set max-instances).

The architecture hasn't changed since the upgrade.

The puppetservers are still running hiera3 configs; they haven't been converted yet (it's on the list, but from what I read it wasn't a showstopper). We have a reasonable number of encrypted YAML files (using hiera-eyaml-gpg), though this was the same pre-upgrade and hasn't changed significantly.

Since the upgrade I've tried re-tuning the JVM settings and changing max-instances, without much luck at all. I found the experimental dashboard on the puppetservers, and it shows there are no free JRubies, but there has to be something I'm missing that's causing that to happen.
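
For what it's worth, the numbers behind that dashboard can also be pulled straight from the status API, something like this (default port assumed, field names from memory):

# curl -sk 'https://localhost:8140/status/v1/services?level=debug'

and then looking at the jruby-metrics section (num-jrubies, num-free-jrubies, borrow/wait times).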

I'm lost on what to look at next. Is there an easy way to peek inside JRuby to see what's taking so long?

Any suggestions would be great.

Cheers,
Chris.

Matthaus Owens

Jan 9, 2018, 7:36:33 PM
to Puppet Users
Chris,
To better help you, it would be great to know a few more things about your installation. First: are you running puppetserver 5.0.0 or something later in the 5.x series (and is it the same on all servers)? Second: what version of puppet-agent is on those servers? puppetserver 5.1.3 included a fix for https://tickets.puppetlabs.com/browse/SERVER-1922 which should improve performance somewhat.
To dig into what may be causing the compiles to be slower, I would recommend first checking out the client metrics. https://puppet.com/docs/puppetserver/5.1/http_client_metrics.html has some details, and I would be interested in the client metrics that page lists under /puppet/v3/catalog. They are PuppetDB-related requests, and as PuppetDB was also upgraded alongside puppetserver it would be good to eliminate it as a contributor. PuppetDB slowness can show up as slow catalog compiles, which in turn hold JRubies for longer and might explain some of what you are seeing.
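
Roughly, on one of the compile masters, something like this (path from memory; the doc above has the exact example):

curl -sk 'https://localhost:8140/status/v1/services/master?level=debug'

and then look at the "http-client-metrics" array for the puppetdb metric-ids.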

Matthaus Owens

Jan 9, 2018, 7:40:48 PM
to Puppet Users
Chris,
Hiera 3 + hiera-eyaml may also be contributing to the slowness. Here
is one ticket (related to SERVER-1922) that indicated moving to hiera
5 improved compile times substantially:
https://tickets.puppetlabs.com/browse/SERVER-1919
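
For reference, the hiera 5 equivalent of an eyaml layer ends up looking roughly like this in a per-environment hiera.yaml. This is only a sketch; the GPG-specific option names in particular are from memory, so double-check them against the hiera-eyaml-gpg docs:

---
version: 5
defaults:
  datadir: hieradata
  data_hash: yaml_data
hierarchy:
  - name: "Encrypted data"
    lookup_key: eyaml_lookup_key
    paths:
      - "secrets/%{trusted.certname}.eyaml"
      - "secrets/common.eyaml"
    options:
      gpg_gnupghome: /etc/puppetlabs/puppet/gpg
  - name: "Plain data"
    paths:
      - "nodes/%{trusted.certname}.yaml"
      - "common.yaml"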

-Haus

Chris Smith

Jan 10, 2018, 3:11:51 AM
to puppet...@googlegroups.com
Hi,

Thanks for your help.

On 10/01/18 06:36, Matthaus Owens wrote:
> Chris,
> To better help you, it would be great to know a few more things about
> your installation. First question: are you running puppetserver 5.0.0
> or something later in the 5.x series (and is it the same on all
> servers)? Second, what version of the puppet-agent are on those
> servers? puppetserver 5.1.3 included a fix for
> https://tickets.puppetlabs.com/browse/SERVER-1922 which should improve
> performance some.

Hm. Interesting, thanks. I'll check out what a 5.0 -> 5.1 upgrade will do.

>
> Hiera 3 + hiera-eyaml may also be contributing to the slowness. Here
> is one ticket (related to SERVER-1922) that indicated moving to hiera
> 5 improved compile times substantially:
> https://tickets.puppetlabs.com/browse/SERVER-1919

Also interesting, but as noted in the last comment on that ticket, a lot of the structure was changed, so the improvement might not all have come from the hiera3 -> hiera5 move.

> To dig into what may be causing the compiles to be slower, I would
> recommend first checking out the client metrics.
> https://puppet.com/docs/puppetserver/5.1/http_client_metrics.html has
> some details, and I would be interested in the client metrics that
> page lists under the /puppet/v3/catalog. They are PuppetDB related
> requests, and as that was also upgraded alongside puppetserver it
> would be good to eliminate PuppetDB as a contributor. PuppetDB
> slowness can show up as slow catalog compiles, which in turn will hold
> jrubies for longer and might explain some of what you are seeing.

The puppetservers are all the same version.

We upgraded to:
# /opt/puppetlabs/server/bin/puppetserver -v
puppetserver version: 5.0.0

puppetdb is this; it should have been 5.0 as well, but I stuffed it up.
# /opt/puppetlabs/server/bin/puppetdb -v
puppetdb version: 5.1.3


agents are all:
# /opt/puppetlabs/puppet/bin/puppet --version
5.0.0


The metrics say

{
  "route-id": "puppet-v3-file_metadata-/*/",
  "count": 9373,
  "mean": 10217,
  "aggregate": 95763941
},
{
  "route-id": "puppet-v3-catalog-/*/",
  "count": 828,
  "mean": 94773,
  "aggregate": 78472044
},
{
  "route-id": "puppet-v3-node-/*/",
  "count": 831,
  "mean": 62709,
  "aggregate": 52111179
},
{
  "route-id": "puppet-v3-file_metadatas-/*/",
  "count": 4714,
  "mean": 9288,
  "aggregate": 43783632
},
{
  "route-id": "puppet-v3-report-/*/",
  "count": 780,
  "mean": 3433,
  "aggregate": 2677740
},



"http-client-metrics": [
{
"count": 821,
"mean": 48,
"aggregate": 39408,
"metric-name":
"puppetlabs.localhost.http-client.experimental.with-metric-id.puppetdb.command.replace_catalog.full-response",
"metric-id": [
"puppetdb",
"command",
"replace_catalog"
]
},
{
"count": 832,
"mean": 25,
"aggregate": 20800,
"metric-name":
"puppetlabs.localhost.http-client.experimental.with-metric-id.puppetdb.command.replace_facts.full-response",
"metric-id": [
"puppetdb",
"command",
"replace_facts"
]
},
{
"count": 780,
"mean": 19,
"aggregate": 14820,
"metric-name":
"puppetlabs.localhost.http-client.experimental.with-metric-id.puppetdb.command.store_report.full-response",
"metric-id": [
"puppetdb",
"command",
"store_report"
]
},
{
"count": 215,
"mean": 43,
"aggregate": 9245,
"metric-name":
"puppetlabs.localhost.http-client.experimental.with-metric-id.puppetdb.facts.find.full-response",
"metric-id": [
"puppetdb",
"facts",
"find"
]
}
]


So I think that shows it's quick to hand off to puppetdb when it's storing changes.

puppetdb logs are telling me that 'replace catalog' is taking 2-3
seconds, and 'replace facts' is taking 10-20 seconds (previous puppetdb
wasn't logging the time taken, so I can't compare).

I tried changing puppetdb logging to debug, but it doesn't tell me what it's doing with those 'replace' commands (I don't think so, anyway; I might've missed it). I haven't found a way to manually process one of those command files; do you know if there is a way to do it?

I've set up postgres logging to alert for queries over 200ms (both on
the primary & replica) and I get very little (a couple of queries every
now and then), so I don't think it's the database.
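
(For anyone following along, that's just the standard slow-query setting in postgresql.conf on both boxes:)

log_min_duration_statement = 200   # log any statement slower than 200ms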


Cheers,
Chris.

Matthaus Owens

Jan 10, 2018, 9:41:56 PM
to Puppet Users
Chris,

Good to know that PuppetDB isn't the cause here. One thing worth doing on your servers is moving forward to puppet-agent 5.3.2. There was a performance regression in Puppet 5 (https://tickets.puppetlabs.com/browse/PUP-8009) around internationalization and modules, something like 30% in some cases. You might also need to set `disable_i18n=true` in the [master] section of your puppet.conf.
To see what puppetserver is doing and how long it takes you can set
`profile=true` in the [master] section of puppet.conf (I would
recommend only doing this on one of your servers, as it will generate
a lot of log output per puppet run). The profiling results will be
logged at the info level to
/var/log/puppetlabs/puppetserver/puppetserver.log, so you could then
inspect the timing of different parts of the agent lifecycle.
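Concretely, that ends up as something like this in puppet.conf on the server you're profiling (restart puppetserver afterwards):

[master]
disable_i18n = true
profile = true

The profiling output then shows up in puppetserver.log as lines tagged PROFILE, if memory serves.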
Yeah, none of those client metrics are worrying; they are all less than 50ms.

>
> puppetdb logs are telling me that 'replace catalog' is taking 2-3
> seconds, and 'replace facts' is taking 10-20 seconds (previous puppetdb
> wasn't logging the time taken, so I can't compare).
>
> I tried changing puppetdb logging to debug but it doesn't tell me what
> it's doing with those 'replace' commands (I don't think, I might've
> missed it). I haven't found a way to manually process one of those
> files, do you know if there is a way to do it?

I don't know offhand unfortunately.

>
> I've set up postgres logging to alert for queries over 200ms (both on
> the primary & replica) and I get very little (a couple of queries every
> now and then), so I don't think it's the database.
>
>
> Cheers,
> Chris.
>

Poil

Jan 10, 2018, 10:43:01 PM
to puppet...@googlegroups.com, chris smith
Hey,

Are you sure that you have the environment cache enabled on your Puppet 5 installation? (A roughly 10x slowdown is what we see here when we disable it.)


Best regards,

Chris Smith

Jan 11, 2018, 6:38:20 AM
to puppet...@googlegroups.com
On 11/01/18 08:41, Matthaus Owens wrote:
> Chris,
>
> Good to know that PuppetDB isn't the cause here. One thing worth doing
> on your servers is moving forward to Puppet-agent 5.3.2. There was a
> performance regression in Puppet 5
> https://tickets.puppetlabs.com/browse/PUP-8009 around
> internationalization and modules (something like 30% in some cases).
> You might also need to set `disable_i18n=true` in the [master] section
> of your puppet.conf.

Nice find, thanks for that.

> To see what puppetserver is doing and how long it takes you can set
> `profile=true` in the [master] section of puppet.conf (I would
> recommend only doing this on one of your servers, as it will generate
> a lot of log output per puppet run). The profiling results will be
> logged at the info level to
> /var/log/puppetlabs/puppetserver/puppetserver.log, so you could then
> inspect the timing of different parts of the agent lifecycle.

Awesome, thanks - I'll give that a shot.

Cheers,
Chris.

Chris Smith

Jan 11, 2018, 6:40:09 AM
to Poil, puppet...@googlegroups.com
On 11/01/18 09:42, Poil wrote:
> Hey,
>
> Are you sure that you have environment cache enabled on your Puppet5
> installation ? (because x10 is what we have here when we disable it)

No, that's not enabled at the moment (it wasn't before either). I'm working on setting that up; I did a test run and it helped quite a bit.
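
For reference, the change I'm testing is roughly this in puppet.conf on the masters (it can also go per-environment in environment.conf):

[master]
environment_timeout = unlimited

with the caveat that code deploys then need a cache flush, e.g. a DELETE against /puppet-admin-api/v1/environment-cache or a puppetserver restart.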

Thanks for the suggestion.

Cheers,
Chris.

Matthaus Owens

Jan 24, 2018, 5:41:21 PM
to Puppet Users
Hey Chris,
Just wondering if you got any further in figuring this out. I'd be
happy to try to help debug some more if you need as well.

Chris Smith

Jan 28, 2018, 7:48:12 PM
to puppet...@googlegroups.com, Matthaus Owens
Hey,

Thanks for the follow up.

Environment caching helped a lot; compile times are down to 5-10 seconds (from 10+ minutes).

I haven't had time to test puppet-agent 5.3 or look at converting to hiera5 yet, unfortunately.

Cheers,
Chris.