PuppetServer 2.8 resource Leak


Poil

Oct 3, 2017, 8:11:32 AM
to puppet...@googlegroups.com
Hi,

We have PuppetServer 2.8 on RHEL7.

After a few days, catalog computation becomes slower and slower; the
load average of the compute nodes increases and catalog compilation times out.

All 4 of our compute nodes have 48 cores; only 10% of each core is in use
when the timeouts occur.

We are on Hiera v3. The only setting we tuned is "max-requests-per-instance: 5000",
because of a database connection leak with our Trocla library.
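
For readers unfamiliar with the setting: max-requests-per-instance belongs to the jruby-puppet section of puppetserver.conf. A minimal sketch, assuming the default package layout (the file path in the comment is an assumption and may differ on your install):

```
# /etc/puppetlabs/puppetserver/conf.d/puppetserver.conf (path assumed)
jruby-puppet: {
    # Retire and replace each JRuby instance after it has served this many
    # requests; 0 disables the recycling.
    max-requests-per-instance: 5000
}
```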

Only 150 nodes are connected to our PuppetServer.

We never had this problem with Puppet 3 (with more than 3000 nodes).

Has anyone already seen this, or does anyone have tips to resolve it?

Best regards,

puppetserver_memory.png
puppetserver_cpuaverage.png
puppetserver_load.png
puppetserver_catalog_time.png

Matthaus Owens

Oct 3, 2017, 3:59:28 PM
to Puppet Users
Depending on what the trocla library does, it could be leaking objects
to the java layer, in which case tuning the max-requests-per-instance
down would not help. In general, the best way to find leaks like you
are talking about is described here:
https://puppet.com/blog/puppet-server-advanced-memory-debugging and
involves taking some heap dumps and investigating what changes between
dumps to see what is leaking.
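
As a lighter-weight stand-in for comparing full heap dumps, two class histograms taken some hours apart with `jmap -histo <pid>` can be diffed to see which classes grow. This is not the procedure from the blog post, only a rough sketch of the "what changes between dumps" idea; the function names are made up for illustration, and the parser assumes the usual `num: #instances #bytes class name` row layout:

```ruby
# Compare two class histograms produced by `jmap -histo <pid>` and report
# which classes gained the most instances between them.

# Parse histogram text into { class_name => instance_count }.
def parse_histogram(text)
  counts = {}
  text.each_line do |line|
    # Data rows look like: "   1:        123456       9876543  java.lang.String"
    m = line.match(/^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)/)
    next unless m
    counts[m[3]] = m[1].to_i
  end
  counts
end

# Return [class_name, delta] pairs, largest growth first.
def histogram_growth(before, after, top: 10)
  deltas = after.map { |klass, count| [klass, count - before.fetch(klass, 0)] }
  deltas.select { |_, d| d > 0 }.sort_by { |klass, d| [-d, klass] }.first(top)
end

# Usage (file names are examples): ruby histo_diff.rb before.txt after.txt
if __FILE__ == $PROGRAM_NAME && ARGV.length == 2
  before = parse_histogram(File.read(ARGV[0]))
  after  = parse_histogram(File.read(ARGV[1]))
  histogram_growth(before, after).each do |klass, delta|
    puts format('%+10d  %s', delta, klass)
  end
end
```

Classes that keep climbing across several samples are the ones worth inspecting in a full heap dump.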

Peter Meier

Oct 4, 2017, 2:28:02 AM
to puppet...@googlegroups.com, Poil
> We are on hiera v3, we only tuned "max-requests-per-instance: 5000"
> because of a databases connection leak with our Trocla library.

This is fixed in the trocla module 1.0.1 [1]. Are you on that version?

best

~pete

[1]
https://github.com/duritong/puppet-trocla/commit/bbedb788a7951e2f69c1c2815a5c3c669ff02ae6

Poil

Oct 4, 2017, 4:58:54 AM
to puppet...@googlegroups.com, Peter Meier
Hi,

Thanks !

We had Trocla 0.2.3 and module 0.2.2. I'm upgrading to Trocla 0.3.0 and
the latest module code.

Best regards,

Antony Fomenko

Oct 4, 2017, 12:35:32 PM
to Puppet Users
We faced a similar issue. We are now in the process of switching to Puppet 4. Why not Puppet 5? It turns out that the latest 2.x PuppetDB works only with Puppet 4, so we have to take this intermediate step.
Anyway, I hoped that our standard server could handle at least twice as many agents as it did before with the passenger/httpd stack, but I was wrong.
For now we have only 286 nodes connected to a server with 40 cores (E5-2680 v2 @ 2.80GHz) and 384 GB RAM. Memory is not an issue, but CPU is.
PuppetDB and PostgreSQL run on the same node.
With these 286 nodes the load average is ~30. We get a lot of errors like JRuby timeouts during instance borrowing, with 503/504 codes.
Catalog compile time can be 1000s or even more.

We use puppetserver 2.7 and agent 4.10.4
max-active-instances: 32

We also use environment_timeout = unlimited with cache flushing. We used it on Passenger and it gave us a huge performance boost.

Maybe someone here knows a way to optimize the new puppetserver for CPU usage?
Has anyone changed borrow-timeout, environment-class-cache-enabled or compile-mode?
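
The settings named above all live in the jruby-puppet section of puppetserver.conf. A sketch of where they go: apart from max-active-instances: 32, which is quoted from this post, the values are placeholders rather than recommendations, and the accepted values should be checked against the documentation for your Puppet Server version:

```
jruby-puppet: {
    max-active-instances: 32
    # Milliseconds a request waits to borrow a JRuby instance before failing.
    borrow-timeout: 1200000
    # "off" interprets Ruby code, "jit" compiles frequently used code.
    compile-mode: off
    # Controls caching for the environment classes API.
    environment-class-cache-enabled: false
}
```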

Poil

Oct 6, 2017, 6:40:41 AM
to Peter Meier, puppet...@googlegroups.com
Hi,

The database connection leak remains after upgrading Trocla.
Also, after 2 days the catalog computation times started to increase again.
I'm going to install New Relic on a puppetserver, and if I see nothing I will try to analyze it following your blog article.

Best regards,

Poil

Oct 16, 2017, 3:44:58 AM
to Peter Meier, puppet...@googlegroups.com
Hi,

We have upgraded to Hiera v5 backends (and to the new Trocla v5 backend).

There is still a leak, but it seems to be very small.

I've also switched all hiera/hiera_hash/hiera_array calls in our site.pp
to lookup. We still have some hiera* function calls in some modules; I'm
wondering whether the leak is in these functions.

Best regards,

Poil

Oct 18, 2017, 4:01:44 AM
to Peter Meier, puppet...@googlegroups.com
Hi,

There is still a leak of Trocla database connections (latest
gem/module/hiera backend).

I've installed a puppetserver with only one node connected to it; after 5
days, there are 240 open connections.

Best regards,

Poil

Oct 18, 2017, 5:26:24 AM
to Peter Meier, puppet...@googlegroups.com
The Trocla leak seems to be in the trocla-hiera-backend

diff --git a/lib/puppet/functions/trocla_lookup_key.rb b/lib/puppet/functions/trocla_lookup_key.rb
index d377ec8..f61df46 100644
--- a/lib/puppet/functions/trocla_lookup_key.rb
+++ b/lib/puppet/functions/trocla_lookup_key.rb
@@ -33,6 +33,8 @@ Puppet::Functions.create_function(:trocla_lookup_key) do
       trocla_hierarchy(trocla_key, format, opts)
     end
 
+    @trocla.close
+
     context.not_found unless res
     context.cache(key, res)
   end
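
The patch above closes the Trocla handle after each lookup. The general pattern can be sketched in isolation; FakeStore below is a hypothetical stand-in that only counts open handles and is not part of Trocla's API:

```ruby
# Hypothetical stand-in for a store that holds one database connection.
class FakeStore
  @open_connections = 0
  class << self
    attr_accessor :open_connections
  end

  def initialize
    self.class.open_connections += 1
    @closed = false
  end

  def get(key)
    "secret-for-#{key}"
  end

  # Closing twice is harmless; the counter only drops once.
  def close
    return if @closed
    @closed = true
    self.class.open_connections -= 1
  end
end

# Leaky variant: a new store per lookup that is never closed.
def leaky_lookup(key)
  FakeStore.new.get(key)
end

# Closing variant: the store is released even if the lookup raises.
def closing_lookup(key)
  store = FakeStore.new
  begin
    store.get(key)
  ensure
    store.close
  end
end
```

With the leaky variant the open-handle count grows by one per lookup; with the closing variant it returns to zero after every call, at the cost of reconnecting each time.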

Peter Meier

Oct 18, 2017, 5:59:31 PM
to puppet...@googlegroups.com
> There still a leak on Trocla database connections (latest
> gem/module/hiera backend).
>
> I've installed a puppetserver, with only a node connected on; after 5
> days, there is 240 opened connections.

So, as you were changing a lot (new gem, new module and switching to the new
backend), I'm not sure whether you are still facing exactly the same
issue as at the beginning or just a different variant of it.

But let's focus on the current situation, as it is the one with
everything up to date.

I dumped the situation and my current guess on what is going wrong in
the module's issue tracker:

https://github.com/duritong/puppet-trocla/issues/25

As you have a single node in one environment that reproduces the problem,
I would be happy if you could provide more information in the ticket,
specifically regarding:

* What does your hiera config look like at the moment, i.e. how is the
trocla backend hooked into the hierarchies?
* How many classes does the node include during compilation?
* How many class parameters trigger the trocla backend?
* Is the number of connections growing over time, even if you don't
change anything (no new puppet code or anything like that)?

Estimated numbers are fine, but they should give an indication of whether we
have too many caches that all keep connections open, while the actual
idea was to use the caches to avoid opening too many connections.
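
One way to answer the last question with numbers is to sample the count of established connections at intervals. A rough sketch, assuming a Linux host and a database reached over TCP; port 3306 in the usage line is only an example, the function name is made up, and the count is system-wide rather than per process:

```ruby
# Count ESTABLISHED TCP connections to a given remote port by parsing
# text in the format of /proc/net/tcp (Linux only).

ESTABLISHED = '01' # state code used in /proc/net/tcp

def count_established(proc_net_tcp_text, remote_port)
  proc_net_tcp_text.each_line.count do |line|
    fields = line.split
    # Data rows start with a slot number such as "0:"; the header does not.
    next false unless fields.length >= 4 && fields[0].end_with?(':')
    # rem_address is "HEXIP:HEXPORT"; the port is the part after the colon.
    remote_port_hex = fields[2].split(':').last
    fields[3] == ESTABLISHED && remote_port_hex.to_i(16) == remote_port
  end
end

# Usage (example port): ruby count_conns.rb 3306
if __FILE__ == $PROGRAM_NAME && ARGV.length == 1
  count = count_established(File.read('/proc/net/tcp'), Integer(ARGV[0]))
  puts "#{Time.now.utc} #{count}"
end
```

Running this from cron every few minutes gives a simple time series; /proc/net/tcp covers IPv4 only, and /proc/net/tcp6 uses the same column layout.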

thanks and best

~pete
