puppetdb dies at random

James Green

Feb 16, 2015, 6:23:20 AM
to puppet-users
We have a puppet-master box with the following installed:

root@puppet-master:/var/log/puppetdb# dpkg -l | grep puppet
ii  facter                           2.4.1-1puppetlabs1               all          Ruby module for collecting simple facts about a host operating system
ii  hiera                            1.3.4-1puppetlabs1               all          A simple pluggable Hierarchical Database.
ii  puppet                           3.7.4-1puppetlabs1               all          Centralized configuration management - agent startup and compatibility scripts
ii  puppet-common                    3.7.4-1puppetlabs1               all          Centralized configuration management
ii  puppetdb                         2.2.2-1puppetlabs1               all          PuppetDB Centralized Storage.
ii  puppetdb-terminus                2.2.2-1puppetlabs1               all          Connect Puppet to PuppetDB by setting up a terminus for PuppetDB.
ii  puppetlabs-release               1.0-11                           all          "Package to install Puppet Labs gpg key and apt repo"
ii  puppetmaster-common              3.7.4-1puppetlabs1               all          Puppet master common scripts
ii  puppetmaster-passenger           3.7.4-1puppetlabs1               all          Centralised configuration management - master setup to run under mod passenger

Occasionally puppetdb is found to no longer be running. The end of its log shows it replacing facts; syslog then shows that puppet-master is unable to "replace-facts" because the connection to puppetdb is refused. When we start it again, the log begins with:

2015-02-16 10:34:28,202 INFO  [p.t.s.w.jetty9-core] Removing buggy security provider SunPKCS11-NSS version 1.7
2015-02-16 10:34:28,537 INFO  [p.t.s.w.jetty9-service] Initializing web server.
2015-02-16 10:34:28,604 INFO  [p.t.s.w.jetty9-service] Starting web server.
2015-02-16 10:34:28,606 INFO  [o.e.j.s.Server] jetty-9.1.z-SNAPSHOT
2015-02-16 10:34:28,638 INFO  [o.e.j.s.ServerConnector] Started ServerConnector@395ac93f{HTTP/1.1}{localhost:8080}
2015-02-16 10:34:28,732 INFO  [o.e.j.s.ServerConnector] Started ServerConnector@298d54c9{SSL-HTTP/1.1}{0.0.0.0:8081}
2015-02-16 10:34:28,787 INFO  [c.p.p.c.services] PuppetDB version 2.2.2
2015-02-16 10:34:28,792 WARN  [c.j.b.BoneCPConfig] JDBC username was not set in config!
2015-02-16 10:34:28,792 WARN  [c.j.b.BoneCPConfig] JDBC password was not set in config!
2015-02-16 10:34:31,650 INFO  [c.p.p.s.migrate] There are no pending migrations
2015-02-16 10:34:31,650 WARN  [c.p.p.s.migrate] Unable to install optimal indexing

We are unable to create optimal indexes for your database.
For maximum index performance, we recommend using PostgreSQL 9.3 or
greater.

2015-02-16 10:34:31,654 INFO  [c.p.p.c.services] Starting broker
2015-02-16 10:34:31,899 INFO  [o.a.a.s.k.MessageDatabase] KahaDB is version 4
2015-02-16 10:34:31,931 INFO  [o.a.a.s.k.MessageDatabase] Recovering from the journal ...
2015-02-16 10:34:31,931 INFO  [o.a.a.s.k.MessageDatabase] Recovery replayed 2 operations from the journal in 0.026 seconds.
2015-02-16 10:34:32,455 INFO  [c.p.p.c.services] Starting 12 command processor threads
2015-02-16 10:34:32,471 WARN  [c.j.b.BoneCPConfig] JDBC username was not set in config!
2015-02-16 10:34:32,473 WARN  [c.j.b.BoneCPConfig] JDBC password was not set in config!
2015-02-16 10:34:32,479 INFO  [c.p.p.c.services] Starting query server
2015-02-16 10:34:32,496 WARN  [o.e.j.s.h.ContextHandler] Empty contextPath
2015-02-16 10:34:32,500 INFO  [o.e.j.s.h.ContextHandler] Started o.e.j.s.h.ContextHandler@41ec3132{/,null,AVAILABLE}
2015-02-16 10:34:32,515 INFO  [c.p.p.c.services] Starting sweep of stale reports (threshold: 14 days)
2015-02-16 10:34:32,530 INFO  [c.p.p.c.services] Finished sweep of stale reports (threshold: 14 days)
2015-02-16 10:34:32,531 INFO  [c.p.p.c.services] Starting database garbage collection
2015-02-16 10:34:32,752 INFO  [c.p.p.c.services] Finished database garbage collection

And then we're back to replacing facts again.

Any ideas where we should go from here? 

Thanks,

James

Ken Barber

Feb 16, 2015, 7:14:34 AM
to Puppet Users
It might be that PuppetDB is running out of heap? Check
/var/log/puppetdb for a file 'puppetdb-oom.hprof' for an indication
this is happening.
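
Something like this should show whether a dump has been left behind
(path as per the Debian packaging; adjust if it differs on your box):

ls -lh /var/log/puppetdb/puppetdb-oom.hprof*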

You can find instructions for how to adjust your heap space here:

https://docs.puppetlabs.com/puppetdb/2.2/configure.html#configuring-the-java-heap-size

ken.

James Green

Feb 16, 2015, 7:43:58 AM
to puppet-users
I have a 270MB puppetdb-oom.hprof.prev file in /var/log/puppetdb

Google turns this up as http://projects.puppetlabs.com/issues/23237, however that page is returning a 500 Internal Server Error at present.


James Green

Feb 16, 2015, 7:48:19 AM
to puppet-users
16850 puppetdb  20   0 12.697g 418684  14848 S   0.9  0.4   4:32.74 java

That's top right now; it has been running since around 10.30 this morning (GMT). 12G of RAM? It's the only process in the list with a 'g' against it. Seems excessive?

Ken Barber

Feb 16, 2015, 7:54:44 AM
to Puppet Users
> I have a 270MB puppetdb-oom.hprof.prev file in /var/log/puppetdb

This isn't unexpected behaviour per se, although it can look that way
if you haven't dealt much with Java applications. Memory usage is hard
to predict, and if the limit is set too low then yes, the JVM will
crash itself and drop that hprof file. Not all languages have such
limits (most will just keep growing until the operating system kills
them), but Java has a hard limit that must be set up front, and when
it is exceeded we have PuppetDB configured to crash and drop an hprof
file in case we need to analyse it. I don't believe it needs analysis
in this case, not yet anyway :-).

The main bug here is that the daemon's stdout isn't being logged, at
least not on Debian, which makes it harder to see that this is
happening. Basically Java reports the out-of-memory error via STDOUT,
but STDOUT isn't captured in any log. We'll have this fixed in a
future release.

For now the recommendation is to increase your heap size until the
behaviour stabilises:

https://docs.puppetlabs.com/puppetdb/2.2/configure.html#configuring-the-java-heap-size
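
As a rough sketch of the change that page describes: on the Debian
packages the heap is normally set via JAVA_ARGS in
/etc/default/puppetdb (the value below is only an example, size it for
your node count), then restart the puppetdb service:

# /etc/default/puppetdb
JAVA_ARGS="-Xmx512m"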

ken.

Ken Barber

Feb 16, 2015, 8:04:46 AM
to Puppet Users
> 16850 puppetdb 20 0 12.697g 418684 14848 S 0.9 0.4 4:32.74 java
>
> That's top now since it began running around 10.30 this morning (GMT). 12G
> of ram? It's the only proc in the list having a 'g' against it. Seems
> excessive..?

So, there is a difference in the columns here ... the column with the
'g' is the 'virtual' (VIRT) usage, that is, the amount of memory the
process has reserved for potential use. It isn't using all of that
right now, but under load it could grow towards that figure. I'm not
quite sure why it's so high; you'd need to show the full set of
settings. Perhaps a ps auxwww | grep java will give us the arguments
that have been passed and help me understand why it's so high. Either
way, it has usually been set to do that by someone/something: by
default our settings shouldn't let Java consume 12 GB out of the box,
so I can only presume the heap setting was changed at some point.
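
That is, something along these lines; the interesting bits are any
-Xmx/-Xms flags on the java command line:

ps auxwww | grep java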

The column just after that is the RES column, indicating how much
memory the process is actually consuming right now. This is usually
the important one. I'm of course trivialising the description of each
column, but understanding VIRT versus RES is important, and there are
plenty of articles on the internet about the subject that are well
worth reading as a sysadmin.

Another thing: if usage really is that high, you might want to check
your dmesg output to make sure the process isn't getting caught by the
Linux OOM killer. I have no information about your system other than
what you've given me, so I can't judge whether 12 GB is high for you.
It does seem high, although I could understand increasing the setting
that far if you were processing a lot of data.
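
A quick way to check for that (the exact kernel wording varies a bit
between kernel versions):

dmesg | grep -iE 'out of memory|killed process'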

ken.

James Green

Feb 17, 2015, 4:23:14 AM
to puppet-users
An excellent write-up, thank you.

In our case puppet-master is actually an LXC container instance. On reflection, the memory values reported by top inside a container are not meaningful, and I'm not yet sure how best to monitor this. I would suggest that application support now needs to include the question "is this a container instance?" to help reduce time wasted by yourselves and others (not a criticism of you or your colleagues, just an observation about an improvement the whole industry could make to fault diagnosis).

I have removed the -Xmx192M from our start-up parameters; we'll see how things go for the time being.
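
For the record, the change was roughly this, presumably in
/etc/default/puppetdb on a Debian box like ours (other flags omitted,
and your file may differ):

# before
JAVA_ARGS="-Xmx192m"
# after: with no explicit -Xmx the JVM picks its own default maximum
# heap based on the memory it detects, which from inside an LXC
# container is typically the host's total RAM
JAVA_ARGS=""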

Thanks,

James

