Puppet Server dying with high number of JRuby instances


Dietrich, Stefan

Aug 25, 2015, 7:05:54 AM
to puppet...@googlegroups.com
Hello,

Today we tried to migrate our Puppet Masters from Apache/Passenger to Puppet Server 1.1.1.
However, Puppet Server dies with error messages as soon as we increase the number of JRuby instances above 24 and the JVM heap size above 16 GB.

During startup, Puppet Server spawns the JRuby instances one after another, and after about 8 instances an exception is logged:
2015-08-25 10:25:05,676 INFO [puppet-server] Puppet Puppet settings initialized; run mode: master
2015-08-25 10:25:06,254 INFO [p.s.j.jruby-puppet-agents] Finished creating JRubyPuppet instance 7 of 32
2015-08-25 10:25:08,567 ERROR [p.t.internal] shutdown-on-error triggered because of exception!
java.lang.IllegalStateException: There was a problem adding a JRubyPuppet instance to the pool.
Caused by: org.jruby.embed.EvalFailedException: (LoadError) load error: jopenssl/load -- java.lang.NoClassDefFoundError: org/jruby/ext/openssl/NetscapeSPKI
at org.jruby.embed.internal.EmbedEvalUnitImpl.run(EmbedEvalUnitImpl.java:132) ~[puppet-server-release.jar:na]
at org.jruby.embed.ScriptingContainer.runUnit(ScriptingContainer.java:1341) ~[puppet-server-release.jar:na]

The full log file is available in this Gist [1].
The log file is from the initial setup with max-active-instances set to 32 and a JVM heap size of 48 GB.
We had a working setup with a 16 GB heap and 16 instances. Sometimes 24 instances worked as well, but not always.
However, 16 instances will be too small to handle all the Puppet agents.
Increasing the timeout in /etc/sysconfig/puppetserver did not help either.

We use rather beefy hardware for our three Puppet Masters (2x Dell R715, 1x R815); with Apache/Passenger this scaled nicely.

The OS on the Puppet Masters is Scientific Linux 6.6 (RHEL 6.6 clone) and OpenJDK 8 is used.
We tried the Oracle JRE as well, but this did not change anything.
HTTPS is terminated at our F5 load balancer, which forwards the traffic unencrypted to Puppet Server.

Any help would be appreciated!

[1] https://gist.github.com/stdietrich/5a5b8f9b1dc2445c3ec7

Regards,
Stefan

--
------------------------------------------------------------------------
Stefan Dietrich Deutsches Elektronen-Synchrotron (IT-Systems)
Ein Forschungszentrum der Helmholtz-Gemeinschaft
Notkestr. 85
phone: +49-40-8998-4696 22607 Hamburg
e-mail: stefan....@desy.de Germany
------------------------------------------------------------------------

Chris Price

Aug 25, 2015, 7:59:42 AM
to Puppet Users, stefan....@desy.de
Stefan,

That is a very weird error.  The way it reads it sounds like something that should happen on every JRuby instance or on none of them ("NoClassDefFoundError" usually means it's trying to load some code that doesn't exist), so I wouldn't expect you to see a difference in behavior between 16 instances and 32 instances.

It might be best if you open a bug about this on our issue tracker: https://tickets.puppetlabs.com/browse/SERVER , so that we can get some other folks to weigh in on it... would you mind doing that?

Stefan Dietrich

Aug 25, 2015, 11:10:20 AM
to puppet...@googlegroups.com
Hi Chris,

This was also our expectation. Even with 24 instances, it did not die at instance 17, but at instance 20.

I have created issue SERVER-858 for this.

Regards,
Stefan

jcbollinger

Aug 26, 2015, 9:49:19 AM
to Puppet Users


On Tuesday, August 25, 2015 at 6:59:42 AM UTC-5, Chris Price wrote:
> Stefan,
>
> That is a very weird error.  The way it reads it sounds like something that should happen on every JRuby instance or on none of them ("NoClassDefFoundError" usually means it's trying to load some code that doesn't exist), so I wouldn't expect you to see a difference in behavior between 16 instances and 32 instances.


A NoClassDefFoundError could also mean that Java simply failed to load the specified class, even though it does exist in some other sense. This could happen for some JRuby instances and not others if each uses its own ClassLoader, and each of those loads the specified class directly (rather than delegating to a common ClassLoader). I don't know offhand how Puppet Server is set up, but it would be pretty reasonable for it to run all the JRuby instances in the same VM, using distinct ClassLoaders for improved isolation between instances. In that case, the problem could be one of resource exhaustion (but not necessarily heap).
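
To make the ClassLoader point concrete, here is a minimal Java sketch of two loaders that each define the same class themselves instead of delegating to a common parent. Everything in it (IsolatedLoaderDemo, IsolatingLoader, Payload) is a made-up name for illustration, and it assumes the compiled .class files are on the classpath; it is not how Puppet Server or JRuby is actually wired up, which I have not checked.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class IsolatedLoaderDemo {
    // Hypothetical class standing in for something like an OpenSSL extension class.
    public static class Payload {}

    // A loader that defines one named class itself and delegates everything else.
    static class IsolatingLoader extends ClassLoader {
        private final String target;

        IsolatingLoader(String target) {
            super(IsolatedLoaderDemo.class.getClassLoader());
            this.target = target;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(target)) {
                return super.loadClass(name, resolve); // normal parent-first delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length); // a separate copy per loader
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        // Reads the class file bytes from the classpath via the parent loader.
        private byte[] readBytes(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    // Returns true when two loaders produce two distinct Class objects with the same name.
    static boolean loadedSeparately() {
        String name = Payload.class.getName();
        try {
            Class<?> a = new IsolatingLoader(name).loadClass(name);
            Class<?> b = new IsolatingLoader(name).loadClass(name);
            return a != b && a != Payload.class && a.getName().equals(b.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadedSeparately());
    }
}
```

Each copy is loaded and initialized on its own, so each load consumes its own resources and can succeed or fail independently of the others.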

As a special case of the above, a NoClassDefFoundError can be caused by a failure in a static initializer of the affected class. That throws the door wide open for just about any kind of capacity or timing problem, especially if any native code is involved, as the full name of the affected class suggests is likely in this case. On the other hand, there does not appear to be another error or exception recorded as a cause of the NoClassDefFoundError, which casts doubt on this particular variation.
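
The static-initializer case can be reproduced with a small Java sketch. The names (StaticInitDemo, Fragile) are invented for illustration and the failure is simulated; nothing here is taken from JRuby or jopenssl.

```java
public class StaticInitDemo {
    // Hypothetical class whose static initializer always fails.
    static class Fragile {
        static final int VALUE = compute();

        static int compute() {
            throw new IllegalStateException("simulated resource failure");
        }
    }

    // Touches Fragile and returns the simple name of whatever Throwable results.
    static String touch() {
        try {
            return "ok:" + Fragile.VALUE;
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(touch()); // first use: ExceptionInInitializerError
        System.out.println(touch()); // later uses: NoClassDefFoundError
    }
}
```

The original exception shows up on the first access only; every later access to the class fails with NoClassDefFoundError, and whether that later error carries the original exception as a cause depends on the JVM version.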


John
