Lucee 5 keeps crashing on me


Seb Duggan

Jun 2, 2016, 6:31:54 AM
to Lucee
I have an issue with a Lucee 5 server repeatedly crashing - it's got to the stage now where it's going down after 2 to 3 hours.

The setup is this:

DigitalOcean droplet, 2GB RAM, 2CPUs
CentOS 7
Only running Apache and Lucee 5 (5.0.0.252), via mod_CFML
Database is hosted on a separate droplet

The server serves:
2 MuraCMS instances (each with 2 sites)
2 other low-traffic CFML (FW1) sites

Tomcat memory options:
CATALINA_OPTS="-Xms1536m -Xmx1536m -XX:+UseConcMarkSweepGC";
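
(For reference, on the standard Lucee installer these options are set in Tomcat's setenv.sh - I believe /opt/lucee/tomcat/bin/setenv.sh by default, though that path is an assumption on my part:)

# assumed default location: /opt/lucee/tomcat/bin/setenv.sh
CATALINA_OPTS="-Xms1536m -Xmx1536m -XX:+UseConcMarkSweepGC";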

If I look at the Lucee admin, I can see the heap memory usage climbing steadily over time. This screenshot was taken about 5 minutes before the server crashed:

[screenshot: Lucee admin memory graph showing heap and non-heap usage]

When it crashes (I'm assuming when the heap and non-heap combine to reach 100%), I see the following message in the browser:

Service Unavailable
The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.

If I check on the server, I can see that the Lucee/Tomcat service is no longer running. And a look at the Catalina logs shows:

Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00007fbc562b0000, 65536, 1) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 65536 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /opt/lucee/tomcat/hs_err_pid27367.log

The error report file reads:

#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 65536 bytes for committing reserved memory.
# Possible reasons:
#   The system is out of physical RAM or swap space
#   In 32 bit mode, the process size limit was hit
# Possible solutions:
#   Reduce memory load on the system
#   Increase physical memory or swap space
#   Check if swap backing store is full
#   Use 64 bit Java on a 64 bit OS
#   Decrease Java heap size (-Xmx/-Xms)
#   Decrease number of Java threads
#   Decrease Java thread stack sizes (-Xss)
#   Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
#  Out of Memory Error (os_linux.cpp:2627), pid=27367, tid=140446781527808
#
# JRE version: Java(TM) SE Runtime Environment (8.0_74-b02) (build 1.8.0_74-b02)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.74-b02 mixed mode linux-amd64 compressed oops)
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#


---------------  T H R E A D  ---------------


Current thread (0x00007fbc60108000):  JavaThread "C2 CompilerThread0" daemon [_thread_in_vm, id=27378, stack(0x00007fbc5075d000,0x00007fbc5085e000)]


Stack: [0x00007fbc5075d000,0x00007fbc5085e000],  sp=0x00007fbc5085a850,  free space=1014k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xaba7ea]
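
For reference, the errno=12 failure above is the OS refusing a native allocation rather than the Java heap itself filling up, so the droplet's free RAM and swap are worth checking too. A rough sketch with generic CentOS 7 commands (the swap file part is only an illustrative stopgap, not something taken from this server):

free -m                 # overall RAM and swap usage on the droplet
swapon -s               # list active swap; DigitalOcean images typically ship with none
# stopgap only: add a 1GB swap file so a native allocation failure doesn't kill the JVM outright
fallocate -l 1G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile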

I've installed the FusionReactor trial to try to get more insight into what's going on; it shows that there were approx. 135 active sessions when it fell over. I'm not sure what else I should be looking for.


Does anyone have any ideas what might be going on here? I have other droplets with the same basic setup which seem perfectly stable...


- Seb

Nando Breiter

Jun 2, 2016, 7:54:38 AM
to lu...@googlegroups.com
Were the same setup and codebases working on Lucee 4.5 without a memory issue?



Aria Media Sagl
+41 (0)76 303 4477 cell
skype: ariamedia


Nando Breiter

Jun 2, 2016, 7:59:48 AM
to lu...@googlegroups.com
And you might want to look into using jClarity's Censum tool to get a handle on how you should tune this server, if the indications point to a JVM tuning problem rather than an outright memory leak. I had a long-running scheduled task that prevented garbage collection from performing optimally; objects were getting promoted to permGen that shouldn't have been. Censum indicated the issue very clearly, although the best solution turned out to be more memory.
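
To give Censum (or any GC analysis tool) something to chew on, you need GC logging switched on first. A minimal sketch of the Java 8 flags to add to CATALINA_OPTS (the log path is just an example, and the flag names changed again in Java 9+):

-verbose:gc -Xloggc:/opt/lucee/tomcat/logs/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCCause -XX:+PrintTenuringDistribution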



Aria Media Sagl
+41 (0)76 303 4477 cell
skype: ariamedia

Seb Duggan

Jun 2, 2016, 8:08:22 AM
to Lucee
The server was different - I was previously running on one big (8GB) dedicated server which ran about a dozen Tomcat contexts (inc. 4 MuraCMS instances) and the PostgreSQL database. Lucee was assigned between 3 & 4 GB.

The only difference in the codebase is that one of the Mura instances has been updated to use Mura 7.

Nando Breiter

Jun 2, 2016, 8:40:10 AM
to lu...@googlegroups.com
So it seems to me, from what you've posted, that the issue's origin could be, roughly speaking:

a) too little memory available for Lucee
b) a memory leak ( in Mura 7 ?, your codebase?)
c) a memory issue / leak in Lucee 5

Which do you suspect?



Aria Media Sagl
+41 (0)76 303 4477 cell
skype: ariamedia

Seb Duggan

Jun 2, 2016, 9:09:50 AM
to Lucee
Not really sure.

I'm going to try separating the Mura 7 instance out onto its own droplet, and see if either server then still crashes. That should give me more information, whether it does or doesn't...

Nando Breiter

Jun 2, 2016, 9:42:43 AM
to lu...@googlegroups.com
If I were to guess ... I'd start with insufficient memory. If you increase the droplet to 4GB overall, give Lucee 3GB, and the heap stabilizes (rather than crashing after 4 to 6 hours), then you've more or less solved it. I'm no expert at memory tuning, but in my experience, an application needs to be fairly lean in its use of memory to run with 1.5GB.

That said, it could still be a memory leak that you're experiencing; but again, if you increase to 4GB and you still see the heap expanding until the app crashes, you'll have a much better idea of what's going on.

And by the way, after using the Censum tool, it seems to me that simply copying JVM memory settings from somewhere and pasting them in is about as useful as trying to debug an application without the error message, and without any idea where in the application the error is occurring. Like when a client writes you an email that says "It's not working. Please fix asap!!!!" You may be able to manage just fine with limited memory if you tune the JVM properly, but for that you'll need to use Censum, or a JVM consultant.



Aria Media Sagl
+41 (0)76 303 4477 cell
skype: ariamedia

Seb Duggan

Jun 2, 2016, 11:23:26 AM
to Lucee
I think 1.5GB should be plenty for many apps. I've got a couple of small/medium sized MuraCMS installs running happily in 768MB on a 1GB droplet...

I've removed the GC argument I had and it seems to be stabilising now. Strangely, for the first few weeks with the -XX:+UseConcMarkSweepGC argument the performance seemed much improved, so I don't know why I saw the sudden dip. But it's now running with no explicit GC arguments, just specifying the heap size (and that's with all the contexts I originally had on the VM).
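
So the relevant line is now just the heap settings, i.e. (same values as before, CMS flag removed):

CATALINA_OPTS="-Xms1536m -Xmx1536m";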

Needless to say, I'll be keeping a close eye on it over the next few days :-)


- Seb

Andrew Dixon

Jun 2, 2016, 3:12:43 PM
to lu...@googlegroups.com
Hi Seb,

Did you change from Java 7 to Java 8 at the same time as changing from Lucee 4.5 to Lucee 5?

Kind regards,

Andrew

Seb Duggan

Jun 2, 2016, 3:23:15 PM
to lu...@googlegroups.com
Yes, I did.

I also ran Lucee 5 RC for a few weeks in production, and I think it’s since the final release that it’s started crashing. But I don’t really know if that’s related...

Andrew Dixon

Jun 2, 2016, 3:49:30 PM
to lu...@googlegroups.com
I think (but I'm no expert on Java) that garbage collection in Java 8 was changed quite a bit, so that might be why removing the -XX:+UseConcMarkSweepGC argument has improved its stability. I believe Java 8 can automatically work out the best garbage collection routine, and by specifying one you are overriding that and telling it to use that collector instead - which might not be the best one for your situation.
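
If you want to see which collector the JVM actually picks when you leave it alone, something like this should show it (output varies by machine and Java version; <pid> is Tomcat's process id):

java -XX:+PrintCommandLineFlags -version    # prints the ergonomics-chosen flags, e.g. -XX:+UseParallelGC on Java 8
jcmd <pid> VM.flags                         # same information for a running JVM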

Kind regards,

Andrew

Nando Breiter

Jun 2, 2016, 4:15:48 PM
to lu...@googlegroups.com
> I think 1.5GB should be plenty for many apps. I've got a couple of small/medium sized MuraCMS installs running happily in 768MB on a 1GB droplet...

It completely depends on the application! In my case, everything worked perfectly with 1.5GB ... except that one client wanted their pre-planned schedule to extend out 1 year instead of my recommended 90 days ... and most importantly, the code that ran the scheduled task that kept this schedule up to date instantiated a single ORM entity again and again, over the course of an hour. I thought garbage collection would be able to keep up and clean out the entities as the task ran, but what was actually happening was that these were getting promoted to PermGen (it's called premature promotion), and if the scheduled task ran for longer than the available memory could sustain, the server would grind to a halt.

At the end of the scheduled task run, if it made it that far (often it would just stall for lack of memory), GC would clear the PermGen and memory used would drop back to "normal" within a few seconds.

I didn't know any of this until I added logging to the JVM params, let the app run through a few cycles of trying to get through the scheduled task run, and then uploaded the resulting log file into Censum. I could see that PermGen was growing in FusionReactor, but the Censum analysis allowed me to see exactly why this was occurring, and suggested a set of JVM settings to prevent premature promotion. 

Before I figured out how to use Censum, I had many ideas about what might be going wrong and what I might be able to do to fix it, including reprogramming the scheduled task to run without the ORM entity; but once I was able to nail down what was actually occurring, it was much easier to adjust the JVM parameters. Here are the JVM settings I ended up with, but these are adjusted in a way that is unique to this particular application, and in particular to handle the one trouble spot - that single scheduled task run for the client that insisted on a year in advance.

-Xms3072m -Xmx3072m -XX:PermSize=192m -XX:MaxPermSize=384m -XX:NewRatio=4 -XX:SurvivorRatio=8 -XX:+UseCompressedOops -Xss256k -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -XX:CMSInitiatingOccupancyFraction=68 -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=3 -XX:GCLogFileSize=25m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCCause -XX:+PrintGCApplicationStoppedTime -XX:+PrintTenuringDistribution -verbose:gc -Xloggc:gc.log

I know it seems easier to try and guess your way through it, but adding the logging params, gathering some data, and throwing the log at a really good GC analysis tool is actually quite rewarding. :-)




Nando Breiter

Jun 2, 2016, 4:53:47 PM
to lu...@googlegroups.com
On Thu, Jun 2, 2016 at 9:49 PM, Andrew Dixon <andrew...@gmail.com> wrote:
> I think (but I'm no expert on Java) that garbage collection in Java 8 was changed quite a bit, so that might be why removing the -XX:+UseConcMarkSweepGC argument has improved its stability. I believe Java 8 can automatically work out the best garbage collection routine, and by specifying one you are overriding that and telling it to use that collector instead - which might not be the best one for your situation.

From what I understand, GC changes can be and are implemented with each update, so you can have significant changes between point releases. I'm subscribed to the Censum mailing list, and these guys sometimes remember the modifications made at a particular point release - among, what, hundreds spanning Java 6, 7 and 8 - and discuss the potential impacts of those updates if someone brings up an issue. If Seb took his issue to that mailing list, the response he'd get would be "Show us the GC log". They are very nice; if he didn't know how to generate one, they would explain how.

There's an open source GC log viewer: https://github.com/chewiebug/GCViewer - but I went with Censum since it was easier and much more comprehensive.
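
If you do go the GCViewer route, it's just a jar you run against the log - a sketch, with the version number being whatever release you happen to download:

java -jar gcviewer-1.36.jar gc.log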

As to which garbage collector is better - Parallel, Concurrent Mark Sweep, or the new G1 - I think it depends on your app. Everything I read suggests testing your app with each, again with verbose logging enabled, to work it out. But I would not expect a dramatic difference between Parallel and CMS. Perhaps I would expect a significant difference depending on the defaults for NewRatio and SurvivorRatio in a CFML app for a given Java version. Mike Brunt has written that premature promotion is a typical issue he runs across in CFML apps, and that the default JVM settings are typically not well suited to a CFML app, especially one that makes use of CFCs.

Bottom line: there's no need for hand-waving or opinions when it comes to JVM memory settings. The key is generating a log file and passing it to a viewer / analysis tool.


Jochem van Dieten

Jun 2, 2016, 5:37:42 PM
to lu...@googlegroups.com

On Jun 2, 2016 12:31, "Seb Duggan" wrote:
> DigitalOcean droplet, 2GB RAM, 2CPUs

> Tomcat memory options:


> CATALINA_OPTS="-Xms1536m -Xmx1536m -XX:+UseConcMarkSweepGC";

This is too much. You only leave 512MB for the OS, Apache and permgen. And due to the way the JVM works, it has to be contiguous space too.

> # Possible solutions:


> #   Decrease Java heap size (-Xmx/-Xms)

Did you try this?
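
Something along these lines would leave more headroom for the OS, Apache and the JVM's own native memory - the exact numbers are a guess that should be checked against a GC log:

CATALINA_OPTS="-Xms1024m -Xmx1024m";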

Jochem

Seb Duggan

Jun 8, 2016, 4:40:42 AM
to Lucee
Well, after playing around with it all a bit and trying out different settings, I've had a rethink of my server strategy.

I had been trying to take advantage of the ease of spinning up new droplets on DigitalOcean to separate out all my sites - so each client would have their own droplet, with maybe one or two contexts on it. But the overhead on each droplet meant that I was having to increase the droplet's size and so lose out on the cost savings for small droplets. I had one, for example, which only became stable after I increased it to 4GB, with 3GB given over to Lucee.

On top of that, there was a great deal of wasted space, bandwidth and CPU on each of these droplets.

So I've moved instead to just 2 droplets. I still have a separate VM for database - a 2GB image with PostgreSQL on it, which seems to cope just fine. And an 8GB/4CPU droplet which hosts Apache and Lucee - 7GB given over to Lucee. I have about 10 contexts currently, and it flies along very nicely.
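
In Tomcat terms that's still just the heap size, something like:

CATALINA_OPTS="-Xms7168m -Xmx7168m";    # roughly 7GB of the 8GB droplet, still no explicit GC flags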

FusionReactor shows me a very nice saw-tooth memory graph, with the peak usage remaining at a steady 2.8GB. After a bit of tuning and waiting and watching, I may even be able to reduce the droplet size to 4GB, but I'm not going to rush into it (I'm just enjoying it being stable at the moment!). It's also currently showing 540 active sessions, with a peak of 680.

My main decision now is what to do about FusionReactor in 8 days when the trial expires. I had a license for it years ago, and found it quite useful, and it seems a whole lot improved since then - but the price has gone up significantly since I paid £199 for v3.5...


- Seb