Lucee 4.5 - Tomcat8 high cpu activity but no active requests in fusion reactor


Dan Baughman

May 17, 2017, 9:15:08 AM5/17/17
to FusionReactor
Hey gang,

I'm trying to use FusionReactor to find a performance problem on a server which I'm pretty sure just has slow disk IO.

But every once in a while Tomcat 8 will sit pretty steady at about 50% CPU while not servicing any requests (HTTP connections will just spin and spin), yet FusionReactor doesn't show any active requests or activity.

How can I see what is happening there?

Charlie Arehart

May 17, 2017, 9:38:36 AM5/17/17
to FusionReactor
Hi, Dan. I have a thought. (Others may reply with briefer answers. I prefer to offer ones with more details, as I find in my daily troubleshooting consulting that there are different underlying causes for the kind of thing you're seeing. Different strokes...)

To be clear, you do mean you CAN get into FR but the requests>activity page shows no requests running, right? More on that in a moment. But what about if you look at requests>history (the 100 most recent requests). Does it show any running slowly in recent seconds? It could be that there are "none running right now" but there had been "some running slow recently" that reflected the slowness you see in your browser requests, which may change your perspective that "FR doesn't show the problem happening".

Of course, seeing Tomcat (or whatever engine one is using) showing 50% CPU is indeed an indication that it's not happy, and we want to understand that. And when you say "http connections will just spin and spin", that's distressing. I want to take these in parts.

1) Since you say you can get into FR (to see, for instance, that requests>activity page), that's an indication that the instance being monitored is not "so gone" that it's unable to process FR UI requests. (Just something to keep in mind, as FR does indeed run in the same JVM as Lucee, or CF, or whatever you are monitoring.)

2) And that then raises the question of why you would see "requests spinning" (in the browser) and yet none showing as running in the instance, especially if you may also find that none show as having recently run on the requests>history page.

One thought would be to check the web server connector. Since you're on Lucee with Tomcat (and this would apply to CF10 and above, as they are on Tomcat), this "spinning" of requests while NONE show in FR could be an indication that the web server connector needs tuning.

(Adobe blogged about it in 2012 and 2014 for CF10 and CF11. Just google coldfusion iis tuning to learn more. It applies to Apache as well. And everything there but the mention of "max_reuse_connection" is pure Tomcat stuff. Only that "max_reuse_connection" is something unique to CF.)
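The gist of that connector tuning is making sure the web server's connection pool and Tomcat's AJP thread pool are sized to match, so requests don't queue invisibly in the connector. A sketch of the kind of settings involved; file names follow the standard Tomcat/mod_jk layout, and all values here are illustrative, not recommendations:

```
# workers.properties on the web-server side (worker name and values illustrative)
worker.lucee.type=ajp13
worker.lucee.connection_pool_size=200
worker.lucee.connection_pool_timeout=60

# server.xml on the Tomcat side: give the AJP connector at least as many
# threads as the connector pool on the web server can open
<Connector port="8009" protocol="AJP/1.3" maxThreads="200" ... />
```

If the connector pool is larger than `maxThreads`, requests accepted by the web server can stall waiting for a Tomcat thread, which looks exactly like "spinning" requests that never appear in FR.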

3) But back to the fact that you show Tomcat at 50%, you'd want to know about that of course. I will point out that sometimes when you see overall slowness (whether running or recent), it may be an indication of a general JVM problem.

3a) I would look first at the resources>garbage collection page (or its corresponding log, if the instance crashes). Note that there are separate indications of the two types of GCs (in the UI and logs), typically indicated as marksweep (aka "major GCs") and scavenge ("minor"). You want to see few if any majors, and you don't want to see them taking a really long time (seconds or more), because when that happens, you will see the high CPU and impact on requests (running or recent).
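What FR's garbage collection page reports is data the JVM itself exposes over JMX, so you can also sanity-check it directly. A minimal sketch (class name and output format are my own) that prints each collector's cumulative count and time, letting you spot frequent or long-running major GCs:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCheck {
    public static void main(String[] args) {
        // Each bean corresponds to one collector; on the Java 8 parallel
        // collector the names are "PS Scavenge" (minor) and "PS MarkSweep"
        // (major), matching the labels FR shows.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: collections=%d, totalTimeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

A steadily climbing major-collection time is the same signal Charlie describes: the JVM burning CPU on GC rather than on requests.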

3b) Next, consider looking at the resources>memory spaces page (or its corresponding logs). In the top right, there is a drop-down to choose the kind of memory to show. The most important are typically the old gen (where long-lived objects stay and may cause excessive heap use over time), and the code cache (a surprising cause of hangups to many), and the metaspace (in Java 8, or permgen in Java 7), which can also fill and be a gotcha. If any of those memory spaces are nearly full or filling and dropping, they could be a possible explanation for sudden (or persistent) slowness.
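The memory spaces FR graphs are likewise visible through the JVM's own memory-pool beans. A sketch (class name mine) that lists each pool's current use against its cap, so you can see whether old gen, code cache, or metaspace is near full:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class MemoryPools {
    public static void main(String[] args) {
        // Pool names vary by JVM and collector; on Java 8 HotSpot they
        // include "Metaspace", "Code Cache", and an old gen such as
        // "PS Old Gen" (the spaces Charlie calls out above).
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage();
            long max = u.getMax();  // -1 means the pool has no defined cap
            System.out.printf("%-20s used=%,d max=%s%n",
                    pool.getName(), u.getUsed(),
                    max < 0 ? "undefined" : String.format("%,d", max));
        }
    }
}
```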

Finally, with both those features, and indeed most FR graphs, note that in the UI you can look back at the last minute, hour, day, or week (though beware that the larger the time range, the larger the interval over which averages are taken, so brief spikes are smoothed out or even lost).

Let us know if any of that helps. And even if it may not address your need somehow, I hope it may help others.

BTW, I will be doing a free 60-minute webinar starting in about 20 mins (as I write) on using FR for post-crash troubleshooting. (Other talks have addressed stuff like the above, on using FR when the problem is currently happening.) For more on that and the past recordings, just google: fusionreactor webinars, or see https://www.fusion-reactor.com/webinars.

/charlie

michael...@intergral.com

May 18, 2017, 7:18:09 AM5/18/17
to FusionReactor
Hi Dan, Charlie

After reading Charlie's response there is not a lot I can add to the discussion here. The only input I can offer is that when this happens I often see that the server is running out of permgen or metaspace (depending on the Java version), which will often cause this issue.

You can determine this by looking in Resources > Memory Spaces. Another indication can be found by looking at your server's garbage collection: you will often see far more garbage collection activity than on the previous day.

I would recommend following Charlie's steps and also, as he mentioned, viewing some of our webinars.

Michael Flewitt
FusionReactor Support Team

charlie arehart

May 18, 2017, 9:45:29 AM5/18/17
to fusion...@googlegroups.com

Thanks for the followup and encouragement. Just to be clear, as for your suggestion at the end of your first sentence, I’ll note I did cover that in my 3b. :-) Again, I appreciate that it was a long note, and some may be reading things on their phone and only scanning. I can never know, so I give the details for those it may help.

So since this is a relatively common problem that people may want to know how to solve, let me bullet-point things for Dan and others:
- since you can get to FR (and see “no requests running”), that tells us your instance (Tomcat) is not “down” or crashed, which may be useful to keep in mind
- even if there may be no current running requests, what about recent requests? Are they all slow? See the requests>history page in FR
- if FR shows NO requests having run for some time, the issue may be in the web server or web server connector (not letting them in)
- the high CPU in your instance may be due to JVM issues. Check out first FR’s resources>garbage collection page. Has the rate or duration increased?
- check out FR’s resources>memory spaces page (and choose among them on the top right), to see if any of them are high or have risen significantly in recent time
- consider in particular the metaspace (Java 8) or permgen (Java 7), as well as the code cache, or finally the old gen
- and if FR has restarted since the problem, you can find all this info in FR logs (zipped up hourly in an archive folder and kept for 30 days, by default)

Again, the details on each are in my first reply, and still more of course in the webinars, and the docs cover use of features.

/charlie

Dan Baughman

May 18, 2017, 1:46:37 PM5/18/17
to FusionReactor
Thanks, Charlie.

I used your guide and stepped through those. I'll be doing some additional monitoring.

I do have Lucee configured to use lots of memory, up to 3.5 GB. I don't see any mark sweeps, but I do see a minor event freeing about a gigabyte of memory every few seconds, and it typically completes in just a few milliseconds. I'm on JVM 1.8, so it uses metaspace rather than permgen, and that isn't over 100 MB. I did have a few requests running intermittently that were using gigabytes of memory, and those were easily refactored to use only a few hundred, but I still wasn't able to figure out what Lucee is doing when it isn't servicing any requests while CPU usage is still so high.
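For the next occurrence, it can help to have the JVM itself record GC activity so there's a log to compare against FR's graphs. A sketch of Java 8 style GC-logging arguments that could be added to Lucee's JVM settings; the heap size matches what's described above, and the log path is a placeholder:

```
# Java 8 GC logging flags (illustrative; add to Lucee's JVM arguments)
-Xmx3584m
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Xloggc:/path/to/gc.log
```

With that in place, a stretch of high CPU with no requests can be checked against the log to confirm or rule out GC as the cause.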

I'll take a look at those IIS tuning docs.

Dan