High CPU usage because of scheduled tasks

1,413 views
Skip to first unread message

Sergii Galashyn

unread,
Mar 10, 2012, 8:53:23 AM3/10/12
to ra...@googlegroups.com
Subject may sound confusing, but here's the story.

Ubuntu server VPS is running old(-ish) setup of Railo 3 with Resin (x86), since 2009. I've used preview updates provider, except couple of occasional rollbacks it worked just fine.

Few weeks ago Railo service started using CPU resources more intensively. There was no user base growth, or traffic.

Strangest thing here is that CPU usage was high right from the service startup and continued to stay high for hours. Usually it was 100% (1 of 4 CPU), sometimes > 200%, or even > 300%.

First decision was to upgrade. So I've installed latest Railo+Tomcat (3.3.1.000). Same result. Updated to 3.3.2.002 -- the same.

Next was playing with JVM settings, without much success. Currently they look like this:

JAVA_OPTS="-Xms256m -Xmx1024m -XX:MaxPermSize=64m -XX:+UseParallelGC -XX:ParallelGCThreads=20 ";

Next upgraded Java to 1.7.0_03, same result as 1.6.0_29 and 1.6.0_26.

I've played with server.xml and found out that problem with only one application. Well, not with application in fact: I've tried to remove my code from the webroot and result was the same.

So it should be something wrong with 3 year old WEB-INF, right? Yes, when removed and created on Railo restart -- CPU levels were fine.

While restoring the context settings I've discovered that reason is actually scheduler. When scheduler.xml is blank -- all works fine.

Because my previous setup was old, I've fixed config syntax: for example port is now explicitly stored with URL.

Currently config looks like this http://pastebin.com/jH98jEvz -- tasks work just fine, applications also feel OK. Only CPU usage stays high...

I've played with config and looks like deleting some of the tasks helps, for example this is cut and "safe" version of config: http://pastebin.com/k6QkanPg -- when I use it everything is OK.

The more of "unsafe" entries I add -- the more CPU is used. In full version it looks like this http://dl.dropbox.com/u/2495797/img/1203/tomcat-cpu-htop.png

By the way, I'm using very similar setup on my local box and it is consuming this config without any complaints. This is even with mostly default Tomcat/JVM settings.

Why would Railo do this? Any ideas how to fix?

Thanks,
-sg

Cameron Childress

unread,
Mar 10, 2012, 9:02:07 AM3/10/12
to ra...@googlegroups.com
What happens when you run the scheduled task by hand?  Outside of the scheduler?  What does FusionReactor show?

My first thought would be that it's not the scheduler, it's the CFML inside the tasks.

I also wonder - when Railo is restarted, does it run all "interval < daily" tasks?  I wouldn't expect daily or weekly tasks to run at startup, but I do note that you have one task that runs on interval=3600 for caching blog entries.  That *might* run on startup.  Not sure.

It's also worth putting FusionReactor on that box (if you haven't) to see what's going on, but that's where I would look first.

-Cameron
--
Cameron Childress
--
p:   678.637.5072
im: cameroncf

Sergii Galashyn

unread,
Mar 10, 2012, 9:09:56 AM3/10/12
to ra...@googlegroups.com
You are right, I've installed FusionReaction immediately when problems started. It shows high CPU but no threads are running. By the way, HR doesn't recognize >1 CPU in metrics.

Any way, I've tried to remove all the code except WEB-INF from the web root, which of course includes code invoked by tasks.

That's the essence of my problem: blank application does this after each restart.

Plus I am preffy confident none of the tasks are executed on startup, because app starts the init process only when I open it in browser.

Thanks,
-sg

10.03.12 16:02, Cameron Childress написав(ла):
Message has been deleted

Sergii Galashyn

unread,
Mar 10, 2012, 12:16:38 PM3/10/12
to ra...@googlegroups.com
Cameron, your response reminded me about scheduler.log. Here's the output after Railo restart:

"INFO","web-0","03/10/2012","09:11:57","schedule task:Check Expired Accounts","first execution runs at {ts '2012-03-11 00:01:00'}"
"INFO","web-0","03/10/2012","09:11:57","schedule task:Cleanup Temp","first execution runs at {ts '2012-03-17 04:00:00'}"
"INFO","web-0","03/10/2012","09:11:57","schedule task:Cache Blog Entries","first execution runs at {ts '2012-03-10 10:01:00'}"
"INFO","web-0","03/10/2012","09:11:57","schedule task:Purge Forms Queue","first execution runs at {ts '2012-03-11 00:02:00'}"
"INFO","web-0","03/10/2012","09:11:57","schedule task:Weekly Digest 6","first execution runs at {ts '2012-03-11 10:00:00'}"
"INFO","web-0","03/10/2012","09:11:57","schedule task:Generate Report 22","first execution runs at {ts '2012-03-20 01:20:00'}"
"INFO","web-0","03/10/2012","09:11:57","schedule task:Generate Report 16","first execution runs at {ts '2012-03-13 01:10:00'}"
"INFO","web-0","03/10/2012","09:11:57","schedule task:Generate Report 43","first execution runs at {ts '2012-03-13 01:15:00'}"
"INFO","web-0","03/10/2012","09:11:57","schedule task:Generate Report 45","first execution runs at {ts '2012-03-11 01:25:00'}"
"INFO","web-0","03/10/2012","09:11:57","schedule task:Generate Report 53","first execution runs at {ts '2012-03-13 01:30:00'}"
"INFO","web-0","03/10/2012","09:11:57","schedule task:Generate Report 74","first execution runs at {ts '2012-03-15 01:45:00'}"
"INFO","web-0","03/10/2012","09:11:57","schedule task:Generate Report 72","first execution runs at {ts '2012-03-16 01:40:00'}"
"INFO","web-0","03/10/2012","09:11:57","schedule task:Generate Report 75","first execution runs at {ts '2012-03-16 01:50:00'}"
"INFO","web-0","03/10/2012","09:11:57","schedule task:Generate Report 91","first execution runs at {ts '2012-03-16 01:55:00'}"
"INFO","web-0","03/10/2012","09:11:57","schedule task:Generate Report 101","first execution runs at {ts '2012-03-17 02:05:00'}"
"INFO","web-0","03/10/2012","09:11:57","schedule task:Generate Report 103","first execution runs at {ts '2012-04-04 02:10:00'}"
"INFO","web-0","03/10/2012","09:11:57","schedule task:Generate Report 122","first execution runs at {ts '2012-03-13 02:20:00'}"
"INFO","web-0","03/10/2012","09:11:57","schedule task:Generate Report 126","first execution runs at {ts '2012-03-17 02:30:00'}"
"INFO","web-0","03/10/2012","09:11:57","schedule task:Generate Report 130","first execution runs at {ts '2012-03-14 02:40:00'}"
"INFO","web-0","03/10/2012","09:11:57","schedule task:Generate Report 63","first execution runs at {ts '2012-03-11 01:35:00'}"
"INFO","web-0","03/10/2012","09:11:57","schedule task:Generate Report 67","first execution runs at {ts '2012-03-11 01:05:00'}"

Interestingly, some of the tasks are not present in log -- totally five of them. For example, "Generate Report 111" and "Generate Report 124".

I can't see any reasons of this, they look absolutely the same as others.

Any way, this is better than nothing. First I've deleted all these tasks -- and problem disappeared!

Still, when I create them manually problem returns... Only difference between these and "safe" tasks is the time. What's the heck?

Deleted these tasks for now.


Thanks,
-sg


10.03.12 16:02, Cameron Childress написав(ла):
What happens when you run the scheduled task by hand?  Outside of the scheduler?  What does FusionReactor show?

Derrick Peavy

unread,
Mar 10, 2012, 12:18:47 PM3/10/12
to Railo
Sergii,

I checked in this morning to post an update about this - I had asked a
similar question previously about Tomcat setenv.sh file and settings.

In my case (similar), loads would be fine at first (0.2 0.0 0.3)
something like that. Then, over a day, it would grow and eventually I
would see something like CPU usage at 120%^ and loads of 100.0 120.0
101.0.

Something was wrong. My server is my own hardware in a colo running
Ubuntu Server 11.04. Apache 2, Tomcat 7, Railo 3, MySQL 5. It's a
dual quad core Xeon 5620 chipset, 48 GB of RAM and all SSD on GB
connection. So I/O should never be an issue. And considering that I
was running the same app for years on a 7 year old PPC (Apple) X Serve
system with no problems under Blue Dragon, surely the issue had to be
my heap settings.

After weeks of tinkering, trying different settings and researching,
here are my results.


Started with

=======> export JAVA_OPTS="-server -Xmx2g -XX:MaxPermSize=256m -
Djava.awt.headless=true";

Ran OK, but had random problems with CPU load, out of memory errors
with Railo when running large imports (XML parse).



Then... got "smart" and tried to learn all about the garbage
collection and "tweak" things (ha!)

=======> export JAVA_OPTS="-server -Xss1024k -Xms1g -Xmx16g -
XX:MaxPermSize=128m -XX:NewSize=512m -XX:+UseConcMarkSweepGC -XX:
+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=80 -
Djava.awt.headless=true"

That didn't help. Problems were only worse.



Then,

=======> export JAVA_OPTS="-server -Xss1024k -Xms1g -Xmx16g -
XX:MaxPermSize=512m -XX:NewSize=64m -XX:+UseConcMarkSweepGC -XX:
+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=80 -XX:
+CMSClassUnloadingEnabled -XX:+UseParNewGC -Djava.awt.headless=true";

Causes max cpu usage, lods above 100.0 and use of all memory. WTF?



Still believing I knew what I was doing, I tried...

=======> export JAVA_OPTS="-server -XX:+UseConcMarkSweepGC -XX:
+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=80 -XX:
+CMSClassUnloadingEnabled -XX:+UseParNewGC -Djava.awt.headless=true";

Result, not enough memory left within the heap to run maintenance on
databases on a regular basis



Next up, more research:
- http://sanjaal.com/java/tag/jvm-option-parameter-xxuseconcmarksweepgc/
- http://lucene.472066.n3.nabble.com/Garbage-Collection-I-have-given-bad-advice-in-the-past-td3102099.html

Which lead to...

=======> export JAVA_OPTS="-server -Xss1024k -Xms1g -Xmx2g -
XX:MaxPermSize=512m -XX:NewSize=64m -XX:+UseParallelGC -
XX:ParallelGCThreads=16 -XX:+CMSClassUnloadingEnabled -
Djava.awt.headless=true";




Still had problems, so I tried...

=======> export JAVA_OPTS="-server -Xss1024k -Xms1g -Xmx2g -
XX:MaxPermSize=512m -XX:NewSize=64m -Djava.awt.headless=true";

this built up load until more and more cpu and memory were used.
didn't reach the 100% laods of past (70.x, etc), but was headed there.




Next up...

Found in the Railo group, according to bruce (bottom of page)
https://groups.google.com/forum/#!topic/railo/3TF9YxtEaR0
But then sean corfield weighed in, in response to me a few weeks ago
http://groups.google.com/group/railo/browse_thread/thread/bd92b0747759274d
try this next (worked fine)

=======> export JAVA_OPTS="-Xms1024m -Xmx1024m -XX:MaxPermSize=512m -
XX:+ExplicitGCInvokesConcurrent -XX:+UseConcMarkSweepGC -XX:
+CMSClassUnloadingEnabled";

YES! Finally!!! Worked great! A bit slow for some page loads that are
intensive. The trick seems to be to set the Xms and Xmx to the same
so that java doesn't have to do this dynamically. Since this worked
great, decided to start adding more memory to the heap to see if there
was a limit or problem with more memory and garbage collection...

Loads were always 0.x 0.x and 0.x, CPU never above 1 or 2%




So I added more memory next to see how it would go...

=======> export JAVA_OPTS="-Xms2048m -Xmx2048m -XX:MaxPermSize=512m -
XX:+ExplicitGCInvokesConcurrent -XX:+UseConcMarkSweepGC -XX:
+CMSClassUnloadingEnabled";

Worked great! Things started speeding up (heavy database queries,
etc). Loads were always 0.x 0.x and 0.x, CPU never above 1 or 2%




More still... AND I noticed that I had left off the awt headless
flag, so I put that back in.

=======> export JAVA_OPTS="-Xms4096m -Xmx4096m -XX:MaxPermSize=512m -
XX:+ExplicitGCInvokesConcurrent -XX:+UseConcMarkSweepGC -XX:
+CMSClassUnloadingEnabled -Djava.awt.headless=true";

Things going swimmingly. Sweet! Loads were always 0.x 0.x and 0.x,
CPU never above 1 or 2%



Now, let's try 8 GB...

=======> export JAVA_OPTS="-Xms8192m -Xmx8192m -XX:MaxPermSize=512m -
XX:+ExplicitGCInvokesConcurrent -XX:+UseConcMarkSweepGC -XX:
+CMSClassUnloadingEnabled -Djava.awt.headless=true";


F'n awesome! Loads were always 0.x 0.x and 0.x, CPU never above 1 or
2%



And now, let's go to 1/3 of available RAM since that's why it's there,
no sense in having it if you're not using it - 16 GB!!

=======> export JAVA_OPTS="-Xms16384m -Xmx16384m -XX:MaxPermSize=512m
-XX:+ExplicitGCInvokesConcurrent -XX:+UseConcMarkSweepGC -XX:
+CMSClassUnloadingEnabled -Djava.awt.headless=true";


Sweet!!! Loads were always 0.x 0.x and 0.x, CPU never above 1 or 2%,
page loads are lightening fast, GB+ size data parsing it done in
seconds, everyone is flippin happy!!




Bottom line - it looks like it's a bad idea to try and be smarter
than your modern, updated software (64 bit OS, latest Java (6 or 7),
latest Tomcat and Railo. It would appear that the garbage collection
and other settings have already been considered and well implemented.
So the final setting would seem to be:

-Xms16384m
-Xmx16384m
-XX:MaxPermSize=512m
-XX:+ExplicitGCInvokesConcurrent
-XX:+UseConcMarkSweepGC
-XX:+CMSClassUnloadingEnabled
-Djava.awt.headless=true";

Where -Xms16384m -Xmx16384m -XX:MaxPermSize=512m are all set
according to your available memory, and how heavy duty you think your
app is or needs or wants to be.


Hope it helps!!!

-Derrick
> Currently config looks like thishttp://pastebin.com/jH98jEvz-- tasks work
> just fine, applications also feel OK. Only CPU usage stays high...
>
> I've played with config and looks like deleting some of the tasks helps,
> for example this is cut and "safe" version of config:http://pastebin.com/k6QkanPg-- when I use it everything is OK.
>
> The more of "unsafe" entries I add -- the more CPU is used. In full version
> it looks like thishttp://dl.dropbox.com/u/2495797/img/1203/tomcat-cpu-htop.png

Sergii Galashyn

unread,
Mar 10, 2012, 12:21:31 PM3/10/12
to ra...@googlegroups.com
Derrick,

Thanks for the response. I definitely going to research this JVM tuning
subject deeper.

-sg


10.03.12 19:18, Derrick Peavy написав(ла):

Cameron Childress

unread,
Mar 10, 2012, 12:26:39 PM3/10/12
to ra...@googlegroups.com
On Sat, Mar 10, 2012 at 12:16 PM, Sergii Galashyn <tro...@gmail.com> wrote:
Interestingly, some of the tasks are not present in log -- totally five of them. For example, "Generate Report 111" and "Generate Report 124".

Your PasteBin samples are no longer accessible, so I can't look, but that's definitely a hint.  If those 5 items are at the bottom I'd specifically look at the first, and the item right before it.  Anything that these tasks have in common could be a hint
 
I can't see any reasons of this, they look absolutely the same as others.

Totally a stab in the dark here, but the thought occurred to me...  Look for any words in the tasks that could be reserved words in Java, or could be the same as a built in setter or getter in Railo's guts.  

I'd just pick one offending task, clear the rest out of the system, then add that ONE task back. Then start changing settings on it (including name, urlstring, etc) one at a time till you see the error stop happening. You can look at the JVM/Memory settings too, but I wouldn't put those at the top of my list.
 
-Cameron

Sergii Galashyn

unread,
Mar 13, 2012, 2:51:53 PM3/13/12
to ra...@googlegroups.com
Think I have found the reason.

Let's say we have three tasks: A, B, C.

A - daily - 2:25
B - weekly - 2:35
C - daily - 2:45

With this exact configuration CPU goes crazy.

If I change B to run daily -- everything is fine.
If I make it run at 3:00 -- everything is fine.

At some point I thought task timeout can be the reason: it was set 300 seconds.  But it wont work with 50 as well.

Same time problem is resolved if I make bigger gaps between task times: 15 min instead of 10. This looks pretty strange, but I've re-checked this approach few times: once I set 10 or less -- Railo goes crazy.

I've changed my code to create the tasks with 15 min gaps, so problem is resolved (kind of) for now. But I guess this workaround wont work for a long time.

Maybe I don't know something about Railo scheduling?

Thanks,
-sg


10.03.12 19:26, Cameron Childress написав(ла):
Reply all
Reply to author
Forward
0 new messages