Debugging a node server that spikes to 100% CPU utilization

Saikat Chakrabarti

Sep 18, 2010, 9:18:22 PM
to nod...@googlegroups.com
I have a node server that will every now and then (maybe once or twice
a day, usually timed perfectly to happen while I'm sleeping =) spike
to 100% CPU utilization and stay there. I know there are any number
of things that could be causing this, so I'm just trying to figure out
what strategies people here use to debug an issue like this. So far,
I've come up with:

1) Look at the logs I already keep to see if there are any patterns.
Add in more logging and hope it helps next time this problem happens
(and doesn't make the problem worse).
2) Try to run apache bench against my server to see if that can trigger it.
3) Try changing out various modules that I know have caused problems
for others in the past (like gzip)
4) Just reason about the code

I'm not too worried about trying things that slow down my server some
- it's not running anywhere near capacity currently. Are there better
ways to debug something like this?
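
One way to make the "add in more logging" idea in (1) concrete is to log whenever the event loop is held up. The following is only a minimal sketch: `computeLag` and `startLagMonitor` are hypothetical helper names, and the 1000 ms interval and 200 ms threshold are arbitrary assumptions, not values from this thread.

```javascript
// Sketch only: computeLag and startLagMonitor are hypothetical helpers,
// not part of node or of the original poster's code.

// How many ms later than scheduled did the timer tick arrive?
function computeLag(previousTick, now, intervalMs) {
  return Math.max(0, now - previousTick - intervalMs);
}

// Calls onLag(lagMs) whenever a tick arrives more than thresholdMs late,
// which means something held the event loop for roughly that long.
function startLagMonitor(intervalMs, thresholdMs, onLag) {
  let previousTick = Date.now();
  const timer = setInterval(() => {
    const now = Date.now();
    const lag = computeLag(previousTick, now, intervalMs);
    previousTick = now;
    if (lag > thresholdMs) onLag(lag);
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for monitoring
  return timer;
}

// Assumed values: check every second, report stalls longer than 200 ms.
startLagMonitor(1000, 200, (lag) => {
  console.error(new Date().toISOString() + " event loop blocked for ~" + lag + "ms");
});
```

If the process spins forever the timer never fires again, so in that case the useful clue is the last ordinary log line written before the output goes silent.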

Also, as a side note - not sure if there are any monit experts on this
list, but I'm trying to get monit to just restart node when this
happens. I've used monit successfully for a lot of other processes on
my server, but it doesn't seem to be able to detect the CPU spike and
restart node (I thought this might be because the system as a whole is
going down due to the CPU utilization, but that isn't the case - other
services on my machine are working fine since I'm able to still access
the site and the django server). My monit entry for node is
http://gist.github.com/586242 (I tried using both ">" and "is greater
than" - I have another service on my machine (Xvfb) that leaks quite a
bit of memory, and monit handles restarting that service fine).

Also, I'm not using SSL which is the only other reference I could find
to an issue like this (I'm fairly sure the broken code is not in node
core, but can't be certain).

Thanks in advance for any suggestions!

-- Saikat

Tim Caswell

Sep 20, 2010, 1:46:28 PM
to nod...@googlegroups.com
Those all sound like great ideas, especially if you're using something like connect. Try disabling middleware layers until the error goes away. I'm currently running howtonode.org without inline gzipping because it's caused me problems in the past and I didn't have time to track them down.



r...@tinyclouds.org

Sep 20, 2010, 2:10:24 PM
to nod...@googlegroups.com
On Sat, Sep 18, 2010 at 6:18 PM, Saikat Chakrabarti
<sai...@gomockingbird.com> wrote:
> I have a node server that will every now and then (maybe once or twice
> a day, usually timed perfectly to happen while I'm sleeping =) spike
> to 100% CPU utilization and stay there.  I know there are any number
> of things that could be causing this, so I'm just trying to figure out
> what strategies people here use to debug an issue like this.  So far,
> I've come up with:
>
> 1) Look at the logs I already keep to see if there are any patterns.
> Add in more logging and hope it helps next time this problem happens
> (and doesn't make the problem worse).
> 2) Try to run apache bench against my server to see if that can trigger it.
> 3) Try changing out various modules that I know have caused problems
> for others in the past (like gzip)
> 4) Just reason about the code
>
> I'm not too worried about trying things that slow down my server some
> - it's not running anywhere near capacity currently.  Are there better
> ways to debug something like this?

Force a coredump and get a stacktrace (with, like, kill -SIGTRAP pid).
That'll probably tell you what's spinning.

Vitali Lovich

Sep 22, 2010, 12:24:54 AM
to nod...@googlegroups.com
If it's not anything obvious, I'd instrument the GC of v8 - it could be that you're allocating a lot & putting pressure on the v8 heap.
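
A rough way to see heap pressure from inside the process is to log `process.memoryUsage()` periodically and compare the numbers around a spike. This is a sketch under assumptions, not V8-level GC instrumentation: `formatHeapUsage` is a hypothetical helper and the 10 second interval is arbitrary. V8's `--trace-gc` flag (`node --trace-gc server.js`, where `server.js` stands in for your entry point) should print individual collections if more detail is needed.

```javascript
// Sketch only: formatHeapUsage is a hypothetical helper, not a node API.
// It turns the object returned by process.memoryUsage() into one log line.
function formatHeapUsage(usage) {
  const mb = (bytes) => (bytes / 1048576).toFixed(1);
  return "heapUsed=" + mb(usage.heapUsed) + "MB" +
    " heapTotal=" + mb(usage.heapTotal) + "MB" +
    " rss=" + mb(usage.rss) + "MB";
}

// Assumed interval: one sample every 10 seconds.
const heapTimer = setInterval(() => {
  console.error(new Date().toISOString() + " " + formatHeapUsage(process.memoryUsage()));
}, 10000);
heapTimer.unref(); // sampling alone should not keep the process running
```

A heapUsed value that climbs steadily before each spike would point toward allocation pressure; a flat one suggests looking elsewhere.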

Saikat Chakrabarti

Sep 22, 2010, 12:33:05 AM
to nod...@googlegroups.com
In case anyone is interested in this, I had a hunch that it was the
gzip module which I verified with a few more log messages, and after
Ivan's recent fixes, I switched mine out with his. I still see spikes
in CPU usage just because gzipping is CPU-intensive, and I'm doing it
a lot, but so far the server hasn't hung.

I also managed to get monit to restart the process in high CPU cases
by checking for connection failures to the port instead of monitoring
the CPU usage. For some reason, I never did get the CPU usage check
working.
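
For anyone who wants the same kind of probe outside monit, here is a sketch of it as a small Node script. It is not monit configuration and not the setup described above: `checkHttp` is a hypothetical helper, and the URL, port 8000, and 5 second timeout in the usage comment are assumptions. It makes an HTTP request with a timeout rather than a bare TCP connect, because the kernel can still accept connections on behalf of a process that is stuck spinning.

```javascript
const http = require("http");

// Sketch only: checkHttp is a hypothetical helper. It resolves to true if the
// server sends any HTTP response within timeoutMs, and to false on a
// connection error or timeout.
function checkHttp(url, timeoutMs) {
  return new Promise((resolve) => {
    const req = http.get(url, (res) => {
      res.resume(); // discard the body; any response counts as "alive"
      resolve(true);
    });
    req.setTimeout(timeoutMs, () => {
      resolve(false); // no response in time: treat the server as hung
      req.destroy();
    });
    req.on("error", () => resolve(false)); // refused, reset, or destroyed
  });
}

// Example usage (assumed URL and timeout), e.g. from a cron job or supervisor:
// checkHttp("http://127.0.0.1:8000/", 5000).then((ok) => {
//   process.exitCode = ok ? 0 : 1;
// });
```

A supervisor can then restart node whenever the script exits non-zero.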

Thanks for the suggestions!

Kenny B

Sep 25, 2010, 4:54:38 AM
to nod...@googlegroups.com
compile node with debug turned on (so you get the symbols) and you can still use node instead of node_g

then, when it goes to 100% type:

gdb -p `pidof node`

this gets you into gdb. if you know anything about gdb, you should be just fine, but if not, some basic commands are:
bt -> shows the stack trace at the point where the program is currently stopped
continue -> continues executing the program
CTRL-C -> pauses the program so you can check the stack trace

there are more advanced things to do like setting breakpoints and stuff... but hopefully that should get you started