Tips for finding runaway CPU bugs

Bgsosh

Oct 10, 2016, 9:53:03 AM
to nodejs
Hi,

I'm having a tough time tracking down an issue we currently have in production.  Our node processes will sometimes suddenly spike in CPU usage, and then stay pegged at 100% until restarted.

I'm not able to reproduce it on a development machine. Could anyone offer any tips for tracking this down? Any advice would be appreciated, as I'm currently not getting enough sleep!

Thank you

Bg

Zlatko

Oct 12, 2016, 10:38:39 PM
to nodejs
Although it's a CPU issue, a heap dump could still be useful. There are tools for this, such as https://www.npmjs.com/package/heapdump. You can take a dump once the CPU goes up and load it into the Chrome dev tools to see what's going on; perhaps there's a hint there.
If you can't get a dump while the process is at 100%, you can try taking dumps at intervals, say every ten seconds, and see what changed closer to the problem.
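
Something along these lines should work (a rough sketch, assuming the heapdump package is installed as a dependency; the path and interval are just examples):

var heapdump = require('heapdump');

// Requiring the module is enough to get a snapshot when the process
// receives SIGUSR2 ($ kill -USR2 <pid>). You can also write snapshots
// programmatically, for example every ten seconds:
setInterval(function () {
  heapdump.writeSnapshot('/tmp/' + Date.now() + '.heapsnapshot');
}, 10000);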

Even better would be if you could run with the debugger directly, something like the new `node --inspect server.js`, but that depends on where your servers are, how you run them, etc.

Xinyong Wang

Oct 12, 2016, 10:38:39 PM
to nodejs
It seems like the process went into an infinite loop.

If you are on Linux, you can use strace to trace syscalls:

$ strace -p <pid>


The built-in debugger's `backtrace` command may also help:

$ node debug -p <pid>

gdb is another tool; it can report the native backtrace:

$ gdb -p <pid>

and then type `bt`.

Russ Frank

Oct 12, 2016, 10:38:40 PM
to nodejs
You definitely want to flamegraph the processes: http://www.brendangregg.com/blog/2014-09-17/node-flame-graphs-on-linux.html

This problem is often caused by the GC when you approach V8's heap size limit (around 1.5 GB by default). So if you see your processes using around that much memory, you'll have to find a way to limit their memory usage.
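
If you're not sure whether that's the case, a quick sketch is to log memory usage from inside the process and watch whether the heap creeps toward the limit (the interval is just an example):

setInterval(function () {
  var mem = process.memoryUsage();
  console.log('heapUsed %d MB / heapTotal %d MB',
    Math.round(mem.heapUsed / 1048576),
    Math.round(mem.heapTotal / 1048576));
}, 60000);

You can also raise the limit with `node --max-old-space-size=<MB> app.js`, but that only buys time if something is actually growing without bound.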

Karim Tarek

Oct 12, 2016, 10:38:40 PM
to nodejs
Hello Bg,

I faced a similar problem, and after searching everywhere and checking every possible source, it turned out to be a faulty URL: an encoding/decoding problem which, when the URL was passed through Express, caused an infinite loop somewhere in the Express library. So one way to check is to trace (log) all the URLs handled by your servers and, when the 100% spike happens, look at the last 10-20 URLs your servers received (in my case, when the spike happened everything stopped, including logging), then try to replay those on your development/staging environment.

As for getting enough sleep: as a temporary solution you can write a ten-line bash script that auto-restarts the node processes (using ps or any other CPU stats command) when their CPU usage exceeds 90%, or whatever threshold you like, for more than a minute, so that you don't have to wake up and restart them manually. Note that "forever" and other npm packages didn't work for me on the restart part because of the 100% CPU problem.
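
For the URL tracing part, something as simple as a logging middleware registered before everything else should do (a rough sketch, assuming an Express app called `app`; adjust to your setup):

// register first so the URL is logged even if a later handler hangs
app.use(function (req, res, next) {
  console.log(new Date().toISOString(), req.method, req.url);
  next();
});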


Ben Noordhuis

Oct 12, 2016, 10:39:25 PM
to nod...@googlegroups.com
`node --perf_basic_prof app.js`, then `perf top -p <pid>` when it
starts to busy-loop.

That assumes you run Linux and your copy of node is new enough to have
perf support. If that isn't the case, you now have a good reason to
make the switch. :-)