Hello!
On Thu, Apr 9, 2015 at 1:26 PM, Lord Nynex wrote:
> This is about a week old.
>
> The conclusion I reached was the volume of data I was sending back and forth
> to redis was too large. I've been refactoring the data structure to cut
> down on the amount of data needing to be read.
>

From the off-CPU flame graph, it seems that your nginx processes are
CPU-bound, which is a good thing. But your off-CPU flame graph also
suggests a bad thing: quite a few frames are pure CPU computations,
like those lj_BC_XXX frames, which are just the LuaJIT VM interpreting
bytecode instructions. Since such code does not block by itself,
seeing it off-CPU is usually a sign of massive process preemption
enforced by the kernel process scheduler. Maybe your nginx workers are
competing for CPU time with each other and/or with other processes
like redis-server? A simple improvement is to set CPU affinity
appropriately and bind each busy process to dedicated CPU core(s) to
avoid such meaningless context switches (which are expensive).

Your on-CPU flame graph shows a more important issue. Your Lua code in
nginx is the dominating bottleneck. Most of your Lua code is not JIT
compiled, as shown by all those lj_BC_XXX frames, which belong to the
LuaJIT interpreter. I suggest focusing on JITting as much of your Lua
code as possible on the hot Lua code paths, which can make a big
difference. Making more Lua code JIT compiled is usually about working
around the NYI primitives listed here:

http://wiki.luajit.org/NYI

CloudFlare has been sponsoring Mike Pall to reduce the NYI list in
LuaJIT 2.1 (which is the default in recent OpenResty bundle releases).
Using the jit.v and/or jit.dump Lua modules shipped with the default
LuaJIT installation can help analyze which NYIs are in your way (these
Lua modules are also used to implement the luajit -jv command-line
option).
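
For illustration, here is a minimal sketch of turning on the jit.v
trace log from Lua code. The helper names and the log path are my own
placeholders, not anything from your setup; in nginx you would
typically call something like this once at startup (for example from
init_by_lua), which is an assumption about how your configuration is
laid out.

```lua
-- Sketch only: enable LuaJIT's verbose trace log, the same information
-- that `luajit -jv` prints. enable_jit_log/disable_jit_log are
-- hypothetical helper names.
local function enable_jit_log(path)
    -- jit.v ships with LuaJIT; pcall keeps this harmless when the
    -- module is not available (e.g. under a plain Lua interpreter).
    local ok, v = pcall(require, "jit.v")
    if not ok then
        return false
    end
    -- v.on() opens the file for writing; guard against an unwritable path.
    return (pcall(v.on, path))
end

local function disable_jit_log()
    local ok, v = pcall(require, "jit.v")
    if ok then
        pcall(v.off)
    end
end

-- "/tmp/jit.log" is a placeholder path; use any writable location.
local enabled = enable_jit_log("/tmp/jit.log")

-- A small hot loop over a hash-like table via pairs(), so that the log
-- has something to report about this code path.
local t = { a = 1, b = 2, c = 3 }
local sum = 0
for _ = 1, 1000 do
    for _, val in pairs(t) do
        sum = sum + val
    end
end

disable_jit_log()
```

After running the hot code paths, look in the log for aborted traces
that mention NYI; each one should point at the Lua source line that
forced a fallback to the interpreter.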

In particular, I see the lj_BC_ITERN frame, which is for executing the
ITERN bytecode while iterating over hash-table-like Lua tables via
pairs() or something like that. Avoid iterating over hash tables
whenever possible. It cannot be JIT compiled (and even if it can be in
the future, iterating over a hash table can never be really cheap
because a hash table is not designed for iteration anyway). We usually
keep a flat array of key-value pairs in the same table specifically
for iteration, or just use a flat array exclusively if we never or
seldom need to index values by key.
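
A rough sketch of that layout, under a few assumptions of mine: keys
are strings (so they never collide with the array slots), values are
never nil, and keys are only added, never updated or removed. The
function names are made up for the example.

```lua
-- Sketch only: one Lua table holding both a hash part (t[key] = value)
-- for lookups and a flat array part (key1, value1, key2, value2, ...)
-- for iteration. Assumes string keys, non-nil values, insert-only use.
local function add(t, key, value)
    assert(type(key) == "string", "string keys only in this sketch")
    assert(t[key] == nil, "updates are not handled in this sketch")
    local n = #t
    t[n + 1] = key      -- array part: key at the odd slot
    t[n + 2] = value    -- array part: value at the even slot
    t[key] = value      -- hash part: direct lookup by key
end

-- Iterate with a plain numeric for loop instead of pairs().
local function sum_values(t)
    local total = 0
    for i = 2, #t, 2 do
        total = total + t[i]
    end
    return total
end

local t = {}
add(t, "a", 1)
add(t, "b", 2)
add(t, "c", 3)

local total = sum_values(t)   -- walks the array part only
local b = t.b                 -- keyed lookup still works
```

The numeric loop avoids pairs() on the hot path, at the cost of
storing every key and value twice and of extra bookkeeping if you
later need updates or deletions.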

BTW, generating a Lua-land flame graph with the lj-lua-stacks tool can
further help nail down the "hot Lua code paths":

https://github.com/openresty/stapxx#lj-lua-stacks

We may find some surprises and more low-hanging fruit with this new
perspective :)
Hope this helps!
Best regards,
-agentzh