OpenResty + Redis performance improvement


Bogdan Irimia

Apr 7, 2015, 11:01:50 AM
to openre...@googlegroups.com
Hello, everyone

We have implemented a monitoring system based on OpenResty + Redis. It receives text messages containing numerical values, and these values go into a Redis data structure stored as time-series stats. The data is structured in Redis so that, for each interval, each parameter has a hash key, with each hash field holding the statistical value for one moment in time.
For example:

The key:
stats:<host_id>:cpu0:linux-cpurec:idlepct:132258:60:MAX
holds stats values for the host with the given id, for cpu0, from the linux-cpurec message, for the idlepct parameter, for timeslot 132258, at a 60-second resolution. The function applied when calculating values is MAX (used when we have more than one value during the 60 seconds).
This key holds 180 items, because we store values for 3 hours (one value every 60 seconds, which is the resolution).
The key expires after a specified time (double the width of the interval).
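For illustration, roughly how one incoming sample maps onto this structure (a simplified sketch with made-up variable names such as red, host_id and idle_value, not our actual code):

local resolution = 60                          -- seconds between stored values
local window     = 180 * resolution            -- one key covers 3 hours
local ts         = ngx.time()                  -- sample timestamp (epoch seconds)
local timeslot   = math.floor(ts / window)     -- e.g. 132258

local key = table.concat({ "stats", host_id, "cpu0", "linux-cpurec",
                           "idlepct", timeslot, resolution, "MAX" }, ":")

-- field = the 60-second slot the sample falls into, value = the measured idlepct
red:hset(key, ts - ts % resolution, idle_value)
red:expire(key, 2 * window)                    -- double the width of the interval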

The issue is that we are testing this application under high load: with 200 hosts each sending 4-5 messages per minute, it still works OK. When going to 300 hosts, we reach a CPU load limit on the server. At 400 hosts, we have redis issues (dropped connections, even core dumps).

We have read suggestions about using sorted sets in redis, but the complexity of inserting data into a sorted set is higher than for hashes, and the memory footprint also seems to be bigger.

The question is how we can improve the data structure with regard to both the performance of storing data and the memory requirements.

Thank you

Bogdan
 

Luis Gasca

Apr 7, 2015, 11:31:43 AM
to openre...@googlegroups.com
Bogdan,

Using redis pipelining is likely to solve many of your problems.

Note that lua-resty-redis fully supports this (via init_pipeline and commit_pipeline)
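Something along these lines (a minimal sketch; the key names are only illustrative):

local redis = require "resty.redis"
local red = redis:new()
red:set_timeout(1000)  -- 1 second

local ok, err = red:connect("127.0.0.1", 6379)
if not ok then
    ngx.log(ngx.ERR, "failed to connect to redis: ", err)
    return
end

red:init_pipeline()
red:hset("stats:host1:cpu0:linux-cpurec:idlepct:132258:60:MAX", "1428400800", 97.3)
red:expire("stats:host1:cpu0:linux-cpurec:idlepct:132258:60:MAX", 21600)
red:hset("stats:host1:cpu0:linux-cpurec:userpct:132258:60:MAX", "1428400800", 1.8)

local results, err = red:commit_pipeline()
if not results then
    ngx.log(ngx.ERR, "failed to commit the pipelined requests: ", err)
    return
end

-- put the connection back into the cosocket connection pool
red:set_keepalive(10000, 100)

All the queued commands go out in a single round trip, which is usually a big win at this call volume.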

Hope this helps,
Luis

Lord Nynex

Apr 7, 2015, 3:43:06 PM
to openre...@googlegroups.com
For what it's worth, I'm in the same boat. I've been up for two days trying to get the system's CPU usage down because of the number of redis calls required.

I'm in the middle of putting pipelining in, but I'm still skeptical that it will solve the issue in its entirety.


Bogdan Irimia

Apr 8, 2015, 4:11:33 AM
to openre...@googlegroups.com
I forgot to mention that we DID use pipelining, but only to some extent, because if-branches can only be made outside pipelines. We used some stored procedures (server-side scripts) for the simple pieces of logic where a result from redis was needed to compute another value.
Anyway, I can say we covered the main code path very well with pipelines. Indeed, without pipelines the results were much worse.

Do you think the data structure can be optimized any further?

Thank you


Lord Nynex

Apr 8, 2015, 4:37:24 AM
to openre...@googlegroups.com
My issue seems related to the amount of data I'm reading and the frequency with which I'm reading it. Using agentzh's systemtap tools, I've come to the conclusion that my CPU usage is terrible because of the amount of time ngx_lua is spending reading data off the redis socket. In my case SINTER and SUNION operations would be beneficial, but I can't use them because the volume of data requires that I use redis pre-sharding, so I cannot guarantee the keys exist in the same shard. Even if this were not an issue, I'm uncomfortable with implementing Lua logic within redis because of its ability to block the redis server entirely.

My current plan is to consolidate these properties into a hash object and 'de-normalize' my data models to lazily load properties on demand. I'm currently working on a method to create a bloom filter and serialize it into the hash object to indicate whether a server-side union or intersection is required. This will hopefully cut down on the number of reads from redis. Hopefully pipelining will also help.
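Roughly what I have in mind for the filter itself (a sketch only: the sizes and hash choices below are placeholders, and the part that serializes the filter into the redis hash is left out):

local M = 8192  -- number of bits; size this from the expected element count

local function new_filter()
    return { bits = {}, m = M }
end

local function positions(f, item)
    -- two cheap, independent hashes available in ngx_lua
    local h1 = ngx.crc32_short(item)
    local h2 = ngx.crc32_long(item)
    return (h1 % f.m) + 1, (h2 % f.m) + 1
end

local function add(f, item)
    local p1, p2 = positions(f, item)
    f.bits[p1], f.bits[p2] = true, true
end

local function might_contain(f, item)
    local p1, p2 = positions(f, item)
    -- false positives are possible, false negatives are not
    return f.bits[p1] == true and f.bits[p2] == true
end

The idea is to check might_contain() first and only fall back to the server-side union/intersection when it says the data might be there.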

I've also been thinking about protobuf or compression for the data inserted into redis. Currently everything is serialized via cjson. I'm uncomfortable with the CPU expense of this constant deserialization in read operations. I have not spent any time researching how I would do compression within OpenResty, though. My preference would be https://github.com/forhappy/lua-snappy.

Bogdan Irimia

Apr 8, 2015, 4:50:10 AM
to openre...@googlegroups.com
Compressing data for redis would indeed reduce memory, but at the expense of CPU cycles.
Do you think lua-snappy compression/decompression is more efficient (in terms of CPU cycles) than JSON serialization? And, of course, they don't address the same problem. How would you compress complex data structures with lua-snappy?

Regarding the bloom filter, it too adds CPU cycles to save memory. Do you have so many elements that "perfect hashing" is no longer optimal?

Let me know what improvements you make to your app; maybe we can adapt some of them to our case.

Thank you


Stefan Parvu

Apr 8, 2015, 7:03:43 AM
to openre...@googlegroups.com

> My issue seems related to the amount of data I'm reading and the frequency
> with which I'm reading it. Using agentzh's systemtap tools I've come to the
> conclusion that my CPU usage is terrible because of the amount of time
> ngx_lua is spending reading data off the redis socket. In my case SINTER

We (Bogdan and I) have similar issues: the amount of data. And we more or less don't know for sure
how much time is spent in NGINX/Lua handling it. We see 85-90% CPU utilization on the Redis
side when the load increases. For example:

N (data sources)   HTTP errors   DB errors   HTTP workers CPU %   DB process CPU %
 50                No            No           2                    8
100                No            No           6                   15
200                No            No          15                   30
300                No            No          35                   55
400                No            No          40                   85

We are running on FreeBSD. I am currently analysing the results after playing around
with dtrace and flamegraph on FreeBSD:

NGINX: http://www.kronometrix.org/bugs/nginx.300ds.svg
DB: http://www.kronometrix.org/bugs/redis.300ds.svg

Interestingly enough, after a fresh boot of Redis, when processing 300 DS (data sources) with OpenResty + Redis,
I can see the redis-server process using 30-40% CPU, but after the first snapshot to disk
(latest_fork_usec:39150) things start to slow down a bit. From that moment on, the snapshots
take longer and longer and redis-server uses more and more CPU.

One quick thing that comes to mind is the number of keys processed:

Before snapshot:
# Keyspace
db0:keys=1336359,expires=1167391,avg_ttl=1874307780


After snapshot:
# Keyspace
db0:keys=1352573,expires=1183605,avg_ttl=1257523821


--
Stefan Parvu <spa...@kronometrix.org>

Stefan Parvu

Apr 8, 2015, 1:30:00 PM
to openre...@googlegroups.com

> DB: http://www.kronometrix.org/bugs/redis.300ds.svg

redis-server`dictFind
redis-server`getExpire
redis-server`expireIfNeeded
redis-server`lookupKeyReadOrReply
redis-server`hgetCommand
redis-server`call
redis-server`luaRedisGenericCommand
redis-server`luaD_precall
redis-server`luaV_execute
redis-server`luaD_call
redis-server`luaD_rawunprotected
redis-server`luaD_pcall
redis-server`lua_pcall
redis-server`evalGenericCommand
redis-server`call
redis-server`processCommand
redis-server`processInputBuffer
redis-server`readQueryFromClient
redis-server`aeProcessEvents
redis-server`aeMain
redis-server`main
redis-server`_start

This somehow tells me Redis uses Lua and not LuaJIT? Looking into the source code
of Redis 2.8.19, I see this:

README for Lua 5.1

See INSTALL for installation instructions.
See HISTORY for a summary of changes since the last released version.

% ls -lrt
total 61
drwxr-xr-x 2 krmx krmx 22 Dec 16 10:18 test
drwxr-xr-x 2 krmx krmx 66 Dec 16 10:18 src
drwxr-xr-x 2 krmx krmx 12 Dec 16 10:18 etc
drwxr-xr-x 2 krmx krmx 13 Dec 16 10:18 doc
-rw-r--r-- 1 krmx krmx 1378 Dec 16 10:18 README
-rw-r--r-- 1 krmx krmx 3695 Dec 16 10:18 Makefile
-rw-r--r-- 1 krmx krmx 3868 Dec 16 10:18 INSTALL
-rw-r--r-- 1 krmx krmx 7907 Dec 16 10:18 HISTORY
-rw-r--r-- 1 krmx krmx 1528 Dec 16 10:18 COPYRIGHT


I had the impression Redis uses LuaJIT; that does not seem to be the case. Is this true?
Can we use Redis with LuaJIT? Could this be a problem when running and processing lots of data,
executing Lua within Redis?

--
Stefan Parvu <spa...@kronometrix.org>

Vladislav Manchev

Apr 8, 2015, 1:42:35 PM
to openre...@googlegroups.com
Nope, Redis uses standard Lua as far as I know.

There was a fork that used LuaJIT; I'm not sure if it's actively supported though.

As for the original question, I'm running a similar configuration for stats collection with some differences in how data is structured inside Redis. I'm definitely not experiencing any problems like the one you describe. I'll take a look and make a comparison to see what the differences between our cases are when I get a chance.

I'm wondering if you have a simple, self-contained example of your code, and whether you have any stress tests to run against the different code implementations you might be experimenting with. Also, did you ever run redis-benchmark to rule out possible problems in the build on your specific OS? I'm running some OpenBSD and Linux boxes and have never noticed such problems.


Best,
Vladislav



Stefan Parvu

Apr 8, 2015, 1:54:22 PM
to openre...@googlegroups.com

> Nope, Redis uses standard Lua as far as I know.

I see, I had misunderstood. I had the impression Redis uses LuaJIT the same
way OpenResty does. Thanks for the pointer.

> As for the original question, I'm running a similar configuration for stats
> collection with some differences in how data is structured inside Redis.
> I'm definitely not experiencing any problems like the one you describe.
> I'll take a look and make a comparison to see what the differences between
> our cases are when I get a chance.
>

We will test the same thing on Debian 7 amd64, but most likely it is our code executing
within Redis that is making things slow. It is a bit complicated and we can't simply reproduce
it without our numerical kernel processing, which is done in Lua.

Most likely our own bugs :)



--
Stefan Parvu <spa...@kronometrix.org>

Yichun Zhang (agentzh)

Apr 9, 2015, 3:51:16 PM
to openresty-en
Hello!

On Wed, Apr 8, 2015 at 10:29 AM, Stefan Parvu wrote:
>> DB: http://www.kronometrix.org/bugs/redis.300ds.svg
>

Hmm, this flame graph looks interesting.

The hottest thing looks like the gettimeofday() calls. Not sure if
it's a real system call on FreeBSD; nevertheless, it's expensive,
taking more than 20% of the total CPU time in this sample. Maybe we
could save the hot gettimeofday() calls by caching the time inside
redis-server, just as nginx does?

Also, the embedded Lua code can be optimized by avoiding creating new
Lua strings or Lua tables on hot Lua code paths (that is, Lua code
paths always exercised upon each redis command). GC is always
expensive and we'd better avoid it at any price ;)
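A trivial illustration of what I mean by hoisting allocations off the hot path (the names here are made up):

-- bad: allocates a fresh table (GC garbage) on every single call
local function fields_per_call()
    return { "idlepct", "userpct", "syspct" }
end

-- better: the constant table is created once when the chunk is loaded
local STATS_FIELDS = { "idlepct", "userpct", "syspct" }

local function fields_shared()
    return STATS_FIELDS
end

The same goes for string concatenations that rebuild the same key names on every command.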

Regards,
-agentzh

Yichun Zhang (agentzh)

Apr 9, 2015, 3:55:30 PM
to openresty-en
Hello!

On Wed, Apr 8, 2015 at 10:42 AM, Vladislav Manchev wrote:
> Nope, Redis uses standard Lua as far as I know.
>
> There was a fork that used LuaJIT, not sure if it's actively supported
> though.
>

LuaJIT 2 is ABI-compatible with Lua 5.1. So as long as redis-server
embeds Lua 5.1, LuaJIT 2 should work out of the box with the official
redis-server.

Regards,
-agentzh

Yichun Zhang (agentzh)

Apr 9, 2015, 4:01:08 PM
to openresty-en
Hello!

On Wed, Apr 8, 2015 at 4:03 AM, Stefan Parvu wrote:
>
> NGINX: http://www.kronometrix.org/bugs/nginx.300ds.svg
> DB: http://www.kronometrix.org/bugs/redis.300ds.svg
>

Your redis-server's on-CPU flame graph looks good, but your nginx
graph looks very wrong to me.

It seems your nginx binary lacks debug symbols? You can try the
OpenResty bundle, which ensures proper debug symbols in nginx, the bundled
Lua libraries and nginx modules, as well as the bundled LuaJIT, with its
default configuration options.

Regards,
-agentzh

Lord Nynex

Apr 9, 2015, 4:02:15 PM
to openre...@googlegroups.com
Hello,

I don't have time to dig through redis right now, but I can tell you from experience that this setup is not desirable. A server-side Lua script can horribly block the entire redis server, which throws all other operations into a wait state. I notice openresty (through no fault of openresty) is sensitive to this wait.

The gettimeofday calls are most likely because redis enforces a 'max script execution time', so it checks the clock frequently. It is important to note that redis does not kill the running script once this timer is reached.

Last I checked, there are some modifications to the Lua 5.1 that is embedded in redis which make it directly incompatible. Without doing any research, I think this is a non-trivial amount of work.

Yichun Zhang (agentzh)

Apr 9, 2015, 4:03:57 PM
to openresty-en
Hello!

On Wed, Apr 8, 2015 at 1:37 AM, Lord Nynex wrote:
> My issue seems related to the amount of data I'm reading and the frequency with
> which I'm reading it. Using agentzh's systemtap tools I've come to the conclusion that
> my CPU usage is terrible because of the amount of time ngx_lua is spending reading
> data off the redis socket.

Mind sharing your detailed results with my systemtap tools? I'm
particularly interested in seeing and interpreting flame graphs, be they
on-CPU and/or off-CPU ;) It's fun and beneficial (to everyone). Maybe
we can find some low-hanging fruit to work on :)

Best regards,
-agentzh

Yichun Zhang (agentzh)

Apr 9, 2015, 4:08:14 PM
to openresty-en
Hello!

On Thu, Apr 9, 2015 at 1:02 PM, Lord Nynex wrote:
> I don't have time to dig through redis right now but I can tell you from
> experience this setup is not desirable. The server side lua script can
> horribly block the entire redis server which throws all other operations
> into a wait state. I notice openresty (no fault of openresty) is sensitive
> to this wait.
>

Yes, it requires special care to embed Lua code in the redis
server. If done right, I think it's still *worth* the effort. But
according to your statements below, it seems that redis-server's
Lua embedding implementation is VERY suboptimal (though that still does not
invalidate the basic Lua embedding idea entirely).

> The gettimeofday calls are most likely because redis enforces 'max script
> execution time' so it tests frequently. It is important to note that redis
> does not kill the running script after this timer is reached.
>

This is unfortunate and I hope there is an option to disable that.
Otherwise Lua in redis-server is doomed to be VERY SLOW, especially on
systems where gettimeofday() is a real syscall.

> Last I checked there are some modifications to the lua5.1 that is embedded
> in redis which makes it directly incompatible. Without doing any research I
> think this is a non-trivial amount of work.
>

Oh, that's really unfortunate. I feel sorry about redis-server.

Regards,
-agentzh

Lord Nynex

Apr 9, 2015, 4:26:03 PM
to openre...@googlegroups.com
This is about a week old. 

The conclusion I reached was that the volume of data I was sending back and forth to redis was too large. I've been refactoring the data structure to cut down on the amount of data that needs to be read.

sample-bt.svg
sample-bt-off-cpu.svg

Yichun Zhang (agentzh)

Apr 9, 2015, 5:21:15 PM
to openresty-en
Hello!

On Thu, Apr 9, 2015 at 1:26 PM, Lord Nynex wrote:
> This is about a week old.
>
> The conclusion I reached was the volume of data I was sending back and forth
> to redis was too large. I've been re factoring the data structure to cut
> down on the amount of data needing to be read.
>

From the off-CPU flame graph, it seems that your nginx processes are
CPU-bound, which is a good thing. But your off-CPU flame graph also
suggests a bad thing: quite a few frames are pure CPU computations,
like those lj_BC_XXX frames, which are just the LuaJIT VM interpreting
bytecode instructions. That is usually a sign of massive process
preemption enforced by the kernel process scheduler. Maybe your nginx
workers are competing for CPU time with each other, and/or with other
processes like redis-server? A simple improvement is to set CPU affinity
appropriately and bind each busy process to dedicated CPU core(s) to
avoid such meaningless context switches (which are expensive).

Your on-CPU flame graph shows a more important issue: your Lua code in
nginx is the dominating bottleneck. Most of your Lua code is not JIT
compiled, judging by all those lj_BC_XXX frames, which belong to the
LuaJIT interpreter. I suggest focusing on getting as much of your Lua
code as possible JIT compiled on the hot Lua code paths, which can make
a big difference. Making more Lua code JIT compiled is usually about
working around the NYI primitives listed here: http://wiki.luajit.org/NYI
CloudFlare has been sponsoring Mike Pall to reduce the NYI list in
LuaJIT 2.1 (which is the default in recent OpenResty bundle releases).
Using the jit.v and/or jit.dump Lua modules shipped with the default
LuaJIT installation can help analyze which NYIs are in your way (these
Lua modules are also used to implement the luajit -jv command-line
option).
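For example, a minimal way to turn jit.v on inside OpenResty (the log path is only an example):

-- e.g. in init_by_lua, so it is enabled before any request is served
local v = require "jit.v"
v.on("/tmp/jit.log")   -- logs which traces get compiled and which abort, like `luajit -jv`

-- for much more detail (bytecode, IR and machine code per trace):
-- local dump = require "jit.dump"
-- dump.on(nil, "/tmp/jit_dump.log")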

In particular, I see the lj_BC_ITERN frame, which is for executing the
ITERN bytecode while iterating hash-table-like Lua tables via pairs()
or something like that. Avoid iterating hash tables whenever possible.
It cannot be JIT compiled (and even if it is in the future, iterating a
hash table can never be really cheap, because a hash table is not
designed for iteration anyway). We usually keep a flat array of the
key-value pairs in the same hash table specifically for iteration, or
just use a flat array exclusively if we never or seldom need to index
values by keys.
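A tiny sketch of that flat-array idea (the field names are made up):

-- hash-style table: iterating it forces pairs()/ITERN on the hot path
local params_hash = { idlepct = 97.2, userpct = 1.9, syspct = 0.9 }

-- the same data as a flat array of alternating key, value entries:
-- a plain numeric for loop, which the JIT compiler handles well
local params_flat = { "idlepct", 97.2, "userpct", 1.9, "syspct", 0.9 }

for i = 1, #params_flat, 2 do
    local name, value = params_flat[i], params_flat[i + 1]
    -- process name/value here
end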

BTW, generating a Lua-land flame graph with the lj-lua-stacks tool can
further help nail down the "hot Lua code paths":

https://github.com/openresty/stapxx#lj-lua-stacks

We may find some surprises and more low hanging fruits with this new
perspective :)

Hope this helps!

Best regards,
-agentzh

Lord Nynex

Apr 9, 2015, 6:19:04 PM
to openre...@googlegroups.com
Thanks for your suggestions. I believe CPU affinity will certainly help. 

I don't want to send the Lua-land flame graph because it reveals some sensitive information. (We are under a CloudFlare mutual NDA if you want to look at it in private.) This project basically just finds intersections between flat tables: { "aaa", "bbb", "ccc" } intersected with { "ccc", "ddd", "eee" }. But there are hundreds of tables.

Right now the biggest culprit is that I'm using Moses (https://github.com/Yonaba/Moses) intersect() (https://github.com/Yonaba/Moses/blob/master/moses.lua#L919) inside a pairs() loop. This is a huge problem in the flame graph. The intersect function returns a bunch of anonymous iterator functions, which appear to be quite expensive.

In this case, the pairs() loops are unavoidable by design. Is there any performance win I can get for fast intersection? Would it be worth implementing it in C and passing the two tables to an FFI function?




Yichun Zhang (agentzh)

Apr 9, 2015, 6:25:36 PM
to openresty-en
Hello!

On Thu, Apr 9, 2015 at 3:19 PM, Lord Nynex wrote:
> I don't want to send the lua land flame graph because it reveals some
> sensitive information. (we are under a cloudflare mutual NDA if you want to
> look at it in private) This project basically just finds intersections
> between flat tables. { "aaa", "bbb", "ccc" } intersected against { "ccc",
> "ddd", "eee" }. But there are hundreds of tables.
>

Creating many such short-lived little Lua tables puts pressure on
LuaJIT's GC, which should be eliminated.

> Right now the biggest culprit is I'm using Moses
> (https://github.com/Yonaba/Moses) intersect()
> (https://github.com/Yonaba/Moses/blob/master/moses.lua#L919) in a pairs()
> loop. This is a huge problem in the flame graph. The intersect function
> returns a bunch of anonymous iterator functions which appear to be quite
> expensive.
>

Yes, anonymous functions cannot be JIT compiled (yet) and also
require dynamic allocations (again, a GC burden) :)

> In this case, the pairs() loops are unavoidable by design. Is there any
> performance win I can get for fast intersection? Would it be worth
> implementing in C and passing two tables to an FFI func?
>

If we can avoid short-lived tables (and other GC objects) altogether,
then it will make a huge difference :)
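For instance, a rough sketch (illustrative only, not tuned to your data) of intersecting two flat string arrays with reused tables, so almost nothing new is allocated per call:

-- reused across calls so the GC sees no new tables on the hot path
local seen   = {}
local result = {}

local function intersect(a, b)
    -- clear the reused lookup table (table.clear from LuaJIT 2.1 is cheaper still)
    for k in pairs(seen) do seen[k] = nil end

    for i = 1, #a do
        seen[a[i]] = true
    end

    local n = 0
    for i = 1, #b do
        if seen[b[i]] then
            n = n + 1
            result[n] = b[i]
        end
    end

    -- drop stale entries left over from a longer previous result
    for i = n + 1, #result do
        result[i] = nil
    end

    return result, n
end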

I'm not sure about your detailed requirements, but the rule of thumb is
to avoid using Lua for very complicated and GC-intensive computations.
Such computations are better moved to C land if feasible. And yeah, FFI
is your friend :) Well, just my 2 cents :)

Regards,
-agentzh

Stefan Parvu

Apr 10, 2015, 2:09:51 AM
to openre...@googlegroups.com

Thanks all for answers.

> I don't have time to dig through redis right now but I can tell you from
> experience this setup is not desirable. The server side lua script can
> horribly block the entire redis server which throws all other operations
> into a wait state. I notice openresty (no fault of openresty) is sensitive
> to this wait.

Yes, I think this is what's going on in our case: a considerable slowdown caused by
Lua execution within Redis. The execution pipeline was designed something like this:

local agg_function_l = string.lower(agg_function)

if agg_function_l == 'min' then
    -- HSETIFLOWER: only overwrite the stored field if the new value is lower
    red:eval("local c = tonumber(redis.call('hget', KEYS[1], KEYS[2])); if c then if tonumber(ARGV[1]) < c then return redis.call('hset', KEYS[1], KEYS[2], ARGV[1]) else return 0 end else return redis.call('hset', KEYS[1], KEYS[2], ARGV[1]) end", 2, key, timestamp, value)
elseif agg_function_l == 'max' then
    -- HSETIFHIGHER: only overwrite the stored field if the new value is higher
    red:eval("local c = tonumber(redis.call('hget', KEYS[1], KEYS[2])); if c then if tonumber(ARGV[1]) > c then return redis.call('hset', KEYS[1], KEYS[2], ARGV[1]) else return 0 end else return redis.call('hset', KEYS[1], KEYS[2], ARGV[1]) end", 2, key, timestamp, value)
elseif agg_function_l == 'sum' then
    red:hincrbyfloat(key, timestamp, value)
elseif agg_function_l == 'count' then
    red:hincrby(key, timestamp, 1)
elseif agg_function_l == 'sumsq' then
    red:hincrbyfloat(key, timestamp, value * value)
elseif agg_function_l == 'last' then
    red:hset(key, timestamp, value)
end

which needs to be rewritten.
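One possible direction for the rewrite (just a sketch, not something we have tested) is to register each script once with SCRIPT LOAD and then call it by its SHA, so the script body is not re-sent with every sample; lua-resty-redis generates methods for commands it does not list explicitly, so script and evalsha can be called just like eval:

local HSETIFHIGHER = [[
local c = tonumber(redis.call('hget', KEYS[1], KEYS[2]))
if not c or tonumber(ARGV[1]) > c then
    return redis.call('hset', KEYS[1], KEYS[2], ARGV[1])
end
return 0
]]

-- once, e.g. lazily on first use, caching the sha in a module-level variable
local sha, err = red:script("load", HSETIFHIGHER)
if not sha then
    ngx.log(ngx.ERR, "script load failed: ", err)
    return
end

-- per sample: same semantics as the inline eval above, but only the
-- 40-character sha travels over the wire (and it pipelines just as well)
local ok, err = red:evalsha(sha, 2, key, timestamp, value)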


> The gettimeofday calls are most likely because redis enforces 'max script
> execution time' so it tests frequently. It is important to note that redis
> does not kill the running script after this timer is reached.

> > The hottest thing looks like the gettimeofday() calls. Not sure if
> > it's a real system call on FreeBSD. Nevertheless, it's expensive.
> > Taking more than 20% of the total CPU time in this sample. Maybe we
> > could save the hot gettimeofday() calls by caching the time inside
> > redis-server just like in nginx?

Yes, somehow this is where the trick is, and I think it is caused in this case by the Lua-Redis
combination running within. We need to experiment and fix our code to see
if this re-occurs. In general gettimeofday() should not be hot, but it is in our case.

Thanks for pointers and comments,

--
Stefan Parvu <spa...@kronometrix.org>