Lua overhead compared to nontransactional pipelining overhead

tw-bert

Apr 12, 2014, 8:50:15 PM
to redi...@googlegroups.com
We're using Redis (amongst other things) to replicate data from a big logistics RDBMS, for LAN clients/APIs as well as cloud services.
At peak hours, the RDBMS handles thousands of transactions per minute (sometimes per second).

Right now, I'm working on filling Redis with 'historic' data. This serves as a starting point for database CRUD actions, which will also be forwarded to Redis (and reverted when an RDBMS transaction is rolled back).

I've optimized various parts of this setup, hardware-wise as well as software-wise.
The RDBMS is on AIX 7, the Redis 2.8.8 server is on Debian, with 1 Gbit fibre in between and failover. Both are VMs, but on different containers (with dedicated resources, though).

The route from the RDBMS is as follows (just to put it in context, no details here): a 3rd-party RDBMS client binary loads a custom C 'pythonbridge' shared object, which embeds a CPython interpreter with multi-threading support and GIL control; data flows through a named pipe from the 3rd-party binary into a secondary Python thread, which streams it to Redis on the Debian server (roughly as sketched below).
All working very fast and stable.
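
A very rough, non-authoritative sketch of that last leg; the fifo path, hostname, key name, and line-based framing are all hypothetical stand-ins for the real streaming code:

    import threading
    import redis

    # Hypothetical sketch: read exported rows from a named pipe and hand
    # them to Redis; the real code streams msgpack rather than raw lines.
    def pump(fifo_path='/tmp/rdbms_export.fifo'):
        r = redis.StrictRedis(host='redis-debian')
        with open(fifo_path, 'rb') as fifo:
            for row in fifo:                 # one exported row per line
                r.rpush('ingest:raw', row)   # hand off to Redis

    worker = threading.Thread(target=pump)
    worker.start()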

The benchmarks I run intentionally have a large scope: I want to focus on the bottlenecks in this complete data flow, involving all technologies and servers.
I made sure of good preconditions, e.g. the RDBMS has the data in cache, the servers are separated from the LAN as much as possible, etc.

Each data row results in the following redis commands:
  1. HINCRBY
  2. ZADD

When dumping a small test set of about 7500 rows, I get the following end results:
  1. Using no pipelining, direct commands: ~10000ms
  2. Using redis-py non-transactional pipelining, flushed every 3000 rows (6000 commands) (@1): 1397ms
  3. Using EVALSHA (preloaded Lua script with the 2 commands), on the pipeline, flushed every 3000 rows (3000 commands): 2247ms
  4. Using EVALSHA (preloaded Lua script) without pipelining: accumulating the parameters for the 3000*2 commands per batch, serializing all parameters in msgpack format, calling Lua once per batch, and letting the Lua script deserialize and execute them: 640ms (see the sketch below)
(@1): Note on the redis-py StrictRedis.Pipeline object: this is not a streaming interface, 'only' a client-side buffer that uses the redis-server pipelining in bulk.
I recently had a discussion with Andy about this here: https://github.com/andymccurdy/redis-py/issues/451 
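
A minimal, non-authoritative sketch of variant 4, assuming redis-py and the msgpack package; the key/field names and the 500-row flush are illustrative, and keys are passed via ARGV for brevity (fine outside Redis Cluster):

    import msgpack
    import redis

    r = redis.StrictRedis()

    # Lua side: unpack one msgpack blob and replay both commands per row.
    batch_script = r.register_script("""
    local rows = cmsgpack.unpack(ARGV[1])
    for _, row in ipairs(rows) do
        -- row = { hash_key, field, increment, zset_key, score, member }
        redis.call('HINCRBY', row[1], row[2], row[3])
        redis.call('ZADD', row[4], row[5], row[6])
    end
    return #rows
    """)

    def flush(batch):
        if batch:
            batch_script(args=[msgpack.packb(batch)])

    batch = []
    for i in range(7500):  # stand-in for the real RDBMS rows
        batch.append(['hist:counts', 'row:%d' % i, 1,
                      'hist:index', float(i), 'member:%d' % i])
        if len(batch) == 500:  # small batches avoid blocking other clients
            flush(batch)
            batch = []
    flush(batch)

Everything between two flushes is a single script call, so the batch size directly bounds how long other clients can be blocked.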

Seeing the numbers above, the choice should be easy: use the last one. But of course this isn't very good for concurrency. I use Lua here for performance optimization, and don't specifically need or want the transactional blocking characteristics of Lua execution (which are hard/impossible for the Redis engine to avoid).

For now, I will make a tradeoff: use batched multi-command Lua, but keep the batch size small enough to avoid hindering other clients. I'll probably use a size of around 500-1000 commands.
The short total execution time indicates less overall overhead, which is actually good for concurrency.

What interests me is where this big difference in performance comes from.
Is it the boxing/unboxing of data to and from Lua?
Can you think of something to improve this?

If you need some specific benchmark for me to run, please ask.

Kind regards, TW

tw-bert

Apr 12, 2014, 9:10:07 PM
to redi...@googlegroups.com
Addendum to my previous post:

Occasional readers who just glance over these benchmark results might think "Oops, Lua has overhead".

To prevent users from shying away from Lua:
  1. This is a very specific scenario on a dedicated network, where everything else is optimized.
  2. This scenario does nothing with return values; it's 'dump and forget'.
  3. Lua overhead is negligible in most use cases, and gets you a performance gain, not a performance loss.
Kind regards, TW

Josiah Carlson

Apr 13, 2014, 1:35:11 AM
to redi...@googlegroups.com
See my reply inline :)

The performance difference is actually easy to explain. The msgpack format uses a more or less fixed-size prefix notation for storing its data, which makes unpacking it *very* fast. Once you get to Lua, all of your pieces that make redis.call()s pass in pre-parsed argument lists. But when you provide the commands via pipelining or non-msgpack'd data to Lua, it uses the Redis protocol.

Now, the Redis protocol is flexible, but its variable-sized, human-readable argument count and per-argument lengths, each suffixed by a CRLF pair, make parsing Redis commands noticeably slower than parsing msgpack. Every argument means one more number to read: scan for the CRLF pair, call atoi() to get the length, then read that many bytes. Head to the msgpack.org site to see an example on the front page that highlights some basics. Toss in the modest but measurable difference in data size (possibly close to 10-20% less with msgpack, depending on your data), and you have a recipe for better performance with msgpack :)
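
To make the framing difference concrete, here's a quick sketch (assuming the Python msgpack package; key/field names are made up) that builds the RESP encoding of one HINCRBY by hand and compares it to a msgpack array of the same arguments:

    import msgpack

    args = [b'HINCRBY', b'stats:day', b'rows', b'1']

    # RESP framing: '*<argc>\r\n', then '$<len>\r\n<bytes>\r\n' per argument.
    resp = b'*%d\r\n' % len(args)
    for a in args:
        resp += b'$%d\r\n%s\r\n' % (len(a), a)

    packed = msgpack.packb([a.decode() for a in args])

    # RESP spends a human-readable length line on every argument; msgpack
    # uses compact binary type/length headers, so it parses in fewer scans.
    print(len(resp), len(packed))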

Actually, though Redis has already gone from the legacy completely line-based protocol to the binary-safe Redis protocol, a person could make a fairly decent argument that switching to a msgpack-backed serialization of arguments would offer several advantages, among which being a reduction in command parsing overhead. The drawback obviously being that all clients would need to support it, though with most client libraries using hiredis as a parser already, injecting msgpack into the hiredis extension would provide at least some of that for fairly minimal cost.

 - Josiah


tw-bert

Apr 13, 2014, 2:32:48 AM
to redi...@googlegroups.com
Thanks again, replies again inline :)

On Sunday, 13 April 2014 07:35:11 UTC+2, Josiah Carlson wrote:
[snip]


> The performance difference is actually easy to explain. The msgpack format
> uses a more or less fixed-size prefix notation for storing its data, which
> makes unpacking it *very* fast. [...]
>
> Now, the Redis protocol is flexible, but its variable-sized, human-readable
> argument count and per-argument lengths [...] a recipe for better
> performance with msgpack :)

I happen to know the msgpack format almost by heart. I just wrote a tailor-made C (Cython) reserialization routine to stream directly from the native RDBMS format to msgpack format. Python was too slow for that.

Thanks for your insights, it confirms what I thought was happening here.
 

> Actually, though Redis has already gone from the legacy completely
> line-based protocol to the binary-safe Redis protocol, a person could make
> a fairly decent argument that switching to a msgpack-backed serialization
> of arguments would offer several advantages [...]


Interesting idea.

On top of that, I'd suggest not streaming all parameters to the Lua script, but letting the Lua script ask for them. The Lua script knows what it needs, and when it needs it. The msgpack data would be boxed into Lua on request, and deserialized only when needed. As a sidenote, I would keep the implementation non-strict. For example, simple arrays are fast, and dict-like objects are readable and less error-prone. In our high-speed implementation, we separate metadefinitions from data everywhere (meaning: we use arrays and positional combining; see the sketch below).
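
A tiny sketch of that 'arrays and positional combining' idea; the column names are made up:

    import msgpack

    # Pack the column names once per batch, and each row as a positional
    # array, instead of repeating dict keys in every row.
    meta = ['id', 'name', 'qty']          # metadefinition, sent once
    rows = [[1001, 'bolt M6', 250],
            [1002, 'nut M6', 500]]

    payload = msgpack.packb([meta, rows])

    # A consumer (e.g. a Lua script) combines meta[i] with row[i]
    # positionally, keeping the payload small and the parse cheap.
    unpacked_meta, unpacked_rows = msgpack.unpackb(payload)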

On top of the on top, we might be able to skip the boxing altogether: just let the msgpack parser read directly from memory. But better put some boundary checking in place to prevent heartbleeds. ;)

One last Q: do you agree with my final verdict, "use Lua-msgpack batching with a small batch size", as optimal for this use case?

Thanks again, TW

Josiah Carlson

Apr 13, 2014, 1:19:13 PM
to redi...@googlegroups.com
You are quite welcome, your problem is interesting, and I sometimes have ideas on how to improve solutions, so I try to help :)

I agree 100% with your small-ish msgpack-to-Lua calls. Pick your desired typical latency + efficiency, and go with it :)

Also, that RDBMS -> Python + CPython extension -> Redis sounds pretty awesome. :)

 - Josiah



tw-bert

Apr 13, 2014, 1:48:37 PM
to redi...@googlegroups.com


On Sunday, 13 April 2014 19:19:13 UTC+2, Josiah Carlson wrote:
> You are quite welcome, your problem is interesting, and I sometimes have
> ideas on how to improve solutions, so I try to help :)

Same for me, whenever I can.
 

> I agree 100% with your small-ish msgpack-to-Lua calls. Pick your desired
> typical latency + efficiency, and go with it :)

Great.
 

> Also, that RDBMS -> Python + CPython extension -> Redis sounds pretty awesome. :)

Thanks for the compliment! Yes, that turned out to be a good decision. I initially started this reserialization code in pure Python, but trying to do efficient memcpy's to structs was just too much work (I was only fighting with the Python memory manager). I love Python, but this specific task was easier done in C. And Cython is excellent for gluing a Python binary extension together (for CPython); I can heartily recommend it. Best of both worlds! Under NDA, but INADA Naoki's msgpack was a very good starting point (if you're interested: https://github.com/msgpack/msgpack-python ).

Cheers, TW

Josiah Carlson

Apr 13, 2014, 2:49:15 PM
to redi...@googlegroups.com
Just short of 10 years ago I was writing Pyrex wrappers for some search engine code, and participating in the Pyrex mailing list like I do here. Pyrex eventually got a successor, Cython. They are both great projects, and I'm turning into an old man ;)

 - Josiah

tw-bert

Apr 13, 2014, 3:25:11 PM
to redi...@googlegroups.com

> Just short of 10 years ago I was writing Pyrex wrappers for some search
> engine code, and participating in the Pyrex mailing list like I do here.
> [...] I'm turning into an old man ;)


The more experienced you get, the bigger the comfort zone will be. What are we going to do with all that space?! ;)

 

Josiah Carlson

Apr 13, 2014, 3:28:12 PM
to redi...@googlegroups.com
Put it all in memory with Redis :D

 - Josiah

Pepijn de Vos

Apr 28, 2014, 6:49:38 AM
to redi...@googlegroups.com
I jumped over here from the redis-py issue.

I tried to replicate your EVALSHA setup, dumping 1000 rows * 3 commands via a msgpack blob.

On the Redis side this indeed seems to perform somewhat better, but Python can't keep up:
packing the parameters is just too slow.

When pipelining individual rows, I can get a sustained 70000 rows per second;
with EVALSHA I get only up to 20000, with the trace showing significant time spent in msgpack.

I experimented with using JSON instead, because Python has a very optimized JSON implementation.
This moved the CPU usage from the client to the server and did not improve speed.
It does show that serializing to msgpack is the bottleneck here.
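
For anyone replicating this, one quick way to confirm the client-side hotspot; the row shape here is a made-up stand-in, and note that profiling under PyPy can distort JIT timings:

    import cProfile
    import msgpack

    # Pack 100k synthetic rows and see where the time goes.
    rows = [[i, 'field', 1, 'zset', float(i), 'member'] for i in range(100000)]
    cProfile.run('msgpack.packb(rows)', sort='cumtime')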

You talked about some C code to pack the arguments. Is that something that is easily reusable?
There does not appear to be a proper C-based msgpack implementation.

Maybe I could build one using CFFI, but I'm not sure it's worth it.

Pepijn

tw-bert

Apr 28, 2014, 12:36:41 PM
to redi...@googlegroups.com
Hi Pepijn, I'm not sure how much use the C code would be to you. On top of that, I'd have to talk to my superiors; it's not open source as of now.

What it does is serialize incoming RDBMS data (a proprietary export format, but very simple, ASCII/UTF-8 based) directly to msgpack format.
Having all that RDBMS data converted to Python objects first, only to use those Python objects to convert back to msgpack, was way too slow.

I therefore made a simple but efficient reserialization mechanism, based on the Cython msgpack implementation by INADA Naoki. You can grab his code from GitHub.

I split the streamed data into two parts:

1. A QSD string, a 'quick serialization definition', which looks somewhat like this:

   #0[IisssDd#1[III]#2[#3[Is]]]

   In this example, the data to be serialized is:
   level 0: int64, int32, string, string, string, datetimetz, date
   level 1: an array of three int64
   level 2: a child group indicator
   level 3: N child records of int64, string

2. The data itself, which is reserialized to msgpack according to the matching QSD.

(A toy parse of the example QSD string is sketched below.)
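
Purely to illustrate the nesting: the real grammar is proprietary, and this guesses just enough to reproduce the breakdown above:

    # Toy QSD parser: '#N[' opens a nested level, ']' closes it, and each
    # letter is a type code (I=int64, i=int32, s=string, D=datetimetz, d=date).
    def parse_qsd(s, i=0):
        fields = []
        while i < len(s):
            c = s[i]
            if c == '#':                      # '#N[' opens a nested level
                sub, i = parse_qsd(s, s.index('[', i) + 1)
                fields.append(sub)
            elif c == ']':                    # closes the current level
                return fields, i + 1
            else:                             # single-letter type code
                fields.append(c)
                i += 1
        return fields, i

    print(parse_qsd('#0[IisssDd#1[III]#2[#3[Is]]]')[0])
    # -> [['I', 'i', 's', 's', 's', 'D', 'd', ['I', 'I', 'I'], [['I', 's']]]]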

In Python, several QSDs are kept in a dict, and selected by the incoming data on a stream (named pipe/fifo).
The incoming data is then reserialized to msgpack by its QSD in one call per dataset/row, and this bytearray is stored in Redis via redis-py.
Several of these msgpack 'rows', which can be small datasets, are bundled in Python, but now using INADA Naoki's msgpack module.
These are sent per 500 to a Lua script.

If it helps, I can put some of my QSD reserialization snippets in a gist, but now you have a better idea of what I'm doing to achieve high throughput.

One question: have you looked in the install log of the msgpack module? INADA Naoki implemented a pure-Python failsafe. I recommend not using that; the pyd/so binary is quite a bit more efficient. (See the quick check below.)
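
A quick way to check which implementation you got; the exact module names vary between msgpack-python versions:

    import msgpack

    # The compiled extension reports something like 'msgpack._packer';
    # the pure-Python fallback reports 'msgpack.fallback'.
    print(msgpack.Packer.__module__)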

Good luck, I'm following your efforts with great interest.

Pepijn de Vos

May 3, 2014, 4:26:30 AM
to redi...@googlegroups.com
I'm definitely using the Python fallback, as I'm using PyPy for speed benefits in other areas.
Due to JIT compilation it's not quite as slow as it would be on CPython, but it's still slower than the C extension.
http://blog.affien.com/archives/2013/01/29/msgpack-for-pypy/

The current setup is fast enough, but not the fastest possible.
I currently use Pypredis to pipeline individual commands to 4 Redis instances.
Maybe I'll have time for optimizations, maybe I'll get a new project.

Time permitting, I might either use CFFI to interface the C implementation of msgpack with PyPy, or follow your approach and translate the data source directly to msgpack in C.

My data source is a log file that I currently parse line by line into chunks of namedtuples. I could instead parse a bunch of lines directly into one big msgpack string.


Pepijn
