If you need me to run a specific benchmark, please ask.
Kind regards, TW
[snip]
The performance difference is actually easy to explain. The msgpack format uses a more or less fixed-size prefix notation for storing its data, which makes unpacking it *very* fast. Once you are in Lua, every redis.call() you make passes in a pre-parsed argument list. But when you send the commands via pipelining, or pass non-msgpack'd data to Lua, everything goes through the Redis protocol.
Now, the Redis protocol is flexible, but the number of arguments and the length of each argument are variable-width, human-readable numbers, each terminated by a CRLF pair, and that makes parsing Redis commands noticeably slower than parsing msgpack. Every argument means one more length to read: find the CRLF pair, call atoi() on the digits before it, then read that number of bytes. Head to the msgpack.org site to see an example on the front page that highlights some basics. Toss in the modest but measurable difference in data size (possibly close to 10-20% less with msgpack, depending on your data), and you have a recipe for better performance with msgpack :)
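To make the parsing difference concrete, here is a minimal Python sketch. It is not code from Redis or from any msgpack library: the function names are made up for illustration, and the msgpack half only handles the smallest case (at most 15 arguments of at most 31 bytes each, i.e. the fixarray and fixstr/fixraw headers). The point is only to show where the CRLF scan and the atoi() step appear in one parser and not in the other.

```python
def encode_resp(args):
    """Encode a command as a Redis protocol array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        out.append(b"$%d\r\n%s\r\n" % (len(arg), arg))
    return b"".join(out)


def parse_resp(buf):
    """Parse one command; every count/length is ASCII digits ended by CRLF."""
    def read_int(pos, marker):
        assert buf[pos:pos + 1] == marker
        end = buf.index(b"\r\n", pos)           # scan for the CRLF pair
        return int(buf[pos + 1:end]), end + 2   # the atoi() step

    count, pos = read_int(0, b"*")
    args = []
    for _ in range(count):
        length, pos = read_int(pos, b"$")
        args.append(buf[pos:pos + length])
        pos += length + 2                       # skip payload and its CRLF
    return args


def encode_msgpack_small(args):
    """Illustrative subset of msgpack: fixarray header plus fixstr items."""
    assert len(args) <= 15 and all(len(a) <= 31 for a in args)
    out = [bytes([0x90 | len(args)])]           # 1001xxxx: array of x items
    for arg in args:
        out.append(bytes([0xA0 | len(arg)]) + arg)  # 101xxxxx: x-byte string
    return b"".join(out)


def parse_msgpack_small(buf):
    """Parse the subset above: each length is a mask on a single byte."""
    count = buf[0] & 0x0F
    pos, args = 1, []
    for _ in range(count):
        length = buf[pos] & 0x1F
        args.append(buf[pos + 1:pos + 1 + length])
        pos += 1 + length
    return args
```

On a tiny command such as SET key value the framing overhead dominates, so the size gap in this toy case is much larger than the 10-20% mentioned above; with realistic payloads the difference shrinks toward that range.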
Actually, though Redis has already gone from the legacy, completely line-based protocol to the binary-safe Redis protocol, a person could make a fairly decent argument that switching to a msgpack-backed serialization of arguments would offer several advantages, among them a reduction in command parsing overhead. The obvious drawback is that all clients would need to support it, though with most client libraries already using hiredis as a parser, injecting msgpack into the hiredis extension would provide at least some of that for fairly minimal cost.
You are quite welcome, your problem is interesting, and I sometimes have ideas on how to improve solutions, so I try to help :)
I agree 100% with your small-ish msgpack-to-Lua calls. Pick your desired typical latency + efficiency, and go with it :)
Also, that RDBMS -> Python + CPython extension -> Redis sounds pretty awesome. :)
Just short of 10 years ago I was writing Pyrex wrappers for some search engine code, and participating in the Pyrex mailing list like I do here. Pyrex eventually got a successor, Cython. They are both great projects, and I'm turning into an old man ;)
- Josiah
What it does is serialize incoming RDBMS data (proprietary export format, but very simple ASCII/UTF-8 based) directly to msgpack format.
Having all that RDBMS data converted to python objects first, only to use those python objects to convert back to msgpack, was way too slow.
I therefore made a simple but efficient reserialization mechanism, based on the Cython msgpack implementation by INADA Naoki. You can grab his code from github.
I split the streamed data in two parts:
1. A QSD string, which is a quick serialization definition, which looks somewhat like this:
#0[IisssDd#1[III]#2[#3[Is]]]
In this example, the data to be serialized is this:
level 0:
int64,int32,string,string,string,datetimetz,date
level 1:
array of three int64
level 2:
child group indicator
level 3:
N child records: int64,string
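As a rough illustration of how such a definition string could be read, here is a small Python sketch that parses the example into a nested structure. It is only a guess based on the example above, not the actual Cython implementation: the type-code table and the parse_qsd name are assumptions, and it only knows the five codes that appear in the example.

```python
# Type codes as they appear in the example; the real QSD may define more.
TYPE_CODES = {
    "I": "int64",
    "i": "int32",
    "s": "string",
    "D": "datetimetz",
    "d": "date",
}


def parse_qsd(qsd):
    """Parse '#<level>[...]' groups into nested (level, fields) tuples."""
    def parse_group(pos):
        if qsd[pos] != "#":
            raise ValueError("expected '#' at offset %d" % pos)
        pos += 1
        start = pos
        while qsd[pos].isdigit():
            pos += 1
        level = int(qsd[start:pos])
        if qsd[pos] != "[":
            raise ValueError("expected '[' at offset %d" % pos)
        pos += 1
        fields = []
        while qsd[pos] != "]":
            if qsd[pos] == "#":              # nested group
                group, pos = parse_group(pos)
                fields.append(group)
            else:                            # single-character type code
                fields.append(TYPE_CODES[qsd[pos]])
                pos += 1
        return (level, fields), pos + 1      # step past the closing ']'

    tree, end = parse_group(0)
    if end != len(qsd):
        raise ValueError("trailing characters after offset %d" % end)
    return tree


if __name__ == "__main__":
    print(parse_qsd("#0[IisssDd#1[III]#2[#3[Is]]]"))
```

A reserializer could then walk this tree once per row and emit the matching msgpack headers directly, without building intermediate Python objects for the values.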
In Python, several QSDs are kept in a dict, and the right one is selected by the incoming data on a stream (named pipe/fifo).
The incoming data is then reserialized to msgpack according to this QSD, in one call per dataset/row. This bytearray is stored in Redis using redis-py.
Several of these msgpack 'rows', which can be small datasets, are bundled in Python, this time using INADA Naoki's msgpack module.
These are sent to a Lua script in batches of 500.
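The batching step can be sketched in plain Python as follows. This is an assumption about the general shape, not the actual code: batched, ship and send_batch are hypothetical names, and send_batch stands in for whatever packs one bundle and calls the Lua script (with redis-py that could be a script object obtained from register_script).

```python
from itertools import islice


def batched(rows, size=500):
    """Yield lists of up to `size` rows from any iterable of rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def ship(rows, send_batch, size=500):
    """Call send_batch once per batch; return the number of calls made."""
    calls = 0
    for batch in batched(rows, size):
        send_batch(batch)   # e.g. pack the bundle and invoke the Lua script
        calls += 1
    return calls


if __name__ == "__main__":
    sent = []
    rows = (b"row-%d" % i for i in range(1234))
    print(ship(rows, sent.append), [len(b) for b in sent])
```

Keeping the batch size as a parameter makes it easy to trade latency against throughput by rerunning the same benchmark with different values.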
If it helps, I can put some of my QSD reserialization snippets in a gist, but now you have a better idea of what I'm doing to achieve high throughput.
One question: have you looked at the install log of the msgpack module? Inada implemented a pure-Python fallback. I recommend not using that; the compiled pyd/so binary is quite a bit more efficient.
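Besides reading the install log, one way to check at runtime is to look at which module the Packer class comes from. The sketch below assumes that the pure-Python implementation lives in a module named msgpack.fallback, which is how the releases I have seen are laid out; please verify that against your installed version. The packer_backend helper is a made-up name.

```python
def packer_backend(packer_cls):
    """Guess the implementation from the class's defining module name."""
    module = getattr(packer_cls, "__module__", "") or ""
    # Assumption: the pure-Python implementation is in 'msgpack.fallback'.
    return "pure-python" if module.endswith(".fallback") else "compiled"


if __name__ == "__main__":
    try:
        import msgpack
    except ImportError:
        print("msgpack is not installed")
    else:
        print(msgpack.Packer.__module__, "->", packer_backend(msgpack.Packer))
```

If this reports the pure-Python backend, reinstalling with a working C compiler available is usually what gets the binary extension built.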
Good luck, I'm following your efforts with great interest.