I like how in this medium people can talk behind your back and in your face at the same time! :P
I actually invested about two weeks (both at work AND at home) experimenting with MANY different options for storing and retrieving data in Redis, using all the structure types, with both generic (procedurally-generated) data and our own real-world data. It started out as a pet-project, but it mushroomed into a very detailed and flexible "py-redis-benchmarking tool", which I have every intention of sharing on GitHub - I think it's over 1k LOC already... You basically tell it which benchmark combination(s) you wish to run, and it prints the results in a nicely-organized table. If you choose the procedurally-generated data (for synthetic benchmarking), you can define each of its 3 dimensions (keys, records, fields), to see how each one affects each Redis storage option (lists, sets, hashes, etc.). So you can get a feel for how "scale" influences the benefits/trade-offs of each storage option. I think I will add graph-plotting for IPython, just for the fun of it...
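To make those 3 dimensions concrete, here's a tiny sketch of the kind of synthetic data I mean (the names are illustrative only, not the actual tool's code):

def generate_dataset(n_keys, n_records, n_fields):
    """Build {key: [record, ...]} where each record is a dict of n_fields fields."""
    return {
        "key:%d" % k: [
            dict(("field_%d" % f, "value_%d_%d_%d" % (k, r, f))
                 for f in range(n_fields))
            for r in range(n_records)
        ]
        for k in range(n_keys)
    }

# e.g. 10 keys x 100 records x 5 fields:
dataset = generate_dataset(10, 100, 5)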
In conclusion:
A major performance factor is the number of round-trips to Redis, so I made heavy use of "pipeline". But it turns out that the next major performance factor is the manipulation the data needs in Python, pre-store and post-retrieval, in order to fit it into the Redis structures. It turns out that, at least for bulk store/retrieval (i.e. pipeline usage), the overheads of fitting a data structure into Redis outweigh the benefits, sometimes by orders of magnitude. Perhaps if an application is written to use Redis as a database it would be worth it, as updating a specific value "nested" inside a Redis structure "may" be faster than having to pull an entire "key" of serialized data - but that's not the use-case we're talking about for "caching" in web2py.
So, the *tl;dr* version of it is:
"Flat key-value store of serialized data is fastest for bulk-store/retrieval"
* Especially when combined with "hiredis" (a Python wrapper around the C "hiredis" client library - redis-py uses its reply-parser automatically when it's installed, and that alone is orders of magnitude faster...)
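To make the "flat key-value store" point concrete, here's a minimal sketch of the approach, assuming redis-py against a local Redis (as noted, redis-py picks up hiredis automatically when it's installed):

import redis
import cPickle as pickle

r = redis.StrictRedis(host="localhost", port=6379, db=0)

def bulk_store(mapping):
    # one round-trip for any number of keys, each stored as a serialized blob
    pipe = r.pipeline(transaction=False)
    for key, value in mapping.items():
        pipe.set(key, pickle.dumps(value, pickle.HIGHEST_PROTOCOL))
    pipe.execute()

def bulk_fetch(keys):
    # MGET is a single round-trip as well; None marks a missing key
    return [blob if blob is None else pickle.loads(blob)
            for blob in r.mget(keys)]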
Then I moved on to testing many serialization formats/libraries:
- json (the stdlib module; pure Python)
- simplejson (with C-compiled optimizations)
- cjson (a C extension with a Python wrapper)
- ujson (a C extension with a Python wrapper)
- pickle (pure Python)
- cPickle (the stdlib's C implementation of pickle)
- msgpack (with C-compiled optimizations)
- u-msgpack (pure Python)
- marshal (Python's internal, version-specific serializer)
Results:
- all the pure-Python options are the slowest (unsurprising)
- simplejson is almost as fast as cjson when its C optimizations are compiled, and is better maintained, so there's no reason to use cjson.
- cPickle is almost as fast as marshal, and unlike marshal it's platform/version agnostic, so there's no reason to use marshal.
- ujson is only faster than simplejson for very long (and flat) lists, and is less maintained/popular/mature.
So, that leaves us with:
- simplejson
- cPickle
- msgpack
- cPickle is actually the "slowest" of the three, AND is Python-only.
- With either simplejson or msgpack you can read the data from Redis with non-Python clients, AND they both (surprisingly) handle unicode really well.
- msgpack is roughly 2x faster than simplejson, but is less readable in a Redis GUI.
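If anyone wants to reproduce the gist of these numbers, a rough timeit harness like the following is enough (assuming simplejson, msgpack and cPickle are all importable; the exact ratios will vary with the shape of your payload):

import timeit
import cPickle, simplejson, msgpack

payload = {"id": 42, "tags": [u"a", u"b"],
           "rows": [{"n": i} for i in range(100)]}

candidates = {
    "cPickle":    (lambda: cPickle.dumps(payload, 2),  cPickle.loads),
    "simplejson": (lambda: simplejson.dumps(payload),  simplejson.loads),
    "msgpack":    (lambda: msgpack.packb(payload),     msgpack.unpackb),
}

for name, (dump, load) in candidates.items():
    blob = dump()
    t_dump = timeit.timeit(dump, number=10000)
    t_load = timeit.timeit(lambda: load(blob), number=10000)
    print "%-10s dump: %.3fs  load: %.3fs" % (name, t_dump, t_load)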
However:
When using simplejson or msgpack, once you introduce datetime values you have to post-process the results in Python by hooking into the parsers... and once you do that, all the performance gain is nullified.
So cPickle becomes the fastest, as it reconstructs the Python datetime objects at the C level...
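To illustrate, here is roughly what the hooks look like with msgpack (simplejson needs an equivalent default/object_hook pair); every datetime value passes through these pure-Python functions, which is exactly where the speed advantage evaporates:

import datetime
import msgpack

DT_FMT = "%Y-%m-%dT%H:%M:%S"

def encode_dt(obj):
    # called for every type msgpack can't serialize natively
    if isinstance(obj, datetime.datetime):
        return {"__dt__": obj.strftime(DT_FMT)}
    raise TypeError("cannot serialize %r" % obj)

def decode_dt(obj):
    # called for every decoded map - pure-Python overhead per value
    if "__dt__" in obj:
        return datetime.datetime.strptime(obj["__dt__"], DT_FMT)
    return obj

blob = msgpack.packb({"when": datetime.datetime.now()}, default=encode_dt)
data = msgpack.unpackb(blob, object_hook=decode_dt)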
So I ended up where I started, coming full circle back to flat keys with cPickle...
The only benefit I ended up gaining is from refactoring our high-level cache data-structure, on top of redis_cache.py, so that it does bulk retrieval and smart refreshes.
We are now doing a bulk-get of our entire Redis cache on every request.
It has over 100 keys, some very small and some with hundreds of nested records. We got it down to 16ms per request (best case), which is good enough for me.
We basically have a class in a module which instantiates a non-thread-local singleton, once per process. It holds an ordered dictionary mapping "keys" to "lambdas" - we call it the "cache-catalog". The results are stored in a regular (thread-local) dictionary, mapping each key to its resultant value.

On each request, a bulk-get is issued with the list of all the keys - which we already have: it's the catalog's keys plus the "w2py:<app>:" prefix, so we don't even need to store them in Redis in a separate set... and we still avoid the infamous "KEYS" Redis command... Since the catalog is an ordered dictionary, we know which value in the result maps to which key, so the "None" values represent the keys currently "missing" from Redis, due to a deletion triggered by a cache-update on another request/thread/process. That gives us a list of "missing keys", which we just run through in a regular for-loop, regenerating values via the regular cache mechanism (which triggers the lambdas) - so we only update what's missing.
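A condensed sketch of the idea, with hypothetical names, and with the "regular cache mechanism" reduced to a direct SET (the real class sits on top of redis_cache.py, and as I said, I'm not sure I can share it):

from collections import OrderedDict
import redis
import cPickle as pickle

r = redis.StrictRedis()
PREFIX = "w2py:myapp:"  # assumption: matches the web2py redis-cache key prefix

def load_settings():    # stand-ins for the real generator lambdas
    return {"theme": "dark"}

def load_countries():
    return ["IL", "US"]

# per-process, non-thread-local singleton: key -> value-generator
CATALOG = OrderedDict([
    ("settings",  load_settings),
    ("countries", load_countries),
])

def warm_cache(results):
    # one MGET for the whole catalog; the catalog's order tells us which value is which
    blobs = r.mget([PREFIX + k for k in CATALOG])
    for key, blob in zip(CATALOG, blobs):
        if blob is None:  # missing: deleted by another request/thread/process
            value = CATALOG[key]()
            r.set(PREFIX + key, pickle.dumps(value, pickle.HIGHEST_PROTOCOL))
        else:
            value = pickle.loads(blob)
        results[key] = value  # the thread-local results dict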
This turns out to be extremely efficient, fast and resilient.
I suggest factoring this approach into the redis_cache.py file itself somehow...
Not sure I can share that code though... (legally...)
Anyway, hope this sums up the topic, and hope some people learned something from this summary of my experience.
If not, hey, what do I know, I'm just an "idea guy" after all, right? :P
I'll be posting a link to the git-repo of the benchmark-code in a few days, after I clean it up a bit...