compressing streams using zstd and custom dictionaries


Nick Farrell

Oct 13, 2019, 6:08:52 AM
to Redis DB
I understand why redis itself doesn't have built-in compression, given the variable impact on performance. However, there is great potential to drastically reduce the memory requirements for certain types of data. I'd like to share my general approach which I'm rolling out internally, in case others can benefit from this pattern.

I can elaborate or perhaps provide some code snippets if there is interest, though it should be pretty easy to implement in whatever application code you have wrapping redis.

The simplest approach is of course to use a generic compression library to compress and decompress each item as it's accessed. For some scenarios this is all you need. However, in our setup, which uses redis streams, the payloads (FHIR medical data) are ~2kb each, and despite being JSON-encoded text, each one only compresses to ~500 bytes. That's a win, but not game-changing.
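To make that concrete, the per-item variant in Python looks roughly like the sketch below, using the zstandard and redis packages (the stream key and field name are just placeholders, not our actual schema):

import redis
import zstandard as zstd

r = redis.Redis()
cctx = zstd.ZstdCompressor(level=3)   # plain per-entry compression, no dictionary yet
dctx = zstd.ZstdDecompressor()

def add_entry(payload_json: bytes) -> bytes:
    # Compress the JSON payload before it reaches redis; XADD stores the compressed bytes.
    return r.xadd("fhir:events", {"payload": cctx.compress(payload_json)})

def read_entries(count: int = 10):
    # Decompress each entry as it is read back off the stream.
    for entry_id, fields in r.xrange("fhir:events", count=count):
        yield entry_id, dctx.decompress(fields[b"payload"])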

Enter zstandard with its integrated dictionary support. We simply dump our stream data out into a directory of files, point zstd at it, and train a dictionary of ~100kb (which itself compresses down to ~15kb). Using this dictionary, we get an extra order of magnitude of compression, from ~500 bytes down to ~50 bytes. This gives us 30-40 times the capacity, while still allowing individual entries to be manipulated in redis.
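The dictionary step looks roughly like this (again Python and purely illustrative; the sample directory, dictionary size and file names are placeholders, and the zstd CLI's --train mode does the same job if you'd rather train outside the application):

import glob
import zstandard as zstd

# Each sample file is one raw payload, dumped out of the stream beforehand.
samples = [open(path, "rb").read() for path in glob.glob("samples/*.json")]

# Train a ~100kb dictionary from the sample payloads.
dict_data = zstd.train_dictionary(100 * 1024, samples)

# Persist the dictionary so every producer and consumer shares the same bytes.
with open("fhir.dict", "wb") as f:
    f.write(dict_data.as_bytes())

# Per-entry compression/decompression, now with the shared dictionary.
cctx = zstd.ZstdCompressor(level=3, dict_data=dict_data)
dctx = zstd.ZstdDecompressor(dict_data=dict_data)

small = cctx.compress(samples[0])      # ~2kb payload shrinks dramatically
original = dctx.decompress(small)

The one operational detail to keep in mind is that anything reading the stream needs the exact dictionary that was used to write it, so version the dictionary alongside the data.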

If you have larger objects in your redis store, or your application doesn't have the spare CPU to (de)compress the data, this isn't for you. However, I expect that for most applications of redis streams it will provide a great benefit.

If there is anything else relating to this that's unclear, I'm happy to elaborate, but like most good ideas, it's quite simple to understand and implement.

Nick

Itamar Haber

Oct 14, 2019, 10:26:39 AM
to redi...@googlegroups.com
Hi Nick,

That's a decent approach that can work very well in some use cases - perhaps you'd be interested in this effort although it is no longer maintained: https://github.com/chadnickbok/redis-zstd-module

Cheers,

--

Itamar Haber
Technicalist Evangely

Phone: +972.54.567.9692

Redis Labs

Stefano Fratini

Oct 15, 2019, 12:43:40 AM
to Redis DB
Another approach, which I have tested extensively, is to use LZ4.


LZ4 is a family of Snappy-like compression algorithms that have very low CPU overhead in exchange for a more modest 50-70% compression ratio.

They are usually employed internally by database systems to reduce their disk footprint, but implementations exist in multiple programming languages. I am using the JS bindings (https://github.com/pierrec/node-lz4) and they perform flawlessly in Node.js.
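For comparison with the zstd snippets above, the same idea with the Python bindings (the python-lz4 package, not the node-lz4 module I use, so take it as a rough sketch) is just:

import lz4.frame

data = open("payload.json", "rb").read()      # a stand-in for one ~2kb value

compressed = lz4.frame.compress(data)         # very cheap on CPU, modest ratio
restored = lz4.frame.decompress(compressed)
assert restored == data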

I hope this helps.

Stefano
