AUTOMATICALLY LZF compress TEXT columns longer than 20 characters

Jak Sprats

Jan 25, 2011, 10:11:30 PM
to redisql-dev
Hi All,

I have implemented logic that will AUTOMATICALLY LZF compress TEXT
columns longer than 20 characters. LZF compression is so fast that,
when built into AlchemyDB's architecture, it is basically free (in
terms of REQ/s).
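A minimal sketch of the "compress only TEXT values longer than 20 characters" policy, assuming each stored value carries a 1-byte header saying whether it is compressed. Python's standard library has no LZF binding, so `zlib` at level 1 stands in for LZF here. The names `encode_text`/`decode_text`, the header format, and the fallback to raw storage when compression does not shrink the value are all hypothetical illustrations, not AlchemyDB's actual code.

```python
import zlib

COMPRESS_THRESHOLD = 20              # compress TEXT values longer than this many characters
RAW, COMPRESSED = b"\x00", b"\x01"   # hypothetical 1-byte header values


def encode_text(value: str) -> bytes:
    """Encode a TEXT column value for storage, compressing long values."""
    data = value.encode("utf-8")
    if len(value) > COMPRESS_THRESHOLD:
        # zlib level 1 is only a stand-in for LZF (fast, modest ratio).
        packed = zlib.compress(data, 1)
        # Hypothetical fallback: keep raw bytes if compression did not help.
        if len(packed) < len(data):
            return COMPRESSED + packed
    return RAW + data


def decode_text(stored: bytes) -> str:
    """Reverse encode_text: inspect the header, decompress if needed."""
    header, body = stored[:1], stored[1:]
    if header == COMPRESSED:
        body = zlib.decompress(body)
    return body.decode("utf-8")


if __name__ == "__main__":
    for text in ["short value", "the quick brown fox " * 10]:
        stored = encode_text(text)
        print(len(text), "chars ->", len(stored), "stored bytes")
        assert decode_text(stored) == text
```

The threshold exists because very short strings rarely shrink once compressor framing is counted, so skipping them avoids CPU work that would not pay off.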

Using the text of WarAndPeace that is magically available here:
http://allinram.info/AlchemyDB/WNP_text.txt

And running the script: test/TEST_WNP.lua
(e.g. cd test; lua TEST_WNP.lua)

Comparing the size of WNP_text.txt (3043532 bytes) w/ the size of the
resulting "wnp" table (3193272 bytes) shows a 5% INCREASE in size for
storing the text of WarAndPeace, indexed on line number, in AlchemyDB.
LZF compresses each line by about 25% (with compression turned off,
the table size is 3723281 bytes).
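The percentages follow from the three byte counts in the post; the check below is plain arithmetic, nothing AlchemyDB-specific. It also shows that turning LZF on shrinks the whole table by about 14%. That is smaller than the ~25% per-line figure, presumably because the table also holds per-row and index overhead that is not compressed (an inference, not something the post states).

```python
text_bytes = 3043532   # size of WNP_text.txt
table_lzf = 3193272    # "wnp" table with LZF compression on
table_raw = 3723281    # "wnp" table with compression turned off


def pct_change(new: int, base: int) -> float:
    """Percentage change from base to new."""
    return 100.0 * (new - base) / base


print(f"LZF table vs raw text:        {pct_change(table_lzf, text_bytes):+.1f}%")  # +4.9%
print(f"uncompressed table vs text:   {pct_change(table_raw, text_bytes):+.1f}%")  # +22.3%
print(f"LZF table vs uncompressed:    {pct_change(table_lzf, table_raw):+.1f}%")   # -14.2%
```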

Inserting 1 million rows w/ a 100-character text column shows a 5%
slowdown due to compression.
Inserting 10 million rows w/ a 100-character text column shows a 0%
slowdown due to compression.

- Jak

Didier Spezia

Jan 28, 2011, 1:47:50 PM
to redisql-dev
Hi Jak,

I also had good results with Marc Lehmann's LZF in the past.
I usually prefer it to LZO, since it is very convenient to embed
and reuse.

You may be interested in testing quicklz, which is more recent
and provides very good performance.

http://www.quicklz.com/

Regards,
Didier.

Jak Sprats

Jan 28, 2011, 3:04:28 PM
to redisql-dev
Hi Didier,

Yeah, quicklz is better. I chose LZF because it is BSD-licensed and it
was already in the code base.

If anyone needed quicklz, it would be pretty easy to add it in
(override two functions). The problem w/ GPL licenses is this: if
someone wants to EMBED AlchemyDB into their product, they just have to
get a license from me (normal usage requires no license, but embedding
[i.e. reselling] does), but w/ quicklz, they would also need a license
from the quicklz authors.

On another topic, I eventually plan to add bzipped text columns and a
few other mechanisms for retrieving columns in zipped form. Server-side
compressed columns are nice for memory efficiency, but sending zipped
column contents over TCP can reduce network I/O significantly.
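A rough sketch of retrieving a column in zipped form, assuming the server bzip2-compresses the column value just before writing it to the socket and the client decompresses it on arrival. `column_to_wire` and `wire_to_column` are hypothetical names, and the network itself is not modelled; the sketch only compares the byte counts that would cross it.

```python
import bz2


def column_to_wire(value: str) -> bytes:
    """Server side: bzip2 the column contents before sending them over TCP."""
    return bz2.compress(value.encode("utf-8"))


def wire_to_column(payload: bytes) -> str:
    """Client side: undo column_to_wire."""
    return bz2.decompress(payload).decode("utf-8")


if __name__ == "__main__":
    # Repetitive text compresses well; short values may grow because of
    # bzip2's fixed framing overhead, so a real server would likely only
    # zip values above some size.
    value = "a line of prose stored in a text column. " * 50
    payload = column_to_wire(value)
    print(len(value.encode("utf-8")), "bytes raw ->", len(payload), "bytes on the wire")
    assert wire_to_column(payload) == value
```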

- Jak