Predicting the lz4 compression ratio?


stei...@scionics.de

Oct 18, 2016, 3:02:50 AM
to LZ4c
Hi there,

I have a simple and maybe ignorant/naive question (my apologies for that): given a block of data (say a char buffer/array of size N characters), is there a way to predict or estimate the compression ratio that lz4/zstd will give me (assuming some quality level)? 

I already looked at MSE and Shannon entropy, but both have their flaws: they are either not applicable (the MSE of a sorted and an unsorted array may be equal, but a sorted array can be compressed much better than an unsorted one) and/or too slow (Shannon entropy based on byte values ignores the fact that lz4 is dictionary-based, AFAIK).

With lz4, I'd assume that I could try to generate the dictionary of the buffer/array and use it to predict the compression ratio. The question is how. I saw that the lz4 API provides functions to save the dictionary to disk, but how do I store the dictionary in a new char buffer?

What I am eventually after is to compare the compressibility of two arrays of equal or different size at runtime, and then compress only the one that promises the higher compression ratio with a high-compression setting, while running lz4_fast on the other one.

I'd be grateful for your comments,
Peter


Cyan

Oct 18, 2016, 11:32:29 AM
to LZ4c
I'm afraid there is no simple solution.
You could try to compress a smaller section of the data and derive an estimate from that,
but even that will be wrong if the compressibility of the data varies from section to section...
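
A minimal sketch of the sampling idea, under loud assumptions: instead of calling liblz4, it uses a hypothetical, cheap stand-in (a greedy 4-byte hash match finder with a crude LZ4-like cost model), so its numbers are NOT lz4's actual output sizes. It samples several evenly spaced chunks rather than one, which softens (but does not remove) the section-to-section variation mentioned above. All names (`estimate_chunk`, `estimate_ratio`, `pick_more_compressible`) and the 4096-byte x 8 sampling parameters are illustrative choices. With liblz4 available, `estimate_chunk` could be replaced by running `LZ4_compress_default` on each sample into a buffer of `LZ4_compressBound(sample_size)` bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_BITS 12      /* 4096-entry table, kept small on purpose */
#define MIN_MATCH 4       /* LZ4 also uses a 4-byte minimum match */
#define MAX_OFFSET 65535  /* LZ4 match offsets are 16-bit */

/* Unaligned-safe 4-byte load. */
static uint32_t read32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Multiplicative hash of a 4-byte sequence down to HASH_BITS bits. */
static uint32_t hash4(uint32_t v)
{
    return (v * 2654435761u) >> (32 - HASH_BITS);
}

/* Hypothetical stand-in: greedy one-pass parse of one chunk, returning an
 * estimated compressed size under a crude cost model (each match costs
 * 3 bytes = 1 token + 2-byte offset, each literal costs 1 byte). Extra length
 * bytes, end-of-block rules and frame headers are ignored. */
static size_t estimate_chunk(const unsigned char *p, size_t n)
{
    size_t table[1u << HASH_BITS];
    size_t cost = 0, i = 0;

    for (size_t k = 0; k < (1u << HASH_BITS); k++)
        table[k] = SIZE_MAX;             /* SIZE_MAX marks an empty slot */

    while (i + MIN_MATCH <= n) {
        uint32_t v = read32(p + i);
        uint32_t h = hash4(v);
        size_t cand = table[h];

        table[h] = i;
        if (cand != SIZE_MAX && i - cand <= MAX_OFFSET && read32(p + cand) == v) {
            size_t len = MIN_MATCH;
            while (i + len < n && p[cand + len] == p[i + len])
                len++;                   /* extend the match forward */
            cost += 3;
            i += len;
        } else {
            cost += 1;                   /* emit one literal */
            i++;
        }
    }
    return cost + (n - i);               /* trailing bytes stay literals */
}

/* Estimated ratio (sampled bytes / estimated output bytes) from n_samples
 * chunks of sample_size bytes, spread evenly from start to end of buf. */
double estimate_ratio(const unsigned char *buf, size_t n,
                      size_t sample_size, size_t n_samples)
{
    size_t in_bytes = 0, out_bytes = 0;

    if (n == 0 || sample_size == 0 || n_samples == 0)
        return 1.0;
    if (sample_size > n)
        sample_size = n;
    for (size_t s = 0; s < n_samples; s++) {
        size_t start = (n_samples == 1)
                     ? 0 : s * (n - sample_size) / (n_samples - 1);
        in_bytes += sample_size;
        out_bytes += estimate_chunk(buf + start, sample_size);
    }
    return (double)in_bytes / (double)out_bytes;
}

/* Returns 0 if buffer a looks more compressible than buffer b, else 1. */
int pick_more_compressible(const unsigned char *a, size_t na,
                           const unsigned char *b, size_t nb)
{
    return estimate_ratio(a, na, 4096, 8) >= estimate_ratio(b, nb, 4096, 8) ? 0 : 1;
}
```

`pick_more_compressible` mirrors the goal from the first post: the buffer it selects would get the slower, stronger setting and the other one lz4_fast. The estimate stays a heuristic; data whose compressibility changes between the sampled chunks can still fool it.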

Peter Steinbach

Oct 21, 2016, 6:49:30 PM
to lz...@googlegroups.com
For that very reason, I was wondering if I could generate a dictionary
for a given buffer and then compute some kind of Shannon entropy
based on the dictionary (rather than on the array element values). This would
be directly correlated with the resulting compression ratio, AFAIK.

Looking at the code, I see that saving the dictionary is possible:
https://github.com/lz4/lz4/blob/master/lib/lz4hc.c#L652
Just wondering: would that code be thread-safe?
