Hi there,
I have a simple and maybe ignorant/naive question (my apologies for that): given a block of data (say a char buffer/array of N bytes), is there a way to predict or estimate the compression ratio that lz4/zstd will achieve (assuming some compression level)?
I already looked at MSE and Shannon entropy, but both have their flaws: they are either not applicable (the MSE of a sorted and an unsorted array may be equal, yet the sorted array compresses much better) and/or too slow (byte-value-based Shannon entropy ignores the fact that lz4 is dictionary based, AFAIK). With lz4 I'd assume I could generate a dictionary from the buffer/array and use it to predict the compression ratio. The question is how? I saw that the lz4 API provides functions to save a dictionary to disk, but how do I store the dictionary in a new char buffer instead?
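For concreteness, this is roughly what I mean by byte-value entropy (a minimal Python sketch of my own, not from any library): because it only looks at the byte histogram, a regular pattern and a shuffled copy of the same bytes get identical scores, even though lz4 would compress them very differently.

```python
import math
import random
from collections import Counter

def byte_entropy(buf: bytes) -> float:
    """Shannon entropy in bits per byte, computed from byte-value
    frequencies alone -- the ordering of the bytes is ignored entirely."""
    if not buf:
        return 0.0
    n = len(buf)
    return -sum((c / n) * math.log2(c / n) for c in Counter(buf).values())

pattern = bytes(range(256)) * 4          # 0,1,...,255 repeated: very lz4-friendly
shuffled = bytearray(pattern)
random.seed(0)
random.shuffle(shuffled)
shuffled_buf = bytes(shuffled)

# Same byte histogram, therefore identical entropy (8.0 bits/byte for both),
# despite the huge difference in how well lz4 would compress each one.
print(byte_entropy(pattern))       # 8.0
print(byte_entropy(shuffled_buf))  # 8.0
```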
What I am eventually after is to compare the compressibility of two arrays of equal/different size at runtime, and then compress only the one that yields the higher compression ratio while running lz4_fast on the other.
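To illustrate the kind of runtime check I have in mind, here is a sketch of the simplest fallback I can think of: compress only a small prefix sample of each buffer and compare the resulting ratios. zlib stands in for lz4 here purely so the sketch is stdlib-only; with the lz4 C API the sample compression would be a call to LZ4_compress_default.

```python
import random
import zlib

def quick_ratio(buf: bytes, sample_size: int = 4096) -> float:
    """Cheap compressibility estimate: compress only a prefix sample
    and return original_size / compressed_size. zlib at its fastest
    level is a stand-in for lz4 (stdlib only)."""
    sample = buf[:sample_size]
    if not sample:
        return 1.0
    return len(sample) / len(zlib.compress(sample, 1))

# Two hypothetical buffers: one repetitive, one pseudo-random.
a = bytes(range(64)) * 1024                              # regular pattern
random.seed(0)
b = bytes(random.getrandbits(8) for _ in range(64 * 1024))  # noise

# Fully compress whichever sample predicts the better ratio; run the
# fast path (e.g. LZ4_compress_fast) on the other one.
ra, rb = quick_ratio(a), quick_ratio(b)
winner = "a" if ra > rb else "b"
print(f"ratio(a)={ra:.2f}  ratio(b)={rb:.2f}  compress fully: {winner}")
```

The obvious caveat is that a prefix sample can be unrepresentative of the rest of the buffer, which is exactly why I'm asking whether there is a principled estimator instead.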
I'd be grateful for your comments,
Peter