lz4 compression ratio and decompression performance


Robert Schneiders

Aug 25, 2014, 3:59:06 AM
to lz...@googlegroups.com
Hi,

I am evaluating lz4 compression for simulation results (float arrays,
triangulations). Compared to compression with zlib, I found that
decompression is about 5 times faster. However, the compressed data
is 60% (LZ4_compress) to 100% (LZ4_compressHC) larger.

Our data is partitioned into 64 blocks, which are compressed separately.
Would a larger block size improve the compression ratio? Is there an
optimal block size?

I will also try the blosc compression scheme.

Does anybody have recommendations on how to improve the compression ratio?

Best regards,

  Robert

Yann Collet

Aug 25, 2014, 4:09:22 AM
to lz...@googlegroups.com
> Our data is partitioned into 64 blocks, which are compressed separately.

64-byte blocks? That would be much too small.


> Would a larger block size improve the compression ratio?

Probably.
When compressing data as independent blocks, compression ratio keeps improving up to block sizes of about 1 MB.
Beyond that, the benefits become less and less visible.


> Is there an optimal block size?

For independent blocks, 64 KB is considered the optimal "small" size.
Between 4 KB and 64 KB is "very small", but still manageable.
Anything below 4 KB starts to miss a lot of compression opportunities; the ratio will plummet.
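The effect of independent block size on ratio is easy to see with a quick sketch. The demo below uses zlib from the Python standard library as a stand-in codec (the payload, block sizes, and `compressed_size` helper are illustrative assumptions, not taken from this thread); the trend applies to LZ4 as well:

```python
# Sketch: compressing the same payload as independent blocks of
# different sizes. Tiny blocks lose cross-block matches and pay
# per-block overhead, so the total compressed size grows.
import zlib

# Repetitive "simulation-like" payload: 1 MB of a repeating record.
data = (b"x=1.2345 y=2.3456 z=3.4567\n" * 40000)[:1 << 20]

def compressed_size(payload: bytes, block_size: int) -> int:
    """Compress payload as independent blocks; return the total size."""
    total = 0
    for off in range(0, len(payload), block_size):
        total += len(zlib.compress(payload[off:off + block_size]))
    return total

for bs in (64, 4 << 10, 64 << 10, 1 << 20):
    print(f"{bs:>8} B blocks -> {compressed_size(data, bs)} bytes total")
```

On this payload the 64-byte-block total comes out many times larger than the 64 KB-block total, while the step from 64 KB up to 1 MB typically changes comparatively little.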


> Does anybody have recommendations on how to improve the compression ration?

Any time your input data consists of tables with fixed-length cells,
blosc should be tested: it offers big opportunities for compression savings.


Regards

Francesc Alted

Aug 25, 2014, 4:54:06 AM
to lz...@googlegroups.com
On 25/08/14 at 10:09, Yann Collet wrote:

> > Our data is partitioned into 64 blocks, which are compressed separately.
>
> 64-byte blocks? That would be much too small.
>
> > Would a larger block size improve the compression ratio?
>
> Probably.
> When compressing data as independent blocks, compression ratio keeps improving up to block sizes of about 1 MB.
> Beyond that, the benefits become less and less visible.
>
> > Is there an optimal block size?
>
> For independent blocks, 64 KB is considered the optimal "small" size.
> Between 4 KB and 64 KB is "very small", but still manageable.
> Anything below 4 KB starts to miss a lot of compression opportunities; the ratio will plummet.

Yes, my experience completely confirms these figures.




> > Does anybody have recommendations on how to improve the compression ratio?
>
> Any time your input data consists of tables with fixed-length cells,
> blosc should be tested: it offers big opportunities for compression savings.

Well, Blosc can be applied not only to tables (series of heterogeneous types) but also to regular arrays (homogeneous type), and in fact the latter is the better scenario for Blosc (or, alternatively, using column-oriented storage for tables, which works well too).
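Blosc's main trick on homogeneous arrays is byte-shuffling: grouping the k-th byte of every element together before compressing, so nearly identical high bytes of neighboring values form long runs. A minimal stdlib-only sketch of the idea (zlib stands in for the codec; the ramp data and `shuffle` helper are illustrative assumptions, not Blosc's actual API):

```python
# Sketch of Blosc-style byte-shuffling on a homogeneous double array.
import struct
import zlib

# A slowly varying array of doubles, like a field from a simulation.
raw = b"".join(struct.pack("<d", i * 0.001) for i in range(10000))

def shuffle(buf: bytes, itemsize: int) -> bytes:
    """Group byte 0 of every item, then byte 1, etc. (byte transpose)."""
    return b"".join(buf[k::itemsize] for k in range(itemsize))

plain = len(zlib.compress(raw))
shuffled = len(zlib.compress(shuffle(raw, 8)))
print(f"plain: {plain} bytes, shuffled: {shuffled} bytes")
```

Because the exponent and high mantissa bytes barely change between adjacent elements, the shuffled layout compresses noticeably better than the interleaved original on data like this.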

Regarding optimal block sizes, Blosc chooses them automatically when you pass in blocks larger than 128 KB: they are split down into block sizes ranging from 16 KB (low compression levels) up to 256 KB (compression level 9).  For the lz4hc and zlib compressors, these figures are additionally multiplied by 8, because those codecs usually benefit from larger blocks.  However, the user can always enforce their own block size with the `blosc_set_blocksize()` call:

https://github.com/Blosc/c-blosc/blob/master/blosc/blosc.h#L322

Also, although multithreading typically improves speed, it can also hurt compression ratios, because blocks have to be split further so that the different threads can work independently.

-- Francesc Alted

Robert Schneiders

Aug 25, 2014, 6:10:33 AM
to lz...@googlegroups.com

We use 64 KB blocks; sorry for the error in my posting. I will change that
to 1 MB and post the results.