Hi Yann,
I have been trying r129 (aka 1.7.0) and I can confirm that the new codebase has sped up my benchmarks by as much as 15%. I am using GCC 4.9.1 here, so I suppose this is due to the vector optimizations recently introduced, which really make a difference.
Now, I have been experimenting with the new LZ4_compress_fast() function, and I must say that it can accelerate things by another 20% with respect to the new baseline (at the expense of lower compression ratios, as expected).
However, I have seen that the compression ratio varies quite significantly depending on the acceleration parameter used. For example, my Blosc compressor uses compression levels from 1 (minimum compression) to 9 (maximum compression), and using the formula 'accel = (int)((1. / clevel) * 100) - 10' to compute the acceleration, I am getting this:
$ bench/bench lz4 single 4
Blosc version: 1.6.2.dev ($Date:: 2015-05-06 #$)
List of supported compressors in this build: blosclz,lz4,lz4hc,snappy,zlib
Supported compression libraries:
BloscLZ: 1.0.4
LZ4: 1.7.0
Snappy: 1.1.1
Zlib: 1.2.8
Using compressor: lz4
Running suite: single
--> 4, 2097152, 8, 19, lz4
********************** Run info ******************************
Blosc version: 1.6.2.dev ($Date:: 2015-05-06 #$)
Using synthetic data with 19 significant bits (out of 32)
Dataset size: 2097152 bytes Type size: 8 bytes
Working set: 256.0 MB Number of threads: 4
********************** Running benchmarks *********************
memcpy(write): 518.9 us, 3854.1 MB/s
memcpy(read): 248.2 us, 8058.7 MB/s
Compression level: 0
comp(write): 281.3 us, 7108.8 MB/s Final bytes: 2097168 Ratio: 1.00
decomp(read): 212.3 us, 9419.8 MB/s OK
Compression level: 1
comp(write): 418.8 us, 4775.4 MB/s Final bytes: 855696 Ratio: 2.45
decomp(read): 307.3 us, 6509.3 MB/s OK
Compression level: 2
comp(write): 362.5 us, 5516.5 MB/s Final bytes: 623504 Ratio: 3.36
decomp(read): 254.2 us, 7868.4 MB/s OK
Compression level: 3
comp(write): 372.4 us, 5370.9 MB/s Final bytes: 691856 Ratio: 3.03
decomp(read): 243.1 us, 8226.7 MB/s OK
Compression level: 4
comp(write): 332.5 us, 6014.6 MB/s Final bytes: 489528 Ratio: 4.28
decomp(read): 249.8 us, 8006.7 MB/s OK
Compression level: 5
comp(write): 385.3 us, 5191.1 MB/s Final bytes: 433104 Ratio: 4.84
decomp(read): 273.4 us, 7314.5 MB/s OK
Compression level: 6
comp(write): 477.9 us, 4184.9 MB/s Final bytes: 248764 Ratio: 8.43
decomp(read): 348.7 us, 5735.6 MB/s OK
Compression level: 7
comp(write): 563.1 us, 3551.7 MB/s Final bytes: 182880 Ratio: 11.47
decomp(read): 467.5 us, 4278.2 MB/s OK
Compression level: 8
comp(write): 615.0 us, 3252.1 MB/s Final bytes: 220464 Ratio: 9.51
decomp(read): 537.5 us, 3721.0 MB/s OK
Compression level: 9
comp(write): 603.1 us, 3316.3 MB/s Final bytes: 132154 Ratio: 15.87
decomp(read): 646.4 us, 3093.9 MB/s OK
Round-trip compr/decompr on 7.5 GB
Elapsed time: 3.4 s, 5010.3 MB/s
While using LZ4_compress_default() I am getting a much smoother increase in the compression ratio (due to Blosc assigning larger blocks to higher compression levels). Here is an example:
$ bench/bench lz4 single 4 2097152 8 19
Blosc version: 1.6.2.dev ($Date:: 2015-05-06 #$)
List of supported compressors in this build: blosclz,lz4,lz4hc,snappy,zlib
Supported compression libraries:
BloscLZ: 1.0.4
LZ4: 1.7.0
Snappy: 1.1.1
Zlib: 1.2.8
Using compressor: lz4
Running suite: single
--> 4, 2097152, 8, 19, lz4
********************** Run info ******************************
Blosc version: 1.6.2.dev ($Date:: 2015-05-06 #$)
Using synthetic data with 19 significant bits (out of 32)
Dataset size: 2097152 bytes Type size: 8 bytes
Working set: 256.0 MB Number of threads: 4
********************** Running benchmarks *********************
memcpy(write): 529.9 us, 3774.5 MB/s
memcpy(read): 244.8 us, 8171.4 MB/s
Compression level: 0
comp(write): 286.1 us, 6989.8 MB/s Final bytes: 2097168 Ratio: 1.00
decomp(read): 213.7 us, 9358.4 MB/s OK
Compression level: 1
comp(write): 661.8 us, 3021.9 MB/s Final bytes: 417200 Ratio: 5.03
decomp(read): 334.0 us, 5988.7 MB/s OK
Compression level: 2
comp(write): 592.7 us, 3374.6 MB/s Final bytes: 417200 Ratio: 5.03
decomp(read): 309.4 us, 6463.4 MB/s OK
Compression level: 3
comp(write): 589.1 us, 3395.0 MB/s Final bytes: 417200 Ratio: 5.03
decomp(read): 305.7 us, 6542.9 MB/s OK
Compression level: 4
comp(write): 544.4 us, 3674.1 MB/s Final bytes: 307168 Ratio: 6.83
decomp(read): 375.6 us, 5324.8 MB/s OK
Compression level: 5
comp(write): 539.7 us, 3705.6 MB/s Final bytes: 307168 Ratio: 6.83
decomp(read): 377.5 us, 5298.3 MB/s OK
Compression level: 6
comp(write): 535.1 us, 3737.8 MB/s Final bytes: 251108 Ratio: 8.35
decomp(read): 388.8 us, 5144.7 MB/s OK
Compression level: 7
comp(write): 615.8 us, 3247.9 MB/s Final bytes: 217632 Ratio: 9.64
decomp(read): 517.9 us, 3861.8 MB/s OK
Compression level: 8
comp(write): 597.8 us, 3345.4 MB/s Final bytes: 217632 Ratio: 9.64
decomp(read): 502.2 us, 3982.4 MB/s OK
Compression level: 9
comp(write): 602.2 us, 3321.3 MB/s Final bytes: 132154 Ratio: 15.87
decomp(read): 684.9 us, 2920.0 MB/s OK
Round-trip compr/decompr on 7.5 GB
Elapsed time: 4.0 s, 4224.9 MB/s
I suppose the variation in compression ratio with the new LZ4_compress_fast() is probably due to the new 'sampling' method you introduced, and that it is quite difficult to tune it to produce smoother compression ratio variations, but I wanted to confirm.
At any rate, these are very nice improvements in r129; I am very excited about them and plan to incorporate them into Blosc very soon.
Thanks!
--
Francesc Alted