blosc vs snappy

Valentin Haenel

Jul 9, 2012, 2:18:27 PM
to Blosc
Hi,

I stumbled upon snappy [1][2] today and decided to do some
back-of-the-envelope benchmarks:

>>> import numpy, snappy, blosc
>>> s = numpy.linspace(0,1,1e7).tostring()
10 loops, best of 3: 199 ms per loop
>>> %timeit br = blosc.compress(s, 8)
10 loops, best of 3: 32.8 ms per loop
>>> sr = snappy.compress(s)
>>> len(sr)/float(len(s))
0.9309533375
>>> br = blosc.compress(s, 8)
>>> len(br)/float(len(s))
0.0885326125

What do you think? Seems too good to be true?

V-

BTW: the python-snappy bindings have an interesting attempt to avoid copying the
result into a string.

[1]: http://code.google.com/p/snappy/
[2]: https://github.com/andrix/python-snappy

Valentin Haenel

Jul 9, 2012, 2:53:08 PM
to Blosc
* Valentin Haenel <valenti...@gmx.de> [2012-07-09]:
> Hi,
>
> I stumbled upon snappy [1][2] today and decided to do some
> back-of-the-envelope benchmarks:
>
> >>> import numpy, snappy, blosc
> >>> s = numpy.linspace(0,1,1e7).tostring()

The following line was missing from the transcript; it belongs right
before the 199 ms timing:
>>> %timeit sr = snappy.compress(s)
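
For anyone who wants to repeat the comparison outside IPython, here is a
minimal sketch of the same measurement as a plain script. It is not the
code that produced the numbers above. The helper names (linspace_bytes,
available_codecs, benchmark) are made up for illustration, the array is
built with struct instead of NumPy so that only the standard library is
required, and zlib is added purely as a baseline that is always present.
snappy and blosc are only included if they happen to be installed.

```python
import struct
import timeit
import zlib


def linspace_bytes(n):
    """Pack n evenly spaced float64 values in [0, 1] as little-endian bytes (n >= 2)."""
    step = 1.0 / (n - 1)
    return struct.pack("<%dd" % n, *(i * step for i in range(n)))


def available_codecs():
    """Return {name: compress callable}; third-party codecs only if importable."""
    codecs = {"zlib": zlib.compress}  # stdlib baseline, not part of the thread
    try:
        import snappy
        codecs["snappy"] = snappy.compress
    except (ImportError, AttributeError):
        pass
    try:
        import blosc
        # typesize=8 matches the 8-byte doubles, as in blosc.compress(s, 8)
        codecs["blosc"] = lambda data: blosc.compress(data, typesize=8)
    except ImportError:
        pass
    return codecs


def benchmark(data, repeat=3):
    """Return {name: (best seconds for one call, compressed/original size)}."""
    results = {}
    for name, compress in available_codecs().items():
        seconds = min(timeit.repeat(lambda: compress(data), number=1, repeat=repeat))
        ratio = len(compress(data)) / len(data)
        results[name] = (seconds, ratio)
    return results


if __name__ == "__main__":
    # Smaller than the 1e7 elements used above, to keep the pure-Python setup fast.
    payload = linspace_bytes(100_000)
    for name, (seconds, ratio) in sorted(benchmark(payload).items()):
        print("%-6s %8.2f ms  ratio %.4f" % (name, seconds * 1e3, ratio))
```

Timings from a script like this depend on the machine and on the library
versions, so they should not be expected to match the figures quoted in
this thread.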

Francesc Alted

Jul 10, 2012, 4:39:28 AM
to bl...@googlegroups.com
On 7/9/12 8:18 PM, Valentin Haenel wrote:
> Hi,
>
> I stumbled upon snappy [1][2] today and decided to do some
> back-of-the-envelope benchmarks:
>
> >>> import numpy, snappy, blosc
> >>> s = numpy.linspace(0,1,1e7).tostring()
> 10 loops, best of 3: 199 ms per loop
> >>> %timeit br = blosc.compress(s, 8)
> 10 loops, best of 3: 32.8 ms per loop
> >>> sr = snappy.compress(s)
> >>> len(sr)/float(len(s))
> 0.9309533375
> >>> br = blosc.compress(s, 8)
> >>> len(br)/float(len(s))
> 0.0885326125
>
> What do you think? Seems too good to be true?

Well, the result for this specific example is real. But that is expected,
as Blosc is geared toward compressing binary data, and the linspace
distribution is a particularly good case for it. Compressing other data
patterns (say, large strings) will lead to different results.
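
To make the point about binary data concrete: the second argument in
blosc.compress(s, 8) is the item size, and Blosc uses it to shuffle the
bytes of the items before compressing, so that bytes of the same
significance end up next to each other. The sketch below imitates that
idea in pure Python. It is only an illustration under loud assumptions:
shuffle, unshuffle and ratios are hypothetical helpers rather than the
Blosc implementation, and zlib stands in for Blosc's internal codec.

```python
import struct
import zlib


def shuffle(data, typesize):
    """Byte transpose: first byte of every item, then every second byte, and so on."""
    return b"".join(data[i::typesize] for i in range(typesize))


def unshuffle(data, typesize):
    """Inverse of shuffle(); len(data) must be a multiple of typesize."""
    n = len(data) // typesize
    out = bytearray(len(data))
    for i in range(typesize):
        out[i::typesize] = data[i * n:(i + 1) * n]
    return bytes(out)


def ratios(n):
    """Return (plain, shuffled) zlib compression ratios for n linspace doubles."""
    step = 1.0 / (n - 1)
    raw = struct.pack("<%dd" % n, *(i * step for i in range(n)))
    shuffled = shuffle(raw, 8)
    assert unshuffle(shuffled, 8) == raw  # the transform is lossless
    plain_ratio = len(zlib.compress(raw)) / len(raw)
    shuffled_ratio = len(zlib.compress(shuffled)) / len(raw)
    return plain_ratio, shuffled_ratio


if __name__ == "__main__":
    plain, shuffled = ratios(100_000)
    print("zlib on raw doubles:      %.4f" % plain)
    print("zlib on shuffled doubles: %.4f" % shuffled)
```

For slowly varying doubles the sign, exponent and top mantissa bytes
barely change from one item to the next, so once they are grouped
together they compress very well. Text has no such fixed-width layout,
which is why strings are a less favourable case.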

>
> V-
>
> BTW: the python-snappy bindings have an interesting attempt to avoid copying the
> result into a string.
>
> [1]: http://code.google.com/p/snappy/
> [2]: https://github.com/andrix/python-snappy

Okay, that could be a nice source of ideas ;)

--
Francesc Alted
