xxhash compared to cpu crc32c


Roger Pack

Dec 2, 2013, 3:21:53 PM
to lz...@googlegroups.com
Hello. Just for the sake of completeness, it might be nice to list on the xxHash comparison page the speed of a "raw line" and/or a comparison with CPU crc32, just so that people understand better.
Thank you.
-roger-

Roger Pack

Dec 2, 2013, 3:36:13 PM
to lz...@googlegroups.com
Also, am I correct in presuming that the comparison with MurmurHash3 had it generating a 32-bit value? (It would also be nice/interesting for xxHash to be able to optionally generate a value larger than 32 bits... some people for some reason prefer that, and it causes them to use MurmurHash instead...)

Yann Collet

Dec 2, 2013, 3:44:09 PM
to lz...@googlegroups.com
Hi Roger

Actually, there is such a line:
CRC32           0.43 GB/s     9

It uses the CRC32 implementation provided within SMHasher package.

Is there anything else you were referring to?

Yann Collet

Dec 2, 2013, 3:47:17 PM
to lz...@googlegroups.com
Yes, you are correct. Only 32-bit values were compared.

In the past, I worked on a 64-bit version.
I had 2 (incompatible) versions: one using 32-bit arithmetic, the other using 64-bit arithmetic.
They targeted different CPUs.

In the end, I simply did not find enough time to complete them.

A concrete use case would help, in order to focus on a tangible objective.

Roger Pack

Dec 2, 2013, 3:47:45 PM
to lz...@googlegroups.com
On Mon, Dec 2, 2013 at 1:44 PM, Yann Collet <yann.co...@gmail.com> wrote:
> It uses the CRC32 implementation provided within SMHasher package.

Was it this one?
https://code.google.com/p/smhasher/source/browse/trunk/crc.cpp
Anyway, I was referring to just using the hardware crc32c CPU instruction, I think the one mentioned here:
http://en.wikipedia.org/wiki/SSE4#SSE4.2
on a newer CPU that supports it.
If that makes sense (if that's what you used, then great).
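[Editor's note: for context, the crc32c computed by the SSE4.2 instruction is the Castagnoli variant (reflected polynomial 0x82F63B78), a different polynomial from zlib's crc32. A minimal bit-at-a-time software sketch, for clarity rather than speed (the function name is mine):]

```python
def crc32c(data: bytes, crc: int = 0) -> int:
    """Bit-at-a-time CRC32C (Castagnoli), the same checksum the
    SSE4.2 crc32 instruction computes in hardware."""
    crc ^= 0xFFFFFFFF                      # standard initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Reflected Castagnoli polynomial 0x82F63B78
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF                # standard final XOR

# Standard check value for CRC-32C:
assert crc32c(b"123456789") == 0xE3069283
```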

Roger Pack

Dec 2, 2013, 3:49:35 PM
to lz...@googlegroups.com
On Mon, Dec 2, 2013 at 1:47 PM, Yann Collet <yann.co...@gmail.com> wrote:
> A concrete use case would help, in order to focus on some tangible
> objective.

Yeah, I don't have one; it's more just theoretical. Does anybody else out there have one?

Yann Collet

Dec 2, 2013, 4:18:36 PM
to lz...@googlegroups.com
Yes, most likely.

> Anyway, I was referring to just using the hardware crc32c CPU
> instruction, I think mentioned here:

Unfortunately, this hardware instruction is not available on the Core 2 Duo, which was the platform the benchmark was run on. A new benchmark on a more recent Intel CPU would be necessary.
Also: only the very latest AMD CPUs have hardware CRC32C implemented.
Also: CRC32C "à la Intel" is not available on ARM CPUs.

All these reasons made me shy away from using hardware CRC32C as a comparison metric: my primary goal was a portable hash for error checking, and an "Intel only" instruction (and only on newer processors!) did not fit that bill back then.

Martin Traverso

Feb 6, 2014, 12:44:06 AM
to lz...@googlegroups.com

> I've been working in the past on a 64-bits version.
> [...]
> A concrete use case would help, in order to focus on some tangible objective.


I would *love* to see a 64-bit version. I wrote an implementation of the HyperLogLog algorithm for estimating cardinalities of large data sets, which requires a good hash to get reasonable answers. The largest cardinality it can estimate without introducing unacceptable error is tied to the number of bits in the hash (on the order of 2^k). For my use case, 32 bits is not enough, but 64 would do. So far, I've been using Murmur3-128 and taking the first 64 bits of the result. Not ideal.

Martin
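[Editor's note: the dependence on hash width that Martin describes can be seen in a minimal HyperLogLog sketch. All names and parameters below are my own, and a BLAKE2b digest truncated to 8 bytes stands in for a fast 64-bit hash such as XXH64. The top p bits select a register and the remaining 64-p bits feed a leading-zero count, which is why the hash width caps the estimable cardinality:]

```python
import hashlib
import math

def hash64(item: bytes) -> int:
    # Stand-in 64-bit hash; a real deployment would use a fast
    # non-cryptographic hash such as XXH64 here.
    return int.from_bytes(hashlib.blake2b(item, digest_size=8).digest(), "big")

class HyperLogLog:
    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p                    # 2^p registers
        self.registers = [0] * self.m

    def add(self, item: bytes) -> None:
        h = hash64(item)
        idx = h >> (64 - self.p)           # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = 1-based position of the leftmost 1-bit in `rest`
        # (all-zero `rest` yields the maximum rank).
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        # Small-range correction (linear counting) while registers are empty.
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)
        return raw
```

With p=12 (4096 registers) the typical relative error is about 1.04/sqrt(4096), i.e. roughly 1.6%.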

Yann Collet

Jul 20, 2014, 11:29:38 AM
to lz...@googlegroups.com
OK, it's an old thread, but for completeness:

> I would *love* to see a 64-bit version

There is now a 64-bit version:

and it indeed performs faster and better (on 64-bit hardware, that is).

Martin Traverso

Jul 21, 2014, 8:51:56 PM
to lz...@googlegroups.com

> There is now a 64-bit version:
>
> and it indeed performs faster and better (on 64-bit hardware, that is)

Great! Thanks!

BTW, I wrote a native Java implementation of the algorithm: https://github.com/airlift/slice/commit/bc86c85ad72c7b6d6eb49b944136c4bb32144b4a

Martin

--
You received this message because you are subscribed to the Google Groups "LZ4c" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lz4c+uns...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Yann Collet

Jul 22, 2014, 5:57:06 AM
to lz...@googlegroups.com
Looks great, Martin!

Do you intend to make it a stand-alone library?

Martin Traverso

Jul 22, 2014, 4:57:46 PM
to lz...@googlegroups.com

> Do you intend to make it a stand-alone library?

The code is in "slice" (https://github.com/airlift/slice), which is already a very small stand-alone library for manipulating byte buffers in Java (as a better alternative to Java's built-in ByteBuffer implementation), so it probably doesn't make sense to split it up further at this point.

We're now using this implementation in Presto (http://prestodb.io).

Martin


Yann Collet

Jul 23, 2014, 6:00:19 AM
to lz...@googlegroups.com
Great use case, thanks for the notification!

Roger Pack

Jul 31, 2014, 6:57:24 PM
to lz...@googlegroups.com


On Sunday, July 20, 2014 9:29:38 AM UTC-6, Yann Collet wrote:
> There is now a 64-bit version:
>
> and it indeed performs faster and better (on 64-bit hardware, that is)

Wow, with that speed increase, it makes one think that if you need a 32-bit hash but are on 64-bit hardware, you should just use XXH64 and truncate the result to 32 bits, since it's faster :)
I wonder if LZ4 could use it for its hashes for a speed improvement? I also wonder whether LZ4 could somehow use 64-bit operations to become speedier itself [I'm sure it already does, but throwing this out: "whoa, going to 64 bits doubled the throughput of xxHash? I wonder if it's possible to double LZ4 itself..."].
Cheers!
-roger-

Yann Collet

Aug 1, 2014, 5:30:29 AM
to lz...@googlegroups.com
> Wow, with that speed increase, it makes one think that if you need a 32-bit hash, but are on 64-bit hardware, you should just use XXH64 and truncate the result to 32-bit, since its faster :)

Indeed, if you are sure to be working in 64-bit mode, that's the way to go.
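[Editor's note: a quick sketch of what "truncate to 32 bits" means in practice. The helper names here are mine, and a truncated BLAKE2b digest stands in for the 64-bit hash, since the point is independent of which hash produces the value:]

```python
import hashlib

def hash64(data: bytes) -> int:
    # Stand-in for a fast 64-bit hash such as XXH64.
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def hash32_via_64(data: bytes) -> int:
    # Compute the 64-bit hash at full speed, then keep only the low
    # 32 bits; for a well-mixed hash, any 32-bit slice stays uniform.
    return hash64(data) & 0xFFFFFFFF

h = hash32_via_64(b"example")
assert 0 <= h < (1 << 32)
```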


> I wonder if LZ4 could use it for its hashes for a speed improvement? 

Nope. LZ4 works on a variety of platforms, and the hash acts as a "weak signature" for error detection. It needs to be recalculated on all target platforms, and a lot of them are 32-bit. Forcing 64-bit mode on all of them would make the hash difficult to calculate on the weakest ones.


> Also wonder if LZ4 could somehow use [I'm sure it already does, but throwing this out] 64-bit operations to become speedier itself [i.e. "whoa, going to 64 bits doubled the throughput of xxhash? wonder if it's possible to double lz4 itself..." but just wondering].

Indeed, LZ4 already does this.
LZ4 uses size_t, which is automatically 32 bits in 32-bit applications and 64 bits in 64-bit applications.
It does help a bit (64-bit is faster), but not to the point of reaching 2x. In contrast with xxHash, which just streams data as fast as memory can provide it, LZ4 still has some logic (branches, tests...) which costs the majority of the processing time.


Regards

Y.

Roger Pack

Aug 15, 2014, 12:14:15 PM
to lz...@googlegroups.com


On Friday, August 1, 2014 3:30:29 AM UTC-6, Yann Collet wrote:
> Indeed, if you are sure to work in 64-bits mode, it's the way to go.

OK, just for chatting purposes, I noticed on this page

"CityHash64 v1.0.3 7ns for 1 byte, or 6ns for 8 bytes, or 9ns for 64 bytes
Murmur2 (64-bit)  6ns for 1 byte, or 6ns for 8 bytes, or 15ns for 64 bytes"

[Somehow or other CityHash was able to process 64 bytes relatively cheaply compared to 8.] I wonder if something like that would be possible for XXH64 as well, though I know nothing about it... :]
-roger-

Yann Collet

Aug 15, 2014, 12:25:52 PM
to lz...@googlegroups.com
There is a non-linear "initialization time", even within xxHash,
partly due to the fact that the final avalanche stage is the same whether it is for 1 byte or for 1000 bytes.

Consequently, hashing 1 byte has a non-negligible fixed cost compared to 8, 16 or 64 bytes.
You would observe the same using xxHash.
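[Editor's note: the fixed-cost finalization being described is XXH32's avalanche stage; to the best of my knowledge it looks like the following, sketched in Python with explicit 32-bit masking (the constants are the published xxHash primes). It runs exactly once per hash regardless of input length, which is why 1-byte inputs pay a fixed minimum cost:]

```python
MASK32 = 0xFFFFFFFF
PRIME32_2 = 2246822519   # 0x85EBCA77
PRIME32_3 = 3266489917   # 0xC2B2AE3D

def xxh32_avalanche(h: int) -> int:
    """Final mixing stage of XXH32: xorshift/multiply rounds that
    scramble the accumulator into the returned 32-bit digest."""
    h ^= h >> 15
    h = (h * PRIME32_2) & MASK32
    h ^= h >> 13
    h = (h * PRIME32_3) & MASK32
    h ^= h >> 16
    return h
```

Each step (xorshift, multiply by an odd constant) is a bijection on 32-bit values, so the stage scrambles bits without introducing collisions of its own.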