Using a normal cassandra-stress read run, one can already see a (small)
improvement: roughly 10% higher op rate and 8% lower mean latency. With MD5:
Results:
op rate                   : 16007 [READ:16007]
latency mean              : 6.2 [READ:6.2]
latency median            : 5.3 [READ:5.3]
latency 95th percentile   : 11.6 [READ:11.6]
latency 99th percentile   : 13.7 [READ:13.7]
latency 99.9th percentile : 16.0 [READ:16.0]
latency max               : 27.5 [READ:27.5]
Total partitions          : 10000000 [READ:10000000]
Total operation time      : 00:10:24
With xxHash:
Results:
op rate                   : 17643 [READ:17643]
latency mean              : 5.7 [READ:5.7]
latency median            : 5.0 [READ:5.0]
latency 95th percentile   : 10.0 [READ:10.0]
latency 99th percentile   : 12.0 [READ:12.0]
latency 99.9th percentile : 14.1 [READ:14.1]
latency max               : 26.5 [READ:26.5]
Total partitions          : 10000000 [READ:10000000]
Total operation time      : 00:09:26
Fixes #2884
Also in:
  g...@github.com:duarten/scylla.git xxhash/v1
  https://github.com/duarten/scylla/tree/xxhash/v1
Sad note: I wish I knew how to configure cassandra-stress to use larger
values.
--
2.15.0
--
You received this message because you are subscribed to the Google Groups "ScyllaDB development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scylladb-dev+unsubscribe@googlegroups.com.
To post to this group, send email to scylla...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/scylladb-dev/20171130230038.31318-1-duarte%40scylladb.com.
For more options, visit https://groups.google.com/d/optout.
Any particular reason why xxHash64 was chosen and not something else (CityHash looks quite good as well, FarmHash even better if SSE4.2 is available which we do require anyway)?
Why xxhash and not others?
[1] says "xxHash64 wins at larger (0.5KB+) data sizes, followed closely by 64 bit FarmHash and CityHash"
[2] says "So the fastest hash functions on x86_64 without quality problems are:
falkhash (macho64 and elf64 nasm only, with HW AES extension)
t1ha + mum (machine specific, mum: different arch results)
FarmHash (not portable, too machine specific: 64 vs 32bit, old gcc, ...)
Metro (but not 64crc yet, WIP)
Spooky32
xxHash64
fasthash
City (deprecated)"
[1] says "xxHash64 wins at larger (0.5KB+) data sizes, followed closely by 64 bit FarmHash and CityHash"
This is exactly what doesn't matter for us. For large blobs we will use a precomputed hash, so while we need good speed for those as well, it is better to have the fastest algorithm for small values and a "just" fast one for the big blobs. Another option would be to use different algorithms depending on the value size, but that would require exposing the threshold in the cluster; probably not a big problem, though.
[2] says "So the fastest hash functions on x86_64 without quality problems are:
falkhash (macho64 and elf64 nasm only, with HW AES extension)
t1ha + mum (machine specific, mum: different arch results)
FarmHash (not portable, too machine specific: 64 vs 32bit, old gcc, ...)
Metro (but not 64crc yet, WIP)
Spooky32
xxHash64
fasthash
City (deprecated)"
I don't have much confidence in any of these results, because the MB/s and cycles/hash values don't seem to make any sense (or a proper explanation is missing; either way, I'm reluctant to trust such data). CityHash is "deprecated" only in the sense that FarmHash is its successor. FarmHash seems to work on other platforms as well, though it is not as fast there (no idea how big the difference is). Metro looks quite promising as well. Anyway, these links are probably where we should start our research, not end it.
The latency hits this aims to solve are due to big values. For small values, latency will be dominated by something else. I don't think it's a good idea to add cluster features to define hash thresholds.
MetroHash seems stuck in 2015. When I googled, xxHash seemed to have good adoption (it is supported in projects like OpenHFT). There's SeaHash too, which gets bonus points for being in Rust.
Anyway, I don't really have the cycles to embark on a hash function research project.
I'd expect large blobs to be I/O and network bound, but perhaps there's a sour spot between small blobs and large blobs where the hash dominates.
If the big values are the main motivation of this patchset, I'm surprised it doesn't include caching the hash. We don't need to agree on the thresholds at run time; we just cannot easily change them (the same way we cannot easily change the hash algorithm).
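One way to cache the hash is to compute a value's digest lazily and keep it next to the value. The C++ sketch below is only an illustration under assumptions. `cached_hash_blob` and its members are hypothetical names, not part of this patchset. The hash function is a template parameter, so any 64-bit hasher (for example xxHash64) could be plugged in.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical wrapper (not from the patchset): a value that remembers its
// 64-bit digest after the first computation, so repeated digest requests
// for the same unchanged value do not rehash it.
template <typename HashFn>
class cached_hash_blob {
    std::string _value;
    HashFn _hash_fn;                        // any callable: uint64_t(std::string_view)
    mutable std::optional<uint64_t> _hash;  // empty until first requested
    mutable unsigned _computations = 0;     // counts real hash computations (illustration only)
public:
    cached_hash_blob(std::string value, HashFn fn)
        : _value(std::move(value)), _hash_fn(std::move(fn)) {}

    const std::string& value() const { return _value; }

    uint64_t hash() const {
        if (!_hash) {
            _hash = _hash_fn(std::string_view(_value));
            ++_computations;
        }
        return *_hash;
    }

    // Every mutation must drop the cached digest, or stale hashes leak out.
    void assign(std::string value) {
        _value = std::move(value);
        _hash.reset();
    }

    unsigned computations() const { return _computations; }
};
```

The key design point is invalidation. Any write to the value must reset the cached digest. Otherwise replicas could end up comparing a stale hash against a fresh one.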
What does it mean "stuck in 2015"?
Anyway, I don't really have the cycles to embark on a hash function research project.
Well, you already did.
+        return *this;
+    }
+
+    void update(const char* ptr, size_t length) {
+        XXH64_update(_state.get(), ptr, length);
+    }
+
+    uint64_t finalize() {
+        return XXH64_digest(_state.get());
+    }
+};
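The sketch below shows what the `XXH64_update()`/`XXH64_digest()` calls above compute: a compact, unoptimized one-shot version of the XXH64 algorithm, following the reference algorithm. It is an illustrative sketch, not the code from the xxHash submodule or from this patch. Real code should call the library, as `xx_hasher` does. Feeding the same bytes through the library's streaming API is expected to yield the same 64-bit value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Illustrative, unoptimized one-shot XXH64. The xxHash library computes the
// same value incrementally via XXH64_reset/XXH64_update/XXH64_digest.
namespace xxh64_sketch {

constexpr uint64_t P1 = 0x9E3779B185EBCA87ULL;
constexpr uint64_t P2 = 0xC2B2AE3D27D4EB4FULL;
constexpr uint64_t P3 = 0x165667B19E3779F9ULL;
constexpr uint64_t P4 = 0x85EBCA77C2B2AE63ULL;
constexpr uint64_t P5 = 0x27D4EB2F165667C5ULL;

inline uint64_t rotl64(uint64_t x, int r) { return (x << r) | (x >> (64 - r)); }

// Little-endian load of n bytes, independent of the host byte order.
inline uint64_t load_le(const unsigned char* p, int n) {
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; --i) v = (v << 8) | p[i];
    return v;
}

inline uint64_t xxh_round(uint64_t acc, uint64_t input) {
    acc += input * P2;
    acc = rotl64(acc, 31);
    return acc * P1;
}

inline uint64_t merge_round(uint64_t acc, uint64_t val) {
    acc ^= xxh_round(0, val);
    return acc * P1 + P4;
}

inline uint64_t xxh64(std::string_view data, uint64_t seed = 0) {
    const auto* p = reinterpret_cast<const unsigned char*>(data.data());
    const auto* const end = p + data.size();
    uint64_t h;
    if (data.size() >= 32) {
        // Four independent accumulators consume 32-byte stripes.
        uint64_t v1 = seed + P1 + P2, v2 = seed + P2, v3 = seed, v4 = seed - P1;
        do {
            v1 = xxh_round(v1, load_le(p, 8));
            v2 = xxh_round(v2, load_le(p + 8, 8));
            v3 = xxh_round(v3, load_le(p + 16, 8));
            v4 = xxh_round(v4, load_le(p + 24, 8));
            p += 32;
        } while (end - p >= 32);
        h = rotl64(v1, 1) + rotl64(v2, 7) + rotl64(v3, 12) + rotl64(v4, 18);
        h = merge_round(h, v1);
        h = merge_round(h, v2);
        h = merge_round(h, v3);
        h = merge_round(h, v4);
    } else {
        h = seed + P5;
    }
    h += static_cast<uint64_t>(data.size());
    // Tail: remaining 8-byte words, then at most one 4-byte word, then bytes.
    for (; end - p >= 8; p += 8) {
        h ^= xxh_round(0, load_le(p, 8));
        h = rotl64(h, 27) * P1 + P4;
    }
    if (end - p >= 4) {
        h ^= load_le(p, 4) * P1;
        h = rotl64(h, 23) * P2 + P3;
        p += 4;
    }
    for (; p < end; ++p) {
        h ^= static_cast<uint64_t>(*p) * P5;
        h = rotl64(h, 11) * P1;
    }
    // Final avalanche: spreads the influence of each input bit across the output.
    h ^= h >> 33;
    h *= P2;
    h ^= h >> 29;
    h *= P3;
    h ^= h >> 32;
    return h;
}

} // namespace xxh64_sketch
```

On long inputs, the four independent accumulators over 32-byte stripes let XXH64 keep several multiplications in flight at once. Inputs shorter than 32 bytes skip that loop and go straight to the tail mixing.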
xxhash 1024
INFO  2018-01-11 13:24:25,534 [shard 0] repair - repair 1 on shard 0 stats: round_nr=2004, rpc_call_nr=2018, tx_hashes_nr=0, rx_hashes_nr=0, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 2099034960}, {127.0.0.2, 2099034960}}, row_from_disk_nr={{127.0.0.1, 474680}, {127.0.0.2, 474680}}, duration=14.08 seconds, row_from_disk_bytes_per_sec={{127.0.0.1, 142.173}, {127.0.0.2, 142.173}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 33713.1}, {127.0.0.2, 33713.1}} Rows/s
fnv1a_hasher 1024
INFO  2018-01-11 13:28:17,192 [shard 0] repair - repair 1 on shard 0 stats: round_nr=2004, rpc_call_nr=2018, tx_hashes_nr=0, rx_hashes_nr=0, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 2099034960}, {127.0.0.2, 2099034960}}, row_from_disk_nr={{127.0.0.1, 474680}, {127.0.0.2, 474680}}, duration=14.004 seconds, row_from_disk_bytes_per_sec={{127.0.0.1, 142.945}, {127.0.0.2, 142.945}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 33896}, {127.0.0.2, 33896}} Rows/s
xxhash 4096
INFO  2018-01-11 13:25:10,052 [shard 0] repair - repair 2 on shard 0 stats: round_nr=1598, rpc_call_nr=1612, tx_hashes_nr=0, rx_hashes_nr=0, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 1671000000}, {127.0.0.2, 1671000000}}, row_from_disk_nr={{127.0.0.1, 100000}, {127.0.0.2, 100000}}, duration=11.875 seconds, row_from_disk_bytes_per_sec={{127.0.0.1, 134.197}, {127.0.0.2, 134.197}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 8421.05}, {127.0.0.2, 8421.05}} Rows/s
fnv1a_hasher 4096
INFO  2018-01-11 13:28:59,011 [shard 0] repair - repair 2 on shard 0 stats: round_nr=1598, rpc_call_nr=1612, tx_hashes_nr=0, rx_hashes_nr=0, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 1671000000}, {127.0.0.2, 1671000000}}, row_from_disk_nr={{127.0.0.1, 100000}, {127.0.0.2, 100000}}, duration=11.829 seconds, row_from_disk_bytes_per_sec={{127.0.0.1, 134.719}, {127.0.0.2, 134.719}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 8453.8}, {127.0.0.2, 8453.8}} Rows/s
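The `fnv1a_hasher` runs above compare against an FNV-1a based hasher, but its implementation is not shown in this thread. The following is therefore a hypothetical sketch of a 64-bit FNV-1a hasher exposing the same `update()`/`finalize()` shape as the quoted `xx_hasher`. The class name `fnv1a_hasher_sketch` is invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 64-bit FNV-1a hasher mirroring xx_hasher's update()/finalize()
// interface; not the class that was used in the benchmark runs above.
class fnv1a_hasher_sketch {
    static constexpr uint64_t offset_basis = 0xcbf29ce484222325ULL;
    static constexpr uint64_t prime = 0x100000001b3ULL;
    uint64_t _hash = offset_basis;
public:
    void update(const char* ptr, size_t length) {
        for (size_t i = 0; i < length; ++i) {
            _hash ^= static_cast<unsigned char>(ptr[i]);  // XOR the byte first...
            _hash *= prime;                                // ...then multiply ("1a" order)
        }
    }

    uint64_t finalize() const { return _hash; }
};
```

FNV-1a does one XOR and one multiply per byte, strictly in sequence. That makes it very cheap for short inputs but serial on long ones. In the runs above, the two hashers nevertheless produced almost identical repair throughput.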
Duarte Nunes (15):
 Add xxhash (fast non-cryptographic hash) as submodule
 configure.py: Build xxhash
 CMakeLists: Add xxhash directory
 digest: Introduce xxHash hash algorithm
 digest_algorithm: Add xxHash option
 md5_hasher: Extract hash size
 query: Add class to encapsulate digest algorithm
 query-result: Introduce class digester
 query-result: Use digester instead of md5_hasher
 storage_proxy: Extract decision about digest algorithm to use
 message/messaging_service: Specify algorithm when requesting digest
 service/storage_service: Add and use xxhash feature
 schema: Remove unneeded include
 tests/mutation_test: Test xx_hasher alongside md5_hasher
 tests/mutation_test: Use xxHash instead of MD5 for some tests
 configure.py         | 48 +++++++++---
 database.hh         |  5 +-
 digest_algorithm.hh     |  5 +-
 digester.hh         | 173 +++++++++++++++++++++++++++++++++++++++++++
 idl/query.idl.hh       |  3 +-
 md5_hasher.hh        |  8 +-
 message/messaging_service.hh |  4 +-
 mutation.hh         |  4 +-
 mutation_query.hh      |  2 +-
 query-result-writer.hh    | 17 +++--
 query-result.hh       | 18 ++++-
 service/storage_proxy.hh   |  3 +-
 service/storage_service.hh  |  6 ++
 xx_hasher.hh         | 63 ++++++++++++++++
 database.cc         | 16 ++--
 message/messaging_service.cc |  6 +-
 mutation.cc         |  8 +-
 mutation_partition.cc    |  2 +-
 mutation_query.cc      |  4 +-
 query-result-set.cc     |  2 +-
 schema.cc          |  1 -
 service/storage_proxy.cc   | 56 +++++++-------
 service/storage_service.cc  |  3 +
 tests/database_test.cc    |  6 +-
 tests/memory_footprint.cc  |  2 +-
 tests/mutation_test.cc    | 44 ++++++-----
 .gitmodules         |  3 +
 CMakeLists.txt        |  1 +
 xxHash            |  1 +
 29 files changed, 410 insertions(+), 104 deletions(-)
 create mode 100644 digester.hh
 create mode 100644 xx_hasher.hh
 create mode 160000 xxHash
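The shortlog suggests that the digest algorithm is now chosen per request and gated on a cluster feature (see "storage_proxy: Extract decision about digest algorithm to use", "message/messaging_service: Specify algorithm when requesting digest" and "service/storage_service: Add and use xxhash feature"). The C++ sketch below illustrates that idea only. Every name in it (`digest_algorithm`, `node_info`, `choose_digest_algorithm`, the `"XXHASH"` feature string) is hypothetical rather than taken from the patch.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical names; they only mirror the idea described in the shortlog.
enum class digest_algorithm { md5, xxhash };

// Features a node advertises to the rest of the cluster.
struct node_info {
    std::string address;
    std::set<std::string> features;
};

// Use xxHash only if every replica that may be asked for a digest supports
// it; otherwise fall back to MD5 so digests from old and new nodes compare.
digest_algorithm choose_digest_algorithm(const std::vector<node_info>& replicas) {
    for (const auto& n : replicas) {
        if (n.features.count("XXHASH") == 0) {
            return digest_algorithm::md5;
        }
    }
    return digest_algorithm::xxhash;
}
```

Falling back to MD5 while any replica lacks the feature keeps digests comparable during a rolling upgrade. Once every node advertises support, the cheaper algorithm can be used.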
nohash 1024
INFO  2018-01-12 09:31:59,503 [shard 0] repair - repair 1 on shard 0 stats: round_nr=2004, rpc_call_nr=2018, tx_hashes_nr=0, rx_hashes_nr=0, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 2099034960}, {127.0.0.2, 2099034960}}, row_from_disk_nr={{127.0.0.1, 474680}, {127.0.0.2, 474680}}, duration=10.935 seconds, row_from_disk_bytes_per_sec={{127.0.0.1, 183.063}, {127.0.0.2, 183.063}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 43409.2}, {127.0.0.2, 43409.2}} Rows/s
nohash 4096
INFO  2018-01-12 09:33:28,596 [shard 0] repair - repair 2 on shard 0 stats: round_nr=1598, rpc_call_nr=1612, tx_hashes_nr=0, rx_hashes_nr=0, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 1671000000}, {127.0.0.2, 1671000000}}, row_from_disk_nr={{127.0.0.1, 100000}, {127.0.0.2, 100000}}, duration=11.372 seconds, row_from_disk_bytes_per_sec={{127.0.0.1, 140.133}, {127.0.0.2, 140.133}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 8793.53}, {127.0.0.2, 8793.53}} Rows/s
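Taking the `nohash` runs as a baseline, the relative cost of hashing in these repair runs can be read off the reported durations. The small C++ snippet below just redoes that arithmetic with the numbers from the logs above; it adds no new measurements. The `nohash` runs were taken a day later, so this assumes the setups are otherwise comparable.

```cpp
#include <cassert>
#include <cstdio>

// Relative slowdown of a run versus a baseline run, in percent.
inline double overhead_percent(double seconds, double baseline_seconds) {
    return (seconds / baseline_seconds - 1.0) * 100.0;
}

// Durations in seconds, copied from the repair logs above.
inline void print_repair_overheads() {
    struct run { const char* label; double seconds; double nohash_seconds; };
    const run runs[] = {
        {"xxhash 1024",       14.08,  10.935},
        {"fnv1a_hasher 1024", 14.004, 10.935},
        {"xxhash 4096",       11.875, 11.372},
        {"fnv1a_hasher 4096", 11.829, 11.372},
    };
    for (const auto& r : runs) {
        std::printf("%-18s %+5.1f%% vs nohash\n", r.label,
                    overhead_percent(r.seconds, r.nohash_seconds));
    }
}
```

With these numbers, the hashed runs take roughly 28-29% longer than `nohash` in the `1024` case and roughly 4% longer in the `4096` case. xxHash and FNV-1a stay within about one percentage point of each other.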