Hello everyone,
A report on my experiments with HPPC-RT enhanced with fastutil novelties, using my home-brewed JMH test suite:
https://github.com/vsonnier/hppc/tree/0.65.x-RT-release/hppcrt-benchmarks/src/main/java/com/carrotsearch/hppcrt/jmh
Changes:
-- Add partial unrolling in contains() / get() calls (not put(), too messy)
-- Adopt PhiMix for scrambling anything primitive, because of a clear performance gain (which includes IdentityHash and CustomHash). On the other hand, PhiMix appears bad on (not-so-good) Objects, as revealed by the hash quality = BAD (= int value, stripped of 6 bits of info) and/or the RAND_LINEAR/HIGHBITS use cases.
Execution times can plummet to almost an order of magnitude slower than normal, while with Murmurhash3 they stay put. This is clearly visible in the HPPC_OBJ vs. FASTUTIL_OBJ (fastutil 6.61) cases. Indeed, if I replace Murmurhash3 by PhiMix in the HPPC code for Objects, I observe the same huge degradation as in FASTUTIL_OBJ.
Even if the benchmark cases are exaggerated, it shows that performance has become much more dependent on good Object behaviour than before, so it is a no-go for me.
So I kept Murmurhash3 to scramble Object.hashCode().
On the other hand, IdentityHash's hashCode() is supposed to be as good as it can be, and for CustomHash the hashCode() quality rests entirely on the shoulders of the implementor, so both are candidates for a lower-quality but faster re-hashing function: PhiMix is OK there.
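For reference, here is a minimal sketch of the two int mixing functions I am talking about, using the well-known MurmurHash3 32-bit finalizer constants and the golden-ratio (phi) multiplier; this is an illustration of the idea, not a copy of the actual HPPC-RT or fastutil sources:

// Sketch of the two mixing functions discussed above (illustration only).
public final class MixSketch {

    // MurmurHash3 32-bit finalizer: strong avalanche, a few more multiplies.
    public static int murmurHash3Mix(int k) {
        k ^= k >>> 16;
        k *= 0x85ebca6b;
        k ^= k >>> 13;
        k *= 0xc2b2ae35;
        k ^= k >>> 16;
        return k;
    }

    // PhiMix (Fibonacci hashing): one multiply + one xor/shift, cheaper but weaker,
    // so it relies much more on the quality of the incoming hashCode().
    private static final int INT_PHI = 0x9E3779B9;

    public static int phiMix(int k) {
        final int h = k * INT_PHI;
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        System.out.printf("murmur(42) = %08x, phi(42) = %08x%n",
                murmurHash3Mix(42), phiMix(42));
    }
}

The difference in cost (four ops vs. ten-ish) is exactly why PhiMix pays off on primitives, and the weaker avalanche is why it backfires on poor Object hashCode()s.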
Thanks to all the tweaks, general performance improvements range from 10 to 20 % compared to v0.67. :)
Some results on my machine: (i7-2630QM @ 2GHz (Sandy Bridge), Win7 x64, Oracle JDK 1.8.0_31, JMH v1.52)
HPPC-RT v0.68 HashContainers benchmark vs. World
Some notes on the benchmark principles:
- Tested targetSize = 6000000, loadFactor = 0.75: the whole test is 6h long...
- Make the hypothesis that the allocation scheme is power-of-two, and that the "preallocation size", or "expected size", provided in constructors means that the hash map is supposed to accommodate the provided size without reallocations. So far, HPPC* and Fastutil are assured to respect that.
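To make the numbers below concrete, this is the sizing arithmetic I assume under that power-of-two hypothesis (generic code, not the actual library sizing logic):

// Sketch of the power-of-two sizing arithmetic used to interpret the results below.
public final class SizingSketch {

    // Smallest power of two >= x (for positive x fitting in an int).
    static int nextPowerOfTwo(int x) {
        return x <= 1 ? 1 : Integer.highestOneBit(x - 1) << 1;
    }

    public static void main(String[] args) {
        final int expectedSize = 6_000_000;
        final double loadFactor = 0.75;

        // Capacity must hold expectedSize entries without exceeding the load factor.
        final int capacity = nextPowerOfTwo((int) Math.ceil(expectedSize / loadFactor));
        final int maxKeysBeforeRehash = (int) (capacity * loadFactor);

        // Prints: capacity = 8388608 (2^23), max keys before rehash = 6291456
        System.out.println("capacity = " + capacity
                + ", max keys before rehash = " + maxKeysBeforeRehash);
    }
}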
Tests:
- Put: put "distribution" entries (primitive/object/identity ==> int) up to the actual load factor. That means for a true RANDOM distribution with all unique keys, a target size = 6000000 in reality means pushing 6291456 keys to reach 0.75 on a 2**23 array.
- Contains: test done at constant size, with effective load factor 0.75: TRUE = 100 % success, MIXED = 50 % success, MOSTLY_FALSE = query a random int key (hence mostly misses).
- Remove: same as Contains, but with a remove.
Distributions:
RANDOM: uniformly random, encompassing [Integer.MIN_VALUE*0.5; Integer.MAX_VALUE*0.5]
RAND_LINEAR: increasing values by randomly chosen steps of [1,2,3], from [-6291456; 2*6291456] and cycling (a rough generator sketch follows this list)
HIGHBITS: This one is either Dawid's or Sebastiano's, ask them what it means :) but it plays havoc on some implementations.
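Purely for illustration, this is how I would sketch the RANDOM and RAND_LINEAR key streams from the descriptions above (hypothetical code, not the actual benchmark generators; HIGHBITS omitted since I didn't describe it):

import java.util.Random;

// Hypothetical sketches of the RANDOM and RAND_LINEAR key streams (not the real generators).
public final class DistributionSketch {

    // RANDOM: uniformly random ints in roughly [Integer.MIN_VALUE/2; Integer.MAX_VALUE/2].
    static int nextRandom(Random rnd) {
        return rnd.nextInt() / 2;
    }

    // RAND_LINEAR: values increasing by random steps of 1, 2 or 3,
    // cycling over [-6291456; 2 * 6291456].
    private static final int LOW = -6_291_456;
    private static final int HIGH = 2 * 6_291_456;
    private int current = LOW;

    int nextRandLinear(Random rnd) {
        current += 1 + rnd.nextInt(3);      // step of 1, 2 or 3
        if (current > HIGH) {
            current = LOW;                  // cycle back to the start of the range
        }
        return current;
    }

    public static void main(String[] args) {
        final Random rnd = new Random(42);
        final DistributionSketch gen = new DistributionSketch();
        for (int i = 0; i < 5; i++) {
            System.out.println(nextRandom(rnd) + "  " + gen.nextRandLinear(rnd));
        }
    }
}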
About the results:
Int-Int: Well, HPPC-RT is the fastest here, despite the two-array approach. Maybe the thinnest shell of abstractions and a small call depth play well with the JIT.
Identity-Int: Koloboke, Fastutil and HPPC score about the same there.
Object-Int: a more complicated mix, but GS 6.0.0 has some impressive scores sometimes. Its load factor is fixed to 0.5, so maybe I should test the rest with a 0.5 load factor too. Another 6h run!
A note about the Objects used:
They were especially crafted to have a "heavy" method cost, to simulate the fact that sometimes Objects are, well, Objects, so that the JIT won't save us. For that, each call of the strategy methods, or of hashCode() / equals(), contains a call to the JMH method Blackhole.consumeCPU(16).
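A minimal sketch of what such a "heavy" key could look like (my illustration of the idea, not the actual benchmark class):

import org.openjdk.jmh.infra.Blackhole;

// Illustration of a key with deliberately expensive hashCode()/equals(),
// along the lines described above; not the actual benchmark class.
public final class HeavyKey {

    private final int value;

    public HeavyKey(int value) {
        this.value = value;
    }

    @Override
    public int hashCode() {
        Blackhole.consumeCPU(16);   // burn a fixed amount of CPU so the JIT can't make the call free
        return value;
    }

    @Override
    public boolean equals(Object obj) {
        Blackhole.consumeCPU(16);   // same artificial cost on equality checks
        return obj instanceof HeavyKey && ((HeavyKey) obj).value == this.value;
    }
}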
I'm especially interested in limiting method calls, since HPPC-RT is used on much, much less favorable JVMs than the Oracle one. (And no, I can't elaborate on that.)
Well, that is my JMH "damn lies" benchmark vision, as biased as it could be !
See you !
Vincent