HPPC-RT performance improvements based on Fastutil 6.6.1 / Koloboke

Vincent Sonnier

unread,

Feb 8, 2015, 5:52:12 AM2/8/15

to java-high-performance...@googlegroups.com

Hello everyone,

Some report on my experiments on HPPC-RT enhanced with fastutil novelties, using my home-brewed JMH test suite :

https://github.com/vsonnier/hppc/tree/0.65.x-RT-release/hppcrt-benchmarks/src/main/java/com/carrotsearch/hppcrt/jmh

Changes :
-- Add partial unrolling in contains() / get() calls (not put(), too messy)
-- Adopt PhiMix from scrambling anything primitive because of some clear performance gain (which includes IdentityHash and CustomHash) but on the other hand PhiMix appears bad on (not-so-good)Objects, as revealed by the hash quality = BAD (= int value, stripped of 6 bits of info) and/or RAND_LINEAR/HIGHBITS use cases.

Execution times can plumet to almost an order of magnitude than normal, while with Murmurhash3 it stays put. This is clearly visible is HPPC_OBJ vs. FASTUTIL_OBJ (fastutil 6.61) cases.
Indeed if I replace Murmurhash3 by PhiMix in HPPC code for Objects, I observe the same huge degradation as observed in FASTUTIL_OBJ.

Even if the benchmark cases are exagerrated it shows that performance has become much more dependent of Object good behaviour than before, so it is a no-go for me.

So I kept Murmurhash3 to scramble Object.hashCode(). On the other hand, IdentityHash hashCode() is supposed to be as good as it could be, and on CustomHash the hashCode() good quality entirely rests on the shoulders of the implementor, so both are candidates for an inferior-quality but faster re-hashing function so PhiMix is OK.

Thanks to all the tweaks, general performance improvements rank from 10 to 20 % compared to v0.67. :).

Some results on my machine: (i7-2630QM @2Gz, (Sandy Bridge), Win7x64, OracleJDK1.8.0.31, JMH v1.52)
HPPC-RT v0.68 HashContainers benchmark vs. World

Some notes of the benchmarks principles :

- Tested targetSize = 6000000 , loadFactor = 0.75 : The whole test is 6h long...

- Make the hypothesis that the allocation scheme is power-of-two, and that the provided "preallocation size", of "expected size" in constructors means
that the hash map is suppose to accomodate the provided size without reallocations.
So far, HPPC* and Fastutil are assured to respect that.

Tests:
- Put: put "distribution" entries (primitive/object/identity ==> int) up to the actual load factor, that means for a true RANDOM distribution with all unique keys,
a target size = 6000 0000 in reality means pushing 6291456 keys to reach 0.75 on a 2**23 array.
- Contains: test done at constant size, with effective load factor 0.75: TRUE 100 % sucess, MIXED 50 % sucess, MOSTLY_FALSE, query a random int key.
- Remove : same as contains , but with a remove.

Distributions:
RANDOM : uniformely random, encompassing [ Integer.MIN_VALUE*0.5; Integer.MAX_VALUE*0.5]
RAND_LINEAR: increasing values by randomely chosen steps of [1,2,3] from [- 6291456 ; 2* 6291456 ] and cycling
HIGHBITS: This one is either Dawid's or Sebastiano's, ask them what it means :) but it plays havoc on some implementations.

About the results:
Int-Int: Well HPPC-RT is the fastest here, despite the 2 array approach. Maybe the thinest shell of abstractions and a small call depth plays well with the JIT.
Identity-Int: Koloboke; Fastutil, HPPC in the same score there

Object-Int: a more complicated mix , but GS 6.0.0 has some impressive scores sometimes. Its load factor is
fixed to 0.5, so maybe I should test the rest with 0.5 load factor too. Another 6h run !

A note about the Objets used:
They were especially crafted to have a "heavy" method cost, to simulate the fact that sometimes Objects are well, Objects
so that the JIT won't save us. For that, each call of strategies methods, or hashCode() / equals() has a call to JMH method: Balckhole.consumeCPU(16).

I'm especially intrested in limiting method calls, since HPPC-RT is used in much-much-much less favorable JVM that the Oracle one. (And no, I can't elaborate on that)

Well, that is my JMH "damn lies" benchmark vision, as biased as it could be !
See you !

Vincent

Vincent Sonnier

unread,

Feb 9, 2015, 2:58:07 PM2/9/15

to java-high-performance...@googlegroups.com

Ha Ha !
Tested against others with a load factor of 0.5. (see) This time, HPPC-RT is quite consitently behind Fastutil / Koloboke, except in BAD hash scenarios, where Fastutil suffers especially.

Still, this is not the end. This load factor 0.5 favors the "partially unrolled" fastpaths, since the array is more sparse than before. Well, on HPPC-RT for readability
reasons, I didn't simply cut-n-paste the Fastutil one-liners for that, but decomposed the steps.
Apparently at bytecode level the Fastutil one liners look indeed smarter than the other ones, go figure.

i.e that kind of thing:
// The starting point.
if ( ( curr = key[ pos = ( it.unimi.dsi.fastutil.HashCommon.phiMix( (k) ) ) & mask ] ) == (0) ) return false;
if ( ( (k) == (curr) ) ) return true;
// There's always an unused entry.
while( true ) {
if ( ( curr = key[ pos = ( pos + 1 ) & mask ] ) == (0) ) return false;
....

So next step I'm going to test such kind of patterns on HPPC-RT and benchmark once more :)

Vincent

Dawid Weiss

unread,

Feb 9, 2015, 4:03:26 PM2/9/15

to java-high-performance-primitive-collections

You're absolutely relentless, Vincent ;) Thanks for posting these
updates -- I'm reading them and I've been learning from them, but I
just haven't had the time to look into updating HPPC yet. There is a
bunch of changes I would like to introduce (not necessarily
performance-related) and I plan to do it in one batch. Your results
are stimulating, thanks!

Dawid

> --
> You received this message because you are subscribed to the Google Groups
> "High Performance Primitive Collections for Java" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to
> java-high-performance-primi...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Vincent Sonnier

unread,

Feb 26, 2015, 4:35:08 AM2/26/15

to java-high-performance...@googlegroups.com

Hello everybody,
I'm back with new benchmark data here, based on HPPC-RT v0.68-SNAPSHOT improvements.
Nothing changed much really. Using fastutil "fast-paths" I'm now closer to it than before in factor 0.5, sometimes even faster, while I dominate with factor v0.75.

Something to note, the incredible speed of remove() = TRUE case for lots of libs : Fastutil in particular, Koloboke, Javolution, even plain Java !

As for HPPC-RT goes, I'm satisfied with it's robustness and execution time consitency against the various provided kinds of inputs, so I stop investigating the pure-speed area
for now.
So, HPPC-RT v0.68 is fairely complete and may be well released in due time. Maybe some more Unit tests for coverage, who knows.

Regards,

Vincent

Reply all

Reply to author

Forward