Hello everyone,
It's
here.
I've finally decided to release on Maven a stable version, labelled v0.6.5. Compared to the previous snapshot v0.6.0
there is not so much novelties in API but rather refactorings and internal changes :
A) Some repackaging was done to use com.carrotsearch.hpprt.[sets|lists|maps|heaps|strategies..] for the sake of housekeeping,
B) Strategies were extracted from the Hash containers to create their own CustomHash classes, (à la fastutil) so that we keep the maximum perfomance
for the most frequently used containers.
C) Much experiments this summer as lead to adopt Robin Hood hashing as self-reorganizing algorithm for the containers using either Objects or CustomHashs.
First, I heard of it from
Cliff Moon, who submitted a pull request to the regular HPPC,
then I read this
http://sebastiansylvan.com/2013/05/08/robin-hood-hashing-should-be-your-default-hash-table-implementation/ which is indeed
the best consise explanation I've found about Robin Hood hashing,
Then I stumbled upon a very deep analysis by Emmanuel Goossaert (
http://codecapsule.com/tag/robin-hood/) which demonstrate experimentally
the very low variance in various access patterns of this design. (lots of drawings, graphics, great series of articles in my opinion)
Then of course, I've benchmarked my own implementation adding over Cliff Moon's a caching of hash values.
Someone said once "there are lies, damn lies, and benchmarks" but still Robin Hood proved usefull showing its expected consistent behaviour even
in poor hash qualities and most of all in case of lookups on absent keys where traditional linear lookup times skyrocket, but where by construction RH
has a fast bailout test before actually reaching an empty slot.
All of this have a cost, in such that put() is slower than before but the overrall operations looks smoother (if a fraction slower) which is actually what counts in a realtime environnement.
However, due to the pure raw power of CPU caching and prefetching in a brute force search, RH is not so intresting for pure primitive containers and wasn't enabled for them in v0.6.5.
D) As optimization, I removed the perturbation of hash containers (introdued to fix
HPPC-80) replacing them by an iterator going "in reverse" of the buffer.
Once again benchmarks (a.k.a lies) showed that we actually re-gain the performance of the non-perturbated containers in the general case, still keeping the putAll() as fast as it was with the perturbated version.
The strange thing is that the self-resizing methods of the hash containers expandAndXXX() actually already did traverse the buffer in reverse to rebuild the expanded container, showing that it was probably found interesting to do so...
And yes I finally get into the trouble of releasing it on Maven for everybody:
<dependency>
<groupId>com.github.vsonnier</groupId>
<artifactId>hppcrt</artifactId>
<version>0.6.5</version>
</dependency> |
|
Regards,
Vincent