Koloboke (another primitive collections library)

Dawid Weiss

unread,

Oct 8, 2014, 2:59:06 AM10/8/14

to java-high-performance-primitive-collections

Just FYI, perhaps useful for folks.

There is a another library for primitive collections called Koloboke
(by Roman Leventov) which implements support for JUC compatibility (at
various JDK levels). The benchmarks Roman published show it's also
faster than anything else out there -- which I found interesting.
Indeed, there's a clever trick with a randomly selected sentinel
element on maps and sets, which gets rid of the occupied slots array.
This way one saves on memory accesses on get/ put (and some memory
itself because the array is not needed).

I'm thinking of adding arrays with an explicit sentinel element too,
actually -- they seem useful enough when you know the space domain of
your keys.

Dawid

Vincent Sonnier

unread,

Oct 8, 2014, 3:06:34 PM10/8/14

to java-high-performance...@googlegroups.com

Hello Dawid,

This looks interesting indeed. But, as far as I can see, there is even no need for an explicit sentinel value.

For instance, would a get(K) would not be just this:

if (K == defaultValue) {
// special case, use an explicit allocatedDefaultValue flag to test existence...etc.
} else {
// since all keys are unique in a map, defaultValue in the key array means allocated = false, else
//every other value indeed mean allocated = true. So effectively the allocated[] array looks no longer needed that way ?

}

Unlees I missed something, it would be an implementation detail of the existing maps/sets, without special constraints on the key domain.

Vincent

Dawid Weiss

unread,

Oct 8, 2014, 4:19:43 PM10/8/14

to java-high-performance-primitive-collections

Hi Vincent,

Clever! This is actually how null is handled in certain JDK maps as
far as I can remember (it's a special-case key). Indeed, for primitive
types you could designate, for example 0 to be that special key;
zero-check is actually a good candidate as it doesn't need an argument
on most architectures.

If you care to provide a patch (or test the idea!) it'd be great -- I
have plenty of other work and temporarily won't be able to go back to
it.

Dawid

> --
> You received this message because you are subscribed to the Google Groups
> "High Performance Primitive Collections for Java" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to
> java-high-performance-primi...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Vincent Sonnier

unread,

Oct 9, 2014, 4:35:37 AM10/9/14

to java-high-performance...@googlegroups.com

Still I wonder if testing for a "real value" vs. testing boolean is that interesting, except for the fact
there is only one array. When you test for insertion, you test for allocated[] = true until finding a false.
In such process, you would test "real values", lets say doubles, that are :
1) Probably longer to test than booleans, except indeed for a zero check as you said.
2) Do not pack as much values in a cache line as booleans (bytes). So if the search is unsuccessful after 1 cache line, it will need
to access another chunk of memory anyway.
See this article for the argument: http://research.neustar.biz/2011/12/12/big-memory-part-4/

Anyway I'll try this for primitives, this is simple enough to implement. I don't think it is worth for Objects, where the "obect-ness" dominates evrything.

Vincent

> java-high-performance-primitive-collections+unsub...@googlegroups.com.

Dawid Weiss

unread,

Oct 9, 2014, 4:56:05 AM10/9/14

to java-high-performance-primitive-collections

> there is only one array. When you test for insertion, you test for
> allocated[] = true until finding a false.

Well, you'll have to access the keys array at some point anyway
(either to compare keys or to write your own key), so unless your map
is really filled up you're actually always saving on memory/ cache
line accesses. My "zero-check" argument was also motivated by the fact
that if you pick zero as the sentinel, you won't need to store it with
the map/ set itself (and you won't need to read it all over again).
Zero-test is also very likely to be compiled into more efficient
machine code (compared to actual cmp against some value).

It's worth checking how it works in real life, obviously. Roman's
benchmarks seem to indicate it is faster (and his code generates
random sentinels, which should be more costly compared to your nifty
idea).

Dawid

Roman Leventov

unread,

Oct 10, 2014, 9:13:54 PM10/10/14

to java-high-performance...@googlegroups.com

David,

if you mean these benchmarks: http://java.dzone.com/articles/time-memory-tradeoff-example

seems you miss the main reason why Koloboke is way faster than competitors in them: parallel array layout, i. e.

[k1 v1 k2 v2...]

instread of

[k1 k2 ...]

[v1 v2 ...]

Currently all primitive - to - primitive specializations where key and value types of the same length (i. e. including long - to - double for example) are implemented with parallel layout, also object - to - object. Primitive - to - primitive where key and value size differs exactly 2 times (i. e. for including long - to - float and int - to - double) are on the way now.

If your haven't seen this before, here is discussion of array layouts, states encoding and indexing:

http://www.slideshare.net/leventov/optimizing-arraybased-data-structures-to-the-limit

However, when layout is ordinary, Koloboke wins much less:

http://stackoverflow.com/a/25354509/648955 (obj - to - int here)

четверг, 9 октября 2014 г., 12:56:05 UTC+4 пользователь Dawid Weiss написал:

Roman Leventov

unread,

Oct 10, 2014, 9:14:14 PM10/10/14

to java-high-performance...@googlegroups.com

Vincent,

I've considered the idea you propose some time ago. The problem is that it's simplier to say that to implement this stuff. Also, Iterators/Cursor will become inherently slower, because they should accout that special element (or two special elements!) and perform some branching in the beginning of each next() / hasNext() / moveNext() methods. However, now I thaught about this second time and realized that there is cheaper solution that I thaught earlier. So thank you that reminded me about this idea :)

четверг, 9 октября 2014 г., 12:56:05 UTC+4 пользователь Dawid Weiss написал:

> there is only one array. When you test for insertion, you test for

Vincent Sonnier

unread,

Oct 11, 2014, 4:05:20 AM10/11/14

to java-high-performance...@googlegroups.com

Roman, Dawid,

Well the idea is half-way there : https://github.com/vsonnier/hppc/tree/hash-single-array
It turns out to be more painful that I thought on the implementation side, I think I'm reaching the limits of Velocity vs. readability of the templates :)

Indeed Roman it consist in patching lots of methods to handle the "special value", quite a pain to write...
As for the iterators, it doesn't seem to have significant impact sor far as my simple benchamark goes. Besides, the steam-like "forEach()" bulk methods
are the canonical way to go, and IMO the iterators are much less useful now except for a pretty enhanced for loop.

Anyway, this simple benchmark
https://github.com/vsonnier/hppc/blob/0.65.x-RT-release/hppcrt-benchmarks/src/main/java/com/carrotsearch/hppcrt/misc/HppcMapSyntheticBench.java
actually shows a quite definite 10% imprevement (at least) from 0.65.x-RT-release to hash-single-array branches. So I'll probably continue this way with all the rest
of maps and sets when I get more more time.

As for Koloboke, I've read before that everything was indeed flatten-out, with K,V packed fields. On the other hand, why would it be "that" more efficient ? After all, when you search / insert ...etc the work is done on keys (or/and allocated flags arrrays) and only when you reach the index to modify then the values need to be read / accessed.
For instance, an extreme case is an unsucessfull search where the value parts never need to be accessed at all.
As a counter-example, on remove() when you pack K,V in both arrays (HPPC shiftConflictingKeys()) it would certainly be faster.

Have a nice WE

Vincent

Dawid Weiss

unread,

Oct 11, 2014, 7:54:31 AM10/11/14

to java-high-performance-primitive-collections

Hi Roman, good to have you!

Indeed you're right -- I completely missed the different collections
layout. This was so because I only looked at one of the generated
classes and incorrectly assumed all of them share the same
implementation (lazy me). Thanks for clarifying and sharing knowledge,
this indeed is quite cool!

Dawid

P.S. I see you work at HFT; no wonder you (and Peter Lawrey) are
memory-packing obsessed :)

> --
> You received this message because you are subscribed to the Google Groups
> "High Performance Primitive Collections for Java" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to

> java-high-performance-primi...@googlegroups.com.

Vincent Sonnier

unread,

Oct 22, 2014, 1:07:07 PM10/22/14

to java-high-performance...@googlegroups.com

Hello Dawid,

I've submitted a pull request Hash containers allocated[] removal for HPPC which replicate the work done on HPPC-RT.

Vincent

Dawid Weiss

unread,

Oct 22, 2014, 4:06:18 PM10/22/14

to java-high-performance-primitive-collections

Thanks Vincent! I'll add it to my queue to review. I'll probably try
to squash the branch instead of accepting such a vast number of
individual commits, but I'll definitely take a look!

Dawid

> --
> You received this message because you are subscribed to the Google Groups
> "High Performance Primitive Collections for Java" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to

> java-high-performance-primi...@googlegroups.com.

Reply all

Reply to author

Forward