IntOpenHashSet as opposed to java.util.HashSet

362 views
Skip to first unread message

Sophie Sperner

unread,
Aug 7, 2012, 7:17:05 AM8/7/12
to java-high-performance...@googlegroups.com
Dear all,

I'm using many sets IntSet set = new IntOpenHashSet(); in my code and then do set.add(), set.clear(), set.contains() operations only, I do not lookup (set.get()). I changed all the Java HashSets and found no speedup at all, even an overhead in my program. I do measure the execution time and was expecting to find hppc to be faster. Maybe I misunderstood something.

Is IntOpenHashSet is fine for storing ints?

Thank you.

Dawid Weiss

unread,
Aug 7, 2012, 7:23:55 AM8/7/12
to java-high-performance...@googlegroups.com
You won't see any speedup I'm afraid. IntOpenHashSet is there to
complement other collections because HPPC has a slightly different
class design compared to Java's collections framework. Inside, the
implementation is very much alike (there are minor differences). Most
of all, however, Java's default collections use intrinsic operations
(processor hardware) to compute certain things like bitcounts which
can make them even faster on newer hardware.

You will see gains from using HPPC if you start using other
collections for primitive types, however (lists, sets, maps).

Dawid

Sophie Sperner

unread,
Aug 7, 2012, 7:50:44 AM8/7/12
to java-high-performance...@googlegroups.com
Thank you Dawid for reply,

So HashSet<Integer> is the same in performance as IntOpenHashSet. Alright.

I have a HashMap<Integer, ArrayList<Item>> where Item is my program-specific custom object with hashCode and equals implemented. Actuall the hashCode is perfect and returns unique ints. I store my main data in this map. Would you recommend to use HPPC in order to work with int[] keys and Integer to avoid autoboxing etc.

Do you think this will give a speedup?

Dawid Weiss

unread,
Aug 7, 2012, 8:06:05 AM8/7/12
to java-high-performance...@googlegroups.com
> So HashSet<Integer> is the same in performance as IntOpenHashSet. Alright.

Oops, sorry -- I thought you were talking about bitsets. If you're
talking about maps or sets of integers then the difference should be
visible and significant assuming that your set of keys (Integers) is
quite large; don't know how many objects you put in there.

> I have a HashMap<Integer, ArrayList<Item>> where Item is my program-specific

How many unique keys in those maps? If there are not too many the
difference will be obscured by other parts of the code.

> recommend to use HPPC in order to work with int[] keys and Integer to avoid
> autoboxing etc.

Assuming you have plenty of keys to work with
IntObjectOpenHashMap<List<Item>> will be suitable. It will save space
on keys (no boxed integers) and should give you favorable collision
rates compared to standard maps. I would first try to detect
bottlenecks though -- maybe your program spends 99% of time elsewhere,
not in key lookup. Then you're aiming at the wrong target.

D.

Sophie Sperner

unread,
Aug 7, 2012, 8:24:42 AM8/7/12
to java-high-performance...@googlegroups.com
 
> So HashSet<Integer> is the same in performance as IntOpenHashSet. Alright.

Oops, sorry -- I thought you were talking about bitsets. If you're
talking about maps or sets of integers then the difference should be
visible and significant assuming that your set of keys (Integers) is
quite large; don't know how many objects you put in there.

Let's see what more experiments will show me.
 
> I have a HashMap<Integer, ArrayList<Item>> where Item is my program-specific

How many unique keys in those maps? If there are not too many the
difference will be obscured by other parts of the code.

All the keys are unique, from 1 to 100.

> recommend to use HPPC in order to work with int[] keys and Integer to avoid
> autoboxing etc.

Assuming you have plenty of keys to work with
IntObjectOpenHashMap<List<Item>> will be suitable. It will save space
on keys (no boxed integers) and should give you favorable collision
rates compared to standard maps. I would first try to detect
bottlenecks though -- maybe your program spends 99% of time elsewhere,
not in key lookup. Then you're aiming at the wrong target.

Thank you, I will try IntObjectOpenHashMap.
 
Best wishes and have a wonderful day :)

Dawid Weiss

unread,
Aug 7, 2012, 8:29:29 AM8/7/12
to java-high-performance...@googlegroups.com
> All the keys are unique, from 1 to 100.

In this case you don't need a hashmap, really -- just use an array and
directly access those slots that you need:

List<Item>[]

You won't see any speedups if you only have Integers < 128 because
these are autoboxed from cache (same objects are used). If you really
wish to see some differences run a benchmark where random Integers are
used as keys (or pseudo-random but from a large range).

D.

Sophie Sperner

unread,
Aug 14, 2012, 8:01:39 AM8/14/12
to java-high-performance...@googlegroups.com
Dear Dawid,

Thank you for previous thoughts. Do you know the fastest way to copy IntOpenHashSet? Currently I'm doing:
IntSet hsCopy = new IntOpenHashSet(hs);

Dawid Weiss

unread,
Aug 14, 2012, 11:22:40 AM8/14/12
to java-high-performance...@googlegroups.com
IntOpenHashSet variable = otherIntOpenHashSet.clone();

:)

D.

Sophie Sperner

unread,
Aug 15, 2012, 7:55:04 AM8/15/12
to java-high-performance...@googlegroups.com
Thank you, used that.

I'm working also with IntArrayList, why there is no option to add an element by position? Do I need to recreate an array each time I add new element? I'm trying to achieve that at the moment. Also, list.buffer.length is not the size of IntArrayList?

Sophie Sperner

unread,
Aug 15, 2012, 9:22:23 AM8/15/12
to java-high-performance...@googlegroups.com
Well, I did it with recreation of an array. Looks not nice, but anyway. The same I observe for ObjectArrayList - no addition to a specified position possible.

Dawid Weiss

unread,
Aug 15, 2012, 9:49:52 AM8/15/12
to java-high-performance...@googlegroups.com
> I'm working also with IntArrayList, why there is no option to add an element
> by position?

I don't understand your question.

A list has a maximum size; once you have a non-empty list you can set
any element by position (using set) as long as the index is within the
list's size. It's the same as in Java Collections. You can also use
ensureCapacity to make sure your list is large enough instead of
adding empty elements.

> Do I need to recreate an array each time I add new element?

You are probably trying to do something very awkward, I don't understand.

> Also, list.buffer.length is not the size of IntArrayList?

Because it's only a buffer in which the list elements are stored. We
don't reallocate each time -- it would be very wasteful on the garbage
collector and memory copying.

Dawid

Sophie Sperner

unread,
Aug 15, 2012, 11:33:12 AM8/15/12
to java-high-performance...@googlegroups.com
I found it :) list.insert(index, elem) slightly different from ArrayList.add(index, elem).

Dawid Weiss

unread,
Aug 15, 2012, 2:34:46 PM8/15/12
to java-high-performance...@googlegroups.com
> I found it :) list.insert(index, elem) slightly different from
> ArrayList.add(index, elem).

Just note that inserting into a contiguous array list requires
shifting all the elements to the right of "index" so repeated
insertions at small indexes will be costly. I'd say a linked list is
what you need in that case but a linked list in Java is a nightmare in
terms of memory overhead... I'd just code it myself as an int pointer
data structure.

Dawid

Sophie Sperner

unread,
Aug 17, 2012, 6:45:10 AM8/17/12
to java-high-performance...@googlegroups.com

Sophie Sperner

unread,
Aug 17, 2012, 6:47:36 AM8/17/12
to java-high-performance...@googlegroups.com
Dear Dawid,

Sorry for the last post. LinkedList is not only memory-nightmare, it is also slower than regular IntArrayList when you iterate the elements.

It is interesting about int pointer data structure. Is there something similar in HPPC library or could you give a thought how to code it?

Dawid Weiss

unread,
Aug 17, 2012, 7:11:28 AM8/17/12
to java-high-performance...@googlegroups.com
> It is interesting about int pointer data structure. Is there something
> similar in HPPC library or could you give a thought how to code it?

You just simulate/ implement a linked list of values on an int array
(or byte array) pretty much like you would
in any low-level language (C, C++, assembly). No references to
objects, in other words, just int values pointing to indices of
an array. You can use an ArrayIntList for the underlying backing store
of course but the meaning of individual elements is task-dependent and
I won't be able to help you here.

Dawid

Tatu Saloranta

unread,
Aug 17, 2012, 12:55:03 PM8/17/12
to java-high-performance...@googlegroups.com
Also: to combine good iteration and insert performance, a segmented
backing arrays work pretty well.
Instead of using one big contiguous array, which is costly to resize
as well as copy (on inserts), one can split it dynamically into
chunks, to localize changes (resizes and copies).

-+ Tatu +-

Dawid Weiss

unread,
Aug 17, 2012, 2:08:44 PM8/17/12
to java-high-performance...@googlegroups.com
Yes, indeed. There are plenty of possible implementations (and just as
many caveats of each, probably). Segment-oriented representation may
have poor locality on linear traversals and may fragment memory (heap)
resulting in increased GC times, etc.

I'd say this: you're asking way too many specific questions and still
haven't presented the overall problem you're trying to tackle.
Describe what the problem (algorithm) is; I'm sure the data structures
to solve it will follow-up :)

Dawid

Sophie Sperner

unread,
Aug 22, 2012, 8:18:49 AM8/22/12
to java-high-performance...@googlegroups.com
Dear all,

I kind of used an int pointer array, thank you. Now I would like to try the IntObjectOpenHashMap where I store ints as keys and IntArrayLists of different sizes as Objects. When the structure is ready, I would like to iterate by keys sorted in ascending order of its values, for example having this:

map = { 2 => [2, 5, 3, 5], 5 => [0, 390, 39], 20 => [2], 4 => [], 3 => [1, 1, 2, 5]}

I need to iterate in this order:

map = { 2 => [2, 5, 3, 5], 3 => [1, 1, 2, 5], 4 => [], 5 => [0, 390, 39], 20 => [2]}

Could you please recommend the structure from your library to use?

Dawid Weiss

unread,
Aug 22, 2012, 11:47:40 AM8/22/12
to java-high-performance...@googlegroups.com
> Could you please recommend the structure from your library to use?

HPPC doesn't have an ordered map implementation. You may want to take
a look at fastutil, it does have several implementations of sorted
maps. fastutil also has low-level primitive lists so you can convert
your code entirely to it.

http://fastutil.di.unimi.it/docs/it/unimi/dsi/fastutil/ints/Int2ObjectSortedMap.html

Dawid

Sophie Sperner

unread,
Aug 22, 2012, 5:31:54 PM8/22/12
to java-high-performance...@googlegroups.com
Thank you, will investigate that library too. May I ask please again?

I've recently described my question here. Do you have any tree implementation for int nodes for best performance in the library?

Dawid Weiss

unread,
Aug 23, 2012, 2:36:39 AM8/23/12
to java-high-performance...@googlegroups.com
Sophie,

> I've recently described my question here. Do you have any tree
> implementation for int nodes for best performance in the library?

Again -- you've described particular needs for a data structure, not
the problem :) Anyway:

1) I wouldn't worry about performance until you hit bottlenecks (early
optimization is typically an overkill),
2) any data structure that allocates memory dynamically and has object
references will be a memory hog and will slow down performance.
3) there is no "out of the box" data structure for non-typical
problems. Even a basic course on data structures should be enough to
devise something that will work for you and avoid the pitfalls of (2).
From your description I find some similarities to how trees are
represented in a "neighbor-child" representation that has a constant
representation size and allows variable children count (useful to
avoid reallocation of memory). This paper has the description of such
a data structure:

http://www.drdobbs.com/database/a-practical-suffix-tree-implementation/184404184

It's not directly usable for your application but it does carry the
ideas that will be useful for you (int pointers, constant-size node
representation).

Dawid

Sophie Sperner

unread,
Feb 26, 2013, 7:06:15 AM2/26/13
to java-high-performance...@googlegroups.com
Dear all,

Can I continue exploring hppc here? For your convenience I have asked a question. Could you please tell me what do you think? I have no idea what todo.

Yours,
Sophie

Dawid Weiss

unread,
Feb 26, 2013, 7:24:44 AM2/26/13
to java-high-performance...@googlegroups.com

This mailing list would be probably a better place. Anyway, post a test case or a fully functional snippet of code that repeats this problem.

Dawid


--
You received this message because you are subscribed to the Google Groups "High Performance Primitive Collections for Java" group.
To unsubscribe from this group and stop receiving emails from it, send an email to java-high-performance-primi...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Sophie Sperner

unread,
Feb 26, 2013, 8:39:27 AM2/26/13
to java-high-performance...@googlegroups.com
 Dear Dawid,

I will try to fix it myself. I have a standalone snippet of code, but it is a scientific app working with graphs so not very easy to get the idea quickly. Will post only if will not be able to find any solution. Maybe from your practice you remember when IntIntMap map .get() method hanged forever and could say the reason so that I can think if I have the same reason here.

Best wishes.

Sophie

Dawid Weiss

unread,
Feb 26, 2013, 8:44:19 AM2/26/13
to java-high-performance...@googlegroups.com

Again -- are you completely sure it is hanging inside the get() method and not looping in your code somewhere? Can you get a stack trace (or a few stack traces) to confirm where it's at? If you can dump the contents of the map to the console and send me that along with the key that causes the dump I'll take a look!

Dawid



Sophie

--

Sophie Sperner

unread,
Feb 26, 2013, 10:54:25 AM2/26/13
to java-high-performance...@googlegroups.com
Dear Dawid,

Interesting, I changed IntIntMap onto HashMap<Integer, Integer> everywhere in code while commenting all the code with IntIntMap, so everything is the same and only type is different and it made a trick - now working.



Again -- are you completely sure it is hanging inside the get() method and not looping in your code somewhere?

Yes, sure, when I debug my code it hangs on exactly when you try to execute either IntIntMap.get() or IntIntMap.containsKey() method.

Can you get a stack trace (or a few stack traces) to confirm where it's at? If you can dump the contents of the map to the console and send me that along with the key that causes the dump I'll take a look!

I do not know how to get a stack trace. What I can do is exactly before going into get() method, copy the contents of a particular map and a key which I pass and then send it to you - would it be fine? We can do that whenever you have time to maybe spoil a bug somewhere in IntIntMap. As I said, HashMap works.

Have a nice day!
Sophie

Dawid Weiss

unread,
Feb 26, 2013, 12:33:08 PM2/26/13
to java-high-performance-primitive-collections

Well, that's not good. Please do the following:

1) make sure your JVM runs with assertions enabled (pass -ea switch to your Java; I don't know what environment you launch from but this should be straightforward).

2) If you could dump the state of that map before it enters the loop and then all of the indices you access it'd be great. You can dump the state of the map using apache commons-lang library, using this class:

http://commons.apache.org/proper/commons-lang//api/org/apache/commons/lang3/builder/ReflectionToStringBuilder.html

Please pack it into a txt file and send it to me with the snippet of code that loops over this array (preferably with the one you used to dump the above information, it'll help in reproducing the problem).

Thanks,

Dawid



--

Tatu Saloranta

unread,
Feb 26, 2013, 12:51:37 PM2/26/13
to java-high-performance...@googlegroups.com
On Tue, Feb 26, 2013 at 7:54 AM, Sophie Sperner
<sophie....@gmail.com> wrote:
> Dear Dawid,
>
> Interesting, I changed IntIntMap onto HashMap<Integer, Integer> everywhere
> in code while commenting all the code with IntIntMap, so everything is the
> same and only type is different and it made a trick - now working.
>
>
>
>> Again -- are you completely sure it is hanging inside the get() method and
>> not looping in your code somewhere?
>
>
> Yes, sure, when I debug my code it hangs on exactly when you try to execute
> either IntIntMap.get() or IntIntMap.containsKey() method.
>
>> Can you get a stack trace (or a few stack traces) to confirm where it's
>> at? If you can dump the contents of the map to the console and send me that
>> along with the key that causes the dump I'll take a look!
>
>
> I do not know how to get a stack trace. What I can do is exactly before

This depends on platform, but this:

http://stackoverflow.com/questions/3734696/how-to-get-stack-trace-of-a-thread

lists common methods. IDEs (Eclipse, Idea) also have ways to do that,
as does JConsole.

Hope this helps,

-+ Tatu +-

Dawid Weiss

unread,
Feb 26, 2013, 3:36:22 PM2/26/13
to java-high-performance-primitive-collections

I looked at the code and I think I found a scenario in which get indeed could be looping forever -- if you use a load factor of 1 (or close to 1) and full internal array's capacity is reached (power-of-two - 1 elements) then get() will loop over the array trying to find either an unused slot or a matching element and eventually it'll fail on both.

I'm working on a fix for this but I'd like to confirm that you indeed have a high load factor set in your code -- this would prove we're seeing the same bug. Of course if you could send me those data dumps it'd be splendid.

I filed this issue to track progress on this:

Dawid

Dawid Weiss

unread,
Feb 27, 2013, 9:43:30 AM2/27/13
to java-high-performance-primitive-collections

Sophie? What about that load factor -- did you find the time to look into that?

Dawid

Dawid Weiss

unread,
Mar 1, 2013, 6:03:36 AM3/1/13
to java-high-performance-primitive-collections, sophie....@gmail.com
I have just released version 0.4.3 of HPPC which should fix your
issues with hanging methods.

Dawid

Sophie Sperner

unread,
Mar 1, 2013, 10:49:11 AM3/1/13
to Dawid Weiss, java-high-performance-primitive-collections
Dawid,

You are right. My problem was the following - I created an IntIntMap with loadFactor equal to 1, for example:

IntIntMap ranksA = [2597=>6, 2297=>4, 93=>0, 162=>1, 139=>2, 70=>3, 952=>5, 2551=>7]

    allocated = [true, true, true, true, true, true, true, true]
    assigned = 8
    keys = [2597, 2297, 93, 162, 139, 70, 952, 2551]
    lastSlot = 0
    loadFactor = 1.0
    resizeThreshold = 8
    values = [6, 4, 0, 1, 2, 3, 5, 7]

And when I called int minr = ranksA.get(hA); with hA = 46 it failed.

The problem I fixed by not specifying the loadFactor, though I would specify it because I know exactly the map's size. I will try your new version later. Thank you for fixing it. Anyway I'm still using your library.

Best wishes,
--
Yours,
Sophie

Dawid Weiss

unread,
Mar 1, 2013, 11:39:32 AM3/1/13
to Sophie Sperner, java-high-performance-primitive-collections
It should be clearly stated that specifying a high load factor for
*any* implementation of a hash-based container is going to kill
performance. Remember the load factor determines a random distribution
of "empty gaps" in between slots taken by keys or conflicting keys. So
a lookup of a non-existent element from a hashmap with load factor 1
is going to take as long as a pessimistic linear lookup in a list the
size of that map! If your maps are constructed once and then used for
lookups you may be far better off by creating an array and then
running a binary-search over it. If you don't care about memory that
much, stick to the default load factor.

Thanks for pointing me at the bug though, I fixed a far more important
one that might have had side-effects in low memory conditions (leaving
maps in an inconsistent state).

Dawid
Reply all
Reply to author
Forward
0 new messages