adjustOrPutValue

eran shaham

unread,

Dec 12, 2011, 1:10:24 PM12/12/11

to fast...@googlegroups.com

Hello,

I've read the post regarding 'adjustOrPutValue' functionality in:
http://groups.google.com/group/fastutil/browse_thread/thread/c8f42cbcf0526c7b/d99dc374e65e423b?lnk=gst&q=adjuctOrPutValue#d99dc374e65e423b

The option of using map.add(key,1) is valid only if you use Int2IntOpenHashMap.
Nevertheless, I need to use Int2IntSortedMap in my code.
Therefore, I need to write a code of:

    final int adjustOrPutValue(final int key, final int add) {
        final int prev = map.get(key);
        final int after = (prev == map.defaultReturnValue()) ? 1 : prev + add;
        map.inner.put(key, after);
        return after;
    }

Which have an overhead of double accessing an entry.

Is there a plan to add a functionality of add( ) or adjustOrPutValue( ) for the general Map interface?

Best Regards,
Eran Shaham

Sebastiano Vigna

unread,

Dec 12, 2011, 1:54:07 PM12/12/11

to fast...@googlegroups.com

On Dec 12, 2011, at 7:10 PM, eran shaham wrote:

>
> Is there a plan to add a functionality of add( ) or adjustOrPutValue( ) for the general Map interface?
>

If I remember correctly, the last time we were discussing which were the use cases for such maps, but then the discussion got lost somewhere. Can we just restart? Which are the use cases for increasing the entry of a sorted map?

Ciao,

seba

eran shaham

unread,

Dec 12, 2011, 2:24:17 PM12/12/11

to fast...@googlegroups.com

Hello,

In my case, I am holding a map of values and their occurrences.
On each iteration, values can be added (new or inc occurrences) or removed (dec occurrences or removed in case of occurrences=0).
In addition, I need a fast way of checking what is the smallest value (getFirst()).

Thus, a HashMap, will support O(1) of update, but O(N) of getFirst(),
while TreeMap will support O(log(N)) of update and O(1) of getFirst().

Best Regards,
Eran Shaham

Sebastiano Vigna

unread,

Dec 12, 2011, 4:24:44 PM12/12/11

to fast...@googlegroups.com

On Dec 12, 2011, at 8:24 PM, eran shaham wrote:

> Hello,
>
> In my case, I am holding a map of values and their occurrences.
> On each iteration, values can be added (new or inc occurrences) or removed (dec occurrences or removed in case of occurrences=0).
> In addition, I need a fast way of checking what is the smallest value (getFirst()).
>
> Thus, a HashMap, will support O(1) of update, but O(N) of getFirst(),
> while TreeMap will support O(log(N)) of update and O(1) of getFirst().

Do you have an idea of the relative frequency of the two operations (increment/getFirst())?

Ciao,

seba

Earwin

unread,

Dec 12, 2011, 5:53:59 PM12/12/11

to fastutil

I asked for this feature on SortedMaps some (long) time ago, you asked
me for a use case, I replied, and then there was silence.
I used a SortedMap to keep track of overlapping intervals.
Did .add(begin, +1) .add(end, -1) for each interval, and then I could
calculate all kinds of funny things.

>
> Ciao,
>
> seba

eran shaham

unread,

Dec 12, 2011, 6:36:14 PM12/12/11

to fast...@googlegroups.com

Hi,

Very hard to say what the frequencies of increment() vs. getFirst() usage are (it is a random algorithm on random data).
Nevertheless, considering the low overhead in API and code addition vs. the impact (my profiler is not happy with the current put(key, get(key)+1)) I think it is worth a consideration.
Currently, I am bound to use adjustOrPutValue of Trove for non sorted maps.

Best Regards,
Eran Shaham

Sebastiano Vigna

unread,

Dec 13, 2011, 1:34:42 AM12/13/11

to fast...@googlegroups.com

On Dec 13, 2011, at 12:36 AM, eran shaham wrote:

> Very hard to say what the frequencies of increment() vs. getFirst() usage are (it is a random algorithm on random data).
> Nevertheless, considering the low overhead in API and code addition vs. the impact (my profiler is not happy with the current put(key, get(key)+1)) I think it is worth a consideration.
> Currently, I am bound to use adjustOrPutValue of Trove for non sorted maps.

Now I'm really confused. There's the same method in for unsorted maps in fastutil--in which sense "I am bound"?

I also believe you are missing some operation... don't you need to enumerate the keys in ascending order? Or, really, the only order-depending operation is finding the smallest key?

Ciao,

seba

Sebastiano Vigna

unread,

Dec 13, 2011, 1:35:08 AM12/13/11

to fast...@googlegroups.com

On Dec 12, 2011, at 11:53 PM, Earwin wrote:

> I asked for this feature on SortedMaps some (long) time ago, you asked
> me for a use case, I replied, and then there was silence.

Too many things happening...

> I used a SortedMap to keep track of overlapping intervals.
> Did .add(begin, +1) .add(end, -1) for each interval, and then I could
> calculate all kinds of funny things.

Now I remember. Actually, I had a look at the code, but it's much more complicated than with hash maps.

Which kind of funny things?

Ciao,

seba

Earwin

unread,

Dec 13, 2011, 5:45:42 PM12/13/11

to fastutil

On Dec 13, 10:35 am, Sebastiano Vigna <vi...@dsi.unimi.it> wrote:
> On Dec 12, 2011, at 11:53 PM, Earwin wrote:
>
> > I asked for this feature on SortedMaps some (long) time ago, you asked
> > me for a use case, I replied, and then there was silence.
>
> Too many things happening...

No, no, no need to excuse yourself. I'm thankful for what I got for
free already :) That wasn't an accusation, merely list of events.

> > I used a SortedMap to keep track of overlapping intervals.
> > Did .add(begin, +1) .add(end, -1) for each interval, and then I could
> > calculate all kinds of funny things.
>
> Now I remember. Actually, I had a look at the code, but it's much more complicated than with hash maps.
>
> Which kind of funny things?

Intersection of all randomly supplied intervals, sometimes with
nesting levels at exact points.
That was used in search engine match highlighting part. Maybe not the
best algo, but straightforward and performant enough.

> Ciao,
>
> seba

eran shaham

unread,

Dec 16, 2011, 9:55:49 AM12/16/11

to fast...@googlegroups.com

Sorry for the late reply. Too many tasks.
See my answer inlined

On Tue, Dec 13, 2011 at 08:34, Sebastiano Vigna <vi...@dsi.unimi.it> wrote:

On Dec 13, 2011, at 12:36 AM, eran shaham wrote:

> Very hard to say what the frequencies of increment() vs. getFirst() usage are (it is a random algorithm on random data).
> Nevertheless, considering the low overhead in API and code addition vs. the impact (my profiler is not happy with the current put(key, get(key)+1)) I think it is worth a consideration.
> Currently, I am bound to use adjustOrPutValue of Trove for non sorted maps.

Now I'm really confused. There's the same method in for unsorted maps in fastutil--in which sense "I am bound"?

I was using Trove as I was not aware of Int2IntOpenHashMap add(..) functionality

I also believe you are missing some operation... don't you need to enumerate the keys in ascending order? Or, really, the only order-depending operation is finding the smallest key?

Only need the smallest key.

You've probably did a performance comparison in the past between Trove and Fastutils.
In case you haven't, I wish to report on the comparison I did.
It is basically using a Int2IntOpenHashMap, with many calls to add(..), iterator(..) (of int2IntEntrySet() and values(), with their hasNext(), nextInt(), etc ...).
The results are that Fastutils is slightly faster than Trove.
The difference is ~1%, i.e., work done by fastutils in 100sec will take Trove 101sec

Before I press send, I wish to express my appreciation on your fine work and dedication to supply a high quality (code+performance) Java primitives.

Ciao,

seba

Best Regards,
Eran Shaham

Sebastiano Vigna

unread,

Dec 16, 2011, 10:16:48 AM12/16/11

to fast...@googlegroups.com

On Dec 16, 2011, at 3:55 PM, eran shaham wrote:

> I also believe you are missing some operation... don't you need to enumerate the keys in ascending order? Or, really, the only order-depending operation is finding the smallest key?
> Only need the smallest key.

In this case, I really think you're using the wrong data structures.

Keeping track of a minimum is easily done using a heap (e.g., a fastutil HeapPriorityQueue). Essentially, you keep track of your counters using a hash map. Whenever you add a new key, or you delete a key, you update the heap accordingly. So you can get the minimum in constant time. But all your updates will be in expected constant time, instead of logarithmic, except when you add or remove a key. Memory occupation will reduce by a factor of 10, as trees need nodes, and nodes are very expensive with respect to the small payload (an integer), in particular with 64-bit JVMs.

Ciao,

seba

eran shaham

unread,

Dec 16, 2011, 11:09:41 PM12/16/11

to fast...@googlegroups.com

Hi,

Interesting idea. Just one thing I'm missing (using fastutil IntHeapPriorityQueue):
How do I delete a key from the heap? I can only dequeue(..) the first element ...

In my application, I do many add(key) and remove(key), while, from time to time, I need the first(..).

THX,
e.

On Fri, Dec 16, 2011 at 17:16, Sebastiano Vigna <vi...@dsi.unimi.it> wrote:

HeapPriorityQueue

Sebastiano Vigna

unread,

Dec 17, 2011, 1:21:18 AM12/17/11

to fast...@googlegroups.com

On Dec 17, 2011, at 5:09 AM, eran shaham wrote:
> Interesting idea. Just one thing I'm missing (using fastutil IntHeapPriorityQueue):
> How do I delete a key from the heap? I can only dequeue(..) the first element ...
>
> In my application, I do many add(key) and remove(key), while, from time to time, I need the first(..).

Oops. Of course--you need to delete a key when its counter goes to zero, not when it is the top of the heap.

Just use an indirect heap. In an indirect heap, you can delete any key. The cost is an additional integer per entry (for the inversion map), but you'll be still be way better than the tree approach.

If your keys are not consecutive integer, however, you'll need to keep track of the bijection between the indices in the heap and the keys of the map. This will require some further space, unless you write a hand-made structure that does this (e.g., storing the index corresponding to a key and the counter in the upper and lower halves of a long).

Actually (just thinking while I write) probably the simplest approach at that point would be using just one map to keep track of a bijection keys <-> [0..n), where n is the number of keys. Counters can then be stored into an array, and the heap comes out naturally.

Ciao,

seba

Sebastiano Vigna

unread,

Dec 17, 2011, 4:23:37 AM12/17/11

to fast...@googlegroups.com

On Dec 17, 2011, at 7:21 AM, Sebastiano Vigna wrote:

> Actually (just thinking while I write) probably the simplest approach at that point would be using just one map to keep track of a bijection keys <-> [0..n), where n is the number of keys. Counters can then be stored into an array, and the heap comes out naturally.

Actually, I correct myself--this won't work. The other solution will.

In general, my gut feeling is that using an ordered set to keep track of a minimum is not a good idea, as keeping track of a minimum can be done with a heap, which uses much less memory than a tree. And if you update more frequently than you insert or delete, you're wasting time finding entries in a sorted set...