>
> Is there a plan to add a functionality of add( ) or adjustOrPutValue( ) for the general Map interface?
>
If I remember correctly, the last time we were discussing which were the use cases for such maps, but then the discussion got lost somewhere. Can we just restart? Which are the use cases for increasing the entry of a sorted map?
Ciao,
seba
> Hello,
>
> In my case, I am holding a map of values and their occurrences.
> On each iteration, values can be added (new or inc occurrences) or removed (dec occurrences or removed in case of occurrences=0).
> In addition, I need a fast way of checking what is the smallest value (getFirst()).
>
> Thus, a HashMap, will support O(1) of update, but O(N) of getFirst(),
> while TreeMap will support O(log(N)) of update and O(1) of getFirst().
Do you have an idea of the relative frequency of the two operations (increment/getFirst())?
Ciao,
seba
I asked for this feature on SortedMaps some (long) time ago, you asked
me for a use case, I replied, and then there was silence.
I used a SortedMap to keep track of overlapping intervals.
Did .add(begin, +1) .add(end, -1) for each interval, and then I could
calculate all kinds of funny things.
>
> Ciao,
>
> seba
> Very hard to say what the frequencies of increment() vs. getFirst() usage are (it is a random algorithm on random data).
> Nevertheless, considering the low overhead in API and code addition vs. the impact (my profiler is not happy with the current put(key, get(key)+1)) I think it is worth a consideration.
> Currently, I am bound to use adjustOrPutValue of Trove for non sorted maps.
Now I'm really confused. There's the same method in for unsorted maps in fastutil--in which sense "I am bound"?
I also believe you are missing some operation... don't you need to enumerate the keys in ascending order? Or, really, the only order-depending operation is finding the smallest key?
Ciao,
seba
> I asked for this feature on SortedMaps some (long) time ago, you asked
> me for a use case, I replied, and then there was silence.
Too many things happening...
> I used a SortedMap to keep track of overlapping intervals.
> Did .add(begin, +1) .add(end, -1) for each interval, and then I could
> calculate all kinds of funny things.
Now I remember. Actually, I had a look at the code, but it's much more complicated than with hash maps.
Which kind of funny things?
Ciao,
seba
> > I used a SortedMap to keep track of overlapping intervals.
> > Did .add(begin, +1) .add(end, -1) for each interval, and then I could
> > calculate all kinds of funny things.
>
> Now I remember. Actually, I had a look at the code, but it's much more complicated than with hash maps.
>
> Which kind of funny things?
Intersection of all randomly supplied intervals, sometimes with
nesting levels at exact points.
That was used in search engine match highlighting part. Maybe not the
best algo, but straightforward and performant enough.
> Ciao,
>
> seba
On Dec 13, 2011, at 12:36 AM, eran shaham wrote:Now I'm really confused. There's the same method in for unsorted maps in fastutil--in which sense "I am bound"?
> Very hard to say what the frequencies of increment() vs. getFirst() usage are (it is a random algorithm on random data).
> Nevertheless, considering the low overhead in API and code addition vs. the impact (my profiler is not happy with the current put(key, get(key)+1)) I think it is worth a consideration.
> Currently, I am bound to use adjustOrPutValue of Trove for non sorted maps.
I also believe you are missing some operation... don't you need to enumerate the keys in ascending order? Or, really, the only order-depending operation is finding the smallest key?
Ciao,
seba
> I also believe you are missing some operation... don't you need to enumerate the keys in ascending order? Or, really, the only order-depending operation is finding the smallest key?
> Only need the smallest key.
In this case, I really think you're using the wrong data structures.
Keeping track of a minimum is easily done using a heap (e.g., a fastutil HeapPriorityQueue). Essentially, you keep track of your counters using a hash map. Whenever you add a new key, or you delete a key, you update the heap accordingly. So you can get the minimum in constant time. But all your updates will be in expected constant time, instead of logarithmic, except when you add or remove a key. Memory occupation will reduce by a factor of 10, as trees need nodes, and nodes are very expensive with respect to the small payload (an integer), in particular with 64-bit JVMs.
Ciao,
seba
Oops. Of course--you need to delete a key when its counter goes to zero, not when it is the top of the heap.
Just use an indirect heap. In an indirect heap, you can delete any key. The cost is an additional integer per entry (for the inversion map), but you'll be still be way better than the tree approach.
If your keys are not consecutive integer, however, you'll need to keep track of the bijection between the indices in the heap and the keys of the map. This will require some further space, unless you write a hand-made structure that does this (e.g., storing the index corresponding to a key and the counter in the upper and lower halves of a long).
Actually (just thinking while I write) probably the simplest approach at that point would be using just one map to keep track of a bijection keys <-> [0..n), where n is the number of keys. Counters can then be stored into an array, and the heap comes out naturally.
Ciao,
seba
> Actually (just thinking while I write) probably the simplest approach at that point would be using just one map to keep track of a bijection keys <-> [0..n), where n is the number of keys. Counters can then be stored into an array, and the heap comes out naturally.
Actually, I correct myself--this won't work. The other solution will.
In general, my gut feeling is that using an ordered set to keep track of a minimum is not a good idea, as keeping track of a minimum can be done with a heap, which uses much less memory than a tree. And if you update more frequently than you insert or delete, you're wasting time finding entries in a sorted set...
Ciao,
seba