--
Alan Burlison
--
What I did in a similar situation was:
- gather the changes qualified in a mutable collection (addition/update versus deletions)
- optimize the changes (e.g. throw out additions/updates when a deletion follows, keep only the "latest" update)
- look at the size of the resulting changes
- when "small" - perform changes on the immutable collection directly
- when "big" - process everything in a mutable collection and transform into an immutable result
That was fast enough (for me) but it requires that you are able to gather the changes during your processing. If they have to be applied on the spot then this model isn't an option.
(In my situation it was a simulation over time so each immutable collection represented a point in time and the mutable buffers were calculation intermediates which got integrated into the t0 collection to yield the t0+1 collection).
Greetings
Bernd
-------- Original-Nachricht --------
> Datum: Tue, 20 Dec 2011 23:10:11 +0000
> Von: Alan Burlison <alan.b...@gmail.com>
> An: scala...@googlegroups.com
> Betreff: [scala-user] How to do an efficient update of an immutable map
> Oh, I see what you mean; you'll get an intermediate map between doing the
> additions and the removals. I've never seen a method for doing additions
> and deletions at the same time, and I imagine you've looked as well.
Yes, I've looked and I thought I might be missing something obvious,
which is why I asked :-) Thanks for the confirmation. Deletions will
probably be rare, so I can just gather them up in a list and avoid
creating the intermediate hash if there aren't any deletions to do.
--
Alan Burlison
--
> get your hands on the newbuilder.
What is that?
--
Alan Burlison
--
> As far as I know a map builder just contains a map. Bulk operations on maps
> are not implemented. I guess since map uses a (very flat ) tree internally,
> you could do mapa ++ mapb by merging the trees instead of just adding all
> elements of mapb to mapa sequentially.
I've been blundering around trying to find the code that actually
implements ++ and -- and got pretty lost. From what I can tell it's in
effect a copy & append operation, is that correct? For large maps with
a low number of changes being added that seems rather inefficient.
> A data structure that supports bulk union is IntMap/LongMap in
> scala.collection.immutable. But of course you can only have ints or longs
> as keys, and it is not as memory efficient as a normal map since it is
> basically a binary trie instead of a ~32way trie like the new maps.
>
> That said, Map is pretty efficient as it is. To the OP: Did you check if
> this is an actual performance problem?
Not yet, and in practice I can probably get away with a slightly
inefficient implementation. However I'm trying to understand the
tradeoffs between using a mutable map + locking or using an immutable
map that gets replaced on update - the use case is multithreaded access,
mostly read-only.
--
Alan Burlison
--
An immutable map doesn't get replaced wholesale. Most of the new map
uses the same data as the old map, with just a sprinkling of new data
structures adjusted for the key that was inserted or removed.
--
Daniel C. Sobral
I travel to the future all the time.
> An immutable map doesn't get replaced wholesale. Most of the new map
> uses the same data as the old map, with just a sprinkling of new data
> structures adjusted for the key that was inserted or removed.
Neat, thanks for the info. If I wanted to go get my head around the
implementation, where would be the best place to start looking?
--
Alan Burlison
--
I recommend writing a (immutable) self-balancing binary tree map, with accompanying zipper to achieve your operations efficiently.
From what I saw, it uses a TrieMap based on the hash (HashTrieMap),
plus some optimizations to provide good parallelism. This is a rather
complex structure, and I advise you to get a general grounding on
functional tries before tackling the implementation. Daniel Spiewak's
Extreme Cleverness presentation
(http://www.infoq.com/presentations/Functional-Data-Structures-in-Scala)
might get you started. It's very interesting anyway, and explains very
well the basic concepts of data structure reuse (persistent data
structures).
The whole implementation seems to be inside HashMap.scala, so you can
just look up the immutable HashMap class on Scaladoc, and then click
on the link to the source code.
You want a zipper. Unfortunately there isn't one in the standard Map
implementations. I hear there is one on the way in a third party
library, or you could write one yourself pretty easily. You want to
avoid mutability for efficiency reasons.
--
Tony Morris
http://tmorris.net/
I recommend writing a (immutable) self-balancing binary tree map, with accompanying zipper to achieve your operations efficiently.
--
Alan Burlison
--
> Here is a microbenchmark showing the difference of using merge vs. using ++:
Wow, that's a huge difference even with low numbers of additions.
Thanks for the info, and for a useful example of the SimpleBenchmark
stuff as well :-)
Perhaps a bug should be logged against this?
--
Alan Burlison
--
> Also, what is your case for using an immutable data structure? If it is
> simply to prevent unintended mutations, then you might do the following:
It's for multithreaded access. It's OK for threads to have a slightly
out-of-date version, but updates should be atomic. From reading the
Scala literature that would seem to be the classic case where you'd use
an immutable map, hence the question.
> 1) Use a mutable map, but that map is never passed out into the world; it
> is used only for the batch updates.
> 2) Once a batch update has been done, create an immutable map from the
> mutable map, and use that for all the data access.
The map will only have a low percentage of entries modified when it is
modified, so converting between immutable and mutable maps each time
seemed a little clumsy.
> You could actually wrap a mutable map inside a simple class that would
> force you to use it in a fairly functional manner. Call the class IMap, it
> has 3 operations; add, delete, and return immutable map. The first two
> return a new IMap containing the old mutable map, and set a flag marking
> the original IMap as invalid; after that, any operation on the original
> IMap throws an error. Of course, this isn't functional, what it really is
> is an ad hoc way of enforcing a uniqueness constraint such as provided by
> Concurrent Clean (I have to blow my own horn here--that was my idea, the CC
> folks were nice enough to like it and implement it), so in this case you're
> doing at runtime what CC can do at compile time. This isn't ideal, but it's
> simple, and should sharply reduce imperative-type errors.
>
> Of course, if you need different versions of the map active at the same
> time, this won't help.
Threads that have grabbed the old version can carry on using it, so yes
potentially there may be multiple versions active.
--
Alan Burlison
--
On 21/12/2011 22:07, Ken McDonald wrote:
It's for multithreaded access. It's OK for threads to have a slightly out-of-date version, but updates should be atomic. From reading the Scala literature that would seem to be the classic case where you'd use an immutable map, hence the question.Also, what is your case for using an immutable data structure? If it is
simply to prevent unintended mutations, then you might do the following:
The map will only have a low percentage of entries modified when it is modified, so converting between immutable and mutable maps each time seemed a little clumsy.
1) Use a mutable map, but that map is never passed out into the world; it
is used only for the batch updates.
2) Once a batch update has been done, create an immutable map from the
mutable map, and use that for all the data access.
On 21/12/2011 22:07, Ken McDonald wrote:
It's for multithreaded access. It's OK for threads to have a slightly out-of-date version, but updates should be atomic. From reading the Scala literature that would seem to be the classic case where you'd use an immutable map, hence the question.Also, what is your case for using an immutable data structure? If it is
simply to prevent unintended mutations, then you might do the following:
> If you have multithreaded mutation, remember to use an AtomicReference to
> hold the map, and do the necessary steps (CAS looping etc.).
The mutation will only be done by one thread, but I'll still probably
need to lock around the reads/writes of the hash reference.
--
Alan Burlison
--
Then you only need to use @volatile
>> The mutation will only be done by one thread, but I'll still probably need
>> to lock around the reads/writes of the hash reference.
Caching.
--
Alan Burlison
--
> Then you only need to use @volatile
Yeah, volatile would do. I keep forgetting that they fixed that in Java5.
--
Alan Burlison
--
> Sometimes the scala collections seem a bit unfinished.
>
> I looked at the code, and it seems that map does in fact support bulk
> operations. There is a function called merge in
> scala.collection.immutable.HashMap. But it is not being used when merging
> two maps with ++.
>
> Somebody went through all the trouble of implementing a bulk merge, but
> since 99.5% of all users of scala.collection.immutable.Map will use ++ to
> merge two maps, they will never benefit from it.
>
> Instead ++ just adds all elements of the second map sequentially to the
> first map. Now, that makes sense when the second argument of ++ is not
> actually a map, but if both arguments of ++ are maps the bulk merge
> function should be used!
In practice merge doesn't seem to be all that useful. It's only defined
on immutable.HashMap. If you build up the changes you want to merge in
a mutable.HashHap there's no obvious way to convert that to a
immutable.HashMap so you can pass it to merge - the closest method is
toMap but that gives you back a Map rather than a HashMap.
I have to echo the comment above about the collections feeling
unfinished - what I'm trying to do seems like an obvious use case, but
apparently there's no way to do it.
--
Alan Burlison
--
On 22/12/2011 00:14, Rüdiger Klaehn wrote:
In practice merge doesn't seem to be all that useful. It's only defined on immutable.HashMap. If you build up the changes you want to merge in a mutable.HashHap there's no obvious way to convert that to a immutable.HashMap so you can pass it to merge - the closest method is toMap but that gives you back a Map rather than a HashMap.Sometimes the scala collections seem a bit unfinished.
I looked at the code, and it seems that map does in fact support bulk
operations. There is a function called merge in
scala.collection.immutable.HashMap. But it is not being used when merging
two maps with ++.
Somebody went through all the trouble of implementing a bulk merge, but
since 99.5% of all users of scala.collection.immutable.Map will use ++ to
merge two maps, they will never benefit from it.
Instead ++ just adds all elements of the second map sequentially to the
first map. Now, that makes sense when the second argument of ++ is not
actually a map, but if both arguments of ++ are maps the bulk merge
function should be used!
I have to echo the comment above about the collections feeling unfinished - what I'm trying to do seems like an obvious use case, but apparently there's no way to do it.
> This is just a hunch, but I think that building an immutable hash map
> containing changes just so you can use merge instead of adding them one by
> one is probably not worth it. The time you save using merge instead of ++
> will instead be spent building the changes map. The only situation where
> this would make sense is if you want the time spent in the actual merge to
> be as short as possible because you have to hold a lock during the merge,
> but not during the building of the changes map.
I think you are almost certainly right. And I can build the new map
without locking and then just switch in the reference whilst holding
the lock.
--
Alan Burlison
--