Puzzled by concurrency example in ‘Scala In Depth’

425 views
Skip to first unread message

Daniel

unread,
Jan 26, 2013, 1:28:27 PM1/26/13
to scala...@googlegroups.com
I am looking at the ImmutableService example on p. 32 (section 2.3.2 Concurrency) of the PDF version of Joshua Suereth’s book, ‘Scala in Depth’. I am almost certainly missing something obvious, so I’d appreciate someone enlightening me as to what that is :-) The code is as follows:

class ImmutableService[Key, Value] extends Service[Key, Value] {
var currentIndex = new ImmutableHashMap[Key,Value]
def lookUp(k : Key) : Option[Value] = currentIndex.get(k)
def insert(k : Key, v: Value) : Unit = synchronized {
currentIndex = currentIndex + ((k, v))
}
}

Consider two threads, each running on a separate processor. Thread A (running on CPU 1) calls lookUp. Thread B (running on CPU 2) then calls insert. Thread A (still running on CPU 1) then calls lookUp again.

I’m no expert in the Java Memory Model (JMM), but I can’t see that it provides any guarantee that thread A won’t see a stale version of currentIndex in its second call (e.g. if the previous value of currentIndex happens to be in the processor cache on CPU 1). As far as I understand it, Thread A is only guaranteed to see the latest value of currentIndex written by Thread B if Thread A synchronizes on the same monitor as did Thread B when the updated value of currentIndex was written.

Consequently, it seems to me that var currentIndex ought to be declared volatile for this example to work. Now, since Joshua Suereth is vastly more experienced than I am (my Java is rusty and I am new to Scala), there must be something I’m missing, and I’d like to know what that is -- any help would therefore be very much appreciated.

For more information, see ‘What does synchronization do?’ in the JSR 133 FAQ:


Specifically, that says:

"Important Note: Note that it is important for both threads to synchronize on the same monitor in order to set up the happens-before relationship properly. It is not the case that everything visible to thread A when it synchronizes on object X becomes visible to thread B after it synchronizes on object Y. The release and acquire have to "match" (i.e., be performed on the same monitor) to have the right semantics. Otherwise, the code has a data race."

The accepted answer on this Stack Overflow question also looks relevant:


Many thanks in advance. I look forward to enlightenment!

-- 
Daniel 

√iktor Ҡlang

unread,
Jan 26, 2013, 1:45:56 PM1/26/13
to Daniel, scala-user
Yes, you are right.

A safe version would need a @volatile on currentIndex OR synchronizing on reads as well.
Another option is to do a CAS loop on update and a fenced read (think AtomicReference) to avoid blocking.

Cheers,




-- 
Daniel 

--
 
 



--
Viktor Klang
Director of Engineering

Typesafe - The software stack for applications that scale
Twitter: @viktorklang

Daniel

unread,
Jan 26, 2013, 2:21:00 PM1/26/13
to scala...@googlegroups.com, Daniel
Thank you for the quick response, Viktor – much appreciated. It’s reassuring to know that I wasn’t going mad!

Josh Suereth

unread,
Jan 26, 2013, 2:21:12 PM1/26/13
to Viktor Klang, scala-user, Daniel

Not only are you right, but the example was chosen such that it does not matter.   I.e. a few stale search results isn't a huge deal in the context of that application.

A more detailed intro is already on the errata for the book, not sure that's been made public yet.

The key to the example is the fact that it's *consistent* to use the stale immutable index, while the mutable index forces you to lock, even if you don't mind staleness otherwise.

A lot of search indices lag live results in such a fashion, aiming for "eventual consistency".   The technique outlines how such a thing is possible using immutable data structures, but much harder with mutable ones.

Basically, not every concurrency problem devolves into "I must know exact state now".   Some allow previous *but consistent * state.

Daniel

unread,
Jan 26, 2013, 2:35:28 PM1/26/13
to scala...@googlegroups.com, Viktor Klang, Daniel
On Saturday, 26 January 2013 19:21:12 UTC, Josh Suereth wrote:

Not only are you right, but the example was chosen such that it does not matter.   I.e. a few stale search results isn't a huge deal in the context of that application.

A more detailed intro is already on the errata for the book, not sure that's been made public yet.

Thank you, Josh – that’s a helpful clarification. I do of course appreciate that the data accessed by my example Thread A will be consistent, even if it is stale. (I did first have to check that the reference assignment to var currentIndex was guaranteed to be atomic, however. It might therefore also be worth mentioning that in your revised introduction if you haven’t already, to save readers from having to go and check that for themselves.)
 

The key to the example is the fact that it's *consistent* to use the stale immutable index, while the mutable index forces you to lock, even if you don't mind staleness otherwise.

A lot of search indices lag live results in such a fashion, aiming for "eventual consistency".   The technique outlines how such a thing is possible using immutable data structures, but much harder with mutable ones.

Basically, not every concurrency problem devolves into "I must know exact state now".   Some allow previous *but consistent * state.

I understand, and I’m sure that your revised introduction will make that clear so that people don’t think this pattern is appropriate if up-to-dateness is required. Unfortunately, the present example code is being referenced without any such caveats, which may lead some unwary astray. e.g.:

  http://stackoverflow.com/a/6807324/702798

I’ll see whether I can add a link there back to this discussion.

Many thanks again, I really appreciate your responsiveness.

-- 
Daniel

Igor Skornyakov

unread,
Sep 9, 2013, 1:04:49 PM9/9/13
to scala...@googlegroups.com

Sorry, but I think that the problem is not only with "eventual consistency". Unless a map contract guarantees that adding a new key/value pair  and retrieval of a value by the key are protected by appropriate memory barriers (as is the case e.g. with ConcurrentHashMap) the code is still incorrect. and not only because of stale values - it is just a textbook example of incorrect hand-off. I believe that Scala immutable Maps do have such memory barriers, but this should at least be mentioned. The situation will look even more dramatic if you add a 'remove' operation to the service - in this case even "eventual consistency" will be not an excuse.
Generally speaking it is very naive to think about immutable data structures as of a silver bullet - after all they are used on machines with mutable RAM and of course manipulations with immutable structures do involve mutations behind the scene. When I use Java concurrent collections I feel safe because authors made all guarantees explicit.

Rüdiger Klaehn

unread,
Sep 10, 2013, 3:37:20 AM9/10/13
to Igor Skornyakov, scala-user
If an immutable data structure is written in the standard way, using case classes which use only final fields (e.g. Lists and binary trees), as far as I know the java memory model guarantees that your data structure will not become internally inconsistent and that basically everything works as expected. You are responsible to make sure that the (mutable) root of your tree is visible to other threads by appropriate means (volatile, synchronization, AtomicReference, Actors, whatever).

For "clever" data structures like Vector, HashSet, HashMap that use mutable arrays and fields inside to get a flat hierarchy, I would say that the implementer is responsible to make sure that the data structure behaves correctly from the outside even in a multithreading context, or to put a big fat disclaimer on it.

But I would feel better if the concurrency guarantees of complex immutable data structures like Vector and HashMap were explicitly stated in the docs.

--
You received this message because you are subscribed to the Google Groups "scala-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-user+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Alexandru Nedelcu

unread,
Sep 11, 2013, 3:26:00 AM9/11/13
to Rüdiger Klaehn, Igor Skornyakov, scala-user
On Tue, Sep 10, 2013 at 10:37 AM, Rüdiger Klaehn <rkl...@gmail.com> wrote:
> If an immutable data structure is written in the standard way, using case
> classes which use only final fields (e.g. Lists and binary trees), as far as
> I know the java memory model guarantees that your data structure will not
> become internally inconsistent and that basically everything works as
> expected.

This is true, JSR 133 guarantees that the values of final fields are
published as soon as the constructor finished.

But it goes further than this. Indirect references (like when storing
an array with references in your final field, or even nested arrays)
are also guaranteed to be visible. The JSR-133 FAQ [1] says something
like ... "the visible values for any other object or array referenced
by those final fields will be at least as up-to-date as the final
fields".

So even if you have nested arrays or classes containing mutable
values, those values will be at least as up to date as the publishing
of the final field, assuming no values escaped during construction.

This happens because when a thread first encounters a reference to
something, it is required to read it from RAM. After this initial
reading, the usual visibility rules apply, but if no references leaked
during construction (to invalidate this initial contact) then the
access to the final field is safe, even if it contains mutable
data-structures.

So basically, if you use "val" everywhere, you've got a guarantee from
JSR-133 that the implementation is indeed thread-safe.

[1] http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#finalRight


--
Alexandru Nedelcu
www.bionicspirit.com

Alexandru Nedelcu

unread,
Sep 11, 2013, 3:37:30 AM9/11/13
to Igor Skornyakov, scala-user
On Sat, Jan 26, 2013 at 7:28 PM, Daniel <daniel...@gmail.com> wrote:
>
> I am looking at the ImmutableService example on p. 32 (section 2.3.2
> Concurrency) of the PDF version of Joshua Suereth’s book, ‘Scala in Depth’.
> I am almost certainly missing something obvious, so I’d appreciate someone
> enlightening me as to what that is :-) The code is as follows:
>
> class ImmutableService[Key, Value] extends Service[Key, Value] {
> var currentIndex = new ImmutableHashMap[Key,Value]
> def lookUp(k : Key) : Option[Value] = currentIndex.get(k)
> def insert(k : Key, v: Value) : Unit = synchronized {
> currentIndex = currentIndex + ((k, v))
> }
> }

So that example is extremely flawed.

Just because the insert() is synchronized, it doesn't mean that the
result will be visible from other threads, because indeed the JMM only
guarantees visibility when the read is synchronized on the same lock.
It's a really bad example. Not only that, but you've got no guarantee
that the result will ever be published. It's a really bad example.

You get a much stronger visibility guarantee if you touch a @volatile,
as it creates a full memory fence. So writing to a volatile reference,
not only guarantees the visibility of that volatile, but also the
visibility of references that were modified before it by the same
thread. On the other hand volatiles are pretty useless if you also
want to do synchronization to ensure the atomicity of updates
(atomicity is a different problem than visibility).

Personally I prefer to use AtomicReferences for all the public mutable
fields I have. They are like volatile references, except you also get
compareAndSet which is extremely useful, especially in combination
with immutable data-structures.

--
Alexandru Nedelcu
www.bionicspirit.com

Rüdiger Klaehn

unread,
Sep 11, 2013, 3:46:36 AM9/11/13
to Alexandru Nedelcu, Igor Skornyakov, scala-user
Thanks for the link.

So something like HashMap and HashSet, which uses mutable arrays but only mutates them before construction of the actual object, is inherently safe. But it still wouldn't hurt to state this explicitly somewhere in the collections documentation. Also so that people that write their own immutable collections know what the expectations are.

Where I really would like some explicit guarantees is data structures like Vector. It uses a lot of mutable state that is modified after the object is created. For example there is a dirty flag in the class itself and some mutable display pointers in VectorPointer[T], which is mixed in. I don't see any @volatile or anything anywhere.

Now it is possible that the authors of Vector have thought through all the implications of these fields in a multithreaded context, but I would like to see an explicit statement that this is the case. Otherwise, there should be a huge disclaimer that Vector is not safe to use in a multithreaded context.

https://github.com/scala/scala/blob/v2.10.2/src/library/scala/collection/immutable/Vector.scala#L1

Alexandru Nedelcu

unread,
Sep 11, 2013, 12:31:06 PM9/11/13
to Rüdiger Klaehn, Igor Skornyakov, scala-user
On Wed, Sep 11, 2013 at 10:46 AM, Rüdiger Klaehn <rkl...@gmail.com> wrote:
> Where I really would like some explicit guarantees is data structures like
> Vector. It uses a lot of mutable state that is modified after the object is
> created. For example there is a dirty flag in the class itself and some
> mutable display pointers in VectorPointer[T], which is mixed in. I don't see
> any @volatile or anything anywhere.

This sounds pretty bad. This is the first time I take a look at
Vector's code and it's a pity that those private functions don't have
a one-line comment or something for their purpose, as I can't figure
out what that "dirty" flag is doing - it looks as some sort of
optimization for prepending into a Vector?

--
Alexandru Nedelcu
www.bionicspirit.com
Reply all
Reply to author
Forward
0 new messages