multithreaded cache?

bugbear

unread,

May 15, 2012, 5:14:01 AM5/15/12

to

I'm using various Apache Commons Maps as a
multithread cache, protected using
ReentrantReadWriteLock, so that getting() uses a read lock,
and putting() uses a write lock.

But I've got an issue; in the
case of a cache miss (protected by a read lock),
the required value is acquired using the "underlying function"
that the cache is over; this value is then put() into
the cache (protected by a write lock)

This is all perfectly thread safe, and gives
correct results.

However, if the underlying function is slow
and/or resource hungry (consider cacheing
a ray traced image!) many threads can
end up calling the real function (second
and subsequent threads to the first get a miss
during the first threads call to the underlying function).

"obviously..." what I want is for only
the thread that FIRST has a cache miss
calls the underlying function, whilst other
threads (for the same key) wait.

This seems "obvious" enough that I'm guessing
there's a standard solution.

Googling led me to several "heavy" libraries;

This appears more a locking/cacheing issue
than a Java issue, although my implementation
is in Java.

Can anyone (please) point me at a
canonical solution/implementation?

BugBear

Silvio Bierman

unread,

May 15, 2012, 5:41:06 AM5/15/12

to

The simplest approach is to wrap your cached values in an object and
make all cache access go through two stages.

Stage one is to lookup the wrapper in the cache, possibly inserting a
new (empty) wrapper if not present.

Stage two is to get a lock on the wrapper to get the actual cache value.
If that is not present (or perhaps stale) then the thread owing the lock
on the wrapper can recompute the value.

If you cache values that are slow to compute this is usually the way to go.

Eric Sosman

unread,

May 15, 2012, 8:22:16 AM5/15/12

to

Don't know whether it's "canonical," but one fairly obvious
approach is to insert the missed key in the map immediately, but
with a value that means "Coming Soon To A Cache Near You." The
original thread can then release its read lock and fetch the true
value from its source.

Other threads that look for the same value will "hit" the
cache, but will recognize the "Coming Soon" as an indication to
wait until the first thread comes up with the value.

There must be many ways to arrange this, but one that requires
very little awareness on the part of the querying threads is to
make the hashed "value" a value-holder object. Then a querying
thread could do something like

mapLock.lockForReading();
ValueHolder holder = map.get(key);
mapLock.release();

if (holder == null) {
mapLock.lockForWriting();
holder = map.get(key); // inserted by other thread?
if (holder == null) {
holder = new ValueHolder(key);
map.put(key, holder);
}
mapLock.release();
}

Value = holder.getValue();

The ValueHolder class might look like

class ValueHolder {
private final Key key;
private Value cachedValue;
private SourceException fetchError;

ValueHolder(Key key) {
this.key = key;
}

synchronized
Value getValue() throws SourceException {
if (cachedValue == null) {
if (fetchError != null) {
// notify new thread of old error
throw fetchError;
}
try {
cachedValue = fetchValueFromSource(key);
} catch (SourceException ex) {
fetchError = ex;
throw fetchError;
}
}
return cachedValue;
}
}

(That's not the only way to deal with errors in fetching, but it
seems perfectly reasonable to me even if it's a little odd to throw
the same exception object multiple times.)

The cache-querying threads lock the map only briefly, long
enough to determine whether a ValueHolder is present. ValueHolder
can delay its callers for a long time (if not, why cache at all?),
but will only delay callers that are interested in that particular
key; callers looking for other keys will get different ValueHolder
instances and won't be interfered with. And only one thread will
actually attempt fetchValueFromSource() on any given key.

--
Eric Sosman
eso...@ieee-dot-org.invalid

Daniel Pitts

unread,

May 15, 2012, 12:56:39 PM5/15/12

to

What I think I would do in this situation is have the "write" operation
be fast. It would put in a "FutureTask<V>" object, which hasn't been run
yet.

The FutureTask<V> class would basically be a "lazy loader". The reason
to do this is to be able to release the lock on the Map, while still
blocking the return of the get until the value is ready.

public V get(K key) throw ExecutionException, InterruptedException {
readLock.lockInterruptibly();
try {
final Future<V> future = map.get(key);
if (future != null) { return future.getValue(); }
} finally {
readLock.unlock();
}
final FutureTask<V> future;

writeLock.lockInterruptibly();
try {
// We need to double check, to make sure no one else
// has added the future.
final FutureTask<V> cachedFuture = map.get(key);
if (cachedFuture != null) {
future = cachedFuture;
} else {
future = new FutureTask<V>(new
UnderlyingFunctionCallable<V>(key));
map.put(key, future);
}

} finally {
writeLock.unlock();
}
future.run();
return future.getValue();
}

Note, since the "get" and "put" operations should be fairly fast, a
read/write lock may be over-kill, and the whole thing could be simplified:

public V get(K key) throw ExecutionException, InterruptedException {
synchronize(mySync) {
final FutureTask<V> cachedFuture = map.get(key);
if (cachedFuture != null) {
future = cachedFuture;
} else {
future = new FutureTask<V>(new
UnderlyingFunctionCallable<V>(key));
map.put(key, future);
}

}
future.run();
return future.getValue();
}

This basically gives you "per key" synchronization, with the "whole map"
synch being only for an O(1) operation of "check for key, add place-hold
if absent".

Roedy Green

unread,

May 15, 2012, 6:57:14 PM5/15/12

to

On Tue, 15 May 2012 11:41:06 +0200, Silvio Bierman <sil...@moc.com>
wrote, quoted or indirectly quoted someone who said :

>The simplest approach is to wrap your cached values in an object and
>make all cache access go through two stages.

Where did you learn this?
--
Roedy Green Canadian Mind Products
http://mindprod.com
Programmers love to create simplified replacements for HTML.
They forget that the simplest language is the one you
already know. They also forget that their simple little
markup language will bit by bit become even more convoluted
and complicated than HTML because of the unplanned way it grows.
.

Daniel Pitts

unread,

May 15, 2012, 7:13:27 PM5/15/12

to

On 5/15/12 3:57 PM, Roedy Green wrote:
> On Tue, 15 May 2012 11:41:06 +0200, Silvio Bierman<sil...@moc.com>
> wrote, quoted or indirectly quoted someone who said :
>
>> The simplest approach is to wrap your cached values in an object and
>> make all cache access go through two stages.
>
> Where did you learn this?

I think he meant "an easy efficient approach", rather than "the simplest".

The simplest of course is a global lock for the cache.

Silvio Bierman

unread,

May 15, 2012, 7:19:07 PM5/15/12

to

On 05/16/2012 12:57 AM, Roedy Green wrote:
> On Tue, 15 May 2012 11:41:06 +0200, Silvio Bierman<sil...@moc.com>
> wrote, quoted or indirectly quoted someone who said :
>
>> The simplest approach is to wrap your cached values in an object and
>> make all cache access go through two stages.
>
> Where did you learn this?

Why do you ask? Do you disagree?

It is something I came up with the first time I encountered the same
situation, probably many years ago.
It is obvious you need to decouple key lookups/insertions, which are
inherently cache-global, from value insertions/updates, which could be
considered key-local.

Eric and Daniel responded with more specific solutions that both match
my general pattern.

Silvio Bierman

unread,

May 15, 2012, 7:22:23 PM5/15/12

to

Yep, with "simplest" I meant the simplest way to get around the
mentioned disadvantage of a global lock.

Eric Sosman

unread,

May 15, 2012, 8:26:45 PM5/15/12

to

Three independent votes for one pattern: What we have here is a

M A N D A T E !!!

--
Eric Sosman
eso...@ieee-dot-org.invalid

Arved Sandstrom

unread,

May 15, 2012, 8:41:06 PM5/15/12

to

Dunno about that, but I did go out and buy a lottery ticket...

AHS
--
Never interrupt your enemy when he is making a mistake.
--Napoleon

Silvio Bierman

unread,

May 16, 2012, 3:26:09 AM5/16/12

to

Let's hope this is inspires the Greeks...

bugbear

unread,

May 16, 2012, 9:33:32 AM5/16/12

to

bugbear wrote:
>
> Can anyone (please) point me at a
> canonical solution/implementation?

The answer is "yes" !!

Thanks to the members for some
great thoughts, analysis
and reasoned discussion.

BugBear

Roedy Green

unread,

May 17, 2012, 1:03:08 AM5/17/12

to

On Wed, 16 May 2012 01:19:07 +0200, Silvio Bierman <sil...@moc.com>

wrote, quoted or indirectly quoted someone who said :

>

>Why do you ask? Do you disagree?

No. I thought there were two possibilities. There is a some great
book you read or website that I should reference in the Java glossary,
or that you figured this out by exhaustive experiment, in which case I
should consider worship.

--
Roedy Green Canadian Mind Products
http://mindprod.com

Plants" with "leaves" no more efficient than today’s solar cells
could out-compete real plants, crowding the biosphere with an
inedible foliage. Tough omnivorous "bacteria" could out-compete
real bacteria: They could spread like blowing pollen, replicate
swiftly, and reduce the biosphere to dust in a matter of days.
Dangerous replicators could easily be too tough, small, and
rapidly spreading to stop -- at least if we make no preparation.
We have trouble enough controlling viruses and fruit flies.
~ Eric Drexler (born: 1955-04-25 age: 57)
Engines of Creation: The Coming Era of Nanotechnology.
.

Robert Klemme

unread,

May 17, 2012, 5:54:22 AM5/17/12

to

On 05/15/2012 11:14 AM, bugbear wrote:

> However, if the underlying function is slow
> and/or resource hungry (consider cacheing
> a ray traced image!) many threads can
> end up calling the real function (second
> and subsequent threads to the first get a miss
> during the first threads call to the underlying function).
>
> "obviously..." what I want is for only
> the thread that FIRST has a cache miss
> calls the underlying function, whilst other
> threads (for the same key) wait.

I provide a variant of Silvio's, Eric's and Daniel's solution which
should yield higher throughput because it works without read write
locking. You can find it as gist in case the code is garbled in the
newsgroup posting:
https://gist.github.com/2717818

Kind regards

robert

The code (untested):

package clj;

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
* The cache works with as few locking as possible. Lookup is done in
two steps
* on cache miss:
* <ol>
* <li>On a cache miss a retriever is inserted into the cache which
will obtain
* the value synchronized from a {@link Calculator}.</li>
* <li>Once calculation has finished a simple lock free reference to
the value
* replaces the retriever in the cache and the value is returned.</li>
* </ol>
*
* @author robert klemme
*
* @param <K>
* key type
* @param <V>
* value type
*/
public final class LazyCache<K, V> {
/**
* Calculate values from given keys.
*
* @param <K>
* key type
* @param <V>
* value type
*/
public interface Calculator<K, V> {
V get(K key);
}

/**
* Obtain a value.
*
* @param <V>
* value type.
*/
private interface Reference<V> {
V get();
}

/**
* Stupid simple reference which only hands out a fixed value all
the time
* without synchronization.
*
* @param <V>
* value type.
*/
private static final class Ref<V> implements Reference<V> {
private final V val;

public Ref(V val) {
this.val = val;
}

@Override
public V get() {
return val;
}
}

/** Mapping from keys to objects which yield values. */
private final ConcurrentMap<K, Reference<V>> map = new
ConcurrentHashMap<K, Reference<V>>();

/** User provided. */
private final Calculator<K, V> calc;

/**
* Create a cache.
*
* @param calc
* user must provide a reasonable implementation, not
* <code>null</code>.
*/
public LazyCache(final Calculator<K, V> calc) {
if (calc == null)
throw new NullPointerException();
this.calc = calc;
}

/**
* Get a value from the cache. The value might have to be calculated.
*
* @param key
* lookup key.
* @return value, might even be <code>null</code> depending on
algorithm.
*/
public V get(final K key) {
Reference<V> ref = map.get(key);

if (ref == null) {
// miss
ref = new Reference<V>() {
@Override
public synchronized V get() {
final V val = calc.get(key);
// next time lock free access:
Reference<V> x = map.put(key, new Ref<V>(val));
assert x == this;
return val;
}
};

final Reference<V> old = map.putIfAbsent(key, ref);

if (old != null)
ref = old; // someone else was faster
}

return ref.get();
}
}

Robert Klemme

unread,

May 17, 2012, 6:06:34 AM5/17/12

to

On 05/17/2012 11:54 AM, Robert Klemme wrote:
> On 05/15/2012 11:14 AM, bugbear wrote:
>> However, if the underlying function is slow
>> and/or resource hungry (consider cacheing
>> a ray traced image!) many threads can
>> end up calling the real function (second
>> and subsequent threads to the first get a miss
>> during the first threads call to the underlying function).
>>
>> "obviously..." what I want is for only
>> the thread that FIRST has a cache miss
>> calls the underlying function, whilst other
>> threads (for the same key) wait.
>
> I provide a variant of Silvio's, Eric's and Daniel's solution which
> should yield higher throughput because it works without read write
> locking. You can find it as gist in case the code is garbled in the
> newsgroup posting:
> https://gist.github.com/2717818

There was one important detail missing. This is the corrected code
(gist is updated as well):

private V val;
private boolean here = false; // superfluous but explicit

@Override
public synchronized V get() {

if (!here) {
val = calc.get(key);
here = true;

Daniel Pitts

unread,

May 17, 2012, 12:51:38 PM5/17/12

to

On 5/17/12 3:06 AM, Robert Klemme wrote:
> On 05/17/2012 11:54 AM, Robert Klemme wrote:
>> On 05/15/2012 11:14 AM, bugbear wrote:
>>> However, if the underlying function is slow
>>> and/or resource hungry (consider cacheing
>>> a ray traced image!) many threads can
>>> end up calling the real function (second
>>> and subsequent threads to the first get a miss
>>> during the first threads call to the underlying function).
>>>
>>> "obviously..." what I want is for only
>>> the thread that FIRST has a cache miss
>>> calls the underlying function, whilst other
>>> threads (for the same key) wait.
>>
>> I provide a variant of Silvio's, Eric's and Daniel's solution which
>> should yield higher throughput because it works without read write
>> locking. You can find it as gist in case the code is garbled in the
>> newsgroup posting:
>> https://gist.github.com/2717818
>
> There was one important detail missing. This is the corrected code (gist
> is updated as well):

[snip]

> if (!here) {
> val = calc.get(key);
> here = true;
> // next time lock free access:
> Reference<V> x = map.put(key, new Ref<V>(val));
> assert x == this;
> }

This assert (while true) is unnecessary and potentially harmful.
Nothing becomes inconsistent if the value you replace is not the
calculating reference. However, your assert may break at runtime if
something else changes about this code. Assert should be used to
validate consistency.

I like this class, though I have a couple of style comments.

I personally would name "Ref" as "ConstantReference", and your anonymous
Reference implementation I would make a private static class
CalculatingReference.

In my own personal library of utilities, I actually have two interfaces
"Factory<T>" with a "T create()" method and "ParameterizedFactory<T,P>"
with a "T create(P parameter)". Including a few interesting
implementations of those. In my case, I would probably implement this
cache as a ParameterizedFactory impl that takes another
ParameterizedFactory as the "calc" field.

Robert Klemme

unread,

May 17, 2012, 2:48:52 PM5/17/12

to

The algorithm works under the assumption that the first element placed
into the map is the one doing the calculation and only after that the
constant reference is inserted. If that is not the case (e.g. because
another calculator is put into the map) then obviously something must be
wrong and we might even have more than one lengthy calculation of the
cached value. This assumption is checked with the assert which also
serves as documentation.

> However, your assert may break at runtime if something else
> changes about this code. Assert should be used to validate consistency.

Well, if the logic is changed then of course asserts need to change,
too. There is nothing strange or harmful about that. With your
argument no asserts would make sense because they would need to be
changed if a class is changed. Note that internal class invariants can
change without changing the observable invariant. So even though a
class "looks" the same and behaves the same from the outside asserts
might have to be changed if internal logic of the class changes.

> I like this class, though I have a couple of style comments.

Thank you!

> I personally would name "Ref" as "ConstantReference",

Well, I was lazy typing and the class isn't public anyway. This is not
production code but an illustrating example for cljp.

> and your anonymous
> Reference implementation I would make a private static class
> CalculatingReference.

Why? You would need to either pass in a reference to LazyCache or pass
map and calc - plus declaring a constructor and type parameters. In
this case I don't really see the benefit.

> In my own personal library of utilities, I actually have two interfaces
> "Factory<T>" with a "T create()" method and "ParameterizedFactory<T,P>"
> with a "T create(P parameter)". Including a few interesting
> implementations of those. In my case, I would probably implement this
> cache as a ParameterizedFactory impl that takes another
> ParameterizedFactory as the "calc" field.

That could be done. But then method naming could be considered
misleading because the create() of the cache isn't really a create in
all cases.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Silvio Bierman

unread,

May 18, 2012, 2:42:26 PM5/18/12

to

On 05/17/2012 11:54 AM, Robert Klemme wrote:

I think you have as many locks as I suggested (being one)? My initial
implementations of something like this used a plain map with an extra
lock but later cases used the by then available ConcurrentHashMap as
well, making one lock redundant.

Silvio

Robert Klemme

unread,

May 18, 2012, 5:45:30 PM5/18/12

to

On 18.05.2012 20:42, Silvio Bierman wrote:
> On 05/17/2012 11:54 AM, Robert Klemme wrote:

>> I provide a variant of Silvio's, Eric's and Daniel's solution which
>> should yield higher throughput because it works without read write
>> locking. You can find it as gist in case the code is garbled in the
>> newsgroup posting:
>> https://gist.github.com/2717818

> I think you have as many locks as I suggested (being one)? My initial
> implementations of something like this used a plain map with an extra
> lock but later cases used the by then available ConcurrentHashMap as
> well, making one lock redundant.

You didn't show it here, did you? I can's seem to find it in the
thread. Note that CHM does also do synchronization. I am not sure from
your statement what exact locking scheme you apply. There does seem to
be one difference though: I my version the second lock goes away after
the value has been computed so there is only the sync of CHM left.

Eric Sosman

unread,

May 18, 2012, 6:31:10 PM5/18/12

to

On 5/18/2012 5:45 PM, Robert Klemme wrote:
> On 18.05.2012 20:42, Silvio Bierman wrote:
>> On 05/17/2012 11:54 AM, Robert Klemme wrote:
>
>>> I provide a variant of Silvio's, Eric's and Daniel's solution which
>>> should yield higher throughput because it works without read write
>>> locking. You can find it as gist in case the code is garbled in the
>>> newsgroup posting:
>>> https://gist.github.com/2717818
>
>> I think you have as many locks as I suggested (being one)? My initial
>> implementations of something like this used a plain map with an extra
>> lock but later cases used the by then available ConcurrentHashMap as
>> well, making one lock redundant.
>
> You didn't show it here, did you? I can's seem to find it in the thread.
> Note that CHM does also do synchronization. I am not sure from your
> statement what exact locking scheme you apply. There does seem to be one
> difference though: I my version the second lock goes away after the
> value has been computed so there is only the sync of CHM left.

It seems to me that if N threads query the same key at about
the same time, they may all miss the map and go off to perform
the slow computation. If "slow" is large compared to the cost of
a lock-release pair (and if it weren't, why cache?), the tradeoff
seems suspect.

Also, different threads may wind up using different value
instances. If the cache is purely a convenience for a value-only
object that may be all right, but it's not all right if the values
are supposed to be singletons.

Finally, there's more than a whiff of the double-checked locking
antipattern about what you're doing with the `here' flag. I'm not
absolutely sure what you've got is in fact DCL (hence, wrong), but
I'm also not absolutely sure it's DCL-free. Before using the pattern
in any important way, I'd want to check with a major-league guru,
just as "due diligence."

--
Eric Sosman
eso...@ieee-dot-org.invalid

Daniel Pitts

unread,

May 19, 2012, 1:15:19 AM5/19/12

to

On 5/18/12 3:31 PM, Eric Sosman wrote:
> On 5/18/2012 5:45 PM, Robert Klemme wrote:
>> On 18.05.2012 20:42, Silvio Bierman wrote:
>>> On 05/17/2012 11:54 AM, Robert Klemme wrote:
>>
>>>> I provide a variant of Silvio's, Eric's and Daniel's solution which
>>>> should yield higher throughput because it works without read write
>>>> locking. You can find it as gist in case the code is garbled in the
>>>> newsgroup posting:
>>>> https://gist.github.com/2717818
>>
>>> I think you have as many locks as I suggested (being one)? My initial
>>> implementations of something like this used a plain map with an extra
>>> lock but later cases used the by then available ConcurrentHashMap as
>>> well, making one lock redundant.
>>
>> You didn't show it here, did you? I can's seem to find it in the thread.
>> Note that CHM does also do synchronization. I am not sure from your
>> statement what exact locking scheme you apply. There does seem to be one
>> difference though: I my version the second lock goes away after the
>> value has been computed so there is only the sync of CHM left.
>
> It seems to me that if N threads query the same key at about
> the same time, they may all miss the map and go off to perform
> the slow computation.

Nope, they will all create a "Reference" object that *can* perform the
calculation, however because the method uses "putIfAbsent", only the
first calculating "Reference" object is actually ever used.

After that, the Reference.get() method will block until that particular
calculation is complete. Once that calculation is done, the calculating
Reference will replace itself with a constant Reference for future
accesses to use.

> If "slow" is large compared to the cost of
> a lock-release pair (and if it weren't, why cache?), the tradeoff
> seems suspect.

It would be suspect, but there isn't a tradeoff here. All threads lock
until one value is calculated, basically coalescing the requests for
that value.

>
> Also, different threads may wind up using different value
> instances. If the cache is purely a convenience for a value-only
> object that may be all right, but it's not all right if the values
> are supposed to be singletons.

Again, invalid if you recognize what is actually happening.

>
> Finally, there's more than a whiff of the double-checked locking
> antipattern about what you're doing with the `here' flag. I'm not
> absolutely sure what you've got is in fact DCL (hence, wrong), but
> I'm also not absolutely sure it's DCL-free. Before using the pattern
> in any important way, I'd want to check with a major-league guru,
> just as "due diligence."
>

"double-checked" locking is only broken if you don't have the
appropriate memory semantics enforced elsewhere. The reason the
"classical" double-check locking fails is because there could be a
data-race condition, where the reference is not null, but the object
state hasn't been flushed between threads.

In this case, the "first-check" of obtaining a value from the
ConcurrentMap causes happens-before relationships, and causes the object
state to flush.

Robert Klemme's code is correct, according to everything I've read in
JCIP and elsewhere.

Cheers,
Daniel.

Robert Klemme

unread,

May 19, 2012, 6:33:23 AM5/19/12

to

There is no double checked locking going on - neither in LazyCache
itself nor in the retriever instance.
http://en.wikipedia.org/wiki/Double-checked_locking

Daniel has explained all the details already. I'd just add that
creation of multiple retriever objects is the price you pay for
increased concurrency due to CHM. But only one of them is ever going to
be used per key. (Documenting this is one of the tasks of the assert
Daniel questioned earlier.)

I think rw-locking is an inferior technique compared to CHM in this case
because in case of cache miss you cannot escalate a r-lock to a w-lock
(to avoid deadlocks) and hence need to release the r-lock, acquire the
w-lock, *and check again*. In other words you have two more
synchronization points with memory barriers, scheduling effects etc.

In case of cache hit you might still suffer throughput because of thread
starvation caused by all lookups for values missing from the cache might
be blocked for indefinite time when using an unfair rw-lock. Using a
fair rw-lock on the other hand is more expensive and still has the
downside of a global lock which affects the complete range of keys.

CHM improves this by partitioning value space and only lock individual
partitions for atomic operations. These partitions are completely
independent from a locking perspective. That's the main advantage of CHM.

Eric Sosman

unread,

May 19, 2012, 7:09:31 AM5/19/12

to

On 5/19/2012 1:15 AM, Daniel Pitts wrote:
> On 5/18/12 3:31 PM, Eric Sosman wrote:
>> On 5/18/2012 5:45 PM, Robert Klemme wrote:

>>>[...]

>>> You didn't show it here, did you? I can's seem to find it in the thread.
>>> Note that CHM does also do synchronization. I am not sure from your
>>> statement what exact locking scheme you apply. There does seem to be one
>>> difference though: I my version the second lock goes away after the
>>> value has been computed so there is only the sync of CHM left.
>>
>> It seems to me that if N threads query the same key at about
>> the same time, they may all miss the map and go off to perform
>> the slow computation.
> Nope, they will all create a "Reference" object that *can* perform the
> calculation, however because the method uses "putIfAbsent", only the
> first calculating "Reference" object is actually ever used.

> [...]

Re-reading, I think you're right. "Never mind," as Emily
Litella used to say.

--
Eric Sosman
eso...@ieee-dot-org.invalid

Lew

unread,

May 19, 2012, 5:24:43 PM5/19/12

to

Robert Klemme wrote:
> Eric Sosman wrote:
>> Robert Klemme wrote:

First off, thank you for the very professional and elegant code.

I shall study it frequently.

I had experience with CHM on an Enterprise Java project a few years back that
involved processing documents up to 1GB or so at millions of
documents per hour.

As you can imagine, concurrency was a consideration there, but due to
bureaucracy was not properly managed for a few years. I was hired around the
time they started to pay attention to such issues.

The code involved a properly but naively synchronized Map at one point.
Detailed profiling revealed that lock contention for the Map was the number
one throughput chokepoint in the whole system. Even above database concurrency
and I/O.

Boy howdy, the pundits are right to recommend hard measurement.

Lock contention has a cascade effect. In modern JVMs, like IBM's
mainframe-level ones that we used, uncontended locks process quite quickly.
Contention introduces roadblocks that delay threads, allowing more to queue
up, causing more contention, slowing things down still more, causing more,
yada. It only takes one skunk in the middle of the main road to completely tie
up rush hour everywhere.

CHM by default partitions lock space (though I'm not clear it uses locks
exactly) into sixteen independent slices. This meant far more than sixteen
times faster for us. Writes tend to happen in a solitary thread without a
whole lot of fight while reads run like greased pigs through the other
fifteen. With our mix of reads and writes, and transaction volume, CHM pretty
near eliminated lock contention. YMMV, as always, but in this case that
chokepoint went from number one to off the list.

It was still the wrong solution, since a simple, effortless, non-concurrent
better one that would also have eliminated a raft of other problems was
available, but had no political traction. However, good enough to eliminate
the throughput impact was good enough, so I didn't raise a fuss when they
decided against it.

--
Lew
Honi soit qui mal y pense.
http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg

Sebastian

unread,

May 19, 2012, 7:35:30 PM5/19/12

to

Am 17.05.2012 12:06, schrieb Robert Klemme:
> On 05/17/2012 11:54 AM, Robert Klemme wrote:
>> On 05/15/2012 11:14 AM, bugbear wrote:
>>> However, if the underlying function is slow
>>> and/or resource hungry (consider cacheing
>>> a ray traced image!) many threads can
>>> end up calling the real function (second
>>> and subsequent threads to the first get a miss
>>> during the first threads call to the underlying function).
>>>
>>> "obviously..." what I want is for only
>>> the thread that FIRST has a cache miss
>>> calls the underlying function, whilst other
>>> threads (for the same key) wait.
>>
>> I provide a variant of Silvio's, Eric's and Daniel's solution which
>> should yield higher throughput because it works without read write
>> locking. You can find it as gist in case the code is garbled in the
>> newsgroup posting:
>> https://gist.github.com/2717818
>
> There was one important detail missing. This is the corrected code (gist
> is updated as well):
>

[snip]
The important detail obiously is the extra check with the "here" flag.
Would you mind explaining why that is necessary? Everyone seems to have
got it, but I must admit I haven't. Why is it not enough that
Reference#get() is synchronized? As you yourself have missed this at
your first attempt, I hope the reason isn't trivial...
-- Sebastian

Sebastian

unread,

May 19, 2012, 7:48:04 PM5/19/12

to

I think I've got it - the check prevents double calculation of the
value by two threads calling get() on that same reference instance.
Which may occur if putIfAbsent() returns a non-null value before the
"calculating reference" has had time to replace itself with the
constant reference. Is that right? -- Sebastian

Lew

unread,

May 19, 2012, 9:11:29 PM5/19/12

to

Sebastian wrote:
> schrieb Sebastian:

>> The important detail obiously is the extra check with the "here" flag.
>> Would you mind explaining why that is necessary? Everyone seems to have
>> got it, but I must admit I haven't. Why is it not enough that
>> Reference#get() is synchronized? As you yourself have missed this at
>> your first attempt, I hope the reason isn't trivial...
>>

> I think I've got it - the check prevents double calculation of the value by
> two threads calling get() on that same reference instance.
> Which may occur if putIfAbsent() returns a non-null value before the
> "calculating reference" has had time to replace itself with the
> constant reference. Is that right? -- Sebastian

Yep.

https://gist.github.com/2717818

============
LazyCache
============

public V get(final K key) {
Reference<V> ref = map.get(key);

if (ref == null) {
// miss
ref = new Reference<V>() {
private V val;
private boolean here = false; // superfluous but explicit

@Override
public synchronized V get() {
if (!here) {
val = calc.get(key);
here = true;
// next time lock free access:
Reference<V> x = map.put(key, new Ref<V>(val));
assert x == this;
}

return val;
}
};

final Reference<V> old = map.putIfAbsent(key, ref);

if (old != null)
ref = old; // someone else was faster
}

return ref.get();
}

============

The first cluster of requestors to 'LazyCache#get(K key)' enter the
'if (ref == null) {' block.

They all construct a new anonymous-type instance of 'Reference<V>'.

The first one to hit 'putIfAbsent()' wins. The rest replace their
painstakingly - actually very, very quickly - contructed anonymous-type
'Reference<V>' instances with the anonymous-type instance the first client put
into the 'Map'.

So only one 'Reference<V>' comes back from all the clients that first hit the
map and saw 'null'.

It is of the anonymous type.

The purpose of the 'here' variable is for the anonymous-type instance. That
one-and-only possible instance can tell that it has never been 'here' before,
i.e., inside the 'get()' method.

The next batch of 'LazyCache' clients hit the map when that anonymous-type
instance is still in there.

They all call 'get()' on that anonymous-type instance. First one sees 'here'
is still false. Up until now, no one has called the expensive operation
inside of 'calc'. No matter how many threads hit the map when the value was
'null', and no matter how many of them hit it when the map contained the
anonymous-type instance, all those threads have to wait until that one and
only instance calls 'get()' in turn, and the first one of that whole cluster
is the only one to see '!here'.

It calls the expensive operation and replaces its own internal value with the
newly-calculated one.

In addition, it replaces itself in the map with a simple, final-value holder.

All the threads currently blocked on the anonymous-type instance 'get()' will
see the stashed value.

None of them will call the expensive operation again.

Any clients thereafter will hit the 'LazyCache' map and find only the simple,
final-value named-type instance, whose 'get()' returns the pre-calculated
value in a completely thread-safe manner without any more locking.

Once the named-type instance is in the map, and all threads that found the
anonymous one finish their 'get()' calls, the anonymous-type instance is
eligible for garbage collection, thus letting go of an extra reference to the
calculator. I've chased that rabbit down the hole a bit without finding a real
danger to the circular reference in this scenario, but it's never bad to break
reference-graph cycles. They are often a major bug breeding ground, especially
on resource managers and types with non-trivial 'finalize()' methods.

Again, while here the lifetime of the calculator is that of the 'LazyCache'
instance, it's good practice to decouple value clients from their producers'
finalizers. It simplifies memory management. See Josh Bloch's /Effective Java/
for details on finalizers.

You get tremendous results from the twin engines of scope and lifetime management.

Silvio Bierman

unread,

May 21, 2012, 9:21:46 AM5/21/12

to

Actually, I didn't, and I did mention the word "lock" alongside the
wrapper object.

Since the only code I have at hand is a Scala version there was little
point in posting it. It uses a Scala "lazy val" in the wrapper object
which translates to a compiler generated DCL (assuming 1.5+). So the
only lock that would remain would be the CHM one.

I vaguely remember a DCL in one of the more recent Java versions I
coughed up.

Silvio Bierman

unread,

May 21, 2012, 9:25:11 AM5/21/12

to

On 05/17/2012 07:03 AM, Roedy Green wrote:
> On Wed, 16 May 2012 01:19:07 +0200, Silvio Bierman<sil...@moc.com>
> wrote, quoted or indirectly quoted someone who said :
>
>>
>> Why do you ask? Do you disagree?
>
> No. I thought there were two possibilities. There is a some great
> book you read or website that I should reference in the Java glossary,
> or that you figured this out by exhaustive experiment, in which case I
> should consider worship.

Actually, exhaustive experiment would be a huge overstatement. It was
the second thing that came up after I immediately discarded the first
impulse of a global lock.

Some people would call that premature optimization...

Robert Klemme

unread,

May 21, 2012, 3:15:57 PM5/21/12

to

On 19.05.2012 23:24, Lew wrote:

> First off, thank you for the very professional and elegant code.
>
> I shall study it frequently.

Thank you!

Thanks for sharing that story. What I find amazing about this is that
what you did isn't exactly rocket science and yet they didn't do it
before. You would guess that it's just what every engineer would do but
no: something prevents this from happening.

And it is true for a number of other techniques I would consider bread
and butter tools:

- Ensuring requirements are gathered properly and understood before
starting to code (and design of course).

- Testing code _before_ shipping.

- When writing unit tests, making sure to also include tests for
critical values (usually corner cases such as -1, 0, 1, limit, limit -
1, limit + 1, null, "", etc.).

- Thinking about the person who must use what you produce, regardless
whether it's a document, a configuration file layout, a DSL, a class, a
library. It seems many people in software development are far more
concerned with the inner workings of what they create instead of
considering how it will be used. Maybe it's easier or it is because
making it work takes the largest part of coding - still the outcome
often is dissatisfying and a little more thought upfront goes a long way
at avoiding maintenance headaches and worse things.

...

This can't be so difficult, can it?

Lew

unread,

May 24, 2012, 2:11:20 AM5/24/12

to

Robert Klemme wrote:
> Thanks for sharing that story. What I find amazing about this is that what you
> did isn't exactly rocket science and yet they didn't do it before. You would
> guess that it's just what every engineer would do but no: something prevents
> this from happening.

It was worse.

Turns out the resource acquisition managed by the inter-thread map coulda
woulda shoulda been handled better by an object instantiated with a simple new
..() call.

The returned reference and access to the object would perforce have been
limited to the instantiating thread. Any other thread woulda shoulda coulda
had its own independent resource, object, reference, whatever. Sharing was not
needed.

Loosely speaking -

Desirable desire = getFromMap(FANCY_CAR);
// map is shared, with fancy-schmancy synch

vs.

Desirable desire = new FancyCar();
// Map? Meshugge! I'm not sharing!

> And it is true for a number of other techniques I would consider bread and
> butter tools:
>
> - Ensuring requirements are gathered properly and understood before starting
> to code (and design of course).

True, with provisos.

= Requirements are not fixed. What you should have now is the best available
for now.

= "Properly"? "Understood"? There are books on this, but it boils down to:

Your software will be used. By whom? Why? What would annoy them? Don't do
that! What would make them forget the software is there and just get their job
done? Do that!

"Proper" is always from the user's standpoint. Not what they say, by the way,
but what they think and feel - really. Behind the bullshit and self-deception,
when they aren't posturing for your benefit and really are just trying to get
their work done. And that's where "understood" comes in. Give'em what they
want, not necessarily what they ask for.

> - Testing code _before_ shipping.

Some espouse before even writing it. I say, sure, great idea, when you make it
easy to express my ideas in tests instead of imperative code.

Hmm.

BTW, this being Java, have you guys tried EasyMock? (AndroidMock in Android
Java.) That shtuff rocks. It makes your very code better, not just tested but
idiomatically. Tricky stuff. I'll post about that when I get back into it.
I've discovered a Right Way and a Wrong Way to do mocks, just like I did with JPA.

> - When writing unit tests, making sure to also include tests for critical
> values (usually corner cases such as -1, 0, 1, limit, limit - 1, limit + 1,
> null, "", etc.).

Consider this. No matter what input a module receives, it must achieve a valid
program state in response.

No matter what.

Coding to this formalism means, for example, that a method can handle all
possible argument values.

public interface Caromer
{
public Fromitz carom(Flamo flame);
}

As Herr Klemme points out, 'null' is one possible input for 'flame'.
Non-'null' is another. Presumably a 'Flamo' has observable state that
distinguishes input values. If the type were 'Integer' or 'int', Robert's
example int values might pertain.

But you cannot give infinite test cases to a unit test. So you pick edge
cases, on both the "happy path" (positive) and "unhappy path" (negative)
scenarios. Consider this:

public interface Quadrupler
{
public int quadruple(int val);
}

What if it were called with an auto-unboxed Integer?

public int someClient(Integer val)
{
Quadrupler quadrupler = QuadFactory.get();
Integer quadded = quadrupler.quadruple(val);

}

> - Thinking about the person who must use what you produce, regardless whether
> it's a document, a configuration file layout, a DSL, a class, a library. It
> seems many people in software development are far more concerned with the
> inner workings of what they create instead of considering how it will be used.
> Maybe it's easier or it is because making it work takes the largest part of
> coding - still the outcome often is dissatisfying and a little more thought
> upfront goes a long way at avoiding maintenance headaches and worse things.
>
> ...
>
> This can't be so difficult, can it?

You have a fine sense of humor.

It's as easy as empathy, and getting inside the mind of a professional in a
field of which you know little to nothing.

Testing certainly helps. Functional tests (as opposed to unit tests) can and
should be part of and produced from requirements. At the user-facing level you
can send test scenarios back as memos of understanding (MoUs) - what better
way to make clear what you understand and how it will play out?

Prototypes count, too. They can meet tests, even. Building a prototype that
meets tests, starting within a day or so of starting, kicks ass for converging
on clarity. Keep it up - every day the prototype runs, works (up to whatever
you've done that day), can be checked out and run in any directory, by any
developer, and run, etc.

Add a CI tool like Jenkins to run your tests every time you "git commit" to
"origin/master" (or whatever); you got yourself a one-entity professional shop.

<http://www.junit.org/>
<http://www.easymock.org/>
<http://code.google.com/p/android-mock/>
<http://jenkins-ci.org/>

Robert Klemme

unread,

May 24, 2012, 5:05:19 PM5/24/12

to

On 24.05.2012 08:11, Lew wrote:

> Robert Klemme wrote:

>> And it is true for a number of other techniques I would consider bread
>> and
>> butter tools:
>>
>> - Ensuring requirements are gathered properly and understood before
>> starting
>> to code (and design of course).
>
> True, with provisos.
>
> = Requirements are not fixed. What you should have now is the best
> available for now.

But there is a minimum standard: there must be enough information to at
least get an idea of what is to be done. I have seen three sentence
requirements for a feature in a complex system which left engineers
wondering what they are supposed to do.

> = "Properly"? "Understood"? There are books on this, but it boils down to:
>
> Your software will be used. By whom? Why? What would annoy them? Don't
> do that! What would make them forget the software is there and just get
> their job done? Do that!

!

> "Proper" is always from the user's standpoint. Not what they say, by the
> way, but what they think and feel - really. Behind the bullshit and
> self-deception, when they aren't posturing for your benefit and really
> are just trying to get their work done. And that's where "understood"
> comes in. Give'em what they want, not necessarily what they ask for.

Pretty early in my career I had a quite instructive experience: we were
building a web catalog for a business customer with user management, a
lot of technical data about their products - and of course a link to
their ERP system. They wanted to have a price calculation in that
system which could take several routes depending on the user. They
explained it, I wrote a document which also included an UML activity
diagram of the flow with explanations. We sent it in. They rejected:
"No, it must be completely different." So we sat together. End of the
story: it was exactly what they wanted. So they either did not
understood the document and didn't want to ask - or their own
requirements. :-)

>> - Testing code _before_ shipping.
>
> Some espouse before even writing it. I say, sure, great idea, when you
> make it easy to express my ideas in tests instead of imperative code.
>
> Hmm.
>
> BTW, this being Java, have you guys tried EasyMock? (AndroidMock in
> Android Java.) That shtuff rocks. It makes your very code better, not
> just tested but idiomatically. Tricky stuff. I'll post about that when I
> get back into it. I've discovered a Right Way and a Wrong Way to do
> mocks, just like I did with JPA.

Not yet, thank you for the pointer!

>> - When writing unit tests, making sure to also include tests for critical
>> values (usually corner cases such as -1, 0, 1, limit, limit - 1, limit
>> + 1,
>> null, "", etc.).
>
> Consider this. No matter what input a module receives, it must achieve a
> valid program state in response.
>
> No matter what.

Absolutely!

> Coding to this formalism means, for example, that a method can handle
> all possible argument values.
>
> public interface Caromer
> {
> public Fromitz carom(Flamo flame);
> }
>
> As Herr Klemme points out, 'null' is one possible input for 'flame'.
> Non-'null' is another. Presumably a 'Flamo' has observable state that
> distinguishes input values. If the type were 'Integer' or 'int',
> Robert's example int values might pertain.
>
> But you cannot give infinite test cases to a unit test. So you pick edge
> cases, on both the "happy path" (positive) and "unhappy path" (negative)
> scenarios.

Exactly.

>> - Thinking about the person who must use what you produce, regardless
>> whether
>> it's a document, a configuration file layout, a DSL, a class, a
>> library. It
>> seems many people in software development are far more concerned with the
>> inner workings of what they create instead of considering how it will
>> be used.
>> Maybe it's easier or it is because making it work takes the largest
>> part of
>> coding - still the outcome often is dissatisfying and a little more
>> thought
>> upfront goes a long way at avoiding maintenance headaches and worse
>> things.
>>
>> ...
>>
>> This can't be so difficult, can it?
>
> You have a fine sense of humor.

Well, the fun disappears when you see the same mistakes made over and
over again and attempts to improve it do not have effects...

> It's as easy as empathy, and getting inside the mind of a professional
> in a field of which you know little to nothing.

I agree for library API design: that takes a bit of experience to get
right. But when it comes to writing documents I think the rules for
readability are quite straight forward (some universities even have them
in their curriculum): do not start with the details but give the reader
an overview first. Give previews and summaries. Generally watch
document structure. Do not put too much in a single diagram. Things
like that.

> Testing certainly helps. Functional tests (as opposed to unit tests) can
> and should be part of and produced from requirements. At the user-facing
> level you can send test scenarios back as memos of understanding (MoUs)
> - what better way to make clear what you understand and how it will play
> out?

Good point.

> Prototypes count, too. They can meet tests, even. Building a prototype
> that meets tests, starting within a day or so of starting, kicks ass for
> converging on clarity. Keep it up - every day the prototype runs, works
> (up to whatever you've done that day), can be checked out and run in any
> directory, by any developer, and run, etc.

And prototypes are a great way to test whether ideas work with low
effort before basing a whole design on an idea.

> Add a CI tool like Jenkins to run your tests every time you "git commit"
> to "origin/master" (or whatever); you got yourself a one-entity
> professional shop.

Jenkins is great. We use it. Broken tests are quickly discovered. :-)

Kind regards

robert

PS: Hey, you managed to sneak in a German word. :-)

Gene Wirchenko

unread,

May 24, 2012, 11:11:04 PM5/24/12

to

On Thu, 24 May 2012 23:05:19 +0200, Robert Klemme
<short...@googlemail.com> wrote:

[snip]

>Pretty early in my career I had a quite instructive experience: we were
>building a web catalog for a business customer with user management, a
>lot of technical data about their products - and of course a link to
>their ERP system. They wanted to have a price calculation in that
>system which could take several routes depending on the user. They
>explained it, I wrote a document which also included an UML activity
>diagram of the flow with explanations. We sent it in. They rejected:
>"No, it must be completely different." So we sat together. End of the
>story: it was exactly what they wanted. So they either did not
>understood the document and didn't want to ask - or their own
>requirements. :-)

^^^
What is this?

(I think you called it right, no joking.)

A UML diagram might as well be code. It is something of ours.
Why expect them to be able to read it?

As to possibly not knowing their own requirements, that is quite
possible. Many a person knows how to do his job as it is currently
configured. That does not mean that he knows how to discuss the whys
and wherefores of it.

[snip]

Sincerely,

Gene Wirchenko

Kevin McMurtrie

unread,

May 25, 2012, 2:22:37 AM5/25/12

to

In article <zsWdnVufD87Egy_S...@brightview.co.uk>,
bugbear <bugbear@trim_papermule.co.uk_trim> wrote:

> I'm using various Apache Commons Maps as a
> multithread cache, protected using
> ReentrantReadWriteLock, so that getting() uses a read lock,
> and putting() uses a write lock.
>
> But I've got an issue; in the
> case of a cache miss (protected by a read lock),
> the required value is acquired using the "underlying function"
> that the cache is over; this value is then put() into
> the cache (protected by a write lock)
>
> This is all perfectly thread safe, and gives
> correct results.

>
> However, if the underlying function is slow
> and/or resource hungry (consider cacheing
> a ray traced image!) many threads can
> end up calling the real function (second
> and subsequent threads to the first get a miss
> during the first threads call to the underlying function).
>
> "obviously..." what I want is for only
> the thread that FIRST has a cache miss
> calls the underlying function, whilst other
> threads (for the same key) wait.
>

> This seems "obvious" enough that I'm guessing
> there's a standard solution.
>
> Googling led me to several "heavy" libraries;
>
> This appears more a locking/cacheing issue
> than a Java issue, although my implementation
> is in Java.

>
> Can anyone (please) point me at a
> canonical solution/implementation?
>

> BugBear

Ha, this is an interview question that I use.

What you need is row level locking for the cache load.

Step 1)
Use synchronized operations to map your key to a value; creating an
uninitialized value in the map if needed. Use whatever tech you want.
A synchronized block on a HashMap is simplest and performs the fastest
on 1 or 2 core systems. A ConcurrentHashMap sometimes performs better
with 4+ core systems.

Step 2)
Synchronize on the value. Initialize it if needed.

Step 1 blocks all cache access for only for a very short moment to make
sure that a key always has a value. Step 2 blocks access independently
for each cache value to make sure that it is loaded. It will perform
well for continuous use by several CPU cores. Google has some high
concurrency Maps that aren't too bad either.

In the 16 core range you'll find that any kind of exclusive lock causes
stalls where threads suspend while holding locks, causing a backlog that
reinforces itself. A concurrency expert can fix that using complex
Compare-And-Swap designs. In the 32+ core range you'll sometimes find
that harmless race conditions are better than multiple threads
frequently writing to the same memory page.
--
I will not see posts from Google because I must filter them as spam

Arved Sandstrom

unread,

May 25, 2012, 6:09:14 AM5/25/12

to

On 12-05-21 04:15 PM, Robert Klemme wrote:
>
[ SNIP ]>

> And it is true for a number of other techniques I would consider bread
> and butter tools:
>
> - Ensuring requirements are gathered properly and understood before
> starting to code (and design of course).

I want to blame an improper understanding of agile and lean and
iterative here, but I won't. On the other hand, I do think that a
widespread kneejerk rejection of waterfall has also led, in many cases,
to people rejecting formal requirements analysis and intensive BA work.
Maybe they've read enough about agile to think that they'll find out
everything they need to find out when it's time to build Slice 3, and
it's OK to find out about Slice 3 requirements then, not upfront.

My philosophy - perhaps old-school - is based on these observations over
the years:

1) In the vast majority of projects of all sizes, the large majority of
requirements were known and identifiable at the beginning;

2) Corollary to #1: the large majority of requirements do not change
over the lifetime of a project, of any size. They may be identified
late, or not at all (before release), but they were always knowable;

3) An impression of changing requirements is almost always due to
imprecise stakeholder/BA/architect/designer interactions and
conversations. If you get one answer in April and a different one in
October, it's either down to who you are asking or what your questions
are, or both. The actual final "real" requirement probably never changed.

As I say, these are my observations. YMMV. In my case I do substantially
more requirements analysis up front than maybe other folks do. It has
served me well. I also do a lot of prototyping and wire-framing and
proofs-of-concept/technology, all of which in some developer camps seem
to have been relegated.

> - Testing code _before_ shipping.
>
> - When writing unit tests, making sure to also include tests for
> critical values (usually corner cases such as -1, 0, 1, limit, limit -
> 1, limit + 1, null, "", etc.).

Testing is a different art. There is a lot of truth to the saying that a
good tester is a mediocre coder, and vice versa. At a minimum, if a
single person does have the expertise to do both, they need to be
schizophrenic.

Although I've not read anything about it, I've occasionally thought that
by generally accepted testing principles, that a person who writes unit
tests (just like any other tests) cannot, and should not, be the
developer. And yet it's the developer who invariably writes unit tests
for his own code. Food for thought.

> - Thinking about the person who must use what you produce, regardless
> whether it's a document, a configuration file layout, a DSL, a class, a
> library. It seems many people in software development are far more
> concerned with the inner workings of what they create instead of
> considering how it will be used. Maybe it's easier or it is because
> making it work takes the largest part of coding - still the outcome
> often is dissatisfying and a little more thought upfront goes a long way
> at avoiding maintenance headaches and worse things.
>
> ...

See my thoughts about prototypes, wireframes, Pocs/PoTs above. Using
those methods I've saved a lot of time over the years. All of them show
the end-user a good impression of *their* final interface. Which is what
matters.

>
> This can't be so difficult, can it?
>
> Kind regards
>
> robert
>
>

Robert Klemme

unread,

May 25, 2012, 11:36:08 AM5/25/12

to

On 25.05.2012 08:22, Kevin McMurtrie wrote:
> Ha, this is an interview question that I use.
>
> What you need is row level locking for the cache load.

But not all the time. See https://gist.github.com/2717818

> Step 1)
> Use synchronized operations to map your key to a value; creating an
> uninitialized value in the map if needed. Use whatever tech you want.
> A synchronized block on a HashMap is simplest and performs the fastest
> on 1 or 2 core systems. A ConcurrentHashMap sometimes performs better
> with 4+ core systems.

In my experience a CHM performs better even on a 1 or 2 core system.

> Step 2)
> Synchronize on the value. Initialize it if needed.
>
>
> Step 1 blocks all cache access for only for a very short moment to make
> sure that a key always has a value. Step 2 blocks access independently
> for each cache value to make sure that it is loaded. It will perform
> well for continuous use by several CPU cores. Google has some high
> concurrency Maps that aren't too bad either.

Actually once the value is in the cache you do not need any step 2
synchronization any more.

> In the 16 core range you'll find that any kind of exclusive lock causes
> stalls where threads suspend while holding locks, causing a backlog that
> reinforces itself. A concurrency expert can fix that using complex
> Compare-And-Swap designs.

Basically this is what CHM does with putIfAbsent() internally.

Kind regards

robert

Robert Klemme

unread,

May 25, 2012, 11:42:35 AM5/25/12

to

On 25.05.2012 05:11, Gene Wirchenko wrote:
> A UML diagram might as well be code. It is something of ours.
> Why expect them to be able to read it?

I did not expect them to read it out of the box. But it was embedded in
a document and accompanied with explanation of the process. Also, if
they did not understand it, my expectation would have been for them to
ask. But they simply rejected it as if they had understood it. It
always takes two sides for successful communication - in this case we
luckily resolved it but I think generally this communication pattern is
more likely to produce errors than success.

> As to possibly not knowing their own requirements, that is quite
> possible. Many a person knows how to do his job as it is currently
> configured. That does not mean that he knows how to discuss the whys
> and wherefores of it.

That is true. Doing and reasoning about what one does are really two
things. However, in this case they actually _had_ described their
requirements - and even in a way that I was capable of coming up with
the proper algorithm. This means a certain amount of thought must have
been done already. They just did not recognize it when they saw the
result and for an unknown reason immediately jumped to a conclusion.

Kind regards

robert

Mike Schilling

unread,

May 26, 2012, 5:54:55 PM5/26/12

to

"Silvio Bierman" <sil...@moc.com> wrote in message
news:4fb35691$0$6944$e4fe...@news2.news.xs4all.nl...
> On 05/16/2012 02:26 AM, Eric Sosman wrote:
>> On 5/15/2012 7:19 PM, Silvio Bierman wrote:
>>> On 05/16/2012 12:57 AM, Roedy Green wrote:
>>>> On Tue, 15 May 2012 11:41:06 +0200, Silvio Bierman<sil...@moc.com>

>>>> wrote, quoted or indirectly quoted someone who said :
>>>>

>>>>> The simplest approach is to wrap your cached values in an object and
>>>>> make all cache access go through two stages.
>>>>
>>>> Where did you learn this?

>>>
>>> Why do you ask? Do you disagree?
>>>

>>> It is something I came up with the first time I encountered the same
>>> situation, probably many years ago.
>>> It is obvious you need to decouple key lookups/insertions, which are
>>> inherently cache-global, from value insertions/updates, which could be
>>> considered key-local.
>>>
>>> Eric and Daniel responded with more specific solutions that both match
>>> my general pattern.
>>
>> Three independent votes for one pattern: What we have here is a
>>
>> M A N D A T E !!!
>>
>
> Let's hope this is inspires the Greeks...

They've been dating men for millenia.

Kevin McMurtrie

unread,

Jun 2, 2012, 3:27:31 PM6/2/12

to

In article <a29n7c...@mid.individual.net>,

Robert Klemme <short...@googlemail.com> wrote:

> On 25.05.2012 08:22, Kevin McMurtrie wrote:
> > Ha, this is an interview question that I use.
> >
> > What you need is row level locking for the cache load.
>
> But not all the time. See https://gist.github.com/2717818

The original post stated that duplicate element creation was too
expensive. That code may create duplicate elements for a key and
discard the extras.

>
> > Step 1)
> > Use synchronized operations to map your key to a value; creating an
> > uninitialized value in the map if needed. Use whatever tech you want.
> > A synchronized block on a HashMap is simplest and performs the fastest
> > on 1 or 2 core systems. A ConcurrentHashMap sometimes performs better
> > with 4+ core systems.
>
> In my experience a CHM performs better even on a 1 or 2 core system.

This depends on usage patterns. The Java 1.5 Concurrency locks used in
some parts of CHM are very slow compared to synchronization. The
Concurrency locks only have a chance of performing better when there are
a lot of concurrent shared read locks.

> > Step 2)
> > Synchronize on the value. Initialize it if needed.
> >
> >
> > Step 1 blocks all cache access for only for a very short moment to make
> > sure that a key always has a value. Step 2 blocks access independently
> > for each cache value to make sure that it is loaded. It will perform
> > well for continuous use by several CPU cores. Google has some high
> > concurrency Maps that aren't too bad either.
>
> Actually once the value is in the cache you do not need any step 2
> synchronization any more.

Step 2 is initializing the value. It's critical to synchronize there or
the second thread to request the same key may see an element that is not
yet fully initialized. (Again, the original post stated that elements
were expensive to create.)

>
> > In the 16 core range you'll find that any kind of exclusive lock causes
> > stalls where threads suspend while holding locks, causing a backlog that
> > reinforces itself. A concurrency expert can fix that using complex
> > Compare-And-Swap designs.
>
> Basically this is what CHM does with putIfAbsent() internally.

Yes, but you don't have access to its internal container class so CHM is
not much use for caching elements that are very expensive to create.
The Googlely version does provide a way to initialize the container with
a factory callback; combining steps 1 and 2 here.

>
> Kind regards
>
> robert

Robert Klemme

unread,

Jun 2, 2012, 3:54:12 PM6/2/12

to

On 02.06.2012 21:27, Kevin McMurtrie wrote:
> In article<a29n7c...@mid.individual.net>,
> Robert Klemme<short...@googlemail.com> wrote:
>
>> On 25.05.2012 08:22, Kevin McMurtrie wrote:
>>> Ha, this is an interview question that I use.
>>>
>>> What you need is row level locking for the cache load.
>>
>> But not all the time. See https://gist.github.com/2717818
>
> The original post stated that duplicate element creation was too
> expensive. That code may create duplicate elements for a key and
> discard the extras.

No, it does not create duplicate elements and hence does not discard
them. The only thing which might be duplicated is the proxy instance
which calls the factory method (created in line 98). But that is cheap.
The design ensures that a value is only calculated once per key and
neither CPU is wasted nor duplicate value objects. Please see Lew's
excellent explanation of what happens or look at the code again.

>>> Step 1)
>>> Use synchronized operations to map your key to a value; creating an
>>> uninitialized value in the map if needed. Use whatever tech you want.
>>> A synchronized block on a HashMap is simplest and performs the fastest
>>> on 1 or 2 core systems. A ConcurrentHashMap sometimes performs better
>>> with 4+ core systems.
>>
>> In my experience a CHM performs better even on a 1 or 2 core system.
>
> This depends on usage patterns. The Java 1.5 Concurrency locks used in
> some parts of CHM are very slow compared to synchronization. The
> Concurrency locks only have a chance of performing better when there are
> a lot of concurrent shared read locks.

Even if they are slower than synchronization they perform better than a
single lock on the whole Map (what you suggested above) with multiple
threads even on a few core machine just because there are more of them
and the whole Map is partitioned. I haven't strictly measured the
synchronization mechanisms used inside CHM but I did also not notice bad
timing (and frankly, I cannot believe Doug Lea would have used
inefficient mechanisms). Bottom line: when in doubt measure.

>>> Step 2)
>>> Synchronize on the value. Initialize it if needed.
>>>
>>>
>>> Step 1 blocks all cache access for only for a very short moment to make
>>> sure that a key always has a value. Step 2 blocks access independently
>>> for each cache value to make sure that it is loaded. It will perform
>>> well for continuous use by several CPU cores. Google has some high
>>> concurrency Maps that aren't too bad either.
>>
>> Actually once the value is in the cache you do not need any step 2
>> synchronization any more.
>
> Step 2 is initializing the value. It's critical to synchronize there or
> the second thread to request the same key may see an element that is not
> yet fully initialized. (Again, the original post stated that elements
> were expensive to create.)

Maybe my wording was not clear enough. Once the value is in the Map
synchronization is needed no more for this key and only the Map level
locking remains. Again, please look at the code.

>>> In the 16 core range you'll find that any kind of exclusive lock causes
>>> stalls where threads suspend while holding locks, causing a backlog that
>>> reinforces itself. A concurrency expert can fix that using complex
>>> Compare-And-Swap designs.
>>
>> Basically this is what CHM does with putIfAbsent() internally.
>
> Yes, but you don't have access to its internal container class so CHM is
> not much use for caching elements that are very expensive to create.

I beg to differ (see code).

> The Googlely version does provide a way to initialize the container with
> a factory callback; combining steps 1 and 2 here.

Which version? Please share a reference. Thank you!