Single writer multiple readers


Rajiv Kurian

unread,
Jun 7, 2013, 3:52:48 AM6/7/13
to mechanica...@googlegroups.com
I am maintaining a per-machine cache. This cache is written by a single writer thread and read by multiple threads. Currently I am using Java's concurrent map implementation for the cache. That concurrent data structure is designed for multiple readers and writers. Is anyone aware of optimizations that can be applied when there is a single producer?

Thanks,
Rajiv

Peter Lawrey

unread,
Jun 7, 2013, 8:02:38 AM6/7/13
to Rajiv Kurian, mechanica...@googlegroups.com
There are optimisations the CPU might make.  Having just one writer can be faster than having multiple writers of a shared resource.

While there are micro-optimisations you could make which might save a few nanoseconds, I suspect it's not worth rewriting the concurrent hash map to take advantage of them.

Do you have a specific performance problem you are trying to solve?


--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Wojciech Kudla

unread,
Jun 7, 2013, 8:44:33 AM6/7/13
to Rajiv Kurian, mechanica...@googlegroups.com
You may look into sun.misc.Unsafe#putOrderedObject, which should resolve to a StoreStore barrier; that is more lightweight than the StoreLoad required by the genuine volatile write that takes place in Doug Lea's CHM.
If you don't want to mess around with Unsafe's intrinsics, I'd suggest evaluating Cliff Click's NonBlockingHashMap, which is a truly non-blocking hash map implementation, as opposed to Doug's lock-based approach.

Regards



Rajiv Kurian

unread,
Jun 7, 2013, 12:53:07 PM6/7/13
to mechanica...@googlegroups.com, Rajiv Kurian
I am not trying to solve a particular performance problem, just trying to see what data structures currently exist to optimize the single-writer, multiple-reader case.

Kevin Burton

unread,
Jun 7, 2013, 2:38:56 PM6/7/13
to mechanica...@googlegroups.com, Rajiv Kurian
There might be a better way to think about the problem from a different angle.

ConcurrentHashMap isn't free… concurrent data structures impose an overhead.

Why not just use an async framework like Netty, put all GET/PUT requests into a queue such as the Disruptor, and then have a single thread read from that queue and mutate a regular HashMap or TreeMap (or other data structure)?

One of the advantages to this model is that you can also elide put/get requests so that you access the cache more efficiently.

To scale across multiple cores, you just have one thread per core, each listening on a different port.

You can then use something like consistent hashing to determine which daemon to fetch the key from.

This might be more complicated than you are ready for though.  

This is the approach that I'm taking with Peregrine.
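A minimal sketch of the single-threaded model described above (class and method names here are my own, purely illustrative): one worker thread drains a queue of GET/PUT request objects and mutates a plain, unsynchronized HashMap, replying through futures.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;

// Sketch only: every GET/PUT becomes a request object, and a single
// thread applies them in arrival order, so the map needs no locking.
public class SingleThreadedCache {
    // A request is a PUT when value != null, otherwise a GET.
    static final class Request {
        final String key;
        final String value;
        final CompletableFuture<String> reply = new CompletableFuture<>();
        Request(String key, String value) { this.key = key; this.value = value; }
    }

    private final BlockingQueue<Request> queue = new ArrayBlockingQueue<>(1024);
    private final Map<String, String> map = new HashMap<>(); // touched by one thread only
    private final Thread worker = new Thread(this::drain);

    public SingleThreadedCache() {
        worker.setDaemon(true);
        worker.start();
    }

    private void drain() {
        try {
            while (true) {
                Request r = queue.take();
                if (r.value != null) {
                    map.put(r.key, r.value);
                    r.reply.complete(r.value);
                } else {
                    r.reply.complete(map.get(r.key));
                }
            }
        } catch (InterruptedException ignored) {
            // worker shut down
        }
    }

    public CompletableFuture<String> put(String k, String v) {
        Request r = new Request(k, v);
        queue.add(r);
        return r.reply;
    }

    public CompletableFuture<String> get(String k) {
        Request r = new Request(k, null);
        queue.add(r);
        return r.reply;
    }
}
```

A real network version would replace the futures with async channel writes (Netty), but the core idea, one mutator thread and zero concurrent data structures, is the same.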

Rajiv Kurian

unread,
Jun 7, 2013, 2:41:54 PM6/7/13
to mechanica...@googlegroups.com, Rajiv Kurian
How would readers (on different threads) of the regular Map be synchronized with the writer (yet another thread)? 

Kevin Burton

unread,
Jun 7, 2013, 3:28:19 PM6/7/13
to mechanica...@googlegroups.com, Rajiv Kurian
Is this a network service?  You wouldn't use other threads.

You would take all requests (put, get) and serve them in the order they come in.

Your executor thread just has an infinite loop and serves all clients in one pass.

One issue you have to take care to handle is stragglers, which may have filled your local TCP send buffer; you need to suspend serving these and possibly add them back into the queue when they are ready again.

One cool thing I like about this approach is that your algorithms can be stupid simple: no concurrent anything.  The one downside is you need to grok async IO, and that can take a little longer. I'm still sort of wrapping my head around Netty.

Rajiv Kurian

unread,
Jun 7, 2013, 4:21:57 PM6/7/13
to mechanica...@googlegroups.com, Rajiv Kurian
I am talking about a generic cache written to by a single thread and read by multiple threads. For example: an LRU cache to avoid looking up data on a remote node, not a batch-processing or request/response architecture.

@Wojciech Kudla I'll take a look at Unsafe.putOrderedObject.

Simone Bordet

unread,
Jun 7, 2013, 4:58:00 PM6/7/13
to Rajiv Kurian, mechanica...@googlegroups.com
Hi,

For maximum portability, you may want to use
AtomicReference.lazySet(), AtomicReferenceArray.lazySet(),
AtomicReferenceFieldUpdater.lazySet().
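A minimal sketch of single-writer publication via lazySet(), assuming the entries are immutable (the fixed-size, collision-overwriting table below is illustrative only, not a real hash map):

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch: the single writer publishes immutable entries with lazySet,
// which provides StoreStore ordering and is cheaper than a full
// volatile write; readers use a plain volatile get().
public class SingleWriterTable {
    static final class Entry {
        final String key;
        final String value;
        Entry(String key, String value) { this.key = key; this.value = value; }
    }

    private final AtomicReferenceArray<Entry> slots;
    private final int mask;

    public SingleWriterTable(int sizePow2) {
        slots = new AtomicReferenceArray<>(sizePow2);
        mask = sizePow2 - 1;
    }

    // Called from the single writer thread only. Entry is immutable and
    // fully constructed before publication, so the StoreStore barrier in
    // lazySet is sufficient for readers to see a consistent object.
    public void put(String key, String value) {
        slots.lazySet(key.hashCode() & mask, new Entry(key, value));
    }

    // Safe to call from any reader thread. (Collisions simply overwrite
    // in this sketch; a real map would probe or chain.)
    public String get(String key) {
        Entry e = slots.get(key.hashCode() & mask);
        return (e != null && e.key.equals(key)) ? e.value : null;
    }
}
```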

--
Simone Bordet
----
http://cometd.org
http://webtide.com
http://intalio.com
Developer advice, training, services and support
from the Jetty & CometD experts.
Intalio, the modern way to build business applications.

Rajiv Kurian

unread,
Jun 7, 2013, 6:10:09 PM6/7/13
to mechanica...@googlegroups.com, Rajiv Kurian, sbo...@intalio.com
Thanks for the advice. I just need a store-store barrier so I'll take a look at the AtomicReference options first.

Scott Carey

unread,
Jun 8, 2013, 11:55:40 PM6/8/13
to mechanica...@googlegroups.com, Rajiv Kurian
I suggest being very cautious when using an LRU cache on the JVM.  It is often a recipe for massive garbage churn, as you are forcing a long minimum lifetime onto garbage that is only rarely accessed.

I have had great luck with weak-valued caches when the cost of creating an item is moderate, in particular when caching the result of deserializing something from disk.  Soft references are another option, but they can also cause a lot of GC pressure.  Guava's Cache supports weak values, TTL expiry, and other useful features.  It is not especially optimized for a single producer, however.
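Guava's CacheBuilder.newBuilder().weakValues() packages this idea up properly; a bare-JDK sketch of weak values (class name hypothetical) looks like:

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Sketch: values are held through WeakReferences, so the GC may reclaim
// them once no other strong reference exists. Rarely-used entries thus
// don't pin garbage the way a strict LRU does.
public class WeakValueCache<K, V> {
    private final Map<K, WeakReference<V>> map = new HashMap<>();

    public synchronized void put(K key, V value) {
        map.put(key, new WeakReference<>(value));
    }

    public synchronized V get(K key) {
        WeakReference<V> ref = map.get(key);
        if (ref == null) return null;
        V v = ref.get();
        if (v == null) map.remove(key); // value was collected; drop the stale entry
        return v;
    }
}
```

Unlike this sketch, Guava also cleans up stale entries eagerly via a reference queue rather than only on lookup.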

Kevin Burton

unread,
Jun 9, 2013, 12:23:08 AM6/9/13
to mechanica...@googlegroups.com, Rajiv Kurian
One of the big issues here is heap fragmentation.  You're keeping lots of data around in memory for a long time but not really packing it, so CMS is going to have problems reclaiming some of it and will eventually have to stop the world as you run out of memory.

Both Cassandra and HBase hit this problem and implemented arena (slab) allocation for their caches a few years back.

Basically, the strategy is to keep your object data off-heap, out of the JVM, in a direct buffer (or I guess you could keep it in a heap buffer), and then you can reclaim huge chunks at a time.

One of the reasons I like using memory outside the JVM is that you can expand and shrink your cache on the fly without having to worry about the JVM heap size.

Plus you can get accurate accounting of it.
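A sketch of the arena idea using a direct ByteBuffer (the size and class name are illustrative): values live off-heap, the on-heap side keeps only (offset, length) pairs, and dropping the whole arena frees every entry at once with no per-object GC work.

```java
import java.nio.ByteBuffer;

// Sketch: a bump-pointer arena in a direct buffer. Serialized values are
// appended and addressed by (offset, length); there is no per-entry free.
public class OffHeapArena {
    private final ByteBuffer arena = ByteBuffer.allocateDirect(1 << 20); // 1 MiB, illustrative

    // Appends the serialized value and returns its offset in the arena.
    public int append(byte[] value) {
        int offset = arena.position();
        arena.put(value);
        return offset;
    }

    // Copies length bytes starting at offset back onto the heap.
    public byte[] read(int offset, int length) {
        byte[] out = new byte[length];
        ByteBuffer view = arena.duplicate(); // independent position/limit
        view.position(offset);
        view.limit(offset + length);
        view.get(out);
        return out;
    }
}
```

A real implementation would track a free list or rotate between multiple arenas so a full one can be discarded wholesale, which is where the "reclaim huge chunks at a time" win comes from.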