Concurrent CacheLoader implementation


Tim Peierls

unread,
Mar 10, 2012, 5:58:25 PM3/10/12
to guava-...@googlegroups.com
I posted a new blog entry that includes a concurrent CacheLoader implementation, i.e., it implements loadAll to load the values concurrently, using an ExecutorService, rather than sequentially.
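A minimal sketch of the idea, using plain java.util.concurrent types rather than the blog's actual code (class and method names here are illustrative, not Tim's):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

// Illustrative sketch, not the blog's code: loadAll submits one task
// per key to an ExecutorService, then waits for all the results.
public class ConcurrentLoadAllSketch {
    public static <K, V> Map<K, V> loadAll(Iterable<? extends K> keys,
                                           Function<? super K, ? extends V> loader,
                                           ExecutorService executor) throws Exception {
        // Kick off every load before blocking on any of them.
        Map<K, Future<V>> pending = new LinkedHashMap<>();
        for (K key : keys) {
            pending.put(key, executor.submit(() -> loader.apply(key)));
        }
        // Collect the results; get() blocks until each load completes.
        Map<K, V> result = new LinkedHashMap<>();
        for (Map.Entry<K, Future<V>> e : pending.entrySet()) {
            result.put(e.getKey(), e.getValue().get());
        }
        return result;
    }
}
```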


The main point of the posting is actually about using Hazelcast to implement a distributed CacheLoader, so there is Hazelcast-specific code, but the non-Hazelcast portion is easily extractable and seems useful on its own. I'd hate to be reinventing the wheel, but while you'd think there would be dozens of things like this out there, I can't find any. Maybe that's just a sign that Guava's Cache/LoadingCache is still a new thing.

--tim

Benjamin Manes

unread,
Mar 10, 2012, 6:47:26 PM3/10/12
to Tim Peierls, guava-...@googlegroups.com
I think it's because most of the time the client library for a storage tier includes batched reads to avoid slow sequential calls or query storms. So the concurrency, timeout handling, etc. are mostly abstracted away from the caller. For instance, when I first implemented a bulk memoizing cache ~5 years ago, it was on top of memcached. Are you sure that Hazelcast doesn't have something like this as well? Not knowing that library, naively, MapLoader looks promising.



--
guava-...@googlegroups.com
Project site: http://guava-libraries.googlecode.com
This group: http://groups.google.com/group/guava-discuss
 
This list is for general discussion.
To report an issue: http://code.google.com/p/guava-libraries/issues/entry
To get help: http://stackoverflow.com/questions/ask (use the tag "guava")

Tim Peierls

unread,
Mar 10, 2012, 7:52:55 PM3/10/12
to Benjamin Manes, guava-...@googlegroups.com
The limitation with MapLoader is that you have to specify it as part of the configuration supplied when creating the Hazelcast instance -- which normally happens during start-up of a long-running server.

I did a lot of work with MapLoader and MapStore a while back, badgering the Hazelcast team into adding a MapStoreFactory to the configuration so I could inject my MapLoader/MapStore instances using Guice. I even found a way to convey per-IMap loader/store parameters to a MapStoreFactory by encoding them into the name of the IMap instance, using a template: map-{path}-{keyType}-{valueType}-{eagerKeyLoading}. It's perverse, but it works.

But there's no way to convey an arbitrary Function using this scheme, so an IMap-based loadAll is limited to a fixed set of loader functions that you specify in advance of creating the Hazelcast instance. The ConcurrentCacheLoader approach only requires that the Function passed to the ConcurrentCacheLoader Builder (and the key and value types) be Serializable.

However, I do think it would be worth writing an IMap-based Cache and LoadingCache, i.e., using an IMap (without a MapLoader or MapStore) to cache the values so that they are available to Caches with the same name on other nodes of the cluster. That could be a big win if the cost of an optimized call within the cluster to serve a cached value is often significantly less than the cost to load the value from scratch. (Note to Adrian Cole: This might apply in the "distributed jclouds context" setting that you mentioned in a different thread.)

--tim

Charles Fry

unread,
Mar 12, 2012, 11:14:35 AM3/12/12
to Tim Peierls, guava-...@googlegroups.com
Hi Tim,

This is very interesting, and I'm glad to see it written up so clearly! I could easily get behind adding something like com.google.common.cache.CacheLoaders.loadAllInParallel(CacheLoader<K, V>, Executor), which would turn the sequential calls into parallel calls. Would that be enough for you?

It turns out that we already have an internal CacheLoaders.asyncReload(CacheLoader<K, V>, Executor), and this seems like a good fit to sit right alongside it.

I don't see either of these being core enough for deeper integration, but I'm in favor of supporting common but simple CacheLoader extensions in a CacheLoaders class.

Charles



Tim Peierls

unread,
Mar 12, 2012, 12:12:11 PM3/12/12
to Charles Fry, guava-...@googlegroups.com
On Mon, Mar 12, 2012 at 11:14 AM, Charles Fry <f...@google.com> wrote:
This is very interesting, and I'm glad to see it written up so clearly! I could easily get behind adding something like com.google.common.cache.CacheLoaders.loadAllInParallel(CacheLoader<K, V>, Executor), which would turn the sequential calls into parallel calls. Would that be enough for you?

(I like the idea of wrapping an existing CacheLoader to add the loadAll implementation!)

It would be great to have, but it wouldn't address my primary motivation, which was to distribute the Function evaluations across a Hazelcast cluster. That's OK -- Guava doesn't have to do everything for me! -- but even more flexible for my purposes would be CacheLoaders.loadAllInParallel(ConcurrentFunction<K, V>), where ConcurrentFunction could be defined along these lines:


except ConcurrentFunction should be an interface (with an accompanying AbstractConcurrentFunction), and you might not want to include the timed version of apply. (For my purposes, I would instead embed the time constraint into my custom ConcurrentFunction instance.)
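The snippet itself didn't survive in this archive, but from the description it might look roughly like this (names and signatures are guesses; checked exceptions omitted for brevity):

```java
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Assumed shape of the proposed ConcurrentFunction: applies a function
// to many keys at once, possibly in parallel. Names are hypothetical.
public interface ConcurrentFunction<K, V> {
    // Apply the function to all keys, returning a key-to-result map.
    Map<K, V> apply(Iterable<? extends K> keys);

    // The "timed version of apply" mentioned above; Tim suggests it might be
    // omitted in favor of embedding the time constraint in the instance.
    Map<K, V> apply(Iterable<? extends K> keys, long timeout, TimeUnit unit);
}
```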

But adding ConcurrentFunction just for this one use isn't right. (I can think of at least one other possible use: parallel Maps.transformEntries.)

 
It turns out that we already have an internal CacheLoaders.asyncReload(CacheLoader<K, V>, Executor), and this seems like a good fit to sit right alongside it.

Yup.

Implementing reload in ConcurrentCacheLoader (and DistributedCacheLoader) is on my list.

 
I don't see either of these being core enough for deeper integration, but I'm in favor of supporting common but simple CacheLoader extensions in a CacheLoaders class.

Sounds good to me. 

I'm still planning to create Cache and LoadingCache implementations that use Hazelcast-distributed maps to hold the cached values, which only makes sense if loading is very often much slower than access across the cluster. There's a lot of design space here, including whether to have a local near cache (using a standard Guava cache) with distributed notification of invalidation -- speeds up common accesses but increases network traffic. (Ideally I can support the whole space via a builder.)

In addition, I'll use your idea of wrapping existing CacheLoaders to add distributing loading so that the user won't have to supply a distributed CacheLoader to an already distributed cache.

--tim

Louis Wasserman

unread,
Mar 12, 2012, 12:23:55 PM3/12/12
to Tim Peierls, Charles Fry, guava-...@googlegroups.com
We already do have AsyncFunction.  Might that be appropriate here?


Tim Peierls

unread,
Mar 12, 2012, 2:50:36 PM3/12/12
to Louis Wasserman, Charles Fry, guava-...@googlegroups.com

Nice! Yes, I think that would work. In which case, there's no need for CacheLoaders.loadAllInParallel; just add a static CacheLoader.from(AsyncFunction<K, V>). I'll recast what I've done in those terms. For my Hazelcast case, I've just now written an AsyncFunction implementation and a static factory method that converts a Function and a (Hazelcast) Executor to an instance of that AsyncFunction impl.

That keeps the Hazelcast-specific stuff neatly stowed away. 

Is there a static method that takes an AsyncFunction and an Iterable of keys and produces a Map of keys to values? I'll want to write one if not.

--tim

Louis Wasserman

unread,
Mar 12, 2012, 3:57:22 PM3/12/12
to Tim Peierls, Charles Fry, guava-...@googlegroups.com
No, but it'd be straightforward enough to whip up using Futures.allAsList or successfulAsList, depending on your preferences.
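The helper Louis describes could be sketched like this, with JDK CompletableFuture standing in for Guava's ListenableFuture and Futures.allAsList (class and method names are assumed):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Sketch only: apply an async function to a set of keys and gather the
// results into a map, waiting for everything to complete first.
public class AsyncMapHelper {
    public static <K, V> Map<K, V> applyAll(
            Function<? super K, CompletableFuture<V>> asyncFn,
            Iterable<? extends K> keys) {
        // Start all the asynchronous applications before waiting on any.
        Map<K, CompletableFuture<V>> pending = new LinkedHashMap<>();
        for (K key : keys) {
            pending.put(key, asyncFn.apply(key));
        }
        // Rough equivalent of Futures.allAsList(...).get(): block until done.
        CompletableFuture.allOf(pending.values().toArray(new CompletableFuture[0])).join();
        // Every future is now complete, so join() below never blocks.
        Map<K, V> result = new LinkedHashMap<>();
        pending.forEach((k, f) -> result.put(k, f.join()));
        return result;
    }
}
```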

Tim Peierls

unread,
Mar 12, 2012, 4:27:04 PM3/12/12
to Louis Wasserman, Charles Fry, guava-...@googlegroups.com
Oh, right. Duh.

Charles Fry

unread,
Mar 12, 2012, 8:04:10 PM3/12/12
to Tim Peierls, Louis Wasserman, guava-...@googlegroups.com
Ah, this is getting interesting.

Actually I've been planning to create an AsyncCacheLoader and AsyncLoadingCache, though I haven't worked through all the details in the design yet. It sounds like we might be able to at least create AsyncCacheLoader now. Then we could have AsyncCacheLoader.from(AsyncFunction<K, V>).

Note that the advantage of going the AsyncCacheLoader route instead of just adding CacheLoader.from(AsyncFunction<K, V>) is that it facilitates the selective overriding of reload or loadAll.

How does that sound?

Charles

Louis Wasserman

unread,
Mar 12, 2012, 8:07:08 PM3/12/12
to Charles Fry, Tim Peierls, guava-...@googlegroups.com
Ooooh.  I like that. Asynchronous loading and caching isn't exactly addressed by LoadingCache, even now, is it...

Charles Fry

unread,
Mar 12, 2012, 8:14:51 PM3/12/12
to Louis Wasserman, Tim Peierls, guava-...@googlegroups.com
Right. Having a Cache<K, Future<V>> has all kinds of unexpected gotchas too.

You'll notice that I've already prepped the internals of LocalCache to speak in terms of Futures, so the main remaining task now is to agree on the API.

Here are my specific thoughts so far; I welcome feedback:

AsyncCacheLoader

ListenableFuture<V> loadFuture(K key) throws Exception;
ListenableFuture<V> refresh(K key) throws Exception;
Map<K, ListenableFuture<V>> loadAllFutures(Iterable<? extends K> keys) throws Exception;

AsyncLoadingCache

The LoadingCache methods we need are:

Future<V> getFuture(K key);
void refresh(K key);
ImmutableMap<K, Future<V>> getAllFutures(Iterable<? extends K> keys);

The Cache methods are a little trickier. On the one hand, most of them are less obviously asynchronous. On the other hand, other Cache implementations might need to accommodate asynchronous errors even from void methods. The simplest methods are these:

void cleanUp();
ListenableFuture<V> getFuture(K key, Callable<? extends V> valueLoader);
ImmutableMap<K, V> getAllPresent(Iterable<? extends K> keys);
V getIfPresent(K key);
long size();
CacheStats stats();

That still leaves all of the invalidate and the put methods. Do we want to make those asynchronous, exposing some error handler or some Future<Void> to return errors?

Open Questions

  • What should the asMap() view do? Expose V or Future<V>?
  • How should we avoid name collisions between CacheLoader and AsyncCacheLoader, and between LoadingCache and AsyncLoadingCache?

Tim Peierls

unread,
Mar 12, 2012, 9:25:46 PM3/12/12
to Charles Fry, Louis Wasserman, guava-...@googlegroups.com
I've gone ahead and written CacheLoaders.fromAsyncFunction(AsyncFunction<K, V>) -- thank you, Louis, for suggesting this -- and it addresses my needs nicely. (Also wrote a HazelcastAsyncFunction that hides all the Hazelcast details from the rest of the machinery.)

But I don't see what AsyncCacheLoader and AsyncLoadingCache add to the picture.

--tim

Tim Peierls

unread,
Mar 12, 2012, 9:54:39 PM3/12/12
to Charles Fry, Louis Wasserman, guava-...@googlegroups.com
On Mon, Mar 12, 2012 at 9:25 PM, Tim Peierls <t...@peierls.net> wrote:
But I don't see what AsyncCacheLoader and AsyncLoadingCache add to the picture.

Ah, I missed Charles' note:
Note that the advantage of going the AsyncCacheLoader route instead of just adding CacheLoader.from(AsyncFunction<K, V>) is that it facilitates the selective over-riding of reload or loadAll.

But it's not hard to imagine additional static factory methods taking extra Function or AsyncFunction arguments to specify reload and/or loadAll behavior, without having to add the full AsyncCacheLoader machinery, which feels like a lot of work for no compelling uses that I've seen so far.

--tim


Tim Peierls

unread,
Mar 12, 2012, 10:56:18 PM3/12/12
to Charles Fry, Louis Wasserman, guava-...@googlegroups.com
Summarized some of today's developments here:


I left the AsyncCacheLoader stuff out, because I'm missing the use cases.

--tim

Louis Wasserman

unread,
Mar 12, 2012, 11:02:56 PM3/12/12
to Tim Peierls, Charles Fry, guava-...@googlegroups.com
I'm not convinced of the correctness of this implementation -- in particular, I don't believe that the map is guaranteed to be filled when you return it from loadAll; just adding the callback could cause the entries to get added to the map at any time.

Instead, I might consider going ahead and paying the cost of sequentially filling an ImmutableMap with the entries before returning; that should still be cheap compared to the computations you're farming out.  Not sure if there's a better way.

Tim Peierls

unread,
Mar 12, 2012, 11:20:45 PM3/12/12
to Louis Wasserman, Charles Fry, guava-...@googlegroups.com
Responded briefly to your comment on my blog before seeing this. 

But my original thinking was that the Future returned by successfulAsList only returns its List when all the (Settable)Futures have completed, through a call to either set or setException (leaving aside the possibility of InterruptedException). And Futures.addCallback with two arguments runs the callback in the same thread, right? So returning the results map directly should be OK.

I'm not thrilled about the unused List<V> returned from successfulAsList, but it was sooo easy to write... :-)

--tim

Tim Peierls

unread,
Mar 13, 2012, 9:35:06 AM3/13/12
to Louis Wasserman, Charles Fry, guava-...@googlegroups.com
Thanks (again) to Louis, who showed me a race that I obstinately didn't want to see (to the point of misreading the Javadocs!): I've fixed the CacheLoaders.fromAsyncFunction implementation. It's much simpler now.

Blog entry and code snippet updated.

--tim

Charles Fry

unread,
Mar 13, 2012, 10:56:32 AM3/13/12
to Tim Peierls, Louis Wasserman, guava-...@googlegroups.com
I left the AsyncCacheLoader stuff out, because I'm missing the use cases.

Here's what it comes down to: by focusing on AsyncFunction, we allow users an easy way to specify an asynchronous load, and then we bind the refresh and loadAll implementations. So as soon as you want an asynchronous load and a custom loadAll, you have to re-implement the asynchronous load yourself.

By introducing AsyncCacheLoader, it would no longer be necessary for users to implement AsyncFunction, since they could, with the same amount of work, simply extend AsyncCacheLoader. (Of course, we still might want a convenience method in case they already had an AsyncFunction.) By extending AsyncCacheLoader instead of implementing AsyncFunction, it would then become trivial to later decide to override loadAll, and to be presented with an asynchronous API for doing so.
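A rough sketch of that shape (hypothetical: the internal design wasn't published, and JDK CompletableFuture stands in for ListenableFuture):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of the AsyncCacheLoader idea: implement one async
// load, inherit default reload/loadAll, and override either selectively.
public abstract class AsyncCacheLoaderSketch<K, V> {
    // The single required method: asynchronously load one value.
    public abstract CompletableFuture<V> load(K key);

    // Default reload just loads afresh; override to reuse the old value.
    public CompletableFuture<V> reload(K key, V oldValue) {
        return load(key);
    }

    // Default loadAll fans out to load(); override with a batched version.
    public Map<K, CompletableFuture<V>> loadAll(Iterable<? extends K> keys) {
        Map<K, CompletableFuture<V>> futures = new LinkedHashMap<>();
        for (K key : keys) {
            futures.put(key, load(key));
        }
        return futures;
    }
}
```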

Does that sound right?

Charles

Kevin Bourrillion

unread,
Mar 13, 2012, 10:58:58 AM3/13/12
to Charles Fry, Tim Peierls, Louis Wasserman, guava-...@googlegroups.com
Yeah; the way I see it, if the use cases are considered important, then AsyncCacheLoader is a more robust approach to addressing them.


--
Kevin Bourrillion @ Google
Java Core Libraries Team
http://guava-libraries.googlecode.com

Louis Wasserman

unread,
Mar 13, 2012, 11:02:50 AM3/13/12
to Kevin Bourrillion, Charles Fry, Tim Peierls, guava-...@googlegroups.com
Isn't it the case that with Cache alone, you can't support ListenableFuture<V> Cache.getAsync(K key, Callable<? extends V> load), or the equivalent -- at least not without a lot of very tricky logic (I think)? And I can totally see that being a thing that deserves support!

Charles Fry

unread,
Mar 13, 2012, 12:26:20 PM3/13/12
to Tim Peierls, Louis Wasserman, guava-...@googlegroups.com
Tim, I spoke with Chris and we agreed that the short-term thing we want to add is indeed CacheLoader.from(AsyncFunction<K, V>).

Would you be interested in putting together a patch for us based on what you've already done, and sending it to Louis for review?

Charles

Tim Peierls

unread,
Mar 13, 2012, 1:24:11 PM3/13/12
to Charles Fry, Louis Wasserman, guava-...@googlegroups.com
Sure, will send to Louis directly.

Charles Fry

unread,
Mar 13, 2012, 1:25:36 PM3/13/12
to Tim Peierls, Louis Wasserman, guava-...@googlegroups.com
FYI, we generally use http://codereview.appspot.com/ for external code reviews. Louis could tell you more about the process than I can...

Adrian Cole

unread,
Apr 3, 2012, 5:42:14 PM4/3/12
to Charles Fry, Tim Peierls, Louis Wasserman, guava-...@googlegroups.com
jclouds would love to beta-test this: all of our REST clients return ListenableFuture<T>, leading to extremely straightforward cache management once this is in.

-A

Charles Fry

unread,
Apr 5, 2012, 3:34:32 PM4/5/12
to Adrian Cole, Tim Peierls, Louis Wasserman, guava-...@googlegroups.com
We'll definitely update this thread once the review is complete, and the code is submitted.

Tim Peierls

unread,
Mar 15, 2013, 6:53:24 PM3/15/13
to guava-...@googlegroups.com, Adrian Cole, Louis Wasserman, Charles Fry, Kevin Bourrillion
Resurrecting a nearly year-old conversation: whatever happened to CacheLoader.from(AsyncFunction)?

--tim