CompletableFuture caching


Jakub Narloch

unread,
Jan 31, 2016, 4:47:59 PM1/31/16
to mechanica...@googlegroups.com
Hi,

I was hoping to find a place where I could challenge my idea. Lately I have been working on a web project that made heavy use of Java 8's CompletableFuture for processing, and despite gaining all the benefits of concurrent task execution, we had to give up some others, one of which is caching at the application-logic level. The point is that all of our interfaces were relying on promises. While it probably would not have been a problem to address the issue in a simple manner, I wanted a more general approach, so in my spare time I made a PoC:


This is a "specialized" async kind of cache implementation to which one supplies a unit of work and receives a promise in exchange. I had to solve the issue of how to cope with the fact that a CompletableFuture will finish its work at some point in the future, and until that time all other attempts to store the same value would result in unnecessary duplication of work. So unlike a normal cache, where you generally store values, this one also caches the computations. As long as the task is not completed, any subsequent attempt to "put" the value will result in returning the same (and only one!) cached instance of the CompletableFuture.
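The scheme described above can be sketched with the JDK alone (a hypothetical sketch, not the actual PoC): cache the CompletableFuture itself under the key, so concurrent callers share the single in-flight computation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical sketch of the idea above: the cache stores the
// computation (a CompletableFuture), not the computed value, so all
// callers racing on the same key share one in-flight future.
class AsyncCacheSketch<K, V> {
    private final ConcurrentMap<K, CompletableFuture<V>> cache =
            new ConcurrentHashMap<>();

    // computeIfAbsent is atomic per key on ConcurrentHashMap: the
    // supplier is scheduled at most once, and every subsequent caller
    // gets back the same (and only one!) cached future.
    CompletableFuture<V> get(K key, Supplier<V> work) {
        return cache.computeIfAbsent(key,
                k -> CompletableFuture.supplyAsync(work));
    }
}
```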

Now what I want to ask is whether someone would be willing to take a look at the implementation and share some ideas in terms of correctness and possible extensions and/or optimizations. I'm open to discussion.

I see potential in this idea in general.

Nakamura

unread,
Jan 31, 2016, 6:08:09 PM1/31/16
to mechanica...@googlegroups.com
Hi Jakub,
We ran into the same problem when doing asynchronous work.  As far as I can tell, there are three main things that need to be taken care of carefully.

1.  Evict failures properly
2.  Avoid races (which you spoke to)
3.  Interruptions should not cancel work for other workloads

1. is pretty easy, you simply need to set up a handler so that when the future fails, it's evicted.
2. is also pretty easy–it simply requires that you cache the actual future, not the result.  it can be done in other ways, but this is the simplest.
3. is a little tricky, and depends on the future implementation.  In Twitter's, we have a special facility for "detachable" Promises, which can be interrupted efficiently without cancelling the underlying work, but can still cancel the work that would have been done on that future.  As an example:

The first thread comes in, and tries to read a key, "FANCY_KEY" from the cache.  It sees that it isn't cached, so it populates it asynchronously, and we get a handle on a Future.  We add a handler to the Future, so that when it's returned, it logs the returned message.

The second thread comes in, and tries to read the same key, and gets a handle on a Future (the same one, so we don't duplicate work).

The first thread passes some timeout, and cancels the work.  We want it to tear down the handler that it had registered on the Future, but we don't want it to cancel the underlying work–the second thread will need it, after all.
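With plain CompletableFuture (rather than Twitter Futures), a comparable "detached" handle can be built, hypothetically, by bridging the shared future into a fresh one; cancelling the copy tears down that caller's callbacks without touching the shared work:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of a detachable handle: each caller gets a
// dependent copy of the shared future. Cancelling the copy affects
// only that caller; the shared, cached computation keeps running.
class Detach {
    static <V> CompletableFuture<V> detached(CompletableFuture<V> shared) {
        CompletableFuture<V> copy = new CompletableFuture<>();
        // Forward the shared outcome into the copy; if the copy was
        // already cancelled, these completions are simply no-ops.
        shared.whenComplete((value, error) -> {
            if (error != null) copy.completeExceptionally(error);
            else copy.complete(value);
        });
        return copy;
    }
}
```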

You can see the library here, although it uses Twitter Futures.
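Points 1 and 2 translate to CompletableFuture roughly as follows (a hypothetical sketch, not the library above): cache the future itself, and when it completes exceptionally, evict that exact future so the next caller retries instead of observing a cached failure.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical sketch of points 1 and 2: the future (not the value)
// is cached, and failed futures are evicted so failures don't stick.
class EvictOnFailureCache<K, V> {
    private final ConcurrentMap<K, CompletableFuture<V>> cache =
            new ConcurrentHashMap<>();

    CompletableFuture<V> get(K key, Supplier<V> work) {
        CompletableFuture<V> f = cache.computeIfAbsent(key,
                k -> CompletableFuture.supplyAsync(work));
        // Evict only this exact future on failure; remove(key, f) is a
        // no-op if a newer future has already replaced it.
        f.whenComplete((value, error) -> {
            if (error != null) cache.remove(key, f);
        });
        return f;
    }
}
```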

The core idea is that we combine a few simple primitives, and it gives us everything that we need.  Those primitives are:

A.  AsyncMemoize (caching)
B.  EvictingCache (eviction)
C.  interruption (this is simple with Twitter Futures so it doesn't need its own class)

We have a wrapper for the Guava caches (their cache APIs are quite good), and when we move to JDK 8, we will probably look into supporting Caffeine too.

Best,
Moses

--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Benjamin Manes

unread,
Jan 31, 2016, 7:37:47 PM1/31/16
to mechanical-sympathy
Hi Jakub,

It's not clear from your JavaDoc whether your computeIfAbsent is intended to be atomic. I suspect it is, but presently it isn't. The Map view of Guava's caches inherits the interface's default-method implementation, which uses a retry loop to insert the entry. That could result in multiple futures being scheduled and only one cached, leaving the others to perform unnecessary work. If those futures require some resource cleanup, e.g. closing a file handle, then this could be problematic. If you use Cache#get(key, callable) instead, you won't have this problem. The JDK8 version of Guava (mid 2016) should fix this.

The invalidateIfPresent(key, consumer) seems to be a little racy too. Perhaps use a pre-screening contains check, then remove, and then notify the consumer?
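A hypothetical non-racy shape for such an operation: remove first, then notify the consumer only with the future that was actually removed, so concurrent invalidations of the same key cannot double-notify.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Consumer;

// Hypothetical sketch: atomic remove-then-notify, so two racing
// invalidations of the same key notify their consumer at most once.
class InvalidateSketch<K, V> {
    final ConcurrentMap<K, CompletableFuture<V>> cache =
            new ConcurrentHashMap<>();

    boolean invalidateIfPresent(K key, Consumer<CompletableFuture<V>> consumer) {
        CompletableFuture<V> removed = cache.remove(key);
        if (removed == null) {
            return false;             // lost the race, or never present
        }
        consumer.accept(removed);     // notified with exactly what we removed
        return true;
    }
}
```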

The expiration duration counts the computation time of the entry. I don't think that is intuitive; most users would say the value hasn't materialized yet, so the duration should only apply to the completed value. That would be in line with Guava's synchronous computation.
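One hypothetical way to measure expiry from completion rather than from the start of computation, with plain CompletableFuture, is to stamp the entry when the future completes:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical sketch: the TTL clock starts when the value
// materializes, so slow computations don't eat into the entry's life.
class ExpireAfterCompletion<K, V> {
    private static final class Entry<V> {
        final CompletableFuture<V> future;
        volatile long completedAt = Long.MAX_VALUE;  // not materialized yet
        Entry(CompletableFuture<V> future) { this.future = future; }
    }

    private final ConcurrentMap<K, Entry<V>> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;

    ExpireAfterCompletion(long ttlMillis) { this.ttlMillis = ttlMillis; }

    CompletableFuture<V> get(K key, Supplier<V> work) {
        return cache.compute(key, (k, old) -> {
            // Live while still computing (completedAt is MAX_VALUE, so
            // the difference is hugely negative) or within the TTL
            // measured from completion; compute time never counts.
            if (old != null
                    && System.currentTimeMillis() - old.completedAt < ttlMillis) {
                return old;
            }
            Entry<V> fresh = new Entry<>(CompletableFuture.supplyAsync(work));
            fresh.future.whenComplete((value, error) ->
                    fresh.completedAt = System.currentTimeMillis());
            return fresh;
        }).future;
    }
}
```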

For tests you might enjoy using Awaitility instead of the CountDownLatch trick. It is nice in complex cases, though for your current suite it's more stylistic than necessary.

Moses' remark on cancellation is quite insightful. It seems reasonable to have the cache return a defensive copy of the future to guard against clients cancelling / completing a shared future. That could be an understandable mistake, due to the client code not realizing the future is shared, so contextually cancelling makes sense. Yet that protection removes a useful feature for developers who do know and intend to propagate the completion.

Caffeine's AsyncLoadingCache does (1) and (2), but not (3). Instead users have to call #copy() in their code if that behavior is desirable, which is a tiny inconvenience. But perhaps Moses' approach is better and more in line with users' expectations. I've been conservative, waiting for feedback to see if there are many complaints. If there are, I might make the change, as currently I'm not sure which approach better abides by the principle of least surprise.

Cheers,
Ben


Nakamura

unread,
Jan 31, 2016, 7:58:49 PM1/31/16
to mechanica...@googlegroups.com
Or a sophisticated user can pick their poison :].

For example, we can use the default, where the interrupt isn't propagated:

val fn: K => Future[V] = ??? // we assume this is provided
val cache: FutureCache = ???  // we assume this is provided, probably using GuavaCache
val shared = FutureCache.default(fn, cache)

which under the hood is doing:

val fn: K => Future[V] = ??? // we assume this is provided
val cache: FutureCache = ???  // we assume this is provided, probably using GuavaCache
val shared = AsyncMemoize(fn, new EvictingCache(cache)) andThen { f: Future[V] => f.interruptible() }

so we could just as easily instead use:

val fn: K => Future[V] = ??? // we assume this is provided
val cache: FutureCache = ???  // we assume this is provided, probably using GuavaCache
val owned = AsyncMemoize(fn, new EvictingCache(cache))

The downside to exposing composable primitives like this is that it complicates the contract of the Cache, but I think it's worth it for the extra power it provides.


Will Sargent

unread,
Feb 4, 2016, 11:02:21 PM2/4/16
to mechanica...@googlegroups.com
This sounds a bit like Spray-Cache:
