Guava Cache - using an expiration time of 1 min, is that an overkill/abuse?


roy....@gmail.com

Jul 19, 2015, 5:09:06 PM7/19/15
to guava-...@googlegroups.com
Hi,
I'm using LoadingCache to create a cache with .expireAfterWrite(1, TimeUnit.MINUTES), and the cache might hold a lot (hundreds) of different keys in any given minute.

My question is: can choosing 1 minute cause the garbage collector, or some internal loop, to spend excessive time managing the life cycle of these key-value pairs?

Is it safe to use without impacting the performance of our system?

p.s., we cannot increase the expiration time; we need to pick up the latest data from the database roughly every minute.
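For reference, here is a minimal sketch of the setup described above. The loader body and key names are placeholders standing in for the real database fetch:

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.TimeUnit;

public class Example {
    // Entries become eligible for eviction one minute after being written,
    // so each key is re-loaded from the database roughly once per minute.
    static final LoadingCache<String, String> cache = CacheBuilder.newBuilder()
        .expireAfterWrite(1, TimeUnit.MINUTES)
        .build(new CacheLoader<String, String>() {
            @Override public String load(String key) {
                return loadFromDatabase(key);
            }
        });

    // Placeholder for the real database query.
    static String loadFromDatabase(String key) {
        return "value-for-" + key;
    }

    public static void main(String[] args) throws Exception {
        // First access runs the loader; later accesses within the
        // minute are served from the cache.
        System.out.println(cache.get("user:42"));
    }
}
```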

Thanks,
Roy

Benjamin Manes

Jul 19, 2015, 5:32:29 PM7/19/15
to roy....@gmail.com, guava-discuss
I think you are asking if the cache's performance will degrade if it has a lot of entries that are expired and need to be evicted. If so, then yes this will impact the calling thread performing this maintenance work (but no other readers). You can reduce the chance of this occurring on user-facing threads by scheduling a background task that calls cleanUp().
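A sketch of that suggestion, assuming a single-threaded scheduler (the executor setup and thread name are illustrative, not part of Guava's API):

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class CleanupScheduler {
    static final Cache<String, String> cache = CacheBuilder.newBuilder()
        .expireAfterWrite(1, TimeUnit.MINUTES)
        .build();

    public static void main(String[] args) {
        // Periodically drain expired entries off the user-facing threads,
        // so callers do not pay the eviction cost on their own reads/writes.
        ScheduledExecutorService cleaner =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "guava-cache-cleaner");
                t.setDaemon(true);
                return t;
            });
        cleaner.scheduleAtFixedRate(cache::cleanUp, 1, 1, TimeUnit.MINUTES);
    }
}
```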

There is an internal constant, DRAIN_MAX, which exists to guard against this behavior. Presently it is only used for reference collection, but it should be expanded to cover expiration as well.

Note that expiration maintenance is extremely cheap, as it is implemented in O(1) time per entry, so most likely this will not be a problem.

In Caffeine, my Java 8 rewrite, the difference is that maintenance work is always delegated to a background executor (by default ForkJoinPool.commonPool()). This avoids performing the work on user-facing threads, so no drain bounding is needed. This wasn't feasible in Guava due to Java not having a standard threadpool to delegate to, so adding an executor() configuration wouldn't carry its conceptual weight.
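A sketch of the equivalent cache in Caffeine; the executor(...) call shows the default explicitly, and the lambda loader is a placeholder:

```java
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;

public class CaffeineExample {
    // Maintenance work (eviction, expiration) runs on the configured
    // executor rather than on the calling threads.
    static final LoadingCache<String, String> cache = Caffeine.newBuilder()
        .expireAfterWrite(1, TimeUnit.MINUTES)
        .executor(ForkJoinPool.commonPool()) // the default, shown explicitly
        .build(key -> "value-for-" + key);   // placeholder loader

    public static void main(String[] args) {
        System.out.println(cache.get("user:42"));
    }
}
```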

Cheers,
Ben

--
guava-...@googlegroups.com
Project site: https://github.com/google/guava
This group: http://groups.google.com/group/guava-discuss
This list is for general discussion.
To report an issue: https://github.com/google/guava/issues/new
To get help: http://stackoverflow.com/questions/ask?tags=guava
---
You received this message because you are subscribed to the Google Groups "guava-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to guava-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/guava-discuss/b3bbf346-02e0-4c68-9181-f19a297209a7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Nathan Bryant

Jul 19, 2015, 8:33:44 PM7/19/15
to guava-...@googlegroups.com

Hi Ben--

So if I read you right, in a cache use-case with a high rate of expired entries per caller, there is a high(er) probability of a user-facing call site incurring an O(1) expiration cleanup cost. If you're saying this is O(1) regardless of the number of items pending expiration, I don't get the point about DRAIN_MAX--it seems contradictory.

Benjamin Manes

Jul 19, 2015, 8:57:58 PM7/19/15
to Nathan Bryant, guava-discuss
Sorry, I should have been clearer as my explanation was confusing.

The per-element time is O(1) in all scenarios. If m entries have expired, then it will take O(m) time to clean them up. This differs from many other designs that use an O(lg n) priority queue, where the total time would be O(m lg n).

The DRAIN_MAX constant guards against hostile situations, such as a weak-reference cache having all of its entries garbage collected at once. The goal is to amortize the penalty, so a cap is used and other callers share in the work of fully cleaning up the cache. The same could occur for expiration, but that case is presently not guarded against.

In Guava all of this work is performed per segment (aka concurrencyLevel), which defaults to 4. This means the penalty is O(m / s), where s is the number of segments. Increasing the segment count is primarily for improved write performance, as each segment is independent (multiple segments can be written to or cleaned up in parallel).
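As a sketch, the segment count can be raised from the default when tuning for this; the value 8 here is illustrative:

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.TimeUnit;

public class SegmentedCache {
    // More segments shrink each O(m / s) cleanup slice and allow writes
    // to different segments to proceed in parallel.
    static final Cache<String, String> cache = CacheBuilder.newBuilder()
        .concurrencyLevel(8) // default is 4
        .expireAfterWrite(1, TimeUnit.MINUTES)
        .build();

    public static void main(String[] args) {
        cache.put("k", "v");
        System.out.println(cache.getIfPresent("k"));
    }
}
```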

The maintenance work is performed under the segment's lock, which is also acquired for writes. This means the penalty will increase the wait time for writers, but readers are never blocked. A loading entry is only a write in the sense of inserting a reservation entry; the computation itself does not hold the segment's lock. So, given the typically low write rate of a cache, we can assume the maintenance work will have a small impact on overall performance.

I do think that Guava should guard against hostile expiration by applying the DRAIN_MAX cap there as well. However, I don't think it will make a significant difference in practice given how cheap the maintenance work is. If you are concerned, you can call #cleanUp() on a background thread, which will reduce the impact.

For more details you can see our Strangeloop '11 slides. Caffeine's design document may also be of interest, since there are many similarities.
