Guava Cache Cleanup behaviour

sau...@gmail.com

Oct 31, 2017, 1:24:16 PM
to guava-discuss
Hi,
 
I am trying to understand Guava cache cleanup behaviour. Based on the docs I read, expired items in the cache are not necessarily removed until the cache is accessed again. To ensure expired items are removed, we should invoke cache.cleanUp(). Also, I observed in the code base that the maximum number of items removed during a single cleanup is limited to 16.

Consider the following scenario:
The cache is defined with expireAfterAccess set to 10 seconds. Suppose that after 1 minute of cache access, requests to the cache stop and then restart 5 minutes later. Let us assume that the cache contained 10,000 items when traffic to it stopped. Thus, when traffic restarts, all items in the cache will have expired. So when the first request after the 5-minute pause accesses the cache, only 16 of these items would be removed. Even if I forcefully invoke cache.cleanUp() before the first access (either read or write), only 16 items would get removed. I wanted to define business logic based on RemovalListener, so I need all expired items to be removed from the cache.

Is my understanding correct? If so, my calculation for the first request would be wrong, as not all expired items would be removed from the cache. What is the best way around this?

Thanks

Louis Wasserman

Oct 31, 2017, 2:34:33 PM
to sau...@gmail.com, guava-discuss
That doc you linked to goes on to say: 
> Instead, we put the choice in your hands. If your cache is high-throughput, then you don't have to worry about performing cache maintenance to clean up expired entries and the like. If your cache does writes only rarely and you don't want cleanup to block cache reads, you may wish to create your own maintenance thread that calls Cache.cleanUp() at regular intervals.
> If you want to schedule regular cache maintenance for a cache which only rarely has writes, just schedule the maintenance using ScheduledExecutorService.

Does that help address your issue?
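
The scheduled maintenance described in the quoted docs could look roughly like the sketch below. CacheMaintenance is a hypothetical helper, not part of Guava; it takes a plain Runnable so that a method reference such as cache::cleanUp can be passed in, and the period is whatever suits the application.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not a Guava class): runs cache maintenance on a fixed schedule. */
final class CacheMaintenance {
    private CacheMaintenance() {}

    /**
     * Schedules cleanUp to run repeatedly, waiting one period between runs.
     * The Runnable is assumed to be something like cache::cleanUp.
     */
    static ScheduledFuture<?> schedule(
            ScheduledExecutorService scheduler, Runnable cleanUp, long period, TimeUnit unit) {
        // An exception escaping a periodic task suppresses all later runs,
        // so failures are caught and reported here instead.
        Runnable guarded = () -> {
            try {
                cleanUp.run();
            } catch (RuntimeException e) {
                System.err.println("cache maintenance failed: " + e);
            }
        };
        return scheduler.scheduleWithFixedDelay(guarded, period, period, unit);
    }
}
```

With a Guava cache this might be wired up as CacheMaintenance.schedule(Executors.newSingleThreadScheduledExecutor(), cache::cleanUp, 5, TimeUnit.SECONDS), where the 5-second period is an arbitrary choice. This only makes cleanup more prompt; it does not make it strict.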

--
guava-...@googlegroups.com
Project site: https://github.com/google/guava
This group: http://groups.google.com/group/guava-discuss
 
This list is for general discussion.
To report an issue: https://github.com/google/guava/issues/new
To get help: http://stackoverflow.com/questions/ask?tags=guava
---
You received this message because you are subscribed to the Google Groups "guava-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to guava-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/guava-discuss/2cd6579d-8b69-40c9-b3d6-5b5d8e19604e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

sau...@gmail.com

Oct 31, 2017, 5:11:51 PM
to guava-discuss
Not really.

I understand that high throughput would assist in performing cleanUp(). In the absence of high throughput, we should schedule regular maintenance. I am, however, looking at a very particular case.
As I mentioned, I am interested in the values that get removed from the cache every time the cache gets accessed. The rate of access can vary over time (sometimes high throughput, sometimes very low). Even if I were to schedule regular maintenance using ScheduledExecutorService, access to the cache between two executions will result in incorrect data, as no items, even though expired, will get removed.
My current approach is to invoke cleanUp() in my request path before the request accesses the cache. However, the code linked above mentions that MAX_DRAIN = 16. So in the scenario described in the original mail, the cache would remove only 16 items per cleanUp() invocation even though more items have expired.
Therefore, it will take multiple requests invoking cleanUp() over a period of time before the cache catches up on the items that need to be removed.

A "potential option" is to combine the two approaches, i.e. invoke cleanUp() prior to cache access and also have regular scheduled maintenance take place. However, that does not seem to be an appropriate option.

So what I am trying to identify: is my understanding of the usage of MAX_DRAIN correct? Does cleanUp() remove only a limited number of items from the cache even though there might be more expired items in it? If so, other than the potential option above, what would be the best way to ensure the cache removes all expired items whenever cleanUp() is invoked, irrespective of the number of expired items?

Thanks

Louis Wasserman

Oct 31, 2017, 5:18:11 PM
to sau...@gmail.com, guava-discuss
>access to the cache between two executions will result in incorrect data as no items even though expired will get removed. 
What data will be incorrect?

Calls to cache.get, getUnchecked, getIfPresent, etc. will never return expired entries.  The only incorrect data you could get from a cache that has not fully evicted expired entries is from Cache.size(); everything else correctly ignores expired entries.
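
As a rough illustration of that point, here is a minimal sketch of a lazily expiring map. LazyExpiringMap and its methods are hypothetical and greatly simplified; this is not how Guava's cache is implemented. The current time is passed in explicitly to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lazily expiring map; a simplified illustration, not Guava's implementation. */
final class LazyExpiringMap<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtNanos;

        Entry(V value, long expiresAtNanos) {
            this.value = value;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlNanos;

    LazyExpiringMap(long ttlNanos) {
        this.ttlNanos = ttlNanos;
    }

    void put(K key, V value, long nowNanos) {
        entries.put(key, new Entry<>(value, nowNanos + ttlNanos));
    }

    /** Reads check the deadline, so an expired entry is never returned. */
    V getIfPresent(K key, long nowNanos) {
        Entry<V> e = entries.get(key);
        return (e == null || nowNanos >= e.expiresAtNanos) ? null : e.value;
    }

    /** Counts stored entries, including expired ones that have not been swept yet. */
    int size() {
        return entries.size();
    }
}
```

In this sketch, an entry put at time 0 with a TTL of 10 is invisible to getIfPresent from time 10 onward, yet size() still reports 1 because nothing has physically removed it.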

I don't believe there is any way to guarantee that all expired entries are evicted at any particular time.  But for most applications this isn't actually a problem.  I'm still not convinced, based on what you've told me, that that's the case for you.

sau...@gmail.com

Oct 31, 2017, 5:28:27 PM
to guava-discuss
>>What data will be incorrect?

>>Calls to cache.get, getUnchecked, getIfPresent, etc. will never return expired entries.  The only incorrect data you could get from a cache that has not fully evicted expired entries is from Cache.size(); everything else correctly ignores expired entries

I have overridden onRemoval() in RemovalListener to perform some business logic. Since this gets invoked only when entries are "removed" from the cache, it is necessary that all expired items get removed.

Benjamin Manes

Oct 31, 2017, 5:32:24 PM
to Louis Wasserman, sau...@gmail.com, guava-discuss
There is a conflict in assumptions. In the original post,

> I wanted to define business logic based on RemovalListener so I need all expired items to be removed from cache.

That indicates an expectation of active expiration by a background thread, rather than passive best-effort behavior. The lack of guaranteed promptness of the notification and the amortized limit on an explicit call mean the dependent business logic's requirements are not met. However, cleanUp() is an expensive call since it acquires the lock, making it a poor choice to perform after every read. That's why the Guava docs suggest a scheduled executor to help coerce promptness, but that does not imply any strictness either.

This came up recently [1] for a user who wanted to use expiration for API request timeouts. The lack of promptness surprised him, since that would be how he was going to reply to the caller. Instead he used a ScheduledExecutor to cancel the future rather than expiration; in Java 9, CompletableFuture.orTimeout would have been an option. Some users want to use cache expiration as a timer service, which isn't how we typically think of caching. It might be worth looking into for Java 9 using the new dedicated global scheduler thread, but it isn't an approach supported by the caches today.
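
The two timeout approaches mentioned above can be sketched as follows. RequestTimeouts and its method names are hypothetical; the sketch assumes the pending request is represented by a CompletableFuture, which the thread does not state explicitly.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper showing request timeouts without relying on cache expiration. */
final class RequestTimeouts {
    private RequestTimeouts() {}

    /** Pre-Java 9 style: a scheduled task fails the future if it is still pending at the deadline. */
    static <T> CompletableFuture<T> withScheduledTimeout(
            CompletableFuture<T> pending,
            ScheduledExecutorService scheduler,
            long timeout,
            TimeUnit unit) {
        ScheduledFuture<?> timer = scheduler.schedule(() -> {
            // No effect if the future has already completed.
            pending.completeExceptionally(new TimeoutException("request timed out"));
        }, timeout, unit);
        // Once the request finishes either way, the timer is no longer needed.
        pending.whenComplete((value, error) -> timer.cancel(false));
        return pending;
    }

    /** Java 9+: orTimeout does the same using a scheduler managed by the JDK. */
    static <T> CompletableFuture<T> withOrTimeout(
            CompletableFuture<T> pending, long timeout, TimeUnit unit) {
        return pending.orTimeout(timeout, unit);
    }
}
```

In both variants the caller is notified at the deadline by the timer itself, so promptness does not depend on when a cache happens to run its maintenance.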


sau...@gmail.com

Oct 31, 2017, 5:44:13 PM
to guava-discuss
>> That indicates an expectation of active expiration by a background thread, rather than passive best-effort behavior. The lack of guaranteed promptness of the notification and the amortized limit on an explicit call mean the dependent business logic's requirements are not met. However, cleanUp() is an expensive call since it acquires the lock, making it a poor choice to perform after every read. That's why the Guava docs suggest a scheduled executor to help coerce promptness, but that does not imply any strictness either.

Thanks for the reply. While I agree that cleanUp() is an expensive call, what surprised me was that only 16 (MAX_DRAIN) expired items get removed from the cache (this might be performance-related). I was expecting cleanUp() to be a "total" cleanup rather than a partial one. In my scenario, it would have been acceptable for cleanup to take longer, assuming that it did a "total" cleanup.
A cache is probably not the best data structure for my use case. I probably need to come up with a custom solution.

Benjamin Manes

Oct 31, 2017, 5:59:31 PM
to sau...@gmail.com, guava-discuss
Typically, cleanUp is amortized across callers during reads and writes. To avoid overly penalizing one caller in a bad scenario, causing odd timeouts, a limit is used to spread the cost if the cache is way behind. The limit isn't parameterized in a way that would let an explicit Cache.cleanUp() do the full work, as you want. There's no good reason why it couldn't be changed; it merely hadn't been an issue.
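
To make the trade-off concrete, here is a minimal sketch of a bounded drain next to a full drain. ExpiryQueue is a hypothetical structure written for this illustration and is not Guava's actual code; it assumes a single TTL and non-decreasing timestamps, so insertion order is also expiration order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical expiry queue; illustrates a per-call drain limit, not Guava's actual code. */
final class ExpiryQueue<K> {
    private static final class Deadline<K> {
        final K key;
        final long expiresAtNanos;

        Deadline(K key, long expiresAtNanos) {
            this.key = key;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    // All entries share one TTL and timestamps are assumed non-decreasing,
    // so the head of the queue always expires first.
    private final Deque<Deadline<K>> queue = new ArrayDeque<>();
    private final long ttlNanos;

    ExpiryQueue(long ttlNanos) {
        this.ttlNanos = ttlNanos;
    }

    void add(K key, long nowNanos) {
        queue.addLast(new Deadline<>(key, nowNanos + ttlNanos));
    }

    /** Removes and returns at most maxEntries expired keys, oldest first. */
    List<K> drainExpired(long nowNanos, int maxEntries) {
        List<K> removed = new ArrayList<>();
        while (removed.size() < maxEntries
                && !queue.isEmpty()
                && queue.peekFirst().expiresAtNanos <= nowNanos) {
            removed.add(queue.pollFirst().key);
        }
        return removed;
    }

    /** Removes and returns every expired key, however many there are. */
    List<K> drainAllExpired(long nowNanos) {
        return drainExpired(nowNanos, Integer.MAX_VALUE);
    }

    int pending() {
        return queue.size();
    }
}
```

A caller on the request path would use the bounded variant so that no single request pays for the whole backlog, while an explicit maintenance call could use the unbounded one.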

> A cache is probably not the best data structure for my use case. I probably need to come up with a custom solution.

Yes, because it doesn't fit what the APIs guarantee. Most likely, coordinating with your own ScheduledExecutorService is preferable. Unless you have a large number of entries and a high churn rate, that shouldn't be a problem. In the cases where it would be, support by the cache could be made more efficient by having only 1 task scheduled rather than all N. Since there is a desire to not have dedicated threads owned by a cache, I think this feature (if ever implemented) would be delayed until Java 9.
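
One way to coordinate with your own ScheduledExecutorService is the "all N" approach: one scheduled task per entry. The sketch below is a hypothetical TimedRegistry, not a Guava API. For brevity it expires entries a fixed time after they are written; an expire-after-access variant would also need to reschedule on reads.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

/** Hypothetical registry with one scheduled expiry task per entry (not a Guava API). */
final class TimedRegistry<K, V> {
    private static final class Holder<V> {
        final V value;
        volatile ScheduledFuture<?> timer;

        Holder(V value) {
            this.value = value;
        }
    }

    private final ConcurrentMap<K, Holder<V>> entries = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler;
    private final long ttlMillis;
    private final BiConsumer<K, V> onExpired;

    TimedRegistry(ScheduledExecutorService scheduler, long ttlMillis, BiConsumer<K, V> onExpired) {
        this.scheduler = scheduler;
        this.ttlMillis = ttlMillis;
        this.onExpired = onExpired;
    }

    void put(K key, V value) {
        Holder<V> holder = new Holder<>(value);
        Holder<V> previous = entries.put(key, holder);
        // Cancelling the replaced entry's timer is only an optimization;
        // expire() ignores timers whose entry is no longer current.
        if (previous != null && previous.timer != null) {
            previous.timer.cancel(false);
        }
        holder.timer = scheduler.schedule(() -> expire(key, holder), ttlMillis, TimeUnit.MILLISECONDS);
    }

    V get(K key) {
        Holder<V> holder = entries.get(key);
        return holder == null ? null : holder.value;
    }

    int size() {
        return entries.size();
    }

    private void expire(K key, Holder<V> holder) {
        // Conditional remove: notify only if this exact entry is still the current one.
        if (entries.remove(key, holder)) {
            onExpired.accept(key, holder.value);
        }
    }
}
```

Here the listener runs on the scheduler thread when each entry's deadline passes, so every expired entry is reported without waiting for a later read or write. The cost is one pending task per entry, which is the overhead a single shared task would avoid.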
