disabling CacheStats collection by default

318 views
Skip to first unread message

Charles Fry

unread,
Apr 2, 2012, 10:30:16 PM4/2/12
to guava-discuss
We're looking at making some changes to how CacheBuilder-created caches collect statistics (a la Cache.stats()).

Specifically, we have hitherto turned on stats collection by default, without providing a way to turn it off. As some of you may be aware this was requested long ago in Issue 863, and we regretfully ignored it at the time. But we've come to realize that stats counting does indeed impose undesirable overhead.

Our current plan (a) is to disable stats collection by default (making Cache.stats() throw), unless CacheBuilder.enableStats() (or something like that) is called. Obviously this will break any code that currently calls Cache.stats() at upgrade time. We don't like that, but so far we consider it preferable to the other options.

Other alternatives we've considered are:

(b) don't make stats() throw, but instead turn on stats collection the first time stats() is called. This sounds nice, but has the problem of potentially generating misleading stats, especially as the cache hit rate will be much lower at startup time.

(c) Add CacheBuilder.disableStats() instead of enableStats(). This seems undesirable because stats are extra functionality which it makes clear sense to opt into, whereas forcing users to opt-out means that you only need to look for it when you're having performance issues, which feels a little backwards.

We'd love to hear what you think about all of these options. We're sensitive to the one-time difficulty this upgrade will represent to users of CacheStats, but at the same time we want to make the right long-term API decision here. Please share whatever relevant thoughts you have to this decision.

thanks,
Charles

Blair Zajac

unread,
Apr 2, 2012, 11:11:51 PM4/2/12
to guava-...@googlegroups.com
On 4/2/12 7:30 PM, Charles Fry wrote:
> We're looking at making some changes to how CacheBuilder-created caches collect
> statistics (a la Cache.stats()).
>
> Specifically, we have hitherto turned on stats collection by default, without
> providing a way to turn it off. As some of you may be aware this was requested
> long ago in Issue 863, and we regretfully ignored it at the time. But we've come
> to realize that stats counting does indeed impose undesirable overhead.

I'm curious, can you share what amount of overhead you saw?

> Our current plan (a) is to disable stats collection by default (making
> Cache.stats() throw), unless CacheBuilder.enableStats() (or something like that)
> is called. Obviously this will break any code that currently calls Cache.stats()
> at upgrade time. We don't like that, but so far we consider it preferable to the
> other options.

One vote from me for this plan.

Blair

Benjamin Manes

unread,
Apr 2, 2012, 11:19:38 PM4/2/12
to Blair Zajac, guava-...@googlegroups.com
Any chance an Optional type can be returned instead, or is it too late to make those type of API changes?

I would expect striped counters would be efficient enough. While there isn't the ability to stripe by processor yet, you can hash by thread id to reduce the likelihood of contention. JDK8 LongAdder and Cliff Click's Counter perform pretty well.

cowwoc

unread,
Apr 2, 2012, 11:42:10 PM4/2/12
to guava-...@googlegroups.com

Alternatively, add start/stop() methods to the CacheStats class and
don't touch the Cache interface. The upside is that stats() doesn't
throw an exception (since it's never in a bad state). The downside is
that users are less likely to notice that the stats are now disabled by
default.

Gili

tsuna

unread,
Apr 3, 2012, 12:32:32 AM4/3/12
to Blair Zajac, Charles Fry, guava-...@googlegroups.com
On Mon, Apr 2, 2012 at 8:11 PM, Blair Zajac <bl...@orcaware.com> wrote:
> On 4/2/12 7:30 PM, Charles Fry wrote:
>> long ago in Issue 863, and we regretfully ignored it at the time. But
>> we've come to realize that stats counting does indeed impose undesirable overhead.
>
> I'm curious, can you share what amount of overhead you saw?

Yeah I'm curious too. Guava's Cache stats are pretty much the only
place in my applications that still use AtomicLong for frequently
incremented counters (I use jsr166e's LongAdder everywhere else).

I don't know about the others, but my organization hasn't even started
to evaluate Java 7, and I don't know any other that has (in the Hadoop
/ HBase world), so JDK8 is waaaaaaaays out. I'm not going to wait
years to be able to use some of its library features. So if Guava
could bundle something simple like LongAdder and use it for
CacheStats, that would be a double win and could maybe alleviate (if
not eliminate) your performance concerns. This way stats could be
still enabled by default, and could be disabled through CacheBuilder
if absolutely necessary.

--
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com

Louis Wasserman

unread,
Apr 3, 2012, 12:42:00 AM4/3/12
to tsuna, Blair Zajac, Charles Fry, guava-...@googlegroups.com
I'd still prefer investigating LongAdder-based concurrent stats before making API changes.
Louis Wasserman
wasserm...@gmail.com
http://profiles.google.com/wasserman.louis


Philipp Wendler

unread,
Apr 3, 2012, 2:12:46 AM4/3/12
to guava-...@googlegroups.com
Hi,

Am 03.04.2012 04:30, schrieb Charles Fry:
> We're looking at making some changes to how CacheBuilder-created caches
> collect statistics (a la Cache.stats()).
>
> Specifically, we have hitherto turned on stats collection by default,
> without providing a way to turn it off. As some of you may be aware this
> was requested long ago in Issue 863, and we regretfully ignored it at the
> time. But we've come to realize that stats counting does indeed impose
> undesirable overhead.
>
> Our current plan (a) is to disable stats collection by default (making
> Cache.stats() throw), unless CacheBuilder.enableStats() (or something like
> that) is called. Obviously this will break any code that currently calls
> Cache.stats() at upgrade time. We don't like that, but so far we consider
> it preferable to the other options.

I wouldn't object to that, although it would be better of course if
there was a change which would give a compile-time error. Changing the
result to Optional<CacheStats> might really be an option.

> Other alternatives we've considered are:
>
> (b) don't make stats() throw, but instead turn on stats collection the
> first time stats() is called. This sounds nice, but has the problem of
> potentially generating misleading stats, especially as the cache hit rate
> will be much lower at startup time.

If this solution is adopted, I wouldn't turn on stats collection the
first time stats() is called, which is some unexpected side-effect and
really might give weird results. Instead always return a CacheStats
object with all times etc. set to 0. This would prevent application
crashes, and would show only results which are obviously wrong (and not
wrong in a manner which could remain undetected).

Greetings, Philipp

Frank Pavageau

unread,
Apr 3, 2012, 5:16:43 AM4/3/12
to tsuna, Blair Zajac, Charles Fry, guava-...@googlegroups.com
On Tue, Apr 3, 2012 at 6:32 AM, tsuna <tsun...@gmail.com> wrote:
> On Mon, Apr 2, 2012 at 8:11 PM, Blair Zajac <bl...@orcaware.com> wrote:
>> On 4/2/12 7:30 PM, Charles Fry wrote:
>>> long ago in Issue 863, and we regretfully ignored it at the time. But
>>> we've come to realize that stats counting does indeed impose undesirable overhead.
>>
>> I'm curious, can you share what amount of overhead you saw?
>
> Yeah I'm curious too.  Guava's Cache stats are pretty much the only
> place in my applications that still use AtomicLong for frequently
> incremented counters (I use jsr166e's LongAdder everywhere else).

I can illustrate that: we had a tree of Caches (with 2 levels and a
reasonable fanout) and ended up with 300K AtomicLong (6 per Cache + 6
per Segment) when we switched from Guava r09 to 10.0. That's a fair
amount of memory, and quite a few more objects to scan during the GC.
Tuning the concurrency level on the sub-Caches reduced that, but we
eventually replaced the tree with a single Cache and a complex key, at
the cost of instanciating a key for each cache request.

Frank

Kevin Bourrillion

unread,
Apr 3, 2012, 11:20:17 AM4/3/12
to Philipp Wendler, guava-...@googlegroups.com
On Mon, Apr 2, 2012 at 11:12 PM, Philipp Wendler <m...@philippwendler.de> wrote:
> Our current plan (a) is to disable stats collection by default (making
> Cache.stats() throw), unless CacheBuilder.enableStats() (or something like
> that) is called. Obviously this will break any code that currently calls
> Cache.stats() at upgrade time. We don't like that, but so far we consider
> it preferable to the other options.

I wouldn't object to that, although it would be better of course if
there was a change which would give a compile-time error. Changing the
result to Optional<CacheStats> might really be an option.

If there's a good argument to be made for Optional<CacheStats> on its own merits, without the issue of upgrade pain, that's worth hearing and considering. But if the cost of easing one-time upgrade pain is to be stuck with an otherwise inappropriate API for the rest of Guava's lifetime, it wouldn't be worth it.

 
> (b) don't make stats() throw, but instead turn on stats collection the
> first time stats() is called. This sounds nice, but has the problem of
> potentially generating misleading stats, especially as the cache hit rate
> will be much lower at startup time.

If this solution is adopted, I wouldn't turn on stats collection the
first time stats() is called, which is some unexpected side-effect and
really might give weird results. Instead always return a CacheStats
object with all times etc. set to 0. This would prevent application
crashes, and would show only results which are obviously wrong (and not
wrong in a manner which could remain undetected).

Ah, I hadn't thought of that. That could be better than throwing an exception. Again the question is, would that be a useful behavior even in Guava 19.0?  Or: is it worth doing as a onetime transitional measure in 12.0 only, then moving to exceptions after that?



--
Kevin Bourrillion @ Google

Chris Povirk

unread,
Apr 3, 2012, 11:25:00 AM4/3/12
to Kevin Bourrillion, Philipp Wendler, guava-...@googlegroups.com
I'd be upset if I found a performance problem in my app, turned off
stats to fix it, and suddenly started serving 500s because my debug
info contained a now-disabled CacheStats that decided to throw
exceptions. I would prefer 0s (or -1s or something else obviously
bogus).

Louis Wasserman

unread,
Apr 3, 2012, 11:24:50 AM4/3/12
to Kevin Bourrillion, Philipp Wendler, guava-...@googlegroups.com
If this solution is adopted, I wouldn't turn on stats collection the
first time stats() is called, which is some unexpected side-effect and
really might give weird results. Instead always return a CacheStats
object with all times etc. set to 0. This would prevent application
crashes, and would show only results which are obviously wrong (and not
wrong in a manner which could remain undetected).

Ah, I hadn't thought of that. That could be better than throwing an exception. Again the question is, would that be a useful behavior even in Guava 19.0?  Or: is it worth doing as a onetime transitional measure in 12.0 only, then moving to exceptions after that?

I like returning a dummy CacheStats (for all time), though I might take it further: add a boolean isCollectingStats method to CacheStats, perhaps, or something to indicate whether the underlying cache was actually collecting stats, or is just returning a "dummy" stats object.

Philipp Wendler

unread,
Apr 3, 2012, 11:50:29 AM4/3/12
to guava-...@googlegroups.com
Hi,

Thinking about it, this is right.
There really needs to be a way to query (without catching an exception)
whether statistics are enabled or not. I for example might also want to
disable statistics in some runs of my application, and have it enabled
in other runs (and then use it).

So either have stats() return null, or an Optional<CacheStats> or have a
method CacheStats.statisticsEnabled(). The null-solution is used in
similar cases in the Java API [1], but I'd like Optional more.
Of the remaining two solutions, the Optional<CacheStats> makes the
"optionallity" more explicit.
When returning a dummy CacheStats object, one could still decide whether
to return a fake value from all getters except "statisticsEnabled()", or
let them throw an exception.
As I prefer to not have fake values (even if they are quite obvious),
and having an object with almost all methods throwing exceptions seems
weird, and the functionality of Optional would just be reimplemented in
CacheStats, I would prefer the Optional<CacheStats> version.

Greetings, Philipp

[1] >
http://docs.oracle.com/javase/6/docs/api/java/lang/management/MemoryPoolMXBean.html#getCollectionUsage()

Louis Wasserman

unread,
Apr 3, 2012, 12:07:58 PM4/3/12
to Philipp Wendler, guava-...@googlegroups.com
Hrrrrm.  My concern with Optional<CacheStats> is that
  1. I'm not convinced that the migration effort demanded is worthwhile.
  2. I'm not convinced that users won't respond to Optional.absent() by pretending the CacheStats is all zeroes, anyway, but having to write that logic themselves.

tsuna

unread,
Apr 3, 2012, 12:10:54 PM4/3/12
to Frank Pavageau, Blair Zajac, Charles Fry, guava-...@googlegroups.com
On Tue, Apr 3, 2012 at 2:16 AM, Frank Pavageau <frank.p...@gmail.com> wrote:
> On Tue, Apr 3, 2012 at 6:32 AM, tsuna <tsun...@gmail.com> wrote:
>> On Mon, Apr 2, 2012 at 8:11 PM, Blair Zajac <bl...@orcaware.com> wrote:
>>> On 4/2/12 7:30 PM, Charles Fry wrote:
>>>> long ago in Issue 863, and we regretfully ignored it at the time. But
>>>> we've come to realize that stats counting does indeed impose undesirable overhead.
>>>
>>> I'm curious, can you share what amount of overhead you saw?
>>
>> Yeah I'm curious too.  Guava's Cache stats are pretty much the only
>> place in my applications that still use AtomicLong for frequently
>> incremented counters (I use jsr166e's LongAdder everywhere else).
>
> I can illustrate that: we had a tree of Caches (with 2 levels and a
> reasonable fanout) and ended up with 300K AtomicLong (6 per Cache + 6
> per Segment)

I don't think that's what Charles was referring to though. In the
issue he referred to [1], the author of elasticsearch is reporting
that the stats add volatile writes to the read path, something that
was otherwise carefully avoided by CHM (the implementation of which is
mostly re-used for LocalCache).

If you wound up with 300K AtomicLong it's because you had either too
many LocalCaches, or you configured them to create too many segments.
But that's not the issue being discussed here.


Anyways, I urge the Guava team to consider adding jsr166e LongAdder to
Guava (it's literally only 2 classes) and then switch the CacheStats
implementation to it. I'm sure that would make Shay happy in [1], and
it would me happy too. And this way we don't have to add a new API or
change the default semantics of the API, and this whole discussion
about whether to return null or 0 or Optional<?> or whatever is moot.

[1] http://code.google.com/p/guava-libraries/issues/detail?id=863

Blair Zajac

unread,
Apr 3, 2012, 1:11:00 PM4/3/12
to Louis Wasserman, Kevin Bourrillion, Philipp Wendler, guava-...@googlegroups.com
Or return the time in millis or nanos when stats collecting was started, if the time is 0 then no stats are being collected.

Blair

Charles Fry

unread,
Apr 4, 2012, 3:07:45 PM4/4/12
to Benjamin Manes, Blair Zajac, guava-...@googlegroups.com
I would expect striped counters would be efficient enough. While there isn't the ability to stripe by processor yet, you can hash by thread id to reduce the likelihood of contention. JDK8 LongAdder and Cliff Click's Counter perform pretty well.

Here are benchmark results (all benchmarks will be released publicly in the next few months):

                   benchmark         ns runtime
           ConcurrentHashMap     15.016 =====
        CacheBuilder_noStats     56.316 ====================
       CacheBuilder_newStats     61.118 ======================
                CacheBuilder     77.752 ============================
           ComputingMapMaker     54.709 ====================
 LoadingCacheBuilder_noStats     54.808 ====================
LoadingCacheBuilder_newStats     81.018 =============================
         LoadingCacheBuilder     81.163 ==============================

The newStats are with LongAdder, noStats with disableStats (which I'm proposing become the default), and the current default is AtomicLong.

I think this makes a clear case for turning stats off by default (which would continue to result in static zero stats being returned, as disableStats() does now). I agree that we should migrate stats to LongAdder, however.

Charles

Benjamin Manes

unread,
Apr 4, 2012, 4:00:10 PM4/4/12
to Charles Fry, Blair Zajac, guava-...@googlegroups.com
Can you also try using an AtomicLongArray and choosing an index location based on the thread id? e.g. 

  static final int ceilingNextPowerOfTwo(int x) {
    // From Hacker's Delight, Chapter 3, Harry S. Warren Jr.
    return 1 << (Integer.SIZE - Integer.numberOfLeadingZeros(x - 1));
  }
  static final SIZE_MASK = ceilingNextPowerOfTwo(Runtime.getRuntime().availableProcessors()) - 1;

  final AtomicLongArray hits = new AtomicLongArray(SIZE_MASK + 1);
  hits.incrementAndGet((int) Thread.currentThread().getId() & SIZE_MASK);

While it keeps the volatile write path, it helps ensure an even spread of the volatile writes to avoid contention. I suspect your Caliper test isn't multi-threaded, though, so it may not show much of an improvement.

Another approach is to use thread-local statistics and drain them periodically. This would mean the stats are eventually consistent and may be lossy. While this could be a surprise, it is pragmatically good enough for how stats are consumed.

A few years back Memcached had a performance issue with collecting stats as well. I think they ended up using thread-local stats, but I don't recall the final solution. It may be good to take a quick peak and see what they did.

-Ben

Charles Fry

unread,
Apr 4, 2012, 4:27:05 PM4/4/12
to Benjamin Manes, Blair Zajac, guava-...@googlegroups.com
The benchmark is single-threaded, so we won't see any difference there. Once we have a simple multi-threaded benchmark (which we definitely need) then we can more deeply explore such options.

Benjamin Manes

unread,
Apr 4, 2012, 4:30:26 PM4/4/12
to Charles Fry, Blair Zajac, guava-...@googlegroups.com
Then I think the choice is between disablement and thread-local stats + periodic aggregation. Can you try thread-local just so that we can see the perf impact?

tsuna

unread,
Apr 4, 2012, 4:47:11 PM4/4/12
to Charles Fry, Benjamin Manes, Blair Zajac, guava-...@googlegroups.com
On Wed, Apr 4, 2012 at 1:27 PM, Charles Fry <f...@google.com> wrote:
> The benchmark is single-threaded, so we won't see any difference there.

Then we can't draw any conclusion based on this benchmark.

Charles Fry

unread,
Apr 4, 2012, 5:12:58 PM4/4/12
to tsuna, Benjamin Manes, Blair Zajac, guava-...@googlegroups.com
> The benchmark is single-threaded, so we won't see any difference there.

Then we can't draw any conclusion based on this benchmark.

Well, we _can_ conclude that stats collection has an undesirable impact on latency, which supports our desire to disable it by default.

But you're right, as for the _best_ way to collect stats, it is inconclusive. And I don't plan to spend any further time on this, but I'd welcome any volunteer contribution, whether of performance analysis or code improvements!

Charles 

Kevin Bourrillion

unread,
Apr 4, 2012, 5:18:51 PM4/4/12
to Charles Fry, tsuna, Benjamin Manes, Blair Zajac, guava-...@googlegroups.com
I agree that it is more important that we stop forcibly collecting stats for all users whether they want it or not, than that we make stat collection absolutely optimal when it's enabled.


--
Reply all
Reply to author
Forward
0 new messages