Caching - a Java 8 rewrite


Benjamin Manes

Jan 4, 2015, 5:50:30 AM
to guava-discuss, Charles Fry
I thought a fun way to kick off the new year would be to see what a Java 8 version of Guava's cache might look like. It took about two weeks to write a feature-compatible implementation built on top of Java 8's rewrite of ConcurrentHashMap (CHM). I call this new cache Caffeine.

The design is based on my original proposal from when Kevin, Charles, and I began to discuss a caching library. At the time, significant effort had already been put into soft- and weak-reference-based caching, as there was a belief that it was ideal to delegate caching to the garbage collector and obtain the concurrency for free. As that is no longer the recommended strategy, this implementation is instead a fork of ConcurrentLinkedHashMap (CLHM), where a decorator sits on top of a computing map. An unexpected advantage is that the new ConcurrentHashMap was designed not to be vulnerable to the hash-collision denial-of-service attack, to which Guava is still susceptible. On the API side, Caffeine uses a fork of Guava's API with some Java 8 idioms and exposes a few advanced features.
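The decorator-on-a-computing-map idea can be sketched with stdlib types alone. The class and hook names below (BoundedCache, onAccess) are illustrative placeholders, not Caffeine's actual API:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal sketch of a bounded-cache decorator over Java 8's ConcurrentHashMap.
// CHM supplies the atomic per-entry computation; the decorator layers the
// eviction policy on top via a hook (a real implementation would append the
// access to a read buffer and drain it into the LRU policy under a tryLock).
class BoundedCache<K, V> {
  private final ConcurrentHashMap<K, V> data = new ConcurrentHashMap<>();

  V get(K key, Function<? super K, ? extends V> loader) {
    // computeIfAbsent gives atomic, per-entry loading for free
    V value = data.computeIfAbsent(key, loader);
    onAccess(key); // hook where the eviction policy records the read
    return value;
  }

  void onAccess(K key) {
    // hypothetical hook; intentionally a no-op in this sketch
  }

  public static void main(String[] args) {
    BoundedCache<String, Integer> cache = new BoundedCache<>();
    int v = cache.get("answer", k -> 42);
    System.out.println(v); // prints 42
  }
}
```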

While there is a lot of work to do before switching focus to performance, the initial benchmarks show a clear improvement. Of course, the quick prototyping of features significantly degraded performance compared to CLHM, but hopefully that can be restored. There are also a number of optimization ideas worth exploring that never made it into CLHM, so with effort the final implementation may outperform it.

The benchmarks below use JMH on a 4c/8HT MacBook. The read/write thread count was set to the low value of 4 due to a long-standing Guava bug that can cause an OutOfMemoryError (crash) at high concurrency for synthetic workloads. The compute benchmark was set to 32 threads to show the impact of lock contention and the benefits of prescreening. Past benchmarking with CLHM demonstrated that it scaled in line with the backing hash table, which is now based on per-node locks. Guava retains the segmented locking strategy, which is known to degrade noticeably at around 8-12 cores.

                         random reads/sec   random writes/sec   compute same key/sec   compute random key/sec
ConcurrentLinkedHashMap  76,569,121 (4t)    16,566,954 (4t)     -                      -
Caffeine                 45,437,638 (4t)    18,228,124 (4t)     723,280,326 (32t)      90,563,072 (32t)
Guava                    22,740,644 (4t)    11,087,604 (4t)     45,185,485 (32t)       66,620,017 (32t)

So far most of the work has been building a robust test suite based on specification metadata. This matches my proposal on the internal wiki, back when the tests were a mix of Bob's, Charles', and mine (those that I ported over). Guava's current suite shows a lot of work by Charles et al., but it is also fairly brittle with regard to combinatorial testing. In contrast, by using the @CacheSpec annotation to specify the constraints, injecting contextual information, and moving low-level validation into assert matchers, the tests are very generic. There are currently 180,000 tests that execute in 24s.
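A toy sketch of the annotation-driven approach described above, assuming a hypothetical @CacheSpec with a single parameter (the real annotation specifies many more constraints, and the real harness generates cache instances rather than passing raw ints):

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

// Illustrative sketch: a spec annotation declares the configurations a test
// applies to, and a tiny reflective harness runs the test once per value.
class CacheSpecSketch {
  @Retention(RetentionPolicy.RUNTIME)
  @interface CacheSpec {
    int[] maximumSize() default {};
  }

  @CacheSpec(maximumSize = {10, 100})
  static void evictsWhenFull(int maximumSize) {
    // a real harness would build one cache per parameter combination
    // and validate it with shared assert matchers
  }

  public static void main(String[] args) throws Exception {
    Method test = CacheSpecSketch.class
        .getDeclaredMethod("evictsWhenFull", int.class);
    CacheSpec spec = test.getAnnotation(CacheSpec.class);
    for (int size : spec.maximumSize()) {
      test.invoke(null, size); // run the test once per configuration
      System.out.println("ran with maximumSize=" + size);
    }
  }
}
```

The combinatorial explosion (180,000 tests) comes from taking the cross product of every constraint dimension, which a hand-written suite cannot realistically cover.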

While there are benefits, like a true maximum size threshold (a single LRU chain), there are some disadvantages in this design. In Guava, weak/soft reference caching is primary, with all other features tacked on. Bob's goal in doing this was to reduce the memory footprint, as ReferenceMap's wrappers were replaced with values embedded in the hash entry object. By having forked the hash table, we were able to add LRU, expiration, and weights directly to the entry as well. By choosing not to fork the new Java 8 CHM, all of these fields must be held in a wrapper on top of the value. To maintain a single (not segmented) maximum size / expiration policy, the weight field (an integer) must always be present, because its unused bits are leveraged in a state machine that resolves race conditions between the hash table and the LRU policy.
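The bit-packing trick can be illustrated with a toy entry whose single int field carries both a weight and a lifecycle state. The masks and state values below are hypothetical, and a real implementation would transition states with a compareAndSet loop rather than a plain write:

```java
// Illustrative sketch of packing an entry's lifecycle state into the unused
// high bits of its weight field (not Caffeine's actual encoding).
class WeightedEntry {
  static final int STATE_SHIFT = 30;
  static final int WEIGHT_MASK = (1 << STATE_SHIFT) - 1;
  static final int ALIVE = 0, RETIRED = 1, DEAD = 2;

  // low 30 bits: weight; high 2 bits: lifecycle state
  private volatile int packed;

  WeightedEntry(int weight) {
    packed = (ALIVE << STATE_SHIFT) | (weight & WEIGHT_MASK);
  }

  int weight() { return packed & WEIGHT_MASK; }
  int state()  { return packed >>> STATE_SHIFT; }

  void retire() {
    // a real version would CAS to resolve races with the hash table
    packed = (RETIRED << STATE_SHIFT) | weight();
  }

  public static void main(String[] args) {
    WeightedEntry e = new WeightedEntry(5);
    System.out.println(e.weight() + " " + e.state()); // 5 0
    e.retire();
    System.out.println(e.state()); // 1
  }
}
```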

There is a lot of work left to do before I would call this production ready. I'd estimate 2-4 work weeks to mature the implementation code from a prototype sprint to something that is clean and elegant. Another 3-4 work weeks should be dedicated to performance improvements: small optimizations, improving buffer management, and reducing memory overhead. A future release should focus on adding tracing and a simulator for both users (cache sizing) and developers (evaluating alternatives to LRU).

Hopefully this work will factor into the larger discussion of what Guava's future is as the team begins to tackle Java 8.

Happy new year,
Ben

Charles Fry

Jan 5, 2015, 10:17:18 AM
to Benjamin Manes, guava-discuss
This is super exciting! I'm glad to see that the ideas we once discussed are finally coming to fruition. I certainly look forward to watching this evolve!

Charles

--
guava-...@googlegroups.com
Project site: https://github.com/google/guava
This group: http://groups.google.com/group/guava-discuss
 
This list is for general discussion.
To report an issue: https://github.com/google/guava/issues/new
To get help: http://stackoverflow.com/questions/ask?tags=guava
---
You received this message because you are subscribed to the Google Groups "guava-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to guava-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/guava-discuss/CAGu0%3DMP9Q7LSgWyCHPhLHfE%3DQ-n9W9us2YdYP%3DWUZkid4_8mSA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Christian Gruber

Jan 5, 2015, 10:20:47 AM
to Benjamin Manes, guava-discuss, Charles Fry
This is great! 

As a side note, I have been of the opinion for a while that the caching system of Guava really deserves its own module/project (as does EventBus). We should factor that into the discussion of how to fit Java 8 and caching into our longer-term Guava plans. Regardless, thanks for the analysis and initial efforts on this.

I like the concept (though I'm waffley on the name) of an explicit SPI (the Cache#advanced() method) to separate access to internal knob-twiddling from the baseline API, incidentally. I would personally rename size() to estimatedSize(), even with the javadoc qualification.

Additionally, at this point we might consider separating the writable API from the fetching API, since caches that don't "put" but rather have a lambda to obtain/fetch new values aren't really going to make useful calls to put() methods on the front-line Cache API.

That's all I got on my first e-mail back from holidays, but I'm excited.

Christian.


Benjamin Manes

Jan 5, 2015, 10:35:37 AM
to Christian Gruber, guava-discuss, Charles Fry
The name 'Advanced' is pretty awful and I'd love to change it to something meaningful. Most likely more of its methods will need to provide Optional return values, as the interface couples us to implementation choices. For example, if it allows returning the N hottest entries, that may not be possible with a different eviction policy.

A present confusion with size / estimatedSize is that it reports the number of entries, whereas some configurations are bounded by the weighted size (available in #advanced()). Without reading the JavaDoc, a user might not know which size a size method refers to. A less attractive, but more explicit, name is CHM's #mappingCount().
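For reference, a minimal demonstration of CHM's method, whose name and long return type signal that the value is an estimate under concurrency:

```java
import java.util.concurrent.ConcurrentHashMap;

// mappingCount() returns the same estimate as size(), but as a long and with
// a name that makes the "number of mappings, approximately" contract explicit.
class MappingCountDemo {
  public static void main(String[] args) {
    ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
    map.put("a", 1);
    map.put("b", 2);
    long count = map.mappingCount(); // estimated number of entries
    System.out.println(count); // 2
  }
}
```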

Benjamin Manes

Jan 10, 2015, 2:41:16 AM
to Christian Gruber, guava-discuss, Charles Fry
I added Guava adapters for API compatibility, validated them by porting the relevant Guava tests over, and wrote a short document of user-visible differences.

The only nice behavior that cannot be duplicated is non-blocking invalidations. In Guava, an entry being computed is ignored during write operations (put, remove, clear, etc.) and can be clobbered: the incomplete value is overwritten, and when the loading completes that value is discarded and the removal notification sent. Caffeine delegates to ConcurrentHashMap, which will always block on concurrent writes to an entry, whether it is being loaded or not, as it must provide stricter semantics. Changing this behavior would require modifying the implementation, which I do not think is the right approach.

While clobbering is a nice feature, I think losing that ability is minor in practice. It also isn't one that users are expected to be aware of or rely upon, but is simply an implementation choice. Note that if an asynchronous cache is used, this difference isn't as noticeable, because the Future values are immediately present from ConcurrentHashMap's perspective.
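The async observation can be sketched directly with stdlib types: when the value type is a CompletableFuture, the mapping is installed as soon as the future is created, so later writers and invalidators never block on an in-flight load. The names here are illustrative, not Caffeine's API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: an "async cache" as a CHM of futures. The future itself is the
// mapped value, so it is present in the map before the load completes and
// write operations on the entry do not wait for the computation.
class AsyncCacheSketch {
  public static void main(String[] args) throws Exception {
    ConcurrentHashMap<String, CompletableFuture<String>> cache =
        new ConcurrentHashMap<>();

    CompletableFuture<String> future = cache.computeIfAbsent("key",
        k -> CompletableFuture.supplyAsync(() -> "loaded:" + k));

    System.out.println(cache.containsKey("key")); // true, even mid-load
    System.out.println(future.get());             // loaded:key
  }
}
```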

After fixing a few reference caching bugs, there are now over 1 million tests being executed and 93% coverage. The majority of those test methods are run against Guava and Caffeine, which helped identify bugs where the two were inconsistent.

Ben Manes

Feb 1, 2015, 5:32:24 AM
to guava-...@googlegroups.com, cgr...@google.com, yrfse...@gmail.com
It's been 3 weeks since my last status update. Other than Charles' and Christian's encouraging words, I haven't heard anything from the Guava team. Instead of providing unwanted detail here, I have begun documenting on the GitHub wiki pages. That way there is visibility and transparency if Guava becomes interested, and no spam otherwise.

A brief description of what has been completed so far:

1) Code generation of entry types to minimize memory overhead, similar to Guava's hand-rolled EntryFactory. In most cases the overhead is equivalent, except that reference-caching overhead is smaller in Guava. This is expected, as Guava forked ConcurrentHashMap for this exact reason and in order to provide an alternative to JBoss's ConcurrentReferenceHashMap that was being offered to JSR-166. The overhead is similar to Google Collections' ReferenceMap. I believe this is acceptable, as large reference caches are a known performance problem.

2) Reworked entry state transitions to be encoded using the entry's key instead of requiring the weight field to be present.

3) Initial adoption of relaxed reads and writes. This reduces CPU stalls due to memory fences when operating on volatile fields. These optimizations may sometimes be applied by Hotspot if biased locking is disabled. Per-entry relaxed operations are complete; top-level usage pending.

4) Tracing and simulator implemented. When tracing is enabled it is asynchronously written to a file in a text or binary format. A simulator was implemented that can be driven by a trace file or synthetic distributions. The simulator includes a few simple eviction policies. Note that the caches have not yet been instrumented with tracing.

5) Various improvements to the APIs, code quality, coverage, JavaDoc, async batch loading, etc.
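The relaxed-write idea in (3) can be illustrated with AtomicLong's ordered write, which skips the full StoreLoad fence that a plain volatile set issues. This is a generic stdlib sketch, not Caffeine's code:

```java
import java.util.concurrent.atomic.AtomicLong;

// lazySet performs an "ordered" store: it guarantees the write is not
// reordered with earlier writes, but omits the expensive full fence of a
// volatile set. This suits counters and buffer indices where readers only
// need eventual visibility, which is the optimization category above.
class RelaxedWriteDemo {
  public static void main(String[] args) {
    AtomicLong counter = new AtomicLong();

    counter.set(1);     // volatile write: full fence
    counter.lazySet(2); // relaxed write: cheaper, eventually visible

    System.out.println(counter.get()); // 2
  }
}
```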

There is still a lot of work remaining, which is listed on the roadmap. Of note: more code generation to reduce the baseline memory overhead, and dynamic buffer sizing based on contention for reduced memory usage and speedier clean-ups.

Cheers,
Ben