Sorry, I should have been clearer; my earlier explanation was confusing.
The per-element time is O(1) in all scenarios. If m entries have expired, then it will take O(m) time to clean them up. This differs from many other designs that use an O(lg n) priority queue, where the total time would be O(m lg n).
The drain_max constant guards against hostile situations, such as a weak-reference cache having all of its entries garbage collected at once. The goal is to amortize the penalty, so a cap is used so that other callers share in the work of fully cleaning up the cache. The same situation could occur for expiration, but that is presently not guarded against.
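To make the amortization concrete, here is a minimal, hypothetical sketch (not Guava's or Caffeine's actual code) of draining at most a fixed number of stale entries per call, so no single caller pays the full O(m) penalty when m entries expire at once:

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative sketch: writers piggyback a bounded amount of cleanup work,
// so the cost of removing m stale entries is spread across many callers.
class AmortizedCache<K, V> {
  private static final int DRAIN_MAX = 16; // cap per caller (illustrative value)

  private final Map<K, V> data = new ConcurrentHashMap<>();
  private final Queue<K> staleKeys = new ConcurrentLinkedQueue<>();

  void put(K key, V value) {
    data.put(key, value);
    drain(); // every writer shares in the cleanup work
  }

  void markStale(K key) {
    staleKeys.add(key);
  }

  // O(1) per removed entry; at most DRAIN_MAX removals per call.
  int drain() {
    int drained = 0;
    for (K key; drained < DRAIN_MAX && (key = staleKeys.poll()) != null; ) {
      data.remove(key);
      drained++;
    }
    return drained;
  }

  int size() {
    return data.size();
  }
}
```

Each call to `drain()` does at most DRAIN_MAX units of O(1) work, so a burst of m stale entries is cleaned up incrementally by subsequent callers rather than in one O(m) stall.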
In Guava all of this work is performed per segment (aka concurrencyLevel), which defaults to 4. This means that the penalty is O(m / s), where s is the number of segments. Increasing the number of segments is primarily for improved write performance, as each segment is independent (multiple segments can be written to or cleaned up in parallel).
The maintenance work is performed under the segment's lock, which is also acquired for writes. This means the penalty increases the wait time for writers, but readers are never blocked. A loading entry counts as a write only for inserting a reservation entry; the computation itself does not hold the segment's lock. Given the low write rate of a typical cache, we can therefore assume that the maintenance work has a small impact on overall performance.
I do think that Guava should guard against hostile expiration by applying the drain_max cap there as well. However, I don't think it would make a significant difference in practice, given how cheap the maintenance work is. If you are concerned, you can call #cleanUp() on a background thread, which will reduce the impact.
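One way to do that is to schedule the cleanup task periodically on a single-threaded executor. The sketch below uses only the JDK and a counter as a stand-in task; with Guava you would pass `cache::cleanUp` (on a `Cache` built via `CacheBuilder`) as the `Runnable` instead:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: run cache maintenance periodically on a background thread so that
// user-facing threads rarely pay the cleanup penalty themselves.
class PeriodicCleanup {
  // Stand-in for cache::cleanUp; counts how many times cleanup ran.
  final AtomicInteger cleanups = new AtomicInteger();
  final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  void start() {
    // With Guava: scheduler.scheduleWithFixedDelay(cache::cleanUp, ...)
    scheduler.scheduleWithFixedDelay(
        cleanups::incrementAndGet, 0, 10, TimeUnit.MILLISECONDS);
  }

  void stop() throws InterruptedException {
    scheduler.shutdown();
    scheduler.awaitTermination(1, TimeUnit.SECONDS);
  }
}
```

The fixed-delay schedule means a slow cleanup pass simply delays the next one rather than piling up runs, which suits a best-effort maintenance task.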
For more details you can see our Strangeloop '11 slides. Caffeine's design document may also be of interest, since there are many similarities.