If using weighers based on bytes, shouldn't key size also be part of the weight?

pariodeusex

Apr 30, 2011, 9:01:38 PM
to ConcurrentLinkedHashMap
Hi again

If I'm using a weight based on bytes, shouldn't the key's byte size also be
part of the weight? The map is going to store the key in order to distinguish
the right value in a bucket with more than one value, yet the weight
calculation method in your tutorial only calculates the weight of the value.
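For example (just a sketch on my side; the ByteBuffer value type and the
capacity are made-up illustrations, not your tutorial's code), a byte-based
weigher wired through the builder only ever sees the value:

  import java.nio.ByteBuffer;
  import com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap;
  import com.googlecode.concurrentlinkedhashmap.Weigher;

  ConcurrentLinkedHashMap<String, ByteBuffer> cache =
      new ConcurrentLinkedHashMap.Builder<String, ByteBuffer>()
          .maximumWeightedCapacity(64 * 1024 * 1024)  // 64 MB budget, values only
          .weigher(new Weigher<ByteBuffer>() {
            public int weightOf(ByteBuffer value) {
              return value.capacity();                // key bytes never counted
            }
          })
          .build();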

Given that the key takes up memory just as the value does, wouldn't the map
exceed its maximum weighted capacity (in bytes) if key sizes are not taken
into account along with the value sizes, especially when the keys are large,
complex objects (like byte buffers)?

regards

pariodeusex

Benjamin Manes

May 2, 2011, 3:01:42 AM
to ConcurrentLinkedHashMap
Howdy,

The key is usually a lightweight value object, and the JVM isn't a
particularly good fit for explicit memory management. I had originally thought
that this wasn't a popular use-case, e.g. off-heap ByteBuffers. At the time I
was mostly concerned with collection values, since I'd seen poor usages cause
memory leaks. I was also accustomed to using a 2nd-level cache (memcached) for
off-heap memory management, with soft references as a failsafe under JVM
memory pressure. In retrospect, I should have included the key for more
flexibility. So far it's not been an issue, at least in that you're the first
to bring it up.

We plan to add this feature to MapMaker and supply both the key and
value to that function. That's the long-term solution, but I'm not
sure when that enhancement will be started.

If you can demonstrate a clear need where this limitation is causing
problems, then I'd be open to resolving it. Otherwise I'd defer until
MapMaker has evolved to be a satisfactory upgrade path.

Cheers,
Ben

pariodeusex

May 2, 2011, 7:55:51 PM
to ConcurrentLinkedHashMap
Ben,

In some cases my keys are the same size as my values, so taking key sizes into
account would be easy in that scenario.

However, when using complex objects like byte buffers, the key sizes can vary
wildly, which means they have to be calculated. If they are not, the actual
size of the map can vary massively, especially with a very large cache like
20GB. When I'm running a 20GB cache on a machine with 32GB of RAM, I will
often run out of heap space because the keys push the map over the 20GB limit.
In one case my map was using about 30GB purely because of key overhead.

I can get around this problem for now by using the enhancements discussed
previously: calculating the value and key sizes beforehand and passing the
resulting weight to the put and replace methods instead of using the
conventional weigher method, as well as specifying the weight in KB instead of
bytes.
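For what it's worth, the pre-computation on my side looks roughly like this
(the helper method is mine, purely illustrative, and not part of the library):

  // Hypothetical helper: combined key + value size, expressed in whole KB.
  static int entryWeightInKb(ByteBuffer key, ByteBuffer value) {
    long bytes = (long) key.capacity() + value.capacity();
    return (int) ((bytes + 1023) / 1024);   // round up so no entry weighs zero
  }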

Knowing exactly how large a map can become is critical for using memory
effectively for caching. Otherwise I would have to limit the cache to 10GB,
with another 10GB possibly sitting unused, which would mean more DB hits and
cause overall app performance to tank.

regards

pariodeusex

Ben Manes

May 4, 2011, 11:59:53 PM
to ConcurrentLinkedHashMap
Sorry for the delayed response.

Having pondered this issue, I'll go ahead with your proposal of adding
the specialized #put()/#replace() methods. This seems to be the least
invasive. MapMaker will be the upgrade path once it supports weights.

There were two obvious alternatives. The first was to add a #keyWeigher()
method to the builder to maintain the Weigher<E> contract. Unfortunately this
is not just confusing (since the value weigher is #weigher()), but incorrect,
as the default implementation would need to be a no-op (return zero). That
breaks the weigher's contract.

A better alternative would be to add an EntryWeigher<K, V> interface and an
#entryWeigher() builder method. This doesn't seem favorable either, since it
adds to the library's vocabulary and requires that users learn about the API
mistake in Weigher. It seems better to offer the methods for the expert
use-cases that need the added flexibility.
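For the record, that second alternative would look roughly like this (a
hypothetical sketch, not part of the released API):

  public interface EntryWeigher<K, V> {
    int weightOf(K key, V value);   // weighs the whole entry, key included
  }

with a matching #entryWeigher(EntryWeigher<? super K, ? super V>) method on
the builder.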

This change will be in v1.2. I'm a bit behind on CLHM, but I'll try to
get back to it soon. There's very little blocking the release, other
than a lack of time.

Thanks!
Ben

pariodeusex

May 9, 2011, 5:08:27 PM
to ConcurrentLinkedHashMap
Thanks for adding the extra methods; it saves me from having to add my own
implementation every time there is a new version.

I was wondering if there is any other overhead for each key/value pair or
bucket, apart from the keys and values themselves. Since the hash map
maintains a linked list for each key/value pair, I thought there might be a
16-byte or so overhead per entry, but I haven't had the chance to look through
the implementation thoroughly enough to make this assertion. Being able to
factor any per-entry overhead into the weight would give me much better
control over calculating the max byte size of the map.

regards

pariodeusex

Ben Manes

May 9, 2011, 7:01:03 PM
to ConcurrentLinkedHashMap
You're welcome to fork the project temporarily until v1.2 is released. I'm
hoping to spend this upcoming weekend hacking on it. The test suite is fairly
robust, so it should give you some confidence if you make changes.

CLHM is implemented as a decorator. This has the advantage of not inheriting
CHM's complexities and of being able to opt into improvements (Doug's recent
redesign, Cliff Click's alternative). The downside is that it incurs a
slightly higher memory overhead by decorating the value instead of inlining
the fields on the entry object. It also requires an extra reference to the key
for eviction, since the LRU traverses the values. In addition, to avoid races
when the weight changes, the value + weight are stored as an immutable object
that is swapped atomically. The per-entry overhead is below, but note that the
top-level overhead is not static (queues are used for buffering instead of
incurring lock contention).

CHM.HashEntry:
K key
int hash
Node value
HashEntry<K, V> next

Node:
K key
Node prev
Node next
WeightedValue<V> weightedValue

WeightedValue:
int weight
V value
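As a very rough back-of-the-envelope estimate, assuming a 64-bit HotSpot JVM
with compressed oops (12-byte object headers, 4-byte references, 8-byte
alignment); these sizes are my assumptions, not measurements:

CHM.HashEntry:  12 + 4 (key) + 4 (hash) + 4 (value) + 4 (next)         = 28 -> 32 bytes
Node:           12 + 4 (key) + 4 (prev) + 4 (next) + 4 (weightedValue) = 28 -> 32 bytes
WeightedValue:  12 + 4 (weight) + 4 (value)                            = 20 -> 24 bytes

That comes to roughly 88 bytes per entry, excluding the key and value objects
themselves and the entry's share of the CHM bucket array.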

Ben Manes

May 13, 2011, 10:12:19 PM
to ConcurrentLinkedHashMap
To revisit an earlier discussion in the issue tracker,

If you know the weight up-front for both the key and value, what is the harm
in using a wrapper and a custom weigher? The only issue I see is that it
requires an extra reference and an int field, which is minor memory-wise. This
would allow you to work with the current APIs rather than needing new ones to
be added.
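Something along these lines would do it (a sketch only; the Weighted class and
its fields are illustrative names, not part of the library):

  final class Weighted<V> {
    final V value;
    final int weight;   // precomputed key + value size
    Weighted(V value, int weight) { this.value = value; this.weight = weight; }
  }

  Weigher<Weighted<ByteBuffer>> weigher = new Weigher<Weighted<ByteBuffer>>() {
    public int weightOf(Weighted<ByteBuffer> entry) {
      return entry.weight;   // report the precomputed entry weight
    }
  };

  // map.put(key, new Weighted<ByteBuffer>(value, precomputedWeight));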

That was satisfactory before, even though not your preference. Is that
no longer the case?

Thanks!
Ben