NullPointerException during ConcurrentHashMap.remove generated by a get to the ConcurrentLinkedHashMap


Patricio Echague

Oct 20, 2010, 6:11:34 PM
to ConcurrentLinkedHashMap
Hi all, we are getting this exception[1] sort of intermittently. Any
idea if the issue is on our code or a possible bug in the cache?

Thanks

[1]
Trace: java.lang.NullPointerException
    at java.util.concurrent.ConcurrentHashMap.remove(ConcurrentHashMap.java:932)
    at com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap.evict(ConcurrentLinkedHashMap.java:275)
    at com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap.access$100(ConcurrentLinkedHashMap.java:87)
    at com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$AddTask.run(ConcurrentLinkedHashMap.java:450)
    at com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap.drainWriteQueue(ConcurrentLinkedHashMap.java:374)
    at com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap.attemptToDrainEvictionQueues(ConcurrentLinkedHashMap.java:346)
    at com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap.processEvents(ConcurrentLinkedHashMap.java:419)
    at com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap.get(ConcurrentLinkedHashMap.java:586)
    at org.apache.jackrabbit.core.persistence.util.LRUBundleCache.get(LRUBundleCache.java:162)

Ben Manes

Oct 20, 2010, 6:23:37 PM
to ConcurrentLinkedHashMap
My guess is that #evict() is pulling off the sentinel node (of the
circular list), which doesn't have a key/value instance. This should
only happen if the map is above capacity but empty, which obviously
shouldn't occur.
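
To make the failure mode concrete, here is a minimal sketch (class and field names are illustrative, not the library's actual internals) of why pulling the "oldest" node off an empty circular list hands the sentinel itself to the eviction, whose null key then makes ConcurrentHashMap.remove(null, ...) throw:

```java
public class SentinelSketch {
    // Simplified stand-in for the cache's linked-list node.
    static final class Node<K, V> {
        K key;                 // null for the sentinel
        V value;               // null for the sentinel
        Node<K, V> prev, next;
    }

    public static void main(String[] args) {
        Node<String, String> sentinel = new Node<String, String>();
        sentinel.prev = sentinel.next = sentinel; // empty circular list

        // If eviction runs while the list is empty, the node it pulls
        // off is the sentinel itself, whose key is null:
        Node<String, String> victim = sentinel.next;
        System.out.println(victim.key); // prints: null
    }
}
```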

Can you point me to the code snippet using this? It would be
beneficial to reproduce this standalone in the unit tests.

Thanks!
Ben

On Oct 20, 3:11 pm, Patricio Echague <patric...@gmail.com> wrote:
> Hi all, we are getting this exception[1] sort of intermittently. Any
> idea if the issue is on our code or a possible bug in the cache?

Patricio Echague

Oct 20, 2010, 6:38:40 PM
to ConcurrentLinkedHashMap
Hi Ben, I don't have any snippet handy. It happened in our pre-production
environment kind of randomly.

We are using a home-modified Apache Jackrabbit optimized for high
scalability. Hence, I introduced your library.

The line that does a simple get [1] is the one that gets the exception.

org.apache.jackrabbit.core.persistence.util.LRUBundleCache.get(LRUBundleCache.java:162)

I see what you mean about #evict():

while (isOverflow()) {
    Node<K, V> node = sentinel.next;
    // Notify the listener if the entry was evicted
    if (data.remove(node.key, node)) {
        listenerQueue.add(node);
    }

Clearly node.key is null.

Is this enough info for you? It will be hard for me to reproduce
because it only happens under load.

Patricio Echague

Oct 20, 2010, 6:45:07 PM
to ConcurrentLinkedHashMap
The piece of code is this:

    /**
     * @param id the id of the bundle
     * @return the cached bundle or <code>null</code>
     */
    public NodePropBundle get(NodeId id) {
        String currentUser = UserPartitionedCacheHelper.getCurrentUser();

        // Look for the user entry.
        UserEntry userEntry = (UserEntry) bundlesPerUser.get(currentUser); // -----------> Exception here!
        Entry entry = null;

getCurrentUser() returns either "systemUser" or the current user.

Ben Manes

Oct 20, 2010, 6:47:46 PM
to ConcurrentLinkedHashMap
Can you show me how you are configuring the Builder to create the
instance?

Looking at the code, I may have introduced a subtle race condition
with respect to weighted values. Are you specifying a Weigher in the
builder? If so, can the value's weight change over its lifetime?

You are welcome to send me the class code privately, if you prefer.
Alternatively you can try to reproduce it by improving the
MultiThreadedTest class, which provides load testing.

On Oct 20, 3:38 pm, Patricio Echague <patric...@gmail.com> wrote:
> Hi Ben, I don't have any snippet handy. It happened in our pre-
> production environment kind of randomly.

Patricio Echagüe

Oct 20, 2010, 6:53:58 PM
to concurrentl...@googlegroups.com
Hi Ben,

Yes, we are specifying a weigher. (This is one of the main reasons I liked this cache :) )

The weight does change, but we put the entry back with its new size.

This is the code snippet:

    /**
     * Creates a new BundleCache
     *
     * @param maxSize the maximum size of this cache in bytes.
     */
    public LRUBundleCache(long maxSize, int concurrencyLevel, int initialCapacity) {
        if (maxSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("The maxSize of this cache cannot be bigger than "
                    + Integer.MAX_VALUE);
        }

        // Define my own weigher.
        Weigher<UserEntry> weigher = new Weigher<UserEntry>() {
            public int weightOf(UserEntry userEntry) {
                return userEntry.size;
            }
        };

        bundlesPerUser = new ConcurrentLinkedHashMap.Builder<String, UserEntry>()
            .weigher(weigher)
            .maximumWeightedCapacity((int) maxSize)
            .initialCapacity(initialCapacity)
            .concurrencyLevel(concurrencyLevel)
            .listener(new UserEvictionListener())
            .build();
        this.maxSize = maxSize;
        log.info("LRUBundleCache created with size=" + maxSize
                + " - concurrencyLevel=" + concurrencyLevel + " - initialCapacity=" + initialCapacity);
    }
--
Patricio.-

Patricio Echagüe

Oct 20, 2010, 7:03:33 PM
to concurrentl...@googlegroups.com
One more thing to make it clear (if it was not already):

The weight changes. We have a UserEntry that is the entry in the cache.

We put bundles inside the user entry (which almost always changes its weight), and after updating the user entry with the new bundle, I put the UserEntry object back in to let the cache update the size (weight).

This is the code for #put():

    public void put(NodePropBundle bundle) {
        // Do not cache if this is the system user.
        String currentUser = UserPartitionedCacheHelper.getCurrentUser();
        if(UserPartitionedCacheHelper.isSystemUser(currentUser)) {
            return;

        }

        // Look for the user entry.
        UserEntry userEntry = (UserEntry) bundlesPerUser.get(currentUser);
        if (userEntry == null) {
            // If user does not exist then create user and first entry.
            userEntry = new UserEntry(currentUser, bundle.getSize());
            // Insert the user entry.
            //bundlesPerUser.put(currentUser, userEntry);
        }

        LinkedMap bundles = userEntry.bundles;

        // Look for the entry inside user space.
        Entry entry = (Entry) bundles.get(bundle.getId());
        if (entry == null) {
            entry = new Entry(bundle, bundle.getSize());
        } else {
            // User and bundle already exists. Just update sizes in case it has changed.
            totBundles.decrementAndGet();
            userEntry.size -= entry.size;
            // Update entry with new content and size
            entry.bundle = bundle;
            entry.size = bundle.getSize();
        }

        // Update user size.
        userEntry.size += entry.size;

        // Insert the bundle entry into the user space with its size.
        bundles.put(bundle.getId(), entry);

        // Always insert the user entry so we ensure the size gets updated if the entry existed.
        bundlesPerUser.put(currentUser, userEntry); // ---------> let the cache update the weight of the "updated" entry <---------

        // Increment bundle counter
        totBundles.incrementAndGet();
    }

--
Patricio.-

Ben Manes

Oct 20, 2010, 7:12:08 PM
to ConcurrentLinkedHashMap
Okay, then I think the bug is due to:

    weightedSize -= node.weightedValue.weight;

where the assumption is that the weight is not being concurrently
updated (e.g. a stale volatile read). The problem is that I call
data.remove(...), which uses the CHM segment's lock, while the
decorator's segment lock may still be held and the weight still being
updated.

Unfortunately I don't have a weighted test case in the
MultiThreadedTest class, which I should add.

Can you change this method to:

    Lock lock = segmentLock[segmentFor(key)];
    lock.lock();
    try {
        // Notify the listener if the entry was evicted
        if (data.remove(node.key, node)) {
            listenerQueue.add(node);
        }
        weightedSize -= node.weightedValue.weight;
        node.remove();
    } finally {
        lock.unlock();
    }

which would force a concurrent removal to complete before the node's
weight is read.

You can check out the branch and compile a jar with Ant or Maven, e.g.

    ant package

after performing the change. You can then retest with the snapshot
jar, and we can see if that resolves the race condition.
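
To see why the unguarded read corrupts the accounting, here is a sequential replay of the interleaving described above (variable names are mine for illustration, not the library's): evict() reads the node's weight outside the lock, put() swaps in a new weight and applies the difference, then evict() subtracts the now-stale value:

```java
public class StaleWeightSketch {
    // Replays the racy schedule deterministically and returns the
    // weightedSize the cache is left with after the entry is evicted.
    static int replay() {
        int weightedSize = 0;
        int oldWeight = 1;
        weightedSize += oldWeight;              // AddTask after the first put: size = 1

        int observed = oldWeight;               // evict() reads the weight (1) without the lock
        int newWeight = 2;
        weightedSize += newWeight - oldWeight;  // put() swaps the value, applies the difference: size = 2
        weightedSize -= observed;               // evict() subtracts the stale weight: size = 1

        return weightedSize;                    // entry is gone but 1 unit of weight leaked
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints: 1 (should be 0)
    }
}
```

The leaked unit means the cache believes it is still over capacity with no evictable entries left, which is exactly the "above capacity but empty" state that sends #evict() after the sentinel.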

Patricio Echagüe

Oct 20, 2010, 8:32:01 PM
to concurrentl...@googlegroups.com
By concurrent update you mean the thread that evicts plus my threads that put the entry back with the new size/weight?
--
Patricio.-

Ben Manes

Oct 20, 2010, 8:47:27 PM
to ConcurrentLinkedHashMap
Yep. The segment lock can probably be slimmed down to just guarding
the update to the weightedSize, e.g.

    Lock lock = segmentLock[segmentFor(key)];
    lock.lock();
    try {
        weightedSize -= node.weightedValue.weight;
    } finally {
        lock.unlock();
    }

This would ensure that while the node is no longer in the map, it is
also not being concurrently updated (by #put()) in a way that would
mutate the value's weight. What is happening is that in put() the
_else_ branch is being taken. The change would expand data.remove()'s
segment lock to ensure that the entry is no longer in the map /
mutable when the weighted size is read for eviction.

    // maintain per-segment write ordering
    lock.lock();
    try {
        prior = data.putIfAbsent(node.key, node);
        if (prior == null) {
            writeQueue.add(new AddTask(node, weight));
        } else if (onlyIfAbsent) {
            oldValue = prior.weightedValue.value;
        } else {
            WeightedValue<V> oldWeightedValue = prior.weightedValue;
            weightedDifference = weight - oldWeightedValue.weight;
            prior.weightedValue = weightedValue;
            oldValue = oldWeightedValue.value;
        }
    } finally {
        lock.unlock();
    }

Patricio Echagüe

Oct 21, 2010, 7:57:00 PM
to concurrentl...@googlegroups.com
Hi Ben, this code reproduces the issue. It works almost every time for me.


package com.openwave.src;

import java.util.concurrent.Callable;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import junit.framework.Assert;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap;
import com.googlecode.concurrentlinkedhashmap.Weigher;

/**
 * Reproduce almost always the race condition for issue: (20)
 * Link: http://code.google.com/p/concurrentlinkedhashmap/issues/detail?id=20
 *
 * Tested with Dual Core 2.79 Ghz and 3GB RAM.
 * For more cores the number of threads(MAX_THREADS) might need to be incremented
 * and perhaps the number of cycles (CYCLES) as well.
 *
 * The worker threads perform #get against the cache while the main thread does #put continuously.
 *
 * @author pechague (Patricio Echague)
 *
 */
public class EvictRaceConditionTest {

    /**
     * a map of the cache entries
     */
    private ConcurrentLinkedHashMap<String, Entry> cache;

    private ThreadPoolExecutor pool;

    private Entry onlyEntry;

    private static final String UNIQUE_KEY = "theOnlyKey";

    private static final int CYCLES = 4000000;

    private static final int MAX_THREADS = 20;

    /**
     * @throws java.lang.Exception
     */
    @Before
    public void setUp() throws Exception {

        pool = new ThreadPoolExecutor(MAX_THREADS, MAX_THREADS, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());


        // Define my own weigher.
        Weigher<Entry> weigher = new Weigher<Entry>() {
            public int weightOf(Entry userEntry) {
                return userEntry.size;
            }
        };

        cache = new ConcurrentLinkedHashMap.Builder<String, Entry>()
                .weigher(weigher)
                .maximumWeightedCapacity((int) 1)
                .initialCapacity(1)
                .build();

        onlyEntry = new Entry();
        onlyEntry.setSize(1);
    }



    /**
     * @throws java.lang.Exception
     */
    @After
    public void tearDown() throws Exception {
        pool.shutdownNow();
        while (pool.getPoolSize() > 0) { /* spin until terminated */ }
    }

    @Test
    public void testRaceCondition() throws Exception {
        for (int t = 0; t < MAX_THREADS; t++) {
            pool.submit(new CacheGetWorker());
        }

        Thread.sleep(1);
        System.out.println("Start writer");

        try {
            for (int i = 0; i < CYCLES; i++) {

                if (onlyEntry.getSize() == 1) {
                    onlyEntry.setSize(2);
                } else {
                    onlyEntry.setSize(1);
                }

                cache.put(UNIQUE_KEY, onlyEntry);
            }

            // The code should not reach this point.
            Assert.fail();
        } catch (NullPointerException npe) {
            System.out.println("Exception caught!");
            npe.printStackTrace();
        }

        System.out.println("Done Writer");
    }

    /**
     * This worker performs #get against the cache again and again, racing the
     * writer thread that varies the entry's size from 1 to 2 to force evictions.
     * @author pechague
     */
    private class CacheGetWorker implements Callable<Boolean> {

        @Override
        public Boolean call() throws Exception {
            System.out.println("Start Worker!");
            Boolean thereWasAnError = Boolean.FALSE;
            for (int i = 0; i < CYCLES; i++) {
                try {
                    cache.get(UNIQUE_KEY);
                } catch (NullPointerException npe) {
                    System.out.println("Error found in thread:!" + Thread.currentThread().getId());
                    thereWasAnError = Boolean.TRUE;
                    break;
                } catch (Exception e) {
                    thereWasAnError = Boolean.TRUE;
                    System.out.println("Exception not expected in thread:!" + Thread.currentThread().getId());
                    e.printStackTrace();
                    break;
                }
            }
            System.out.println("Done worker");
            return thereWasAnError;
        }
    }

    /**
     *
     * @author pechague (Patricio Echague)
     *
     */
    private class Entry {

        private volatile int size = 1;

        public int getSize() {
            return size;
        }

        public void setSize(int size) {
            this.size = size;
        }
    }
}
--
Patricio.-

Patricio Echagüe

Oct 21, 2010, 8:01:35 PM
to concurrentl...@googlegroups.com
It is not the most elegant code. In fact, here it is failing in the put. I need to improve it to capture the Future objects as well, to see the status of the worker threads.


--
Patricio.-

Ben Manes

Oct 21, 2010, 8:16:40 PM
to ConcurrentLinkedHashMap
Thanks!

I plan on porting it into:

http://code.google.com/p/concurrentlinkedhashmap/source/browse/branches/lru/unittest/src/java/com/googlecode/concurrentlinkedhashmap/MultiThreadedTest.java

and either add it to the primary test (a new thrasher operation) or
make it stand-alone using the ConcurrentTestHarness rather than an
executor pool for the workers.

On Oct 21, 5:01 pm, Patricio Echagüe <patric...@gmail.com> wrote:
> it is not the most elegant code. In fact, here is it failing in the put.

Patricio Echagüe

Oct 21, 2010, 8:21:35 PM
to concurrentl...@googlegroups.com
And perhaps we can include more than one worker doing puts.

I saw that class but didn't have time to investigate what benefits it gives.
--
Patricio.-

Ben Manes

Oct 25, 2010, 10:53:48 PM
to ConcurrentLinkedHashMap
This is fixed in /branch/lru. I'd appreciate a code review. I'm also
hoping to get one from the submitter of issue #19, which I
implemented.

A minor enhancement that I'd like to flesh out prior to cutting a
release is to provide strict LRU ordering (issue #17). I experimented
with this in the version attached to that bug. Since it does increase
the penalty by 2-3 us on average, I may make it a configuration option,
as I doubt that it would improve the hit rate. The primary concern I
have is the implementation strategy (it could be either O(n*k) or
O(n lg k), but the trade-offs for the latter may perform worse).

I'd also like to take another shot at LIRS (the old attempt is in
/trunk), but it's probably best to make two releases rather than one.

On Oct 21, 5:21 pm, Patricio Echagüe <patric...@gmail.com> wrote:
> and perhaps we can include more than one workers doing puts.

Patricio Echagüe

Oct 25, 2010, 11:09:23 PM
to concurrentl...@googlegroups.com

Thanks Ben.  Will complete the review tomorrow.

<Sent from my Android phone>

On Oct 25, 2010 7:53 PM, "Ben Manes" <ben....@gmail.com> wrote:
> This is fixed in /branch/lru. I'd appreciate a code review.

Ben Manes

Nov 1, 2010, 12:46:45 AM
to ConcurrentLinkedHashMap
v1.1 is code complete. I'll work on releasing to Maven Central
tomorrow night and upload a JAR in the download section.

If you get a chance before then, a code review is always welcome!

On Oct 25, 8:09 pm, Patricio Echagüe <patric...@gmail.com> wrote:
> Thanks Ben.  Will complete the review tomorrow.

Patricio Echague

Nov 1, 2010, 8:40:47 PM
to ConcurrentLinkedHashMap
Sure, thanks a lot Ben.

Patricio Echagüe

Nov 2, 2010, 1:59:13 PM
to ConcurrentLinkedHashMap
Hi Ben, when do you think version 1.1 will be in Maven Central?
--
Patricio.-

Ben Manes

Nov 3, 2010, 3:59:51 PM
to ConcurrentLinkedHashMap
I just performed the release for JDK6 [1]. The repository is synced
hourly, so it will be in Maven Central soon.

I'll do the backport to JDK5 by end-of-day and release it as well.
That work should be minor.

I'll update the front-page with the latest information once I've
confirmed both are on Central.

Thanks for your patience!

- Ben

[1] https://oss.sonatype.org/content/repositories/releases/com/googlecode/concurrentlinkedhashmap/concurrentlinkedhashmap-lru/1.1/

On Nov 2, 10:59 am, Patricio Echagüe <patric...@gmail.com> wrote:
> Hi Ben, when do you think the version 1.1 will be in Maven Central?

Ben Manes

Nov 4, 2010, 1:48:29 AM
to ConcurrentLinkedHashMap
Released.

On Nov 3, 12:59 pm, Ben Manes <ben.ma...@gmail.com> wrote:
> I just performed the release for JDK6 [1]. It should be sync'd hourly
> so it will be in Maven Central soon.

Patricio Echagüe

Nov 4, 2010, 2:31:31 PM
to concurrentl...@googlegroups.com
Ben, should it be available in Maven already? I'm checking https://repository.sonatype.org but it looks like it is not there yet.
--
Patricio.-

Ben Manes

Nov 4, 2010, 2:37:36 PM
to ConcurrentLinkedHashMap
Yes, they've promoted it to central, see:

http://repo1.maven.org/maven2/com/googlecode/concurrentlinkedhashmap/concurrentlinkedhashmap-lru/

On Nov 4, 11:31 am, Patricio Echagüe <patric...@gmail.com> wrote:
> Ben, should it be available in Maven already?