Stormpot object pool

Chris Vest

Sep 7, 2015, 6:12:36 PM
to mechanica...@googlegroups.com
Hi,

There has occasionally been some discussion about object pooling on this list, and since I just released a new version of my object pooling library, I thought I’d share a few details about the design for those who might find it interesting.

First off, the release notes are here, and include benchmark results that show it doing over 2 billion claim+release operations per second on a 64-core box: https://medium.com/@chrisvest/released-stormpot-2-4-eeab4aec86d0

Stormpot is what I would call a “general purpose” pool, so it assumes concurrent access from an unknown number of threads. This is shared, mutable memory concurrency, so to get performance you have to avoid contention and coherence traffic as much as possible. Perhaps it then seems ironic that, since Stormpot pools are always bounded and the claim method is a blocking operation, the foundation for the design is the BlockingQueue. These things are pretty slow, but I get performance back by adding a fast-path on top and playing a few tricks.

The first trick is putting a ThreadLocal cache in front of the queue. A thread that wants to claim an object will now first try to claim the object in the thread local cache, which will contain the previous object claimed by that thread, if any. The claimed/free state of an object can now no longer be implied by the queue and has to be contained in the objects themselves, but at least contention spreads out and, if there are enough to go around, each thread will get a dedicated object.
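
To make the thread-local trick concrete, here is a minimal sketch of the idea. It is not Stormpot's actual code: the names `CachedSlot` and `ThreadLocalCachedPool` and the three-state flag are assumptions made for illustration. A slot claimed through the cache stays in the queue, so its claimed/free state has to live in the slot itself.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical slot: the claimed/free state lives in the object, not in the queue. */
final class CachedSlot {
    static final int FREE = 0;        // claimable by anyone
    static final int CLAIMED = 1;     // claimed via the queue; currently not in the queue
    static final int TLR_CLAIMED = 2; // claimed via a thread-local cache; still in the queue
    final AtomicInteger state = new AtomicInteger(FREE);
}

/** Hypothetical bounded pool: a ThreadLocal fast path in front of a BlockingQueue. */
final class ThreadLocalCachedPool {
    private final BlockingQueue<CachedSlot> live;
    private final ThreadLocal<CachedSlot> cache = new ThreadLocal<>();

    ThreadLocalCachedPool(int size) {
        live = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            live.add(new CachedSlot());
        }
    }

    /** Returns a claimed slot, or null if none could be claimed within the timeout. */
    CachedSlot claim(long timeout, TimeUnit unit) throws InterruptedException {
        // Fast path: retry the slot this thread claimed last time.
        CachedSlot cached = cache.get();
        if (cached != null
                && cached.state.compareAndSet(CachedSlot.FREE, CachedSlot.TLR_CLAIMED)) {
            return cached;
        }
        // Slow path: fall back to the shared blocking queue.
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        for (;;) {
            long remaining = Math.max(0L, deadline - System.nanoTime());
            CachedSlot slot = live.poll(remaining, TimeUnit.NANOSECONDS);
            if (slot == null) {
                return null; // timed out on an empty queue
            }
            if (slot.state.compareAndSet(CachedSlot.FREE, CachedSlot.CLAIMED)) {
                cache.set(slot); // remember it for the next fast-path attempt
                return slot;
            }
            // Another thread holds this slot through its cache: put it back.
            // (This sketch simply retries until the deadline; it does not park.)
            live.offer(slot);
            if (remaining == 0L) {
                return null;
            }
        }
    }

    void release(CachedSlot slot) {
        boolean viaQueue = slot.state.get() == CachedSlot.CLAIMED;
        slot.state.set(CachedSlot.FREE);
        if (viaQueue) {
            live.offer(slot); // queue-claimed slots were removed, so hand them back
        }
    }
}
```

With enough objects to go around, each thread ends up hitting its own cached slot and rarely touches the queue.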

The second trick is padding the claimed/free state in the objects, to avoid false sharing. At least in benchmarks, these objects are very likely to be allocated close together, and thus share cache lines. I use the class-inheritance-of-unused-fields trick to make sure that the hot state of the objects ends up on separate cache lines, assuming cache lines are 64 bytes.
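
A rough sketch of the inheritance padding trick follows. The class names are hypothetical and this is not Stormpot's source. It leans on the JVM laying out superclass fields before subclass fields, which is common HotSpot behaviour but not something the language specification guarantees, so treat the layout as an assumption to verify with a tool such as JOL.

```java
/** Hypothetical padding before the hot field; the unused longs only occupy space. */
abstract class SlotPadLeft {
    long l1, l2, l3, l4, l5, l6, l7; // 7 * 8 = 56 bytes, plus the object header
}

/** The frequently written claimed/free state sits between the two paddings. */
abstract class SlotHotState extends SlotPadLeft {
    volatile int state;
}

/** Hypothetical padding after the hot field, assuming 64-byte cache lines. */
final class PaddedSlot extends SlotHotState {
    long r1, r2, r3, r4, r5, r6, r7; // 7 * 8 = 56 bytes
}
```

Splitting the fields across three classes matters because within a single class the JVM is free to reorder fields, which could move the padding away from the state it is meant to protect.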

The third trick, which got me to the 2 billion ops/sec mark, is realising that a claimed object is effectively owned by only one thread, so the single-writer principle applies when an object is released back to the pool (but not when it is claimed) and thus this state change can be implemented with lazySet instead of compareAndSet. Even if the underlying cmpxchg instruction is uncontended, it’s still slower than a plain store.

So in the fast path, claiming an object is just a thread local lookup and a compareAndSet (and validity/expiration check), and releasing is just a lazySet.
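
The state transitions themselves can be sketched as below, assuming a hypothetical `OwnedSlot` class built on `AtomicInteger`. Claiming is a race between threads and so needs `compareAndSet`; releasing is done only by the current owner and so can use the cheaper `lazySet`.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical slot showing the claim/release state transitions only. */
final class OwnedSlot {
    private static final int FREE = 0;
    private static final int CLAIMED = 1;
    private final AtomicInteger state = new AtomicInteger(FREE);

    /** Many threads may race to claim, so this needs an atomic compare-and-set. */
    boolean tryClaim() {
        return state.compareAndSet(FREE, CLAIMED);
    }

    /** Only the owning thread calls this, so an ordered store replaces the CAS. */
    void release() {
        state.lazySet(FREE);
    }

    boolean isClaimed() {
        return state.get() == CLAIMED;
    }
}
```

Note that `lazySet` gives weaker ordering guarantees than a volatile write, so whether it is sufficient depends on what else the releasing thread publishes along with the slot.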

This is still slower than object allocation in the common case, though. But that’s not what it’s meant to compete with anyway.

Cheers,
Chris

Georges Gomes

Sep 8, 2015, 9:30:28 AM
to mechanica...@googlegroups.com
Thanks Chris, good to know that a good object pool is around if we need one.
Cheers
Georges


--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Francesco Nigro

Sep 10, 2015, 2:07:17 AM
to mechanical-sympathy
Hi Chris, I've tried the latest version of the object pool, but it seems to be broken (NPE) when trying to read the value inside a Poolable.
I still have to review the code, but beware that even if code is conceptually single-writer, lazySet (or putOrdered) doesn't provide the same semantics as a release, and thus not the same ordering guarantees: https://groups.google.com/forum/m/#!searchin/mechanical-sympathy/PutOrdered/mechanical-sympathy/EHQp7lm5cbM

Chris Vest

Sep 10, 2015, 3:28:19 AM
to mechanica...@googlegroups.com


On 10 Sep 2015, at 08:07, Francesco Nigro <nigr...@gmail.com> wrote:

Hi Chris, I've tried the latest version of the object pool, but it seems to be broken (NPE) when trying to read the value inside a Poolable…

This sounds strange. The Poolable interface is implemented by client code, so it sounds like a mistake in that integration. Can you share a gist that demonstrates this problem?

Cheers,
Chris

Francesco Nigro

Sep 10, 2015, 3:48:28 AM
to mechanical-sympathy
The test source is really short, so I can share it here:

package com.telbios.pool.bench;

import java.util.concurrent.TimeUnit;
import stormpot.Allocator;
import stormpot.BlazePool;
import stormpot.Config;
import stormpot.Pool;
import stormpot.Poolable;
import stormpot.Slot;
import stormpot.Timeout;

/**
 *
 * @author Franz
 */
public class StormpotPoolTest {

    private static final class PoolableStringBuilder implements Poolable {

        private final Slot slot;
        private final StringBuilder builder = new StringBuilder();

        public PoolableStringBuilder(Slot slot) {
            this.slot = slot;
        }

        @Override
        public void release() {
            slot.release(this);
        }

    }

    private static enum PoolableStringBuilderAllocator implements Allocator<PoolableStringBuilder> {

        Instance;

        @Override
        public PoolableStringBuilder allocate(Slot slot) throws Exception {
            return new PoolableStringBuilder(slot);
        }

        @Override
        public void deallocate(PoolableStringBuilder poolable) throws Exception {
            // Nothing to clean up for a StringBuilder.
        }

    }

    private static final int TESTS = 5;
    private static final int OPS = 100_000_000;
    private static final Timeout NO_WAIT = new Timeout(0, TimeUnit.NANOSECONDS);

    public static void main(String[] args) throws InterruptedException {
        final Config<PoolableStringBuilder> config = new Config<>()
                .setAllocator(PoolableStringBuilderAllocator.Instance)
                .setBackgroundExpirationEnabled(false)
                .setThreadFactory(task -> {
                    final Thread thread = new Thread(task);
                    thread.setDaemon(true);
                    return thread;
                });
        final Pool<PoolableStringBuilder> pool = new BlazePool<>(config);
        test(pool);
        System.out.println("warmup");
        for (int i = 0; i < TESTS; i++) {
            for (int o = 0; o < 100_000; o++) {
                test(pool);
            }
        }
        System.out.println("end warmup");

        for (int i = 0; i < TESTS; i++) {
            final long start = System.nanoTime();
            for (int o = 0; o < OPS; o++) {
                test(pool);
            }
            final long elapsed = System.nanoTime() - start;
            System.out.print((OPS * 1000_000_000L) / elapsed);
            System.out.println(" ops/sec");
        }
    }

    private static void test(Pool<PoolableStringBuilder> pool) throws InterruptedException {
        final PoolableStringBuilder pooleable = pool.claim(NO_WAIT);
        final StringBuilder acquired = pooleable.builder;
        acquired.setLength(0);
        acquired.append("prova");
        pooleable.release();
    }
}

When I run it, I get:

Exception in thread "main" java.lang.NullPointerException
at com.telbios.pool.bench.StormpotPoolTest$PoolableStringBuilder.access$000(StormpotPoolTest.java:24)
at com.telbios.pool.bench.test(StormpotPoolTest.java:92)
at com.telbios.pool.bench.main(StormpotPoolTest.java:70)

Please correct me if I'm wrong.

Chris Vest

Sep 10, 2015, 4:23:01 AM
to mechanica...@googlegroups.com
Aha, `Pool.claim` returns `null` if it times out, and you don’t check for that in your test method.

With a zero timeout, you’ll only get an object if one can be obtained without blocking. All the objects in the pool are allocated in a background thread, so the pool will most likely be empty immediately after construction.
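
To illustrate why a zero timeout tends to return `null` right after construction, here is a stand-in built only on `java.util.concurrent`. It is not Stormpot: `BackgroundFilledPool` and its `allocationDelayMillis` parameter are made up, and the sleep merely simulates allocation happening later on another thread.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical pool whose objects are allocated by a background thread. */
final class BackgroundFilledPool {
    private final BlockingQueue<StringBuilder> live = new LinkedBlockingQueue<>();

    BackgroundFilledPool(int size, long allocationDelayMillis) {
        Thread allocator = new Thread(() -> {
            try {
                Thread.sleep(allocationDelayMillis); // pretend allocation takes a while
            } catch (InterruptedException e) {
                return; // give up allocating if interrupted
            }
            for (int i = 0; i < size; i++) {
                live.add(new StringBuilder());
            }
        });
        allocator.setDaemon(true);
        allocator.start();
    }

    /** Returns null if no object became available within the timeout. */
    StringBuilder claim(long timeout, TimeUnit unit) throws InterruptedException {
        return live.poll(timeout, unit);
    }

    void release(StringBuilder object) {
        live.add(object);
    }
}
```

A caller that passes a non-zero timeout blocks until the background thread has caught up, while a caller that passes zero has to be prepared for `null` and decide whether to retry, fall back to allocating, or fail.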

Cheers,
Chris

Francesco Nigro

Sep 10, 2015, 4:47:14 AM
to mechanical-sympathy
Thanks Chris!
I've corrected the test like this:

    private static void test(Pool<PoolableStringBuilder> pool) throws InterruptedException {
        PoolableStringBuilder pooleable = null;
        while (pooleable==null)
            pooleable = pool.claim(NO_WAIT);
        final StringBuilder acquired = pooleable.builder;
        acquired.setLength(0);
        acquired.append("prova");
        pooleable.release();
    }

And the results with my configuration (i7 Haswell and Ubuntu 15.04 x86-64) are:

24265898 ops/sec
24891538 ops/sec
24958694 ops/sec
24860089 ops/sec
24602964 ops/sec

The results (the single-threaded ones at least!) are very similar to those of https://github.com/ashkrit/blog/blob/master/src/main/java/objectpool/FastObjectPool.java :

24664199 ops/sec
24601561 ops/sec
24391226 ops/sec
23914223 ops/sec
24683325 ops/sec

My only concern is that a thread-local pool (a simple thread-local ArrayDeque used as a stack of free-to-acquire instances) is in another league:

88031335 ops/sec
85045175 ops/sec
86030249 ops/sec
86065998 ops/sec
86168530 ops/sec

But I know that the features provided are completely different.
If my use case needs something more than thread-local storage, I could add tests for a concurrent, contended case.
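
For reference, a thread-local pool of the kind described above might look roughly like this. It is a sketch, not the code that produced the numbers, and `ThreadLocalStackPool` is a hypothetical name. It is unbounded and never expires anything, which is exactly where it differs from a general purpose pool.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

/** Hypothetical unbounded pool: one private stack of free objects per thread. */
final class ThreadLocalStackPool<T> {
    private final Supplier<T> factory;
    private final ThreadLocal<ArrayDeque<T>> free = ThreadLocal.withInitial(ArrayDeque::new);

    ThreadLocalStackPool(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Pops the most recently released object, or allocates a new one. */
    T claim() {
        T pooled = free.get().pollFirst();
        return pooled != null ? pooled : factory.get();
    }

    /** Must be called on the thread that will claim the object again. */
    void release(T object) {
        free.get().addFirst(object);
    }
}
```

No atomic instructions are involved at all, which is consistent with it being several times faster in a single-threaded loop.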

Chris Vest

Sep 10, 2015, 5:20:44 AM
to mechanica...@googlegroups.com
In the benchmark for my blog post, I configured the pools to not do any object expiration checking.
By default, Stormpot uses a time-based expiration policy, which implies a call to `System.nanoTime()` on every call to claim. Your example code doesn’t need that, so you can configure `config.setExpiration((info) -> false)` as well, and remove that overhead. Extracting the iteration loop into its own method and warming it up many times can also give a small speedup.
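
To show where that overhead comes from, here is a small stand-alone sketch. The `ExpiryCheck` interface and `ExpiryChecks` helpers are hypothetical and much simpler than Stormpot's real expiration API; the point is only that a time-based check has to read the clock on every claim, while a never-expires check is a constant.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical expiration check, consulted on every claim. */
@FunctionalInterface
interface ExpiryCheck {
    boolean hasExpired(long createdNanos);
}

final class ExpiryChecks {
    /** Never expires: no clock read, trivially cheap. */
    static final ExpiryCheck NEVER = createdNanos -> false;

    /** Time-based: pays for one System.nanoTime() call per check. */
    static ExpiryCheck olderThan(long ttl, TimeUnit unit) {
        long ttlNanos = unit.toNanos(ttl);
        return createdNanos -> System.nanoTime() - createdNanos > ttlNanos;
    }

    private ExpiryChecks() {
    }
}
```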

If you don’t need expiration or an upper bound on the number of objects, then a normal ThreadLocal, or a normal field in each worker thread, will give you unbeatable performance. It could also be that allocation gives the best median latency, if you otherwise have to go out of your way to zero out the state of the reused objects.

Cheers,
Chris
