Musing openly: Chronicle integration with JCACHE data grid providers


Ben Cotton

unread,
Dec 30, 2013, 12:23:15 PM
to java-ch...@googlegroups.com

Musing openly, I think there may be a very interesting (and empowering) bridge to be crossed from traditional JCACHE data grid providers to Chronicle.

For a while, I was thinking to myself: will Chronicle eventually evolve into a niche JCACHE provider that accommodates the ultra-high-performance computing community (via a standard JSR 107-compliant API)?

Then I thought, wait ... why not just integrate Chronicle via an explicit API adapter to a commodity JCACHE provider (e.g. EhCache, GemFire, Coherence, Infinispan, etc.)?

Today every JCACHE provider implements Cache<K,V> as some variant of j.u.c.ConcurrentHashMap<K,V>. But why not implement Cache as something more capable than a CHM? Why not implement it as something like Chronicle/Excerpt/ExcerptAppender? By doing so, you could build very sweet things, like a distributed (or replicated) Cache with a /dev/shm IPC transport (Chronicle-rendered) interconnect. Cache operands implemented as a CHM, by necessity, live on the heap. Cache operands implemented via Chronicle can live where they want.

E.g. the JCACHE provider=JBoss Infinispan provides an API (org.infinispan.container.DataContainer) that defines an interface that can be used to implement a Cache backed by something other than a CHM operand. You could (I assume, but have not tested) use this exact same interface to implement an Infinispan Cache operand that is in fact a Chronicle/Excerpt/ExcerptAppender operand.
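
To make the adapter idea concrete, here is a minimal, self-contained sketch. All names here (CacheStore, HeapStore) are hypothetical and are NOT the real Infinispan DataContainer API: the point is only that the cache layer programs against a small storage interface, so the backing operand can be a CHM today and a Chronicle-backed store tomorrow, without the cache layer changing.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical storage SPI: what an adapter like Infinispan's DataContainer
// lets a provider swap out. Names are illustrative, not any real API.
interface CacheStore<K, V> {
    V get(K key);
    void put(K key, V value);
}

// Default operand: plain on-heap CHM, as most JCACHE providers use today.
// A Chronicle-backed implementation of the same interface could keep its
// entries off heap without the cache layer noticing.
class HeapStore<K, V> implements CacheStore<K, V> {
    private final ConcurrentMap<K, V> map = new ConcurrentHashMap<>();
    public V get(K key) { return map.get(key); }
    public void put(K key, V value) { map.put(key, value); }
}

public class CacheAdapterSketch {
    public static void main(String[] args) {
        CacheStore<String, String> store = new HeapStore<>(); // swap in an off-heap impl here
        store.put("key", "value");
        System.out.println(store.get("key"));
    }
}
```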

Nice, right?  Everything 100% community-driven open source too!  :-)

I'm off to my Infinispan/JGroups build workspaces to play with this Chronicle adapter idea ... has anyone else (maybe a JCACHE provider?) mused in the OpenHFT community about using Chronicle as a non-CHM javax.cache.Cache<K,V> operand implementation?



Peter Lawrey

unread,
Dec 30, 2013, 1:16:41 PM
to java-ch...@googlegroups.com

I suggest you look at HugeHashMap in HugeCollections. It is a concurrent map where the keys and values are stored entirely off heap, i.e. it uses less than one byte of heap per key/value.
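
To illustrate the off-heap idea in plain JDK terms, here is a toy sketch of my own (not HugeHashMap's actual design): values are copied into one direct ByteBuffer, so only a small offset/length index remains on heap. HHM goes much further and keeps even the index off heap, which is how it gets below one byte of heap per entry.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Toy sketch only: values live off heap in a direct buffer; this version
// never reclaims space and keeps a small on-heap index, unlike HHM.
public class OffHeapValueStore {
    private final ByteBuffer store = ByteBuffer.allocateDirect(1 << 20); // off-heap region
    private final Map<String, int[]> index = new HashMap<>();            // key -> {offset, length}

    public void put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        int offset = store.position();
        store.put(bytes);                       // copy the value into the off-heap region
        index.put(key, new int[]{offset, bytes.length});
    }

    public String get(String key) {
        int[] loc = index.get(key);
        if (loc == null) return null;
        byte[] bytes = new byte[loc[1]];
        ByteBuffer view = store.duplicate();    // independent cursor over the same off-heap memory
        view.position(loc[0]);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        OffHeapValueStore m = new OffHeapValueStore();
        m.put("EURUSD", "1.3650");
        m.put("USDCHF", "0.8920");
        System.out.println(m.get("EURUSD"));
    }
}
```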

--
You received this message because you are subscribed to the Google Groups "Java Chronicle" group.
To unsubscribe from this group and stop receiving emails from it, send an email to java-chronicl...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Ben Cotton

unread,
Dec 30, 2013, 6:26:30 PM
to java-ch...@googlegroups.com
HugeHashMap<K,V> looks ideal.  :-)

FYI: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tc4026102.html#a4028629

Interestingly, the thread at the Infinispan.org URL above was originally started by Yavuz Gokirmak, author of OpenHFT Chronicle tests.

Thanks, Ben

Petr Postulka

unread,
Jan 6, 2014, 2:19:23 PM
to java-ch...@googlegroups.com
Hi Peter,

are you planning to publish HugeCollections to Maven Central like the other OpenHFT libraries? Right now I can't find it there, and it seems it is not published in any other repository.

Thank you and kind regards,

Petr

Peter Lawrey

unread,
Jan 6, 2014, 3:54:34 PM
to java-ch...@googlegroups.com

I will publish HugeCollections this week. I was trying to pick a good time for the first release, and now seems as good a time as any.

Petr Postulka

unread,
Jan 6, 2014, 3:56:27 PM
to java-ch...@googlegroups.com
Thanks Peter, much appreciated ...

Thomas Lo

unread,
Jan 10, 2014, 11:35:41 AM
to java-ch...@googlegroups.com
Peter, do you have a Google group for OpenHFT HugeCollections? I've got some questions to ask and don't want to hijack Chronicle's group for that. Thanks.

Peter Lawrey

unread,
Jan 10, 2014, 12:50:12 PM
to java-ch...@googlegroups.com

Good point. I can set one up on Friday. I am away skiing in France at the moment.

You could start an issue on GitHub for more documentation on specific topics ;)

Thomas Lo

unread,
Jan 10, 2014, 2:47:08 PM
to java-ch...@googlegroups.com
Have fun skiing!!!! Don't want to disturb you on that!

Will do.

Ben Cotton

unread,
Jan 15, 2014, 9:55:20 AM
to java-ch...@googlegroups.com
Hi Peter,

FYI, there is now an explicit discussion at JCACHE provider=Infinispan re: using OpenHFT HugeHashMap as an off-heap Cache<K,V> impl.


One of the comments in that discussion concerned HHM's use of Unsafe.malloc vs. Netty's jemalloc-style allocator, and the potential extra COPY impact of "reconnecting" HHM entry value objects with on-heap API usage (e.g. NIO).

It would seem to me that direct Unsafe.malloc, in concert with using BytesMarshallable value objects to "reconnect" with on-heap APIs, has at least the same merit as using Netty's jemalloc.

Any comment?

Ben Cotton

unread,
Jan 15, 2014, 11:50:27 AM
to java-ch...@googlegroups.com
Simplifying my question: does Netty's jemalloc-like off-heap allocation management do anything different (advantageous?) compared with straightforward use of Unsafe malloc/free?

Peter Lawrey

unread,
Jan 15, 2014, 11:56:55 AM
to java-ch...@googlegroups.com
HHM uses one large allocation per segment to store most serialized objects. In this way, millions of entries can use the same allocation. As the size is not known in advance (or is assumed to be unknown), the key and value are serialized first into a temporary buffer and, once the size is known, copied into the next free portion of the preallocated data, or, for oversized objects, into a newly malloc'ed area. I.e. if every key/value fits into the preallocated off-heap region, no additional allocations occur.
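
The allocation scheme described above can be sketched in plain Java (class and field names are mine, not HHM's, and ByteBuffer stands in for the raw malloc'ed memory): serialize into a reusable scratch buffer first; once the size is known, copy into the next free slot of one big preallocated slab, or give an oversized entry its own allocation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of the per-segment allocation strategy: one big preallocated slab
// shared by many entries, a reused scratch buffer for sizing, and a
// dedicated allocation only for oversized entries.
public class SegmentStoreSketch {
    static final int OVERSIZE_LIMIT = 64;

    final ByteBuffer slab = ByteBuffer.allocateDirect(4096); // one allocation, many entries
    final ByteBuffer scratch = ByteBuffer.allocate(1024);    // reused per-entry temp buffer

    /** Stores the bytes and reports where they went. */
    String store(byte[] serializedEntry) {
        scratch.clear();
        scratch.put(serializedEntry);   // step 1: "serialize" into scratch; size is now known
        scratch.flip();
        if (scratch.remaining() > OVERSIZE_LIMIT || scratch.remaining() > slab.remaining()) {
            ByteBuffer own = ByteBuffer.allocateDirect(scratch.remaining());
            own.put(scratch);           // step 2a: oversized -> dedicated allocation
            return "oversized (" + own.capacity() + " bytes, own allocation)";
        }
        int offset = slab.position();
        slab.put(scratch);              // step 2b: normal case -> next free slot of the slab
        return "slab @ " + offset;
    }

    public static void main(String[] args) {
        SegmentStoreSketch store = new SegmentStoreSketch();
        System.out.println(store.store("small entry".getBytes(StandardCharsets.UTF_8)));
        System.out.println(store.store(new byte[500])); // bigger than OVERSIZE_LIMIT
        System.out.println(store.store("another small entry".getBytes(StandardCharsets.UTF_8)));
    }
}
```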

You can use BytesMarshallable for the most options on how to serialize your key/value; however, you will get the same behaviour with normal Externalizable, i.e. you are not forced to use the library's API in your class. Using Externalizable with this library is more efficient than using the same code with Java Serialization.



Peter Lawrey

unread,
Jan 15, 2014, 11:59:16 AM
to java-ch...@googlegroups.com
Good question.  I suspect there are a bunch of things it is not doing, but I will investigate.



Peter Lawrey

unread,
Jan 15, 2014, 1:19:05 PM
to java-ch...@googlegroups.com
The first thing I noticed is that allocating using Netty's pooled heap is twice as fast on my machine: Netty creates/frees 256-byte buffers at 11 million per second vs DirectStore's 5.6 million per second.  Note: HHM avoids doing this at all, and I suspect this difference is not important for HHM.

I rewrote one of their tests as a performance test.  (Given that they don't appear to performance-test their object serialization, which is a worry ;), it also means I probably didn't do it as optimally as it could be.)  In the following test I serialize and deserialize an object with four fields (String, int, double, Enum) using the same writeExternal/readExternal code.

Netty: Serialization/Deserialization latency: 327,499 ns avg
Netty: Serialization/Deserialization latency: 97,419 ns avg
Netty: Serialization/Deserialization latency: 54,232 ns avg
Netty: Serialization/Deserialization latency: 58,950 ns avg
Netty: Serialization/Deserialization latency: 53,177 ns avg
Netty: Serialization/Deserialization latency: 53,189 ns avg
Netty: Serialization/Deserialization latency: 53,672 ns avg
Netty: Serialization/Deserialization latency: 52,871 ns avg
Netty: Serialization/Deserialization latency: 52,211 ns avg
Netty: Serialization/Deserialization latency: 51,924 ns avg
DirectStore: Externalizable latency: 6,899 ns avg
DirectStore: Externalizable latency: 825 ns avg
DirectStore: Externalizable latency: 496 ns avg
DirectStore: Externalizable latency: 494 ns avg
DirectStore: Externalizable latency: 385 ns avg
DirectStore: Externalizable latency: 212 ns avg
DirectStore: Externalizable latency: 201 ns avg
DirectStore: Externalizable latency: 197 ns avg
DirectStore: Externalizable latency: 199 ns avg
DirectStore: Externalizable latency: 203 ns avg

The code is:

/*
 * Copyright 2012 The Netty Project
 *
 * The Netty Project licenses this file to you under the Apache License,
 * version 2.0 (the "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at:
 *
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
 * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
 * License for the specific language governing permissions and limitations
 * under the License.
 */
package io.netty.handler.codec.marshalling;

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandler;
import io.netty.channel.embedded.EmbeddedChannel;
import net.openhft.lang.io.DirectBytes;
import net.openhft.lang.io.DirectStore;
import org.jboss.marshalling.MarshallerFactory;
import org.jboss.marshalling.Marshalling;
import org.jboss.marshalling.MarshallingConfiguration;
import org.jboss.marshalling.Unmarshaller;
import org.junit.Test;

import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.lang.annotation.RetentionPolicy;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNull;
import static org.junit.Assert.assertTrue;

public class SerialMarshallingEncoderTest extends SerialCompatibleMarshallingEncoderTest {

    @Override
    protected ByteBuf truncate(ByteBuf buf) {
        buf.readInt();
        return buf;
    }

    @Override
    protected ChannelHandler createEncoder() {
        return new MarshallingEncoder(createProvider());
    }

    @Test
    public void testMarshallingPerf() throws Exception {
        MyData testObject = new MyData("Hello World", 1, 2.0, RetentionPolicy.RUNTIME);

        final MarshallerFactory marshallerFactory = createMarshallerFactory();
        final MarshallingConfiguration configuration = createMarshallingConfig();
        Unmarshaller unmarshaller = marshallerFactory.createUnmarshaller(configuration);

        for (int t = 0; t < 10; t++) {
            long start = System.nanoTime();
            int RUNS = 10000;
            for (int i = 0; i < RUNS; i++) {
                EmbeddedChannel ch = new EmbeddedChannel(createEncoder());
                ch.writeOutbound(testObject);
                assertTrue(ch.finish());

                ByteBuf buffer = ch.readOutbound();

                unmarshaller.start(Marshalling.createByteInput(truncate(buffer).nioBuffer()));
                MyData read = (MyData) unmarshaller.readObject();
                assertEquals(testObject, read);

                assertEquals(-1, unmarshaller.read());

                assertNull(ch.readOutbound());
                buffer.release();
            }
            long average = (System.nanoTime() - start) / RUNS;
            // average is in nanoseconds: (elapsed nanoTime) / RUNS
            System.out.printf("Netty: Serialization/Deserialization latency: %,d ns avg%n", average);
        }

        unmarshaller.finish();
        unmarshaller.close();
    }

    @Test
    public void testMarshallingPerfDirectStore() throws Exception {
        MyData testObject = new MyData("Hello World", 1, 2.0, RetentionPolicy.RUNTIME);
        MyData testObject2 = new MyData("test", 12, 222.0, RetentionPolicy.CLASS);

        DirectStore ds = DirectStore.allocateLazy(256);
        DirectBytes db = ds.createSlice();
        for (int t = 0; t < 10; t++) {
            long start = System.nanoTime();
            int RUNS = 10000;
            for (int i = 0; i < RUNS; i++) {
                db.reset();
                testObject.writeExternal(db);
                long position = db.position();
                db.reset();
                testObject2.readExternal(db);
                assertEquals(testObject, testObject2);

                assertEquals(position, db.position());
            }
            long average = (System.nanoTime() - start) / RUNS;
            // average is in nanoseconds: (elapsed nanoTime) / RUNS
            System.out.printf("DirectStore: Externalizable latency: %,d ns avg%n", average);
        }
        ds.free();
    }

    public static class MyData implements Externalizable {
        String text;
        int value;
        double number;
        RetentionPolicy policy;

        public MyData() {
        }

        public MyData(String text, int value, double number, RetentionPolicy policy) {
            this.text = text;
            this.value = value;
            this.number = number;
            this.policy = policy;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(text);
            out.writeInt(value);
            out.writeDouble(number);
            out.writeUTF(policy.name());
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
            text = in.readUTF();
            value = in.readInt();
            number = in.readDouble();
            policy = RetentionPolicy.valueOf(in.readUTF());
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;

            MyData myData = (MyData) o;

            if (Double.compare(myData.number, number) != 0) return false;
            if (value != myData.value) return false;
            if (policy != myData.policy) return false;
            if (text != null ? !text.equals(myData.text) : myData.text != null) return false;

            return true;
        }
    }
}

Ben Cotton

unread,
Jan 15, 2014, 2:36:07 PM
to java-ch...@googlegroups.com
Fabulously informative, Peter (as always).  :-)

I have shared your findings with the JCACHE provider=Infinispan community.

Ben Cotton

unread,
Feb 5, 2014, 12:58:03 PM
to java-ch...@googlegroups.com


Hi Peter, could you comment on tactics you might use to make OpenHFT's HugeHashMap adaptable to the run-time benefits of an HTM-assist capability (e.g. on Intel TSX hardware)?  Would any HTM-assist adapter mechanism you provide likely be transparent from the user's view?  Or could such an adapter actually include a user-facing API to manage their HHM with HTM-assist expectations (and obligations)?  I know this is not in your immediate near-term HugeCollections delivery plans; I'm just very excited about all the discussion since your recent blog post.  Thx, Ben

Peter Lawrey

unread,
Feb 5, 2014, 1:50:56 PM
to java-ch...@googlegroups.com

HHM already uses fine-grained locking, so it wouldn't gain as much from HTM. The next step for me is a hash map which can be shared between processes and replicated.

Ben Cotton

unread,
Feb 5, 2014, 10:57:50 PM
to java-ch...@googlegroups.com

> HHM already uses fine-grained locking, so it wouldn't gain as much from HTM.

Makes perfect sense, thanks.

> The next step for me is a hash map which can be shared between processes and replicated.

Can't wait.  

A question for your consideration (asking in a wee bit of a Devil's advocate role): in the exact same way that community-driven open-source JCACHE vendors' solutions (e.g. Hazelcast, Infinispan) can so obviously benefit from OpenHFT's (already in-place) HHM being their off-heap Cache<K,V> operand provider, could OpenHFT's community-driven open-source HHM solution (and its ambition to be replicated/distributed) possibly benefit from their (already in-place) capability to deliver replicated/distributed (and CAP/Brewer coherently synchronized, consistent) JCACHE data grids?

Might be a Win-Win.

IMHO (very humble), that combination of a highly scalable, highly distributed HHM might be the long-term basis for OpenHFT potentially being a "can't beat" Map-Reduce platform provider: enabling MR processing operations on ultra-high-performance distributed Cache operands in the same way Hadoop enables MR processing operations on ultra-low-performance disk-based distributed UFS operands.

Just musing out loud, you know.  :-)  Any interest?

Peter Lawrey

unread,
Feb 6, 2014, 2:26:52 AM
to java-ch...@googlegroups.com

Yes. I have interest and I have my own ideas.

The plan is to support a partitioned time-series database. E.g. if you have five hosts, each one deals with a different day of the week.

Then you can do things like:

eurusd.mid = average(eurusd.bp, eurusd.ap);
usdchf.mid = average(usdchf.bp, usdchf.ap);
Correlation correl = eurusd.mid.correl(usdchf.mid, filter.context);

And it will split the query across all your machines/CPUs and collect the results transparently for you.

I.e. the objective is that the data is already local to each machine and you write Java in a reasonably natural fashion. Possibly support Scala.
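
The scatter-gather idea sketched above can be illustrated with plain JDK concurrency (all names here are mine; nothing is OpenHFT API): each "host" owns one partition, computes a local partial result in parallel, and the partials are combined transparently into the global answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: a query such as average(eurusd.mid) split across partitions.
public class PartitionedQuery {

    // scatter: each partition computes a local (sum, count); gather: combine them
    static double globalAverage(List<double[]> partitions) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(partitions.size());
        List<Future<double[]>> partials = new ArrayList<>();
        for (double[] p : partitions)
            partials.add(pool.submit(() -> new double[]{Arrays.stream(p).sum(), p.length}));
        double sum = 0, count = 0;
        for (Future<double[]> f : partials) {
            double[] part = f.get();     // collect each partition's partial result
            sum += part[0];
            count += part[1];
        }
        pool.shutdown();
        return sum / count;
    }

    public static void main(String[] args) throws Exception {
        // five "hosts", each owning one weekday's worth of eurusd.mid prices
        List<double[]> week = Arrays.asList(
                new double[]{1.36, 1.37}, new double[]{1.35},
                new double[]{1.38, 1.36}, new double[]{1.34}, new double[]{1.37});
        System.out.printf("global mid average = %.4f%n", globalAverage(week));
    }
}
```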

Ben Cotton

unread,
Feb 9, 2014, 7:33:38 PM
to java-ch...@googlegroups.com
 
> Musing openly, I think there may be a very interesting (and empowering) bridge that may be crossable from traditional JCACHE data grid providers to OpenHFT.

No longer just "musing", we are now "openly building" a POC integration of this potential.  No results yet, but here we go ...


Peter Lawrey

unread,
Feb 10, 2014, 2:56:41 AM
to java-ch...@googlegroups.com
At the moment I am working on a SharedHashMap, which is much like HugeHashMap except it can be shared between processes and replicated. It assumes a single master at this stage, but the aim is for a get/put to take around a microsecond.  It also supports very large sizes, like HHM.
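
The mechanism that makes cross-process sharing possible is a memory-mapped file; here is a minimal sketch using plain NIO (my own illustration; SharedHashMap's actual entry layout, locking and replication are of course far more involved). Two JVMs that map the same file see each other's writes without copying data through sockets.

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Sketch: a long counter shared through a memory-mapped file. Any process
// mapping the same file sees the same 8 bytes of memory.
public class MappedCounter {

    // Maps an 8-byte region of the file and bumps the shared counter.
    static long incrementAndGet(File file) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, 8);
            long next = map.getLong(0) + 1;  // another process mapping this file sees the update
            map.putLong(0, next);
            return next;
        }
    }

    public static void main(String[] args) throws Exception {
        // in practice both processes would open the same well-known path
        File f = File.createTempFile("shared-counter", ".dat");
        System.out.println("counter = " + incrementAndGet(f)); // fresh mapping starts zeroed
        System.out.println("counter = " + incrementAndGet(f)); // a second mapping sees the first write
        f.delete();
    }
}
```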

I think this is the best option to support JCache.


Ben Cotton

unread,
Feb 17, 2014, 4:14:51 PM
to java-ch...@googlegroups.com

Ben Cotton

unread,
Sep 1, 2014, 4:45:27 PM
to java-ch...@googlegroups.com

FYI.  Responded to Red Hat's request for updated status re the potential of OpenHFT being both a

1.  Off-Heap 
2.  /dev/shm IPC transport

provider to Infinispan (JCACHE).
