Looking for Java serialization with minimal GC

Dave

Jan 24, 2013, 2:04:43 PM
to java-serializat...@googlegroups.com
I'm designing a high-performance application that will be sending a lot of messages between applications. All of these messages need to be sent on the network and persisted to disk. We need to minimize or eliminate garbage collection as much as possible. Can anyone recommend a serialization solution? Messages will be of a known format / schema and generally in the 100-200 byte range.

Tatu Saloranta

Jan 24, 2013, 2:44:20 PM
to java-serializat...@googlegroups.com
You'll have to measure GC overhead for the different codecs yourself,
probably using the smallest heap you can configure, to maximize the
effects of GC. The testing framework has been changed/re-configured to
try to eliminate GC overhead (there was quite a bit of discussion on
this), so the default results do not necessarily help all that much.
However, the codec code itself can be used as is.
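
A rough sketch of that kind of measurement, using only the standard java.lang.management API. The class name GcProbe and the encode method are hypothetical stand-ins (not part of the benchmark project); swap in the codec you actually want to test. Run it with a deliberately small heap, e.g. `java -Xmx16m GcProbe`, so that any allocation shows up quickly as collections.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

final class GcProbe {
    /** Sum of collection counts over all collectors; undefined (-1) values are skipped. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) total += count;
        }
        return total;
    }

    /** Sum of accumulated collection time in milliseconds; undefined (-1) values are skipped. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) total += millis;
        }
        return total;
    }

    /** Hypothetical stand-in for the codec under test: writes 12 bytes into a reused buffer. */
    static int encode(ByteBuffer buf, long id, int qty) {
        buf.clear();
        buf.putLong(id).putInt(qty);
        return buf.position();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(256);
        long countBefore = totalGcCount();
        long timeBefore = totalGcTimeMillis();
        long bytes = 0; // accumulated so the JIT cannot discard the loop body
        for (int i = 0; i < 5_000_000; i++) {
            bytes += encode(buf, i, i & 0xFF);
        }
        System.out.println("bytes written: " + bytes);
        System.out.println("collections:   " + (totalGcCount() - countBefore));
        System.out.println("GC time (ms):  " + (totalGcTimeMillis() - timeBefore));
    }
}
```

The counters are process-wide, so anything else allocating in the same JVM (including the test harness itself) is counted too; treat the numbers as relative between codecs rather than absolute.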

My guess is that most of the top codecs would actually work.
Serialization/deserialization is not very often the bottleneck of
a system, once you use the libraries in an efficient way.

So you could choose to use Protobuf, Kryo (if this is Java only), or
Smile (a flavor of binary JSON, via the Jackson or protostuff library),
for example.

-+ Tatu +-

Kannan Goundan

Jan 24, 2013, 4:31:33 PM
to java-serializat...@googlegroups.com
Like Tatu said, most of the top serializers will probably be good
enough in terms of performance (as long as they're within ~2x of the
fastest one). I'd really recommend narrowing things down based on
your other needs. Some of the serializers require writing a lot of
code to serialize/deserialize. Some require specifying a separate
schema file. Some will work with any POJO. Some are JVM-only, which
may or may not be a problem for you. Some have support for migrating
to newer schemas.

A couple notes on performance:

1. Test with your own platform. The published benchmarks are for the
Oracle JVM in "server" mode. If you're going to run on Android, make
sure to test there. The optimization and GC characteristics will be
different.

2. Test with your own data value. The value used in the benchmark
isn't bad, but it is a single value with a few integers and a few
strings. For example, if your messages are large lists of integers,
performance will probably differ from our benchmarks.

3. If you do try to measure GC overhead, be careful; it's tricky. For
example, some of the serializers let you use mutable objects, which
you can reuse for multiple messages, but this may not result in as
much of an advantage as it initially seems. Allocation is pretty
cheap on the JVM and most of the message objects will probably be
collected as part of a cheap "nursery" collection.
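
To make the two decoding styles in point 3 concrete, here is a small sketch. The Quote type, its fields, and the 20-byte layout are invented for illustration; they do not come from any of the serializers discussed. Whether the reusing style actually wins should be measured, for the reasons given above.

```java
import java.nio.ByteBuffer;

final class ReuseSketch {
    /** Hypothetical mutable message type; fields are illustrative only. */
    static final class Quote {
        long instrumentId;
        double price;
        int size;
    }

    /** Writes one message: 8 + 8 + 4 = 20 bytes at the buffer's current position. */
    static void encode(ByteBuffer out, long instrumentId, double price, int size) {
        out.putLong(instrumentId).putDouble(price).putInt(size);
    }

    /** Style A: allocate a fresh object per message (short-lived, young-generation garbage). */
    static Quote decodeNew(ByteBuffer in) {
        Quote q = new Quote();
        decodeInto(in, q);
        return q;
    }

    /** Style B: overwrite a caller-supplied instance, allocating nothing. */
    static void decodeInto(ByteBuffer in, Quote q) {
        q.instrumentId = in.getLong();
        q.price = in.getDouble();
        q.size = in.getInt();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(20);
        Quote reused = new Quote(); // one instance shared by every iteration
        double sum = 0;
        for (int i = 0; i < 3; i++) {
            buf.clear();
            encode(buf, i, 100.0 + i, 10 * i);
            buf.flip();
            decodeInto(buf, reused);
            sum += reused.price;
        }
        System.out.println("sum of prices: " + sum);
    }
}
```

Style B only works if no caller holds on to the object after the next message is decoded, which is the usual price of reuse.
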
> --
> You received this message because you are subscribed to the Google Groups "java-serialization-benchmarking" group.
> To post to this group, send email to java-serializat...@googlegroups.com.
> To unsubscribe from this group, send email to java-serialization-be...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/java-serialization-benchmarking?hl=en.
>

Tatu Saloranta

Jan 25, 2013, 1:15:31 PM
to java-serializat...@googlegroups.com
+1 for everything said below.

Related to this, it would be great to have an Android variant of the
results (i.e. running on a real Android device).
But I don't know how practical that would be.

-+ Tatu +-

Peter Lawrey

Feb 10, 2013, 4:23:06 PM
to java-serializat...@googlegroups.com
If you are looking for messaging over TCP at one million messages per second or more (depending on your message), you might consider Java Chronicle (I am the author).
Note: this combines serialization, messaging, and logging to disk, and you can achieve sub-microsecond latencies if you are communicating on the same machine (including serialization + deserialization).

It is designed to reduce or eliminate garbage, i.e. << 1 object per message. Some methods create objects (as they return a new object), but others use primitives or recycled objects.

Dave

Feb 11, 2013, 8:02:48 AM
to java-serializat...@googlegroups.com
Thanks for all of the suggestions. At this point we're looking at a ByteBuffer implementation with rigid, versioned schemas and messages that are not self-describing. But for non-performance-critical apps we're going to look at self-describing messages, probably Protocol Buffers. I'll take a look at Java Chronicle; it definitely sounds like it's got a good set of features for what we're trying to do.
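
For readers wondering what such a ByteBuffer layout might look like, here is a minimal sketch. Everything in it is an assumption made for illustration: the class name OrderCodecV1, the field set, the offsets, and the choice of a single leading version byte. The point is only that the bytes carry no field names or types; both sides agree on the offsets out of band, and the version byte tells the reader which agreed layout applies.

```java
import java.nio.ByteBuffer;

final class OrderCodecV1 {
    static final byte VERSION = 1;

    // Fixed offsets within one message (hypothetical layout, 21 bytes in total).
    static final int OFF_VERSION = 0;   // 1 byte
    static final int OFF_ORDER_ID = 1;  // 8 bytes
    static final int OFF_PRICE = 9;     // 8 bytes, price in ticks
    static final int OFF_QTY = 17;      // 4 bytes
    static final int LENGTH = 21;

    /** Writes one message at 'base' using absolute puts; the buffer's position is untouched. */
    static void write(ByteBuffer buf, int base, long orderId, long priceTicks, int qty) {
        buf.put(base + OFF_VERSION, VERSION);
        buf.putLong(base + OFF_ORDER_ID, orderId);
        buf.putLong(base + OFF_PRICE, priceTicks);
        buf.putInt(base + OFF_QTY, qty);
    }

    /** Rejects messages written with a layout this reader does not know. */
    static void checkVersion(ByteBuffer buf, int base) {
        byte v = buf.get(base + OFF_VERSION);
        if (v != VERSION) {
            throw new IllegalStateException("unsupported schema version " + v);
        }
    }

    // Field accessors read primitives straight from the buffer, so no message object is created.
    static long orderId(ByteBuffer buf, int base)    { return buf.getLong(base + OFF_ORDER_ID); }
    static long priceTicks(ByteBuffer buf, int base) { return buf.getLong(base + OFF_PRICE); }
    static int qty(ByteBuffer buf, int base)         { return buf.getInt(base + OFF_QTY); }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(LENGTH * 2);
        write(buf, 0, 1001L, 250_000L, 5);
        write(buf, LENGTH, 1002L, 251_000L, 7);

        checkVersion(buf, LENGTH);
        System.out.println(orderId(buf, LENGTH) + " " + priceTicks(buf, LENGTH) + " " + qty(buf, LENGTH));
    }
}
```

A real design would also have to decide how variable-length fields such as strings are framed, which this sketch leaves out.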


--
Find me on Cowbird: http://cowbird.com/author/dave


Tatu Saloranta

Feb 11, 2013, 1:35:49 PM
to java-serializat...@googlegroups.com
On Mon, Feb 11, 2013 at 5:02 AM, Dave <dla...@gmail.com> wrote:
> Thanks for all of the suggestions. At this point we're looking at a
> ByteBuffer implementation with rigid, versioned schemas and messages that
> are not self-describing. But for non-performance-critical apps we're going to
> look at self-describing messages, probably Protocol Buffers. I'll take a

FWIW, Protobuf is not self-describing (the wire format carries only
field numbers and wire types, so a reader still needs the schema).
Also note that ByteBuffers are typically not a particularly efficient
input source, but I guess it makes sense if that's the interface your
I/O level provides.

Good luck!

-+ Tatu +-

Peter Lawrey

Feb 12, 2013, 7:17:20 AM
to java-serializat...@googlegroups.com
Java Chronicle uses ByteBuffer by default, but if you need extra performance it has the option to use Unsafe directly instead. That can be about 30% faster, but I strongly suggest you test with ByteBuffers to start with.

The advantage of using direct or memory-mapped ByteBuffers is that there is no additional overhead in copying the data to/from native memory. In the case of memory-mapped files, no additional system call is required to write (or read) the data to disk or to another process, i.e. as soon as it has finished copying the data, you are done.

While ByteBuffer is slower with bytes, especially text, it is faster for multi-byte data types like int, long, or double. It can be configured to use native byte ordering (the default for Chronicle), which means serializing a double is just a direct memory write.
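
A minimal sketch of the memory-mapped, native-byte-order approach, using only standard java.nio calls. It does not use or imitate the Java Chronicle API; the class name, the 4096-byte mapping size, and the temp-file handling are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedWriteSketch {
    /** Maps the first 4096 bytes of 'file', stores a double and a long, and reads the double back. */
    static double writeAndReadBack(Path file, double value) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping READ_WRITE grows the file to the mapped size if it is shorter.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            // Native order avoids byte swapping on each multi-byte put/get.
            map.order(ByteOrder.nativeOrder());
            map.putDouble(0, value); // a plain store into the mapped pages, no write() call
            map.putLong(8, 42L);
            return map.getDouble(0);
        }
    }

    /** Convenience wrapper that uses a temp file and converts IOException to an unchecked one. */
    static double roundTripInTempFile(double value) {
        try {
            Path file = Files.createTempFile("mapped-sketch", ".dat");
            file.toFile().deleteOnExit();
            return writeAndReadBack(file, value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripInTempFile(3.25));
    }
}
```

The stores land in the OS page cache and are written back by the operating system on its own schedule; if the data must be on disk before proceeding, MappedByteBuffer.force() requests that explicitly. Note also that data written in native order is only portable between machines with the same endianness.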


Pres

Feb 12, 2013, 3:21:55 PM
to java-serializat...@googlegroups.com
Hi Dave,

You might want to try Obser; it supports direct serialization to ByteBuffer (direct / array-backed).
Its memory footprint should be manageable, and it supports all objects that can be serialized via standard Java serialization.

Regards,
Pres

Emanuele Ziglioli

Feb 12, 2013, 3:23:17 PM
to java-serializat...@googlegroups.com
Hi Dave,

What do you mean by "self-describing"?

I'm the author of Java Construct, and we use it for binary protocols, so not object serialization in general:

I can't comment on performance (speed or memory usage), not having profiled it yet.
But we do use ByteBuffer underneath.


Peter Lawrey

Feb 13, 2013, 6:43:52 AM
to java-serializat...@googlegroups.com
Do you have a link for Obser? It sounds interesting, but I couldn't find it.

Pres

Feb 13, 2013, 6:58:24 AM
to java-serializat...@googlegroups.com
It is still not properly hosted, but you can grab the latest source code from http://sockali.net/obser/ and use Maven to build it.