Big Queue project


Ashwin Jayaprakash

Dec 17, 2013, 1:36:02 AM
to java-ch...@googlegroups.com
Hi, what do you think of this project - http://bulldog2011.github.io/blog/2013/01/24/big-queue-tutorial/? Netflix Suro appears to be using this project.

Regards.

Peter Lawrey

Dec 17, 2013, 6:39:18 AM
to java-ch...@googlegroups.com
A common problem I see with many of these solutions is the conversion to/from byte[]. This typically creates very large amounts of garbage, which slows down your application even if you ignore the cost of GC pauses. In my experience, serializing to/from a byte[] slows down messaging by as much as 3x in realistic examples.
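To illustrate the pattern being criticised, here is a minimal sketch (not taken from any of the projects discussed) of the byte[] round trip: every message allocates a stream, at least one byte[] copy, and a fresh deserialized object on the receiving side, all of which becomes garbage.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class ByteArrayGarbage {
    public static void main(String[] args) throws Exception {
        int[] message = {42, 7, 2013};

        // Serialize: allocates a stream and an internal byte[].
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(message);
        }
        byte[] bytes = bos.toByteArray();   // yet another byte[] copy

        // Deserialize: allocates a fresh int[] for every message.
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes))) {
            int[] copy = (int[]) in.readObject();
            System.out.println(copy[0] + " " + copy[1] + " " + copy[2]);
        }
    }
}
```

At even modest message rates, these per-message allocations add up to the hundreds of MB/s of garbage discussed below.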

Garbage doesn't just slow down your serialization; it slows down your whole application by filling your CPU caches with garbage, literally. If you produce 300 MB/s of garbage, you have filled your L1 cache with garbage in 0.1 ms and your L2 cache in under 1 ms. This means that something you accessed a millisecond ago is in L3 cache if you are lucky. The problem with L3 is that it's not only 10-20x slower than L1, it is also shared across all your CPUs. i.e. your multi-threaded process is now effectively single-threaded, contended on a shared resource which is much, much slower than your L1.

In short, if you have less than 300 MB/s of garbage, your GC times will look good, but your application is likely to be 2-5x slower (in applications I have tuned) if single-threaded, and far worse if multi-threaded. You actually want allocation rates much lower than 1 MB/s, and if you can get below 250 KB/s you can run all day without a minor collection (with a 24 GB Eden space).
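The arithmetic behind these figures can be checked directly. This sketch assumes typical per-core cache sizes of 32 KB (L1 data) and 256 KB (L2), which are not stated in the post, together with the 300 MB/s and 250 KB/s allocation rates and 24 GB Eden it quotes:

```java
public class CacheFillTimes {
    public static void main(String[] args) {
        double l1 = 32 * 1024;                     // assumed L1 data cache, bytes
        double l2 = 256 * 1024;                    // assumed L2 cache, bytes
        double eden = 24.0 * 1024 * 1024 * 1024;   // 24 GB Eden, as quoted
        double fast = 300.0 * 1024 * 1024;         // 300 MB/s garbage
        double slow = 250.0 * 1024;                // 250 KB/s garbage

        System.out.printf("L1 filled in %.2f ms%n", l1 / fast * 1000);   // ~0.1 ms
        System.out.printf("L2 filled in %.2f ms%n", l2 / fast * 1000);   // under 1 ms
        System.out.printf("Eden lasts %.1f hours%n", eden / slow / 3600); // over a day
    }
}
```

With these cache sizes the claims line up: L1 fills in about 0.1 ms, L2 in under 1 ms, and at 250 KB/s a 24 GB Eden lasts roughly 28 hours.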

Chronicle is designed to support saving and loading objects without creating any garbage (or as little as possible, depending on the types you use). I have a demo (called chronicle-demo) which shows sending 10 million messages/objects (not just byte[]) from one process to another and back again in a 32 MB heap (no special tuning otherwise) without triggering a minor collection. There is garbage, but far less than one byte per message. BTW the typical round-trip time for a persisted request/response is less than 1 microsecond, even on a laptop. (How many round trips you can do at that latency depends on your machine, but you can usually do at least 500K/s on any multi-CPU system.)
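The underlying technique (sketched here with plain java.nio rather than the Chronicle API itself) is to write primitives straight into a memory-mapped file, so no byte[] or intermediate objects are allocated per message:

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedWriteDemo {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile file = new RandomAccessFile("demo.dat", "rw");
             FileChannel channel = file.getChannel()) {
            // Map 1 MB of the file into memory; writes go straight to the page cache.
            MappedByteBuffer buffer =
                channel.map(FileChannel.MapMode.READ_WRITE, 0, 1 << 20);

            // "Serialize" a message of three ints with no per-message allocation.
            buffer.putInt(42);
            buffer.putInt(7);
            buffer.putInt(2013);

            // Read it back directly from the mapped region, again with no garbage.
            buffer.flip();
            System.out.println(buffer.getInt() + " "
                    + buffer.getInt() + " " + buffer.getInt());
        }
    }
}
```

Chronicle builds a persisted, indexed queue on top of this idea; the sketch only shows why the per-message garbage can be near zero.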

In a similar approach, I have a HugeHashMap which can concurrently put/get/remove 100 million entries with a 64 MB heap (on a machine with 32 GB of memory) and not trigger a minor collection. See HugeHashMapTest. On a 6-core machine it can perform 13+ million operations per second.
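The same off-heap idea applies to maps. This is a toy sketch (not the HugeHashMap API): a fixed-capacity open-addressed long-to-long map whose entries live in a direct ByteBuffer, so put/get allocate no per-entry objects and the entries never touch the heap or the GC.

```java
import java.nio.ByteBuffer;

public class OffHeapLongMap {
    private static final int ENTRY = 17;  // 1 used-flag byte + 8-byte key + 8-byte value
    private final int capacity;
    private final ByteBuffer store;       // off-heap; invisible to the GC

    public OffHeapLongMap(int capacity) {
        this.capacity = capacity;
        this.store = ByteBuffer.allocateDirect(capacity * ENTRY);
    }

    public void put(long key, long value) {
        int i = indexFor(key);
        for (int n = 0; n < capacity; n++) {
            int off = i * ENTRY;
            if (store.get(off) == 0 || store.getLong(off + 1) == key) {
                store.put(off, (byte) 1);
                store.putLong(off + 1, key);
                store.putLong(off + 9, value);
                return;
            }
            i = (i + 1) % capacity;       // linear probing on collision
        }
        throw new IllegalStateException("map full");
    }

    // Returns 'missing' if absent; a primitive return avoids boxing garbage.
    public long getOrDefault(long key, long missing) {
        int i = indexFor(key);
        for (int n = 0; n < capacity; n++) {
            int off = i * ENTRY;
            if (store.get(off) == 0) return missing;
            if (store.getLong(off + 1) == key) return store.getLong(off + 9);
            i = (i + 1) % capacity;
        }
        return missing;
    }

    private int indexFor(long key) {
        return (int) ((key ^ (key >>> 32)) & 0x7fffffff) % capacity;
    }

    public static void main(String[] args) {
        OffHeapLongMap map = new OffHeapLongMap(1024);
        map.put(123L, 456L);
        System.out.println(map.getOrDefault(123L, -1L)); // prints 456
    }
}
```

A production version needs resizing, concurrency control, and variable-sized values; the point here is only that fixed-layout off-heap storage keeps a hundred million entries out of the collector's sight.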




Peter Lawrey

Dec 17, 2013, 7:12:52 AM
to java-ch...@googlegroups.com
On this page, https://github.com/bulldog2011/bigqueue, the author quotes 166-333 MB/second with 1 KB messages, which I assume are byte[].

As the author notes, the bandwidth of your disk is a limiting factor, which is why I have optimised for smaller messages, to keep the per-message overhead at a minimum.

The figures I quote for an ext4 file system are ~700 MB/second for messages as small as 64 bytes, and 570 MB/second for 16 byte messages.

On a tmpfs filesystem, Chronicle achieves 1.4 GB/s from a single thread. 

In my example I write/read multiple int values. Writing long, double, or untouched byte[] values is slightly faster, but quoting those figures would be optimistic in terms of performance.

I have tested this on a PCI SSD and achieved a sustained write rate of 900 MB/s, which is the limit of the device. Most SATA SSDs are limited to 500 MB/s.

To maximise performance you might like to have a chronicle per producer thread, but you would need both a) even faster storage and b) much larger storage, which is a very expensive combination. My 480 GB PCI SSD fills in 10 minutes. The next version of Chronicle will support rolling files.

My target is to test it for writing/reading one trillion messages in under a day.

Regards,
   Peter.

Ashwin Jayaprakash

Dec 19, 2013, 11:19:08 AM
to java-ch...@googlegroups.com
Thanks for the explanation, Peter. I guess Martin Thompson's SBE work would fill that gap of fast serialization/deserialization.

Peter Lawrey

Dec 19, 2013, 11:24:47 AM
to java-ch...@googlegroups.com

I would be interested to see that.

Whatever it does will take time and make things slower. My tests already include serialization and deserialization costs. If you are not careful, deserialization can be more expensive than the low-latency messaging itself.

It is an interesting space to watch.

Cheers,
   Peter.
