Concurrent retrieval of statistics


Mohan Radhakrishnan

Oct 16, 2018, 4:33:03 AM
to mechanical-sympathy
Hi,
        There is streaming data everywhere: trading data, JVM logs, etc. Retrieving statistics over this data needs fast data structures. Where can I find literature on fast data structures for storing and retrieving timestamps and data in O(1) time? Should this always be built on low-level Java concurrency utilities?

Thanks,
Mohan

Wojciech Kudla

Oct 16, 2018, 4:47:31 AM
to mechanica...@googlegroups.com
I can only speak from my own experience. For data generated by latency-critical threads you will probably want a simple SPSC buffer per thread, meaning no contention between producers and therefore fewer cycles lost to cache-coherence traffic. The consumer can then iterate over all known buffers and drain them from some housekeeping thread (preferably on the same NUMA node).
Nitsan Wakart did some inspiring work on queues and there's plenty to learn from it; however, for simple, well-structured data you would probably be better off with something bespoke.
I typically use an off-heap circular byte buffer with the producer and consumer high watermarks cached for additional speed.
Choosing the right data structure and the means of moving the data will generally depend on many factors, but most of the time the JDK components are not suitable for this.
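
A minimal on-heap sketch of that per-thread SPSC idea (the class name and wiring are mine; a production version would live off-heap, pad against false sharing, and cache the watermarks as described above):

    import java.util.concurrent.atomic.AtomicLong;

    // Hypothetical single-producer/single-consumer ring buffer of long
    // samples, one instance per producing thread. Simplified: on-heap,
    // no false-sharing padding, no cached watermarks.
    final class SpscLongBuffer {
        private final long[] buffer;
        private final int mask;                        // capacity must be a power of two
        private final AtomicLong head = new AtomicLong(); // next write index (producer only)
        private final AtomicLong tail = new AtomicLong(); // next read index (consumer only)

        SpscLongBuffer(int capacityPowerOfTwo) {
            buffer = new long[capacityPowerOfTwo];
            mask = capacityPowerOfTwo - 1;
        }

        // Called only by the owning (producer) thread; never blocks.
        boolean offer(long value) {
            long h = head.get();
            if (h - tail.get() == buffer.length) {
                return false;                          // full: caller decides to drop or count
            }
            buffer[(int) (h & mask)] = value;
            head.lazySet(h + 1);                       // ordered store publishes the slot
            return true;
        }

        // Called only by the single housekeeping (consumer) thread.
        int drain(java.util.function.LongConsumer sink) {
            long t = tail.get();
            long h = head.get();
            int drained = 0;
            while (t < h) {
                sink.accept(buffer[(int) (t & mask)]);
                t++;
                drained++;
            }
            tail.lazySet(t);
            return drained;
        }
    }

The housekeeping thread just loops over all registered buffers and calls drain() on each, ideally pinned to the same NUMA node as the producers.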




Francesco Nigro

Oct 16, 2018, 5:17:54 AM
to mechanical-sympathy
Or you could use something similar to this: if you are slow to collect, you lose samples (but you know how many), and you won't slow down the producer much: https://github.com/real-logic/agrona/pull/119
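
(Illustration only, not the Agrona code: the "lossy but counted" idea can be as small as a single slot plus a drop counter; every name here is hypothetical.)

    import java.util.concurrent.atomic.AtomicLong;
    import java.util.concurrent.atomic.AtomicReference;

    // Hypothetical single-slot lossy handoff: the producer never waits, and
    // the consumer learns how many samples were overwritten while it was away.
    final class LossySampleSlot<T> {
        private final AtomicReference<T> slot = new AtomicReference<>();
        private final AtomicLong dropped = new AtomicLong();

        void publish(T sample) {
            if (slot.getAndSet(sample) != null) {
                dropped.incrementAndGet();   // previous sample was never consumed
            }
        }

        T poll() {
            return slot.getAndSet(null);     // null if nothing new since last poll
        }

        long droppedCount() {
            return dropped.get();
        }
    }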

Gil Tene

Oct 17, 2018, 8:34:09 AM
to mechanica...@googlegroups.com
See "WriterReaderPhaser: A story about a new (?) synchronization primitive". WriterReaderPhaser (WRP) was designed for the common pattern of speed or latency critical writers writing and accumulating data into a data structure (e.g. a stats collecting thing, like a histogram), and less-critical background threads occasionally or periodically gathering up that accumulated data in a lossless manner. It provides for wait-free writers and potentially blocking readers. It's a common pattern used and needed when e.g. logging of metrics or other stats is performed periodically, and I've used the pattern often enough that I decided it needs a new synchronization primitive.

For a full code example of how to use this synchronization primitive in e.g. a classic double-buffered approach around a simple data structure, see how an implementation of WriterReaderPhaser is used in HdrHistogram's SingleWriterRecorder and Recorder classes, both of which record data into an interval histogram and provide for lossless reading of that interval histogram. Those are classic examples of stats written by latency-critical writers in a wait-free manner and collected by a non-latency-critical background thread, and WriterReaderPhaser can be similarly used to coordinate this sort of work around any stats-gathering data structure. WRP's current implementation has the writer use an atomic increment (which translates to a LOCK XADD on x86) to enter and leave the critical section. Single-writer cases need no further synchronization, and multi-writer cases only need to coordinate on writing to the common data structure (e.g. an AtomicHistogram or ConcurrentHistogram in Recorder, depending on whether or not auto-resizing of the histogram is needed).
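
A minimal usage sketch of that double-buffered Recorder pattern (the method names are HdrHistogram's public API; the wiring around them is illustrative):

    import org.HdrHistogram.Histogram;
    import org.HdrHistogram.Recorder;

    public class IntervalLogging {
        // 3 significant value digits; writers call recordValue() wait-free.
        static final Recorder recorder = new Recorder(3);

        // Latency-critical writer path:
        static void onEvent(long latencyNanos) {
            recorder.recordValue(latencyNanos);
        }

        // Background thread, e.g. once per second. Recycling the previous
        // interval histogram gives the classic double-buffered behavior.
        static Histogram interval = null;

        static void logInterval() {
            interval = recorder.getIntervalHistogram(interval);
            System.out.println("p99 = " + interval.getValueAtPercentile(99.0));
        }
    }

getIntervalHistogram() atomically swaps a fresh histogram in for the writers, so each returned interval is a lossless snapshot of what was recorded since the previous call.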

A Java implementation of WRP currently lives in HdrHistogram (which is available on Maven Central). It is too small for its own package and probably needs a better home; if interest in it grows, Martin can probably find it a home in Agrona. Other implementations have been built in other language versions of HdrHistogram (e.g. C, .NET).

Carl Mastrangelo

Oct 17, 2018, 7:47:05 PM
to mechanical-sympathy
Without forking this thread too much: I read your post a few months back (the link you posted is dead, though!). One of the questions I had is that WRP still synchronizes all of the writers in writerCriticalSectionEnter, since they all have to contend on the long. My question is whether it's possible to shard that counter somehow, since presumably the number of writers is high and they enter the writer critical section significantly more often than the reader does? (And if it is possible, was it unnecessary for WRP?)


Gil Tene

Oct 17, 2018, 10:40:49 PM
to mechanica...@googlegroups.com


On Wednesday, October 17, 2018 at 7:47:05 PM UTC-4, Carl Mastrangelo wrote:
Without forking this thread too much, I read your post a few months back (the link you posted is dead though!). 

I fixed the link in the post. Thx.
 
One of the questions I had is that WRP still synchronizes all of the writers in writerCriticalSectionEnter, since they all will have to contend on the long.

While this technically is synchronization, it is wait-free synchronization. The writers remain wait-free (on hardware architectures that support atomic increments, such as x86) while contending for write access to the cache line containing the long epoch counter. Non-blocking (including non-spinning) forward progress is guaranteed, and there are no retry loops or other wait situations involved.
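
A stripped-down sketch of that writer path (not the actual HdrHistogram code): one atomic increment in, one atomic increment out, with the sign of the entry value selecting the exit counter:

    import java.util.concurrent.atomic.AtomicLong;

    // Simplified sketch of WRP's wait-free writer side. Each enter/exit is a
    // single atomic increment (LOCK XADD on x86): no CAS retry loops, so every
    // writer completes in a bounded number of steps regardless of contention.
    final class WriterPathSketch {
        final AtomicLong startEpoch = new AtomicLong(0);
        final AtomicLong evenEndEpoch = new AtomicLong(0);
        final AtomicLong oddEndEpoch = new AtomicLong(Long.MIN_VALUE);

        long writerCriticalSectionEnter() {
            // The sign of the returned value tells the writer which phase it
            // entered in; the reader flips the sign when flipping phases.
            return startEpoch.getAndIncrement();
        }

        void writerCriticalSectionExit(long enterValue) {
            (enterValue < 0 ? oddEndEpoch : evenEndEpoch).getAndIncrement();
        }
    }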

My question is if it's possible to shard that counter somehow, since presumably the number of writers is high, and they enter the writer critical section significantly more than the reader does?  (and if it is possible, was this unnecessary for WRP?)

Yes, WRP implementations that use wider forms of counters (e.g. using a LongAdder instead of volatile longs and atomic field updaters) are certainly possible, and such a widening of the counters would not change the algorithm or the WRP semantics or guarantees in any way. I found that in most use cases the number of writers is low enough and the rate of writing low enough (even when it is in the millions per second) that reducing contention on the counters is not worth the extra indirection cost of LongAdder. But much like my choice to use atomic field updaters on volatile longs instead of an AtomicLong, you can implement various equivalent-logic striped adder schemes that are shallower in indirection depth than LongAdder (just uglier to code).
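
For concreteness, a simplified, unpadded sketch of the striping idea (LongAdder does essentially this, plus contention-based re-hashing and cache-line padding; the class here is illustrative):

    import java.util.concurrent.atomic.AtomicLong;

    // Each writer hashes to its own cell, so concurrent increments rarely
    // share a cache line; the reader pays the cost of summing all cells.
    final class StripedCounter {
        private final AtomicLong[] cells;

        StripedCounter(int stripes) {
            cells = new AtomicLong[stripes];
            for (int i = 0; i < stripes; i++) cells[i] = new AtomicLong();
        }

        void increment() {
            // Thread id as a cheap stripe selector; real implementations
            // re-hash a stripe probe on observed contention.
            int idx = (int) (Thread.currentThread().getId() % cells.length);
            cells[idx].getAndIncrement();
        }

        long sum() {   // reader side: a sum over cells, not an atomic snapshot
            long s = 0;
            for (AtomicLong c : cells) s += c.get();
            return s;
        }
    }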

I believe that LeftRight, which is a virtual mirror image of WRP in implementation but with entirely different semantics, guarantees, problem statement, and domain (see the long back-and-forth in the comments section of my WRP blog post), uses striped counters in some implementations for exactly the reason you suggest.