Efficient in-memory buffer of messages?


Vitali Lovich

Aug 20, 2019, 10:53:30 AM
to Cap'n Proto
Hi, I'm trying to use Cap'n Proto because I want an efficient in-memory store of events that adds as little overhead as possible to the act of creating an event. Where I'm at now: I have per-thread buffers into which I placement-allocate a MallocMessageBuilder at the start of a 512-byte buffer, using the remainder as scratch space. I then defer serialization and destruction to a background thread that flushes when there is a listener for the event (if no listeners are registered, the buffer is simply overwritten). This works fine and performs fairly well (~300ns overhead per event on PC, maybe 1000ns on Android).

Is there a way, in this setup, to have the builder live on the stack rather than allocating it inside the ring buffer? The challenges I ran into are that the builder zeroes its memory on destruction and mandates that it be initialized with zeroed memory.

I'm OK if this adds a restriction on the available arena size, which I don't actually have now (since the builder itself is *very* large at 200+ bytes relative to the events). I would still like to be able to add event-agnostic data to the message on the background thread like I do now (e.g. process name, PID, etc.), since it's cheaper to defer filling that in to the background listener-delivery thread.

Kenton Varda

Aug 21, 2019, 1:46:03 PM
to Vitali Lovich, Cap'n Proto
Hi Vitali,

Instead of using MallocMessageBuilder, have you tried writing a custom subclass of MessageBuilder? It should allow you to customize these things more precisely.

Note that *someone* has to zero out memory before Cap'n Proto can start allocating from it (because Cap'n Proto structures all expect to be zero-initialized anyway, so as an optimization we assume memory is zeroed in advance), but with a custom MessageBuilder you have more control over when that happens.

-Kenton


Vitali Lovich

Aug 21, 2019, 2:03:27 PM
to Kenton Varda, Cap'n Proto
I have not. I thought the MessageBuilder constructor requires zeroed memory, so this kind of split construction to "resume" the arena isn't possible with a custom subclass, but maybe I'm wrong?

And yes, I agree that I need to zero that memory, but I do that on a background low-priority thread after I serialize (the goal is to offload as much as possible and capture/write only the unique data that's irreplaceable at the tracepoint). For example, I even avoid capturing the TID at the tracepoint, because that's implicitly available when I submit that thread's events for serialization.

Vitali Lovich

Aug 21, 2019, 2:05:04 PM
to Kenton Varda, Cap'n Proto
Oh, I'm very wrong. The check is in MallocMessageBuilder, not the MessageBuilder base class. I'll try a custom builder.

Kenton Varda

Aug 21, 2019, 2:20:52 PM
to Vitali Lovich, Cap'n Proto
Hmm, is it actually a performance win to offload memory-zeroing to another thread? I would think moving the cache lines between the two cores would cost more than the zeroing itself.

-Kenton

Vitali Lovich

Aug 22, 2019, 1:25:37 AM
to Kenton Varda, Cap'n Proto
So the perf win (in theory) is about offloading the serialization/persistence step to a background thread. Since that's already happening, it makes sense to zero the buffer out there before returning it to the original thread to be reused. And I realize it may be possible to not actually need a serialization step at all, since in theory it should be "free", but it looks like there are no built-in APIs that provide this?

The general theory of the system is to have two TLS buffers so that the "fast path" for storing an event doesn't need to acquire any locks; it just saves some data into RAM. When a TLS buffer fills up, we notify the background thread that it should be flushed and then switch to the other TLS buffer (which currently requires taking a lock but could be made lock-free later).
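As a rough sketch of that double-buffer scheme (all names and sizes here are illustrative, not from the actual code):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Two fixed buffers per thread: the tracing thread writes events into the
// active one; when it fills, the buffers swap under a lock and the full
// one is queued for the background flusher (which also re-zeroes it).
struct ThreadBuffers {
  static constexpr size_t kCapacity = 512;
  std::array<std::byte, kCapacity> bufA{}, bufB{};
  std::byte* active = bufA.data();
  size_t used = 0;
  std::mutex swapLock;  // could be made lock-free later, as noted above

  // Reserve `size` bytes in the active buffer, swapping buffers and
  // queueing the full one for flushing if the reservation would overflow.
  std::byte* reserve(size_t size, std::vector<std::byte*>& flushQueue) {
    if (used + size > kCapacity) {
      std::lock_guard<std::mutex> guard(swapLock);
      flushQueue.push_back(active);  // hand the full buffer to the flusher
      active = (active == bufA.data()) ? bufB.data() : bufA.data();
      used = 0;
    }
    std::byte* p = active + used;
    used += size;
    return p;
  }
};
```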

If you have any tips on better approaches that would keep the tracepoint cost even more minimal than this, I would gladly accept your input.

Kenton Varda

Aug 23, 2019, 2:39:09 PM
to Vitali Lovich, Cap'n Proto
Ah, ok, that makes sense!

-Kenton