--
You received this message because you are subscribed to the Google Groups "grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email to grpc-io+u...@googlegroups.com.
To post to this group, send email to grp...@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/26259f10-a18c-45c1-a247-5356424bd096%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Hi Kostis,
One tool you might find useful is FlameGraph, which can visualize stack samples collected with perf (https://github.com/brendangregg/FlameGraph).
I will describe the in-process transport architecture a bit so you get a better idea of which gRPC overheads are included in your measurements. The architecture centers around the following ideas:
- Avoid serialization, framing, and wire-formatting
  - Transfer metadata and messages as slices/slice buffers, unchanged from how they enter the transport (note that while this avoids serializing from slices to HTTP/2 frames, serialization from protos to byte buffers still takes place)
- Avoid polling or other external notification
  - Each side of a stream directly triggers the scheduling of the other side's operation completion tags
- Maintain the communication and concurrency model of gRPC core
  - No direct invocation of procedures from the opposite side of the stream
  - No direct memory sharing; data is shared only as RPC requests and responses

Some possible performance optimizations for the gRPC in-process transport:
- Optimized implementations of structs for small cases
  - E.g., investigate a more efficient completion queue for a small number of concurrent events
- Replace locks with atomics, or avoid atomics altogether, where possible
For tiny messages over the in-process transport, it should be feasible to get latencies of a few microseconds, but that may not be achievable with moderately sized messages because of the serialization/deserialization cost between proto and ByteBuffer.
Hope this helps!