I'm working on an application with a fan-out architecture: a load
generator communicates with a middle-tier server, which in turn
communicates with leaf nodes.
As a baseline, I would like to measure the throughput and end-to-end
latency of empty gRPC communications. In this specific case (one leaf
node), each request is a ping-ping-pong-pong: the load generator pings
the middle tier, the middle tier pings the leaf, and the two pongs
travel back. I know that the latency of a synchronous gRPC ping-pong
is ~190 microseconds, so I would expect a ping-ping-pong-pong latency
of ~380 microseconds. Beyond this baseline, though, I want to examine
the latency under various load conditions.
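For reference, a minimal version of the synchronous baseline I'm
describing would look something like this (the Ping/Echo service, the
generated header, and the port are placeholders for my actual
empty-message proto):

    // Sketch: average latency of a synchronous empty-message ping-pong.
    #include <chrono>
    #include <iostream>
    #include <grpcpp/grpcpp.h>
    #include "ping.grpc.pb.h"  // placeholder generated header

    int main() {
      auto channel = grpc::CreateChannel("localhost:50051",
                                         grpc::InsecureChannelCredentials());
      auto stub = ping::Ping::NewStub(channel);

      constexpr int kIters = 10000;
      auto start = std::chrono::steady_clock::now();
      for (int i = 0; i < kIters; ++i) {
        ping::Empty request, reply;
        grpc::ClientContext context;  // a fresh context per RPC
        grpc::Status status = stub->Echo(&context, request, &reply);
        if (!status.ok()) return 1;
      }
      auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
          std::chrono::steady_clock::now() - start);
      std::cout << "avg ping-pong latency: "
                << elapsed.count() / static_cast<double>(kIters) << " us\n";
    }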
I see from a closed-loop test that such an "empty-packet"
communication system reaches saturation throughput only when there are
~100K outstanding requests. That seems like a ridiculously high number
of outstanding requests. If anything, I would expect 100K outstanding
requests to lower the achieved throughput, since queuing delays must
dominate at that point. If a ping-ping-pong-pong costs 380
microseconds, and the load generator takes (say) 1 microsecond to
issue each request to the middle-tier server, then by the time the
first response returns, only ~380 requests have been issued. So I
would expect to get close to saturation throughput with ~380
outstanding requests. That does not appear to be the case, though.
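Spelled out, that expectation is just Little's law (N outstanding
requests, throughput X, round-trip latency R); the 10^6 requests/s
figure below is what my numbers imply, not something I have measured:

    N  = X \cdot R                                    % Little's law
    X  = N / R   = 380 / (380\,\mu s) \approx 10^{6}\ \text{requests/s}
    R' = N' / X  = 10^{5} / (10^{6}\,\text{/s}) = 100\,\text{ms}

Conversely, if ~100K outstanding requests really are needed to sustain
a throughput of that order, the implied per-request latency is ~100 ms,
which is why I suspect queuing delay.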
Therefore, I would like to compare the *number of queued tags* at 100K
outstanding requests vs. 380 outstanding requests, to understand why
such a large number of outstanding requests is necessary to reach
saturation throughput.
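To be concrete about what I mean by counting tags: I am not aware of a
public API that exposes the internal depth of a grpc::CompletionQueue,
so the best application-level proxy I can think of is to count
in-flight tags myself, between issuing an async call and draining its
completion. A rough sketch of what I have in mind on the
load-generator side (again, Ping/Echo and the header are placeholders):

    // Sketch: track tags in flight on the client's completion queue.
    #include <atomic>
    #include <memory>
    #include <grpcpp/grpcpp.h>
    #include "ping.grpc.pb.h"  // placeholder generated header

    std::atomic<long> outstanding_tags{0};

    struct Call {
      ping::Empty request, reply;
      grpc::ClientContext context;
      grpc::Status status;
      std::unique_ptr<grpc::ClientAsyncResponseReader<ping::Empty>> rpc;
    };

    void IssueCall(ping::Ping::Stub* stub, grpc::CompletionQueue* cq) {
      auto* call = new Call;
      call->rpc = stub->AsyncEcho(&call->context, call->request, cq);
      call->rpc->Finish(&call->reply, &call->status, call);  // tag == call
      ++outstanding_tags;  // one more tag now in flight
    }

    void DrainLoop(grpc::CompletionQueue* cq) {
      void* tag;
      bool ok;
      while (cq->Next(&tag, &ok)) {  // blocks until some tag completes
        long in_flight = --outstanding_tags;
        // Sample `in_flight` here (e.g., into a histogram) to compare
        // the 100K-outstanding and 380-outstanding runs.
        (void)in_flight;
        delete static_cast<Call*>(tag);
      }
    }

Sampling that counter under both load levels should show whether the
extra outstanding requests are sitting in queues rather than adding
throughput. If there is a better way to observe the completion queue's
actual depth, I would love to hear it.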
Regards,
Akshitha Sriraman
Ph.D. Candidate
Computer Science | University of Michigan, Ann Arbor