I'm trying to reconcile differences I'm seeing between the results of a ping-pong test and an under-load test with mps set to an equivalent level and reply-every=1.
For example, results of a ping-pong test show a half RTT of 10us. This would mean a message is being sent from the client roughly every 20us, or at a rate of 50,000 mps.
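Just to make that arithmetic explicit:

```python
half_rtt_us = 10              # observed half RTT in the ping-pong test
rtt_us = 2 * half_rtt_us      # each send waits for the previous reply, so sends are ~one RTT apart
mps = 1_000_000 / rtt_us      # 1,000,000 us per second / 20 us per message
print(mps)                    # -> 50000.0 messages per second
```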
I would expect that running an under-load test configured with mps=50,000 and reply-every=1 would produce very similar results. Instead, I'm finding that the under-load tests show a 20-25% increase in latency, with half RTT of ~12-12.5us.
Even if I raise reply-every to 100, so that client->server traffic is unchanged but server->client traffic drops to 1% of what it was, the same latency increase is present.
More extreme still, I can lower mps to 10,000 while keeping reply-every=100, so that traffic in both directions is significantly reduced, and I still see the increased latency compared to a ping-pong test.
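To summarize the per-direction traffic in each of the under-load configurations above (reading reply-every=N as the server replying to one in every N messages):

```python
# (mps, reply-every) pairs for the configurations described above
scenarios = [
    (50_000, 1),    # matches the implied ping-pong rate; server replies to every message
    (50_000, 100),  # same client->server rate, server->client traffic cut to 1%
    (10_000, 100),  # traffic in both directions well below the ping-pong case
]
for mps, reply_every in scenarios:
    print(f"client->server: {mps:>6}/s   server->client: {mps // reply_every:>5}/s")
```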
I'm trying to better understand what the application does differently between the two test modes that might account for this.
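For reference, my mental model of the client-side difference is roughly the sketch below; this is an assumption about how the two modes behave, not the tool's actual code. In ping-pong the next send is gated on the previous reply, while under load the sends are paced by a clock and replies are consumed independently (by the rx thread):

```python
import socket
import time

MSG = b"x" * 64  # hypothetical payload; the exact size doesn't matter for the sketch


def ping_pong_client(sock: socket.socket, count: int) -> list[float]:
    """Each send blocks on the previous reply, so the send interval is the RTT itself."""
    half_rtts = []
    for _ in range(count):
        t0 = time.perf_counter()
        sock.send(MSG)
        sock.recv(len(MSG))                       # wait for the echo before sending again
        half_rtts.append((time.perf_counter() - t0) / 2)
    return half_rtts


def under_load_client(sock: socket.socket, count: int, mps: int) -> None:
    """Sends are paced by a clock, independent of replies; replies (one per
    reply-every messages) would be drained by a separate rx thread, with
    latency derived from timestamps carried in the messages."""
    interval = 1.0 / mps
    next_send = time.perf_counter()
    for _ in range(count):
        while time.perf_counter() < next_send:    # busy-wait until the next send slot
            pass
        sock.send(MSG)
        next_send += interval
```

If that's roughly how it works, the part I can't account for is why the paced/timestamped path would measure 2-2.5us more one-way latency than the blocking loop, even at a tenth of the message rate.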
It's also worth mentioning that when running these tests, both client threads (rx and tx) are bound to their own cores via the command-line arguments, as is the server thread.
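For illustration, the pinning layout amounts to something like this (core numbers are placeholders, and in practice the pinning is done through the tool's command-line arguments rather than in code):

```python
import os

# Hypothetical core assignments mirroring the setup: client tx, client rx,
# and the server thread each on their own dedicated core.
TX_CORE, RX_CORE, SERVER_CORE = 2, 3, 4

def pin_calling_thread(core: int) -> None:
    # On Linux the underlying syscall treats pid 0 as the calling thread,
    # so this pins only the thread that executes the call.
    os.sched_setaffinity(0, {core})
```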