Yes, I verified that the fortio client and server in our test setup can handle much higher throughput without the Envoy proxy (about 300K QPS).
Meanwhile, I received a suggestion from Zizon Qiu to configure the listener's "connection_balance_config" to spread work across workers more aggressively.
This suggestion did help: setting "connection_balance_config" to "exact_balance" spreads the load across workers much more evenly.
- Throughput now scales much better, e.g. 70K QPS with 10 Envoy workers instead of 30K QPS, and scaling Envoy to more workers utilizes nearly 100% of the CPUs.
- The difference in load spread between workers is visible by comparing the "/stats" output from the admin interface before and after configuring "connection_balance_config".
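For reference, the listener change looks roughly like this. This is a minimal sketch: the listener address and port match the test setup above, but the listener name and the rest of the config are placeholders for whatever the actual bootstrap contains.

```yaml
static_resources:
  listeners:
  - name: listener_0          # placeholder name
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    # Force exact connection balancing across worker threads.
    # Without this, accepted connections can pile up on a few
    # workers, as the per-worker stats below show.
    connection_balance_config:
      exact_balance: {}
    filter_chains: []          # actual filter chains omitted
```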
Before configuring "connection_balance_config":
listener.0.0.0.0_10000.worker_0.downstream_cx_active: 0
listener.0.0.0.0_10000.worker_0.downstream_cx_total: 0
listener.0.0.0.0_10000.worker_1.downstream_cx_active: 0
listener.0.0.0.0_10000.worker_1.downstream_cx_total: 0
listener.0.0.0.0_10000.worker_2.downstream_cx_active: 1
listener.0.0.0.0_10000.worker_2.downstream_cx_total: 1
listener.0.0.0.0_10000.worker_3.downstream_cx_active: 0
listener.0.0.0.0_10000.worker_3.downstream_cx_total: 0
listener.0.0.0.0_10000.worker_4.downstream_cx_active: 0
listener.0.0.0.0_10000.worker_4.downstream_cx_total: 0
listener.0.0.0.0_10000.worker_5.downstream_cx_active: 28
listener.0.0.0.0_10000.worker_5.downstream_cx_total: 28
listener.0.0.0.0_10000.worker_6.downstream_cx_active: 0
listener.0.0.0.0_10000.worker_6.downstream_cx_total: 0
listener.0.0.0.0_10000.worker_7.downstream_cx_active: 0
listener.0.0.0.0_10000.worker_7.downstream_cx_total: 0
listener.0.0.0.0_10000.worker_8.downstream_cx_active: 112
listener.0.0.0.0_10000.worker_8.downstream_cx_total: 112
listener.0.0.0.0_10000.worker_9.downstream_cx_active: 115
listener.0.0.0.0_10000.worker_9.downstream_cx_total: 115
After configuring "connection_balance_config":
listener.0.0.0.0_10000.worker_0.downstream_cx_active: 0
listener.0.0.0.0_10000.worker_0.downstream_cx_total: 26
listener.0.0.0.0_10000.worker_1.downstream_cx_active: 0
listener.0.0.0.0_10000.worker_1.downstream_cx_total: 26
listener.0.0.0.0_10000.worker_2.downstream_cx_active: 0
listener.0.0.0.0_10000.worker_2.downstream_cx_total: 26
listener.0.0.0.0_10000.worker_3.downstream_cx_active: 0
listener.0.0.0.0_10000.worker_3.downstream_cx_total: 26
listener.0.0.0.0_10000.worker_4.downstream_cx_active: 0
listener.0.0.0.0_10000.worker_4.downstream_cx_total: 25
listener.0.0.0.0_10000.worker_5.downstream_cx_active: 0
listener.0.0.0.0_10000.worker_5.downstream_cx_total: 26
listener.0.0.0.0_10000.worker_6.downstream_cx_active: 0
listener.0.0.0.0_10000.worker_6.downstream_cx_total: 25
listener.0.0.0.0_10000.worker_7.downstream_cx_active: 0
listener.0.0.0.0_10000.worker_7.downstream_cx_total: 25
listener.0.0.0.0_10000.worker_8.downstream_cx_active: 0
listener.0.0.0.0_10000.worker_8.downstream_cx_total: 26
listener.0.0.0.0_10000.worker_9.downstream_cx_active: 0
listener.0.0.0.0_10000.worker_9.downstream_cx_total: 25
Thanks a lot for the help and for the suggestion to use "wrk" for load testing.
Maxim.