Scaling Envoy's throughput


Maksim Vaisbrot

May 12, 2021, 6:31:42 AM
to envoy-users
Hi,
We are trying to scale Envoy's throughput (number of queries per second).
We expect throughput to increase as we give the Envoy proxy more worker threads (via the "--concurrency" flag), until a CPU or network bottleneck is reached.

The test setup is as follows:
- Two nodes (24 CPU cores each) connected with 10GE interfaces.
- The workload is generated by a fortio client on one node, with a fortio server on the second node.
- The Envoy proxy runs on the server-side node (e.g. "sudo envoy --concurrency X -c envoy-config.yaml"). The Envoy configuration is attached.
- The client generates load with the following command: "fortio load -jitter=True -c 256 -qps 0 -t 120s -a -r 0.0001 -httpbufferkb=128 http://10.10.1.12:10000/echo?size=1024"

However, we observe that throughput only increases while scaling up to 3 Envoy workers; scaling beyond that does not increase it.
Throughput stays the same (about 30K requests per second) whether Envoy runs with 3, 10, or 24 workers.
During this benchmark neither the CPU bottleneck (about 35% utilization across all cores) nor the network bottleneck (less than 10%) is reached.

Is there some configuration that can fix this behavior? How can we reach higher throughput and utilize the node's full CPU/network capacity?

Thanks in advance,
Maxim.

envoy-config.yaml

Radim Vansa

May 12, 2021, 11:09:34 AM
to envoy...@googlegroups.com
Hi,

you seem to be assuming that fortio (both client and server) can handle
arbitrarily high throughput. Do you have a baseline without the Envoy proxy?

In my experience, if all you care about is throughput you can use wrk
[1] (binaries are available in many distributions). As for the replying side,
you can certainly configure another Envoy instance to serve a static
resource, or use HAProxy for that - the incantation would be

http-request return status 200 content-type "text/plain" string "Hello world"
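
For context, a minimal HAProxy frontend built around that directive might look like the sketch below (the frontend name and bind port are illustrative assumptions, chosen to match the test setup, not taken from the thread):

```
frontend static_echo
    bind :10000
    mode http
    # Answer every request directly from HAProxy, no backend needed
    http-request return status 200 content-type "text/plain" string "Hello world"
```

A matching wrk invocation against it could be along the lines of "wrk -t8 -c256 -d120s http://10.10.1.12:10000/" (thread and connection counts are illustrative).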

Cheers

Radim

[1] https://github.com/wg/wrk


--
Radim Vansa <rva...@redhat.com>
Middleware Performance Team

Maksim Vaisbrot

May 12, 2021, 11:36:08 AM
to envoy-users
Hi Radim,

Yes, I verified that the fortio client and server in our test setup can handle much higher throughput without the Envoy proxy (about 300K QPS).

Meanwhile, I received a suggestion from Zizon Qiu to set the listener's "connection_balance_config" so that connections are spread more aggressively across workers.
This suggestion did help: setting "connection_balance_config" to "exact_balance" spreads the load across workers much better.
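
For reference, the relevant listener fragment would look roughly like this (a sketch against Envoy's v3 config API, not copied from the attached envoy-config.yaml; the address/port come from the test setup):

```yaml
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }
    # Hand each accepted connection to the least-loaded worker instead of
    # relying on the kernel's accept distribution.
    connection_balance_config:
      exact_balance: {}
```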
- Throughput now scales much better, e.g. 70K QPS with 10 Envoy workers instead of 30K QPS. Scaling Envoy to more workers drives CPU utilization close to 100%.
- The difference in load spread across workers is visible by comparing "/stats" from the admin interface before and after configuring "connection_balance_config".
  Before configuring "connection_balance_config": 

listener.0.0.0.0_10000.worker_0.downstream_cx_active: 0 
listener.0.0.0.0_10000.worker_0.downstream_cx_total: 0 
listener.0.0.0.0_10000.worker_1.downstream_cx_active: 0 
listener.0.0.0.0_10000.worker_1.downstream_cx_total: 0 
listener.0.0.0.0_10000.worker_2.downstream_cx_active: 1 
listener.0.0.0.0_10000.worker_2.downstream_cx_total: 1 
listener.0.0.0.0_10000.worker_3.downstream_cx_active: 0 
listener.0.0.0.0_10000.worker_3.downstream_cx_total: 0 
listener.0.0.0.0_10000.worker_4.downstream_cx_active: 0 
listener.0.0.0.0_10000.worker_4.downstream_cx_total: 0 
listener.0.0.0.0_10000.worker_5.downstream_cx_active: 28 
listener.0.0.0.0_10000.worker_5.downstream_cx_total: 28 
listener.0.0.0.0_10000.worker_6.downstream_cx_active: 0 
listener.0.0.0.0_10000.worker_6.downstream_cx_total: 0 
listener.0.0.0.0_10000.worker_7.downstream_cx_active: 0 
listener.0.0.0.0_10000.worker_7.downstream_cx_total: 0 
listener.0.0.0.0_10000.worker_8.downstream_cx_active: 112 
listener.0.0.0.0_10000.worker_8.downstream_cx_total: 112 
listener.0.0.0.0_10000.worker_9.downstream_cx_active: 115 
listener.0.0.0.0_10000.worker_9.downstream_cx_total: 115

  After configuring "connection_balance_config":

listener.0.0.0.0_10000.worker_0.downstream_cx_active: 0 
listener.0.0.0.0_10000.worker_0.downstream_cx_total: 26 
listener.0.0.0.0_10000.worker_1.downstream_cx_active: 0 
listener.0.0.0.0_10000.worker_1.downstream_cx_total: 26 
listener.0.0.0.0_10000.worker_2.downstream_cx_active: 0 
listener.0.0.0.0_10000.worker_2.downstream_cx_total: 26 
listener.0.0.0.0_10000.worker_3.downstream_cx_active: 0 
listener.0.0.0.0_10000.worker_3.downstream_cx_total: 26 
listener.0.0.0.0_10000.worker_4.downstream_cx_active: 0 
listener.0.0.0.0_10000.worker_4.downstream_cx_total: 25 
listener.0.0.0.0_10000.worker_5.downstream_cx_active: 0 
listener.0.0.0.0_10000.worker_5.downstream_cx_total: 26 
listener.0.0.0.0_10000.worker_6.downstream_cx_active: 0 
listener.0.0.0.0_10000.worker_6.downstream_cx_total: 25 
listener.0.0.0.0_10000.worker_7.downstream_cx_active: 0 
listener.0.0.0.0_10000.worker_7.downstream_cx_total: 25 
listener.0.0.0.0_10000.worker_8.downstream_cx_active: 0 
listener.0.0.0.0_10000.worker_8.downstream_cx_total: 26 
listener.0.0.0.0_10000.worker_9.downstream_cx_active: 0 
listener.0.0.0.0_10000.worker_9.downstream_cx_total: 25
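
To quantify the spread from dumps like the ones above, a small helper script (hypothetical, not part of the thread; the stat names match the /stats output shown) could be:

```python
import re

def worker_cx_totals(stats_text):
    """Map worker id -> downstream_cx_total, parsed from /stats text."""
    totals = {}
    for m in re.finditer(r"worker_(\d+)\.downstream_cx_total:\s*(\d+)", stats_text):
        totals[int(m.group(1))] = int(m.group(2))
    return totals

def spread_ratio(totals):
    """Max/min connection count across workers; close to 1.0 means an even spread."""
    lo, hi = min(totals.values()), max(totals.values())
    return hi / lo if lo else float("inf")

sample = """\
listener.0.0.0.0_10000.worker_0.downstream_cx_total: 26
listener.0.0.0.0_10000.worker_1.downstream_cx_total: 25
"""
print(worker_cx_totals(sample))  # {0: 26, 1: 25}
print(spread_ratio(worker_cx_totals(sample)))
```

On the "before" dump above this would report an infinite ratio (several workers saw zero connections); on the "after" dump it is close to 1.0.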

Thanks a lot for the help and for the suggestion of "wrk" for load testing.
Best Regards,
Maxim.