Hi folks,
I'm seeing some unexpected performance results w/Envoy's HTTP/2 proxying that I'm not able to figure out.
In summary: I'm seeing higher latencies (mean, median, p95, p99) via HTTP/2 than with HTTP/1. I thought that the persistent connections w/H2 would improve things...
Test Setup:
* Using two `c3.2xlarge`s within the same AZ in AWS. 8 CPUs, 1 Gbps network.
* Running Vegeta at 1,000 HTTP reqs/sec.
* Using nginx as the server. The only thing the nginx server does is a `return 200;` on "/". Very simple and performant.
* HTTP/1 setup: Envoy-to-nginx.
  * Admittedly there is one fewer hop (Envoy) in this configuration. I'm using this as the config because it mirrors how we're using HAProxy today.
* HTTP/2: Envoy-to-Envoy in between Vegeta & nginx, doing HTTP/1.1->HTTP/2 proxying.
* A very recent build of Envoy (c7004069ffd2654b5a7c7f76340f5266b12d8c9f/Clean/RELEASE)
Test Execution:
* Running vegeta with `vegeta attack -duration=60s -rate 1000 -workers 10`
* I observe via `netstat` that the persistent TCP connections are created for HTTP/2. Cool.
* On the ingress box, I observe via `tcpdump` that ingress traffic on `eth0` is indeed HTTP/2, and also that traffic on `lo` is HTTP/1. Looks like everything is wired up correctly.
* I've run this many times over the past 2 days, and the results are repeatable.
JSON configs and output from /stats & /clusters are attached. Also attached is a chart of test results for 1,000 reqs/sec; the charts for 5k & 10k rps have similar shapes, so I'm only attaching the 1k rps chart for now to avoid confusion.
Is my testing methodology wrong, or are my Envoys misconfigured? I've tried tuning various Envoy buffer/circuit breaking/generate_request_id settings, with negligible improvements.
Thanks for any help,
Dan