HTTP/1 vs HTTP/2 performance

dtehr...@twilio.com

Aug 17, 2017, 2:05:16 PM

to envoy-users

Hi folks,

I'm seeing some unexpected performance results with Envoy's HTTP/2 proxying that I can't explain.

In summary: I'm seeing higher latencies (mean, median, p95, and p99) with HTTP/2 than with HTTP/1. I expected HTTP/2's persistent connections to improve things.

Test Setup:

* Using two `c3.2xlarge`s within the same AZ in AWS: 8 vCPUs, 1 Gbps network.

* Using Vegeta as the client-side test-driver: https://github.com/tsenart/vegeta

* Running Vegeta at 1,000 HTTP reqs/sec.

* Using nginx as the server. The only thing the nginx server does is a `return 200;` on "/". Very simple and performant.

* HTTP/1 setup: Envoy-to-nginx.

  * Admittedly, this configuration has one fewer hop (a single Envoy rather than two). I'm using it because it mirrors how we use HAProxy today.

* HTTP/2 setup: Envoy-to-Envoy between Vegeta and nginx, with HTTP/1.1 -> HTTP/2 proxying between the two Envoys.

* A very recent build of Envoy (c7004069ffd2654b5a7c7f76340f5266b12d8c9f/Clean/RELEASE)

Test Execution:

* Running vegeta with `vegeta attack -duration=60s -rate 1000 -workers 10`

* `netstat` shows that persistent TCP connections are established for HTTP/2. Cool.

* On the ingress box, `tcpdump` shows that incoming traffic on `eth0` is indeed HTTP/2 and that traffic on `lo` is HTTP/1. It looks like everything is wired up correctly.

* I've run this many times over the past 2 days, and the results are repeatable.
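Since the two setups are compared on mean, median, p95 and p99, a helper along these lines could summarize each run the same way and report the difference. This is a sketch that assumes the per-request latencies of a run have already been exported as a list of milliseconds (the export step is not shown); the function names are hypothetical, and Python 3.8+ is assumed for `statistics.quantiles` and `statistics.fmean`.

```python
import statistics
from typing import Dict, Sequence


def summarize_latencies(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Mean, median, p95 and p99 of one run's latencies (milliseconds)."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two latency samples")
    # quantiles(n=100) returns the 99 cut points for the 1st..99th percentiles.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples_ms),
        "median": statistics.median(samples_ms),
        "p95": cuts[94],
        "p99": cuts[98],
    }


def compare_runs(http1_ms: Sequence[float], http2_ms: Sequence[float]) -> Dict[str, float]:
    """HTTP/2 minus HTTP/1 for each statistic; positive means HTTP/2 was slower."""
    h1 = summarize_latencies(http1_ms)
    h2 = summarize_latencies(http2_ms)
    return {name: h2[name] - h1[name] for name in h1}
```

Comparing the runs statistic by statistic makes it easy to see whether HTTP/2 is shifted uniformly or only in the tail.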

The JSON configs and the output from /stats and /clusters are attached. Also attached is a chart of the test results at 1,000 reqs/sec; the charts for 5K and 10K rps have similar shapes. I'm only attaching the 1K rps chart for now to avoid confusion.

Is my testing methodology wrong, or are my Envoys misconfigured? I've tried tuning various Envoy buffer, circuit-breaking, and generate_request_id settings, with negligible improvement.

Thanks for any help,

Dan

chart.png

egress-clusters.txt

egress-envoy-config.json

egress-stats.txt

ingress-clusters.txt

ingress-envoy-config.json

ingress-stats.txt

Daniel Hochman

Aug 17, 2017, 2:35:48 PM

to dtehr...@twilio.com, envoy-users

Took a quick look.

In the HTTP/1 test, Envoy is running on the client (egress) side, is that correct? I'm not seeing stats for that portion of the test.

Because each request completes so quickly, there's probably little connection churn with HTTP/1, and not many concurrent requests, even at 1K/s. I wonder if that's contributing to the results of this test.
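The "not many concurrent requests" point can be checked with Little's Law, L = λW: the average number of requests in flight equals the arrival rate times the time each request spends in the system. The sketch below uses the rates mentioned in this thread; the 1 ms round-trip latency is an assumed figure for illustration, not a measurement from these tests, and the function name is hypothetical.

```python
def avg_in_flight(rate_per_sec: float, latency_ms: float) -> float:
    """Little's Law (L = lambda * W): mean number of requests in flight."""
    # Multiply before dividing so integer-valued inputs give exact results.
    return rate_per_sec * latency_ms / 1000.0


# Rates from this thread; the 1 ms latency is an assumption, not a measurement.
for rate in (1_000, 5_000, 10_000):
    print(f"{rate} req/s at 1 ms -> mean in flight = {avg_in_flight(rate, 1.0):g}")
```

Under that assumption, even 10K req/s averages only about 10 concurrent requests, which a handful of HTTP/1.1 connections can carry in parallel. Little's Law gives an average; bursts will push peak concurrency somewhat higher.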

Daniel Hochman

Engineer

--
You received this message because you are subscribed to the Google Groups "envoy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to envoy-users+unsubscribe@googlegroups.com.
To post to this group, send email to envoy...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/envoy-users/94611f60-12b6-42ee-9a18-adbf9db683c6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Matt Klein

Aug 17, 2017, 3:11:08 PM

to Daniel Hochman, Dan Tehranian, envoy-users

Yeah, per Daniel, I think your workload may simply be faster with HTTP/1. You are probably getting little (if any) benefit from header compression, HTTP/2 framing adds overhead, and with the low number of clients, Envoy is likely opening enough concurrent HTTP/1.1 connections that everything happens in parallel anyway.

This is not really an Envoy issue per se; I think it's really an HTTP/1 vs. HTTP/2 issue. (There could be an Envoy-related issue here, but I kind of doubt it.)
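The parallelism point can be made concrete with a deliberately simplified connection-count model (not Envoy's actual pool logic, which for example keeps separate pools per worker thread). HTTP/1.1 without pipelining carries one request per connection at a time, so a pool needs roughly as many connections as there are concurrent requests; HTTP/2 multiplexes requests as streams on one connection, up to the peer's SETTINGS_MAX_CONCURRENT_STREAMS. RFC 7540 recommends that limit be no smaller than 100; the default of 100 below is an assumption, and all function names are hypothetical.

```python
import math
from typing import Iterable


def http1_connections(concurrency: int) -> int:
    """HTTP/1.1 without pipelining: one in-flight request per connection."""
    return max(1, concurrency)


def http2_connections(concurrency: int, max_streams: int = 100) -> int:
    """HTTP/2: requests share a connection as streams, up to max_streams each."""
    return max(1, math.ceil(concurrency / max_streams))


def mesh_connections(per_backend_concurrency: Iterable[int], use_http2: bool) -> int:
    """Total connections one proxy keeps open across all of its backends."""
    per_backend = http2_connections if use_http2 else http1_connections
    return sum(per_backend(c) for c in per_backend_concurrency)
```

For a single backend with a couple of requests in flight, as is likely in this benchmark, both protocols need only one or two connections, so HTTP/2 has little to save. With 50 backends at 20 concurrent requests each, the model gives 1,000 HTTP/1.1 connections versus 50 HTTP/2 connections.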

--

Matt Klein

Software Engineer

mkl...@lyft.com

dtehr...@twilio.com

Aug 17, 2017, 4:08:17 PM

to envoy-users, dhoc...@lyft.com, dtehr...@twilio.com

Hi Daniel & Matt, thanks for the replies.

re: missing stats for HTTP/1 egress - Sorry about that. New file attached.

re: header compression - Yeah, we were talking about that as a possibility as well. I'll try to grab sample headers from our prod environment and re-test with those.

Let me ask this: if one wanted to construct a test that let the benefits of Envoy's HTTP/2 proxying really shine, what should be added? It sounds like more HTTP headers are needed. Maybe a POST payload and a JSON response?
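For re-testing with production-like headers and a POST payload, Vegeta can read requests from a targets file in which header lines follow the request line and a line starting with `@` names a file containing the request body. The sketch below writes such a file; the function name, URL, header names, and payload are hypothetical placeholders, to be replaced with headers sampled from production.

```python
import json
from pathlib import Path
from typing import Dict


def write_vegeta_targets(directory: Path, url: str,
                         headers: Dict[str, str], payload: dict) -> Path:
    """Write a JSON body file plus a Vegeta HTTP-format targets file that POSTs it."""
    body_path = directory / "body.json"
    body_path.write_text(json.dumps(payload))
    lines = [f"POST {url}"]
    # Header lines go directly under the request line.
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # '@' points Vegeta at the file holding the request body.
    lines.append(f"@{body_path.resolve()}")
    targets_path = directory / "targets.txt"
    targets_path.write_text("\n".join(lines) + "\n")
    return targets_path
```

The resulting file would then be used with something like `vegeta attack -targets=targets.txt -rate=1000 -duration=60s | vegeta report`. Note that the upstream would also need to return a realistic response body for the JSON-response part of the test.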

Thanks,

Dan

egress-stats-2.txt

Matt Klein

Aug 17, 2017, 6:49:30 PM

to Dan Tehranian, envoy-users, Daniel Hochman

> Let me ask this: If one wanted to construct a test that allowed the benefits of Envoy's HTTP/2 proxy to truly shine, what should be added? Sounds like additional HTTP headers are needed. Maybe a POST payload and some JSON response?

For internal datacenter use cases, HTTP/2 will be better when you have a large mesh of many backends and high request volume. Instead of keeping multiple connections alive to each backend, you only need a single connection (with HTTP/1.1 you have to balance the memory used by many pooled connections against requests blocking while they wait for a free connection in the pool). The other major benefit is that when there is a server error or timeout, the connection does not have to be killed: the stream is reset, but the connection stays alive. IMO this is the primary benefit. To expose it, you would need to start injecting errors into the system and see how that affects overall latency.
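One way to act on the error-injection suggestion, sketched under assumptions: replace nginx's `return 200;` with a small upstream that stalls every Nth request for longer than the route timeout of the Envoy in front of it (that timeout is not part of the posted configs and is assumed to be shorter than the stall). An HTTP/1.1 hop can only abandon a timed-out request by closing that connection, while an HTTP/2 hop can reset just the affected stream. The function names, port, and parameters are hypothetical. Envoy also has an HTTP fault injection filter that can inject delays and aborts without changing the upstream; its configuration is not shown here.

```python
import itertools
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_stalling_handler(stall_every: int, stall_seconds: float):
    """Handler class that returns 200 but stalls every Nth request."""
    counter = itertools.count(1)
    lock = threading.Lock()

    class StallingHandler(BaseHTTPRequestHandler):
        # HTTP/1.1 plus Content-Length lets the proxy reuse connections.
        protocol_version = "HTTP/1.1"

        def do_GET(self):
            with lock:
                n = next(counter)
            if n % stall_every == 0:
                # Intended to outlast the proxy's (assumed shorter) route timeout.
                time.sleep(stall_seconds)
            body = b"ok\n"
            try:
                self.send_response(200)
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
            except (BrokenPipeError, ConnectionResetError):
                # The proxy already gave up on this request and closed the socket.
                self.close_connection = True

        def log_message(self, format, *args):
            pass  # suppress per-request logging during load tests

    return StallingHandler


def serve_in_background(port: int, stall_every: int = 100, stall_seconds: float = 2.0):
    """Start the test upstream on 127.0.0.1:port; port=0 picks a free port."""
    handler = make_stalling_handler(stall_every, stall_seconds)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Python's built-in server will not match nginx's throughput, so both protocols should be measured against the same upstream, and only the difference between them read from the results.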

I suspect there are some workloads where header compression will help within the datacenter, but it is a trade-off between smaller payloads and compression/decompression time. I'm not an expert on this. There are people on this list who probably have a better idea of when and where HTTP/2 will be faster for a single request/response within a DC.
