[nodejs] Throughput performance of node as an HTTP proxy


Peter Griess

May 11, 2010, 11:44:18 AM
to nodejs
Hi,

I've written a simple node.js app to test throughput as an HTTP proxy when requests to the origin server have various latencies. I'm a bit surprised to find that increasing origin server latency has such a pronounced impact on proxy throughput (see attached graph).

Does anyone have any insight as to why this might be the case?

Briefly, at 0ms origin latency I can sustain 13700 reqs/s, at 500ms 9500 reqs/s, at 1000ms 5800 reqs/s, at 1500ms 4010 reqs/s, and at 2000ms 3010 reqs/s. If node.js were perfectly efficient, I'd have expected throughput to remain the same as origin latency increased.

I've got sar(1) and netstat(1) output in case anyone is interested. At 0ms latency, we saturate all 8 CPUs on the proxy, with about 55% idle CPU on the origin. As latency increases, we do an increasingly poor job of saturating the CPU (13% idle at 500ms, 45% idle at 1000ms, etc).

My testing setup is as follows: all hosts are on the same GigE subnet, with 6 dedicated client hosts, 1 dedicated proxy host, and 1 dedicated origin host. Each host is running Linux kernel 2.6.9-78EL x86_64 (an old rhel-4.x distribution which I'm stuck with, I'm afraid) and node's embedded libev is selecting the EVBACKEND_EPOLL backend. The test uses httperf to generate load against the proxy at increasing connection rates (e.g. 10000 conn/s, 11000 conn/s, etc), with one request per connection until increasing load does not increase throughput. This value is taken to be the maximum throughput for the latency value being tested. Both proxy and origin are running the experimental preforking implementation that I sent out a patch for a week or so back.
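For reference, each client's httperf invocation looks roughly like this (host name, port, and counts here are illustrative, not the exact values used):

    # one request per connection, swept across increasing connection rates
    httperf --hog --server proxy-host --port 8000 --uri / \
            --rate 10000 --num-conns 600000 --num-calls 1 --timeout 5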

Source for both node.js apps is attached, with proxy.js being the proxy, and echo.js being the origin server.
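For anyone who doesn't want to open the attachments, the two apps boil down to roughly the following (a sketch against today's node API rather than the attached source verbatim; host names and ports are illustrative):

    // echo.js (sketch) -- origin server: wait LATENCY_MS, then answer
    var http = require('http');

    var LATENCY_MS = parseInt(process.argv[2] || '0', 10); // simulated origin latency

    http.createServer(function (req, res) {
      setTimeout(function () {
        res.writeHead(200, { 'Content-Type': 'text/plain' });
        res.end('hello\n');
      }, LATENCY_MS);
    }).listen(8080);

    // proxy.js (sketch) -- forward each request to the origin, relay the response
    var http = require('http');

    var ORIGIN_HOST = 'origin-host'; // hypothetical origin hostname
    var ORIGIN_PORT = 8080;

    http.createServer(function (req, res) {
      var upstream = http.request({
        host: ORIGIN_HOST,
        port: ORIGIN_PORT,
        method: req.method,
        path: req.url,
        headers: req.headers
      }, function (originRes) {
        res.writeHead(originRes.statusCode, originRes.headers);
        originRes.pipe(res);
      });
      req.pipe(upstream);
    }).listen(8000);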

Peter

Attachments: echo.js, proxy.js, latency.png

Uberbrady

May 11, 2010, 1:00:38 PM
to nodejs
Very interesting. I have used Node for similar stuff in the past.
Didn't bench it anywhere near so thoroughly, though.

What happens if you take out the preforking on both sides?

-B.

r...@tinyclouds.org

May 11, 2010, 2:10:58 PM
to nod...@googlegroups.com
Hi Peter,

Great benchmark and it's nice to see the prefork stuff in action!

On Tue, May 11, 2010 at 8:44 AM, Peter Griess <p...@std.in> wrote:
> Hi,
>
> I've written a simple node.js app to test throughput as an HTTP proxy when
> requests to the origin server have various latencies. I'm a bit surprised to
> find that increasing origin server latency has such a pronounced impact on
> proxy throughput (see attached graph).
>
> Does anyone have any insight as to why this might be the case?
>
> Briefly, at 0ms origin latency I can sustain 13700 reqs/s, at 500ms 9500
> reqs/s, at 1000ms 5800 reqs/s, at 1500ms 4010 reqs/s, and at 2000ms 3010
> reqs/s. If node.js were perfectly efficient, I'd have expected throughput to
> remain the same as origin latency increased.

If I understand it correctly, both the proxy server and the origin are
holding more and more concurrent requests as you increase the latency?
If so, I would expect some sort of performance degradation - whether
what you're experiencing is good or bad is unclear. Do you have a
similar server that could be tested against? (What if you replace the
node proxy with HAProxy?)

> I've got sar(1) and netstat(1) output in case anyone is interested. At 0ms
> latency, we saturate all 8 CPUs on the proxy, with about 55% idle CPU on the
> origin. As latency increases, we do an increasingly poor job of saturating
> the CPU (13% idle at 500ms, 45% idle at 1000ms, etc).
>
> My testing setup is as follows: all hosts are on the same GigE subnet, with
> 6 dedicated client hosts, 1 dedicated proxy host, and 1 dedicated origin
> host.

Wow - that's a nice setup :-)

Peter Griess

May 11, 2010, 2:24:23 PM
to nodejs
Good question, Brady.

I don't think the preforking is having a negative impact, as the only difference between this and running a single stand-alone process is that all processes share the same bound socket and call accept(2). Otherwise, the code is identical. I'd expect a stand-alone process's throughput to be 1/8 of the preforking implementation's (these are 8-way boxes) across the board.
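In sketch form, the pattern is just the following (this uses the cluster module, which gives the same multi-process, one-listening-port shape, rather than the experimental patch itself):

    // Sketch of the prefork pattern: N worker processes serving one listening port.
    var cluster = require('cluster');
    var http = require('http');
    var os = require('os');

    if (cluster.isMaster) {
      for (var i = 0; i < os.cpus().length; i++) {
        cluster.fork(); // one worker per core
      }
    } else {
      http.createServer(function (req, res) {
        res.writeHead(200);
        res.end('handled by pid ' + process.pid + '\n');
      }).listen(8000); // every worker handles connections from the same port
    }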

That said, not seeing that expected behavior would certainly be a smoking gun of sorts ;) If nothing else jumps out, I'll give that a shot and report back.

Peter

Peter Griess

May 11, 2010, 2:43:40 PM
to nodejs
On Tue, May 11, 2010 at 1:10 PM, <r...@tinyclouds.org> wrote:
> If I understand it correctly, both the proxy server and the origin are
> holding more and more concurrent requests as you increase the latency?
> If so, I would expect some sort of performance degradation - whether
> what you're experiencing is good or bad is unclear. Do you have a
> similar server that could be tested against? (What if you replace the
> node proxy with HAProxy?)

Yes, that's right.

At any given point in time, when handling 10,000 reqs/s with 500ms origin latency the proxy will have 5,000 client requests blocked on 5,000 outstanding requests to the origin, so 10,000 file descriptors open. At 1000ms latency and 10,000 reqs/s, these numbers should double to 20,000 file descriptors.

I'm expecting degradation proportional to the scalability of epoll. It allegedly scales O(1) with the number of file descriptors, so I was expecting throughput to stay the same as latency increased. Maybe that's not the reality of epoll, at least not on such an old kernel.

Do you think HAProxy in particular is a good comparable? I was going to use nginx (or maybe YTS, which is the flavor of the decade around here).

Peter

Jan Schütze

May 12, 2010, 11:26:46 AM
to nod...@googlegroups.com
I would be interested in results compared to nginx!
--

http://dracoblue.net

Peter Griess

May 12, 2010, 3:15:56 PM
to nodejs
Nginx is able to handle 17100 reqs/s at 0ms latency and 13900 reqs/s at 1000ms latency. During these tests, the origin is 35% idle.

Caveat: I've never used Nginx before. It's very possible that I'm not tuning this very well. I've attached my nginx.conf in case anyone wants to take a shot at tweaking the configuration to help performance.
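The relevant part is essentially the following (a sketch of the shape, not the attached file verbatim; names and numbers are illustrative):

    worker_processes  16;

    events {
        worker_connections  8192;
    }

    http {
        upstream origin {
            server origin-host:8080;
        }

        server {
            listen 8000;

            location / {
                proxy_pass http://origin;
            }
        }
    }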

Peter

2010/5/12 Jan Schütze <Ja...@dracoblue.de>
Attachment: nginx.conf

Peter Griess

May 12, 2010, 3:53:53 PM
to nodejs
I should mention that with this configuration Nginx is unable to saturate the CPU. I'd be completely unsurprised if we could squeeze more performance out of this setup and end up bottlenecking on the origin.

Peter

r...@tinyclouds.org

May 12, 2010, 3:58:17 PM
to nod...@googlegroups.com
On Wed, May 12, 2010 at 12:15 PM, Peter Griess <p...@std.in> wrote:
> Nginx is able to handle 17100 reqs/s at 0ms latency and 13900 reqs/s at
> 1000ms latency. During these tests, the origin is 35% idle.

Okay - so Node is definitely doing something wrong. Too bad this is
Linux; it'd be nice to do some dtracing. I wonder if doing
setNoDelay() changes it?

> Caveat: I've never used Nginx before. It's very possible that I'm not tuning
> this very well. I've attached my nginx.conf in case anyone wants to take a
> shot at tweaking the configuration to help performance.

Looks okay to me.

> I should mention that with this configuration Nginx is unable to saturate
> the CPU. I'd be completely unsurprised if we could squeeze more performance
> out of this setup and end up bottlenecking on the origin.

You've got 16 workers - using one per core is probably better.

Peter Griess

May 12, 2010, 4:46:20 PM
to nodejs
Man, Nagle's a regular whipping boy around here of late ;) I'll give that a shot and report back.

I went with 2 workers per core as I was worried about pushing the FD count per worker too high. I'll try lowering the worker count and raising the per-worker FD limit to (more than) cover the difference and see what happens.
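i.e. something like this (numbers illustrative):

    worker_processes      8;       # one per core
    worker_rlimit_nofile  65536;   # raise the per-worker FD ceiling

    events {
        worker_connections  32768;
    }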

Peter

Louis Santillan

May 12, 2010, 6:50:00 PM
to nod...@googlegroups.com
Have you benchmarked httpd.asm from linuxassembly.org? It's not as
full-featured as some other HTTP daemons, but it's extremely lightweight
and fast.

-L

Isaac Schlueter

May 12, 2010, 6:55:17 PM
to nod...@googlegroups.com
Got a link to httpd.asm? I can't seem to find it on linuxassembly.org.

Louis Santillan

May 12, 2010, 7:09:22 PM
to nod...@googlegroups.com

Peter Griess

May 13, 2010, 2:46:15 PM
to nodejs
Setting setNoDelay(true) didn't make any difference (I added it at the top of connectionListener() in http.js), either at 0ms or 1000ms latency.

Playing with the Nginx worker count didn't change the results much there either.

Peter