Benchmarking NXWEB against G-WAN

Yaroslav

Dec 13, 2011, 6:16:54 AM
to nx...@googlegroups.com
My tests show no difference for G-WAN between serving a small static file (100.html) and the simplest dynamic content (the benchmark.c script). Static content may even be faster, which suggests that G-WAN caches small files entirely in RAM.

Although caching is a nice-to-have feature, its use in benchmarking makes the comparison unfair. In real life there won't be just a single file on the server. Requests will come for different files, which might not be reasonable or possible to cache.

nxweb does not cache anything.

Here is benchmark.c script for g-wan:

#include "gwan.h"
int main(int argc, char *argv[]) {
  xbuf_t *reply=get_reply(argv);
  xbuf_cat(reply, "<p>Hello, world!</p>");
  return 200;
}

I really like the simplicity. And on-the-fly compilation is a very nice feature as well.

An nxweb module requires a config descriptor at the end of the file:

#include "nxweb/nxweb.h"
static nxweb_result benchmark(nxweb_uri_handler_phase phase, nxweb_request *req) {
  nxweb_response_append(req, "<p>Hello, world!</p>");
  return NXWEB_OK;
}
static const nxweb_uri_handler hello_module_uri_handlers[] = {
  {"/benchmark", benchmark, NXWEB_INPROCESS|NXWEB_HANDLE_GET},
  {0, 0, 0}
};

Marcel Wijnen

Jun 23, 2012, 3:28:35 PM
to nx...@googlegroups.com
Hi Yaroslav,

I've been playing around with the hello.c script and tried to do some performance tests. I'm running on an Intel i5 (four cores). I performed my initial benchmark with "ab" and "weighttp". I've looked at NGINX and NXWEB, and my requests per second are of the order of magnitude of 10 [K#/sec]. I'm running from localhost, so there is no network bottleneck. Are there other limiting factors that I'm missing here?

I don't really get why I'm a factor of 10-20 slower than your benchmark. Can you give me some direction with this? Thanks in advance.

Best,
Marcel

Yaroslav

Jun 23, 2012, 4:25:30 PM
to nx...@googlegroups.com
Hi Marcel,

Are you running Linux on hardware or in a virtual machine? My benchmarks were on hardware. I was running Ubuntu 11.10 x64 for those measurements.

I recommend using httpress for benchmarking. ab typically gives much lower results as it is single-threaded. weighttp results are similar to httpress but vary more from one measurement to another.

Use the /benchmark-inprocess handler in nxweb for max results.

Check CPU load when running tests. The CPU should be the limiting factor for max results. If it is not fully loaded, then there must be some other bottleneck, which is hard to predict without knowing your setup.

Note also that max results cannot be achieved if you are running only one concurrent connection, a single thread, or a small number of requests. Check the httpress options that I used on the benchmarking page.

Yaroslav

Marcel Wijnen

Jun 23, 2012, 5:13:48 PM
to nx...@googlegroups.com
Wow thanks for the quick reply!

I'm on hardware (Ubuntu 11.10) and just tried with /benchmark-inprocess. My results are still not similar to your old benchmark of 73/69 [K#/sec], which you performed with ab.

================================================================================================
Test              NXWEB     G-WAN   libevent   microhttpd   mongoose   nginx
1. hello 100 ka   73 / 69   64      30 / 75    NA           60         37

Minimal handler returning '<p>Hello, world!</p>'; 100 concurrent, keep-alive (ab -c 100 -n 320000 -k)
================================================================================================

My CPUs are all at 100% during the benchmark. I'm using the following commands for the benchmark, with a single thread and with four threads.

weighttp -n 100000 -c 100 -t 1 -k "http://localhost/8055/benchmark-inprocess"
weighttp - a lightweight and simple webserver benchmarking tool

starting benchmark...
spawning thread #1: 100 concurrent requests, 100000 total requests
progress:  10% done
progress:  20% done
progress:  30% done
progress:  40% done
progress:  50% done
progress:  60% done
progress:  70% done
progress:  80% done
progress:  90% done
progress: 100% done

finished in 7 sec, 552 millisec and 548 microsec, 13240 req/s, 4731 kbyte/s
requests: 100000 total, 100000 started, 100000 done, 100000 succeeded, 0 failed, 0 errored
status codes: 100000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 36595255 bytes total, 21495255 bytes http, 15100000 bytes data

weighttp -n 100000 -c 100 -t 4 -k "http://localhost/8055/benchmark-inprocess"
weighttp - a lightweight and simple webserver benchmarking tool

starting benchmark...
spawning thread #1: 25 concurrent requests, 25000 total requests
spawning thread #2: 25 concurrent requests, 25000 total requests
spawning thread #3: 25 concurrent requests, 25000 total requests
spawning thread #4: 25 concurrent requests, 25000 total requests
progress:  10% done
progress:  20% done
progress:  30% done
progress:  40% done
progress:  50% done
progress:  60% done
progress:  70% done
progress:  80% done
progress:  90% done
progress: 100% done

finished in 7 sec, 833 millisec and 841 microsec, 12765 req/s, 4561 kbyte/s
requests: 100000 total, 100000 started, 100000 done, 100000 succeeded, 0 failed, 0 errored
status codes: 100000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 36595220 bytes total, 21495220 bytes http, 15100000 bytes data

I can try httpress but if the main difference is deviation as you point out, I don't expect that this will close the gap.

Am I interpreting things in the wrong way or still missing something?

Yaroslav

Jun 23, 2012, 5:27:09 PM
to nx...@googlegroups.com
Marcel,

I've noticed you have a slash (/) in the URL instead of a colon (:) in front of port number 8055.

If you installed nxweb from source code without modifications, the command should look like this:

weighttp -n 100000 -c 100 -t 4 -k "http://localhost:8055/benchmark-inprocess"

It seems something is responding to your requests on port 80 with a data size of 151 bytes. Do you have some other web server listening on that port?

nxweb's /benchmark-inprocess typically responds with 20 bytes of data. Try opening the URL you are benchmarking in a standard browser. nxweb will say: <p>Hello, world!</p>

Yaroslav

Marcel Wijnen

Jun 23, 2012, 5:53:36 PM
to nx...@googlegroups.com
Yaroslav,

Thanks for your reply. You were right. Now I get 10K for NON-keep alive and 39K for keep alive mode. So that's a lot better!

Does this mean that the last factor of the difference is mainly related to the difference in processors? I'm running on 2.67 [GHz] x 4.

I will look at using your benchmark tool as well, but first it is time for some Saturday evening drinks. Thanks for your help!

Thanks,
Marcel

Yaroslav

Jun 23, 2012, 6:03:18 PM
to nx...@googlegroups.com
On Sun, Jun 24, 2012 at 1:53 AM, Marcel Wijnen <mmfww...@gmail.com> wrote:
Yaroslav,

Thanks for your reply. You were right. Now I get 10K for NON-keep alive and 39K for keep alive mode. So that's a lot better!

Does this mean that the last factor of the difference is mainly related to the difference in processors? I'm running on 2.67 [GHz] x 4.

It could be. I got a new PC this spring (i7-2600 @ 3.4GHz), so now I measure about half a million req/sec.

Marcel Wijnen

Jun 23, 2012, 6:18:28 PM
to nx...@googlegroups.com
I still have some doubts about my setup. If I look at your latest benchmark

Test              NXWEB
1. hello 100 ka   200 / 121
2. hello 100      51 / 42

And my results are ~50K for keep-alive mode and ~10K for non-keep-alive mode.

I've briefly looked up some CPU specifics and I think you probably have a similar CPU. This makes me wonder whether I actually have 4 threads running, since my results are still ~4 times slower than yours.

I can see that all four CPUs are busy in the performance monitor, but I'm not sure that I actually have four threads running for the NXWEB server. Do you have an idea how I can check this?




Yaroslav

Jun 24, 2012, 2:22:44 AM
to nx...@googlegroups.com
nxweb writes its config parameters into the log upon startup. Check it. It should look like this:

NXWEB startup: pid=1234 net_threads=4 ...

Could you please provide more details about your hardware, like CPU model number, memory clock speed, etc.? Is it a laptop or a desktop PC?

Have you checked whether your system is short on memory? Is it using swap or not?

Marcel Wijnen

Jun 24, 2012, 10:21:39 AM
to nx...@googlegroups.com

Hi Yaroslav,

Here is my output from the server:
2012-06-24 14:05:43 [2659:0xb74ebb50]: nxweb binding :8055 for http
2012-06-24 14:05:43 [2659:0xb74ebb50]: NXWEB startup: pid=2659 net_threads=4 pg=4096 short=2 int=4 long=4 size_t=4 evt=32 conn=1092 req=164 td=1064
2012-06-24 14:05:43 [2659:0xb74ebb50]: handler benchmark_inworker [100] registered for url: /benchmark-inworker
2012-06-24 14:05:43 [2659:0xb74ebb50]: handler benchmark [100] registered for url: /benchmark-inprocess
2012-06-24 14:05:43 [2659:0xb74ebb50]: handler test [900] registered for url: /test
2012-06-24 14:05:43 [2659:0xb74ebb50]: handler hello [1000] registered for url: /hello
2012-06-24 14:05:43 [2659:0xb74ebb50]: handler nxweb_8777 [10000] registered for url: /8777
2012-06-24 14:05:43 [2659:0xb74ebb50]: handler java_test [10000] registered for url: /java-test
2012-06-24 14:05:43 [2659:0xb74ebb50]: handler sendfile [900000] registered for url: (null)
2012-06-24 14:05:43 [2659:0xb74ebb50]: handler default [999999999] registered for url: (null)
2012-06-24 14:05:43 [2659:0xb74ebb50]: module cache [0] successfully initialized
2012-06-24 14:05:43 [2659:0xb74ebb50]: using default request dispatcher

From your remark I conclude that indeed four threads are started.

Here is my output from the benchmark:
weighttp -n 1000000 -c 100 -t 4 -k http://localhost:8055/benchmark-inprocess

weighttp - a lightweight and simple webserver benchmarking tool

starting benchmark...
spawning thread #1: 25 concurrent requests, 250000 total requests
spawning thread #2: 25 concurrent requests, 250000 total requests
spawning thread #3: 25 concurrent requests, 250000 total requests
spawning thread #4: 25 concurrent requests, 250000 total requests
progress:  10% done
...
progress: 100% done

finished in 18 sec, 261 millisec and 105 microsec, 54761 req/s, 9091 kbyte/s
requests: 1000000 total, 1000000 started, 1000000 done, 1000000 succeeded, 0 failed, 0 errored
status codes: 1000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 170000000 bytes total, 150000000 bytes http, 20000000 bytes data

I was wondering whether the status codes say something about the processes. Because there are four of them (2xx, 3xx, 4xx, 5xx), I figured they might be related to the number of processes, and all responses seem to fall under 2xx.

From my performance monitor I can see that all four CPUs are at 80% during the entire benchmark; it is a flat line for all four. My swap (2 GiB) is at zero percent and my memory usage is about 700 MB, which is approximately 17% of 3.9 GiB. Swap and memory usage don't seem to change during the benchmark. Should the swap memory be used?

My processor data (there are four of these entries):
processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 37
model name      : Intel(R) Core(TM) i5 CPU       M 580  @ 2.67GHz
stepping        : 5
cpu MHz         : 1199.000
cache size      : 3072 KB
physical id     : 0
siblings        : 4
core id         : 2
cpu cores       : 2
apicid          : 4
initial apicid  : 4
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt aes lahf_lm ida arat dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 5321.05
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

You state that your current performance is around 500 [K#/s], and mine is about 50 [K#/s]. My naive assumption is that your benchmark should be approx. 30% faster based on the difference in CPU clock speed. This means that there must be another reason for the difference.

I hope these data give more insight into the differences. Could you help me a bit further in resolving this? Your help is much appreciated!

Thanks,
Marcel

Yaroslav

Jun 24, 2012, 10:54:44 AM
to nx...@googlegroups.com
Hi Marcel,

The i5-580M is a mobile processor, i.e. you are using a laptop, right?

Mobile processors and laptops are generally much slower than desktops. CPU frequency is not the only measure of processor speed; there are lots of other factors. Generally, mobile solutions concentrate on power saving instead of performance. So don't expect them to be anywhere close to desktops when it comes to benchmarks.

Status codes are HTTP status codes. 200 means "OK". All 2xx is the way it should be.

Zero swap usage is good. Performance drops significantly when the system starts using swap.

80% CPU might mean nothing. Have you ever observed 100% CPU load on your PC, while encoding video or doing some other computationally intensive task? Some CPUs just can't go to 100%. I used to have one of that kind. There are many factors in a PC that might limit performance. It could be related to the motherboard, chipset, memory throughput, power saving mode, etc.

There could also be some Linux kernel parameters that affect performance. Try googling Linux kernel setup for network benchmarking. There are lots of tweaks, although in my experience they do not help very much. But it might be worth trying.

Yaroslav

Marcel Wijnen

Jun 24, 2012, 11:30:44 AM
to nx...@googlegroups.com
Hi Yaroslav,

Thanks for your help! I guess this is good news. My main concern was about the possible frequency and the correct setup of your server on my machine. I won't spend time optimizing the benchmark on my computer, since it is well known that desktops are in general much quicker. Thanks for confirming that all the performance parameters are behaving as expected.

Thanks again for your support, next thing for me will be trying to set it all up with ssl. This will probably take a while since I have never done this and need to go through all the basic stuff. Do you have an estimate of how much of the frequency will be lost when including ssl?

Somewhat unrelated. I saw that the name of your repository is Yarosla and your name is Yaroslav. I've heard once that in Russian names are written differently in different contexts. Is it correct when I say "Hi Yaroslav"?

For now, have a nice evening.

Best,
Marcel

Yaroslav

Jun 24, 2012, 12:34:59 PM
to nx...@googlegroups.com
On Sun, Jun 24, 2012 at 7:30 PM, Marcel Wijnen <mmfww...@gmail.com> wrote:
Hi Yaroslav,

Thanks for your help! I guess this is good news. My main concern was about the possible frequency and the correct setup of your server on my machine. I won't spend time optimizing the benchmark on my computer, since it is well known that desktops are in general much quicker. Thanks for confirming that all the performance parameters are behaving as expected.

Thanks again for your support, next thing for me will be trying to set it all up with ssl. This will probably take a while since I have never done this and need to go through all the basic stuff. Do you have an estimate of how much of the frequency will be lost when including ssl?

My SSL benchmarks are on the Benchmarks page. SSL performance largely depends on the chosen cipher suite. Choosing the right cipher suite is quite an extensive topic, so I suggest you google it.
 

Somewhat unrelated. I saw that the name of your repository is Yarosla and your name is Yaroslav. I've heard once that in Russian names are written differently in different contexts. Is it correct when I say "Hi Yaroslav"?

This is correct. The name 'yaroslav' was taken, so I dropped the last letter when registering my account.

Marcel Wijnen

Jun 24, 2012, 12:38:32 PM
to nx...@googlegroups.com
Thanks!

Marcel

Jun 26, 2012, 4:21:58 PM
to nx...@googlegroups.com
Yaroslav,

Possibly interesting for you to know. After I upgraded to Ubuntu 12.04 and refreshed all my installed packages my benchmark became a lot faster. So for people with Laptop systems it is still possible to get quite high numbers.

Without keep alive mode I now get :=>  40 [K#/s] requests
And with the keep alive mode :=> 150 [K#/s] requests

So if I now correct only for CPU clock speed, I would expect ~200 [K#/s] requests on my laptop if I were to use an Intel i7. That leaves a gap of your benchmark being approx. 2.5 times faster, which is related to other factors like you mentioned. This gap is at least a lot smaller compared to the initial factor of 10.

Thanks,
Marcel

Yaroslav

Jun 26, 2012, 4:35:04 PM
to nx...@googlegroups.com
This is funny. By the way, on my new PC I also run Ubuntu 12.04 :)

There could be some kernel optimizations in the newer release, or better default config settings.