Hi Rene,

On Sat, Aug 18, 2012 at 7:01 PM, ad...@x-simulator.de
<ad...@x-simulator.de> wrote:
>> You can run multiple zerogw instances/servers (there is a caveat for
>> long polling, but for web sockets no problems)
>
>
> Do you think I could do load balancing with a small proxy in front of zerogw?
>
There are a couple of options:
1. A proxy. Note you need something that supports websockets (e.g.
haproxy, or a dumb tcp load balancer).
2. A zerogw instance per IP, with:
a) DNS load balancing
b) choosing a random host at the client
3. Put several instances of zerogw on a single socket (you can do this
with procboss, or I'll commit a startup script in Python soon)
Note that options (2a) and (3) have problems with the fallback from
websockets to long polling, but I don't think that's relevant for your
application.
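Option (1) can be illustrated with a minimal TCP-splicing proxy in
Python; since it forwards raw TCP, websocket traffic passes through
untouched. The backend addresses below are hypothetical, and haproxy is
the production-grade choice; this is just a sketch of the idea:

```python
import itertools
import socket
import threading

# Hypothetical zerogw backends; the proxy round-robins connections to them.
BACKENDS = [("127.0.0.1", 8001), ("127.0.0.1", 8002)]
_next_backend = itertools.cycle(BACKENDS)

def pipe(src, dst):
    """Copy bytes from src to dst until src closes, then shut dst's write side."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass

def handle(client):
    """Splice one client connection to the next backend, both directions."""
    backend = socket.create_connection(next(_next_backend))
    # One thread per direction; fine for a sketch, not for production.
    threading.Thread(target=pipe, args=(backend, client), daemon=True).start()
    pipe(client, backend)

def serve(listen_addr=("127.0.0.1", 8000)):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(listen_addr)
    srv.listen(128)
    while True:
        client, _ = srv.accept()
        threading.Thread(target=handle, args=(client,), daemon=True).start()
```

Since the proxy never looks inside the stream, the websocket upgrade
handshake reaches zerogw unchanged.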
> wrk is my choice - perfect for multicore systems:
> https://github.com/wg/wrk/
>
> I was testing on a Core i3 @ 3.07 GHz with 10 GB DDR inside VirtualBox
> on Ubuntu. VirtualBox settings: 2 cores and 4 GB for the guest.
> Host is Windows 7.
>
> ./wrk -t1 -c 200 -r100k http://localhost:8000/
>
> Making 100000 requests to http://localhost:8000/
> 1 threads and 200 connections
> Thread Stats Avg Stdev Max +/- Stdev
> Latency 5.74ms 3.02ms 12.65ms 65.22%
> Req/Sec 26.30k 0.88k 27.00k 95.65%
> 100050 requests in 3.59s, 17.46MB read
> Requests/sec: 27879.43
> Transfer/sec: 4.87MB
>
> I repeated the test often with a lot of modified parameters and noticed
> that zerogw does not make much use of the second core. Could that be?
>
> The average rate has never been more than roughly 30,000 req/sec (not
> bad at all), no matter if wrk ran with 1 or 2 threads, so I assume this
> is the maximum rate I can achieve on that system.
>
Sure. Zerogw is single-threaded for most of the work. It uses threads
for the following:
1. Disk IO; it's for disk-bound applications, so it has low CPU usage
2. Zeromq IO, just because zeromq works this way; it's usually not a
bottleneck either
If you really need multiple cores, run multiple instances of zerogw as
outlined above.
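Option (3), sharing one listening socket between several processes, can
be sketched as a pre-fork script (this is a hypothetical illustration of
the mechanism, not the actual startup script; a real one would exec
zerogw in each child instead of accepting in Python):

```python
import os
import socket

def prefork_listen(addr=("127.0.0.1", 8000), workers=4):
    """Bind once in the parent, then fork workers that share the socket.

    Each worker calls accept() on the inherited fd; the kernel spreads
    incoming connections among them.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(addr)
    srv.listen(128)
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            # Child: serve forever on the shared socket.
            while True:
                conn, _ = srv.accept()
                conn.sendall(b"hello from pid %d\n" % os.getpid())
                conn.close()
        pids.append(pid)
    return srv, pids
```

Because all children accept on the same fd, no proxy sits in front and
each core gets its own process.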
> Next, I did the same test with the http server GWAN: http://gwan.com/
>
> I compared with gwan because it does not need any parameter tuning,
> it's incredibly fast, it is small, and it runs with just one click.
>
> One thread:
>
> ./wrk -t1 -c 200 -r100k http://localhost:8080/
>
> Making 100000 requests to http://localhost:8080/
> 1 threads and 200 connections
> Thread Stats Avg Stdev Max +/- Stdev
> Latency 592.00us 619.21us 1.33ms 68.75%
> Req/Sec 41.81k 0.91k 43.00k 68.75%
> 100046 requests in 2.40s, 27.00MB read
> Socket errors: connect 0, read 0, write 0, timeout 100
> Requests/sec: 41606.95
> Transfer/sec: 11.23MB
>
> There are some socket errors (timeouts) due to reaching a file
> descriptor limit (I ran a lot of tests one after another).
>
> I then repeated the test with two threads of wrk:
>
> ./wrk -t2 -c 200 -r100k http://localhost:8080/
>
> The request rate was nearly 70,000 per second then (I can show the
> exact results on Monday on the other workstation). So it's obvious that
> with only 1 thread I reached the limit of wrk, not of gwan.
>
IIRC, gwan is superior at serving static files in the following two ways:
1. It's multithreaded
2. It caches files in memory
BTW, neither is implemented in the much more popular nginx server; I
believe that's because they are either not that good or not that useful
in practice, compared to what simple benchmarks suggest.
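The file-caching idea is easy to sketch: a minimal in-memory cache that
reads each file from disk once and serves every later request from RAM.
This is purely illustrative; a real server would also invalidate on
mtime changes and bound the memory used:

```python
import threading

class FileCache:
    """Tiny in-memory static file cache (no invalidation, unbounded)."""

    def __init__(self):
        self._cache = {}
        self._lock = threading.Lock()

    def get(self, path):
        """Return file contents, hitting the disk only on the first request."""
        with self._lock:
            body = self._cache.get(path)
        if body is None:
            with open(path, "rb") as f:
                body = f.read()
            with self._lock:
                self._cache[path] = body
        return body
```

The trade-off is exactly the one above: benchmarks love it (zero disk IO
after warm-up), but stale content and memory growth make it less
attractive in practice.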
> I tried to tune zerogw, compiling it with cflags like -O2 and -O3 and
> -pthread, because the source references pthread.h, but the waf script
> does not contain the pthread flag. The request rate did not get any
> better. Is it not programmed to be multithreaded? I did not find any
> POSIX thread functions in the source.
>
Pthreads are used to create the disk threads and inside zeromq. Probably
the compiler is smart enough to link against pthreads as a dependency.
I'm not sure how much the optimization flags help zerogw.
> I am not a C/C++ programmer, so it would be great if you could show me
> the right way :) I would not be surprised if you told me that I am
> completely wrong to test only the static request rate and not the
> message rate, but I think the regular request rate must also be as fast
> as possible.
>
It's not exactly the regular request rate for zerogw, nor for nginx and
most other web servers. You can hardcode a response in the config to see
something more raw (see the crossdomain.xml route in the example
config).
Because of the single-threaded nature of zerogw, and the potential
slowness of disk IO, zerogw offloads all disk accesses to IO threads.
This adds overhead in benchmarks. But in practice, when the disk is slow
and a request misses the disk cache in nginx, all the requests on that
process (e.g. 1/4 of requests if you have 4 workers) will wait until the
disk request is satisfied (even ones that don't need the disk). In
zerogw, only disk/static requests will wait. In benchmarks nginx always
hits the cache and has fewer context (thread) switches, so it's faster.
(I'm talking about nginx here because I do not know exactly how gwan
works in this respect.)
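The offloading scheme can be sketched with a thread pool; the names here
are illustrative, not zerogw's internal API. The main (single) thread
never blocks on disk; it hands the read to a worker and gets a callback
when the bytes are ready:

```python
from concurrent.futures import ThreadPoolExecutor

# A small pool of IO threads; a slow disk only delays the requests that
# actually need it, while the main thread keeps processing everything else.
io_pool = ThreadPoolExecutor(max_workers=4)

def read_file_async(path, on_done):
    """Schedule a blocking read off the main thread; call on_done(body)
    with the file contents when it finishes (hypothetical helper)."""
    def work():
        with open(path, "rb") as f:
            return f.read()
    future = io_pool.submit(work)
    future.add_done_callback(lambda fut: on_done(fut.result()))
    return future
```

The pool hand-off is the benchmark overhead mentioned above: two extra
thread switches per static request, in exchange for never stalling the
event loop.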
And yes, raw request rate is not that interesting anyway. Some big
social games have reported about 5k requests per second with 1 million
daily users. So you might need about 10 zerogw instances to serve the
whole of facebook.com (speculating here, but you get the point).
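The back-of-the-envelope arithmetic behind that can be written out. The
~5k req/sec per million daily users is the figure reported above; the
~30k req/sec per instance is the rate measured in this thread; any
daily-user count plugged in is purely an illustrative assumption:

```python
import math

def instances_needed(daily_users, rps_per_million=5000, per_instance_rps=30000):
    """Rough capacity estimate: scale the reported req/sec-per-million-users
    linearly, then divide by one instance's throughput."""
    total_rps = daily_users / 1_000_000 * rps_per_million
    return math.ceil(total_rps / per_instance_rps)
```

For example, assuming a hypothetical 60 million daily users, this gives
300k req/sec total, i.e. 10 instances at 30k req/sec each.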
> For benchmarking I used the chat.py example and had to modify
> zerogw.yaml a bit to make it work (the original zerogw.yaml was not
> working with chat.py):
Will look into that shortly.
--
Paul