Fwd: Re: Ubuntu packages of zerogw


ad...@x-simulator.de

Aug 10, 2012, 7:16:42 AM
to zer...@googlegroups.com

Hi Paul,

> In the meantime I've upgraded ubuntu packages, so they should work.


Great, I will check it out today :)

I have been very busy the last few weeks with my community
x-simulator.de, but now I am continuing with zerogw.

Something about my personal project goal:

I plan to use zerogw for a scalable push server that needs to handle at
least 50,000 concurrent connections on one server, for a mobile phone
app on Android and iOS.

Requirements:
- At least 50,000 concurrent connections per server; the more the
better (200,000 to 300,000), but I know that would lead to a lot of
problems and the complexity rises:
http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-1
- Easily scalable to handle more users later
- Sending 1 byte roughly every 10 minutes is enough (websocket overhead
not counted; the business logic stays completely in the mobile app)
- The server must be able to handle a lot of concurrent and fast
reconnects. Probably 50% of the 50,000 users will want to reconnect at
the same time, within a very short window. After roughly 10 minutes they
will close the connection (job done) and reconnect later, whenever they
decide to.

My preliminary thought was to use zeromq with its Objective-C and Java
bindings (iOS and Android) for high-performance serving, but that fails
on my poor experience with C and Java, and I would also have to write
the app business logic in Java and C.
I want to "finish" the project ASAP, not in months :)

So I hope to reach the goal with zerogw.

Do you know anything about zerogw's performance in a high-traffic
environment, or any issues that would need to be fixed when I try this?

I am also not sure whether I should switch to C for the clustering
backend, or whether Python is enough.

Btw, I would like to share my progress and the finished project
completely, and the credit goes to zerogw and its author :)

Best regards
René





Paul Colomiets

Aug 10, 2012, 8:22:17 PM
to ad...@x-simulator.de, zer...@googlegroups.com
Hi,

On Fri, Aug 10, 2012 at 2:14 PM, ad...@x-simulator.de
<ad...@x-simulator.de> wrote:
> I plan to use zerogw for a scalable pushserver that needs to handle at least
> 50.000 concurrent connections on one server for a mobile phone app for
> Android and iOS
>

That's cool. I haven't tried it for anything except web browsers.

> Requirements:
> - At least 50.000 concurrent connections per server / The more the better
> (200.000 to 300.000), but i know that would lead to a lot of problems and
> the complexity rises:
> http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-1

From zerogw's perspective, the number of connections is not a problem
(be sure to tune max-websockets). The general OS tuning should still be
done, however (ulimit, file descriptors, buffer sizes, port range ...).
Performance is bound by messages per second, not by idle connections.

> - easy Scalable to handle more user later

You can run multiple zerogw instances/servers (there is a caveat for
long polling, but for websockets there are no problems)

> - Send 1byte roughly every 10minutes is enough - websocket overhead not
> counted (business logic stays completely in the mobile app)

Zerogw could push 20000 messages per second on my old Core 2 Duo
laptop. So it's OK for about 1M clients at your message rate.
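A quick back-of-the-envelope check of that claim, using only the numbers from this thread:

```python
# Rough capacity estimate: 1M clients, each sending one message
# every 10 minutes (numbers taken from this thread).
clients = 1000000
period_s = 10 * 60                    # seconds between messages per client
required_rate = clients / period_s    # aggregate messages per second
print(required_rate)                  # ~1667 msg/s
assert required_rate < 20000          # well below the benchmarked 20k msg/s
```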

> - Server must be able to handle a lot of concurrent and fast reconnects.
> Probably 50% of the 50.000 user wants to reconnect the same time within a
> very short time.

Not sure I understand this. You mean 25000 reconnects per second?
Haven't tested yet. Feel free to benchmark, and I'll fix if that
doesn't satisfy you.

> My preliminary thoughts were to use zeromq and object c and java bindings
> (ios and android) for high performance serving but that lacks on my only
> poor experience with c and java, and i had to create the app businesslogic
> also in java an c.
> I want to "finish" the project asap not in month:)
>

Now you'll do it in JS, right?

> So i hope to reach the goal with zerogw
>
> Do you know anything about zerogw related to performance in a high-serving
> envirement or any issues that need to be fixed when i try that?
>

I've benchmarked the message rate (see below), and it's OK. I haven't
done benchmarks for the reconnection rate or for very large numbers of
connections (IIRC 20000 was OK), but there should be no problems.

For 50,000 connections, you need to set max-websockets in zerogw and
raise the ulimit on the number of open files, and probably either listen
on at least 2 IPs or make the ephemeral port range bigger (only about
32,000 ports by default). You may need more IPs if you have a big
reconnection rate (because of TIME_WAIT connections).
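For reference, the OS-level knobs mentioned above could be set roughly like this (a sketch with illustrative values only; file paths and defaults vary by distribution):

```ini
# /etc/security/limits.conf -- raise the open-file limit
# (one websocket = at least one file descriptor)
*    soft    nofile    100000
*    hard    nofile    100000

# /etc/sysctl.conf -- widen the ephemeral port range and allow
# reuse of client sockets lingering in TIME_WAIT
net.ipv4.ip_local_port_range = 10000 65535
net.ipv4.tcp_tw_reuse = 1
```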

> I am also not sure if i should switch to the c language for the clustering
> backend or if python is enough.
>

It depends very much on the task. I get 1000 messages per second
processed (input) per CPU process in Python, on code similar to the
tabbed chat (which includes a redis lookup on each request). The output
message rate is higher, as zeromq sends messages asynchronously in a
separate non-Python (C) thread, and zerogw can offload some of the work
of fanning messages out to multiple users. Note also that Python scales
to multiple CPUs only by spawning multiple processes (that's OK, but
you must design with this in mind)
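The multi-process pattern can be sketched with the standard library alone (illustrative only: a real backend would read its input from a zeromq socket, and the `upper()` call stands in for the actual business logic):

```python
import multiprocessing as mp

def worker(inbox, outbox):
    """Consume messages until a None sentinel arrives."""
    while True:
        msg = inbox.get()
        if msg is None:
            break
        # Stand-in for the real business logic (e.g. a redis lookup
        # plus building a reply for zerogw to fan out).
        outbox.put(msg.upper())

if __name__ == "__main__":
    inbox, outbox = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=worker, args=(inbox, outbox))
               for _ in range(4)]          # one process per CPU core
    for p in workers:
        p.start()
    messages = ["hello", "from", "the", "backend"]
    for msg in messages:
        inbox.put(msg)
    for _ in workers:
        inbox.put(None)                    # one sentinel per worker
    replies = sorted(outbox.get() for _ in messages)
    for p in workers:
        p.join()
    print(replies)  # → ['BACKEND', 'FROM', 'HELLO', 'THE']
```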

> Btw. I like to share my progress and the finished project completely and
> credits go to zerogw and his author:)
>

Nice.

--
Paul

ad...@x-simulator.de

Aug 18, 2012, 12:01:39 PM
to zer...@googlegroups.com

Hi Paul,

I have made progress after hours of studying the source, and I found
some issues after a lot of testing.

> You can run multiple zerogw instances/servers (there is a caveat for
> long polling, but for web sockets no problems)

I think I could do load balancing with a small proxy in front of zerogw?


> Now you'll do it in JS, right?

That's right, and it works very well :)

> Zerogw could run 20000 messages per second on my old core 2 duo
> laptop. So it's OK for about 1M clients for your message rate.

I made some first HTTP benchmarks to see the possible (static) response
rate. Testing the message sending rate comes later.

wrk is my choice - perfect for multicore systems:
https://github.com/wg/wrk/

I was testing on a Core i3 @ 3.07 GHz with 10 GB DDR, inside VirtualBox
on Ubuntu.
VirtualBox settings: 2 cores and 4 GB for the guest.
The host is Windows 7.

./wrk -t1 -c 200 -r100k http://localhost:8000/

Making 100000 requests to http://localhost:8000/
1 threads and 200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 5.74ms 3.02ms 12.65ms 65.22%
Req/Sec 26.30k 0.88k 27.00k 95.65%
100050 requests in 3.59s, 17.46MB read
Requests/sec: 27879.43
Transfer/sec: 4.87MB

I repeated the test often with a lot of modified parameters and
noticed that zerogw does not make much use of the second core.
Can that be?

The average rate was never more than roughly 30,000 req/sec (not bad at
all), no matter whether wrk ran with 1 or 2 threads, so I assume this
is the maximum rate I can achieve on that system.

Next, I ran the same test against the HTTP server GWAN: http://gwan.com/

I compared with gwan because it does not need any parameter tuning,
it's incredibly fast, it is small, and it runs with just one click.

One thread:

./wrk -t1 -c 200 -r100k http://localhost:8080/

Making 100000 requests to http://localhost:8080/
1 threads and 200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 592.00us 619.21us 1.33ms 68.75%
Req/Sec 41.81k 0.91k 43.00k 68.75%
100046 requests in 2.40s, 27.00MB read
Socket errors: connect 0, read 0, write 0, timeout 100
Requests/sec: 41606.95
Transfer/sec: 11.23MB

There are some socket errors (timeouts) due to reaching a file
descriptor limit (I ran a lot of tests one after another).

I then repeated the test with two wrk threads:

./wrk -t2 -c 200 -r100k http://localhost:8080/

the request rate was then nearly 70,000 per second (I can show the
exact results on Monday on the other workstation). So obviously I
reached the limit of wrk with only 1 thread, not the limit of gwan.

I tried to tune zerogw, compiling it with CFLAGS like -O2, -O3, and
-pthread, because the source references pthread.h but the waf script
does not set the pthread flag. The request rate did not get any better.
Is it not programmed multithreaded? I did not find any POSIX thread
functions in the source.

I am not a C/C++ programmer, so it would be great if you could show me
the right way :) I would not be surprised if you told me that I am
completely wrong to test only the static request rate and not the
message rate, but I think the regular request rate must also be as fast
as possible.

For benchmarking I used the chat.py example and had to modify
zerogw.yaml a bit to make it work (the original zerogw.yaml was not
working with chat.py):

############################################
Server:
  zmq-io-threads: 1
  disk-io-threads: 1000
  listen:
  - host: 0.0.0.0
    port: 8000
  control:
    socket:
    - !zmq.Bind ipc:///tmp/zerogw-ctl
  max-requests: 40000
  network-timeout: 1
  error-log:
    level: 0
    warning-timeout: 300
    #filename: /home/adminuser/projekte/zerogw/examples/zerogw_error.log
  mime-types:
    extra:
      yaml: text/x-yaml
      mov: application/x-movable
    no-warnings: yes

Estp:
  socket: !zmq.Pub
  - !zmq.Bind ipc:///tmp/zgwestp.sock
  interval: 5

Routing:
  routing: !Prefix ~
  routing-by: !Uri ~

  map:
    "/":
      static:
        enabled: yes
        root: ./
        restrict-root: no # bad for production
        index-file: websocket.html
        deny-suffixes:
        - .py
        - .yaml
        deny-prefixes:
        - "."
    "/chat*":
      websocket:
        heartbeat-interval: 0
        enabled: no
        forward:
        - !zmq.Bind "tcp://127.0.0.1:7002"
        subscribe:
        - !zmq.Bind "tcp://127.0.0.1:7003"

  children:
  - match:
    - "/js/*"
    - "/ws/*"
    static:
      enabled: yes
      root: ./
      restrict-root: no # bad for production
############################################

Hope you find some time to think about it.

Regards, René




Paul Colomiets

Aug 18, 2012, 7:15:16 PM
to zer...@googlegroups.com
Hi Rene,

On Sat, Aug 18, 2012 at 7:01 PM, ad...@x-simulator.de
<ad...@x-simulator.de> wrote:
>> You can run multiple zerogw instances/servers (there is a caveat for
>> long polling, but for web sockets no problems)
>
>
> Think i could do any load balancing with a small proxy in front of zerogw?
>

There are a couple of options:

1. A proxy. Note that you need something supporting websockets (e.g.
haproxy, or a dumb TCP load balancer).
2. A zerogw instance per IP, with:
   a) DNS load balancing
   b) choosing a random host at the client
3. Several instances of zerogw on a single socket (you can do this
with procboss, or I'll commit a startup script in python soon)

Note that options (2a) and (3) have problems with the fallback from
websockets to long polling, but I don't think that's relevant for your
application.
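Option (2b) is trivial on the client side; a sketch in Python (the host names and the `/chat` path are made up for illustration):

```python
import random

# Hypothetical list of zerogw servers, each bound to its own IP.
PUSH_HOSTS = ["push1.example.com", "push2.example.com", "push3.example.com"]

def pick_push_url(rng=random):
    """Pick one host at random, spreading clients across servers."""
    return "ws://%s/chat" % rng.choice(PUSH_HOSTS)
```

The mobile app would call the same one-liner before each (re)connect, which also spreads the reconnect bursts across servers.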

> wrk is my choice - perfect for multicore systems:
> https://github.com/wg/wrk/
>
> I was testing on a Core I3@3,07 Ghz 10GB DDR within virtualbox on Ubuntu.
> Virtualbox settings: 2Core and 4GB for the guest
> Host is win 7
>
> ./wrk -t1 -c 200 -r100k http://localhost:8000/
>
> Making 100000 requests to http://localhost:8000/
> 1 threads and 200 connections
> Thread Stats Avg Stdev Max +/- Stdev
> Latency 5.74ms 3.02ms 12.65ms 65.22%
> Req/Sec 26.30k 0.88k 27.00k 95.65%
> 100050 requests in 3.59s, 17.46MB read
> Requests/sec: 27879.43
> Transfer/sec: 4.87MB
>
> I repeat the test often with a lot of modified parameters and recognized
> that zerogw does not make excessive use of the second core. Could it be?
>
> The average rate never has been more than round about 30.000 req/sec. (not
> bad at all) No matter if wrk ran with 1 or 2 threads so i assume this is
> the maximum rate i can achive on that system.
>


Sure. Zerogw is single-threaded for most of its work. It uses threads
only for the following:
1. Disk IO; this serves disk-bound applications, so it has low CPU usage
2. Zeromq IO, just because zeromq works this way; it's usually not a
bottleneck either

If you really need multiple cores, run multiple instances of zerogw as
outlined above.


> Next, i did the same test with http server GWAN: http://gwan.com/
>
> I compared with gwan because it does not need any tuning on parameters, it's
> incredible fast, it is small and runs with just one click.
>
> One thread:
>
> ./wrk -t1 -c 200 -r100k http://localhost:8080/
>
> Making 100000 requests to http://localhost:8080/
> 1 threads and 200 connections
> Thread Stats Avg Stdev Max +/- Stdev
> Latency 592.00us 619.21us 1.33ms 68.75%
> Req/Sec 41.81k 0.91k 43.00k 68.75%
> 100046 requests in 2.40s, 27.00MB read
> Socket errors: connect 0, read 0, write 0, timeout 100
> Requests/sec: 41606.95
> Transfer/sec: 11.23MB
>
> There are some socket errors (timeout) due reaching some file descriptors
> limit (ran a lot of tests after another)
>
> i than repeat the test with two threads of wrk
>
> ./wrk -t2 -c 200 -r100k http://localhost:8080/
>
> the request rate per second was nearly 70.000 then (can show the exact
> results at monday on the other workstation). So it's obviously i reached
> with only 1 thread the limit of wrk and not gwan.
>

IIRC, gwan is superior at serving static files in two ways:
1. It's multithreaded
2. It caches files in memory

BTW, neither is implemented in the much more popular nginx server; I
believe that's because they're either not that good or not that useful
in practice, compared to how they look in simple benchmarks.


> I tried to tune zerogw, compiled it with cflags like -o2 and -o3 and
> -pthread, because the source is referencing to pthread.h, but the waf script
> does not contain the pthread flag. The request rate did not get any better.
> Is it not programmed multithreaded? I did not find any posix functions in
> the source.
>

Pthreads are used to create the disk threads and inside zeromq.
Probably the compiler is smart enough to link against pthreads as a
dependency. I'm not sure how useful optimization flags are for zerogw.

> I am not a c/c++ programmer, so it would be great when you show me the right
> way:) I would not surpised when you tell me that i am completely wrong with
> testing only the static request rate and not the message rate, but i think
> the regular request rate must be also as fast as possible.
>

It's not exactly the regular request rate for zerogw, nor for nginx and
most other web servers. You can hardcode a response in the config to
see something more raw (see the crossdomain.xml route in the example
config).

Because of zerogw's single-threaded nature, and the potential slowness
of disk IO, zerogw offloads all disk accesses to IO threads. This adds
overhead in benchmarks. But in practice, when the disk is slow and a
request misses the disk cache, in nginx all the requests on that
process (e.g. 1/4 of the requests if you have 4 workers) will wait
until the disk request is satisfied (even the ones that don't need the
disk). In zerogw only disk/static requests will wait. In benchmarks
nginx always hits the cache and has fewer context (thread) switches, so
it's faster. (I'm talking about nginx here because I do not know
exactly how gwan works in this respect.)

And yes, the raw request rate is not interesting anyway. Some big
social games have reported 5k requests per second with 1 million daily
users. So you might need only about 10 zerogw instances to serve the
whole of facebook.com (speculating here, but you get the point)

> For benchmarking i used the chat.py example and had to modify the
> zerogw.yaml a bit for proper work (the original zerogw.yaml was not working
> with chat.py):

Will look into that shortly

--
Paul

Paul Colomiets

Aug 18, 2012, 8:36:49 PM
to zer...@googlegroups.com
Hi Rene,

On Sun, Aug 19, 2012 at 2:15 AM, Paul Colomiets <pa...@colomiets.name> wrote:
> Hi Rene,
>
> On Sat, Aug 18, 2012 at 7:01 PM, ad...@x-simulator.de
> <ad...@x-simulator.de> wrote:
>>> You can run multiple zerogw instances/servers (there is a caveat for
>>> long polling, but for web sockets no problems)
>>
>>
>> Think i could do any load balancing with a small proxy in front of zerogw?
>>
>
> 3. Put several instances of zerogw on single socket (you can do this
> with procboss, or I'll commit startup script in python soon)
>

Done. There is an example script at:

examples/tabbedchat/single_socket.sh

>> For benchmarking i used the chat.py example and had to modify the
>> zerogw.yaml a bit for proper work (the original zerogw.yaml was not working
>> with chat.py):
>
> Will look into that shortly
>

There are two issues:
1. I used to run the examples from the root of the project, not from
the examples dir
2. Some benchmark tools do not send the port number in the Host header
(IIRC, `ab` has the same problem)

So the example config feels right ATM.

BTW, thanks for doing and publishing the benchmarks

--
Paul