SockJS performance with a lot of users

AD

Oct 16, 2012, 9:32:15 AM
to sockjs
Hello,

We are looking at ways of optimizing our workflow. Currently we have a RabbitMQ consumer that listens for messages, and we have all Conn objects stored in an ETS table. When a message comes in, we scan the ETS table for the matching subscriptions and then do a lists:foreach( Conn:send() ) on each applicable record.

What we are seeing is that with about 100,000 users, it takes about 500-600 ms to send those messages with a payload of about 300 bytes. This isn't bad, but I was wondering if there is a more efficient way of sending to all users.

The problem we are trying to solve is minimizing message delivery latency. If it takes 500 ms to send a message to 100k users, that's a throughput of about 2 msg/sec. If the incoming message rate is, say, 4 msg/sec, there will be about 500 ms of latency to start with, and it will keep increasing linearly as the backlog grows. Obviously we can scale the hardware horizontally to reduce the number of dispatches per node, but I'm curious about any feedback on optimizations here.
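
The backlog arithmetic above can be sketched as a tiny single-dispatcher queue model. This is only an illustration under assumed numbers (one broadcast takes 500 ms, a new message arrives every 250 ms); the module and function names are made up for the example and are not part of sockjs-erlang.

```erlang
-module(backlog_sketch).
-export([latencies/3]).

%% latencies(N, IntervalMs, ServiceMs) -> [LatencyMs]
%% Models one dispatcher that handles broadcasts strictly one at a time.
%% Message K arrives at (K - 1) * IntervalMs and needs ServiceMs to send.
latencies(N, IntervalMs, ServiceMs) ->
    latencies(1, N, IntervalMs, ServiceMs, 0, []).

latencies(K, N, _Interval, _Service, _PrevDone, Acc) when K > N ->
    lists:reverse(Acc);
latencies(K, N, Interval, Service, PrevDone, Acc) ->
    Arrival = (K - 1) * Interval,
    %% A broadcast cannot start before the previous one has finished.
    Done = max(Arrival, PrevDone) + Service,
    latencies(K + 1, N, Interval, Service, Done, [Done - Arrival | Acc]).
```

With these assumptions, backlog_sketch:latencies(4, 250, 500) returns [500,750,1000,1250]: every message waits 250 ms longer than the one before it, so latency grows without bound until the send time drops below the arrival interval.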

Thanks
-AD

script hawk

Oct 16, 2012, 11:15:27 PM
to soc...@googlegroups.com
I don't know if my understanding is correct. Assume you have 10 dispatchers, each having 10,000 users. When you broadcast a message, you end up sending the same message to each dispatcher 10,000 times. Is that what is happening?

Marek Majkowski

Oct 17, 2012, 5:01:36 AM
to straig...@gmail.com, sockjs
Ah, right.

First, the stock version of sockjs-erlang is not really optimized. For
example, a JSON encoder is run for every outgoing connection... mrjoes
proposed (and implemented for sockjs-tornado) an API call that optimizes
that, something like sockjs.send([Conn, ...], msg).

Additionally, I know that Gleber had some ideas on optimizing sockjs-erlang for
large scale, but I don't think I've ever seen the code.

Finally, well, you might just want to shard, right? Have, say, 20k connections
per Erlang VM (or host or whatever). That means, except for
initial message delivery via Rabbit, you should have an average latency
of around 50ms, right?

Cheers,
Marek

Gleb Peregud

Oct 17, 2012, 6:42:58 AM
to maj...@gmail.com, straig...@gmail.com, sockjs
On Wed, Oct 17, 2012 at 11:01 AM, Marek Majkowski <maj...@gmail.com> wrote:
> Additionally, I know that Gleber had some ideas on optimizing sockjs-erlang for
> large scale, but I don't think I've ever seen the code.

I have the code - it's fast and it works in production. It's hard
to get approval from the client for open-sourcing it, but I am still
working on it.

AD

Oct 17, 2012, 10:28:10 AM
to Gleb Peregud, maj...@gmail.com, sockjs
Yes, we can shard, but I would like to know that we can support 100k users easily with low latency. Until then our max user capacity is way lower (as a trade-off for latency).

I re-read the thread with Gleb and understand there are some concerns on the IP on his optimizations. I am happy to help contribute here if I can just get a better steer on the work that needs to be done.

Is the JSON encoding the biggest pain point of the optimizations or are there other areas to focus on ?

-AD

Marek Majkowski

Oct 17, 2012, 10:33:59 AM
to AD, Gleb Peregud, sockjs
On Wed, Oct 17, 2012 at 3:28 PM, AD <straig...@gmail.com> wrote:
> Yes we can shard, but i would like to know we can support 100k users easily
> with low latency. Until then our max user capacity is way lower (to trade
> off for latency)
>
> I re-read the thread with Gleb and understand there are some concerns on the
> IP on his optimizations. I am happy to help contribute here if I can just
> get a better steer on the work that needs to be done.
>
> Is the JSON encoding the biggest pain point of the optimizations or are
> there other areas to focus on ?

In order:

1) Json encoding could be optimized if you're doing broadcasts (ie: send the
same message to more than one destination)
2) SockJS-erlang could be re-architected a bit to reduce number
of erlang messages being exchanged in order to send a single
message out. Also, we could reduce number of erlang processes
(especially for native WS).

Gleb, anything else?

Marek

AD

Oct 17, 2012, 10:46:17 AM
to Marek Majkowski, Gleb Peregud, sockjs
For #1, would that be as simple as implementing a send_multi so that the json encoding happens once?

Trying to backtrack through the code, it looks like send() calls Rpid ! go, which then falls into here:

    go ->
        Req1 = sockjs_http:unhook_tcp_close(Req0),
        reply_loop(Req1, SessionId, ResponseLimit,
                   Fmt, Service)

And all the pain is in the sockjs_util:encode_frame() work?

Gleb Peregud

Oct 17, 2012, 12:54:03 PM
to AD, Marek Majkowski, sockjs
On Wed, Oct 17, 2012 at 4:46 PM, AD <straig...@gmail.com> wrote:
> For #1, would that be as simple as implementing a send_multi so that the
> json encoding happens once?

Yes. Or, as I did, split message preparation/encoding and sending into
two parts. Make sure that prepared message contains just 3-4 binaries
with content necessary for all available transports - raw json string,
quoted json string, double quoted json string. Sending between
processes such tuple with just few binaries is very fast, since
binaries are not copied, but only referenced.
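
A minimal sketch of that split between preparing and sending, under stated assumptions: the record name, its field names, the {send_prepared, _} message shape and the simplified quote/1 helper are all hypothetical and are not taken from sockjs-erlang or from the closed-source code mentioned in this thread. A real implementation would use a full JSON string encoder instead of quote/1.

```erlang
-module(prepared_sketch).
-export([prepare/1, broadcast/2]).

%% Hypothetical container: one pre-encoded binary per transport family.
-record(prepared, {raw, quoted, double_quoted}).

%% Encode once per broadcast. Json must already be a JSON-encoded binary.
prepare(Json) when is_binary(Json) ->
    Quoted = quote(Json),
    #prepared{raw = Json,
              quoted = Quoted,
              double_quoted = quote(Quoted)}.

%% Send the same prepared term to every session process.
%% Each receiver picks the field that matches its transport.
broadcast(#prepared{} = Prepared, Pids) ->
    lists:foreach(fun(Pid) -> Pid ! {send_prepared, Prepared} end, Pids).

%% Simplified stand-in for JSON string quoting: it escapes only backslash
%% and double quote, not control characters.
quote(Bin) ->
    Esc1 = binary:replace(Bin, <<"\\">>, <<"\\\\">>, [global]),
    Esc2 = binary:replace(Esc1, <<"\"">>, <<"\\\"">>, [global]),
    <<"\"", Esc2/binary, "\"">>.
```

Binaries larger than 64 bytes are reference-counted and shared between processes on the same node, so with a payload of a few hundred bytes each send copies only the small tuple around them. Smaller binaries are copied, which is cheap anyway.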

> trying to backtrack through the code, looks like send() calls Rpid ! go
> which then falls into here
>
> go ->
> Req1 = sockjs_http:unhook_tcp_close(Req0),
> reply_loop(Req1, SessionId, ResponseLimit,
> Fmt, Service)
>
> And all the pain is in the sockjs_util:encode_frame() work ?

I've stripped all the code related to "go" and "reply" mechanisms and
simplified websocket transport to just one process, etc - as Majek
said.

Those two optimizations were enough to give a huge boost to the server
in terms of broadcast speed. HTH

AD

Oct 17, 2012, 2:09:12 PM
to Gleb Peregud, Marek Majkowski, sockjs
Meaning you don't spawn a new sockjs_session for every connection?

Gleb Peregud

Oct 17, 2012, 2:13:14 PM
to AD, Marek Majkowski, sockjs
On Wed, Oct 17, 2012 at 8:09 PM, AD <straig...@gmail.com> wrote:
> Meaning you dont spawn a new sockjs_session for every connection ?

I do. But that process "attaches" itself to the sockjs_session process and
passes the socket's ownership to the session. This greatly reduces latency.

For websocket connections, essentially, sockjs_ws_handler duplicates
all of the sockjs_session functions, but in a much simplified manner due to
the simplicity of WS connection handling.

AD

Oct 18, 2012, 4:03:51 PM
to Gleb Peregud, Marek Majkowski, sockjs
Thanks, I'm not familiar with what "attaching" to a sockjs_session process really means.

Gleb Peregud

Oct 18, 2012, 4:07:50 PM
to AD, Marek Majkowski, sockjs
It's my own word, sorry for not explaining. Essentially I am doing it this way:
1) the socket handler process (let's call it A) receives a request
2) if the request is going to long-poll, then:
3) A transfers ownership of the socket to the session process (let's call it B)
4) A issues an infinity-timeout gen_server call with the socket and Req to B
5) A sleeps, blocked in that call
6) B handles the request till the end of the long-poll, including sending the data
7) after that, A resumes and finalizes all the usual cowboy handling stuff
and, usually, dies
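
The handoff in steps 3 to 7 can be sketched roughly as follows. This is an illustration only: it uses a plain gen_tcp socket instead of a Cowboy request so that it stays self-contained, the session just writes one payload and replies, and the module name, function names and message shape are all made up for the example.

```erlang
-module(attach_sketch).
-behaviour(gen_server).
-export([start_session/0, attach_and_wait/3, demo/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Starts the session process (B in the steps above).
start_session() ->
    gen_server:start_link(?MODULE, [], []).

%% Runs in the handler process (A). A must currently own Socket.
attach_and_wait(Session, Socket, Data) ->
    %% Step 3: hand socket ownership over to the session process.
    ok = gen_tcp:controlling_process(Socket, Session),
    %% Steps 4-5: block in a call with no timeout until B is done.
    gen_server:call(Session, {attach, Socket, Data}, infinity).

init([]) ->
    {ok, no_state}.

%% Step 6: B writes to the socket directly, then releases A by replying.
handle_call({attach, Socket, Data}, _From, State) ->
    {reply, gen_tcp:send(Socket, Data), State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Loopback demo: the calling process plays A, a local client reads the data.
demo() ->
    Data = <<"a[\"hello\"]">>,
    {ok, Listen} = gen_tcp:listen(0, [binary, {active, false}]),
    {ok, Port} = inet:port(Listen),
    {ok, Client} = gen_tcp:connect({127, 0, 0, 1}, Port,
                                   [binary, {active, false}]),
    {ok, Server} = gen_tcp:accept(Listen),
    {ok, Session} = start_session(),
    ok = attach_and_wait(Session, Server, Data),
    %% Step 7: A resumes here and would finish its usual cleanup.
    {ok, Received} = gen_tcp:recv(Client, byte_size(Data), 1000),
    Received.
```

The point of the design is that the data goes from the session process straight to the socket, with no extra message hop back through the handler process.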

AD

Oct 25, 2012, 5:25:15 PM
to Gleb Peregud, Marek Majkowski, sockjs
Here is my thought, let me know if this would work. This is just for raw websockets for now, since the other transports seem to be handled in different places.

1) add a new function in sockjs_session called send_multi/2, which would accept 2 params (Data, ListOfConn).
2) the first step of send_multi would call sockjs_util:encode_frame on Data just once
3) loop through ListOfConn and pass the Frame output to the send function in sockjs_session
4) modify sockjs_ws_handler:reply to NOT encode the frame and just return iolist_to_binary(Frame)

Thoughts?

Gleb Peregud

Oct 27, 2012, 3:47:25 AM
to AD, Marek Majkowski, sockjs
Sounds good in general. But there's no need to do iolist_to_binary at the end if you prepare the frame properly before sending. Also, please note that the non-raw-websocket transports need different encodings, hence PreparedFrame will end up holding multiple versions of the encoded data (I've used a record with 3 binary fields).
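
A rough sketch of the send_multi idea discussed above, assuming the raw-websocket case only. The encoder is passed in as a fun because the real encoder's exact signature is not shown in this thread, plain pids stand in for connection handles, and the {prepared_frame, _} message shape is invented for the example.

```erlang
-module(send_multi_sketch).
-export([send_multi/3]).

%% send_multi(Data, Conns, Encode) -> number of connections messaged.
%% Encode stands in for the real frame encoder and may return an iolist.
%% Conns are plain pids standing in for connection handles.
send_multi(Data, Conns, Encode) ->
    %% Run the encoder exactly once, not once per connection.
    Frame = Encode(Data),
    lists:foreach(fun(Conn) -> Conn ! {prepared_frame, Frame} end, Conns),
    length(Conns).
```

The frame can stay an iolist all the way to the socket, since gen_tcp:send/2 accepts iodata; flattening it with iolist_to_binary would only add a copy.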