How to make a high quality chat server?


jason.桂林

May 6, 2012, 7:04:30 AM
to nod...@googlegroups.com
I just joined a hackathon, and our team built a very cool chat web application in 24 hours.

But I know it is only a demo. It uses socket.io and redis, and I think it is a little expensive on sessions. Also, its processes can't communicate with each other, so we can't run it as a cluster.

What should Node.js be used for here? The frontend server? The core internal server?

Somebody said ZMQ is a very fast message queue. Would it help in this case?



--
Best regards,

桂林 (Gui Lin)

guileen@twitter

Arunoda Susiripala

May 6, 2012, 7:13:26 AM
to nod...@googlegroups.com
You can use Socket.IO with multiple processes.
That said, you should take a closer look at your choices; there is a reason for using each solution.

ZMQ is quite fast, but if you need to program against it, Node.js would not be ideal.
FB used Erlang, but Erlang is so hard to learn (for me).

There may be a lot of other factors. List what you want to achieve and the resources you have (skills, money, servers, etc.),
then look for the tools you might need.

Node.js is always a good fit for IO-heavy use cases, so I hope you can get a lot out of it as the backend for your app. You cannot really compare ZMQ with Socket.IO.

You can accept requests from Socket.IO and push them to ZMQ or Redis (as a message queue).
On the other end, listen on the queue and send the messages out using Socket.IO.
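
A minimal sketch of that relay, assuming a single machine. An in-memory queue stands in for ZMQ or Redis, and plain callbacks stand in for Socket.IO sockets. `MessageQueue`, `FrontendProcess` and `drainQueue` are hypothetical names and are not part of any library. In this simple form every process sees every message.

```typescript
// In-memory stand-in for an external queue such as a Redis list or a ZMQ socket.
// Hypothetical class for illustration only; this is not the Redis or ZMQ API.
class MessageQueue<T> {
  private items: T[] = [];
  push(item: T): void {
    this.items.push(item);
  }
  pop(): T | undefined {
    return this.items.shift();
  }
  get length(): number {
    return this.items.length;
  }
}

interface ChatMessage {
  room: string;
  from: string;
  text: string;
}

type SendFn = (msg: ChatMessage) => void;

// One frontend process. The `send` callbacks stand in for Socket.IO sockets.
class FrontendProcess {
  private rooms = new Map<string, Map<string, SendFn>>(); // room -> userId -> send

  constructor(private queue: MessageQueue<ChatMessage>) {}

  join(room: string, userId: string, send: SendFn): void {
    let members = this.rooms.get(room);
    if (!members) {
      members = new Map<string, SendFn>();
      this.rooms.set(room, members);
    }
    members.set(userId, send);
  }

  // Incoming client message: enqueue it instead of delivering it directly.
  onClientMessage(msg: ChatMessage): void {
    this.queue.push(msg);
  }

  // Deliver to the room members connected to this process, except the sender.
  deliverLocal(msg: ChatMessage): void {
    const members = this.rooms.get(msg.room);
    if (!members) return;
    members.forEach((send, userId) => {
      if (userId !== msg.from) send(msg);
    });
  }
}

// The "other end" of the queue: pop each message and hand it to every process.
function drainQueue(queue: MessageQueue<ChatMessage>, procs: FrontendProcess[]): number {
  let count = 0;
  let msg = queue.pop();
  while (msg !== undefined) {
    for (const p of procs) p.deliverLocal(msg);
    count++;
    msg = queue.pop();
  }
  return count;
}

// Usage: alice is connected to process A, bob to process B.
const chatQueue = new MessageQueue<ChatMessage>();
const procA = new FrontendProcess(chatQueue);
const procB = new FrontendProcess(chatQueue);
const bobInbox: string[] = [];
procA.join("lobby", "alice", () => {});
procB.join("lobby", "bob", (m) => bobInbox.push(`${m.from}: ${m.text}`));
procA.onClientMessage({ room: "lobby", from: "alice", text: "hi" });
const drained = drainQueue(chatQueue, [procA, procB]);
```

Because the sender's process only enqueues, the two ends can live in different processes or on different machines once the in-memory queue is swapped for a real one.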



--
Job Board: http://jobs.nodejs.org/
Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google
Groups "nodejs" group.
To post to this group, send email to nod...@googlegroups.com
To unsubscribe from this group, send email to
nodejs+un...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/nodejs?hl=en?hl=en



--
Arunoda Susiripala


Roly Fentanes

May 6, 2012, 11:01:17 AM
to nod...@googlegroups.com

jason.桂林

May 6, 2012, 11:26:00 AM
to nod...@googlegroups.com
Thanks Roly, that's very useful for a single-machine app.

I have a real-world question. If we have millions of online users, how do we compute the required system capacity, and how do we design an architecture to fit that capacity?



2012/5/6 Roly Fentanes <rol...@gmail.com>

Micheil Smith

May 6, 2012, 11:55:25 AM
to nod...@googlegroups.com
If you have millions of users online, I think you'll be facing problems other than just
Socket.io. Some old-ish benchmarks showed socket.io maxing out at around 5-20K
concurrent connections in a single process; other WebSocket servers performed differently.
If you're serious about scaling realtime infrastructure, then you should probably have
a look at the talks from the Keeping It Realtime Conference (http://2011.krtconf.com/), as
well as the Autobahn Test Suite benchmarks.
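
To turn the question "how do we compute system capacity?" into numbers, here is a rough back-of-the-envelope sketch. It uses the single-process figures above as a ceiling. It assumes "millions of online users" means 1M concurrent connections and that hypothetical 4-core machines run one process per core. The function names are illustrative only.

```typescript
// Processes needed if each one tops out at `perProcessCap` concurrent connections.
function processesNeeded(concurrentUsers: number, perProcessCap: number): number {
  return Math.ceil(concurrentUsers / perProcessCap);
}

// Machines needed if each machine runs one process per core.
function machinesNeeded(processes: number, coresPerMachine: number): number {
  return Math.ceil(processes / coresPerMachine);
}

const targetUsers = 1_000_000; // assumption: 1M concurrent users
const procsOptimistic = processesNeeded(targetUsers, 20_000); // 50 processes at 20K each
const procsPessimistic = processesNeeded(targetUsers, 5_000); // 200 processes at 5K each
const machinesOptimistic = machinesNeeded(procsOptimistic, 4); // 13 machines (12.5 rounded up)
const machinesPessimistic = machinesNeeded(procsPessimistic, 4); // 50 machines
```

These are ceilings taken from a benchmark maximum. A real plan would put fewer connections on each process than this so there is room left for bursts.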

Things to be cautious of:

- You'll need a way to do load balancing (traditional load balancers tend to fail
pretty hard with WebSockets or persistent connections).

- I would NOT recommend using redis or any other centralised message bus. It is
by far the easiest way to scale across multiple servers; however, it's also the
easiest way to shoot yourself in the foot if the message bus goes down (process
crash, server network isolation, etc.).

- I would recommend looking into using more servers with lower load versus fewer
servers with higher load; this will let you scale much better during short bursts.
(Experience tells me that your application or service will generally have peaks
and troughs in usage, and these tend to line up with the three main timezone
blocks: US, GMT, and East Asian / Oceanic.)
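
A persistent connection stays on whichever server accepted it. One common approach to load balancing such connections, though not one named in this thread, is to assign users to servers with consistent hashing. With that scheme, adding or removing a server only moves the users that hashed to it. Below is a minimal sketch, assuming users are routed by ID. `HashRing`, `fnv1a` and the server names are hypothetical. In practice this logic would live in whatever routing layer sits in front of the WebSocket servers.

```typescript
// FNV-1a 32-bit hash: simple and non-cryptographic, enough for spreading keys.
function fnv1a(key: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Consistent-hash ring with `replicas` virtual points per server.
class HashRing {
  private points: { hash: number; server: string }[] = [];

  constructor(servers: string[], private replicas: number = 100) {
    servers.forEach((s) => this.add(s));
  }

  add(server: string): void {
    for (let r = 0; r < this.replicas; r++) {
      this.points.push({ hash: fnv1a(`${server}#${r}`), server });
    }
    this.points.sort((x, y) => x.hash - y.hash);
  }

  remove(server: string): void {
    this.points = this.points.filter((p) => p.server !== server);
  }

  // The first point at or after the key's hash owns the key (wrapping around).
  lookup(key: string): string {
    if (this.points.length === 0) throw new Error("no servers in ring");
    const h = fnv1a(key);
    let lo = 0;
    let hi = this.points.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.points[mid].hash < h) lo = mid + 1;
      else hi = mid;
    }
    return this.points[lo === this.points.length ? 0 : lo].server;
  }
}

// Usage: route 1000 users across 4 hypothetical WebSocket servers, then lose one.
const ring = new HashRing(["ws-1", "ws-2", "ws-3", "ws-4"]);
const userIds = Array.from({ length: 1000 }, (_, i) => `user-${i}`);
const ownerBefore = userIds.map((u) => ring.lookup(u));
ring.remove("ws-3");
const ownerAfter = userIds.map((u) => ring.lookup(u));
// Only users that were on ws-3 should have moved.
const movedUsers = userIds.filter((_, i) => ownerBefore[i] !== ownerAfter[i]);
```

The virtual points spread each server around the ring, so the users of a removed server are shared among the survivors rather than all landing on one neighbour.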

Those points aside, getting above 100K concurrent users tends to be incredibly hard;
some of the largest apps I've seen have only just been pushing 250K (we're talking
big service providers with 500K to 2M users; I can't name them for legal reasons).

As for data, you will most likely need both realtime communication between servers
and some sort of key/value store for things like presence information and
authentication tokens. For storing data, I would actually recommend redis; it tends
to scale out really well for master/slave type setups. As for message communication, I'm
beginning to think that pull-based may be better than push-based, so something like
Apache Kafka (not that I've had personal experience with it).
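
Here is a sketch of the presence side. It assumes presence is tracked with heartbeats that expire; in redis this is typically done with a key expiry, e.g. `SETEX`. The in-memory `PresenceStore` below is a hypothetical stand-in, not the redis client API. Time is passed in explicitly so the behaviour is deterministic. Expiry matters because a server that crashes cannot clean up its users' entries, so they have to age out on their own.

```typescript
// Minimal in-memory presence store with per-user expiry (stand-in for redis keys with a TTL).
class PresenceStore {
  private expiresAt = new Map<string, number>(); // userId -> expiry time in ms

  // Record a heartbeat: the user counts as online for ttlMs after nowMs.
  heartbeat(userId: string, nowMs: number, ttlMs: number): void {
    this.expiresAt.set(userId, nowMs + ttlMs);
  }

  // Expired entries are dropped lazily when they are looked at.
  isOnline(userId: string, nowMs: number): boolean {
    const exp = this.expiresAt.get(userId);
    if (exp === undefined) return false;
    if (exp <= nowMs) {
      this.expiresAt.delete(userId);
      return false;
    }
    return true;
  }

  // Sorted list of users whose heartbeat has not yet expired.
  online(nowMs: number): string[] {
    return Array.from(this.expiresAt.keys())
      .filter((u) => this.isOnline(u, nowMs))
      .sort();
  }
}

const presence = new PresenceStore();
presence.heartbeat("alice", 0, 30_000);    // online until t = 30s
presence.heartbeat("bob", 10_000, 30_000); // online until t = 40s
const at20s = presence.online(20_000);     // ["alice", "bob"]
const at35s = presence.online(35_000);     // ["bob"]: alice stopped sending heartbeats
```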

You will most likely also want to define a transport protocol on top of your connection,
depending on your type of application. There aren't many resources on doing this, but
if you want help with that, give me a shout; I've done a lot of research into that area over
the last two years.
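
Below is a minimal sketch of such an application-level protocol, assuming JSON frames that carry a `type` field. The message types, `MAX_TEXT_LENGTH` and the room-name pattern are made up for illustration. The important part is the strict parser: any frame that does not match a declared shape is rejected before it reaches the chat logic. A schema-based format such as Protobuf gives you the declaration half of this, but limits like text length still need checks of your own.

```typescript
// Hypothetical client-to-server frames, declared as a discriminated union.
type ClientFrame =
  | { type: "join"; room: string }
  | { type: "say"; room: string; text: string }
  | { type: "leave"; room: string };

const MAX_TEXT_LENGTH = 2000; // hypothetical limit
const ROOM_PATTERN = /^[a-z0-9_-]{1,64}$/; // hypothetical room-name rule

// Strict parser: returns a typed frame, or null for anything malformed or unknown.
function parseFrame(raw: string): ClientFrame | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) return null;
  const obj = data as Record<string, unknown>;
  const room = obj["room"];
  if (typeof room !== "string" || !ROOM_PATTERN.test(room)) return null;
  switch (obj["type"]) {
    case "join":
      return { type: "join", room };
    case "leave":
      return { type: "leave", room };
    case "say": {
      const text = obj["text"];
      if (typeof text !== "string" || text.length === 0 || text.length > MAX_TEXT_LENGTH) {
        return null;
      }
      return { type: "say", room, text };
    }
    default:
      return null; // unknown message type
  }
}

const good = parseFrame('{"type":"say","room":"lobby","text":"hi"}'); // accepted
const unknownType = parseFrame('{"type":"admin","room":"lobby"}');    // null
const badRoom = parseFrame('{"type":"join","room":"../etc"}');        // null
const notJson = parseFrame("<script>");                                // null
```

The same shape-then-validate approach applies if the frames are encoded with msgpack instead of JSON; only the decoding step changes.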

Alternatively, you could look at third-party services for scaling your realtime architecture.
At present, given the information I have on various services, I would be inclined to
recommend PubNub (http://pubnub.com); they appear to have a very high quality setup.
(Disclaimer: I did work for a competitor in the past, but that does not bias my choice.)
Another option is Pusher (http://pusher.com), or for more, you can look here:
http://www.leggetter.co.uk/real-time-web-technologies-guide

Hopefully this gives you some useful information, or at least things to think about. Scaling realtime
architecture is kind of hard (not impossible, but it can be a pain in the ass).

Regards,
Micheil Smith
--
BrandedCode.com

jason.桂林

May 6, 2012, 10:52:15 PM
to nod...@googlegroups.com, mic...@brandedcode.com
Thanks Micheil, what you said is very professional. Do you have a Twitter or G+ account? I want to follow you, heh.

1. What you said about pull-based rather than push-based looks like a new way of thinking, but I can understand why you said it. I have thought a lot about push-based message broadcast, and it is very complex. Maybe pull-based will be a very simple and also beautiful solution.

2. About the transport protocol you mentioned: I'd like to use msgpack as the protocol, but I need help with it, because msgpack does not compress strings, and I am also afraid there may be some security problems.

3. "I would recommend looking into using more servers with lower load versus fewer servers with higher load;" I'd like to talk more about this.

We have to use more servers to scale, but more servers means more complexity. Unlike other web applications, a realtime service needs communication between servers. Say we have 1M users spread across 1K servers, with 1K users on each server. When 1 user sends a message in a room, the message must be delivered to the other users in that room. In the worst case, the room's members are spread across all 1K servers, so this one message has to be sent to all 1K servers.

If 100 users (10%) on each of the 1K servers send such a worst-case message, each server will receive 100K messages at the same time. That's horrible.

How can we prevent this from happening?
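
The worst case above can be written out as arithmetic. This is a sketch assuming load spreads evenly across servers. `serversPerRoom` is a hypothetical parameter for how many servers a room's members are spread over.

```typescript
// Average number of server-to-server copies each server receives when every
// message must reach `serversPerRoom` servers, assuming load spreads evenly.
function messagesPerServer(servers: number, sendersPerServer: number, serversPerRoom: number): number {
  const totalMessages = servers * sendersPerServer;   // messages sent at the same moment
  const totalCopies = totalMessages * serversPerRoom; // each one fanned out to its room's servers
  return totalCopies / servers;                       // simplifies to sendersPerServer * serversPerRoom
}

// The worst case above: 1K servers, 100 senders (10%) per server, every room on every server.
const worstCase = messagesPerServer(1_000, 100, 1_000); // 100000
// Same traffic, but each room's members sit on (hypothetically) 10 servers.
const tenServerRooms = messagesPerServer(1_000, 100, 10); // 1000
```

Per-server load comes out as senders per server × servers per room. So sending a message only to the servers that host a room's members helps only as long as rooms do not span the whole cluster.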


2012/5/6 Micheil Smith <mic...@brandedcode.com>

Pitt Mak

May 7, 2012, 12:22:12 AM
to nod...@googlegroups.com
I know you from NodeClub in China. Cool topic.

On Sunday, May 6, 2012 at 7:04:30 PM UTC+8, Jason.桂林 (Gui Lin) wrote:

Micheil Smith

May 8, 2012, 8:36:33 PM
to jason.桂林, nod...@googlegroups.com
No worries, I'm on Twitter and GitHub as "miksago".

1. Doing things pull-based is possibly a new way of thinking about realtime communication.
I haven't yet seen it proven, but I think it makes sense: it means that if a server starts getting
overloaded, it can throttle its incoming load instead of killing the rest of the servers in your
cluster (situation: broadcast messages).

2. I don't think msgpack is a protocol (in the sense of the word I meant). Internally,
I would be using a more structured data format, such as Protobuf, which has a fairly strict
declaration and parser for data. Msgpack is more akin to JSON, in that it's just a data format,
not a data protocol; it's the way you use it that makes it a protocol.

The protocols I was talking about were WebSocket sub-protocols, which are pretty specific to your
application or domain.

3. I would be going with a max of 25-75K concurrent users per server in that case, which
would mean 16 to 40 server processes. (Most likely you'd have that 16 segmented as 4
servers * 4 processes, assuming 4 cores.) Essentially, you want to keep the load on any
single server from getting too high. It's better to scale out horizontally a little more
than you need, and then use the headroom up to each server's high watermark as "burst capacity".

That said, I would be surprised if anyone is really reaching close to 500K concurrent users on a
single application (that's a number I'd expect from a realtime service provider).

As for dealing with more servers, that's where something like Apache Kafka comes in;
however, I'm still uncertain about using Kafka. You could also go the route of mesh
networking with ZMQ, where every server talks to every other server. That does work fairly
well, but setting it up and developing it is more complex.

You don't want to be using broadcast messages if possible. That is, if you go the pull-based
route, each server would have a mailbox per channel in your chat system, and servers
would pull only from the servers and mailboxes they are interested in. Likewise, if you go
the route of central brokers (not that I recommend that), you can structure your queues and
their key spaces into segments representing something like "chats:{CHAT ID}", or perhaps even
"{PID / SERVER ID}:chats:{CHAT ID}". This would mean that servers listen to only a subset
of messages and don't get every message in the system.
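
Here is a toy sketch of per-channel mailboxes that servers pull from, assuming the "chats:{CHAT ID}" key scheme above. Each mailbox is an append-only log, and each server keeps its own read offset per mailbox. This is similar in spirit to Kafka consumer offsets but is not Kafka's API. `Mailbox`, `MailboxRegistry`, `ChatServer` and the per-poll `budget` are hypothetical names. A server reads only the mailboxes of chats it follows, and an overloaded server can poll with a smaller budget so that its backlog waits in the mailbox.

```typescript
// Append-only mailbox for one chat channel; reading never removes anything.
class Mailbox {
  private log: string[] = [];
  append(message: string): void {
    this.log.push(message);
  }
  read(offset: number, max: number): string[] {
    return this.log.slice(offset, offset + max);
  }
}

// Hypothetical key scheme from the text: one mailbox per chat.
const chatKey = (chatId: string): string => `chats:${chatId}`;

class MailboxRegistry {
  private boxes = new Map<string, Mailbox>();
  get(key: string): Mailbox {
    let box = this.boxes.get(key);
    if (!box) {
      box = new Mailbox();
      this.boxes.set(key, box);
    }
    return box;
  }
}

// A chat server pulls only from mailboxes it follows, at most `budget` messages per poll.
class ChatServer {
  private offsets = new Map<string, number>(); // mailbox key -> next offset to read
  constructor(private registry: MailboxRegistry, private budget: number) {}

  follow(key: string): void {
    if (!this.offsets.has(key)) this.offsets.set(key, 0);
  }

  poll(): string[] {
    const pulled: string[] = [];
    this.offsets.forEach((offset, key) => {
      const remaining = this.budget - pulled.length;
      if (remaining <= 0) return; // budget used up; the rest waits in the mailbox
      const batch = this.registry.get(key).read(offset, remaining);
      this.offsets.set(key, offset + batch.length);
      pulled.push(...batch);
    });
    return pulled;
  }
}

const registry = new MailboxRegistry();
for (let i = 0; i < 5; i++) registry.get(chatKey("42")).append(`42/msg-${i}`);
registry.get(chatKey("99")).append("99/msg-0");

const server = new ChatServer(registry, 3); // hypothetical budget of 3 messages per poll
server.follow(chatKey("42"));               // this server only hosts members of chat 42
const first = server.poll();  // ["42/msg-0", "42/msg-1", "42/msg-2"]; chat 99 is never read
const second = server.poll(); // ["42/msg-3", "42/msg-4"]
```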

(Hopefully that last part makes sense; I've been a bit crammed for time writing it.)

– Micheil

Christian Etuy

May 9, 2012, 4:59:28 AM
to nod...@googlegroups.com
Perhaps you should take a look at "nowjs" ("NowJS makes realtime web apps really easy by making the server and client act like one program").
If you look at how they built their solution, you might get some ideas:
http://nowjs.com/

chris

codepilot Account

May 9, 2012, 3:18:48 PM
to nod...@googlegroups.com
When you get that big (hundreds of thousands of users), I bet this article will apply:

http://news.cnet.com/8301-1009_3-57428067-83/fbi-we-need-wiretap-ready-web-sites-now/ 

What's harder, 1M users talking to each other, or logging and transmitting those chats to a third party?