Forwarding this question asked on GitHub because I thought it might be useful to others:
---------- Forwarded message ----------
From: manast
Date: 27 May 2011 09:13
Subject: [faye] How does it scale relative amount of channels? (#62)
I just wonder how well does Faye scale relative amount of channels. I suspect the system is optimized for few channels and many users subscribed to them. But what if we have many users and every user is subscribed to its own private channel? Would Faye handle this scenario efficiently?
And my reply...
As always, it depends what you mean by 'efficiently'. I'll describe how Faye stores data so you can get a better idea of what's going on. This is based on the 0.6 release, which has a very different internal model to previously releases and makes certain operations like unsubscription faster.
The Faye engine needs to store several pieces of information:
1. A set of all active client IDs
2. For each channel, a set of client IDs subscribed to it
3. For each client, a set of channels it is subscribe to (i.e. reverse index for 2)
A set in Ruby and JavaScript is just implemented as a hashtable, so this data ends up looking like this (I will use JSON to represent it):
engine = {
"namespace": {
"foo": true,
"bar": true,
"qux": true
},
"channels": {
"/some/channel": {
"foo": true,
"qux": true
},
"/some/other/channel": {
"foo": true
}
},
"clients": {
"foo": {
"/some/channel": true,
"/some/other/channel": true
},
"bar": {
},
"qux": {
"/some/channel": true
}
}
}
The 'namespace' is the set of all active client IDs. A map with values on the left and 'true' on the right is just how we represent a set of the values on the left, since the table makes sure the keys are unique and can be efficiently deleted. 'channels' maps channel names to sets of client IDs, and 'clients' does the reverse mapping from client IDs to sets of channels.
So, subscribing a client to an existing channel, e.g. subscribing "bar" to "/some/channel", means adding "bar":true to the "/some/channel" set and "/some/channel":true to the "bar" set.
Creating a new channel is a little more expensive since we allocate a new object to store the client IDs for the channel. In the case where there is one channel per client, we simply have lots of sets with one member rather than one set with many members. How much this is a problem with depend on the overhead of allocating and object, since we still need to store the client IDs no matter what.
For example having one channel with many users looks like this:
"channels": {
"/some/channel": {
"foo": true,
"qux": true
}
}
Whereas having many channels with one user looks like this:
"channels": {
"/channels/foo": {
"foo": true
},
"/channels/qux": {
"qux": true
}
}
In the 'clients' section, each subscription means adding one member to a client's set, and this would be true no matter what the channels were called. The size of this collection depends only on the number of subscriptions, not on the number of channels.
The important thing about all these structures is that it is ideally O(1) to add and remove any element from them since all operations are just adding and removing from sets. This is true on the Redis engine as well, where the sadd and srem operations are O(1). Prior to 0.6 I used a more complex storage system that I think was O(N) for unsubscribing, but this is now O(1).
As to whether this is 'efficient', all you should really care about is whether it's efficient *enough*. You may find that object allocation in JavaScript and hash allocation in Ruby have different performance and GC characteristics, and you should load test with realistic application data to get a better idea of whether Faye suits your needs.
I hope this has shed some light for you to start to answer your question.