Re: [cometd-users] Rebuilding/Shadowing of CometD server state

Simone Bordet

May 20, 2013, 12:46:54 PM
to cometd-users
Hi,

On Mon, May 20, 2013 at 6:23 PM, DL <dennis...@gmail.com> wrote:
> Hi,
>
> I am wondering if it is possible to shadow the state of a primary CometD
> server to a secondary server. The goal of this is to allow clients'
> long-polling connections to be redirected to the backup without any
> requirement to re-handshake or re-subscribe.
>
> I have not begun to implement a solution but I have identified that it is
> possible to save incoming client data on handshake and subscribe and have
> that data passed to the secondary server. However I have not seen in the
> docs where it may be possible to manipulate the secondary server based on
> that information. For example, an incoming handshake creates a
> ServerSession for the client, and subsequently when the subscription for
> ChannelA occurs the ServerSession is added to the ServerChannel's list of
> subscribers. My goal would be to re-create those operations on the secondary
> server.
>
> So I have a couple of questions:
>
> Am I missing a more obvious (and hopefully easier) way to achieve the goal?
> Is the state re-creation possible or will I miss vital data changes that
> cannot be replicated via the public API?
>
> Basically if someone could tell me if I am completely insane to try this
> before I invest too much time I would appreciate it.

As often happens, you're asking directions on one solution that you
figured out, but you don't state the problem.

Mirroring the BayeuxServer state on another server does not make much sense.
The only purpose of a ServerSession is to communicate back to the
remote client. There is no point in maintaining one active in serverA,
and one mirrored in serverB which is actually unconnected and does
nothing.

Connection and subscription state lives in, and is mastered by, the client.
It's trivial for a well-written client to rehandshake and resubscribe
with the new server and recreate in a blip the state that you want to
mirror. Really.

Now, you say that the requirement is to avoid rehandshaking and
resubscribing, but you don't say why you have this requirement.
A rehandshake is your friend, not an enemy you want to avoid.

What you may want to share across servers is application state, not
CometD state.

Imagine a chess game. If you share across servers the chessboard
positions, the players' names, turn and times, you probably have
enough to reconstruct the game no matter what happens.
If the server of one player crashes, the player will re-handshake on
another server, resubscribe as appropriate, and the server logic will
see the user again and "oh, you're back into this game... hey you have
this chessboard position, you have 1:15 left, and it's your turn."
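
A minimal sketch of the chess example above, in plain Java. All names here (ChessGameState, SharedGameStore, resume) are hypothetical and not part of CometD, and the in-memory map only stands in for whatever replicated store the application actually shares across servers.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical application state for one game (not a CometD type).
final class ChessGameState {
    final String board;          // board position, in any encoding the application likes
    final String whitePlayer;
    final String blackPlayer;
    final boolean whiteToMove;   // whose turn it is
    final long whiteMillisLeft;  // remaining clock time per player
    final long blackMillisLeft;

    ChessGameState(String board, String whitePlayer, String blackPlayer,
                   boolean whiteToMove, long whiteMillisLeft, long blackMillisLeft) {
        this.board = board;
        this.whitePlayer = whitePlayer;
        this.blackPlayer = blackPlayer;
        this.whiteToMove = whiteToMove;
        this.whiteMillisLeft = whiteMillisLeft;
        this.blackMillisLeft = blackMillisLeft;
    }
}

// Assumption: stands in for a store that is replicated across servers.
final class SharedGameStore {
    private final Map<String, ChessGameState> gamesByPlayer = new ConcurrentHashMap<>();

    // Called after every move so that any server can reconstruct the game.
    void save(ChessGameState game) {
        gamesByPlayer.put(game.whitePlayer, game);
        gamesByPlayer.put(game.blackPlayer, game);
    }

    // Called by the server logic when a player shows up again after
    // re-handshaking and re-subscribing on another server.
    Optional<ChessGameState> resume(String player) {
        return Optional.ofNullable(gamesByPlayer.get(player));
    }
}
```

The point of the sketch is that nothing in it refers to a ServerSession: the new server only needs the application state to tell the returning player where the game stands.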

We are working on a solution for sharing application state called
"Oort Objects"; look up the mailing list archives about it.
I am going to push a branch soon for people to review and experiment.

--
Simone Bordet
----
http://cometd.org
http://webtide.com
http://intalio.com
Developer advice, training, services and support
from the Jetty & CometD experts.
Intalio, the modern way to build business applications.

DL

May 20, 2013, 1:05:31 PM
to cometd...@googlegroups.com
Thanks for the quick reply Simone.

You're right that I failed to explain the rationale behind not wanting to re-handshake and re-subscribe, so I will try to do that now.

We are working on a very large-scale system that will need high availability at every point. The HA solution for the state of the application (the chess-board) is solved and that would not be a problem. We also have a solution for allowing the client to transparently reconnect to the system without any user interference (including automatically re-subscribing to all previously subscribed channels).
We are also using a listener on the /meta/subscribe channel to detect new subscriptions and send all current state out to the client (the whole board). In normal operation, only the deltas (chess moves) are sent.

So in the case of the failure of a CometD server we can provide a transparent recovery. However it involves possibly many tens of thousands of clients, subscribed to up to a couple of thousand different channels, all requesting a full state update at roughly the same time. That is a lot of bandwidth... it could be handled by a lot of hardware but maybe we can be even more silent in the failover?

The idea would be that if the primary CometD server (or server-cluster) went down, the current connections to the server would return, the client would try again and the proxy would redirect them to the backup CometD server. This request could then be served because the backup server "knows" about each of the clients involved (it has appropriate ServerSessions etc...). That would mean that we would just need to start sending deltas without sending the initial state again (to everyone).

Anyway that is the reason why I started to look at this.

Thanks again
Dennis.

Simone Bordet

May 21, 2013, 11:57:31 AM
to cometd-users
Hi,

On Mon, May 20, 2013 at 7:05 PM, DL <dennis...@gmail.com> wrote:
> Thanks for the quick reply Simone.
>
> You're right that I failed to explain the rationale behind not wanting to
> re-handshake and re-subscribe, so I will try to do that now.
>
> I (we) are working on a very large scale system that will need high
> availability at every point. The HA solution for the state of the
> application (the chess-board) is solved and that would not be a problem. We
> also have a solution for allowing the client to transparently reconnect to
> the system without any user interference (including automatically
> re-subscribing to all previously subscribed channels).
> We are also using a listener on the meta/subscribe channel to detect new
> subscriptions and send all current state out to the client (the whole
> board). In normal operation then the delta (chess moves) are all that are
> sent.
>
> So in the case of the failure of CometD Server we can provide a transparent
> recovery. However it involves possibly many tens of thousands of clients,
> subscribed to up to a couple of thousand different channels, all requesting
> a full state update at roughly the same time. That is a lot of bandwidth...

That's true, but mirroring ServerSessions or other CometD state won't
help you here, at all.

> it could be handled by a lot of hardware but maybe we can be even more
> silent in the failover?

I don't understand this one.

> The idea would be that if the primary CometD server (or server-cluster) went
> down, the current connections to the server would return, the client would
> try again and the proxy would redirect them to the backup CometD server.
> This request could then be served because the backup server "knows" about
> each of the clients involved (it has appropriate ServerSessions etc...).
> That would mean that we would just need to start sending deltas without
> sending the initial state again (to everyone).

I think you're again confusing CometD state with application state.
If you have a ServerSession on the backup server, you're not going to
save any bandwidth if your application needs to send down the full
state to recover.

Note that whether you can send deltas or not depends on your
application, not on CometD.
Most probably you have to send the full state in case of chess games,
but probably you can avoid the full state in case of a map tracking
system, where you don't need to track every point but just the
approximate current location.

You are concentrating on the bandwidth problem, so I guess you have
measured it; what figures are we talking about?
Did you take into account gzip compression to clients?

Note that you're concentrating on CometD, so I assume that you have
solved the problem of a big spike in TCP connection opening to the new
server?
I have seen the Linux TCP stack collapsing under moderate pressure in
these cases, so if you have a solution for 10-50k connection spikes, I
am interested in the details.

Finally, if you're facing a catastrophic failure of a server, then
chances are that the recovery takes time... it takes time for the load
balancer to figure out that the node is really gone and move to the
other server, it takes time to establish new TCP connections, it takes
time to re-handshake, etc.
In all this, you can squeeze in some throttling (CometD allows you to
do this via advices) to give your server some room, but nevertheless
the clients will probably notice the hiccup.
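
A minimal sketch of one way to spread the reconnect spike mentioned above, in plain Java. The Bayeux advice object has an "interval" field (the number of milliseconds the client should wait before its next connect); the helper below only computes a jittered value for it. ReconnectThrottle and its parameters are hypothetical names, and how the value is attached to the server's reply depends on the CometD version and is not shown here.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper (not a CometD class): picks a per-client delay so that
// many clients reconnecting at once are spread over a time window.
final class ReconnectThrottle {
    private final long baseMillis;    // minimum delay handed to every client
    private final long spreadMillis;  // width of the random window added on top

    ReconnectThrottle(long baseMillis, long spreadMillis) {
        if (baseMillis < 0 || spreadMillis < 0) {
            throw new IllegalArgumentException("delays must not be negative");
        }
        this.baseMillis = baseMillis;
        this.spreadMillis = spreadMillis;
    }

    // Returns a value in [baseMillis, baseMillis + spreadMillis] that a server
    // could place in the advice "interval" field of its reply.
    long nextIntervalMillis() {
        return baseMillis + ThreadLocalRandom.current().nextLong(spreadMillis + 1);
    }
}
```

With, say, new ReconnectThrottle(1000, 30000), each client would be told to wait between 1 and 31 seconds, which trades a longer hiccup for a flatter load on the surviving server.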

I mean that there are a bazillion solutions, but I honestly don't see
mirroring CometD state as useful in a failover case.

DL

May 21, 2013, 12:29:12 PM
to cometd...@googlegroups.com
Hi Simone,

So no, we have not measured this. It is a requirement that has recently come up, and what I am hoping to do is to put a theoretical solution in place. In my opinion the requirement is very premature, but I have to come back with a possible plan of action. Let me start with what I have working now:
I have a Jetty Server and JS Clients. Every interval, the server sends a representation of the changes made to an object in the last interval (the "moves" made in the last 5 seconds, say).
For HA we have a second instance which is inactive until the first drops. The client at that point will detect the connection loss and reconnect (to a proxy), and then re-subscribe to any channels that were open when the disconnect happened. The server detects the subscribe and sends the full state.

Instead of re-subscribing when the proxy redirects the long-polling requests, I want the backup server to be able to continue sending. The backup server already has a copy of the deltas that must be sent for the last interval, but it obviously also needs to know who to send this to. What I want to achieve by mirroring the state of the Bayeux Server is that when a long-polling request comes into the secondary server from a client who originally was connected to the primary, I want that to be accepted and served without needing to handshake and re-subscribe.

The goal is to avoid sending the state again on the switch to the secondary. To do that the secondary server must know if a user had subscribed previously on the primary server. One way to do that would be to mirror the Bayeux Server state (another way would be to put more logic in the subscribe listener).

As you said there are other ways to achieve this goal but I was thinking that it would be a neat solution... basically I have a hot-standby replica of my CometD server.

Thanks again.

Simone Bordet

May 21, 2013, 12:52:52 PM
to cometd-users
Hi,

On Tue, May 21, 2013 at 6:29 PM, DL <dennis...@gmail.com> wrote:
> Hi Simone,
>
> So no we have not measured this. It is a requirement that has recently come
> up, and what I am hoping to do is to put a theoretical solution in place. In
> my opinion the requirement is very premature, but I have to come back with a
> possible plan of action. Let me start with what I have working now:
> I have a Jetty Server and JS Clients. The server sends a representation of
> the changes made to an object in the last interval every interval (the
> "moves" made in the last 5 seconds say).
> For HA we have a second instance which is inactive until the first drops.
> The client at that point will detect the connection loss and reconnect (to a
> proxy), and then re-subscribes to any channels that were open when the
> disconnect happened. The server detects the subscribe and sends the full
> state.
>
> Instead of subscribing when the proxy re-directs the long-polling requests
> then I want the backup server to be able to continue sending. The backup
> server already has a copy of the deltas that must be sent for the last
> interval, but it obviously also needs to know who to send this to. What I
> want to achieve by mirroring the state of the Bayeux Server is that when a
> long-polling request comes into the secondary server from a client who
> originally was connected to the primary, I want that to be accepted and
> served without needing to handshake and re-subscribe.

Not possible.
CometD is based on the Bayeux protocol, and a handshake at the Bayeux
level is required for new connections.

If you forget for a second about CometD, and think about HTTP, you're
pretty much asking that whenever a client sends a request to the first
server, and then this one crashes before sending the response, then
the client can open a TCP connection to the other server and get the
HTTP response from the second server. No HA system I know can do that.

> The goal is to avoid sending the state again on the switch to the secondary.

I think you're over-worrying :)

To avoid the full state exchange, give each state a unique ID,
which you share among servers.
Upon rehandshake, the client sends the state ID, and the server can
figure out what delta to send, or whether the client is already up to date.
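
A minimal sketch of the state-ID idea above, in plain Java. DeltaLog is a hypothetical helper, not part of CometD, and it assumes the simplest possible ID scheme: the state ID is the count of deltas applied so far. It also assumes the log itself is replicated across servers by the application's existing HA mechanism, which is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical helper (not a CometD class): a bounded log of deltas where
// state ID n means "deltas 1..n have been applied".
final class DeltaLog {
    private final List<String> deltas = new ArrayList<>();
    private final int capacity;  // how many recent deltas are retained
    private long firstId = 1;    // state ID produced by the oldest retained delta

    DeltaLog(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
    }

    // Records a delta, drops the oldest one if over capacity, returns the new state ID.
    synchronized long append(String delta) {
        deltas.add(delta);
        if (deltas.size() > capacity) {
            deltas.remove(0);
            firstId++;
        }
        return currentId();
    }

    synchronized long currentId() {
        return firstId + deltas.size() - 1;
    }

    // Returns the deltas the client is missing (an empty list if it is already
    // up to date), or Optional.empty() if the client's ID is unknown or too old,
    // in which case the full state has to be sent.
    synchronized Optional<List<String>> deltasSince(long clientStateId) {
        if (clientStateId > currentId() || clientStateId < firstId - 1) {
            return Optional.empty();
        }
        int from = (int) (clientStateId - (firstId - 1));
        List<String> missing = new ArrayList<>(deltas.subList(from, deltas.size()));
        return Optional.of(missing);
    }
}
```

On rehandshake the client would send its last state ID (for example as an extra field in a message, which is an application choice), and the server would call deltasSince with it and fall back to the full state only when the result is empty.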

> To do that the secondary server must know if a user had subscribed
> previously on the primary server. One way to do that would be to mirror the
> Bayeux Server state (another way would be to put more logic in the subscribe
> listener).
>
> As you said there are other ways to achieve this goal but I was thinking
> that it would be a neat solution... basically I have a hot-standby replica
> of my CometD server.

Nah :)

DL

May 22, 2013, 9:46:08 AM
to cometd...@googlegroups.com

Thanks for your input Simone. I'll keep the user group posted on what I did to solve the problem (if I manage to!).
