Ping timeouts on Websockets

Andrew Newdigate

Jun 3, 2014, 8:39:13 AM
to faye-...@googlegroups.com

I just wanted to put this out there to get some feedback from others on this mailing list:

On my fork of Faye, I've put a ping response timeout on the websocket transport, meaning that after the transport pings the server by sending an empty envelope message ("[]"), I start a timeout waiting for a response. 

If it doesn't receive a message via the socket within the timeout period, I close the socket and create a new one.
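
A minimal sketch of the ping response timeout described above. This is not the code from the fork: `PingWatchdog`, `Scheduler` and every method name are hypothetical, and the scheduler is injectable only so the logic can be exercised without real timers.

```typescript
// Hypothetical sketch of a ping response timeout; none of these names
// are part of Faye's API.
type TimerHandle = unknown;

interface Scheduler {
  set(fn: () => void, ms: number): TimerHandle;
  clear(handle: TimerHandle): void;
}

// Default scheduler backed by the platform timers.
const realScheduler: Scheduler = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

class PingWatchdog {
  private pending: TimerHandle | null = null;
  private readonly timeoutMs: number;
  private readonly onDead: () => void;
  private readonly scheduler: Scheduler;

  constructor(timeoutMs: number, onDead: () => void, scheduler: Scheduler = realScheduler) {
    this.timeoutMs = timeoutMs;
    this.onDead = onDead;
    this.scheduler = scheduler;
  }

  // Call right after sending the "[]" ping: starts waiting for a response.
  pingSent(): void {
    this.cancel();
    this.pending = this.scheduler.set(() => {
      this.pending = null;
      this.onDead(); // nothing arrived in time: treat the socket as dead
    }, this.timeoutMs);
  }

  // Call whenever any message arrives on the socket.
  messageReceived(): void {
    this.cancel();
  }

  // Stop waiting, e.g. when the socket is closed deliberately.
  cancel(): void {
    if (this.pending !== null) {
      this.scheduler.clear(this.pending);
      this.pending = null;
    }
  }
}
```

In use, `onDead` would close the current socket and open a new one, `pingSent()` would be called each time the transport sends "[]", and `messageReceived()` would be called from the socket's message handler.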

This seems to have led to a big improvement in handling dodgy websockets. We see a fair few of these dodgy sockets, particularly over bad networks. The easiest way for me to recreate a dodgy socket (in Chrome at least) is by establishing a websocket, then switching Wifi networks to one going through a different ADSL network, or switching over to 3G tethering. After doing this, the Chrome websocket becomes useless: it doesn't send, but it also won't error.

I'm not sure whether this problem is particular to my setup, as we're passing our websockets through an AWS ELB (in TCP mode), but whatever causes it, it gives us a lot of headaches.

Anyone else seeing this problem / dealing with it in the same / a different way?

Thanks
Andrew


James Coglan

Jun 3, 2014, 2:26:10 PM
to faye-...@googlegroups.com
On 3 June 2014 13:39, Andrew Newdigate <and...@gitter.im> wrote:
> On my fork of Faye, I've put a ping response timeout on the websocket transport, meaning that after the transport pings the server by sending an empty envelope message ("[]"), I start a timeout waiting for a response.
>
> If it doesn't receive a message via the socket within the timeout period, I close the socket and create a new one.
>
> Anyone else seeing this problem / dealing with it in the same / a different way?

This couldn't be merged into core because it's non-standard. Faye sends these "[]" messages to make sure there's data going over the socket and so keep proxy connections open, knowing that other Bayeux clients and servers will just see an empty message list and do nothing.

The Bayeux reference implementation does not do this. Faye's client and server can send these messages but must not rely on them ever being returned.

There might be another solution. I recently pushed https://github.com/faye/faye/commit/9d132d6eec70a2038794cf55052578ac915e2104, which makes it so that if the client detects a message timeout, it tells the transport to abort the request. This was designed to make sure that connections for normal HTTP requests don't hang open after being blocked by a firewall, but it could be extended to WebSocket, telling the WebSocket connection to close and reconnect if we detect that messages are timing out.
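
As a rough illustration of that idea (not the code in the commit above), here is a sketch that tracks outstanding messages by id and drops the socket when one of them goes unanswered for too long. `MessageTimeoutMonitor`, `ReconnectableSocket` and their methods are hypothetical names, and the sketch assumes each reply carries the id of the message it answers.

```typescript
// Hypothetical sketch: these names do not come from Faye.
interface ReconnectableSocket {
  close(): void;
  reconnect(): void;
}

class MessageTimeoutMonitor {
  // Maps a message id to the time (ms) by which a reply is expected.
  private deadlines: Record<string, number> = {};
  private readonly timeoutMs: number;
  private readonly socket: ReconnectableSocket;

  constructor(timeoutMs: number, socket: ReconnectableSocket) {
    this.timeoutMs = timeoutMs;
    this.socket = socket;
  }

  // Record that a message with this id was sent at time `now` (ms).
  sent(id: string, now: number): void {
    this.deadlines[id] = now + this.timeoutMs;
  }

  // Record that a reply for this id arrived.
  answered(id: string): void {
    delete this.deadlines[id];
  }

  // Call periodically. If any message is overdue, close the socket and
  // reconnect. Returns true when a reconnect was triggered.
  check(now: number): boolean {
    const overdue = Object.keys(this.deadlines).some(
      (id) => now >= this.deadlines[id],
    );
    if (!overdue) return false;
    this.deadlines = {}; // outstanding messages get re-sent on the new socket
    this.socket.close();
    this.socket.reconnect();
    return true;
  }
}
```

Unlike a ping response timeout, this relies only on replies to real messages, so it does not depend on "[]" ever being echoed back.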

Have you explored that option?

Andrew Newdigate

Jun 3, 2014, 4:29:34 PM
to faye-...@googlegroups.com
On 3 Jun 2014, at 19:26, James Coglan <jco...@gmail.com> wrote:

> The Bayeux reference implementation does not do this. Faye's client and server can send these messages but must not rely on them ever being returned.

That makes sense. 

> There might be another solution. I recently pushed https://github.com/faye/faye/commit/9d132d6eec70a2038794cf55052578ac915e2104, which makes it so that if the client detects a message timeout, it tells the transport to abort the request. This was designed to make sure that connections for normal HTTP requests don't hang open after being blocked by a firewall, but it could be extended to WebSocket, telling the WebSocket connection to close and reconnect if we detect that messages are timing out.
> Have you explored that option?

I saw that commit and was wondering if something similar could be done for Websockets, but I haven’t tried anything myself yet. I’ll take a look at this when I get a chance but it may not be this week or next, judging by my backlog.

My only other attempt at a ping is outside the Faye client in my application code, using publishes from the client. The problem with this route is that the ‘ping’ is too far from the underlying transport. By the time my ping has timed out, the transport may have already given up and be in the process of establishing a new connection. It doesn’t work well.

Thanks for the help as always,
a
