Stray connections in streaming and websocket

Shashwat Srivastava

Oct 2, 2012, 3:04:39 AM
to soc...@googlegroups.com
Hi,

I am using sockjs-node in my app which currently has 300 - 500 users concurrently.

It works properly when polling transports are used. But if I switch to streaming transports or websockets, stray connections keep accumulating until the file descriptor limit is reached. This happens faster when websockets are used. The number of connections keeps increasing linearly with time and eventually the server crashes. This doesn't happen when the number of connections is low.

I have been trying to debug this. Since this doesn't happen while using polling transports, is this a problem with sockjs? How can I verify this?

Thanks!

Shashwat Srivastava

Oct 2, 2012, 3:45:05 AM
to cc maco young, soc...@googlegroups.com
Hi,

Thank you so much for replying!

I have already tried this logic.

When I use the streaming transport (with this logic), I keep
receiving these errors - TypeError: Cannot call method 'didClose' of
null

And when I use websockets with the same logic, the lsof
command shows a lot of entries like "can't identify protocol",
even though the number of connections inside the sockjs-node app is accurate.
Eventually the server crashes.

Thanks,
Shashwat

On Tue, Oct 2, 2012 at 12:56 PM, cc "maco" young <bangk...@gmail.com> wrote:
> forgive me if others are more knowledgeable.
>
> seems that you could
> - maintain an array of connections on 'open'
> ids[conn.id] = conn;
> - assuming each live connection is registered in Redis or somewhere
> - on a cron-like setTimeout verify the connection array against live
> connections
> - those not,
> console.log( 'still open: '+id )
> ids[id].close()
>
> very interested in following your results!
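
A minimal sketch of the registry-and-sweep idea quoted above, in JavaScript. The names `registry`, `track` and `sweep` are made up for illustration, and the `isLive` callback stands in for whatever external liveness record is kept (Redis or otherwise). The connection objects only need an `id`, an `on()` method and a `close()` method, which sockjs-node connections have. The try/catch is there because a forced close racing with the library's own cleanup may throw, as reported in this thread.

```javascript
// Registry of open connections, keyed by connection id.
const registry = new Map();

// Call this from the server's 'connection' handler.
function track(conn) {
  registry.set(conn.id, conn);
  // Forget the connection as soon as it reports its own close.
  conn.on('close', () => registry.delete(conn.id));
}

// Close every tracked connection that isLive(id) no longer vouches for.
// Returns the ids that were closed so they can be logged or charted.
function sweep(isLive) {
  const closed = [];
  for (const [id, conn] of registry) {
    if (isLive(id)) continue;
    console.log('still open: ' + id);
    registry.delete(id);
    try {
      conn.close();
    } catch (err) {
      // Log and move on rather than letting one bad close stop the sweep.
      console.error('close failed for ' + id + ': ' + err.message);
    }
    closed.push(id);
  }
  return closed;
}
```

The sweep would typically run on a timer, for example `setInterval(() => sweep((id) => liveIds.has(id)), 30000)`, where `liveIds` is a hypothetical set refreshed from the external store.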

Shashwat Srivastava

Oct 2, 2012, 4:30:49 AM
to cc maco young, soc...@googlegroups.com
On Tue, Oct 2, 2012 at 1:56 PM, cc "maco" young <bangk...@gmail.com> wrote:
> in the same line, another thing to look at is keep-alive in the headers.
> anything you can do to stop the sockets from dropping out -
> http://en.wikipedia.org/wiki/Keep-alive
>
>
> On Tue, Oct 2, 2012 at 3:24 PM, cc "maco" young <bangk...@gmail.com>
> wrote:
>>
>> https://github.com/sockjs/sockjs-node
>>
>> heartbeat_delay (milliseconds)
>> In order to keep proxies and load balancers from closing long running http
>> requests we need to pretend that the connection is active and send a
>> heartbeat packet once in a while. This setting controls how often this is
>> done. By default a heartbeat packet is sent every 25 seconds.
>>
>> from my memory, socket.io uses 10 seconds. if it works this does not
>> solve the problem, but would make it less severe.
>>
>> love sockjs much, but, from my reading of the methods, wish it would
>> expose more of its internals so that problems like this were more
>> approachable.
>>
>>

Thank you so much! I will try this out and check out that link too. I
am using a load balancer too with the default settings -
https://github.com/sockjs/sockjs-node/blob/master/examples/haproxy.cfg,
although I use only one sockjs-node server at the moment. Could that
also be an issue? Although, I have tried this before using load
balancer.

Thanks!
Shashwat

>> On Tue, Oct 2, 2012 at 2:59 PM, Shashwat Srivastava <dar...@gmail.com>
>> wrote:
>>>
>>> On Tue, Oct 2, 2012 at 1:29 PM, Shashwat Srivastava <dar...@gmail.com>
>>> wrote:
>>> > On Tue, Oct 2, 2012 at 1:26 PM, cc "maco" young <bangk...@gmail.com>
>>> > wrote:
>>> >> so it is your opinion that the connection is held by the system and
>>> >> sockjs
>>> >> assumes that it has been closed?
>>> >>
>>> >> one thing I have not found is, within sockjs, the ability to get a
>>> >> list of
>>> >> open sockets and their state. for problems like this would be most
>>> >> helpful.
>>> >>
>>> >> just curious, have you tried shortening the handshake to see if less
>>> >> sockets
>>> >> are dropped?
>>> >>
>>> >>
>>> >
>>> > Yes, it seems to be something like this.
>>> >
>>> > How do I configure this setting?
>>> >
>>> > And, when do you get the best results? Have you played around with this
>>> > option?
>>> >
>>> > Thanks,
>>> > Shashwat
>>> >
>>> >> On Tue, Oct 2, 2012 at 2:45 PM, Shashwat Srivastava

Marek Majkowski

Oct 2, 2012, 5:39:55 AM
to dar...@gmail.com, soc...@googlegroups.com
Hi,

I'd love to help you, but I need more details.
1) does haproxy also run out of file descriptors?
2) what version of sockjs
3) what version of node
4) what version of haproxy
5) Are you using SSL? if not, can you try using SSL?

6) Are you really sure that the file descriptors also accumulate when
using streaming transports? Or only websockets?
(the codebase for streaming and websockets is quite different,
if the problem does happen for both streaming and websockets,
it's more likely a networking / haproxy problem than bug in sockjs)

7) Can you reproduce the problem locally? Can you open a browser
and see that "can't identify protocol" descriptors accumulate?

8) Please, use some charting tool to draw number of file descriptors
used by node, so that you could preemptively see if tweaks will
actually work.

9) Also you could chart haproxy file descriptors. Can you share
your haproxy config and can you provide some haproxy logs?

10) Is restarting haproxy helping - do the descriptors get freed
when you restart haproxy?
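
For point 8, a rough way to get numbers to chart is to sample the descriptor count of the node process periodically. The sketch below is Linux-only, since it reads `/proc/<pid>/fd`. The function names and the output format are assumptions made for illustration, not part of sockjs or any charting tool.

```javascript
const fs = require('fs');

// Count open file descriptors for a process by listing /proc/<pid>/fd.
// Returns null when the directory cannot be read (no /proc, no such
// process, or insufficient permissions).
function countOpenFds(pid) {
  try {
    return fs.readdirSync('/proc/' + pid + '/fd').length;
  } catch (err) {
    return null;
  }
}

// One timestamped sample: "<ISO time> <pid> <count or NA>".
function sampleLine(pid, now) {
  const count = countOpenFds(pid);
  return now.toISOString() + ' ' + pid + ' ' + (count === null ? 'NA' : count);
}

console.log(sampleLine(process.pid, new Date()));
```

Calling `sampleLine` from a `setInterval` inside the app, or running a similar script against the haproxy PID, gives a series that can be fed to any of the usual graphing tools.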


For the record, this is our previous conversation:
https://groups.google.com/forum/#!msg/sockjs/26lov2_BjAU/zhsX0GZFy6sJ

Here you say only "websocket" is problematic, "streaming" seems to be fine,
is that still the case?

Marek Majkowski

Oct 2, 2012, 5:54:56 AM
to dar...@gmail.com, soc...@googlegroups.com
Someone at stackoverflow suggests that "can't identify protocol" message
is printed when the socket is created but bind() and connect() aren't called
on it. That would make sense, but this is beyond SockJS library - SockJS
doesn't deal with raw sockets, we get those passed from http or websocket
libraries.

Do you use redis or something else on per-connection basis - maybe it's
not sockjs that leaks.

Additionally, can you verify if the problem still happens even if you're not
using haproxy (can you just expose node.js directly to the internet?).

Shashwat Srivastava

Oct 2, 2012, 10:07:07 AM
to soc...@googlegroups.com, dar...@gmail.com
Hi Marek,

This is not specific to haproxy. It happens without haproxy as well.

1) Yes, the file descriptors of haproxy keep on increasing with time.
2) sockjs v - 0.3.1
3) node v - 0.6.18
4) HA-Proxy version 1.4.19
5) I am using sockjs on both HTTP and HTTPS. nginx is being used to forward SSL traffic to haproxy. I have tried every combination, i.e. HTTPS website with sockjs connection on SSL, HTTP website with sockjs connection without SSL and HTTP website with sockjs connection over SSL. This doesn't seem to make any difference.
6) Yes, they do, but not to the same extent as with websockets. Earlier I was using the timeout logic explained earlier in this thread, so extra streaming connections used to get closed, but sockjs-node used to throw a lot of errors. Once I removed that timeout logic, it started behaving like websockets, except that file descriptors don't increase as fast. This used to happen without haproxy too; I added haproxy afterwards.
7) I tried a lot to produce a test case to replicate this but it seems to happen only when there are many concurrent connections.
8) Could you please provide me some sample links? I have never tried this.
9) The haproxy config is exactly the same as the one given in the sample sockjs example.
10) Yes, it does help. But I cannot do this again and again. It disconnects the clients from the app; there is reconnect logic, but clients still notice that it is reconnecting.

Yes. In the earlier thread I reported that because I was using timeout logic to forcefully close connections, and with that logic it was working with the streaming transport.

Thanks,
Shashwat

Shashwat Srivastava

Oct 2, 2012, 10:41:23 AM
to soc...@googlegroups.com, dar...@gmail.com
Yes, I do use redis for storing app related information but the redis-client used for this is global and not created and destroyed per connection.

Also, it seems to work fine when polling transport is used.

I have observed that if I use websockets (and the number of concurrent connections is high), file descriptors keep increasing at a rapid pace. If I use custom timeout logic to forcefully close connections, then I start getting "can't identify protocol" messages. This even happens in the sockjs-node dev version when you use automatic heartbeats on the client and server side to detect if a connection is broken.

When the streaming transport is used, the stray connections do not increase as fast, but they still build up over time. You don't get any "can't identify protocol" message if you forcefully close connections with timeout logic, but the server starts reporting the following error - TypeError: Cannot call method 'didClose' of null


Thanks,
Shashwat

Marek Majkowski

Oct 2, 2012, 10:45:05 AM
to dar...@gmail.com, soc...@googlegroups.com
On Tue, Oct 2, 2012 at 3:41 PM, Shashwat Srivastava <dar...@gmail.com> wrote:
> When streaming transport is used, the stray connections do not increase that
> fast but still they build up over time. You don't get any "can't identify
> protocol" message if you forcefully close connections by using a timeout
> logic but the server starts reporting following error - TypeError: Cannot
> call method 'didClose' of null

Please paste full traceback.

Shashwat Srivastava

Oct 2, 2012, 1:29:09 PM
to Shripad K, soc...@googlegroups.com
Shripad,

Thanks for your reply. Yes, I am aware of this. Earlier, when websockets
were enabled, I was using only stunnel.

Later, when I disabled websockets, I started using nginx.

Thanks,
Shashwat

On Tue, Oct 2, 2012 at 10:36 PM, Shripad K <assortme...@gmail.com> wrote:
>
>
> On Tue, Oct 2, 2012 at 7:37 PM, Shashwat Srivastava <dar...@gmail.com>
> wrote:
>>
>> Hi Marek,
>>
>> This is not specific to haproxy. It happens without haproxy as well.
>>
>> 1) Yes, the file descriptors of haproxy keep on increasing with time.
>> 2) sockjs v - 0.3.1
>> 3) node v - 0.6.18
>> 4) HA-Proxy version 1.4.19
>> 5) I am using sockjs on both HTTP and HTTPS. nginx is being used to
>> forward SSL traffic to haproxy. I have tried every combination i.e. HTTPS
>> website with sockjs connection on SSL, HTTP website with sockjs connection
>> without SSL and HTTP website with sockjs connection over SSL. This doesn't
>> seem to make any difference.
>
>
> Nginx won't work with WebSockets (current version). Maybe that's the reason
> why websocket fails. Try Stud instead.

Shashwat Srivastava

Oct 4, 2012, 2:31:31 PM
to soc...@googlegroups.com, dar...@gmail.com
Hi Marek,

Finally, managed to get a trace:

TypeError: Cannot call method 'didClose' of null
    at Session.close (../node_modules/sockjs/lib/transport.js:226:19)
    at SockJSConnection.close (../node_modules/sockjs/lib/transport.js:58:28)
    at SockJSConnection.end (../node_modules/sockjs/lib/transport.js:53:12)
    at SockJSConnection.<anonymous> (../XXXX.js:YYYY:ZZ)
    at SockJSConnection.emit (events.js:67:17)
    at Session.didMessage (../node_modules/sockjs/lib/transport.js:207:25)
    at App.xhr_send (../node_modules/sockjs/lib/trans-xhr.js:81:15)
    at ../node_modules/sockjs/lib/webjs.js:21:37
    at ../node_modules/sockjs/lib/webjs.js:95:18
    at IncomingMessage.<anonymous> (../node_modules/sockjs/lib/webjs.js:272:16)

Please let me know what you think.

Thanks,
Shashwat

Shashwat Srivastava

Oct 6, 2012, 12:07:48 PM
to soc...@googlegroups.com, dar...@gmail.com
Hi Marek,

Another error stack:

Exception on "POST /chat/048/zhsj82kx/xhr_send" in filter "xhr_send":
TypeError: Cannot call method 'didClose' of null
    at Session.close (../node_modules/sockjs/lib/transport.js:226:19)
    at SockJSConnection.close (../node_modules/sockjs/lib/transport.js:58:28)
    at SockJSConnection.end (../node_modules/sockjs/lib/transport.js:53:12)
    at SockJSConnection.sockjs_server.on.socket.on.custom_disconnect (../XXXX.js:YYYY:ZZ)
    at SockJSConnection.EventEmitter.emit (events.js:93:17)
    at Session.didMessage (../node_modules/sockjs/lib/transport.js:207:25)
    at App.exports.app.xhr_send (../node_modules/sockjs/lib/trans-xhr.js:81:15)
    at execute_request (../node_modules/sockjs/lib/webjs.js:21:38)
    at exports.generateHandler.req.next_filter (../node_modules/sockjs/lib/webjs.js:95:18)
    at IncomingMessage.exports.GenericApp.GenericApp.expect_xhr.status (../node_modules/sockjs/lib/webjs.js:272:16)

Is this helpful? Please let me know your opinion. 

Thanks,
Shashwat

Marek Majkowski

Oct 8, 2012, 9:55:34 AM
to dar...@gmail.com, soc...@googlegroups.com
On Sat, Oct 6, 2012 at 5:07 PM, Shashwat Srivastava <dar...@gmail.com> wrote:
> Hi Marek,
>
> Another error stack:

I replied to the traceback in this bug:
https://github.com/sockjs/sockjs-node/issues/91

It is not related to the socket problem.

A few comments:

6) maybe there is a problem with your disconnection logic? what
happens if you disable it?

8) cacti / nagios / mrtg / rrd / graphite: there are tons of charting tools.
They all require a significant investment in configuration but are
an invaluable tool for ops.

9) can you provide haproxy logs?

Additionally, please send me a result of `lsof -p` for both node.js
app and haproxy
(I'm assuming haproxy speaks directly to node.js), your haproxy.cfg config, and
`netstat -p` results for both haproxy and node.js

As a final note, this is what the lsof FAQ says:

10.2.2 Why does /proc-based lsof report "can't identify protocol" for
some socket files?

/proc-based lsof may report:

COMMAND PID ... TYPE ... NODE NAME
pump 226 ... sock ... 309 can't identify protocol

This means that it can't identify the protocol (i.e., the
AF_* designation) being used by the open socket file. Lsof
identifies protocols by matching the node number associated
with the /proc/<PID>/fd entry to the node numbers found in
selected files of the /proc/net sub-directory. Currently
/proc-based lsof examines these protocol files:

/proc/net/ax25 (untested)
/proc/net/ipx (needs kernel patch)
/proc/net/raw
/proc/net/raw6
/proc/net/tcp
/proc/net/tcp6
/proc/net/udp
/proc/net/udp6
/proc/net/unix

If /proc-based lsof says it can't identify the protocol
for an open socket file, you may be able to identify the
protocol yourself by using grep to look for the specific
node number in the files of /proc/net -- e.g.,

$ grep <node_number> /proc/net/*

You may not be able to find the desired node number, because
not all kernel protocol modules fully support /proc/net
information.

If you find a matching node number in a /proc/net file that is
not currently being processed by lsof, contact me via e-mail at
<a...@purdue.edu>. I'll discuss adding support to /proc-based
lsof for the protocol of the /proc/net file with you. Make
sure "lsof" appears in the "Subject:" line so my e-mail filter
won't classify your letter as Spam.

The code that matches node numbers of open IPX protocol
socket files to those in /proc/net/ipx requires Jonathan
Sergent's Linux 2.1.79 patch to /usr/src/linux/net/ipx/af_ipx.c.
The patch, suitable for input to Larry Wall's patch program,
may be found in the lsof distribution file:

Shashwat Srivastava

Oct 8, 2012, 10:24:29 AM
to soc...@googlegroups.com, dar...@gmail.com
Marek,

Regarding point 6, when I disable the custom disconnect logic, the number of concurrent connections starts increasing linearly. Earlier I thought that maybe connections take some time to disconnect on their own, so if the number of concurrent connections is high, there may always be a few connections which are being cleaned up and are about to close. But this doesn't seem to stop; it keeps on increasing linearly.

Please let me know what you think.

Thanks,
Shashwat 

Marek Majkowski

Oct 8, 2012, 11:12:21 AM
to dar...@gmail.com, soc...@googlegroups.com
On Mon, Oct 8, 2012 at 3:24 PM, Shashwat Srivastava <dar...@gmail.com> wrote:
> Marek,
>
> Regarding point 6, when I disable custom disconnect logic, the number of
> concurrent connections starts increasing linearly. Earlier I thought maybe
> it takes some time to disconnect on its own, so if number of concurrent
> connection is high, there may be always a few connections which are getting
> cleaned up and are about to close. But this doesn't seem to stop, it keeps
> on increasing linearly.

Yes, without heartbeats (ie: sockjs default) it is quite likely that when a user's
network breaks, TCP/IP will keep the connection around for a good while.
That's how tcp/ip works. This may give an illusion that the number of
connections is growing. (with a constant turnover of connections the
number will eventually flatten (ie: number of connections closed due
to a tcp/ip timeout will be equal to number of new connections)).

Those connections should be quite easy to identify - just
use netstat and look for connections with growing send buffer size.
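
One way to apply this is to take two `netstat -tn` samples some seconds apart and list the connections whose Send-Q grew. The sketch below assumes the Linux column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The function names are made up for illustration.

```javascript
// Parse the TCP rows of `netstat -tn` output (Linux column layout assumed).
// Header lines and non-TCP rows are skipped.
function parseNetstat(text) {
  const rows = [];
  for (const line of text.split('\n')) {
    const f = line.trim().split(/\s+/);
    if (f.length < 6 || !/^tcp6?$/.test(f[0])) continue;
    rows.push({
      recvQ: Number(f[1]),
      sendQ: Number(f[2]),
      local: f[3],
      remote: f[4],
      state: f[5],
    });
  }
  return rows;
}

// Connections present in both samples whose Send-Q grew between them.
function growingSendQueues(before, after) {
  const prev = new Map(before.map((r) => [r.local + ' ' + r.remote, r.sendQ]));
  return after.filter((r) => {
    const old = prev.get(r.local + ' ' + r.remote);
    return old !== undefined && r.sendQ > old;
  });
}
```

A connection that keeps showing up in the result across several samples is a likely candidate for a peer that has gone away without the server's TCP stack noticing yet.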

It does not explain your "can't identify protocol" problem.

Marek

Shashwat Srivastava

Oct 8, 2012, 12:42:27 PM
to soc...@googlegroups.com, dar...@gmail.com
Marek, 

Thank you for your response.

I don't face "can't identify protocol" issue in streaming. It comes up only with websockets.

In streaming:

if I use forceful close, then I get didClose errors

and if I don't, then the connections increase, they go over double the actual number of connections. Is this normal? Can I control this somehow? Or it doesn't matter (and I should just increase my file descriptor limit)?

Thanks,
Shashwat

Marek Majkowski

Oct 8, 2012, 12:51:47 PM
to dar...@gmail.com, soc...@googlegroups.com
On Mon, Oct 8, 2012 at 5:42 PM, Shashwat Srivastava <dar...@gmail.com> wrote:
> Thank you for your response.
>
> I don't face "can't identify protocol" issue in streaming. It comes up only
> with websockets.
>
> In streaming:
>
> if I use forceful close, then I get didClose errors
>
> and if I don't, then the connections increase, they go over double the
> actual number of connections. Is this normal? Can I control this somehow? Or
> it doesn't matter (and I should just increase my file descriptor limit)?

Let's hope the fix I proposed solves this particular issue. If you have
any other sockjs-related crashes please open issues on github.

AFAICT this particular traceback is generally quite unlikely to trigger,
unless you:

a) reduced the limit of data for streaming connections
(ie: `response_limit` option)
b) send loads of data just before closing the connection.
ie: do write() just before close()

Again, I can't tell you more without looking at your logs,
configuration, and maybe code.

Shashwat Srivastava

Oct 8, 2012, 1:07:08 PM
to soc...@googlegroups.com, dar...@gmail.com
Marek, 

Thank you so much! Case b indeed happens with me! Sometimes users navigate away from the webpage just after opening it (and this happens a lot). The app initializes itself with some user-specific data and I guess the connection is closed in between.

I have a query. When the client navigates away from the webpage, does sockjs-client inform sockjs-node that the client is about to go away?

Also, how much data would be considered too much in this case?

Thanks,
Shashwat

Marek Majkowski

Oct 8, 2012, 1:12:07 PM
to dar...@gmail.com, soc...@googlegroups.com
On Mon, Oct 8, 2012 at 6:07 PM, Shashwat Srivastava <dar...@gmail.com> wrote:
> Thank you so much! Case b indeed happens with me! Sometimes users just
> navigate away from the webpage, just after opening it (and this happens a
> lot). The app initializes itself with some user specific data and I guess
> the connection is closed in b/w.
>
> I have a query. When the client navigates away from the webpage, does
> sockjs-client inform sockjs-node that the client is about to go away?

That's where SockJS differs from native websockets. In SockJS most
disconnects will be abrupt (user navigates away from the site), so no,
the connection just dies. You don't have a chance to send last will
on the server side.

> Also, how much data would be considered too much in this case?

More than response_limit which is 128KiB.

But generally, sending data and disconnecting immediately makes little sense.

Disconnects in SockJS aren't synchronous, so your data might get lost.
If you really need to send stuff and disconnect, consider doing it
synchronously, ie:
server sends data
client acknowledges it's receiving
server disconnects

The last two steps may, of course, instead be: client disconnects.
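
The send / acknowledge / disconnect sequence above could be sketched on the server side as follows. This is only an illustration: the function name `sendThenClose`, the JSON envelope and the `'ack:<id>'` reply format are all hypothetical, and the client would need matching code to send the acknowledgement. The `conn` argument only needs `write()`, `close()`, `on()` and `removeListener()`, which a sockjs-node connection provides.

```javascript
// Send a payload, wait for the client's acknowledgement, then close.
// done(result) receives true (acked), false (timed out) or
// null (the client disconnected first).
function sendThenClose(conn, id, payload, timeoutMs, done) {
  let finished = false;

  const finish = (result) => {
    if (finished) return;
    finished = true;
    clearTimeout(timer);
    conn.removeListener('data', onData);
    conn.removeListener('close', onClose);
    // Only close if the client has not already gone away.
    if (result !== null) conn.close();
    if (done) done(result);
  };

  // Hypothetical acknowledgement format: the client replies 'ack:<id>'.
  const onData = (message) => {
    if (message === 'ack:' + id) finish(true);
  };
  const onClose = () => finish(null);

  conn.on('data', onData);
  conn.on('close', onClose);
  // Give up waiting after timeoutMs and close anyway.
  const timer = setTimeout(() => finish(false), timeoutMs);

  conn.write(JSON.stringify({ id: id, payload: payload }));
}
```

The timeout matters because, as noted above, a client that navigates away gives no notice, so the acknowledgement may never arrive.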

Shashwat Srivastava

Oct 8, 2012, 1:21:04 PM
to soc...@googlegroups.com, dar...@gmail.com
Thanks again!


On Monday, October 8, 2012 10:42:09 PM UTC+5:30, majek wrote:
> On Mon, Oct 8, 2012 at 6:07 PM, Shashwat Srivastava <dar...@gmail.com> wrote:
> > Thank you so much! Case b indeed happens with me! Sometimes users just
> > navigate away from the webpage, just after opening it (and this happens a
> > lot). The app initializes itself with some user specific data and I guess
> > the connection is closed in b/w.
> >
> > I have a query. When the client navigates away from the webpage, does
> > sockjs-client inform sockjs-node that the client is about to go away?
>
> That's where SockJS differs from native websockets. In SockJS most
> disconnects will be abrupt (user navigates away from the site), so no,
> the connection just dies. You don't have a chance to send last will
> on the server side.
>
> > Also, how much data would be considered too much in this case?
>
> More than response_limit which is 128KiB.
>
> But generally, sending data and disconnecting immediately makes little sense.

But you don't have any control over the user navigating away immediately.

> Disconnects in SockJS aren't synchronous, so your data might get lost.
> If you really need to send stuff and disconnect, consider doing it
> synchronously, ie:
>  server sends data
>  client acknowledges it's receiving
>  server disconnects

I see. Thank you for this explanation.

Shashwat Srivastava

Oct 8, 2012, 3:47:20 PM
to soc...@googlegroups.com, dar...@gmail.com
Update: I have applied the fix and enabled streaming transports (with custom disconnect logic). It seems to work fine now :)

Thanks,
Shashwat

Marek Majkowski

Oct 8, 2012, 5:33:31 PM
to dar...@gmail.com, soc...@googlegroups.com
I'm glad to hear that.

Shout if you encounter further issues.

Marek

Marek Majkowski

Oct 8, 2012, 8:16:58 PM
to cc maco young, SockJS
On Tue, Oct 9, 2012 at 1:14 AM, cc "maco" young <bangk...@gmail.com> wrote:
> would it be possible to use a window close event to close the socket from
> the client side? this might help with users who go to the page and the
> right back out.

No, when user closes browser it's too late for any code in javascript to run.

Marek

Shashwat Srivastava

Oct 9, 2012, 11:29:05 PM
to Michael nietzold, soc...@googlegroups.com
The fix has been explained at this link -
https://github.com/sockjs/sockjs-node/issues/91

Thanks,
Shashwat

On Wed, Oct 10, 2012 at 4:51 AM, Michael nietzold <niet...@gmail.com> wrote:
> Can you please explain which fix you applied?
>
> Sent from my iThingy...