How to detect a malfunctioning TCP/IP connection?

R.Wieser

Feb 13, 2012, 10:50:22 AM

Hello All,

I've made a program (a client of sorts) which creates a TCP/IP connection to
a server, and keeps it open for hours. Normally this works alright, but
sometimes something happens and the connection goes dead.

The problem is that on my side the socket seems to keep running; there is
just no data coming through it anymore, not even the heartbeat-signal.

When I terminate the program and try to reconnect, the WS2_32.DLL "connect"
function errors out after a while, indicating that the other side really is
at fault.

In my code I'm using WSAAsyncSelect with the FD_READ and FD_CLOSE flags
set.

My question: is there something more I can add to that to get a "connection
dropped" event (or similar)?

If not, can I create a heartbeat signal myself *without* actually sending
data to the program on the other side (something captured and echoed back
by the TCP/IP stack would be fine, though)?

Regards,
Rudy Wieser

Jeroen Mostert

Feb 13, 2012, 1:14:17 PM

On 2012-02-13 16:50, R.Wieser wrote:
> I've made a program (a client of sorts) which creates a TCP/IP connection to
> a server, and keeps it open for hours. Normally this works alright, but
> sometimes something happens and the connection goes dead.
>
> The problem is that on my side the socket seems to keep running; there is
> just no data coming through it anymore, not even the heartbeat-signal.
>

What's "the heartbeat-signal"? The only thing TCP/IP has akin to a heartbeat
signal is TCP keepalives -- they are not enabled by default, and even if you
enable them, the default interval is two hours, and even if you take that
into account, the only thing they do is send out 0-byte packets outside the
window (or a packet containing a single NUL byte, I forget which one it is
on Windows) that may or may not be processed correctly by intermediate
firewalls and the server itself. In other words, they're not as useful as
you might like.
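For reference, switching keepalives on and shortening the two-hour default might look like the sketch below. It is Python purely for readability (the program in question is assembly calling WinSock); the function name and the 60 s / 10 s / 5 probe values are illustrative assumptions. On Windows it uses the `SIO_KEEPALIVE_VALS` ioctl (which is `WSAIoctl` underneath), elsewhere the Linux-style `TCP_KEEPIDLE`/`TCP_KEEPINTVL`/`TCP_KEEPCNT` options where the platform has them. As noted above, this still does not guarantee that firewalls or the server treat the probes sensibly.

```python
import socket
import sys


def enable_tcp_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keepalives with shorter-than-default timing (illustrative values)."""
    # Portable switch: enables keepalive probes using the OS default timing.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    if sys.platform == "win32":
        # WinSock: (on/off, idle time in ms, interval between probes in ms).
        # The probe count is not configured through this call.
        sock.ioctl(socket.SIO_KEEPALIVE_VALS, (1, idle_s * 1000, interval_s * 1000))
        return

    # Linux-style per-socket tuning; guarded because not every platform has these.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```

It would be called once on the socket, before or right after `connect`. A dead peer then surfaces as an error on the socket only after idle time plus all failed probes have elapsed.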

> My question: is there something more I can add to that to get a "connection
> dropped" event (or similar)?
>
> If not, can I create a heartbeat signal myself *without* actually sending
> data to the program on the other side (something captured and echoed back
> by the TCP/IP stack would be fine, though)?
>

In short: no. It's a feature of TCP/IP that connections can survive
interruptions. If you need guarantees about the availability of the
connection instead, your protocol must account for this, and most protocols
do so with either an explicit keepalive packet format or an invalid request
that the server must respond to.
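A minimal sketch of such an application-level keepalive, assuming a hypothetical framing of a 1-byte message type plus a 2-byte big-endian length. The `PING`/`PONG`/`DATA` types and all helper names are invented for illustration; against an existing server you would send whatever request its protocol already guarantees to answer.

```python
import socket
import struct
import time

# Hypothetical framing: 1-byte message type + 2-byte big-endian payload length.
PING, PONG, DATA = 1, 2, 3
HEADER = struct.Struct("!BH")


def send_frame(sock, kind, payload=b""):
    sock.sendall(HEADER.pack(kind, len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes; recv() returning b'' means the peer closed."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_frame(sock):
    kind, length = HEADER.unpack(recv_exact(sock, HEADER.size))
    return kind, recv_exact(sock, length)


def ping(sock, timeout_s=5.0):
    """Send PING and wait up to timeout_s for a PONG.

    Returns False if no PONG arrives in time. A timeout can leave a frame
    half-read, so treat False as "this connection is unusable".
    """
    send_frame(sock, PING)
    deadline = time.monotonic() + timeout_s
    try:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False
            sock.settimeout(remaining)
            kind, _payload = recv_frame(sock)
            if kind == PONG:
                return True
            # Other frame types (e.g. DATA) would go to the normal handler here.
    except socket.timeout:
        return False
    finally:
        sock.settimeout(None)
```

The point of the design is that the probe travels as ordinary data, so it is exactly as reliable as the rest of the conversation.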

A crude alternative is to assume that any connection that hasn't seen any
activity for a period of time is no longer reliable, and rebuild it. For
this alternative, the "period of time" ought to be longer than the time-wait
delay (4 minutes by default), otherwise you can run out of sockets if the
other side remains unresponsive and you keep creating connections.
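The idle-timeout approach, including the constraint that the limit exceed the time-wait delay, could be sketched as follows. `IdleWatchdog` is a hypothetical helper with an injectable clock so it can be tested without waiting; in a WSAAsyncSelect-style program, `touch()` would be called on every FD_READ and `is_stale()` from a periodic timer.

```python
import time

TIME_WAIT_S = 4 * 60  # default time-wait delay mentioned above


class IdleWatchdog:
    """Flags a connection as stale after idle_limit_s seconds without activity."""

    def __init__(self, idle_limit_s, clock=time.monotonic):
        # Reconnecting faster than TIME_WAIT expires can pile up sockets
        # stuck in TIME_WAIT if the server stays unresponsive.
        if idle_limit_s <= TIME_WAIT_S:
            raise ValueError("idle limit should be longer than the TIME_WAIT delay")
        self.idle_limit_s = idle_limit_s
        self.clock = clock
        self.last_activity = clock()

    def touch(self):
        """Record activity, e.g. whenever data arrives."""
        self.last_activity = self.clock()

    def is_stale(self):
        return self.clock() - self.last_activity > self.idle_limit_s
```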

--
J.

R.Wieser

Feb 14, 2012, 12:05:30 PM

Hello Jeroen,

> What's "the heartbeat-signal"?

> most protocols do so with either an explicit keepalive packet format

Exactly that.

> The only thing TCP/IP has akin to a heartbeat signal is TCP
> keepalives ... that may or may not be processed correctly
> by intermediate firewalls and the server itself.

And that kills my idea of doing exactly that ...

> A crude alternative is to assume that any connection that
> hasn't seen any activity for a period of time is no longer
> reliable, and rebuild it.

That was/is pretty much my current idea, yes. Is there a non-crude/better
method available?

Thanks,
Rudy Wieser

-- Original message:
Jeroen Mostert <jmos...@xs4all.nl> wrote in message
4f3952fb$0$6936$e4fe...@news2.news.xs4all.nl...

Jeroen Mostert

Feb 14, 2012, 1:29:12 PM

On 2012-02-14 18:05, R.Wieser wrote:
>> What's "the heartbeat-signal"?
>
>> most protocols do so with either an explicit keepalive packet format
>
> Exactly that.
>
>> The only thing TCP/IP has akin to a heartbeat signal is TCP
>> keepalives ... that may or may not be processed correctly
>> by intermediate firewalls and the server itself.
>
> And that kills my idea of doing exactly that ...
>

TCP keepalive packets can fall victim to intermediate network equipment
because of their peculiar structure (an out-of-order packet with either a
garbage octet or no data at all). A keepalive message within a protocol
(i.e., actual data) is no more or less reliable than any other
communication. The best way of verifying that the other side can accept your
data within a reasonable time is to send it -- if you have no real data,
then fake it.

>> A crude alternative is to assume that any connection that
>> hasn't seen any activity for a period of time is no longer
>> reliable, and rebuild it.
>
> That was/is pretty much my current idea, yes. Is there a non-crude/better
> method available?
>

If you're sending packets, but getting back neither a response nor a
transmission timeout then either the other side is still there but not
capable of processing requests, or a proxy or firewall is eating your
requests but pretending everything's fine -- or your own code is wrong and
not handling errors properly (I mention this because I've worked with plenty
of communication libraries that had overzealous error handling that silently
swallowed problems on a lower level unless you cranked up the logging to
extra super extremely verbose).

Any of these cases may or may not be resolved by reconnecting -- there is no
general solution if you don't know the server architecture. If your protocol
demands a responsive connection and you don't have one, really the only
thing you can do is bail out and start over.

--
J.

R.Wieser

Feb 15, 2012, 4:01:28 AM

Hello Jeroen,

> If you're sending packets,

Nope. Currently I'm just receiving packets and responding to them when
required.

I was thinking of taking a more active role and maybe send my own
heartbeat-packets, but first wanted to make sure that there was no other,
more standard way available.

> or your own code is wrong and not handling errors properly

:-) As I mentioned, I'm using the WSAAsyncSelect call with the FD_READ and
FD_CLOSE flags as its arguments. In the callback function I check for both
and error out when neither is found. As such, I don't think I'm suppressing
any errors.
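One detail worth checking in such a callback: WinSock packs the event into the low word of `lParam` and an error code into the high word (the `WSAGETSELECTEVENT`/`WSAGETSELECTERROR` macros), so FD_CLOSE can arrive flagged with an error such as WSAECONNRESET rather than as a clean close. The arithmetic is sketched in Python below with constant values from winsock.h; `decode_select_lparam` and `classify` are hypothetical helper names. None of these notifications fires when the peer simply goes silent.

```python
# Values from winsock.h / winsock2.h.
FD_READ = 0x01
FD_CLOSE = 0x20
WSAENETDOWN = 10050
WSAECONNABORTED = 10053
WSAECONNRESET = 10054


def decode_select_lparam(lparam):
    """Split a WSAAsyncSelect notification's lParam into (event, error)."""
    event = lparam & 0xFFFF           # WSAGETSELECTEVENT = LOWORD(lParam)
    error = (lparam >> 16) & 0xFFFF   # WSAGETSELECTERROR = HIWORD(lParam)
    return event, error


def classify(lparam):
    event, error = decode_select_lparam(lparam)
    if event == FD_READ:
        return "read-error" if error else "read"
    if event == FD_CLOSE:
        # error == 0: graceful close; otherwise e.g. a reset or aborted connection
        return "closed" if error == 0 else f"closed-with-error-{error}"
    return "unexpected"
```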

By the way: I'm using a rather low-level language which does not do any
error handling by itself: assembly.

> If your protocol demands a responsive connection and you
> don't have one, really the only thing you can do is bail out
> and start over.

Currently I'm just focussing on how to detect a non-responsive connection.
How to handle it (automatically reconnect, just alerting the user or
something else) is of a later concern.

Regards,

Rudy Wieser

-- Original message:
Jeroen Mostert <jmos...@xs4all.nl> wrote in message
4f3aa7fd$0$6876$e4fe...@news2.news.xs4all.nl...

Jeroen Mostert

Feb 16, 2012, 12:11:23 AM

On 2012-02-15 10:01, R.Wieser wrote:
>> If you're sending packets,
>
> Nope. Currently I'm just receiving packets and responding to them when
> required.
>

Receivers can detect a closed connection (if the other side gracefully
closes) or a reset connection (the other side slams the connection shut by
sending a RST packet out of the blue). Any other failure, like complete
network outage or the other side simply not handling data anymore (the
lights are on but there's nobody home) can't be detected, at least not on
the TCP/IP level. This is considered a feature.
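The outcomes a pure receiver can observe might be told apart like this (a Python sketch; `poll_connection` and its result labels are hypothetical). The "silent" result covers both a healthy idle connection and a dead one, which is exactly the case that cannot be detected at this level.

```python
import socket


def poll_connection(sock, timeout_s):
    """Wait up to timeout_s for data and report what the receiver can observe.

    Returns ("data", bytes), ("closed", b"") for a graceful close (FIN),
    ("reset", b"") for a RST, or ("silent", b"") when nothing arrived.
    """
    sock.settimeout(timeout_s)
    try:
        data = sock.recv(4096)
    except socket.timeout:
        # Healthy-but-idle and dead-but-undetected look exactly the same here.
        return "silent", b""
    except ConnectionResetError:
        return "reset", b""
    finally:
        sock.settimeout(None)
    if not data:
        return "closed", b""
    return "data", data
```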

> I was thinking of taking a more active role and maybe send my own
> heartbeat-packets, but first wanted to make sure that there was no other,
> more standard way available.
>

No -- sending data at minimum regular intervals basically is the standard
way. I'm assuming you can't modify the server.
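Driven by a periodic timer (in a WSAAsyncSelect program, for example `SetTimer` delivering WM_TIMER to the same window), the decision logic might look like this sketch. `HeartbeatMonitor`, its method names and the figures in the usage are hypothetical; what the heartbeat actually sends depends on which request the existing server will answer.

```python
import time


class HeartbeatMonitor:
    """Decides when to send a heartbeat and when to give up on the peer.

    interval_s:      send a heartbeat once nothing has been sent for this long.
    reply_timeout_s: declare the connection dead if nothing at all is received
                     within this long after a heartbeat went out.
    """

    def __init__(self, interval_s, reply_timeout_s, clock=time.monotonic):
        self.interval_s = interval_s
        self.reply_timeout_s = reply_timeout_s
        self.clock = clock
        self.last_sent = clock()
        self.awaiting_since = None  # when the still-unanswered heartbeat was sent

    def on_sent(self):
        """Call after sending any normal request."""
        self.last_sent = self.clock()

    def on_received(self):
        """Any incoming data proves the peer is alive."""
        self.awaiting_since = None

    def tick(self):
        """Call from a periodic timer; returns "ok", "send_heartbeat" or "dead"."""
        now = self.clock()
        if self.awaiting_since is not None:
            return "dead" if now - self.awaiting_since > self.reply_timeout_s else "ok"
        if now - self.last_sent >= self.interval_s:
            # The caller is expected to send the heartbeat right away.
            self.last_sent = now
            self.awaiting_since = now
            return "send_heartbeat"
        return "ok"
```

With `HeartbeatMonitor(30, 10)`, a heartbeat goes out after 30 s without outgoing traffic, and only 10 s of complete silence after it counts as a dead connection; any received data in between clears the pending state.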

--
J.

R.Wieser

Feb 16, 2012, 5:44:48 AM

Hello Jeroen,

> Any other failure, like complete network outage or the
> other side simply not handling data anymore (the lights
> are on but there's nobody home) can't be detected, at
> least not on the TCP/IP level. This is considered a feature.

Thanks for the heads-up. That means that sending a heartbeat myself is
what I should (try to) do.

> I'm assuming you can't modify the server.

Correct. My program acts as a client to an already existing server program
(which I have no control over).

Thanks for the info/help

Rudy Wieser

-- Original message:
Jeroen Mostert <jmos...@xs4all.nl> wrote in message
4f3c9003$0$6928$e4fe...@news2.news.xs4all.nl...