"Failing fast" with TCP disconnects with TCPConn.Write() on Linux

smith.wi...@gmail.com

May 21, 2014, 8:59:23 PM

to golan...@googlegroups.com

Trying to interface to a legacy TCP server where the protocol has no "ack" or "ping". I have two goroutines, a reader and a writer. The writer waits on a []byte channel and writes them to the TCP conn, the reader Read()s from the TCP conn and processes the received data.

It works great until the connection goes down (unplug the cable, reboot a router etc). I have enabled TCP keepalive (SO_KEEPALIVE), which does in fact detect the disconnect on the Read() side, although it takes a while. I can tune this using the tcp_keepalive_time, tcp_keepalive_intvl and tcp_keepalive_probes sysctl settings (or the equivalent SOL_TCP socket options).

The client's Write() goroutine is the part I'm having trouble with. I found that once the connection goes down, a Write() of a []byte to the TCPConn doesn't return an error (presumably the kernel is buffering the packets). I tried calling SetWriteDeadline() with a 30s deadline and that solved the issue on the Mac (although it takes 2x the interval to detect it). However, on Linux, I'm just not detecting any errors on the Write() side, even with SetWriteDeadline().

I'm assuming it's because Go is able to write the data to the socket successfully and the kernel is buffering the packet, so with a bit of Googling, I found some code on golang-nuts to invoke a custom setsockopt() on a TCPConn and I'm attempting to set the TCP_USER_TIMEOUT socket option (which has been around for a while):

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=dca43c75e7e545694a9dd6288553f55c53e2a3a3

Here's the code I use to set the option:

// #include <linux/tcp.h>
import "C"

...

func net_set_timeout(conn *net.TCPConn, timeout time.Duration) error {
    // We need to use the File object to get at the fd
    f, err := conn.File()
    if err != nil {
        return err
    }
    defer f.Close()

    // TCP_USER_TIMEOUT takes an integer number of milliseconds
    msecs := int(timeout / time.Millisecond)
    fd := int(f.Fd())

    return os.NewSyscallError("setsockopt", syscall.SetsockoptInt(fd, syscall.SOL_TCP, C.TCP_USER_TIMEOUT, msecs))
}
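For anyone reading this on a newer Go release, here is a minimal sketch of the same setsockopt call without cgo or conn.File(). It assumes Linux and Go 1.9 or later (TCPConn.SyscallConn did not exist when this thread was written). The value 18 is copied by hand from <linux/tcp.h>, and setUserTimeout, getUserTimeout and demo are hypothetical names, not anything from the original code.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"syscall"
	"time"
)

// tcpUserTimeout mirrors TCP_USER_TIMEOUT in <linux/tcp.h>.
// Assumption: hard-coded value, valid on Linux only.
const tcpUserTimeout = 18

// setUserTimeout sets TCP_USER_TIMEOUT, which the kernel takes in milliseconds.
func setUserTimeout(conn *net.TCPConn, timeout time.Duration) error {
	raw, err := conn.SyscallConn()
	if err != nil {
		return err
	}
	var optErr error
	err = raw.Control(func(fd uintptr) {
		ms := int(timeout / time.Millisecond)
		optErr = os.NewSyscallError("setsockopt",
			syscall.SetsockoptInt(int(fd), syscall.IPPROTO_TCP, tcpUserTimeout, ms))
	})
	if err != nil {
		return err
	}
	return optErr
}

// getUserTimeout reads the option back so the setting can be checked.
func getUserTimeout(conn *net.TCPConn) (time.Duration, error) {
	raw, err := conn.SyscallConn()
	if err != nil {
		return 0, err
	}
	var ms int
	var optErr error
	err = raw.Control(func(fd uintptr) {
		ms, optErr = syscall.GetsockoptInt(int(fd), syscall.IPPROTO_TCP, tcpUserTimeout)
	})
	if err != nil {
		return 0, err
	}
	return time.Duration(ms) * time.Millisecond, os.NewSyscallError("getsockopt", optErr)
}

// demo connects to a throwaway loopback listener, sets 30s and reads it back.
func demo() (time.Duration, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer ln.Close()
	c, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return 0, err
	}
	defer c.Close()
	conn := c.(*net.TCPConn)
	if err := setUserTimeout(conn, 30*time.Second); err != nil {
		return 0, err
	}
	return getUserTimeout(conn)
}

func main() {
	d, err := demo()
	fmt.Println(d, err)
}
```

Reading the value back only confirms that the kernel accepted the option; it says nothing about when the timer actually starts running.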

Still doesn't detect the error. I *think* that it's probably because the TCPConn.Write() has already completed and now my writer goroutine is back waiting on the output channel. Again, eventually, the keepalive timer fires and the reader side generates an error.

I did notice that SO_SNDTIMEO isn't exposed as an option, although I suspect it'll have the same problem as the TCP_USER_TIMEOUT option. The other thought is to have another goroutine that polls the socket with getsockopt(SO_ERROR) and then calls Close() on the TCPConn (which in turn brings down the reader & writer goroutines).

I really don't want to rely on the SO_KEEPALIVE option for this (and mess with the sysctl values); that's there to keep the connection alive and fresh in router tables. I really want to "fail fast" when there's a connectivity issue when I try to Write(); at that point I restart the connection (and associated goroutines) and carry on.

What's the correct "Go" way to handle this?

Thanks!

-W.

Mikio Hara

May 22, 2014, 2:21:19 AM

to smith.wi...@gmail.com, golang-nuts

On Thu, May 22, 2014 at 9:59 AM, <smith.wi...@gmail.com> wrote:

> It works great until the connection goes down (unplug the cable, reboot a
> router etc).

you need a mechanism that handles both on-link and off-link failures
if you really want to detect any failures along the tcp path on ip
routing environment. as you know ip routers isolate and hide some
failures from you (to avoid bothering you with useless, weird and
complicated information).

> I really don't want to rely on the SO_KEEPALIVE option for this (and mess
> with the sysctl values for this), that's there to keep the connection alive
> and fresh in router tables.

for now using tcp keep-alive is the best way to detect a dead tcp peer
connection unless you write your own connection maintenance protocol
on top of the tcp. fwiw i have a tiny package you may be interested in.
it implements a bit more tcp-level socket options:
http://godoc.org/github.com/mikioh/tcp. hope this helps.

James Bardin

May 22, 2014, 9:39:32 AM

to golan...@googlegroups.com, smith.wi...@gmail.com

Since the recv side of a tcp connection is usually where you first detect it closing, can you just Close the connection there, which will cause the writer goroutine to exit?

If not, do you have a sample of the read/write loops that you're having trouble with?

smith.wi...@gmail.com

May 22, 2014, 10:50:41 AM

to golan...@googlegroups.com, smith.wi...@gmail.com

On Thursday, May 22, 2014 9:39:32 AM UTC-4, James Bardin wrote:

> Since the recv side of a tcp connection is usually where you first detect it closing, can you just Close the connection there, which will cause the writer goroutine to exit?

That's what happens now; however, it happens via the TCP keepalive mechanism, which on Linux takes at least TCP_KEEPIDLE + (TCP_KEEPCNT * TCP_KEEPINTVL) seconds. By default this is quite large (KEEPIDLE is 7200s, or 2hrs, KEEPCNT is 9 and KEEPINTVL is 75s). Golang lets me alter this [KEEPIDLE] via the SetKeepAlivePeriod() call, so if I set it down to 120s, I'm still looking at a delay in detecting the failure of 120+(9*75) == 795s == 13m15s.
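The arithmetic above can be checked with a few lines of Go. This is only a sketch of the formula quoted in this message (idle time plus probe count times probe interval); keepaliveDetectDelay is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"time"
)

// keepaliveDetectDelay returns the minimum time before keepalive declares an
// idle peer dead: TCP_KEEPIDLE + TCP_KEEPCNT*TCP_KEEPINTVL.
func keepaliveDetectDelay(idle, interval time.Duration, count int) time.Duration {
	return idle + time.Duration(count)*interval
}

func main() {
	// Linux defaults quoted above: 7200s idle, 75s interval, 9 probes.
	fmt.Println(keepaliveDetectDelay(7200*time.Second, 75*time.Second, 9)) // 2h11m15s
	// Idle time lowered to 120s, interval and probe count left at the defaults.
	fmt.Println(keepaliveDetectDelay(120*time.Second, 75*time.Second, 9)) // 13m15s
}
```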

In many cases, using KEEPALIVE and the Read() side to detect the broken connection is undesirable, often the connection will "come back". Unfortunately, the application protocol isn't well designed, so I don't know when to "expect" a message from the server (and I don't have the ability to change it). The only way I can detect the failure is by trying to "fail fast" on the Write() side.

> If not, do you have a sample of the read/write loops that you're having trouble with?

The code is very simple. The write loop looks like this (below); it runs as a "supervised goroutine". That is, the caller uses recover() to capture any panics, and that triggers it to Close() the TCPConn (which causes both the reader and writer goroutines to exit). It then sits in a retry loop until it can successfully connect, at which point it restarts the reader/writer goroutines.

// Push outbound packets to the server
loop:
    for {
        // Wait for a packet ...
        select {
        case pkt := <-pkts:
            if timeout >= 0 {
                // Set a write timeout
                deadline := time.Now().Add(time.Duration(timeout) * time.Second)
                if err := conn.SetWriteDeadline(deadline); err != nil {
                    // This packet will be lost if not put back into the queue!!!
                    panic(err)
                }
            }
            n, err := conn.Write(pkt.data[:pkt.size])
            if err != nil {
                panic(err)
            }
            if n != pkt.size {
                panic(fmt.Sprintf("Failed to write to the network, %d vs %d", n, pkt.size))
            }
        case done = <-ctrl:
            // Shutdown signal
            break loop
        }
    }
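The "supervised goroutine" idea described in this message could be sketched roughly as follows. This is an illustration under assumptions, not the poster's actual code: supervise is a hypothetical helper, and the Close()/reconnect step is reduced to a comment.

```go
package main

import (
	"errors"
	"fmt"
)

// supervise runs fn in its own goroutine and reports how it ended:
// nil for a normal return, or an error describing the recovered panic.
func supervise(name string, fn func()) <-chan error {
	done := make(chan error, 1)
	go func() {
		defer func() {
			if r := recover(); r != nil {
				done <- fmt.Errorf("%s panicked: %v", name, r)
				return
			}
			done <- nil
		}()
		fn()
	}()
	return done
}

func main() {
	writer := supervise("writer", func() {
		// Stand-in for a failed conn.Write in the loop above.
		panic(errors.New("write tcp: i/o timeout"))
	})
	if err := <-writer; err != nil {
		// Here the caller would Close() the TCPConn and enter its retry loop.
		fmt.Println("restarting connection:", err)
	}
}
```

Returning an error from the loop instead of panicking would work just as well; recover() is shown only because that is the mechanism the message describes.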

smith.wi...@gmail.com

May 22, 2014, 10:59:56 AM

to golan...@googlegroups.com, smith.wi...@gmail.com

On Thursday, May 22, 2014 2:21:19 AM UTC-4, Mikio Hara wrote:

> On Thu, May 22, 2014 at 9:59 AM, <smith.wi...@gmail.com> wrote:

>> It works great until the connection goes down (unplug the cable, reboot a
>> router etc).

> you need a mechanism that handles both on-link and off-link failures
> if you really want to detect any failures along the tcp path on ip
> routing environment. as you know ip routers isolate and hide some
> failures from you (to avoid bothering you with useless, weird and
> complicated information).

Right. I have found that setting the SO_KEEPALIVE timers too small results in getting random disconnects, it's better to keep it large.

>> I really don't want to rely on the SO_KEEPALIVE option for this (and mess
>> with the sysctl values for this), that's there to keep the connection alive
>> and fresh in router tables.

> for now using tcp keep-alive is the best way to detect a dead tcp peer
> connection unless you write your own connection maintenance protocol
> on top of the tcp.

The SO_KEEPALIVE option means that we'll regularly be sending empty keepalive probes (bare ACK segments) instead of leaving the connection idle. The real reason for this is that I have found that many consumer grade firewall/NAT type routers tend to have small forwarding tables and can often "age out" idle connections. Enabling SO_KEEPALIVE is *preventing* dead connections from happening.

However, it's not good at *detecting* dead connections. I actually don't care if a connection dies if the application is idle, however when there's work to do, I need to quickly detect the dead connection and then reconnect. I was hoping that a combination of SetWriteDeadline() + Write() on a dead connection would give me an error if the TCP stack couldn't write the data by the deadline.

You are right, the best solution is to either add a protocol level heartbeat, or at least have the protocol ACK the messages being sent out, but unfortunately, I don't have the ability to alter the [existing] protocol to support this.

> fwiw i have a tiny package you may be interested in.
> it implements a bit more tcp-level socket options:
> http://godoc.org/github.com/mikioh/tcp. hope this helps.

Thanks, I'll take a look!

James Bardin

May 23, 2014, 11:13:27 AM

to golan...@googlegroups.com, smith.wi...@gmail.com

On Thursday, May 22, 2014 10:50:41 AM UTC-4, smith.wi...@gmail.com wrote:

> In many cases, using KEEPALIVE and the Read() side to detect the broken connection is undesirable, often the connection will "come back". Unfortunately, the application protocol isn't well designed, so I don't know when to "expect" a message from the server (and I don't have the ability to change it). The only way I can detect the failure is by trying to "fail fast" on the Write() side.

OK, I see what you mean, but as you see, failing fast on a tcp send is near impossible. TCP_USER_TIMEOUT is the only way I could imagine coming close, but I'm not sure why it's not having any effect. If you have time, maybe try doing it in C and see if it's a problem with Go specifically. I don't think checking for SO_ERROR will change anything, because that should cause the Read to return anyway, which you already handle.

I'm still curious as to why the SetWriteDeadline acts differently on Linux.

Does your app have any sort of response to the data sent? Is there some way you could start a timer from when you send data, and abort the connection if nothing is received before the timer fires?
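The timer idea can be sketched with a read deadline armed right after each send. This only helps if every request is answered, which may not hold for the protocol in question. The names sendExpectReply, isTimeout and demo are hypothetical, and net.Pipe stands in for a real TCP connection so the example runs anywhere.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

// sendExpectReply writes req, then waits up to wait for any bytes to come back.
func sendExpectReply(conn net.Conn, req []byte, wait time.Duration) error {
	if _, err := conn.Write(req); err != nil {
		return err
	}
	if err := conn.SetReadDeadline(time.Now().Add(wait)); err != nil {
		return err
	}
	buf := make([]byte, 512)
	_, err := conn.Read(buf)
	return err
}

// isTimeout reports whether err came from an expired deadline.
func isTimeout(err error) bool {
	ne, ok := err.(net.Error)
	return ok && ne.Timeout()
}

// demo talks to a peer that consumes the request and never answers.
func demo() error {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()
	go io.Copy(io.Discard, server) // silent peer
	return sendExpectReply(client, []byte("ping"), 50*time.Millisecond)
}

func main() {
	err := demo()
	fmt.Println("timed out:", isTimeout(err))
}
```

With separate reader and writer goroutines, the writer would only arm the deadline and the reader's blocked Read() would be the call that returns the timeout.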

smith.wi...@gmail.com

May 23, 2014, 12:00:56 PM

to golan...@googlegroups.com, smith.wi...@gmail.com

On Friday, May 23, 2014 11:13:27 AM UTC-4, James Bardin wrote:

> OK, I see what you mean, but as you see, failing fast on a tcp send is near impossible. TCP_USER_TIMEOUT is the only way I could imagine coming close, but I'm not sure why it's not having any effect. If you have time, maybe try doing it in C and see if it's a problem with Go specifically. I don't think checking for SO_ERROR will change anything, because that should cause the Read to return anyway, which you already handle.

> I'm still curious as to why the SetWriteDeadline acts differently on Linux.

I think it's because the underlying send()/write() must be completing immediately, but the kernel is queuing the data -- I can see in netstat that the Send-Q shows the outstanding bytes. As for TCP_USER_TIMEOUT, the spec says it is the "maximum amount of time in ms that *transmitted* data may remain unacknowledged". So I think because the data hasn't been *transmitted* yet, this timer hasn't been started.

I also tried to set the SO_SNDBUF to something small to force the kernel to either send the data, or block on the send()/write() (thus causing the write deadline to expire), but the minimum buffer size is 2048 bytes which is way bigger than my messages.

> Does your app have any sort of response to the data sent? Is there some way you could start a timer from when you send data, and abort the connection if nothing is received before the timer fires?

Unfortunately, there sometimes is a response, but not always. I didn't design the protocol and I can't change it!!!

However, I have solved (or really worked around) my particular issue. Basically, in normal operation, the data is sent immediately; netstat's Send-Q shows 0 bytes outstanding on the connection. In the case when the kernel hasn't/can't send the data, I see that netstat shows outstanding bytes for the connection in the Send-Q field. This is quite useful, especially as it turns out that the Send-Q size is easily accessible via the SIOCOUTQ ioctl(). With the magic of CGO and a few lines of code, I fetch this in Go as follows:

var outqsize int32 // the ioctl stores a C int, so use a 32-bit value

// Query the output queue size: SIOCOUTQ (Linux only)
if _, _, errno := syscall.Syscall(syscall.SYS_IOCTL, fd, C.SIOCOUTQ, uintptr(unsafe.Pointer(&outqsize))); errno != 0 {
    return os.NewSyscallError("ioctl(SIOCOUTQ)", syscall.Errno(errno)), -1
}

Where `fd` comes from TCPConn.File().Fd(). In my writer loop, I have modified my select {} such that in addition to waiting on the outbound packet channel I now have a 30s timeout with a time.After(). When this timer fires, I go and check ioctl(SIOCOUTQ) for the TCPConn and if it shows a Q size of >0, that means I've had data stuck in the Send-Q for *at least* 30s. At this point, I call panic() and my "supervised goroutine" exits, recover()s and causes the connection to be torn down, reconnected and this in turn starts up new reader and writer goroutines.
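The modified select described above might look something like this. It is a minimal sketch rather than my exact code: it returns an error instead of calling panic(), the queue check is passed in as a function (queued) so the loop can be exercised without a real dead connection, and stuckConn, errStalled and demo are hypothetical names used only for the illustration.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"time"
)

var errStalled = errors.New("send queue not draining")

// writeLoop forwards packets to w. Whenever no packet arrives for idle, it
// calls queued (for example an ioctl(SIOCOUTQ) wrapper) and gives up if
// unsent bytes are still sitting in the kernel.
func writeLoop(w io.Writer, pkts <-chan []byte, ctrl <-chan struct{},
	idle time.Duration, queued func() (int, error)) error {
	for {
		select {
		case pkt := <-pkts:
			if _, err := w.Write(pkt); err != nil {
				return err
			}
		case <-time.After(idle):
			n, err := queued()
			if err != nil {
				return err
			}
			if n > 0 {
				return fmt.Errorf("%w: %d bytes pending", errStalled, n)
			}
		case <-ctrl:
			return nil // shutdown signal
		}
	}
}

// stuckConn is a test double: every Write "succeeds" but nothing is ever sent.
type stuckConn struct{ pending int }

func (c *stuckConn) Write(p []byte) (int, error) {
	c.pending += len(p)
	return len(p), nil
}

func demo() error {
	conn := &stuckConn{}
	pkts := make(chan []byte, 1)
	pkts <- []byte("hello")
	return writeLoop(conn, pkts, nil, 20*time.Millisecond,
		func() (int, error) { return conn.pending, nil })
}

func main() {
	fmt.Println(demo()) // send queue not draining: 5 bytes pending
}
```

Note that time.After restarts on every packet, so the check only runs once the writer has been idle for the full interval.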

I'm also still setting the TCP_USER_TIMEOUT socket opt, so if the kernel *is* able to transmit the data (meaning Send-Q is now 0), then presumably this timer will now start and if there's no ACK within 30000ms it'll cause an error. If the writer is not actively writing data to the network (i.e. it's in the select{}/time.After cycle and NOT in a Write() call), then I won't immediately see the error, so in the SIOCOUTQ check, I'm also grabbing SO_ERROR to look for any asynchronous errors that might result from the TCP_USER_TIMEOUT. Having said that, if the TCP_USER_TIMEOUT fires, the reader should also fail out of its blocking Read() and I'll detect it that way also.
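A combined SIOCOUTQ plus SO_ERROR check could be sketched like this without cgo. Assumptions: Linux only, Go 1.9 or later for SyscallConn, and syscall.TIOCOUTQ used in place of SIOCOUTQ (the kernel headers define them as the same value). The names sockState, probe and demo are hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"syscall"
	"unsafe"
)

// sockState is what the periodic check looks at.
type sockState struct {
	queued  int   // bytes written but not yet acknowledged (Send-Q)
	pending error // asynchronous socket error, nil if none
}

// probe reads SIOCOUTQ and SO_ERROR. Reading SO_ERROR also clears it.
func probe(conn *net.TCPConn) (sockState, error) {
	raw, err := conn.SyscallConn()
	if err != nil {
		return sockState{}, err
	}
	var outq int32 // the ioctl stores a C int
	var soErr int
	var errno syscall.Errno
	var optErr error
	err = raw.Control(func(fd uintptr) {
		_, _, errno = syscall.Syscall(syscall.SYS_IOCTL, fd,
			uintptr(syscall.TIOCOUTQ), uintptr(unsafe.Pointer(&outq)))
		soErr, optErr = syscall.GetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_ERROR)
	})
	switch {
	case err != nil:
		return sockState{}, err
	case errno != 0:
		return sockState{}, os.NewSyscallError("ioctl(SIOCOUTQ)", errno)
	case optErr != nil:
		return sockState{}, os.NewSyscallError("getsockopt(SO_ERROR)", optErr)
	}
	st := sockState{queued: int(outq)}
	if soErr != 0 {
		st.pending = syscall.Errno(soErr)
	}
	return st, nil
}

// demo probes a fresh loopback connection, which has nothing queued.
func demo() (sockState, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return sockState{}, err
	}
	defer ln.Close()
	c, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return sockState{}, err
	}
	defer c.Close()
	return probe(c.(*net.TCPConn))
}

func main() {
	st, err := demo()
	fmt.Println(st.queued, st.pending, err)
}
```

On a healthy connection queued drops back to 0 as soon as the peer's kernel acknowledges the data, so a non-zero value only matters if it persists across the check interval.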

Finally, if I get a flood of outbound messages on the channel and the TCP buffer ends up filling up, then at some point one of the calls to Write() will block and the WriteDeadline should then fire.

So I *think* I have the bases all covered; in testing, with the SIOCOUTQ check along with TCP_USER_TIMEOUT and the write deadline I am now "failing fast" within my 30s timeout.