tcp i/o timeout issue


jono...@gmail.com

Oct 16, 2013, 1:03:31 PM
to golan...@googlegroups.com
Hello Friendly Gophers,

We have an issue where we are getting this error for a number of different services:

read tcp [ip]:[port]: i/o timeout

It seems to happen to many of our services regardless of what they connect to (cassandra, memcache, zookeeper, etc.), so I am thinking it is an OS-level issue and not a code-specific one. We are running on Ubuntu boxes in Amazon, specifically m1.small instances, if you are familiar. These are fairly underpowered machines, but they provide enough for our purposes (or, in light of this issue, perhaps not).

Originally we thought it might be a file descriptor issue but we don't have that many open connections (netstat says ~1000).  

What could cause these stray i/o timeouts in the net package?  Has anyone else seen this using the "net/http" package?

Thanks
Jono

James Bardin

Oct 16, 2013, 2:10:30 PM
to golan...@googlegroups.com, jono...@gmail.com


On Wednesday, October 16, 2013 1:03:31 PM UTC-4, jono...@gmail.com wrote:


What could cause these stray i/o timeouts in the net package?  Has anyone else seen this using the "net/http" package?


Are you filtering anything with iptables on the servers?

This is usually what happens when the remote server disappears (i.e. the network goes down): you send some data and wait for a response that never comes, until the read times out. Short of a network partition, this could come from an improperly configured firewall that loses track of connection state (a very busy server can only track so many connections at once, or the conntrack state times out) and silently drops the rest. It's been a while since I've looked at their config, but Red Hat (and CentOS) based systems used to have problems like this occasionally, because they relied on conntrack state for everything, accepted only incoming connections flagged as NEW or ESTABLISHED, and silently dropped "bad" packets.
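
As a rough illustration of the "response that never comes" case, here is a minimal Go sketch. It assumes a loopback listener that accepts a connection and then stays silent, standing in for a peer whose replies are dropped. The function name readFromSilentPeer and the 100 ms deadline are made up for the example and do not come from any library mentioned in this thread.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// readFromSilentPeer (hypothetical helper) dials a local listener that accepts
// the connection but never writes, then reads with a deadline. It reports
// whether the read failed with a timeout error.
func readFromSilentPeer(timeoutMillis int) (bool, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer ln.Close()

	// Server side: accept and hold the connection open without replying.
	done := make(chan struct{})
	defer close(done)
	go func() {
		c, err := ln.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		<-done
	}()

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return false, err
	}
	defer conn.Close()

	// The deadline is an absolute point in time, not a per-call duration.
	deadline := time.Now().Add(time.Duration(timeoutMillis) * time.Millisecond)
	if err := conn.SetReadDeadline(deadline); err != nil {
		return false, err
	}

	buf := make([]byte, 1)
	_, err = conn.Read(buf)

	// A timeout surfaces as a net.Error whose Timeout method returns true;
	// its message ends in "i/o timeout".
	var nerr net.Error
	if errors.As(err, &nerr) && nerr.Timeout() {
		return true, nil
	}
	return false, err
}

func main() {
	timedOut, err := readFromSilentPeer(100)
	fmt.Println("timed out:", timedOut, "err:", err)
}
```

Checking Timeout() on the returned error is one way to tell this case apart from a reset or refused connection when logging.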


jona...@hailocab.com

Oct 17, 2013, 10:17:14 AM
to golan...@googlegroups.com, jono...@gmail.com
We have been able to reproduce this locally on a Mac with all the services on one machine, so it is neither an OS-specific issue nor a network error. I have set the timeout to 60s (in gossie) and I am still seeing the error, so it is way beyond the deadline. Is there a chance that connections are being reused in the net package without the timeout being reset?

The libraries we are seeing this in are:
Gossie (https://github.com/carloscm/gossie)

Any more thoughts...?

James Bardin

Oct 17, 2013, 11:48:52 AM
to golan...@googlegroups.com, jono...@gmail.com, jona...@hailocab.com



Are you only able to reproduce this with gossie?
I ask because that gossie library has been abandoned, and the thrift library it's using may be a dead fork as well. 

jona...@hailocab.com

Oct 17, 2013, 11:52:32 AM
to golan...@googlegroups.com, jono...@gmail.com, jona...@hailocab.com
No, it happens on memcache and go-zookeeper as well.

It is unfortunate that gossie is now abandoned. We are not really sure what to do about that one.  But that is another issue.  

Interestingly, we don't see this in the nsq library, which sets its timeout in a similar way (using conn.SetDeadline()).
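
On the earlier question about reused connections: a deadline set with conn.SetDeadline() is an absolute time that stays on the connection until it is changed, so a connection that is reused without a fresh deadline can fail right away. The sketch below is only an illustration of that behaviour under assumed conditions (a loopback server that replies "pong" immediately, a 20 ms deadline, a 100 ms idle period). The name staleDeadlineDemo is hypothetical, and this is not a claim about what gossie, gomemcache, or go-zookeeper actually do.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"time"
)

// isTimeout reports whether err is a network timeout.
func isTimeout(err error) bool {
	var nerr net.Error
	return errors.As(err, &nerr) && nerr.Timeout()
}

// staleDeadlineDemo (hypothetical helper) sets a deadline once, lets the
// connection sit idle past it, and then reads twice: first with the stale
// deadline, then after refreshing it. It returns whether the stale read timed
// out and whether the refreshed read received the server's reply.
func staleDeadlineDemo() (bool, bool, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, false, err
	}
	defer ln.Close()

	// Server side: reply immediately, then keep the connection open.
	done := make(chan struct{})
	defer close(done)
	go func() {
		c, err := ln.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		c.Write([]byte("pong"))
		<-done
	}()

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return false, false, err
	}
	defer conn.Close()

	// Deadline set once, as if left over from an earlier request.
	if err := conn.SetDeadline(time.Now().Add(20 * time.Millisecond)); err != nil {
		return false, false, err
	}
	// The connection sits idle (for example in a pool) past that deadline.
	time.Sleep(100 * time.Millisecond)

	buf := make([]byte, 4)
	// The peer has already replied, but the expired deadline makes the read fail.
	_, staleErr := conn.Read(buf)
	staleTimedOut := isTimeout(staleErr)

	// Setting a deadline in the future makes the connection usable again.
	if err := conn.SetDeadline(time.Now().Add(2 * time.Second)); err != nil {
		return staleTimedOut, false, err
	}
	_, freshErr := io.ReadFull(conn, buf)
	refreshedOK := freshErr == nil && string(buf) == "pong"

	return staleTimedOut, refreshedOK, nil
}

func main() {
	stale, refreshed, err := staleDeadlineDemo()
	fmt.Println("stale read timed out:", stale, "refreshed read ok:", refreshed, "err:", err)
}
```

If a client library pools connections, one thing to check is whether it calls SetDeadline (or SetReadDeadline) immediately before each request rather than once at dial time.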

D

Apr 2, 2015, 3:17:47 AM
to golan...@googlegroups.com, jono...@gmail.com, jona...@hailocab.com
Hi,

Any update on this? I am also getting the same issue with Memcache (https://github.com/bradfitz/gomemcache)

Thanks,

Brad Fitzpatrick

Apr 2, 2015, 4:13:33 AM
to D, golang-nuts, jono...@gmail.com, jona...@hailocab.com
D,

Your bug report lacks critical information: Go version, operating system, operating system version, etc. And relevant code / repro would be nice too.




Debraj Manna

Apr 2, 2015, 5:28:54 AM
to Brad Fitzpatrick, golang-nuts, jono...@gmail.com, jona...@hailocab.com
Go Version go1.4.1 linux/amd64
OS - Linux XXX 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1+deb7u2 x86_64 GNU/Linux

The error message I am seeing:
read tcp 127.0.0.1:11290: i/o timeout

I see this problem mostly with GetMulti() calls when memcache is under load.

I cannot share the exact code right now. I will try to put together a toy repro of the problem soon.