This is a thread for everyone who's having the "dead connection"
problem to log their details to help John K from Twitter diagnose
what could be causing the problem.
Here's my first post. Others should follow (approximately) this
format. All times should be in UTC; if you need help converting them,
use this webpage: http://www.timezoneconverter.com/cgi-bin/tzc.tzc
------
Account name: fennb
Time of death: 15:12:00 February 16, 2010 UTC
Symptoms: Absolutely nothing coming through stream - tcpdump reveals
NO packets, stream marked as ESTABLISHED in linux conntrack/kernel
------
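For anyone wanting to catch this symptom in their own client: a connection that is ESTABLISHED but silent will block a plain read forever, so the client needs a read timeout. Below is a minimal Python sketch, not taken from Phirehose; the function name `read_or_detect_silence` and the local socket pair standing in for the stream are assumptions for illustration only.

```python
import socket


def read_or_detect_silence(sock, timeout_seconds, bufsize=4096):
    """Return received bytes, or None if nothing arrived within the timeout.

    Hypothetical helper. A TCP connection can stay ESTABLISHED while no
    packets arrive, so recv() without a timeout could wait indefinitely.
    Note that recv() returns b"" (not None) when the peer closes cleanly.
    """
    sock.settimeout(timeout_seconds)
    try:
        return sock.recv(bufsize)
    except socket.timeout:
        return None


if __name__ == "__main__":
    # A local socket pair stands in for the real stream connection.
    client, server = socket.socketpair()
    print(read_or_detect_silence(client, 0.2))  # None: nothing was sent
    server.sendall(b"status\r\n")
    print(read_or_detect_silence(client, 0.2))  # the bytes just sent
    client.close()
    server.close()
```

A None result here only says the socket was silent for the chosen interval; whether that counts as "dead" depends on how quiet the stream normally is.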
Incidentally, from the other thread - all other users reported the
same error at exactly the same time, which means there was definitely
_something_ happening at this point.
Cheers/thanks,
Fenn.
--
Adam Green
CEO, Grazr Corp
ad...@alertrank.com
781-879-2960
Our sites:
http://alertrank.com
http://grazr.com
http://topstocktweets.com
------
Account name: fennb
Time of death: 17:25:00 February 18, 2010 UTC
Symptoms: Dead connection - see logs below (in GMT+11:00):
--
Phirehose: Fri, 19 Feb 2010 04:29:51 +1100 Idle timeout: No statuses
received for > 300 seconds. Reconnecting.
Phirehose: Fri, 19 Feb 2010 04:29:51 +1100 Closing Phirehose
connection.
--
Note I'm using phirehose 0.2.3-alpha, which has the new reconnection
code in it (which does seem to work by the looks of it).
------
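Since the log above is in GMT+11:00 and reports need UTC, here is a small Python sketch of the conversion. It assumes the log timestamp is in RFC 2822 form (as in the Phirehose lines above) and that the last status arrived roughly one idle timeout (300 seconds) before the log line; the function name `estimate_time_of_death_utc` is made up for this example.

```python
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime


def estimate_time_of_death_utc(log_timestamp, idle_timeout_seconds=300):
    """Convert an RFC 2822 timestamp to UTC and subtract the idle timeout.

    Hypothetical helper: the result is only an approximation of when the
    last status was received, not an exact time of death.
    """
    detected_at = parsedate_to_datetime(log_timestamp).astimezone(timezone.utc)
    return detected_at - timedelta(seconds=idle_timeout_seconds)


if __name__ == "__main__":
    estimate = estimate_time_of_death_utc("Fri, 19 Feb 2010 04:29:51 +1100")
    print(estimate.isoformat())  # 2010-02-18T17:24:51+00:00
```

That lands within a few seconds of the 17:25:00 UTC reported above.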
Anything else you need from us, John?
Cheers,
Fenn.
17:25:00 Thursday February 18, 2010 in UTC
I'll be switching to 0.2.3 tomorrow. What do I have to add to my code
to get the reconnect to work?
No code changes required. It comes with an idleReconnectTimeout of 300
seconds. You can override it if necessary (though you should never go
lower than 120 seconds).
Keep in mind that, at this point, it specifically checks for STATUSES
within that window, not just TCP activity (which I may change). This
means that if you're monitoring a very quiet stream, you may get "false
positive" disconnects.
Once we've narrowed this problem down a bit more with Twitter, I'll
probably tweak the code for the final release of 0.2.3 to have a
separate timeout for TCP activity, but the alpha version will get you
by fine whilst we're sorting that out.
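To make the statuses-versus-TCP-activity distinction concrete, here is a rough Python sketch of a watchdog with two separate idle clocks. This is not Phirehose code: the class `StreamWatchdog`, its method names, and the 120-second TCP default are all assumptions for illustration, and it presumes the server sends at least some bytes (for example keep-alive newlines) on a quiet stream.

```python
import time


class StreamWatchdog:
    """Tracks two idle clocks: any bytes received vs. whole statuses received.

    Hypothetical helper, not part of Phirehose.
    """

    def __init__(self, tcp_idle_timeout=120, status_idle_timeout=300,
                 clock=time.monotonic):
        self.tcp_idle_timeout = tcp_idle_timeout
        self.status_idle_timeout = status_idle_timeout
        self._clock = clock
        now = clock()
        self._last_bytes = now
        self._last_status = now

    def saw_bytes(self):
        """Call whenever any data arrives on the socket."""
        self._last_bytes = self._clock()

    def saw_status(self):
        """Call whenever a complete status arrives (this is also TCP activity)."""
        now = self._clock()
        self._last_bytes = now
        self._last_status = now

    def check(self):
        """Return "dead" (no bytes at all), "quiet" (bytes but no statuses), or "ok"."""
        now = self._clock()
        if now - self._last_bytes > self.tcp_idle_timeout:
            return "dead"
        if now - self._last_status > self.status_idle_timeout:
            return "quiet"
        return "ok"


if __name__ == "__main__":
    fake_now = [0.0]  # injected clock so the demo runs instantly
    dog = StreamWatchdog(clock=lambda: fake_now[0])
    fake_now[0] = 100.0
    dog.saw_bytes()
    fake_now[0] = 200.0
    dog.saw_bytes()
    fake_now[0] = 310.0
    print(dog.check())  # quiet: bytes 110 s ago, last status 310 s ago
    fake_now[0] = 400.0
    print(dog.check())  # dead: no bytes for 200 s
```

With something like this, only "dead" would need to force a reconnect; "quiet" could just be logged, which would avoid the false positives on low-volume streams.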
Cheers,
Fenn.
Account name: misja
Time of death: 17:25:00 GMT/UTC, February 18, 2010
2010-02-18T17:24:34+00:00 INFO (6): Consume rate: 1 status/sec (44
total), avg enqueueStatus(): 0.82ms, avg checkFilterPredicates():
0.02ms (11 total) over 60 seconds.
2010-02-18T17:29:47+00:00 INFO (6): Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
Best,
Misja
My log shows a reconnection by version 0.2.3 at 17:41 UTC after a
5-minute period with no tweets. That is one of my busiest times for
tweets, so I am assuming there were tweets available. I'd call that a
connection failure.
I'm upgrading to 0.2.3 now and I'll run this for a bit and report
back.
Jason
account: misja
time of death: 17:36:00, February 19, 2010
[18-Feb-2010 05:56:55] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[18-Feb-2010 17:29:13] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 02:06:21] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 02:17:19] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 02:22:54] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 02:28:29] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 02:36:11] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 03:05:44] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 03:15:45] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 03:22:07] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 03:30:42] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 03:36:18] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 03:46:35] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 03:52:10] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 04:05:38] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 04:10:59] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 04:24:12] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 04:29:47] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 04:35:22] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 04:40:57] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 04:53:04] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 05:17:19] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 05:36:59] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 05:48:20] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 06:05:24] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 06:26:45] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 06:32:20] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 06:42:44] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 06:55:07] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 07:05:07] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 07:29:55] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 17:41:15] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
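A long list like the one above is easier to read as gaps between reconnects. Here is a short Python sketch that does that, assuming the bracketed timestamp format shown in this log and English month abbreviations; the function name `reconnect_gaps` is made up for this example.

```python
import re
from datetime import datetime

# Matches a leading "[19-Feb-2010 02:17:19]" style timestamp.
TIMESTAMP = re.compile(r"^\[(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2})\]")


def reconnect_gaps(log_text):
    """Return the seconds between consecutive "Idle timeout" log entries.

    Hypothetical helper. Wrapped continuation lines carry no timestamp and
    are skipped. %b parsing assumes an English/C locale.
    """
    times = []
    for line in log_text.splitlines():
        match = TIMESTAMP.match(line.strip())
        if match and "Idle timeout" in line:
            times.append(datetime.strptime(match.group(1), "%d-%b-%Y %H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    sample = """\
[19-Feb-2010 02:17:19] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 02:22:54] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 02:28:29] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
"""
    print(reconnect_gaps(sample))  # [335, 335]
```

Gaps only slightly longer than the 300-second timeout may mean the stream went idle again right after reconnecting, though a naturally quiet stream would produce the same pattern.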
There was nothing unusual in the logs across the cluster around this
time.
I'm assuming that, after detecting the problem at 15:12 and diagnosing
it, you manually disconnected and did not reconnect.
Our external monitor timed out on a connection to a different cluster
at 15:06, but the stream.twitter.com cluster showed no problems and a
steady reception of statuses through this period.
Moving on to other reports.
This timeout at 17:20 was the only error detected for the entire day.
In the server logs, I see twindicator log in at 17:34:31, which logs
out the previous twindicator connection, which otherwise had no errors.
The old connection had lasted 87864 seconds and delivered 10886 messages.
Interesting.
Looking at the 20100219 report, I don't see any timeouts on any
cluster around this time. The server logs are also clean. This may be
a different situation than the others examined so far.
Mmm.
Mmm.
-John