The dead connection details thread


Fenn

Feb 17, 2010, 7:50:39 PM
to Phirehose Users
Hi All,

This is a thread for everyone who's having the "dead connection"
problem to log their details to help John K from Twitter diagnose
what could be causing the problem.

Here's my first post. Others should follow (approximately) this
format. All times should be in UTC; if you need help converting them,
use this webpage: http://www.timezoneconverter.com/cgi-bin/tzc.tzc
------
Account name: fennb
Time of death: 15:12:00 February 16, 2010 UTC
Symptoms: Absolutely nothing coming through stream - tcpdump reveals
NO packets, stream marked as ESTABLISHED in linux conntrack/kernel
------

Incidentally, from the other thread - all other users reported the
same error at exactly the same time, which means there was definitely
_something_ happening at this point.

Cheers/thanks,

Fenn.

Adam Green

Feb 18, 2010, 12:42:44 PM
to phireho...@googlegroups.com
Connection failure
Account name: twindicator
Time of death: 17:25:00 February 18, 2010 UTC
Symptoms: No tweets received, process still running
Version of Phirehose: 0.2.2

--
Adam Green
CEO, Grazr Corp
ad...@alertrank.com
781-879-2960

Our sites:
http://alertrank.com
http://grazr.com
http://topstocktweets.com

Fenn

Feb 18, 2010, 7:25:01 PM
to Phirehose Users
I can confirm Adam's dead connection - I experienced the same thing at
EXACTLY the same time:

------
Account name: fennb
Time of death: 17:25:00 February 18, 2010 in UTC
Symptoms: Dead connection - see logs below (in GMT+11:00):
--
Phirehose: Fri, 19 Feb 2010 04:29:51 +1100 Idle timeout: No statuses
received for > 300 seconds. Reconnecting.
Phirehose: Fri, 19 Feb 2010 04:29:51 +1100 Closing Phirehose
connection.
--
Note I'm using phirehose 0.2.3-alpha, which has the new reconnection
code in it (which does seem to work by the looks of it).
------
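For anyone converting log times by hand, here is a minimal Python sketch of the arithmetic used above: it parses an RFC 2822 style timestamp such as the one in the log, converts it to UTC, and subtracts the idle timeout to estimate when statuses stopped arriving. The function name `estimate_time_of_death` is made up for illustration and is not part of Phirehose; the 300-second default is the timeout quoted in the log message.

```python
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime


def estimate_time_of_death(log_timestamp, idle_timeout_seconds=300):
    """Estimate (in UTC) when the stream went quiet.

    Hypothetical helper: assumes the log line was written at the moment
    the idle timeout fired, so the last status arrived roughly
    idle_timeout_seconds earlier.
    """
    reconnect_local = parsedate_to_datetime(log_timestamp)   # keeps the +1100 offset
    reconnect_utc = reconnect_local.astimezone(timezone.utc)
    return reconnect_utc - timedelta(seconds=idle_timeout_seconds)


death = estimate_time_of_death("Fri, 19 Feb 2010 04:29:51 +1100")
print(death.isoformat())  # 2010-02-18T17:24:51+00:00
```

The result lands within a few seconds of the 17:25:00 UTC reported above.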

Anything else you need from us, John?

Cheers,

Fenn.



Adam Green

Feb 18, 2010, 7:34:52 PM
to phireho...@googlegroups.com
Fenn:

I'll be switching to 0.2.3 tomorrow. What do I have to add to my code
to get the reconnect to work?

Fenn

Feb 18, 2010, 7:58:00 PM
to Phirehose Users
Hi Adam,

No code changes required. It comes with an idleReconnectTimeout of 300
seconds. You can override if necessary (though you should never go
lower than 120 seconds).

Keep in mind that, at this point, it specifically checks that STATUSES
arrive within that window, not just TCP activity (which I may change).
This means if you're monitoring a very quiet stream, you may get "false
positive" disconnects.

Once we've narrowed this problem down a bit more with Twitter, I'll
probably tweak the code for the final release of 0.2.3 to have a
separate timeout for TCP activity, but the alpha version will get you
by fine whilst we're sorting that out.
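
To illustrate the idea of separate timeouts, here is a rough Python sketch (not Phirehose code, which is PHP). The class name `IdleWatchdog`, its methods, and the 120-second TCP value are assumptions made up for this example; only the 300-second status timeout comes from the discussion above.

```python
import time


class IdleWatchdog:
    """Hypothetical helper that tracks two kinds of silence on a stream:
    no bytes at all (TCP idle) and no statuses (status idle)."""

    def __init__(self, status_timeout=300.0, tcp_timeout=120.0, clock=time.monotonic):
        self.status_timeout = status_timeout  # 300 s as in the 0.2.3 alpha
        self.tcp_timeout = tcp_timeout        # illustrative value, an assumption
        self.clock = clock
        now = clock()
        self.last_bytes = now
        self.last_status = now

    def on_bytes(self):
        """Call whenever any data arrives on the socket."""
        self.last_bytes = self.clock()

    def on_status(self):
        """Call whenever a complete status is decoded (also counts as bytes)."""
        now = self.clock()
        self.last_bytes = now
        self.last_status = now

    def reason_to_reconnect(self):
        """Return 'tcp_idle', 'status_idle', or None if the stream looks alive."""
        now = self.clock()
        if now - self.last_bytes > self.tcp_timeout:
            return "tcp_idle"
        if now - self.last_status > self.status_timeout:
            return "status_idle"
        return None
```

A read loop would call `on_bytes()` or `on_status()` as data arrives and poll `reason_to_reconnect()` periodically. Logging which of the two reasons fired would help tell a quiet stream apart from a connection that has gone completely silent.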

Cheers,

Fenn.

Misja Hoebe

Feb 19, 2010, 2:22:15 AM
to Phirehose Users
Dead connection at the same time here too :)

Account name: misja
Time of death: 17:25:00 GMT/UTC, February 18, 2010

2010-02-18T17:24:34+00:00 INFO (6): Consume rate: 1 status/sec (44
total), avg enqueueStatus(): 0.82ms, avg checkFilterPredicates():
0.02ms (11 total) over 60 seconds.
2010-02-18T17:29:47+00:00 INFO (6): Idle timeout: No statuses received
for > 300 seconds. Reconnecting.

Best,

Misja

Adam Green

Feb 19, 2010, 1:35:59 PM
to phireho...@googlegroups.com
account: twindicator
Time of death: 17:41:00 February 19, 2010 in UTC

My log shows a reconnection by version 0.2.3 at 17:41 UTC after a 5
minute period with no tweets. That is one of my busiest times for
tweets, so I am assuming there were tweets available. I'd call that a
connection failure.

Jason Striegel

Feb 19, 2010, 2:06:39 PM
to Phirehose Users
Thanks Fenn!

I'm upgrading to 0.2.3 now and I'll try to run this for a bit and report
back.

Jason

Misja Hoebe

Feb 19, 2010, 2:50:20 PM
to Phirehose Users
Yup, same here, a reconnect at 17:41 UTC

account: misja
time of death: 17:36:00, February 19, 2010

Sergi

Feb 19, 2010, 4:26:50 PM
to Phirehose Users
I've been running two queries since Thu Feb 18 00:43:06 UTC 2010, one
with 0.2.2 and the other with 0.2.3. With 0.2.2 there have been no
problems whatsoever, but with 0.2.3 I've been getting a lot of "No
statuses received for > 300 seconds" (including one at 17:41 UTC).
Times in UTC:

[18-Feb-2010 05:56:55] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[18-Feb-2010 17:29:13] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 02:06:21] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 02:17:19] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 02:22:54] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 02:28:29] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 02:36:11] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 03:05:44] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 03:15:45] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 03:22:07] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 03:30:42] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 03:36:18] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 03:46:35] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 03:52:10] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 04:05:38] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 04:10:59] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 04:24:12] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 04:29:47] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 04:35:22] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 04:40:57] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 04:53:04] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 05:17:19] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 05:36:59] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 05:48:20] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 06:05:24] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 06:26:45] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 06:32:20] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 06:42:44] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 06:55:07] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 07:05:07] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 07:29:55] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 17:41:15] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
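
A log like this is easier to read as gaps between consecutive timeouts. Below is a minimal Python sketch, assuming the bracketed timestamp format shown above; the function name `idle_timeout_gaps` is invented for this example. Gaps only slightly over 300 seconds would suggest that no statuses arrived at all after a reconnect, which would be consistent with the quiet-stream "false positive" case Fenn described, though that is an interpretation rather than a diagnosis.

```python
import re
from datetime import datetime

# Matches the start of an idle-timeout entry, e.g.
# "[19-Feb-2010 02:17:19] Phirehose: Idle timeout: ..."
STAMP = re.compile(
    r"^\[(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2})\] Phirehose: Idle timeout"
)


def idle_timeout_gaps(log_text):
    """Return the seconds between consecutive idle-timeout log entries."""
    stamps = []
    for line in log_text.splitlines():
        match = STAMP.match(line)
        if match:  # wrapped continuation lines are skipped
            stamps.append(datetime.strptime(match.group(1), "%d-%b-%Y %H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


sample = """\
[19-Feb-2010 02:17:19] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 02:22:54] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
[19-Feb-2010 02:28:29] Phirehose: Idle timeout: No statuses received
for > 300 seconds. Reconnecting.
"""
print(idle_timeout_gaps(sample))  # [335, 335]
```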

John Kalucki

Feb 20, 2010, 11:14:31 AM
to Phirehose Users
OK. This could get interesting. Looking at the server logs, I can see
fennb logging in at 20100215-01:28:16.757 and then logging out at
20100216-15:25:31.996 after 136635 seconds and 235483 messages.
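
For anyone who wants to cross-check figures like these against their own logs, here is a small Python sketch. It assumes the timestamp layout quoted above (date, hyphen, time with milliseconds); the function name `session_stats` is made up for illustration.

```python
from datetime import datetime

# Layout of the server-log timestamps quoted above, e.g. "20100215-01:28:16.757"
FMT = "%Y%m%d-%H:%M:%S.%f"


def session_stats(login, logout, messages):
    """Return (duration in seconds, average messages per second)."""
    start = datetime.strptime(login, FMT)
    end = datetime.strptime(logout, FMT)
    duration = (end - start).total_seconds()
    return duration, messages / duration


duration, rate = session_stats(
    "20100215-01:28:16.757", "20100216-15:25:31.996", 235483
)
print(int(duration))   # 136635
print(round(rate, 2))  # 1.72
```

The computed duration matches the 136635 seconds in the server log, and the average works out to a little under two messages per second over the life of the connection.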

There was nothing unusual in the logs across the cluster around this
time.

I'm assuming that after detecting the problem at 15:12, and
diagnosing, you manually disconnected and did not reconnect.

Our external monitor timed out on a connection to a different cluster
at 15:06, but the stream.twitter.com cluster showed no problems and a
steady reception of statuses through this period.

Moving on to other reports.

John Kalucki

Feb 20, 2010, 11:25:49 AM
to Phirehose Users
Interesting. Once again, a few minutes before, at
20100218-17:20:00.121, our external monitor timed out a connection to
a different cluster -- but detected no problems on stream.twitter.com
at or around this time.

This timeout at 17:20 was the only error detected for the entire day.

In the server logs, I see twindicator log in at 17:34:31, which logs
out the previous twindicator connection which otherwise had no errors.
The old connection had 87864 seconds and 10886 messages.

Interesting.

John Kalucki

Feb 20, 2010, 11:30:39 AM
to Phirehose Users
Nothing unusual in the server logs for this fennb connection. I see a
reconnect at 20100218-17:29:52.606, which logs out the older
connection.

Looking at the 20100219 report, I don't see any timeouts on any
cluster around this time. The server logs are also clean. This may be
a different situation than the others examined so far.

Mmm.

John Kalucki

Feb 20, 2010, 11:33:50 AM
to Phirehose Users
Nothing unusual in the server logs here.

Mmm.

John Kalucki

Feb 20, 2010, 11:35:15 AM
to Phirehose Users
With no account information, I can't look into this. Perhaps the query
is malformed?

-John
