best practice for dealing with streaming api connection close


AJ

Jun 14, 2009, 6:31:00 PM
to Twitter Development Talk
The streaming API is great, but it sometimes closes the connection for
whatever reason. My realtime system must figure out when to reconnect
automatically. The auto-reconnection can't blindly request a
connection whenever it is not connected; otherwise it will flood the
API and may cause the API to ban or refuse the user's request. It's
bad to bombard the API server with repeated connection requests.
Could the API team recommend some best practices for dealing with
auto-reconnection?

Maybe a certain error code or error message could indicate the cause of
the dropped connection and the wait time before the next connection
request. I just got a long list of exceptions from the streaming API as
a result of repeated connection attempts, and the different messages are:

twitter4j.TwitterException: Address already in use: connect
twitter4j.TwitterException: Authentication credentials were missing or
incorrect.
twitter4j.TwitterException: Connection refused: connect
twitter4j.TwitterException: No route to host: connect
twitter4j.TwitterException: Stream closed.
twitter4j.TwitterException: The request is understood, but it has been
refused. An accompanying error message will explain why.
twitter4j.TwitterException: connect timed out

How can I prevent this kind of repeated connection request?

thanks,
aj

John Kalucki

Jun 14, 2009, 11:14:49 PM
to Twitter Development Talk
AJ,

If you had a valid connection and the connection drops, reconnect
immediately. This is encouraged!

If you attempt a connection and get a TCP or IP level error, back off
exponentially, but cap the backoff to something fairly short. Perhaps
start at 20 milliseconds, double on each failed attempt, and cap at 15
seconds. There's probably a transitory network problem and it will
probably clear up quickly.

If you get an HTTP error (4XX), back off exponentially, but cap the
backoff at something longer. Perhaps start at 250 milliseconds, double
on each failed attempt, and cap at 120 seconds. Whatever has caused the
issue isn't going away anytime soon. There's not much point in polling
any faster and you are just more likely to run afoul of some rate limit.
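
The two schedules above can be sketched as a small helper. This is only an illustration under stated assumptions: the class name ReconnectBackoff and its methods are hypothetical and not part of twitter4j or the streaming API, and the numbers are simply the "perhaps" values suggested above.

```java
/**
 * Doubling backoff with a cap. Hypothetical helper for illustration;
 * not part of twitter4j or any Twitter library.
 */
final class ReconnectBackoff {
    private final long initialMillis;
    private final long capMillis;
    private long nextMillis;

    ReconnectBackoff(long initialMillis, long capMillis) {
        if (initialMillis <= 0 || capMillis < initialMillis) {
            throw new IllegalArgumentException("need 0 < initialMillis <= capMillis");
        }
        this.initialMillis = initialMillis;
        this.capMillis = capMillis;
        this.nextMillis = initialMillis;
    }

    /** Suggested schedule for TCP/IP-level errors: 20 ms, doubling, capped at 15 s. */
    static ReconnectBackoff forNetworkError() {
        return new ReconnectBackoff(20, 15_000);
    }

    /** Suggested schedule for HTTP 4XX errors: 250 ms, doubling, capped at 120 s. */
    static ReconnectBackoff forHttpError() {
        return new ReconnectBackoff(250, 120_000);
    }

    /** Returns the delay to sleep before the next attempt, then doubles it up to the cap. */
    long nextDelayMillis() {
        long delay = nextMillis;
        nextMillis = Math.min(capMillis, nextMillis * 2);
        return delay;
    }

    /** Call once a connection succeeds, so a later drop starts again from the initial delay. */
    void reset() {
        nextMillis = initialMillis;
    }
}
```

A client would call Thread.sleep(backoff.nextDelayMillis()) before each retry after a failed attempt, and reset() once a connection is established, so that a drop of a previously valid connection can be retried right away.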

The service is fairly lenient. You aren't going to get banned for a
few dozen bungled connections here and there. But, if you do anything
in a while loop that also doesn't have a sleep, you'll eventually get
the hatchet for some small number of minutes. If you get the hatchet
repeatedly, you'll be cut off for an indeterminate period of time.

There are four main reasons to have your connection closed:
* Duplicate client logins (earlier connections terminated)
* Hosebird server restarts (code deploys)
* Lagging connection getting thrown off (client too slow, or
insufficient bandwidth)
* General Twitter network maintenance (Load balancer restarts, network
reconfigurations, other very very rare events)

We plan to have enough spare capacity on the surviving servers to
absorb the load from server restarts. You must ensure that your client
is fast enough and that you have sufficient bandwidth and a stable
enough connection to consume your stream. I usually see connections
that survive for a few days before mysteriously being dropped. Just
reconnect in these cases.

-John Kalucki
Services, Twitter Inc.

AJ Chen

Jun 15, 2009, 12:06:16 AM
to twitter-deve...@googlegroups.com
John, great information. Thanks a lot. I'll put in a proper wait time before the next reconnection.
-aj
--
AJ Chen, PhD
Co-Chair, Semantic Web SIG, sdforum.org
Technical Architect, healthline.com
http://web2express.org
Palo Alto, CA

danielo

Jun 23, 2009, 6:49:49 PM
to Twitter Development Talk
I had a similar question. I think you've mostly answered it, but I
want to be clear so as to avoid harassing the API.

I'm developing a client to connect to the streaming API (nothing fancy
at the moment; just spritzer), and of course, I'm bungling it up
regularly. I'll hack a bunch, try it, watch it break, shut it down,
and hack some more. Is there a practical limit at which point I should
apply the human throttle-back? Or is there no realistic human limit at
which I risk a ban from the streaming service? I imagine that if a 15-
second wait period is sufficient to avoid bad things, the more likely
1-to-2-minute wait between my attempts will be fine. I ask,
nonetheless, as my repeated requests will persist for the duration of
my work, whereas a running client would (hopefully) snag a valid
connection after some time and stop "spamming" at that point.

Thanks!


AJ Chen

Jun 23, 2009, 8:34:28 PM
to twitter-deve...@googlegroups.com
I use two levels of control, which seem to be working smoothly.
1. When an exception is thrown, check whether it is a type that results from a dropped connection, i.e. an IOException, an HTTP 4xx status code, or a matching error message; reconnect only if it is. There may be many other types of exceptions, but don't reconnect in those cases. Normally, I only notice a couple of disconnections a day.
2. Set a maximum number of reconnections; when the maximum is reached, don't auto-reconnect, but require a manual reconnect instead. This way, if the API server goes wrong, you won't bombard it.
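
A rough sketch of these two checks, assuming the caller can supply the thrown exception and an HTTP status code (with -1 standing for "no HTTP response"). The class ReconnectPolicy and its method names are made up for illustration and are not twitter4j API; matching on error messages is left out.

```java
import java.io.IOException;

/**
 * Two-level reconnect guard: classify the failure, then enforce a retry limit.
 * Hypothetical helper for illustration; not part of twitter4j.
 */
final class ReconnectPolicy {
    private final int maxAttempts;
    private int attempts;

    ReconnectPolicy(int maxAttempts) {
        if (maxAttempts < 0) {
            throw new IllegalArgumentException("maxAttempts must not be negative");
        }
        this.maxAttempts = maxAttempts;
    }

    /** Level 1: true for an I/O error (direct or as the cause) or an HTTP 4xx status. */
    static boolean isConnectionFailure(Throwable error, int httpStatus) {
        boolean ioError = error != null
                && (error instanceof IOException || error.getCause() instanceof IOException);
        boolean clientError = httpStatus >= 400 && httpStatus < 500;
        return ioError || clientError;
    }

    /** Level 2: allow an automatic reconnect only while the attempt budget lasts. */
    boolean shouldReconnect(Throwable error, int httpStatus) {
        if (!isConnectionFailure(error, httpStatus)) {
            return false; // unrelated exception: do not reconnect
        }
        if (attempts >= maxAttempts) {
            return false; // budget spent: wait for a manual reconnect
        }
        attempts++;
        return true;
    }

    /** Call after a successful connection to restore the full attempt budget. */
    void onConnected() {
        attempts = 0;
    }
}
```

When shouldReconnect returns true, the client should still sleep before retrying rather than reconnecting in a tight loop.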

-aj
--
AJ Chen, PhD
Co-Chair, Semantic Web SIG, sdforum.org

John Kalucki

Jun 23, 2009, 11:23:54 PM
to Twitter Development Talk
Don't worry about getting banned while you are doing development. A
code/test/debug cycle isn't going to get you into any permanent
trouble, or any trouble at all. If you happen to get locked out, wait
a few minutes, and you'll be back in.

The thing to look for is automated continuous reconnects in the face
of 4XX errors. Always sleep and back-off in these cases. If your
password is wrong, or your parameter list is invalid, it's going to be
just as wrong 60 seconds from now.

-John Kalucki
Services, Twitter Inc.


