How to reuse a TCP listening socket immediately after it has been connected at least once?


Igor Katson

May 24, 2009, 4:45:33 AM
to pytho...@python.org
I have written a socket server and some arbitrary clients. When I
shut down the server and call socket.close(), I cannot immediately start
it again because it still has some sockets in the TIME_WAIT state. It
throws an "address already in use" exception at me. I have searched
Google for this but haven't found a way to solve it.

Tried
setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
but that does not help.

Is there a nice way to overcome this?

Lawrence D'Oliveiro

May 24, 2009, 6:45:11 AM
In message <mailman.651.1243154...@python.org>, Igor Katson
wrote:

> I have written a socket server and some arbitrary clients. When I
> shutdown the server, and do socket.close(), I cannot immediately start
> it again cause it has some open sockets in TIME_WAIT state. It throws
> address already in use exception at me.

There's a reason for that. It's to ensure that there are no leftover packets
floating around the Internet somewhere, that you might mistakenly receive
and think they were part of a new connection, when they were in fact part of
an old one.

The right thing to do is try to ensure that all your connections are
properly closed at shutdown. That may not be enough (if your server crashes
due to bugs), so the other thing you need to do is retry the socket open,
say, at 30-second intervals, until it succeeds.
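
The retry approach described above could be sketched roughly as follows. This is only an illustration, assuming Python 3 (where socket errors are OSError); the function name bind_with_retry and its parameters are made up for the example and do not come from this thread.

```python
import errno
import socket
import time


def bind_with_retry(host, port, interval=30.0, max_attempts=None):
    """Return a listening socket, retrying while the address is in use.

    Hypothetical helper: interval is the pause between attempts in seconds,
    max_attempts=None means keep trying forever.
    """
    attempt = 0
    while True:
        attempt += 1
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            sock.listen(5)
            return sock
        except OSError as exc:
            sock.close()
            # Only "address already in use" is worth waiting out;
            # anything else (bad address, permissions) is re-raised.
            if exc.errno != errno.EADDRINUSE:
                raise
            if max_attempts is not None and attempt >= max_attempts:
                raise
            time.sleep(interval)
```

A server would call something like bind_with_retry("0.0.0.0", 8000) at startup; the port number here is arbitrary.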

Igor Katson

May 24, 2009, 6:46:24 AM
to pytho...@python.org
Igor Katson wrote:
> I have written a socket server and some arbitrary clients. When I
> shutdown the server, and do socket.close(), I cannot immediately start
> it again cause it has some open sockets in TIME_WAIT state. It throws
> address already in use exception at me. I have searched for that in
> google but haven't found a way to solve that.
>
> Tried
> setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
> but that does not help.
>
> Is there a nice way to overcome this?
Solved it myself. SO_REUSEADDR should be set when the second listening
socket is created (while the TIME_WAIT sockets are still hanging around).

Дамјан Георгиевски

May 24, 2009, 7:02:45 AM

This should work. AFAIK you only need to set it before you call .bind(..)
on the accepting socket.
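
For illustration, a minimal sketch of that ordering, assuming Python 3 and an IPv4 TCP socket. The helper name make_listener and the backlog value are invented for the example.

```python
import socket


def make_listener(host, port, backlog=5):
    """Create a listening TCP socket with SO_REUSEADDR set before bind()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option has to be applied to the socket that is about to bind,
    # and it has to happen before the bind() call itself.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock
```

Each time the server starts, it would build its listening socket this way, so the option is already in place when an earlier run has left connections in TIME_WAIT.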

--
дамјан ( http://softver.org.mk/damjan/ )

Give me the knowledge to change the code I do not accept,
the wisdom not to accept the code I cannot change,
and the freedom to choose my preference.

Roy Smith

May 24, 2009, 9:21:48 AM
In article <gvb8fn$7gm$1...@lust.ihug.co.nz>,

Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> wrote:

> In message <mailman.651.1243154...@python.org>, Igor Katson
> wrote:
>
> > I have written a socket server and some arbitrary clients. When I
> > shutdown the server, and do socket.close(), I cannot immediately start
> > it again cause it has some open sockets in TIME_WAIT state. It throws
> > address already in use exception at me.
>
> There's a reason for that. It's to ensure that there are no leftover packets
> floating around the Internet somewhere, that you might mistakenly receive
> and think they were part of a new connection, when they were in fact part of
> an old one.

In theory, that is indeed the reason for the TIME_WAIT state. In practice,
however, using SO_REUSEADDR is pretty safe, and common practice.

You've got several things working in your favor. First, late delivery of
packets is pretty rare. Second, if some late packet were to arrive, the
chance of its having the same local and remote port numbers as an
existing connection is slim. And, finally, the TCP sequence number won't
line up.

One thing to be aware of is that SO_REUSEADDR isn't 100% portable. There
are some systems (ISTR HP-UX) which use SO_REUSEPORT instead of
SO_REUSEADDR. The original specifications weren't very clear, and some
implementers read them in strange ways. Some of that old code continues in
use today. I only mention this because if you try SO_REUSEADDR and it's
not doing what you expect, it's worth trying SO_REUSEPORT (or both) to see
what happens on your particular system.
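
A rough sketch of "try both", assuming Python 3. The helper name set_reuse_options is made up for the example. Python only defines socket.SO_REUSEPORT on platforms that have it, so the code checks for the constant first. Be aware that on current Linux kernels SO_REUSEPORT also allows several sockets to listen on the same port, so it is a bigger change than SO_REUSEADDR.

```python
import socket


def set_reuse_options(sock):
    """Set SO_REUSEADDR, plus SO_REUSEPORT where the platform defines it.

    Returns the names of the options that were applied.
    """
    applied = ["SO_REUSEADDR"]
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # The constant is missing entirely on platforms without the option.
    reuseport = getattr(socket, "SO_REUSEPORT", None)
    if reuseport is not None:
        try:
            sock.setsockopt(socket.SOL_SOCKET, reuseport, 1)
            applied.append("SO_REUSEPORT")
        except OSError:
            # The constant exists but the system rejects it; carry on.
            pass
    return applied
```

Call it on the new socket before bind(), then check the returned list to see which options your system accepted.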

> The right thing to do is try to ensure that all your connections are
> properly closed at shutdown. That may not be enough (if your server crashes
> due to bugs), so the other thing you need to do is retry the socket open,
> say, at 30-second intervals, until it succeeds.

That may be a reasonable thing to do for production code, but when you're
building and debugging a server, it's a real pain to not be able to restart
it quickly whenever you want (or need) to.

Igor Katson

May 24, 2009, 10:44:49 AM
to Roy Smith, pytho...@python.org
Thanks for a great answer, Roy!

Lawrence D'Oliveiro

May 24, 2009, 10:59:24 PM

On the contrary, I run exactly the same logic--and that includes socket-
handling logic--in both test and production servers. How else can I be sure
it'll work properly in production?

Roy Smith

May 25, 2009, 7:27:37 AM
In article <gvd1id$8jj$2...@lust.ihug.co.nz>,

Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> wrote:

> In message <roy-3C4DED.0...@news.panix.com>, Roy Smith wrote:
>
> > In article <gvb8fn$7gm$1...@lust.ihug.co.nz>,
> > Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> wrote:
> >
> >> The right thing to do is try to ensure that all your connections are
> >> properly closed at shutdown. That may not be enough (if your server
> >> crashes due to bugs), so the other thing you need to do is retry the
> >> socket open, say, at 30-second intervals, until it succeeds.
> >
> > That may be a reasonable thing to do for production code, but when you're
> > building and debugging a server, it's a real pain to not be able to
> > restart it quickly whenever you want (or need) to.
>
> On the contrary, I run exactly the same logic--and that includes socket-
> handling logic--in both test and production servers. How else can I be sure
> it'll work properly in production?

If running without SO_REUSEADDR works for you, that's great. I was just
pointing out how it can be useful in cases such as the OP's, where he's
getting bind errors when he restarts his server.

Lawrence D'Oliveiro

May 27, 2009, 1:55:38 AM
In message <roy-50FE2E.0...@news.panix.com>, Roy Smith wrote:

> In article <gvd1id$8jj$2...@lust.ihug.co.nz>,
> Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> wrote:
>
>> In message <roy-3C4DED.0...@news.panix.com>, Roy Smith wrote:
>>
>> > In article <gvb8fn$7gm$1...@lust.ihug.co.nz>,
>> > Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> wrote:
>> >
>> >> The right thing to do is try to ensure that all your connections are
>> >> properly closed at shutdown. That may not be enough (if your server
>> >> crashes due to bugs), so the other thing you need to do is retry the
>> >> socket open, say, at 30-second intervals, until it succeeds.
>> >
>> > That may be a reasonable thing to do for production code, but when
>> > you're building and debugging a server, it's a real pain to not be able
>> > to restart it quickly whenever you want (or need) to.
>>
>> On the contrary, I run exactly the same logic--and that includes socket-
>> handling logic--in both test and production servers. How else can I be
>> sure it'll work properly in production?
>

> I was just pointing out how it can be useful in cases such as the OP's,
> where he's getting bind errors when he restarts his server.

And I was pointing out how important it was to make sure your code deals
gracefully with those errors.

Thomas Bellman

May 28, 2009, 6:42:05 AM
Roy Smith <r...@panix.com> wrote:

Speaking as a sysadmin, running applications for production,
programs not using SO_REUSEADDR should be taken out and shot.

You *can't* ensure that TCP connections are "properly closed".
For example, a *client* crashing, or otherwise becoming
unreachable, will leave TCP connections unclosed, no matter
what you do.

Not using SO_REUSEADDR means forcing a service interruption of
half an hour (IIRC) if for some reason the service must be
restarted, or having to reboot the entire machine. No thanks.
I have been in that situation.


--
Thomas Bellman, Lysator Academic Computer Club, Linköping University
"Never let your sense of morals prevent you ! Sweden ; +46-13 177780
from doing what is right." -- Salvor Hardin ! bel...@lysator.liu.se

Lawrence D'Oliveiro

May 28, 2009, 5:07:03 PM
In message <gvlppt$hk0$1...@news.lysator.liu.se>, Thomas Bellman wrote:

> Speaking as a sysadmin, running applications for production,
> programs not using SO_REUSEADDR should be taken out and shot.

> Not using SO_REUSEADDR means forcing a service interruption of
> half an hour (IIRC) if for some reason the service must be
> restarted, or having to reboot the entire machine.

No, you do not recall correctly. And anybody wanting to reboot a machine to
work around a "problem" like that should be taken out and shot.

Thomas Bellman

May 30, 2009, 1:44:57 AM
Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> wrote:

> In message <gvlppt$hk0$1...@news.lysator.liu.se>, Thomas Bellman wrote:

>> Speaking as a sysadmin, running applications for production,
>> programs not using SO_REUSEADDR should be taken out and shot.

>> Not using SO_REUSEADDR means forcing a service interruption of
>> half an hour (IIRC) if for some reason the service must be
>> restarted, or having to reboot the entire machine.

> No, you do not recall correctly.

*Tests* It seems to be 100 seconds in Fedora 9 and 60 seconds in
Solaris 10. OK, that amount of time is not totally horrible, in
many cases just annoying. Still, that is much longer than an interruption
of service that could have been just 1-2 seconds.

However, I *have* used systems where it took much longer. It was
slightly more than ten years ago, under an earlier version of
Solaris 2, probably 2.4. It may be that it only took that long
under certain circumstances that the application we used always
triggered, but we did have to wait several tens of minutes. It
was way faster to reboot the machine than waiting for the sockets
to time out.

> And anybody wanting to reboot a machine to
> work around a "problem" like that should be taken out and shot.

We weren't exactly keen on rebooting the machine, but it was the
fastest way of getting out of that situation that we could figure
out. How *should* we have dealt with it in your opinion?


--
Thomas Bellman, Lysator Computer Club, Linköping University, Sweden
"God is real, but Jesus is an integer." ! bellman @ lysator.liu.se
! Make Love -- Nicht Wahr!

Lawrence D'Oliveiro

May 31, 2009, 4:00:22 AM
In message <gvqh4p$obn$1...@news.lysator.liu.se>, Thomas Bellman wrote:

> We weren't exactly keen on rebooting the machine, but it was the
> fastest way of getting out of that situation that we could figure
> out. How *should* we have dealt with it in your opinion?

Remember, the TIME_WAIT timeout is there for a reason, and trying to defeat
it could reduce the reliability of your application--that's why cutting
corners is a bad idea.

If you want to minimize the effect of the timeout, then just use different
ports, and have the clients find them via DNS SRV records.
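
One possible way to "use different ports", sketched under assumptions: binding to port 0 asks the operating system for any unused port, and getsockname() reports which one it picked. The helper name listen_on_free_port is invented for the example. Publishing the chosen port in a DNS SRV record is not shown, since that depends on how your DNS is managed.

```python
import socket


def listen_on_free_port(host="127.0.0.1", backlog=5):
    """Bind to port 0 so the OS picks an unused port; return (socket, port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))
    sock.listen(backlog)
    # getsockname() returns (address, port) for an AF_INET socket.
    port = sock.getsockname()[1]
    return sock, port
```

The server would then have to advertise the returned port somewhere the clients can look it up.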
