best way to detect TCP self-connect

Michael Wojcik

Mar 24, 2004, 3:31:17 PM

I have a wrapper for connect() in which I need to implement self-
connect detection. In the event of a self-connect, the wrapper will
close the socket and respond as if it had received ECONNREFUSED.
(The documented semantics for this wrapper implicitly make self-
connect equivalent to a failure to connect at all.)

(For readers not familiar with TCP self-connect: it's permissible for
a socket to connect to itself if the local and remote ports match and
the remote IP address is in fact a local IP address, ie the address
of one of the local host's interfaces. This condition is most often
met when a process tries to connect to a local port in the ephemeral
range which is currently unused, and gets assigned that port as the
socket's local port.)

This wrapper has been around for years; the self-connect issue only
came to light because on Linux a bug caused it to loop through a
great number of connect attempts - sometimes several million in the
space of 5 seconds or so. Consequently it did a good job of walking
through the ephemeral port range. In this case the server it was
trying to connect to doesn't use a fixed port; instead it binds to 0
and then registers the port assigned by the stack with a port
registrar. After the server was shut down the registrar still had
the most recent port assignment, which was in the ephemeral range,
but the port was currently unused...

Anyway: I've implemented simple self-connect detection that just
calls getsockname() after a successful connect and compares both IP
address and port with the destination I was connecting to. If they
match, it's a self-connect.
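The check described above can be sketched roughly as follows. This is only a minimal Python illustration, not the wrapper itself; the names is_self_connect and connect_rejecting_self are hypothetical. It compares getsockname() with getpeername(), which after a successful connect should be the destination that was requested.

```python
import errno
import socket


def is_self_connect(sock):
    """Return True if the socket's local endpoint equals its peer endpoint.

    Only (address, port) is compared; IPv6 sockets return longer tuples,
    so anything after the first two fields is ignored.
    """
    return sock.getsockname()[:2] == sock.getpeername()[:2]


def connect_rejecting_self(host, port, timeout=None):
    """Connect to (host, port), treating a self-connect as a refusal.

    Hypothetical wrapper: on self-connect the socket is closed and the
    caller sees the same error as for ECONNREFUSED.
    """
    sock = socket.create_connection((host, port), timeout=timeout)
    if is_self_connect(sock):
        sock.close()
        raise ConnectionRefusedError(errno.ECONNREFUSED,
                                     "self-connect detected")
    return sock
```

A caller that already handles ConnectionRefusedError needs no further changes, which matches the documented "failure to connect" semantics.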

However, I was concerned that it might be possible on a multihomed
host (which would really include just about every host, if we count
the loopback interface) for the local address to use one local IP
address and the remote to use another, in which case I'd get a false
negative. Does anyone know offhand if that's legal, and if so of any
implementations that behave that way?

And does anyone have a suggestion for a better way of detecting self-
connect? In theory, I really ought to compare the destination IP
address against the list of local interfaces, but this wrapper runs
on many Unix platforms and Windows, and AFAIK there's no portable way
of enumerating local interfaces.

--
Michael Wojcik michael...@microfocus.com

The lark is exclusively a Soviet bird. The lark does not like the
other countries, and lets its harmonious song be heard only over the
fields made fertile by the collective labor of the citizens of the
happy land of the Soviets. -- D. Bleiman

glen herrmannsfeldt

Mar 25, 2004, 3:41:50 AM

Michael Wojcik wrote:

> I have a wrapper for connect() in which I need to implement self-
> connect detection. In the event of a self-connect, the wrapper will
> close the socket and respond as if it had received ECONNREFUSED.
> (The documented semantics for this wrapper implicitly make self-
> connect equivalent to a failure to connect at all.)
>
> (For readers not familiar with TCP self-connect: it's permissible for
> a socket to connect to itself if the local and remote ports match and
> the remote IP address is in fact a local IP address, ie the address
> of one of the local host's interfaces. This condition is most often
> met when a process tries to connect to a local port in the ephemeral
> range which is currently unused, and gets assigned that port as the
> socket's local port.)

(snip)

> However, I was concerned that it might be possible on a multihomed
> host (which would really include just about every host, if we count
> the loopback interface) for the local address to use one local IP
> address and the remote to use another, in which case I'd get a false
> negative. Does anyone know offhand if that's legal, and if so of any
> implementations that behave that way?

It still seems a little strange that self connect could happen.

My guess is that self connect, as you describe, is a problem because
(local IP, local port, remote IP, remote port) is exactly the same as
(remote IP, remote port, local IP, local port), though that would not
be true if two different addresses on the same host were used.

You would fail to detect it, but I don't see why it wouldn't work.

-- glen

Keith Wansbrough

Mar 25, 2004, 8:59:50 AM

glen herrmannsfeldt <g...@ugcs.caltech.edu> writes:

> Michael Wojcik wrote:
>
> > However, I was concerned that it might be possible on a multihomed
> > host (which would really include just about every host, if we count
> > the loopback interface) for the local address to use one local IP
> > address and the remote to use another, in which case I'd get a false
> > negative. Does anyone know offhand if that's legal, and if so of any
> > implementations that behave that way?
>
> It still seems a little strange that self connect could happen.
>
> My guess is that self connect, as you describe, is a problem because
> (local IP, local port, remote IP, remote port) is exactly the same as
> (remote IP, remote port, local IP, local port), though that would not
> be true if two different addresses on the same host were used.

I agree - the 4-tuples would be different, and so it would not be a
self-connection. Instead, it would most likely give ECONNREFUSED due
to the lack of a listening socket at the other endpoint.

--KW 8-)
--
Keith Wansbrough <kw...@cl.cam.ac.uk>
http://www.cl.cam.ac.uk/users/kw217/
University of Cambridge Computer Laboratory.

Michael Wojcik

Mar 25, 2004, 10:54:03 AM

In article <yqcr7vh...@astrocyte.cl.cam.ac.uk>, Keith Wansbrough <kw...@cl.cam.ac.uk> writes:
> glen herrmannsfeldt <g...@ugcs.caltech.edu> writes:
>
> > Michael Wojcik wrote:
> >
> > > However, I was concerned that it might be possible on a multihomed
> > > host (which would really include just about every host, if we count
> > > the loopback interface) for the local address to use one local IP
> > > address and the remote to use another, in which case I'd get a false
> > > negative. Does anyone know offhand if that's legal, and if so of any
> > > implementations that behave that way?
> >
> > It still seems a little strange that self connect could happen.

I was surprised too, but a quick Google review pulled up dozens of
references, and _TCP/IP Illustrated_ v2 has a section (28.9)
discussing it (along with simultaneous open). Apparently the BSD
stack prior to 4.4 didn't handle it correctly, which is one reason
it's not well-known.

Simultaneous open is, of course, a deliberate feature of TCP -
there are some comments in _TCP/IP Illustrated_ v1 about why it's
desirable and how TCP differs in this respect from eg OSI.

It's less clear that self-connect is a feature. I suppose it
might be useful for testing in some situations, or for processes
that wanted to use a socket as an internal queue (on a platform
that didn't already have a better mechanism for that, which seems
unlikely). Stevens implies that self-connect just falls out from
simultaneous open, which is plausible.

> > My guess is that self connect, as you describe, is a problem because
> > (local IP, local port, remote IP, remote port) is exactly the same as
> > (remote IP, remote port, local IP, local port), though that would not
> > be true if two different addresses on the same host were used.
>
> I agree - the 4-tuples would be different, and so it would not be a
> self-connection. Instead, it would most likely give ECONNREFUSED due
> to the lack of a listening socket at the other endpoint.

Maybe. In v2 Stevens says:

A process creates a socket and connects it to itself using the
system calls: socket, bind a local port (say 3000), and then
connect to this same port and some local IP address. (960)

The "some local IP address" bit is what worries me. It appears,
based on Stevens' description of the control flow (962), that any
local IP address will be a match, because the implementation he's
looking at - BSD 4.4 - handles an outbound packet for any local
interface by queuing it for the loopback interface.
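For anyone who wants to try the sequence Stevens describes, here is a rough Python sketch. It is an illustration under assumptions rather than anything from the book: the function name is hypothetical, it binds to port 0 and reuses whatever port the stack assigns instead of a fixed port such as 3000, and it returns None if the stack refuses the attempt.

```python
import socket


def try_deliberate_self_connect(host="127.0.0.1"):
    """Bind a socket to a local port, then connect it to that same port.

    Returns (local_endpoint, peer_endpoint, echoed_bytes) if the stack
    allows the self-connect, or None if the attempt fails or times out.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.settimeout(2.0)
        sock.bind((host, 0))          # let the stack choose a free port
        local = sock.getsockname()
        sock.connect(local)           # same address and port as ourselves
        sock.sendall(b"ping")         # bytes written should come back to us
        echoed = sock.recv(4)
        return local, sock.getpeername(), echoed
    except OSError:
        return None
    finally:
        sock.close()


if __name__ == "__main__":
    print(try_deliberate_self_connect())
```

Where the attempt succeeds, the local and peer endpoints reported are the same pair, which is exactly what a getsockname() comparison looks for.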

I ran across an interesting piece by Craig Milo Rogers on the
subject of self-connect in Linux.[1] It seems to imply that Linux
is unusual in allowing *accidental* self-connect, which is what's
happening here; that some (most?) implementations where self-
connect is possible at all (which it should be) have a check to
prevent assigning an ephemeral port which matches the destination
port (possibly only if the destination address is a local one,
though there's not really any need to make that check).

He believes he remembers Jon (Postel, presumably) recommending
this check in the stack as a defense against accidental self-
connect.

So unfortunately it appears that accidental self-connect is all
too possible on Linux, and that it may happen when source address
!= destination address, making it troublesome to detect. OTOH,
fixing my wrapper code so it only tries a reasonable number of
connects will make it *much* less likely, and checking for the
easy case where source address == destination address will catch
at least some of any accidental self-connects that sneak through.
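As a sketch of what "a reasonable number of connects" might look like, here is a bounded retry loop in Python. The names connect_with_retries and connect_once are hypothetical, and the attempt count and delay are arbitrary placeholder values, not figures from the real wrapper.

```python
import time


def connect_with_retries(connect_once, max_attempts=5, delay=0.5,
                         sleep=time.sleep):
    """Call connect_once() until it succeeds or max_attempts is exhausted.

    connect_once is any callable that returns a connected socket or raises
    OSError. The last error is re-raised once the attempts run out.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(max_attempts):
        try:
            return connect_once()
        except OSError as exc:
            last_error = exc
            if attempt + 1 < max_attempts:
                sleep(delay)          # pause between attempts, not after the last
    raise last_error
```

With a small fixed cap the client consumes only a handful of ephemeral ports per call, so hitting the one port that matches the destination becomes far less likely than with millions of attempts.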

1. http://www.ussg.iu.edu/hypermail/linux/kernel/9909.3/0510.html

--
Michael Wojcik michael...@microfocus.com

She felt increasingly (vision or nightmare?) that, though people are
important, the relations between them are not, and that in particular
too much fuss has been made over marriage; centuries of carnal
embracement, yet man is no nearer to understanding man. -- E M Forster

glen herrmannsfeldt

Mar 25, 2004, 3:53:59 PM

Michael Wojcik wrote:

(snip regarding TCP connections with the same source
and destination IP and port)

>>>My guess is that self connect, as you describe, is a problem because
>>>(local IP, local port, remote IP, remote port) is exactly the same as
>>>(remote IP, remote port, local IP, local port), though that would not
>>>be true if two different addresses on the same host were used.

>>I agree - the 4-tuples would be different, and so it would not be a
>>self-connection. Instead, it would most likely give ECONNREFUSED due
>>to the lack of a listening socket at the other endpoint.

OH, I thought the idea was that there is a listening socket.

Say I decide to run a telnet server on port 60000 hoping
that no-one will notice. (Security through obscurity.)

> Maybe. In v2 Stevens says:

> A process creates a socket and connects it to itself using the
> system calls: socket, bind a local port (say 3000), and then
> connect to this same port and some local IP address. (960)

> The "some local IP address" bit is what worries me. It appears,
> based on Stevens' description of the control flow (962), that any
> local IP address will be a match, because the implementation he's
> looking at - BSD 4.4 - handles an outbound packet for any local
> interface by queuing it for the loopback interface.

> I ran across an interesting piece by Craig Milo Rogers on the
> subject of self-connect in Linux.[1] It seems to imply that Linux
> is unusual in allowing *accidental* self-connect, which is what's
> happening here; that some (most?) implementations where self-
> connect is possible at all (which it should be) have a check to
> prevent assigning an ephemeral port which matches the destination
> port (possibly only if the destination address is a local one,
> though there's not really any need to make that check).

I might have thought that, before assigning ephemeral ports,
the system would check that the port wasn't already in use.

Though as machines could make tens of thousands of connections
that would be a little too strict. Checking that the same
quad wasn't already in use wouldn't catch self connect.

Otherwise, if I telnet localhost 60000 and the source
port happens to be 60000 and the source IP is different, then
I would think it should work.

Though I believe that many systems adjust the source
address to match the destination net, in which case it
would not be able to do that.

> He believes he remembers Jon (Postel, presumably) recommending
> this check in the stack as a defense against accidental self-
> connect.

It isn't so obvious that self connect can't be made to work.
If the system knows which socket the packet came from it
should just send it to the other one. That slightly violates
the rule that the quad is unique to each side of the TCP
connection. That seems more work than the OS detecting it.

> So unfortunately it appears that accidental self-connect is all
> too possible on Linux, and that it may happen when source address
> != destination address, making it troublesome to detect. OTOH,
> fixing my wrapper code so it only tries a reasonable number of
> connects will make it *much* less likely, and checking for the
> easy case where source address == destination address will catch
> at least some of any accidental self-connects that sneak through.

-- glen

Keith Wansbrough

Mar 25, 2004, 6:04:23 PM

mwo...@newsguy.com (Michael Wojcik) writes:

> > I agree - the 4-tuples would be different, and so it would not be a
> > self-connection. Instead, it would most likely give ECONNREFUSED due
> > to the lack of a listening socket at the other endpoint.
>
> Maybe. In v2 Stevens says:
>
> A process creates a socket and connects it to itself using the
> system calls: socket, bind a local port (say 3000), and then
> connect to this same port and some local IP address. (960)
>
> The "some local IP address" bit is what worries me. It appears,
> based on Stevens' description of the control flow (962), that any
> local IP address will be a match, because the implementation he's
> looking at - BSD 4.4 - handles an outbound packet for any local
> interface by queuing it for the loopback interface.

Yes, but TCP connections are not distinguished by interface, but by IP
address. It is perfectly possible for a multihomed host (with IP
addresses i1 and i1') to have two concurrent TCP connections to a
single other host (i2), with IP addresses and ports (i1,p1,i2,p2) and
(i1',p1,i2,p2). These connections are distinct, since the quadruples
differ. The same would be true for a purported self-connection from
(i1,p1) to (localhost,p1) - the quadruples differ, so the connection
wouldn't happen.
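The quadruple argument can be written down as a small Python sketch. The Quad type and is_self_connection are hypothetical names for illustration, and the addresses are assumed to be already in one normalized textual form (so "127.1" would need to be written as "127.0.0.1" before comparing).

```python
from typing import NamedTuple


class Quad(NamedTuple):
    """A TCP connection as seen from one end: local and remote endpoints."""
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int

    def mirrored(self):
        """The same connection as seen from the other end."""
        return Quad(self.remote_ip, self.remote_port,
                    self.local_ip, self.local_port)


def is_self_connection(quad):
    """A self-connection is one whose quad equals its own mirror image."""
    return quad == quad.mirrored()


# Two connections from different local addresses to one peer are distinct.
a = Quad("10.0.0.5", 40000, "192.0.2.7", 80)
b = Quad("10.0.0.6", 40000, "192.0.2.7", 80)
print(a != b)                                                         # True
print(is_self_connection(Quad("10.0.0.5", 40000, "127.0.0.1", 40000)))  # False
print(is_self_connection(Quad("10.0.0.5", 40000, "10.0.0.5", 40000)))   # True
```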

glen herrmannsfeldt

Mar 26, 2004, 5:24:04 AM

Keith Wansbrough wrote:

> Yes, but TCP connections are not distinguished by interface, but by IP
> address. It is perfectly possible for a multihomed host (with IP
> addresses i1 and i1') to have two concurrent TCP connections to a
> single other host (i2), with IP addresses and ports (i1,p1,i2,p2) and
> (i1',p1,i2,p2). These connections are distinct, since the quadruples
> differ. The same would be true for a purported self-connection from
> (i1,p1) to (localhost,p1) - the quadruples differ, so the connection
> wouldn't happen.

I am not sure what you mean by wouldn't happen.

The quad (i1,p1,127.1,p1) should make a perfectly fine TCP
connection. The quad (i1,p1,i1,p1) might not.

-- glen

Keith Wansbrough

Mar 26, 2004, 7:32:07 AM

glen herrmannsfeldt <g...@ugcs.caltech.edu> writes:

Sorry, I mean that (i1,p1,127.1,p1) != (127.1,p1,i1,p1), and so
there's no self-connection.

Michael Wojcik

Mar 26, 2004, 8:44:50 AM

In article <HFH8c.95811$po.734093@attbi_s52>, glen herrmannsfeldt <g...@ugcs.caltech.edu> writes:
> Michael Wojcik wrote:
>

> OH, I thought the idea was that there is a listening socket.

Not with self-connect. Self-connect connects a single socket to
itself. The socket goes from CLOSED to SYN_SENT to SYN_RCVD to
ESTABLISHED. Stevens has a more detailed description.
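To make that path concrete, here is a toy Python model of just the relevant transitions. It is a simplified sketch, not a TCP implementation: the event names are made up for illustration, and only the states mentioned above are covered.

```python
# Transition table for the handful of states involved. Event names are
# hypothetical labels, not terms from any RFC or from Stevens.
TRANSITIONS = {
    ("CLOSED", "active_open"): "SYN_SENT",         # connect(): SYN goes out
    ("SYN_SENT", "recv_syn_ack"): "ESTABLISHED",   # ordinary client open
    ("SYN_SENT", "recv_syn"): "SYN_RCVD",          # a bare SYN arrives instead
    ("SYN_RCVD", "recv_ack_of_syn"): "ESTABLISHED",
}


def run(events, state="CLOSED"):
    """Apply events in order and return the list of states visited."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        path.append(state)
    return path


# Self-connect: the socket's own SYN loops back to it, so it sees a bare
# SYN while in SYN_SENT, then the acknowledgment of its SYN.
print(run(["active_open", "recv_syn", "recv_ack_of_syn"]))
# ['CLOSED', 'SYN_SENT', 'SYN_RCVD', 'ESTABLISHED']
```

The same transitions are the ones each end follows in a simultaneous open, which is why self-connect falls out of supporting that case.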

> Say I decide to run a telnet server on port 60000 hoping
> that no-one will notice. (Security through obscurity.)

Not sure where you're going with this, but a client trying to connect
to port 60000 on the local machine when the server isn't running (no
socket in LISTEN state for port 60000) should self-connect if its
source port is 60000.

As I noted previously, though, BSD 4.3 and some implementations based
on it had a bug that prevented self-connect (and simultaneous open)
from succeeding; some later implementations had a bug that would
crash the stack (cf the "LAND attack"); and many specifically avoid
assigning an ephemeral port that matches the destination port, so
accidental self-connect is impossible. (It's still possible on such
systems to deliberately self-connect by binding the client to the
destination port before calling connect.)

> I might have thought that, before assigning ephemeral ports,
> the system would check that the port wasn't already in use.

Apparently most do, but not Linux.

> Though as machines could make tens of thousands of connections
> that would be a little too strict. Checking that the same
> quad wasn't already in use wouldn't catch self connect.

No, but checking that next-ephemeral-port != destination-port is
trivial when assigning the ephemeral port in connect().
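Roughly, the check amounts to something like the Python sketch below. This is a toy sequential allocator, not how any particular stack does it; the function name is hypothetical and the default range 32768-60999 is only an assumed example of an ephemeral range.

```python
def next_ephemeral_port(last_port, dest_port, low=32768, high=60999):
    """Pick the port after last_port, wrapping within [low, high].

    If the candidate equals the destination port it is skipped, so an
    unbound socket is never given a source port matching its destination.
    Assumes the range holds at least two ports.
    """
    span = high - low + 1
    candidate = low + (last_port - low + 1) % span
    if candidate == dest_port:
        candidate = low + (candidate - low + 1) % span
    return candidate


print(next_ephemeral_port(39999, 40000))   # 40001, skipping 40000
print(next_ephemeral_port(60999, 80))      # 32768, wrapped around
```

A real allocator also has to deal with ports already in use; the sketch shows only the one extra comparison.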

> Though I believe that many systems adjust the source
> address to match the destination net, in which case it
> would not be able to do that.

Yes, that occurred to me. When I'm connecting with an unbound
socket, the stack has to assign a source IP address as well as
an ephemeral port, of course, and if I'm connecting to a local
address then it seems sensible that the stack would assign that
same address - and so my check for self-connect would work.

> > He believes he remembers Jon (Postel, presumably) recommending
> > this check in the stack as a defense against accidental self-
> > connect.
>
> It isn't so obvious that self connect can't be made to work.

It's supposed to, and it does. The problem (for my library) is
detecting it if it happens accidentally, or (for the stack)
preventing it from happening accidentally.

> If the system knows which socket the packet came from it
> should just send it to the other one. That slightly violates
> the rule that the quad is unique to each side of the TCP
> connection.

Apparently that rule is a gloss which doesn't correctly cover the
case of self-connect. According to the references I've seen,
self-connect, bizarre though it seems, is not only legal but
necessary for a fully-compliant TCP implementation. It falls out
from support for simultaneous open (which RFC 1122 requires) coupled
with the description of the TCP state machine.

Thanks for your comments.

--
Michael Wojcik michael...@microfocus.com

"Well, we're not getting a girl," said Marilla, as if poisoning wells were
a purely feminine accomplishment and not to be dreaded in the case of a boy.
-- L. M. Montgomery, _Anne of Green Gables_

Michael Wojcik

Mar 26, 2004, 8:52:51 AM

In article <yqcad24...@astrocyte.cl.cam.ac.uk>, Keith Wansbrough <kw...@cl.cam.ac.uk> writes:

> mwo...@newsguy.com (Michael Wojcik) writes:
>
> > Maybe. In v2 Stevens says:
> >
> > A process creates a socket and connects it to itself using the
> > system calls: socket, bind a local port (say 3000), and then
> > connect to this same port and some local IP address. (960)
> >
> > The "some local IP address" bit is what worries me. It appears,
> > based on Stevens' description of the control flow (962), that any
> > local IP address will be a match, because the implementation he's
> > looking at - BSD 4.4 - handles an outbound packet for any local
> > interface by queuing it for the loopback interface.
>
> Yes, but TCP connections are not distinguished by interface, but by IP
> address.

Right, of course. Processed by the loopback interface but not
necessarily with the loopback interface's address.

> The same would be true for a purported self-connection from
> (i1,p1) to (localhost,p1) - the quadruples differ, so the connection
> wouldn't happen.

True. The stack has to assign an address to the socket before it
can send the SYN, and that address has to match the destination
address for self-connect to succeed. So comparing source and
destination addresses and ports is sufficient to check for self-
connect.

Thanks.

--
Michael Wojcik michael...@microfocus.com

Americans have five disadvantages which you should take into account
before giving us too hard a time:
- We're landlocked
- We're monolingual
- We have poor math and geography skills -- Lucas MacBride