Concepts and terminology useful for improving the ipaddr library

Clay McClure

Jun 2, 2009, 1:58:38 PM

to ipaddr...@googlegroups.com

Hello,

In continuing the discussions linked here:

http://mail.python.org/pipermail/python-dev/2009-June/089809.html
http://bugs.python.org/issue3959

it will be helpful to take a step back and define some concepts.

Also, let me note up front that I appreciate the work you have done
with ipaddr, and nothing that follows is meant to diminish your
efforts. Rather, I hope that this is constructive criticism that can
be used to improve the library so that it appeals to a broader set of
developers.

In our previous exchanges, we seem to be talking past each other
without understanding what the other is saying. I believe that's
because we're using subtly different meanings of the same words. You
are applying very specific meanings relevant to your jobs at Google;
I'm applying more general meanings with broader application.

For example, when you talk about IP addresses, you seem to be
referring to either the address as it is configured on an interface,
or the address as it appears in the routing table. Interface
assignments and routes are useful concepts, but they are subtly
different from IP addresses and deserve individual treatment -- I will
cover them later in this email.

When I refer to an IP address, I'm referring to the concept of a
protocol endpoint used as a source or destination address in IP
datagrams. This definition comes from the canonical document on the
subject, RFC-791. Addresses are simple scalars, that is, they
represent a single value -- a 32-bit address (I'll assume IPv4 for
brevity, but the same applies to IPv6). Addresses do not have masks or
broadcast addresses, and they cannot contain other addresses. IP
addresses can be mapped to hardware addresses (with ARP), can be
configured on interfaces, and can be leased to hosts via DHCP. None of
these operations makes sense with networks.

Conversely, networks are collection types that contain addresses.
Networks have two key attributes: an address and a mask. Together,
these two attributes uniquely define the network (within a given
routing domain, of course). Networks have a prefix length (another way
of representing the mask), a size (dictated by the mask), and a
broadcast address. None of these things makes sense with addresses.
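
To make the distinction concrete, here is a minimal sketch. It assumes
Python's standard-library ipaddress module, not the ipaddr API under
discussion, and the variable names are only illustrative:

```python
import ipaddress

# An address is a scalar: one 32-bit value, with no mask attached.
addr = ipaddress.IPv4Address("192.168.1.1")
addr_value = int(addr)

# A network is a collection defined by an address and a mask.
net = ipaddress.IPv4Network("192.168.1.0/24")
net_mask = net.netmask                 # another view of the prefix length
net_size = net.num_addresses           # dictated by the mask
net_broadcast = net.broadcast_address

# Containment is a question put to the network, not to the address.
addr_in_net = addr in net

print(addr_value, net_mask, net.prefixlen, net_size, net_broadcast, addr_in_net)
```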

These two concepts, addresses and networks, deserve first-class
treatment in any meaningful IP address library, using separate but
related classes. Some functionality could be shared between the
classes. For example, properties like address_class, is_private,
is_link_local, is_multicast, and is_experimental might make sense for
both addresses and networks.
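
One possible shape for such a pair of classes is sketched below. The
names (_SharedProperties, Address, Network) are hypothetical and only
illustrate the idea of shared properties; this is not ipaddr's design.
A network is treated as having a property only if every address it
covers has it, which is an assumption of this sketch:

```python
class _SharedProperties:
    """Hypothetical mixin: properties valid for addresses and networks."""

    def _bounds(self):
        # Subclasses return (lowest, highest) covered address as integers.
        raise NotImplementedError

    def _within(self, low, high):
        first, last = self._bounds()
        return low <= first and last <= high

    @property
    def is_multicast(self):
        return self._within(0xE0000000, 0xEFFFFFFF)   # 224.0.0.0/4

    @property
    def is_link_local(self):
        return self._within(0xA9FE0000, 0xA9FEFFFF)   # 169.254.0.0/16


class Address(_SharedProperties):
    """A scalar: a single 32-bit value, no mask."""

    def __init__(self, value):
        self.value = value

    def _bounds(self):
        return self.value, self.value


class Network(_SharedProperties):
    """A collection: defined by an address and a mask."""

    def __init__(self, address, prefixlen):
        self.mask = (0xFFFFFFFF << (32 - prefixlen)) & 0xFFFFFFFF
        self.address = address & self.mask

    def _bounds(self):
        return self.address, self.address | (~self.mask & 0xFFFFFFFF)

    def __contains__(self, addr):
        return (addr.value & self.mask) == self.address
```

Only Network has a mask and supports containment; only the mixin's
properties are common to both.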

Having now defined the two principal classes required for an IP
address library (Address and Network), we can look at more specific
use cases for each. First, we'll look at network interface
configuration.

When a network interface is configured, four pieces of information
need to be provided:

* The IP network to which the interface is attached. This counts as
two pieces of information because both the network address and the
network mask must be supplied. This network is added to the host's
routing table in order to route datagrams destined for the network out
the appropriate interface.

* The host's IP address on this network. This is added to the list of
addresses that the host accepts as its own, and is also used as the
source address for datagrams routed out the interface.

* The broadcast address for the network. This is also added to the
list of addresses that the host accepts as its own. Additionally, the
IP stack maps outbound packets destined for the broadcast address into
link-layer broadcasts. The broadcast address can usually be inferred
from the network address and mask, but it is actually a configurable
property and can be any arbitrary address in the network.
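
The usual inference of the broadcast address from the network address
and mask can be sketched as follows (again assuming Python's standard
ipaddress module; an explicitly configured broadcast address would
simply override the inferred value):

```python
import ipaddress

net = ipaddress.IPv4Network("192.168.1.0/24")

# Conventional broadcast address: the network address with every host
# bit set, i.e. network_address | ~mask (hostmask is the inverted mask).
inferred = ipaddress.IPv4Address(
    int(net.network_address) | int(net.hostmask)
)

print(inferred)
```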

Since the interface IP address will always be contained within the
interface network, the network can be inferred from the address, in
one of two ways: either the network mask is explicitly provided, or
the network mask is inferred from the address class.

What this means in practice is that commands that configure interfaces
require at most two pieces of information: the interface address and
the network mask. When used on classful networks, only one piece of
information, the interface address, needs to be supplied.
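
The two ways of inferring the interface network can be sketched like
this. The function names are hypothetical, and the class boundaries
are the classic A/B/C rules (default masks of /8, /16 and /24):

```python
import ipaddress


def classful_prefixlen(address):
    """Return the default prefix length implied by the address class."""
    first_octet = int(ipaddress.IPv4Address(address)) >> 24
    if first_octet < 128:      # class A: leading bit 0
        return 8
    if first_octet < 192:      # class B: leading bits 10
        return 16
    if first_octet < 224:      # class C: leading bits 110
        return 24
    raise ValueError("class D and E addresses have no default mask")


def interface_network(address, prefixlen=None):
    """Infer the attached network from an interface address."""
    if prefixlen is None:
        # No mask supplied: fall back to the address class.
        prefixlen = classful_prefixlen(address)
    # strict=False masks off the host bits instead of rejecting them.
    return ipaddress.IPv4Network(f"{address}/{prefixlen}", strict=False)


print(interface_network("192.168.2.3"))
print(interface_network("192.168.2.3", 32))
```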

Because it is possible to infer the interface network from the
interface address, you can configure an interface like so:

ifconfig en0 192.168.1.1 netmask 255.255.255.0

Behind the scenes, ifconfig computes the network address by masking
the interface address with the network mask, arriving at 192.168.1.0.
When this is done, it is able to configure a route to 192.168.1.0/24
out en0.
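
The masking step is a bitwise AND. A minimal sketch, assuming Python's
standard ipaddress module (this shows the arithmetic only, not what
ifconfig literally executes):

```python
import ipaddress

addr = ipaddress.IPv4Address("192.168.1.1")
mask = ipaddress.IPv4Address("255.255.255.0")

# Network address = interface address AND network mask.
network_address = ipaddress.IPv4Address(int(addr) & int(mask))

# The route destination is a network: that address plus the mask.
route_destination = ipaddress.IPv4Network(f"{network_address}/{mask}")

print(network_address, route_destination)
```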

Since CIDR notation is now universal, this same command can also be written as:

ifconfig en0 192.168.1.1/24

Again, ifconfig computes the network address by masking the interface
address with the network mask implied by the prefix length, and then
configures a route to 192.168.1.0/24 out en0.

From this use of CIDR notation in the ifconfig command, you might
believe that IP addresses have masks. But that is not accurate. The
*network* 192.168.1.0 has a mask of 255.255.255.0. The address
192.168.1.1 does not have a mask -- indeed no address has a mask. At
most you could say that 192.168.1.1 is contained by the network
192.168.1.0, which has a mask of 255.255.255.0.
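
One way to model this in code is to treat "192.168.1.1/24" as an
interface assignment that pairs an address with a network. The sketch
below assumes Python's standard ipaddress module, whose ip_interface()
happens to make that split; the variable names are illustrative:

```python
import ipaddress

iface = ipaddress.ip_interface("192.168.1.1/24")

host_address = iface.ip        # the address itself carries no mask
attached_net = iface.network   # the mask belongs to the network

print(host_address, attached_net, attached_net.netmask)
```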

We can look at routes from the same perspective. The IP routing
process looks up the next hop (or local network) for each inbound IP
datagram based on its destination address. Again, this destination
address has no mask -- addresses never have masks. The routing process
consults the host's routing table, which consists of tuples of the
form (destination network, next hop) -- we ignore metric here as it
doesn't add to the discussion.

A route's destination network consists of an address and a mask -- all
networks have masks. The route's next hop is either the address of a
gateway closer to the destination network, or an interface attached to
a directly connected network.

The routing process selects the most specific route for each datagram.
The most specific route is determined by first selecting all networks
in the routing table that contain the destination address (as
determined by masking the destination address with the route's network
mask and then comparing with the route's network address), and then
selecting the route with the longest network prefix.
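
The selection just described can be sketched as a linear scan. This
assumes Python's standard ipaddress module; select_route, the table
contents, and the next-hop addresses are all made up for illustration,
and real stacks use faster data structures than a list:

```python
import ipaddress


def select_route(routing_table, destination):
    """Return the (network, next_hop) entry with the longest match."""
    dest = int(ipaddress.IPv4Address(destination))
    matches = [
        (net, hop)
        for net, hop in routing_table
        # Mask the destination with the route's mask, then compare
        # with the route's network address.
        if (dest & int(net.netmask)) == int(net.network_address)
    ]
    if not matches:
        return None
    # Most specific route: the one with the longest prefix.
    return max(matches, key=lambda route: route[0].prefixlen)


table = [
    (ipaddress.IPv4Network("0.0.0.0/0"), "10.0.0.1"),
    (ipaddress.IPv4Network("192.168.2.0/24"), "192.168.1.1"),
    (ipaddress.IPv4Network("192.168.2.3/32"), "192.168.1.254"),
]

print(select_route(table, "192.168.2.3"))
```

Note that the destination is passed in as a bare address; only the
table entries carry masks.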

Many of the networks in the routing table will have prefix
lengths less than 32, indicating networks that consist of multiple
individual addresses. For example, a routing table entry for
192.168.1.0/24 indicates that the 256 individual addresses contained
within the 192.168.1.0/24 network are reachable via that route's next
hop.

Often, however, routing tables will contain entries whose network
prefix lengths equal 32. These routes are known as host routes, as
described in RFC-4632. Host routes work exactly like other routes:
they have a network (consisting of address and mask) and a next hop.
The routing process selects a host route using the same process as it
does for other routes. There is nothing inherently special or unique
about host routes. They merely indicate that a single IP address is
accessible via the advertised next hop.

When configuring routes, we must supply the destination network (both
its address and its mask) and the next hop. Since CIDR notation is
common, these two commands are equivalent:

route add 192.168.2.3 netmask 255.255.255.255 192.168.1.1
route add 192.168.2.3/32 192.168.1.1

Behind the scenes, the route command translates the string
"192.168.2.3/32" into an internal representation of the network of the
form 192.168.2.3/32. This host route is then added to the routing
table.

Some network software allows destination networks to be supplied
without a mask. In these cases, the software might assume a host route
is being configured. Thus the following syntax may be valid:

route add 192.168.2.3 192.168.1.1

But behind the scenes, the route command still translates the string
"192.168.2.3" into an internal representation of the network of the
form 192.168.2.3/32 before adding it to the routing table.
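
A parser for route destinations might behave like the sketch below.
parse_route_destination is a hypothetical name; the sketch relies on
ip_network() in Python's standard ipaddress module, which treats a
missing prefix as /32:

```python
import ipaddress


def parse_route_destination(text):
    """Parse a route destination; a bare address becomes a host route."""
    # ip_network() assumes a full-length (/32) prefix when none is given.
    return ipaddress.ip_network(text)


with_mask = parse_route_destination("192.168.2.3/32")
without_mask = parse_route_destination("192.168.2.3")

print(with_mask, without_mask, without_mask.num_addresses)
```

The two results compare equal only because this parser picks /32 as
its default. Either way, the result is a network, not an address.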

From this example, you might think that the strings '192.168.2.3' and
'192.168.2.3/32' are functionally equivalent. But that is only true in
this very specific context, with software that supports this kind of
inference. You can continue to insist that the strings '192.168.2.3'
and '192.168.2.3/32' are equivalent, but here is a clear example of
where they are not:

# ifconfig en0 192.168.2.3
# ifconfig en0
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
inet 192.168.2.3 netmask 0xffffff00 broadcast 192.168.2.255
...

# ifconfig en0 192.168.2.3/32
# ifconfig en0
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
inet 192.168.2.3 netmask 0xffffffff broadcast 192.168.2.3
...

Unless you believe that the authors of Darwin's IP stack (which is
based on the BSD stack, perhaps the longest-lived IP stack in
existence) are wrong on this point, you should consider the
possibility that there is a semantic difference between an address and
a network.

In summary, I believe your misunderstanding of domain concepts
(address, network, interface, route) has led you to design an IP
address library that is unusable by anyone who does not share your own
imprecise understanding of IP.