CIDR/AddrRange collection object

oubiwann

unread,

Aug 27, 2008, 5:47:11 PM8/27/08

to netaddr

Hey folks,

Here's the thread I mentioned in my last post.

Note that I also opened a ticket for this, and you can see that here:
http://code.google.com/p/netaddr/issues/detail?id=10

d

---------- Forwarded message ----------
From: DrKJam <drk...@gmail.com>
Date: Wed, Aug 27, 2008 at 3:12 PM
Subject: Re: First branch
To: Duncan McGreggor <duncan.m...@gmail.com>

2008/8/27 Duncan McGreggor <duncan.m...@gmail.com>
>
> Hey man,
>
> I'd like to create a branch for adding __add__ support to CIDR
> obejcts. Are you okay with this? I will not touch trunk and will open
> a ticket with a full description of the purpose for the branch, etc.
>
> Hrm. Actually... looking at the code, that would complicate things
> terribly. first() and last() would loose their meaning. I guess there
> really should be another object similar (but better) to what I did in
> NetCDR: we need a collection-of-ranges object.
>
> So I guess I really want to start a branch for adding netaddr.network.* ...
>
> Thoughts?
>
> d

My preference would be to create a new 'namespace' alongside the
existing ones (netaddr.network.* sounds fine) for NetCIDR porting and
have the checkins done directly on the trunk. Merges are ugly at the
best of times and I think it's excessive to have a branch at this
early stage in the project when things are moving so quickly. Things
will probably break more often, but we can also spot and fix them more
quickly and easily too and hopefully move ahead at a better pace.
Branch and merge has its place but not in this case.

As a general rule, we should be creating unit tests ahead of time,
pinning down what we'd like to see in the interface and then writing
code to implement it. This is classic test driven development and is
how I've built up netaddr so far.

I vote for a new object that contains the sequence of CIDR (actually
AddrRange) objects and logic to test for the existence of an IP
(actually Addr) object within each one.

Do you mind if we move this discussion and all future proposals to the
netaddr group? I think it's best to try as much as possible to keep
everyone in the loop and maintain a searchable historical record of
all decisions that are made as we go along. It'll probably slow things
down a bit but I'd rather do that have happy users with the features
exposed in the way they want :-)

oubiwann

unread,

Aug 27, 2008, 5:55:56 PM8/27/08

to netaddr

On Aug 27, 3:12 pm, DrKJam <drk...@gmail.com> wrote:
>
> 2008/8/27 Duncan McGreggor <duncan.mcgreg...@gmail.com>

>
> > Hey man,
>
> > I'd like to create a branch for adding __add__ support to CIDR
> > obejcts. Are you okay with this? I will not touch trunk and will open
> > a ticket with a full description of the purpose for the branch, etc.
>
> > Hrm. Actually... looking at the code, that would complicate things
> > terribly. first() and last() would loose their meaning. I guess there
> > really should be another object similar (but better) to what I did in
> > NetCDR: we need a collection-of-ranges object.
>
> > So I guess I really want to start a branch for adding netaddr.network.* ...
>
> > Thoughts?
>
> > d
>
> My preference would be to create a new 'namespace' alongside the
> existing ones (netaddr.network.* sounds fine) for NetCIDR porting and
> have the checkins done directly on the trunk.

Great. That works for me. I'll create that module as a stub, and then
start writing unit tests.

> Merges are ugly at the
> best of times and I think it's excessive to have a branch at this
> early stage in the project when things are moving so quickly. Things
> will probably break more often, but we can also spot and fix them more
> quickly and easily too and hopefully move ahead at a better pace.
> Branch and merge has its place but not in this case.

I hear ya. I'm part of the Twisted project, and we do all tickets in
separate branches... but then again, we have custom tools for easily
creating, updating (merging forward), and merging branches. But
Twisted is a large project with lots of contributors and a long
history.

> As a general rule, we should be creating unit tests ahead of time,
> pinning down what we'd like to see in the interface and then writing
> code to implement it. This is classic test driven development and is
> how I've built up netaddr so far.

This sounds great to me :-) It's how we do it in Twisted too. For some
of my own projects, I tend to do bug-driven testing... the notable
exceptions being projects that have highly explicit requirements, such
as IP address calculations :-)

In short, I'm really glad to hear that this is your preferred
approach.

> I vote for a new object that contains the sequence of CIDR (actually
> AddrRange) objects and logic to test for the existence of an IP
> (actually Addr) object within each one.

Excellent and ditto.

> Do you mind if we move this discussion and all future proposals to the
> netaddr group? I think it's best to try as much as possible to keep
> everyone in the loop and maintain a searchable historical record of
> all decisions that are made as we go along. It'll probably slow things
> down a bit but I'd rather do that have happy users with the features
> exposed in the way they want :-)

Done!

d

oubiwann

unread,

Aug 27, 2008, 6:40:43 PM8/27/08

to netaddr

Hrm. I have a question about this, though...

The first feature I want to add is the one mentioned in the ticket and
would allow us to do this:
IP(addr) in CIDR(block1) + CIDR(block2)

But if we're going to implement this feature for AddrRange, it's not
just for IPs/CIDR blocks. Does that make sense, though? For instance,
what does it mean for a MAC (EUI) address to be "in" a MAC address
range? Well, I guess that could be potentially useful... you could
programmatically test if a NIC model belonged to a manufacturer. Well,
if the range was set properly. Then, there's this:

>>> from netaddr import EUI, AddrRange
>>> mac1 = EUI('00:15:3E:21:B4:B8')
>>> mac2 = EUI('00:16:3E:21:B4:B8')
>>> mac3 = EUI('00:17:3E:21:B4:B8')
>>> r = AddrRange(mac1, mac3)
>>> mac 2 in r
True

Implementation-wise, this makes sense. But I don't think it makes much
sense functionally... manufacturer 2 being "in" the MAC block defined
by manufacturer 1 and manufacturer 3?

But then again, as long as generally true (regardless of semantics
attached to various bits) and it's logically sound, I guess there's
probably no reason not to.

Thoughts?

d

David Moss

unread,

Aug 29, 2008, 7:14:40 AM8/29/08

to netaddr

> The first feature I want to add is the one mentioned in the ticket and
> would allow us to do this:
> IP(addr) in CIDR(block1) + CIDR(block2)

If I'm reading this right the snippet above implicitly combines 2 (or
more) possibly non-adjacent CIDR blocks into a sequence against which
a __contains__ test is then performed for the IP. I would have thought
a CIDRGroup (or similar wrapper object) would make more sense in this
case :-

>>> IP(addr) in CIDRGroup(block1, block2, ...)

I would envisage using addition on CIDR objects to combine two
*adjacent* CIDR blocks of the same size, returning a new larger CIDR
and (importantly) only where it logically makes sense. For example,
you couldn't combined 2 CIDRs of different size such as
CIDR('192.168.0.0/24') and CIDR('192.168.1.0/25') into a new single
CIDR. This would also allow us to define a subtraction operator as
well in following pseudocode examples :-

Addition

>>> cidr = CIDR('192.168.0.0/24') + CIDR('192.168.1.0/24')
>>> str(cidr)
'192.168.0.0/23'

Subtraction

>>> cidr_plus = CIDR('192.168.0.0/23') - CIDR('192.168.1.0/24')
>>> str(cidr_plus)
'192.168.0.0/24'

>>> cidr_minus = CIDR('192.168.0.0/23') - CIDR('192.168.0.0/24')
>>> str(cidr_minus)
'192.168.1.0/24'

>
> But if we're going to implement this feature for AddrRange, it's not
> just for IPs/CIDR blocks. Does that make sense, though? For instance,
> what does it mean for a MAC (EUI) address to be "in" a MAC address
> range?

You're right that MAC usage wouldn't make any sense. I was thinking
more about trying to mix and match Wildcard objects in a single
AddrGroup() object, but maybe that wouldn't necessarily be sensible
either. We'll have to look into this in more detail.

James William Pye

unread,

Aug 29, 2008, 11:10:31 AM8/29/08

to net...@googlegroups.com

On Aug 29, 2008, at 4:14 AM, David Moss wrote:
>> The first feature I want to add is the one mentioned in the ticket
>> and
>> would allow us to do this:
>> IP(addr) in CIDR(block1) + CIDR(block2)
>
> If I'm reading this right the snippet above implicitly combines 2 (or
> more) possibly non-adjacent CIDR blocks into a sequence against which
> a __contains__ test is then performed for the IP. I would have thought
> a CIDRGroup (or similar wrapper object) would make more sense in this
> case :-

ISTM that this is dangerously close to a set([...]) of CIDR blocks.
Not sure if we could easily work in the desired semantics by using a
set() instance alone, so CIDRGroup might be a good way to go.(subclass
set?)

>>>> IP(addr) in CIDRGroup(block1, block2, ...)
>
> I would envisage using addition on CIDR objects to combine two
> *adjacent* CIDR blocks of the same size, returning a new larger CIDR
> and (importantly) only where it logically makes sense. For example,
> you couldn't combined 2 CIDRs of different size such as
> CIDR('192.168.0.0/24') and CIDR('192.168.1.0/25') into a new single
> CIDR. This would also allow us to define a subtraction operator as
> well in following pseudocode examples :-

+1

> Addition
>
>>>> cidr = CIDR('192.168.0.0/24') + CIDR('192.168.1.0/24')
>>>> str(cidr)
> '192.168.0.0/23'
>
> Subtraction
>
>>>> cidr_plus = CIDR('192.168.0.0/23') - CIDR('192.168.1.0/24')
>>>> str(cidr_plus)
> '192.168.0.0/24'
>
>>>> cidr_minus = CIDR('192.168.0.0/23') - CIDR('192.168.0.0/24')
>>>> str(cidr_minus)
> '192.168.1.0/24'

+1, very nice.

What should happen if they are not adjacent? Exception?

Duncan McGreggor

unread,

Aug 29, 2008, 11:31:06 AM8/29/08

to net...@googlegroups.com

Yeah, I think so. Hrm.. maybe an exception with a note to use the group class...

Or maybe even better, for non-adjacent blocks, instead of returning a
CIDR object switch internally and return a CIDR group object.

d

James William Pye

unread,

Aug 30, 2008, 1:07:31 AM8/30/08

to net...@googlegroups.com, Duncan McGreggor

On Aug 29, 2008, at 8:31 AM, Duncan McGreggor wrote:
> Or maybe even better, for non-adjacent blocks, instead of returning a
> CIDR object switch internally and return a CIDR group object.

At first I liked this idea and starting considering some potential
corollaries. However, I feel that if you want a CIDRGroup, you should
create one. That is, if __add__ is to return either CIDRGroup or CIDR
then you can only rely on the intersection of the interfaces in order
to use it in a transparent manner(sure, you'll get ``__contains__``
and ``__iter__``(?), but I can't imagine much beyond that).

ISTM that it would be better to be explicit:

cidr1 + cidr2 = cidr3 where type(cidr3) always equals CIDR

cidrg = CIDRGroup([cidr1, cidr2])

I would imagine that most of the time people would just end up using
CIDRGroup.

Also, curious, what are the semantics of CIDRGroup? Can a group have
adjacent CIDRs? If the answer is yes, then I think it would be
desirable to include a reduce() method in order to simplify the group
as much as possible... Furthermore, would the only difference between
CIDRGroup and set([cidr1, ..., cidrN]) be a convenient interface(not
that there is anything wrong with a convenient interface ;)?

Personally, I think it would be useful to automatically conjoin
adjacent CIDRs to allow for presumed reductions. This would also help
make a clear, semantic distinction between a CIDRGroup and a simple
set([]) of CIDRs. The win being further justification for CIDRGroup's
existence. :)

hrm, perhaps I should have looked at the other project that implements
cidrgroup before I wrote this, but I'm having fun musing, sooo... :)

David Moss

unread,

Sep 8, 2008, 6:06:24 PM9/8/08

to netaddr

Unfortunately, I've spent most of my time so far on the low-level
mechanics and haven't spent significant time dwelling on the higher-
level use cases.

There is a lot involved in getting a collection of address groups for
netaddr right. Perhaps this is why I've steered clear of making any
firm decisions on their implemention to date. I always thought I'd
leave this up to the end user more than include it within netaddr
itself ;-)

The set() subclass idea isn't bad at all and is something I have
considered during netaddr's early phases of R&D. Here is a link to
Heiko Wundram's recipe in the online Python Cookbook that does just
this (IPv4 only mind you) :-

http://code.activestate.com/recipes/466298/

This code contains rather more than you would probably need and is
really quite complex. Getting the hashing and immutability right seem
to be the significant challenges, particular with the rather large
address values found in the IPv6 address space. Some heavy engineering
to include it into netaddr codebase, not to mention the licensing
issues.

It seems like a lot of effort - any volunteers to work on it ;-)

> Also, curious, what are the semantics of CIDRGroup? Can a group have
> adjacent CIDRs? If the answer is yes, then I think it would be
> desirable to include a reduce() method in order to simplify the group
> as much as possible...

Merging adjacent CIDRs internally in a class implmentation might not
be what the end user expects. Consider that someone might want to name
individual CIDR objects by decorating them with an attribute. Changing
the internal representation of a CIDRGroup class would actually end up
discarding vital information unintentionally. Because Python is
inherently flexible in this way, I always try not to assume too much
about how code might be used.

One property I would like to see in such a class would be some kind of
tree (or trie) based binary search to get away from strictly O(N)
membership tests where you have potentially nestable CIDRs (routing
tables spring to mind here).

In summary, it seems like we have a few possibilities for the addition
of CIDR objects.

A) Allow addition only between adjacent CIDRs of equal size.

>>> CIDR('192.168.0.0/24') + CIDR('192.168.1.0/24')

CIDR('192.168.0.0/23')

'Adding' them the other way around would also be valid yielding the
same result, like so :-

>>> CIDR('192.168.1.0/24') + CIDR('192.168.0.0/24')
CIDR('192.168.0.0/23')

Non-adjacent CIDRs or CIDRs that were adjacent but of different sizes
would have to raise exceptions.

On reflection, this doesn't actually seem like a very useful bit of
functionality that would get used much as you could easily get the
same effect like this :-

>>> c1 = CIDR('192.168.0.0/24')
>>> c1.prefixlen += 1
>>> str(c1)
'192.168.0.0/23'

You would be modifying your original CIDR though when you might
actually want a new one.

B) Allow 'addition' of CIDRs objects (adjacent or not) and return a
single CIDR that 'spans' all of them.

>>> CIDR('192.168.0.0/24') + CIDR('192.168.2.0/24')
CIDR('192.168.0.0/22')

This would lead to larger CIDRs and might actually be useful. It'd
probably be better to expose this in the API as a more general method
though that accepted a list of IPs and/or CIDRs.

C) Forget __add__ on CIDR altogether and go for a class that acts as a
container implementing __contains__ and __iter__.

>>> cg = CIDRGroup(['192.168.0.0/24'), CIDR('192.168.2.0/24'),
CIDR('224.0.0.0/4')])
>>> CIDR('192.168.0.128/25') in cg
True
>>> IP('239.0.0.1') in cg
True

An iterator would flatten and yield all IPs in the ranges. This might
be quite useful but end users could probably just roll their own
easily enough with whatever features they required.

D) Go for a full blown AddrRangeSet(set) object.

I don't know about you, but this gives me nightmares!

A), B), C), D) or something we haven't yet considered?

I guess I'm just having trouble justifying the effort without (very
selfishly) having my own use case ;-)

We probably need to thrash this out a bit more (what a long post this
is).

Dave M.

James William Pye

unread,

Sep 14, 2008, 6:19:35 AM9/14/08

to net...@googlegroups.com

On Sep 8, 2008, at 3:06 PM, David Moss wrote:
> Merging adjacent CIDRs internally in a class implmentation might not
> be what the end user expects. Consider that someone might want to name
> individual CIDR objects by decorating them with an attribute. Changing
> the internal representation of a CIDRGroup class would actually end up
> discarding vital information unintentionally. Because Python is
> inherently flexible in this way, I always try not to assume too much
> about how code might be used.

hrm. I've never been fond of relying on those kind of decorations
personally, but certainly it should be kept in mind. It does come in
handy from time to time...

However, with that, would we allow duplicate CIDRs in a CIDRGroup? If
no, how do we choose the one to use when presented with a choice in
the creation of a new CIDRGroup:

>>> p1 = netaddr.address.CIDR('192.168.0.0/16')
>>> p1.my_descriptor = 'my network!'
>>> p2 = netaddr.address.CIDR('192.168.0.0/16')
>>> p2.name = 'private network'
>>>
>>> g = netaddr.address.CIDRGroup([p1, p2])

But I digress; I'm luke warm on my thought(auto-reductions) anyways.
Really, a ``reduce()`` method returning the reduced variant seems
better as it would give you the ability to compare and contrast:

>>> c1=netaddr.CIDR('10.32.0.0/12')
>>> c2=netaddr.CIDR('10.48.0.0/12')
>>> c1.first()
netaddr.address.IP('10.32.0.0/32')
>>> c2.last()
netaddr.address.IP('10.63.255.255/32')
>>>
>>> cn=netaddr.CIDR('10.32.0.0/11')
>>> cn.first()
netaddr.address.IP('10.32.0.0/32')
>>> cn.last()
netaddr.address.IP('10.63.255.255/32')

So:

>>> cg = netaddr.CIDRGroup(['10.32.0.0/12', '10.48.0.0/12'])
>>> cg
netaddr.CIDRGroup(['10.32.0.0/12', '10.48.0.0/12'])
>>> cg.reduce()
netaddr.CIDRGroup(['10.32.0.0/11'])

I imagine this would be an important feature to have when dealing with
set operations(except,union,intersection)...

Hrm, I think we would at least have to(want to?) protect against
overlapping blocks....

> One property I would like to see in such a class would be some kind of
> tree (or trie) based binary search to get away from strictly O(N)
> membership tests where you have potentially nestable CIDRs (routing
> tables spring to mind here).

Aye.

> C) Forget __add__ on CIDR altogether and go for a class that acts as a
> container implementing __contains__ and __iter__.
>
> >>> cg = CIDRGroup(['192.168.0.0/24'), CIDR('192.168.2.0/24'),
> CIDR('224.0.0.0/4')])
> >>> CIDR('192.168.0.128/25') in cg
> True
> >>> IP('239.0.0.1') in cg
> True
>
> An iterator would flatten and yield all IPs in the ranges. This might
> be quite useful but end users could probably just roll their own
> easily enough with whatever features they required

+1 on the class, but, IIUC, -1 on the default __iter__ flattening as
it would likely eliminate--at least make more difficult--the
possibility of natural re-creation without condition:

>>> cg = CIDRGroup(['192.168.0/24', '192.168.25/24'])
>>> cg2 = CIDRGroup(cg)

Sure, we could make an explicit condition checking for CIDRGroup in
__init__, but why if we don't have to? :)

This isn't saying that flattening out the IPs isn't useful, it's
saying that it would probably make a nice generator method:

class CIDRGroup():
...

def ips(self):
'yield all the IPs of all the CIDRs in the group'
for x in self:
for ip in x:
yield ip

Additionally:

def has_ip(self, addr):
'check if the IP is in any of the CIDRs in the group'
...

> D) Go for a full blown AddrRangeSet(set) object.
>
> I don't know about you, but this gives me nightmares!

Sounds interesting. :)

> A), B), C), D) or something we haven't yet considered?
>
> I guess I'm just having trouble justifying the effort without (very
> selfishly) having my own use case ;-)

Well, I do have a use case for detecting and conjoining adjacent
CIDRs, so I may try to take a stab at a patch real soon now.