python bijection

Joshua Bronson

unread,

Nov 19, 2009, 6:24:46 PM11/19/09

to

I couldn't find a library providing a bijective map data structure
(allowing for constant-time lookups by value) in the few minutes I
looked, so I took a few more minutes to code one up:
http://bitbucket.org/jab/toys/src/tip/bijection.py

Is this at all worth releasing? Comments and suggestions welcome.

Josh

Steven D'Aprano

unread,

Nov 19, 2009, 7:05:45 PM11/19/09

to

On Thu, 19 Nov 2009 15:24:46 -0800, Joshua Bronson wrote:

> I couldn't find a library providing a bijective map data structure
> (allowing for constant-time lookups by value) in the few minutes I
> looked, so I took a few more minutes to code one up:
> http://bitbucket.org/jab/toys/src/tip/bijection.py
>
> Is this at all worth releasing?

You just did :)

> Comments and suggestions welcome.

If I want a mapping a <-> b, I generally just create a dict {a:b, b:a}.
What is the advantages or disadvantages of your code over the simplicity
of the dict approach?

(That is, sell us on the features of your approach.)

--
Steven

Joshua Bronson

unread,

Nov 19, 2009, 7:39:37 PM11/19/09

to

On Nov 19, 7:05 pm, Steven D'Aprano <st...@REMOVE-THIS-

cybersource.com.au> wrote:
> If I want a mapping a <-> b, I generally just create a dict {a:b, b:a}.
> What is the advantages or disadvantages of your code over the simplicity
> of the dict approach?

Well for one, you don't have to manually update the mapping from b ->
a if ever the mapping from a -> b changes. With your method you have
to write something like "d[a] = c; d[c] = a; del d[b]" instead of just
"d[a] = c", "del d[d.pop(a)]" instead of just "del d[a]", etc.

More significantly, your approach doesn't actually model a bijection
since there's no distinction between keys (the domain) and values (the
range). In other words, you lose information about which way is the
forward mapping and which is the inverse mapping. Worse, d.keys() and
d.values() would each give you the combination of your keys and
values, neither of which would be right, and d.items() would also
return twice as many elements as you expect with no way to distinguish
which side of the mapping a given pair comes from.

Carl Banks

unread,

Nov 19, 2009, 9:17:36 PM11/19/09

to

Apart from the GPL, it seems perfectly fine to release, and looks like
an interesting strategy. I've wanted one of those once in a while,
never enough to bother looking for one or writing one myself.

But you should absolutely not inherit from dict if you're overriding
all it's methods. It's useless and wasteful to do that, perhaps
dangerous. You end up using bytes for a small hash table that's never
used.

Plus Python 3 has a notion of Abstract Base Classes: it will allow
customization of isinstance to advertise that your class implements
MutableMapping, which is the right way to do it.

Carl Banks

Ben Finney

unread,

Nov 19, 2009, 9:36:19 PM11/19/09

to

Carl Banks <pavlove...@gmail.com> writes:

> On Nov 19, 3:24 pm, Joshua Bronson <jabron...@gmail.com> wrote:
> > I couldn't find a library providing a bijective map data structure
> > (allowing for constant-time lookups by value) in the few minutes I
> > looked, so I took a few more minutes to code one
> > up:http://bitbucket.org/jab/toys/src/tip/bijection.py
> >
> > Is this at all worth releasing? Comments and suggestions welcome.
>
> Apart from the GPL, it seems perfectly fine to release, and looks like
> an interesting strategy. I've wanted one of those once in a while,
> never enough to bother looking for one or writing one myself.

I would think GPL is an excellent choice for such a library then, if the
author's intention is to encourage more software to be free software so
that it can incorporate a unique library like this.

--
\ “The fact that I have no remedy for all the sorrows of the |
`\ world is no reason for my accepting yours. It simply supports |
_o__) the strong probability that yours is a fake.” —Henry L. Mencken |
Ben Finney

Joshua Bronson

unread,

Nov 20, 2009, 12:33:18 AM11/20/09

to Carl Banks

On Nov 19, 9:17 pm, Carl Banks <pavlovevide...@gmail.com> wrote:
> Apart from the GPL

what Ben said :)

> it seems perfectly fine to release, and looks like
> an interesting strategy. I've wanted one of those once in a while,
> never enough to bother looking for one or writing one myself.

glad to hear it! i'll release it to pypi if such feedback continues.

> But you should absolutely not inherit from dict if you're overriding
> all it's methods. It's useless and wasteful to do that, perhaps
> dangerous. You end up using bytes for a small hash table that's never
> used.
>
> Plus Python 3 has a notion of Abstract Base Classes: it will allow
> customization of isinstance to advertise that your class implements
> MutableMapping, which is the right way to do it.

Actually that's what I was originally thinking of doing but didn't go
through with it in my first pass out of concern that users might want
isinstance(bijection(), dict) to be True. Now that you bring it up, I
agree that it's the correct way to do it, and have reimplemented
bijection as a MutableMapping (ABCs are actually in Python 2.6). Take
a peek at the new and improved http://bitbucket.org/jab/toys/src/tip/bijection.py
if you get a chance and let me know how it looks!

Anyone have any other feedback? For instance, is offering the __call__
syntax for the inverse mapping wonderful or terrible, or maybe both?

Thanks,
Josh

Terry Reedy

unread,

Nov 20, 2009, 3:09:41 PM11/20/09

to pytho...@python.org

Joshua Bronson wrote:

> Anyone have any other feedback? For instance, is offering the __call__
> syntax for the inverse mapping wonderful or terrible, or maybe both?

Terrible ;-)

Use standard subscripting with slices, and only that, to both get and set.

Let m[4] == m[4:] == 'abc' # m[4:] is suggested alternative addition
Then m[:'abc'] == 4

m[4:] passes slice(4,None,None) to __getitem__
m[:'abc'] passes slice(None,'abc',None)

It just happens that dict items and slices use the same notation, but
they do, so take advantage of that. In fact, to emphasize the symmetry
of the bijective map, consider disallowing m[key] as ambiguous and
require m[key:], along with m[:key] to access and set.

Note that m[slice(1,2,3):] passes slice(slice(1, 2, 3), None, None), so
this approach does not even prevent using slices as keys/values.

In __setitem__, m[a:b] which passes slice(a,b,None) would have to be an
error. In __getitem__, it could either be a error or return True/False
depending on whether the pair is in the map. But this depends on whether
__contains__ only tests keys or is modified to test pairs.

Terry Jan Reedy

Joshua Bronson

unread,

Nov 20, 2009, 4:59:24 PM11/20/09

to Terry Reedy, pytho...@python.org

absolutely genius. implemented in the latest version:
http://bitbucket.org/jab/toys/src/tip/bijection.py

thank you for the terrific idea!

Joshua Bronson

unread,

Nov 20, 2009, 4:59:24 PM11/20/09

to pytho...@python.org, pytho...@python.org, Terry Reedy

On Nov 20, 3:09 pm, Terry Reedy <tjr...@udel.edu> wrote:

absolutely genius. implemented in the latest version:

Joshua Bronson

unread,

Nov 21, 2009, 2:34:15 AM11/21/09

to Terry Reedy, pytho...@python.org, Joshua Bronson

On Nov 20, 3:09 pm, Terry Reedy <tjr...@udel.edu> wrote:

> Use standard subscripting with slices, and only that, to both get and set.

i did this for __delitem__ too, so you can do e.g. del m[:'abc'].

> In fact, to emphasize the symmetry of the bijective map, consider
> disallowing m[key] as ambiguous and require m[key:]

my initial feeling is that i'd prefer to be able to say m[key], the
defense being that it's not ambiguous if it's documented, and users
who don't like it don't have to use it, while users who do like it
won't be alienated. but i am definitely still open to this.

> Note that m[slice(1,2,3):] passes slice(slice(1, 2, 3), None, None), so
> this approach does not even prevent using slices as keys/values.

except slices aren't hashable.</nitpick> but props for even thinking
of that!

> In __setitem__, m[a:b] which passes slice(a,b,None) would have to be an
> error. In __getitem__, it could either be a error or return True/False
> depending on whether the pair is in the map.

i went with raising an error for __getitem__(slice(a,b,None)).
returning True/False for this usage based on whether a -> b is in the
bijection is an interesting idea, but i feel like it complicates
things for no good reason: if that's what you wanted to know you'd
just ask whether m[a] == b.

> But this depends on whether __contains__ only tests keys or is modified to test pairs.

i have __contains__ only testing keys, and similarly [i for i in
bijection] only gives you the keys, again on the theory that deviating
too much from the dict api increases the learning (adoption) curve, so
i think we should only do it if it buys us a tremendous usability win.

thanks again for your insightful input, i'm pretty psyched about how
this is coming along!

any further feedback is always welcome.

josh

Joshua Bronson

unread,

Nov 21, 2009, 2:34:15 AM11/21/09

to pytho...@python.org, Joshua Bronson, pytho...@python.org, Terry Reedy

On Nov 20, 3:09 pm, Terry Reedy <tjr...@udel.edu> wrote:

> Use standard subscripting with slices, and only that, to both get and set.

i did this for __delitem__ too, so you can do e.g. del m[:'abc'].

> In fact, to emphasize the symmetry of the bijective map, consider

> disallowing m[key] as ambiguous and require m[key:]

my initial feeling is that i'd prefer to be able to say m[key], the

defense being that it's not ambiguous if it's documented, and users
who don't like it don't have to use it, while users who do like it
won't be alienated. but i am definitely still open to this.

> Note that m[slice(1,2,3):] passes slice(slice(1, 2, 3), None, None), so

> this approach does not even prevent using slices as keys/values.

except slices aren't hashable.</nitpick> but props for even thinking
of that!

> In __setitem__, m[a:b] which passes slice(a,b,None) would have to be an

> error. In __getitem__, it could either be a error or return True/False
> depending on whether the pair is in the map.

i went with raising an error for __getitem__(slice(a,b,None)).

returning True/False for this usage based on whether a -> b is in the
bijection is an interesting idea, but i feel like it complicates
things for no good reason: if that's what you wanted to know you'd
just ask whether m[a] == b.

> But this depends on whether __contains__ only tests keys or is modified to test pairs.

i have __contains__ only testing keys, and similarly [i for i in

Raymond Hettinger

unread,

Nov 21, 2009, 9:22:30 PM11/21/09

to

On Nov 19, 3:24 pm, Joshua Bronson <jabron...@gmail.com> wrote:

Hello Joshua,

I have a few design ideas and comments for you.

* The idea of using __call__ for looking-up inverse values was
inspired. That is useable, clean, and easy to remember; however, as
discussed below, there are issues though with its actual use in real
code.

* Am not excited by the inverse iterators. With just a regular
mapping you can write:

for a, b in m.items() ... # consider either a or b be the
key and the other to be the value

That meets all of the needs that would have been served by
iter_inverse_keys() or iter_inverse_values() or whatnot. The mirrored
API doesn't really provide much in the way of value added.

* After exercising the API on a couple of samples, I'm worried that
almost any inverse-method based API makes it difficult to review code
while keeping straight the intended meaning of the forward and inverse
relationships. Am thinking that it is simpler, faster, and clearer to
just use two dictionaries -- that approach lets the variable names
communicate the important info. For example, the following code helps
keep the coder and code reviewer from conflating the forward and
inverse directions:

myurl = ip2url[myip]
myip = url2ip[myurl]

Contrast that with:

myurl = site_bijection[myip]
myip = site_bijection(myurl)

With the latter, it is darned difficult to detect accidental
conflation of brackets with parentheses.

* So, I'm thinking that code needing a bijection would be better-off
with two ordinary dicts, perhaps augmented by a couple of convenience
functions:

biject_add(site_bijection, ip=myip, url=myurl) # Create a
new pairing, raise ValueError if either key
# maps to
more than one value (in violation of the
# bijection
invariant: one-to-one and onto)

biject_remove(ip=myip) # Clear an
entry from both dicts
or
biject_remove(url=myurl)

Alternatively, another possible approach is to used use the class
generation approach (such as that used by named_tuple()) to generate a
custom bijection class with either attribute based or keyworded
accessors:

Attribute based accessors:

site = Bijection('ip', 'url')
site.url[myip] = myurl

for ip, url in site.items() ...
print site.ip[myurl]
myurl = site.url.pop(myip)

Keyword accessors:

site = Bijection('ip', 'url')
site.set(ip=myip, url=myurl)
myurl = site.get(ip=myip)
myip = set.get(url=myurl)
myurl = site.pop(ip=myip)
site.del(ip=myip)
site.del(url=myurl)

Hope these ideas help. The ultimate success of the Bijection code
will depend on its clarity, simplicity, and speed. Experiment with
various approaches to find-out which looks the best in real code. It
cannot be error-prone or it is doomed. Also, it should not introduce
much overhead processing or else people will avoid it. The API should
be trivially simple so that people remember how to use it months after
seeing it for the first time.

Good luck and happy hunting,

Raymond

Joshua Bronson

unread,

Nov 24, 2009, 1:21:58 PM11/24/09

to Raymond Hettinger, Terry Reedy

Hey Raymond,

Thanks for your thoughtful reply! I think your idea for a class-
generation approach in the spirit of namedtuple is brilliant; looking
forward to coding this up and seeing how it feels to use it.

(By the way, it occurred to me that "bijection" is perhaps the wrong
term to use for this data structure; really it's just an injective
mapping, as it has no idea whether the function whose mappings it
contains is also surjective. (Unless we take the domain, codomain, and
range of the function being modeled to be exactly defined by the state
of the mapping at any given time. But it feels more correct to me to
interpret the mapping as a sampling of some underlying function, where
the sampling can change but the function stays the same.) So I'm
thinking of renaming the class injectivedict or idict instead of
bijection. Is that crazy?)

More responses inline:

On Nov 21, 9:22 pm, Raymond Hettinger <pyt...@rcn.com> wrote:
> * The idea of using __call__ for looking-up inverse values was
> inspired. That is useable, clean, and easy to remember; however, as
> discussed below, there are issues though with its actual use in real
> code.

Totally agree the call syntax has issues. Did you happen to see
Terry's suggestion to use slice syntax instead? Now *that* was
inspired. It's also much better because it works for setitem and
delitem too. I replaced the call syntax with the slice syntax on
Friday night -- would be interested to hear whether you think it's an
improvement.

> * Am not excited by the inverse iterators. With just a regular
> mapping you can write:
>
> for a, b in m.items() ... # consider either a or b be the
> key and the other to be the value
>
> That meets all of the needs that would have been served by
> iter_inverse_keys() or iter_inverse_values() or whatnot. The mirrored
> API doesn't really provide much in the way of value added.

Hm, the one value I see the latest version of ``inverted`` adding (may
not have been in the version you saw) is that you can pass it either a
mapping, an iterable, or any object implementing an __inverted__
method. So in one case it's just syntax sugar for writing [(v, k) for
(k, v) in d.items()], but in other cases it's providing some
abstraction.

> Hope these ideas help. The ultimate success of the Bijection code
> will depend on its clarity, simplicity, and speed. Experiment with
> various approaches to find-out which looks the best in real code. It
> cannot be error-prone or it is doomed. Also, it should not introduce
> much overhead processing or else people will avoid it. The API should
> be trivially simple so that people remember how to use it months after
> seeing it for the first time.

Thank you for the sage advice.

Best,
Josh

Gregory Ewing

unread,

Nov 24, 2009, 6:49:10 PM11/24/09

to

Joshua Bronson wrote:
> So I'm
> thinking of renaming the class injectivedict or idict instead of
> bijection. Is that crazy?)

I think you'd be better off calling it something more
down-to-earth such as bidict (bidirectional dictionary).
That way people without an advanced degree in mathematics
have a better shot at not being baffled by it.:-)

--
Greg

Joshua Bronson

unread,

Nov 24, 2009, 10:28:38 PM11/24/09

to Gregory Ewing

heh, duly noted:) bidict it is!

Joshua Bronson

unread,

Nov 26, 2009, 2:45:10 PM11/26/09

to

On Nov 24, 10:28 pm, Joshua Bronson <jabron...@gmail.com> wrote:
> bidict it is!

now available at http://bitbucket.org/jab/toys/src/tip/bidict.py

and now featuring new shiny namedbidict goodness!

as always, feedback welcome.

josh

Francis Carr

unread,

Nov 27, 2009, 1:12:36 PM11/27/09

to

I was really inspired by this discussion thread! :-)

After much tinkering, I think I have a simpler solution. Just make
the inverse mapping accessible via an attribute, -AND- bind the
inverse of -THAT- mapping back to the original. The result is a
python dict with NO NEW METHODS except this inverse-mapping
attribute. I have posted it on code.activestate.com as <a
href="http://code.activestate.com/recipes/576968/">Recipe 576968:
Flipdict -- python dict that also maintains a one-to-one inverse
mapping</a>

-- F. Carr

Gabriel Genellina

unread,

Nov 27, 2009, 9:36:13 PM11/27/09

to pytho...@python.org

En Fri, 27 Nov 2009 15:12:36 -0300, Francis Carr <coldt...@gmail.com>
escribi�:

Nice idea! Just a couple of comments:

Instead of:
self._flip = dict.__new__(self.__class__)
I'd write:
self._flip = self.__class__()
unless I'm missing something (but see the next point).

Also, although Python's GC is able to handle them, I prefer to avoid
circular references like those between x and x._flip. Making self._flip a
weak reference (and dereferencing it in the property) should be enough.

--
Gabriel Genellina

Joshua Bronson

unread,

Nov 28, 2009, 4:30:44 AM11/28/09

to Gabriel Genellina, Francis Carr, Joshua Bronson

On Nov 27, 9:36 pm, "Gabriel Genellina" <gags...@yahoo.com.ar>
wrote:

> En Fri, 27 Nov 2009 15:12:36 -0300, Francis Carr <coldt...@gmail.com>

> escribió:

>
> > I was really inspired by this discussion thread! :-)
>
> > After much tinkering, I think I have a simpler solution. Just make
> > the inverse mapping accessible via an attribute, -AND- bind the
> > inverse of -THAT- mapping back to the original. The result is a
> > python dict with NO NEW METHODS except this inverse-mapping
> > attribute. I have posted it on code.activestate.com as <a
> > href="http://code.activestate.com/recipes/576968/">Recipe 576968:
> > Flipdict -- python dict that also maintains a one-to-one inverse
> > mapping</a>
>
> Nice idea!

Indeed! Thanks for sharing! I liked this so much I added something
similar in http://bitbucket.org/jab/toys/src/tip/bidict.py (I made the
inverse available via a .inv property, as well as via the unary ~
operator (by analogy to bitwise inverse)). I also got rid of getinv,
popinv, et al. to keep the API leaner as you recommend. I've kept the
slice syntax though as well as namedbidect, so for now I guess I'm
allowing for many ways to skin this cat.

> Just a couple of comments:
>
> Instead of:
> self._flip = dict.__new__(self.__class__)
> I'd write:
> self._flip = self.__class__()
> unless I'm missing something (but see the next point).

How would this not cause infinite recursion?

> Also, although Python's GC is able to handle them, I prefer to avoid
> circular references like those between x and x._flip. Making self._flip a
> weak reference (and dereferencing it in the property) should be enough.

If both self._flip and self._flip._flip are weak references, no strong
references to the inverse mapping survive leaving the constructor
scope. Unless I'm missing something, only one of these can be a weak
reference, and then you'd have to do something like this in the
property to prevent "TypeError: FlipDict is not callable":

@property
def flip(self):
try:
# we're an inverse, self._flip is a weak reference
return self._flip()
except TypeError:
# we're a forward mapping, self._flip is a strong
reference
return self._flip

Patrick Maupin

unread,

Nov 28, 2009, 10:31:51 AM11/28/09

to

On Nov 19, 8:36 pm, Ben Finney <ben+pyt...@benfinney.id.au> wrote:

> Carl Banks <pavlovevide...@gmail.com> writes:
> > On Nov 19, 3:24 pm, Joshua Bronson <jabron...@gmail.com> wrote:
> > Apart from the GPL, it seems perfectly fine to release, and looks like
> > an interesting strategy. I've wanted one of those once in a while,
> > never enough to bother looking for one or writing one myself.
>
> I would think GPL is an excellent choice for such a library then, if the
> author's intention is to encourage more software to be free software so
> that it can incorporate a unique library like this.

Well, yes and no.

This bidict class sounds nice and full-featured, especially after the
changes prompted by the fruitful discussion. I personally use inverse
mappings on a regular basis, but for the most part, my data doesn't
change all that dynamically (or performance doesn't really matter), so
when I need to go backwards I often do something like:

inverse_mapping = dict((y, x) for (x, y) in forward_mapping.iteritems
())

Having said that, if I ever actually *need* something more full-
featured to add to non-GPLed software, I'd just write it (and release
it under a permissive license). IMHO, GPLing something this simple is
really the tail trying to wag the dog.

Case in point: I was just using rst2pdf to combine some restructured
text and SVG images, using svglib. svglib had some bugs and didn't
work right on my PDFs. svglib is not developed publicly, and the
author is somewhat slow to accept patches. Since svglib is reasonably
small, if it had been released under a permissive license, or even the
LGPL, I probably would have just copied it into the rst2pdf repository
and fixed it. If it were any smaller, I would have rewritten it. I
don't own the rst2pdf package, and didn't really want a license
discussion about 1K of source lines dictating a license change on 15K
lines. As it is, I figure the svglib author will probably get around
to fixing the bug at some point anyway, and for now I can easily use
PDFs for my graphics input format, so I cleaned up and added to some
old PDF code I had lying around, and released it as the open source
pdfrw package, and now rst2pdf can use that to import PDFs as vector
images without rasterizing them -- a new capability. So in this case,
the GPL spurred open-source development, in exactly the same way that
proprietary licenses do...

I'm quite happy to *use* GPLed software (and greatly appreciate the
authors' efforts), and I'm even sometimes willing to debug or add
features and submit patches to GPLed software, and I might even have a
(business, not political) reason to release some software under the
GPL myself someday. But if I ever did release something under the GPL
for business reasons, it would be because I also had the right to also
offer a proprietary version. This would require that I owned _all_
the code in the package, so the implication is: I'm not going to use
your tiny little GPLed code in any major software I write for release,
whether my software is GPLed or not.

The LGPL is different. I view the LGPL as a statement of "if you ever
add related functionality to this or fix a bug in this in a shipping
product, I'd like to see the fix, please" and I could even see myself
releasing something with this license under the right circumstances.

Now if I were doing a web service, it would be a different story. I
would be quite happy to add your GPLed software into the mix, so if
that's a terrible thing, perhaps you should consider affero for your
future offerings :-)

Best regards,
Pat

Joshua Bronson

unread,

Dec 1, 2009, 1:31:34 PM12/1/09

to Francis Carr, Raymond Hettinger

I noticed the phonebook example in your ActiveState recipe and thought
you might consider changing it to something like husbands to wives,
since the names-to-phone-numbers relation is many-to-many. The built-
in htmlentifydefs module provides fodder for a real-world example: It
maintains name2codepoint and codepoint2name as two separate dicts.

Raymond, do you think there might be any future in including a built-
in bidict data structure in Python? At least there's one built-in
module that might benefit from it.

P.S. I moved bidict to its own repo at http://bitbucket.org/jab/bidict/
and released it to PyPI as http://pypi.python.org/pypi/bidict.

Raymond Hettinger

unread,

Dec 1, 2009, 2:11:39 PM12/1/09

to

[Joshua Bronson]

> Raymond, do you think there might be any future in including a built-
> in bidict data structure in Python?

I don't think so. There are several forces working against it:

* the recipe is new, so it hasn't had a chance to mature
or to gain a fan club.

* there are many approaches to the solving the problem and
there is no reason to assume this one is the best.

* it extends the language with arcane syntax tricks instead of
using the language as designed by Guido. That makes it harder
to learn and remember.

* we've already got one (actually two). The two dictionary approach
uses plain python, requires no new learning, and is more flexible.
Also, sqlite3 provides another way to use multiple lookups to a
single record. The database approach is much more general
(extending to trijections, allowing multiple sort orders,
providing persistence, etc).

* the semantics of a bijection aren't obvious:

b['x'] = 'ex' # first record: ('x', 'ex')
b['y'] = 'why' # second record: ('y', 'why')
b[:'why'] = 'x' # do two records collapse into one? is there
an error?

* the proposed syntax doesn't address the issue covered in my previous
post.
Since bijections are symmetrical, they do not have an obvious
direction
(which is the primary key, the husband or the wife?). The syntax
needs to
allow user names to make it clear which is being accessed:

marriages.h2w['john'] = 'amy'
marriages.w2h['amy'] = 'john'

Contrast this with:

marriages['jordan'] = 'taylor' # are you sure you got the
order correct?
marriages[:'taylor'] = 'jordan' # this is easy to get backwards

Raymond

Joshua Bronson

unread,

Dec 1, 2009, 4:19:22 PM12/1/09

to Raymond Hettinger

On Dec 1, 2:11 pm, Raymond Hettinger <pyt...@rcn.com> wrote:
> [Joshua Bronson]
>
> > Raymond, do you think there might be any future in including a built-
> > in bidict data structure in Python?
>
> I don't think so. There are several forces working against it:
>
> * the recipe is new, so it hasn't had a chance to mature
> or to gain a fan club.
>
> * there are many approaches to the solving the problem and
> there is no reason to assume this one is the best.
>
> * it extends the language with arcane syntax tricks instead of
> using the language as designed by Guido. That makes it harder
> to learn and remember.
>
> * we've already got one (actually two). The two dictionary approach
> uses plain python, requires no new learning, and is more flexible.
> Also, sqlite3 provides another way to use multiple lookups to a
> single record. The database approach is much more general
> (extending to trijections, allowing multiple sort orders,
> providing persistence, etc).

all good points.

> * the semantics of a bijection aren't obvious:
>
> b['x'] = 'ex' # first record: ('x', 'ex')
> b['y'] = 'why' # second record: ('y', 'why')
> b[:'why'] = 'x' # do two records collapse into one? is there
> an error?

In my implementation the two records collapse into one, on the theory
that if you say it you mean it, but you're right that the semantics
aren't obvious, especially since in sql this would be an error. Thank
you for pointing this out, it totally slipped my mind to document it!
(Noted now.) If my bidict package ever has >1 user, and they prefer
this to be an error, I'd totally change it.

> * the proposed syntax doesn't address the issue covered in my previous
> post.
> Since bijections are symmetrical, they do not have an obvious
> direction
> (which is the primary key, the husband or the wife?). The syntax
> needs to
> allow user names to make it clear which is being accessed:
>
> marriages.h2w['john'] = 'amy'
> marriages.w2h['amy'] = 'john'
>
> Contrast this with:
>
> marriages['jordan'] = 'taylor' # are you sure you got the
> order correct?
> marriages[:'taylor'] = 'jordan' # this is easy to get backwards

The "namedbidict" class factory I wrote on your recommendation allows
for the former. But it's still up to the user to choose names which
indicate the direction of the mapping, whether she uses namedbidict or
not:

marriages.husbands['john'] = 'amy' # namedbidict, direction
unclear
h2w['john'] = 'amy' # regular bidict, but not unclear b/c name is
'h2w'

(you can use

>>> marriages.inv is marriages.husbands
False

to tell that husbands is the forward mapping, but that sucks -- better
to have called it h2w or some such in the first place.)

Thanks for the thoughtful reply.

Josh

Aahz

unread,

Dec 1, 2009, 8:17:12 PM12/1/09

to

In article <85100df7-a8b0-47e9...@r31g2000vbi.googlegroups.com>,

Joshua Bronson <jabr...@gmail.com> wrote:
>
>I noticed the phonebook example in your ActiveState recipe and thought
>you might consider changing it to something like husbands to wives,
>since the names-to-phone-numbers relation is many-to-many.

What makes you think husbands to wives is one-to-one? ;-) (Even
assuming monogamy, you have husbands-to-husbands and wives-to-wives.)
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

The best way to get information on Usenet is not to ask a question, but
to post the wrong information.

Chris Rebert

unread,

Dec 1, 2009, 9:03:31 PM12/1/09

to Aahz, pytho...@python.org

On Tue, Dec 1, 2009 at 5:17 PM, Aahz <aa...@pythoncraft.com> wrote:
> In article <85100df7-a8b0-47e9...@r31g2000vbi.googlegroups.com>,
> Joshua Bronson <jabr...@gmail.com> wrote:
>>
>>I noticed the phonebook example in your ActiveState recipe and thought
>>you might consider changing it to something like husbands to wives,
>>since the names-to-phone-numbers relation is many-to-many.
>
> What makes you think husbands to wives is one-to-one? ;-) (Even
> assuming monogamy, you have husbands-to-husbands and wives-to-wives.)

Reminds me of this quite funny blog post:
"Gay marriage: the database engineering perspective"
http://qntm.org/?gay

Cheers,
Chris
--
http://blog.rebertia.com

Joshua Bronson

unread,

Dec 2, 2009, 6:21:33 PM12/2/09

to Aahz

On Dec 1, 8:17 pm, aa...@pythoncraft.com (Aahz) wrote:
> In article <85100df7-a8b0-47e9...@r31g2000vbi.googlegroups.com>,
> Joshua Bronson <jabr...@gmail.com> wrote:
> >I noticed the phonebook example in your ActiveState recipe and thought
> >you might consider changing it to something like husbands to wives,
> >since the names-to-phone-numbers relation is many-to-many.
>
> What makes you think husbands to wives is one-to-one? ;-) (Even
> assuming monogamy, you have husbands-to-husbands and wives-to-wives.)

Hah! I knew this was coming and even put "assuming monogamy" in the
source!
http://bitbucket.org/jab/bidict/src/712da6e2dd26/bidict.py#cl-65 ;P

As for husbands-to-husbands and wives-to-wives, those are just
separate one-to-one mappings! Doesn't mean husbands-to-wives ain't one-
to-one!

At any rate, apologies to the community for my heteronormative
example. It was merely pedagogical and reflects nothing about my
personal views! If you have any further concerns, please send them to
my lawyer, /dev/null.

Joshua Bronson

unread,

Dec 2, 2009, 6:52:11 PM12/2/09

to

On Dec 1, 9:03 pm, Chris Rebert <c...@rebertia.com> wrote:
> Reminds me of this quite funny blog post:
> "Gay marriage: the database engineering perspective"
> http://qntm.org/?gay

amazing

Aahz

unread,

Dec 2, 2009, 7:37:49 PM12/2/09

to

In article <9a6902a1-327e-435e...@u20g2000vbq.googlegroups.com>,

Apology accepted. ;-)

M.-A. Lemburg

unread,

Dec 3, 2009, 7:04:48 AM12/3/09

to Raymond Hettinger, pytho...@python.org

I think the only major CS data type missing from Python is some
form of (fast) directed graph implementation � la kjGraph:

http://gadfly.sourceforge.net/kjbuckets.html

With these, you can easily build all sorts of relations between
objects and apply fast operations on them. In fact, it should then
be possible to build a complete relational database in Python
(along the lines of Gadfly).

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Dec 03 2009)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

geremy condra

unread,

Dec 3, 2009, 11:54:27 AM12/3/09

to M.-A. Lemburg, pytho...@python.org, Raymond Hettinger

> I think the only major CS data type missing from Python is some

> form of (fast) directed graph implementation à la kjGraph:

>
> http://gadfly.sourceforge.net/kjbuckets.html
>
> With these, you can easily build all sorts of relations between
> objects and apply fast operations on them. In fact, it should then
> be possible to build a complete relational database in Python
> (along the lines of Gadfly).

If you're in the market for a Python graph library, you may want
to check out Graphine- I'm obviously biased (I wrote most of it)
but it has a few more bells and whistles than kjbuckets, and is
pretty darned easy to use. It also supports undirected and
bridge graphs.

Geremy Condra

M.-A. Lemburg

unread,

Dec 3, 2009, 12:57:18 PM12/3/09

to geremy condra, pytho...@python.org, Raymond Hettinger

geremy condra wrote:
> On Thu, Dec 3, 2009 at 7:04 AM, M.-A. Lemburg <m...@egenix.com> wrote:
>> I think the only major CS data type missing from Python is some

>> form of (fast) directed graph implementation � la kjGraph:

>>
>> http://gadfly.sourceforge.net/kjbuckets.html
>>
>> With these, you can easily build all sorts of relations between
>> objects and apply fast operations on them. In fact, it should then
>> be possible to build a complete relational database in Python
>> (along the lines of Gadfly).
>
> If you're in the market for a Python graph library, you may want
> to check out Graphine- I'm obviously biased (I wrote most of it)
> but it has a few more bells and whistles than kjbuckets, and is
> pretty darned easy to use. It also supports undirected and
> bridge graphs.

Thanks for the hint :-)

The lib looks nice and would probably serve as a good prototype
for writing a new built-in type for Python.

This would have to be written in C, though, and come under
a Python compatible license. With the built-in feature moratorium
currently in place, there's about 1.5-2 years time to get this
done; perhaps a good GSoC project for next year :-)

Francis Carr

unread,

Dec 3, 2009, 4:17:02 PM12/3/09

to

[In re R. Hettinger's critiques]

> * it extends the language with arcane syntax tricks...
I think most of these in the current version of J. Bronson's "bidict"
can be left unused, or removed altogether. In almost all cases, a
bidict should be accessed as an ordinary python dict.

> * we've already got one (actually two).

> The two dictionary approach...
Solutions such as bidict just automate the two-dict approach.

> ...sqlite3 provides another way...
In many many cases, using a dB (even a lightweight such as sqlite3) is
swatting the fly with a sledgehammer :-)

> Since bijections are symmetrical, they do not have an obvious
> direction (which is the primary key, the husband or the wife?).

I think this is easy to solve with a classmethod constructor that
produces a pair of "linked" dicts. For example,
husband2wife, wife2husband = bidict.makepair(('Fred', 'John'),
('Mary', 'Ruth'))
Obviously from the code this pair of bidicts are "linked", and the
direction of each mapping is obvious from its name. Metaprogramming
idioms like namedtuple are not required.

> * the semantics of a bijection aren't obvious:
> b['x'] = 'ex' # first record: ('x', 'ex')
> b['y'] = 'why' # second record: ('y', 'why')
> b[:'why'] = 'x' # do two records collapse into one?

> # is there an error?
Among the various critiques, I think this is the most significant.

When I was fiddling with my implementation, I was disturbed that the
code
bidict[newKey] = oldValue
should have the subtle side-effect
del bidict.inv[oldValue]

And there is a much stranger case. Suppose bidict[key1]=val1 and
bidict[key2]=val2. Then the code
bidict[key1] = val2
should have the extremely subtle side-effects
del bidict[key2] # because val2 has been re-mapped
del bidict.inv[val1] # because key1 has been re-mapped
Blech!

I think there must be something better. It would be interesting to
hear more opinions on the matter.

I propose raising ValueError when operating on one key would also
silently re-map or delete a different (key,value) pair. This would
disallow both of the above cases. To get the effect of the first
case, one would simply operate on the inverse mapping:
bidict.inv[oldValue] = newKey
This should not be confusing: it's exactly how a python dict would
operate, except the "linked" mapping is altered to match, which is the
bookkeeping we want to automate in the first place. To get the effect
of the second case, one would have to explicitly demand the side-
effects:
del bidict[key2]
del bidict.inv[val1]
bidict[key1] = val2
Also not confusing.

-- FC

geremy condra

unread,

Dec 3, 2009, 6:54:47 PM12/3/09

to M.-A. Lemburg, pytho...@python.org, Raymond Hettinger

On Thu, Dec 3, 2009 at 12:57 PM, M.-A. Lemburg <m...@egenix.com> wrote:
> geremy condra wrote:
>> On Thu, Dec 3, 2009 at 7:04 AM, M.-A. Lemburg <m...@egenix.com> wrote:
>>> I think the only major CS data type missing from Python is some

>>> form of (fast) directed graph implementation à la kjGraph:

>>>
>>> http://gadfly.sourceforge.net/kjbuckets.html
>>>
>>> With these, you can easily build all sorts of relations between
>>> objects and apply fast operations on them. In fact, it should then
>>> be possible to build a complete relational database in Python
>>> (along the lines of Gadfly).
>>
>> If you're in the market for a Python graph library, you may want
>> to check out Graphine- I'm obviously biased (I wrote most of it)
>> but it has a few more bells and whistles than kjbuckets, and is
>> pretty darned easy to use. It also supports undirected and
>> bridge graphs.
>
> Thanks for the hint :-)
>
> The lib looks nice and would probably serve as a good prototype
> for writing a new built-in type for Python.

I suspect that it would have a better chance at getting into
collections than becoming a builtin, but who knows. I'd just
like to have something like it in the standard library.

> This would have to be written in C, though,

That's currently in the works, along with database backing.
We'd welcome any help though... hint, hint...

> and come under a Python compatible license.

I'm willing to dual license under the Python license if
there were a substantial interest in doing so, and I'm
confident that the other authors and maintainers
would feel the same way. The question in my mind is
whether such an interest exists.

> With the built-in feature moratorium
> currently in place, there's about 1.5-2 years time to get this
> done; perhaps a good GSoC project for next year :-)

I'd love to have Graphine be a GSoC project, although
if the target were to get it into collections the
moratorium wouldn't change the timeline AFAICS.

Geremy Condra

M.-A. Lemburg

unread,

Dec 4, 2009, 6:46:13 AM12/4/09

to geremy condra, pytho...@python.org, Raymond Hettinger

geremy condra wrote:
> On Thu, Dec 3, 2009 at 12:57 PM, M.-A. Lemburg <m...@egenix.com> wrote:
>> geremy condra wrote:
>>> On Thu, Dec 3, 2009 at 7:04 AM, M.-A. Lemburg <m...@egenix.com> wrote:
>>>> I think the only major CS data type missing from Python is some

>>>> form of (fast) directed graph implementation � la kjGraph:

>>>>
>>>> http://gadfly.sourceforge.net/kjbuckets.html
>>>>
>>>> With these, you can easily build all sorts of relations between
>>>> objects and apply fast operations on them. In fact, it should then
>>>> be possible to build a complete relational database in Python
>>>> (along the lines of Gadfly).
>>>
>>> If you're in the market for a Python graph library, you may want
>>> to check out Graphine- I'm obviously biased (I wrote most of it)
>>> but it has a few more bells and whistles than kjbuckets, and is
>>> pretty darned easy to use. It also supports undirected and
>>> bridge graphs.
>>
>> Thanks for the hint :-)
>>
>> The lib looks nice and would probably serve as a good prototype
>> for writing a new built-in type for Python.
>
> I suspect that it would have a better chance at getting into
> collections than becoming a builtin, but who knows. I'd just
> like to have something like it in the standard library.

Integrating an easy-to-use graph library into the collections
module (and it's C companion) is good idea.

>> This would have to be written in C, though,
>
> That's currently in the works, along with database backing.
> We'd welcome any help though... hint, hint...
>
>> and come under a Python compatible license.
>
> I'm willing to dual license under the Python license if
> there were a substantial interest in doing so, and I'm
> confident that the other authors and maintainers
> would feel the same way.

Great !

> The question in my mind is whether such an interest exists.

Since Python is being used more and more in CS classes,
such an addition would complete the tool-set and make Python
even more attractive for undergrad CS courses.

Finding out how much interest exists in advance is always
a bit difficult with new data-structures. People have to
get a feeling of how they can be put to good use first,
so it's a chicken-and-egg problem.

We've seen the same thing happen with sets. They were first
made available via a separate module and then became built-ins
after people realized how useful they are in practice.

With graphs, it's probably going to take a little longer
before people realize their usefulness - graph theory is
certainly a lot more complicated than set theory :-)

>> With the built-in feature moratorium
>> currently in place, there's about 1.5-2 years time to get this
>> done; perhaps a good GSoC project for next year :-)
>
> I'd love to have Graphine be a GSoC project, although
> if the target were to get it into collections the
> moratorium wouldn't change the timeline AFAICS.

True.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Dec 04 2009)

MRAB

unread,

Dec 4, 2009, 11:17:20 AM12/4/09

to pytho...@python.org

I'd like to add that people (myself included) were already using dicts
for sets before the module was written, so there was already a clear
demand for them.

geremy condra

unread,

Dec 4, 2009, 2:53:39 PM12/4/09

to pytho...@python.org

On Fri, Dec 4, 2009 at 2:52 PM, geremy condra <deba...@gmail.com> wrote:

> On Fri, Dec 4, 2009 at 11:17 AM, MRAB <pyt...@mrabarnett.plus.com> wrote:
>> M.-A. Lemburg wrote:
>>>
>>> geremy condra wrote:
>>>>
>>>> On Thu, Dec 3, 2009 at 12:57 PM, M.-A. Lemburg <m...@egenix.com> wrote:
>>>>>
>>>>> geremy condra wrote:
>>>>>>
>>>>>> On Thu, Dec 3, 2009 at 7:04 AM, M.-A. Lemburg <m...@egenix.com> wrote:
>>>>>>>
>>>>>>> I think the only major CS data type missing from Python is some

>>>>>>> form of (fast) directed graph implementation à la kjGraph:

To be fair, I don't think you'd have to look very far to find places
where a graph representation is approximated using some
combination of dicts, sets, and lists. ElementTree comes to mind
immediately, and the dict-of-dicts idea for logging recently
discussed on python-dev explicitly states that it uses that
structure to represent object graphs. To me that says that there
is at least some demand.

Geremy Condra

Lie Ryan

unread,

Dec 4, 2009, 3:10:14 PM12/4/09

to

On 12/5/2009 6:53 AM, geremy condra wrote:
> To be fair, I don't think you'd have to look very far to find places
> where a graph representation is approximated using some
> combination of dicts, sets, and lists. ElementTree comes to mind
> immediately, and the dict-of-dicts idea for logging recently
> discussed on python-dev explicitly states that it uses that
> structure to represent object graphs. To me that says that there
> is at least some demand.

Though I've never used ElementTree extensively before, I thought it was
supposed to use a Tree structure (though it is a subset of graph, the
need for tree is much more common than full-blown graph package).

geremy condra

unread,

Dec 4, 2009, 3:46:37 PM12/4/09

to Lie Ryan, pytho...@python.org

Sure, its a tree, which is also a graph. In this case it looks to
me more like a directed acyclic graph than anything, but its
pretty much just semantics since the interface is functionally
equivalent.

Geremy Condra

Terry Reedy

unread,

Dec 4, 2009, 4:43:06 PM12/4/09

to pytho...@python.org

M.-A. Lemburg wrote:

> Integrating an easy-to-use graph library into the collections
> module (and it's C companion) is good idea.
>
>>> This would have to be written in C, though,
>> That's currently in the works, along with database backing.
>> We'd welcome any help though... hint, hint...

The current thinking among deveopers is that new modules should be
written and maintained in Python, which all implementations can use,
with speed-critical parts written in C for speed and imported by the
Python code.

You can write a PEP offering Graphine. To be accepted, there would need
to be some evidence that is the best Python graph module avaible and has
some community support.

I would probably use it more that 90% of the stdlib.

Terry Jan Reedy

Carl Banks

unread,

Dec 4, 2009, 5:41:16 PM12/4/09

to

On Dec 4, 12:46 pm, geremy condra <debat...@gmail.com> wrote:
more common than full-blown graph package).
> Sure, its a tree, which is also a graph. In this case it looks to
> me more like a directed acyclic graph than anything, but its
> pretty much just semantics since the interface is functionally
> equivalent.

I'd have to agree with Lie, yes a tree is a graph, but it's simply not
an argument that Python community is grasping for graph structures.
It's like arguing that the Python community could benefit from a
quaternion type, because quaternions are actually heavily used in
Python, because a scalar number is a quarternion.

Carl Banks

(Would be +1 on a good graph implementation... just not because of
ElementTree.)

Lie Ryan

unread,

Dec 4, 2009, 7:42:15 PM12/4/09

to

I think this could be an interpretation of the Zen:

Simple is better than complex.
Complex is better than complicated.

can be read as:
List is better than Tree
Tree is better than Graph

not having Tree and Graph package in the standard library force most
people to find List-based solution. And people that know they need
graphs will find them in 3rd party modules. I have needed Trees a few
times in python, but very rarely a Graph (except for playing around).
YMDWV (your mileage definitely will vary).

geremy condra

unread,

Dec 4, 2009, 8:26:17 PM12/4/09

to Carl Banks, pytho...@python.org

On Fri, Dec 4, 2009 at 5:41 PM, Carl Banks <pavlove...@gmail.com> wrote:
> On Dec 4, 12:46 pm, geremy condra <debat...@gmail.com> wrote:
> more common than full-blown graph package).
>> Sure, its a tree, which is also a graph. In this case it looks to
>> me more like a directed acyclic graph than anything, but its
>> pretty much just semantics since the interface is functionally
>> equivalent.
>
> I'd have to agree with Lie, yes a tree is a graph, but it's simply not
> an argument that Python community is grasping for graph structures.
> It's like arguing that the Python community could benefit from a
> quaternion type, because quaternions are actually heavily used in
> Python, because a scalar number is a quarternion.

Fair enough. I suspect that other examples could be provided
easily enough that I'm not going to fight over that one.

> Carl Banks
>
> (Would be +1 on a good graph implementation... just not because of
> ElementTree.)

I'd love it if you'd take a look at Graphine and see whether
it would meet the standard for a good graph implementation.

Geremy Condra

Carl Banks

unread,

Dec 4, 2009, 8:38:09 PM12/4/09

to

If you want a better example, consider various database schemas that
have one-to-one, one-to-many, many-to-one, and many-to-many
relationships. I daresay these are very common. All of these can be
represented by a non-directed graph.

Another common use of directed graphs is for dependency
relationships. In practice, a lot of times running things in order of
dependency is done by assigning everything a scalar priotity, and
executing in order of priority. This can work ok, but it's fragile.
If there's a graph type in Python maybe people will be encouraged to
handle dependencies explicitly.

Carl Banks

geremy condra

unread,

Dec 4, 2009, 8:38:33 PM12/4/09

to Lie Ryan, pytho...@python.org

Where a list will do, use a list- duh. But when you need a graph, you
shouldn't have to homebrew an implementation any more than you
should have to homebrew an odict or named tuple, both of which
are substantially easier to get right than a graph is.

Geremy Condra

Lie Ryan

unread,

Dec 4, 2009, 8:47:39 PM12/4/09

to

On 12/5/2009 12:38 PM, geremy condra wrote:
>
> Where a list will do, use a list- duh. But when you need a graph, you
> shouldn't have to homebrew an implementation any more than you
> should have to homebrew an odict or named tuple, both of which
> are substantially easier to get right than a graph is.

That's what I was trying to say, though I can't find a good way to. The
comparison with the Zen was to meant that if List is sufficient (if
Simple is sufficient) don't use Tree (don't design a Complex). I wasn't
implying to reduce all problem into a list at whatever costs, sorry if
my wording appears to imply so...

geremy condra

unread,

Dec 4, 2009, 10:45:14 PM12/4/09

to Carl Banks, pytho...@python.org

I actually considered using dependencies as an example on the
"graphine for pythonistas"[1] article, but decided to do the maze
run instead. In any event, the uses of graphs in general computing
are well enough established that I don't really think that's where
the majority of the difficulty in coming up with something for the
standard library will be- deciding how it should look and behave,
especially in terms of scope and target audience, that's going to
be the difficult part.

Geremy Condra

[1]: http://gitorious.org/graphine/pages/GraphineForPythonistas

Steven D'Aprano

unread,

Dec 5, 2009, 12:18:09 AM12/5/09

to

On Sat, 05 Dec 2009 11:42:15 +1100, Lie Ryan wrote:

> I think this could be an interpretation of the Zen:
>
> Simple is better than complex.
> Complex is better than complicated.
>
> can be read as:
> List is better than Tree

Because O(N) searches are better than O(log N) searches. Not.

How about "The right tool for the right job"?

> Tree is better than Graph
>
> not having Tree and Graph package in the standard library force most
> people to find List-based solution.

If you have to be *forced* to use a list-based solution, that's a good
sign that a list is *not* the right tool for the job.

--
Steven

Raymond Hettinger

unread,

Dec 5, 2009, 3:32:25 AM12/5/09

to

[Me]

> > * we've already got one (actually two).
> > The two dictionary approach...

[Francis Carr]

> Solutions such as bidict just automate the two-dict approach.

They do so at the expense of implementing a new API to support it and
at the expense with having non-obvious behaviors (i.e. how it handles
collapsing two records into one, which exceptions can be raised, how
the bijection invariants are maintained, which of the two symmetric
accessors is the primary, etc). This is not a cost-free choice of
simply "automating something".

IMO, "just automating" something that is already clear is not
necessarily a net win. Each new class has a learning curve and it is
sometimes simpler to use the basic dictionary API instead of inventing
a new one. I would *much* rather debug code written by someone using
two dictionaries than code using any of the APIs discussed so far --
all of those hide important design decisions behind a layer of
abstraction. The API must be easy to learn and remember (including
all of it quirks); otherwise, it is a net mental tax on the programmer
and code reviewers.

Also, I've presented examples of usability problems with APIs that do
not require the user to specify names for the two directions. It is
too easy to make mistakes that are hard to see. Which is correct,
phonelist['raymond'] or phonelist[raymonds_phonenumber]? There is no
way to tell without looking back at the bijection specification. The
API must be self-documenting. An API that is error-prone is worse
than having no bijection class at all.

Further, the API needs to be consistent with the rest of the language
(no abusing syntax with the likes of phonelist[:number]).

Unfortunately, Mark Lemburg has thrown gas on this fire, so the
conversation will likely continue for a while. If so, it would be
helpful if the conversation centered around real-world examples of
code that would be improved with a bijection class. In my experience,
there are examples where bijections arise in programs but it doesn't
happen often enough to warrant a new class for it. In cases where it
does arise, people should try-out the proposed APIs to see if in-fact
the code is made more clear or whether simple work is just being
hidden behind a layer of abstraction. For grins, insert an error into
the code (conflating the primary key with the secondary key) and see
if the error is self-evident or whether it is hidden by the new API.
Also, test the API for flexibility (how well can it adapt to
injections and surjections, can it handle use cases with default
values, is there a meaningful interpretation of dict.fromkeys() in a
bijection, can a user specify how to handle violations of the
bijection invariants by raising exceptions, supplying default values,
collapsing records, etc.)

Even if a proposed API passes those smell tests, demonstrates the
required flexibility, is easy to learn and use, is not error-prone,
and actually improves real-world use cases, it is my opinion that a
recipe for it will not garner a fan club and that it will have limited
uptake. ISTM that it is more fun to write classes like this than it
is to actually use them.

Raymond

P.S. It also makes sense to look at other mature languages to see
whether they ever find a need to include a bijection class in their
standard library or builtin collections. If Smalltalk, Java, Forth,
Go, Io, Haskell and C++ couldn't be sold on it, perhaps the buyer
should beware. If some language is found that did include a bijection
class, then do a Google code search to see how it fared in the real-
world. My bet is that it either wasn't used much or that it actually
make code worse by making errors harder to spot.

Raymond Hettinger

unread,

Dec 5, 2009, 4:14:55 AM12/5/09

to

> > ...sqlite3 provides another way...
>
> In many many cases, using a dB (even a lightweight such as sqlite3) is
> swatting the fly with a sledgehammer :-)

I'm sure it seems that way, but look at the generic description of the
problem: "I have a list of n-ary tuples with named fields and would
like to be able to retrieve a tuple using any of m-fields as a lookup
key (where 2 <= m <= n). Also, I would like to enforce the constraint
that those m-fields are all unique keys."

That pretty much sounds like a typical database problem featuring a
single table with multiple indexes. One can wish-away the complexity
of a database but you will end-up re-inventing an in-memory, one-table
version of database.

===============
phonelist
===============
name idnumber
---- --------
ray 1234
josh 5423
carl 8674
tery 5409
greg 3402
mark 2108

tasks
-----
list names
list idnumbers
find idnumber=2108
find name=greg
del name=tery
update idnumber=4321 where name=ray
list sorted by name
list sorted by idnumber reversed
is name=john in phonelist
Now, extend the table to make it a trijection (three unique keys),
perhaps a social security number.
Now, extend the table to add a field that isn't unique (not a key
field), perhaps a person's favorite language.
Oh wait, you can't do that with a bijection API.
Now, forget the in-memory part and make it persistent on disk.
Now, relate the entries to another dictionary of tuples (i.e. another
table with foreign-key).
Too bad a DB wasn't used to solve a DB problem.

Raymond

Lie Ryan

unread,

Dec 5, 2009, 7:06:15 AM12/5/09

to

On 12/5/2009 4:18 PM, Steven D'Aprano wrote:
>> Tree is better than Graph
>>
>> not having Tree and Graph package in the standard library force most
>> people to find List-based solution.
>
> If you have to be *forced* to use a list-based solution, that's a good
> sign that a list is *not* the right tool for the job.

Sorry for putting too much emphasis on the "forced", that wasn't my
intention. I was mentioning that often simple problems appeared more
complex because the person was thinking in a higher level data structure
than is necessary. "forced" in that sentence means to let people to
think a little bit harder with list/dict before deciding that they are
unsuitable and moving to a tree or a graph.

geremy condra

unread,

Dec 5, 2009, 1:37:19 PM12/5/09

to Lie Ryan, pytho...@python.org

In any event, I think we can agree that for some tasks (I would say
many) a graph or tree is the most suitable data structure. To me,
that says that we need to be having a discussion about whether to
include those tools in the standard library, and how best to do so if
that's the decision. Would you agree?

Geremy Condra

Raymond Hettinger

unread,

Dec 5, 2009, 4:39:11 PM12/5/09

to

[geremy condra]

> I actually considered using dependencies as an example on the
> "graphine for pythonistas"[1] article, but decided to do the maze
> run instead. In any event, the uses of graphs in general computing
> are well enough established that I don't really think that's where
> the majority of the difficulty in coming up with something for the
> standard library will be- deciding how it should look and behave,
> especially in terms of scope and target audience, that's going to
> be the difficult part.

Right you are :-)

> [1]:http://gitorious.org/graphine/pages/GraphineForPythonistas

Do you have many users?

Raymond

geremy condra

unread,

Dec 5, 2009, 6:22:25 PM12/5/09

to Raymond Hettinger, pytho...@python.org

I literally have no idea, but I suspect not, seeing as how its a
pure python3 project. I get emails about it often enough that
I suspect we have about a hundred or so, though, which I'm
not unhappy with for a project that focuses on so basic a
data structure. Having said that, we always welcome more-
and I'd love to have your input on how to make its api more
pythonic.

Geremy Condra

Raymond Hettinger

unread,

Dec 5, 2009, 7:18:58 PM12/5/09

to

On Dec 5, 3:22 pm, geremy condra <debat...@gmail.com> wrote:
> On Sat, Dec 5, 2009 at 4:39 PM, Raymond Hettinger <pyt...@rcn.com> wrote:
> > [geremy condra]
> >> I actually considered using dependencies as an example on the
> >> "graphine for pythonistas"[1] article, but decided to do the maze
> >> run instead. In any event, the uses of graphs in general computing
> >> are well enough established that I don't really think that's where
> >> the majority of the difficulty in coming up with something for the
> >> standard library will be- deciding how it should look and behave,
> >> especially in terms of scope and target audience, that's going to
> >> be the difficult part.
>
> > Right you are :-)
>
> >> [1]:http://gitorious.org/graphine/pages/GraphineForPythonistas
>
> > Do you have many users?
>
> I literally have no idea, but I suspect not, seeing as how its a
> pure python3 project.

Google's code search can provide a clue.
It is also helpful because it let you see
typically use cases and whether the API fits well
or whether it is awkward to express common use cases.
All of those observations will help you tweak the API.

Also, it's useful to make similar studies of competing
packages either written in Python or in some other langauge.

Raymond

geremy condra

unread,

Dec 5, 2009, 7:49:58 PM12/5/09

to Raymond Hettinger, pytho...@python.org

Unfortunately, judging from the queries I get most of
the people interested in the package are using it to
study graphs, rather than to write applications using
them. That's one of the reasons why I haven't pushed
the idea to date- graphine was developed to be useful
to other developers, but seems to be mainly used by
academics, while networkx seems to be targeted
primarily at academics, and is somewhat widely used.
I think the ideal situation would be to take projects
like graphine and python-graph and develop a
hybrid system specifically for the standard library
that borrows the strengths of each, but that would
involve a lot of developer time for people already
heavily invested in their own projects. Having
said that, if enough people from the python
community got behind it, I think it would happen.

> Also, it's useful to make similar studies of competing
> packages either written in Python or in some other langauge.

I couldn't agree more.

Geremy Condra

M.-A. Lemburg

unread,

Dec 7, 2009, 7:51:02 AM12/7/09

to Terry Reedy, pytho...@python.org

Terry Reedy wrote:
> M.-A. Lemburg wrote:
>
>> Integrating an easy-to-use graph library into the collections
>> module (and it's C companion) is good idea.
>>
>>>> This would have to be written in C, though,
>>> That's currently in the works, along with database backing.
>>> We'd welcome any help though... hint, hint...
>
> The current thinking among deveopers is that new modules should be
> written and maintained in Python, which all implementations can use,
> with speed-critical parts written in C for speed and imported by the
> Python code.

I don't think you are speaking for Python developers in general.

The usual approach is to develop a prototype in Python and then
rewrite it in C. Since the prototype is already there, what
remains is the rewrite to get better performance.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Dec 07 2009)

M.-A. Lemburg

unread,

Dec 7, 2009, 8:07:17 AM12/7/09

to pytho...@python.org

Trees are only a subset of general graphs and graphs often provide
a more intuitive approach to problem representation than trying
to squash all information into a tree or list.

For certain problems, like e.g. dependency checking or routing,
you can easily run into cycles which cannot be represented by
trees (*).

Furthermore, the property of being cycle-free (acyclic)
is often one of the things you want to find out when dealing
with graph data sets.

(*) Note that Python does allow creating lists with cyclic references,
but such usage is not really encouraged and will lead to delays in
garbage collection:

>>> l = [1,2,3]
>>> l[2] = l
>>> l
[1, 2, [...]]

geremy condra

unread,

Dec 7, 2009, 11:46:04 AM12/7/09

to M.-A. Lemburg, pytho...@python.org, Terry Reedy

On Mon, Dec 7, 2009 at 7:51 AM, M.-A. Lemburg <m...@egenix.com> wrote:
> Terry Reedy wrote:
>> M.-A. Lemburg wrote:
>>
>>> Integrating an easy-to-use graph library into the collections
>>> module (and it's C companion) is good idea.
>>>
>>>>> This would have to be written in C, though,
>>>> That's currently in the works, along with database backing.
>>>> We'd welcome any help though... hint, hint...
>>
>> The current thinking among deveopers is that new modules should be
>> written and maintained in Python, which all implementations can use,
>> with speed-critical parts written in C for speed and imported by the
>> Python code.
>
> I don't think you are speaking for Python developers in general.

I believe he's referring to the core developers.

Geremy Condra

M.-A. Lemburg

unread,

Dec 7, 2009, 12:05:00 PM12/7/09

to geremy condra, pytho...@python.org, Terry Reedy

I was as well, being one of them :-)

geremy condra

unread,

Dec 7, 2009, 12:14:05 PM12/7/09

to M.-A. Lemburg, pytho...@python.org, Terry Reedy

On Mon, Dec 7, 2009 at 12:05 PM, M.-A. Lemburg <m...@egenix.com> wrote:
> geremy condra wrote:
>> On Mon, Dec 7, 2009 at 7:51 AM, M.-A. Lemburg <m...@egenix.com> wrote:
>>> Terry Reedy wrote:
>>>> M.-A. Lemburg wrote:
>>>>
>>>>> Integrating an easy-to-use graph library into the collections
>>>>> module (and it's C companion) is good idea.
>>>>>
>>>>>>> This would have to be written in C, though,
>>>>>> That's currently in the works, along with database backing.
>>>>>> We'd welcome any help though... hint, hint...
>>>>
>>>> The current thinking among deveopers is that new modules should be
>>>> written and maintained in Python, which all implementations can use,
>>>> with speed-critical parts written in C for speed and imported by the
>>>> Python code.
>>>
>>> I don't think you are speaking for Python developers in general.
>>
>> I believe he's referring to the core developers.
>
> I was as well, being one of them :-)

Misunderstanding: 2, Me: 0... moving on ;)

How interested are you in a C port of graphine? I haven't had
any specific requests for it, but if its something you need I
can shuffle it towards the top of the to do pile.

Geremy Condra

M.-A. Lemburg

unread,

Dec 7, 2009, 2:51:15 PM12/7/09

to geremy condra, pytho...@python.org

geremy condra wrote:
> How interested are you in a C port of graphine? I haven't had
> any specific requests for it, but if its something you need I
> can shuffle it towards the top of the to do pile.

There are two main reasons for a C implementation:

1. performance

2. memory footprint

These are important when using graphs with lots of nodes or
when using lots of graphs or operations on graphs in tight
loops.

However, to get such a new type into the core, you need to start
a PEP process and hash out the API some more, esp. with respect
to integration with the other core types, such as dictionaries
and lists of tuples for both creation and as alternative
representation.

Some other comments (in no particular order):

* the "name" default of using id(node) is not ideal, since
id(node) will return the address of the object and that
can be subject to reuse within the lifetime of the process;
using a global (thread-safe) counter would be safer

* Graph.__init__ should be able to take a list or set
of nodes and edges as initializer

* Graph.__setitem__ could be mapped to .add_node() for
convenience

* Graph.__length__ could be mapped to .size() for
convenience

* Graph.__iter__ could be mapped to an iterator using
the fastest traversal method for the graph nodes
(ie. order does not matter, it's only important that
all nodes are found as fast as possible)

* Graph could also benefit from some bulk update methods,
e.g. to update weights on all edges or nodes by passing
in a dictionary mapping names to attribute values

* Graph could benefit from some bulk query methods,
such as .get_node_attributes() and .add_edges().

* Node.__getitem__ could be mapped to .data.__getitem__
(following the approach taken by ElementTree); same
for Edge.__getitem__

* Node.__setitem__ could be mapped to .data.__setitem__
(following the approach taken by ElementTree); same
for Edge.__setitem__

* I'd add a DirectedEdge alias for Edge and a new
UndirectedEdge as subclass which has is_directed=False;
makes code more readable

geremy condra

unread,

Dec 7, 2009, 5:23:24 PM12/7/09

to M.-A. Lemburg, pytho...@python.org

On Mon, Dec 7, 2009 at 2:51 PM, M.-A. Lemburg <m...@egenix.com> wrote:
> geremy condra wrote:
>> How interested are you in a C port of graphine? I haven't had
>> any specific requests for it, but if its something you need I
>> can shuffle it towards the top of the to do pile.
>
> There are two main reasons for a C implementation:
>
> 1. performance
>
> 2. memory footprint
>
> These are important when using graphs with lots of nodes or
> when using lots of graphs or operations on graphs in tight
> loops.
>
> However, to get such a new type into the core, you need to start
> a PEP process and hash out the API some more, esp. with respect
> to integration with the other core types, such as dictionaries
> and lists of tuples for both creation and as alternative
> representation.

I'm happy to start the PEP process, but will need some
guidance, as I've never written a PEP before.

> Some other comments (in no particular order):
>
> * the "name" default of using id(node) is not ideal, since
> id(node) will return the address of the object and that
> can be subject to reuse within the lifetime of the process;
> using a global (thread-safe) counter would be safer

Good idea. I'm going to start a branch on the repo to
handle the changes you mention.

> * Graph.__init__ should be able to take a list or set
> of nodes and edges as initializer

The format of this will need to be thought all the way
through before being implemented. To date, we haven't
come up with anything completely satisfactory, but
AFAIK everybody involved is still open to suggestions
on this.

> * Graph.__setitem__ could be mapped to .add_node() for
> convenience

This may be a question of playing around with it. ATM I'm
not sold on the idea, but I'll implement it and see how it
turns out in practice.

> * Graph.__length__ could be mapped to .size() for
> convenience

We decided not to do this due to the ambiguity between
whether .size() or .order() was the intended operation,
and looking back I'm not sure that was entirely unjustified.
Do you see there being any confusion on that score?

> * Graph.__iter__ could be mapped to an iterator using
> the fastest traversal method for the graph nodes
> (ie. order does not matter, it's only important that
> all nodes are found as fast as possible)

Again, it seems ambiguous as to whether nodes or
edges are the intended target here, and while the
API can obviously dictate that, it seems a bit like
a case of "in the face of ambiguity, refuse the
temptation to guess" to me.

> * Graph could also benefit from some bulk update methods,
> e.g. to update weights on all edges or nodes by passing
> in a dictionary mapping names to attribute values

Sounds good. Again, the format will need some careful
thought, but you're right that this would improve it
greatly.

> * Graph could benefit from some bulk query methods,
> such as .get_node_attributes() and .add_edges().

I'm not sure how the first is different from the existing
.data, what did you have in mind?

> * Node.__getitem__ could be mapped to .data.__getitem__
> (following the approach taken by ElementTree); same
> for Edge.__getitem__

> * Node.__setitem__ could be mapped to .data.__setitem__
> (following the approach taken by ElementTree); same
> for Edge.__setitem__

.data is just a property that returns a dictionary of non
private members right now, so if you're wanting to just say
node.this = 'stuff', you can already do that. Or am I
misreading your intent?

> * I'd add a DirectedEdge alias for Edge and a new
> UndirectedEdge as subclass which has is_directed=False;
> makes code more readable

Sounds good, and thanks for the advice!

Geremy Condra

Steven D'Aprano

unread,

Dec 7, 2009, 5:48:22 PM12/7/09

to

On Mon, 07 Dec 2009 17:23:24 -0500, geremy condra wrote:

>> * Graph.__iter__ could be mapped to an iterator using
>> the fastest traversal method for the graph nodes (ie. order does not
>> matter, it's only important that all nodes are found as fast as
>> possible)
>
> Again, it seems ambiguous as to whether nodes or edges are the intended
> target here, and while the API can obviously dictate that, it seems a
> bit like a case of "in the face of ambiguity, refuse the temptation to
> guess" to me.

Consider dicts. They have methods that iterates over keys, values, and
(key, value) pairs, but if you just iterate over the dict itself, you get
the keys. This was a deliberate design decision.

It's not a matter of guessing in the face of ambiguity, but giving an
easy way to get the most common case. Assuming that iteration over nodes
is more common than iteration over edges, then iter(graph) should yield
nodes in some unspecified order. If you need the nodes in a particular
order, then you will call some method that guarantees the order, and if
you want the edges, you call a method that gives edges.

If you're not sure whether wanting nodes or edges will be more common,
then you should wait until you, or the community, has more real-world
experience with the module. Python had dicts for something approaching a
decade before they became directly iterable.

--
Steven

geremy condra

unread,

Dec 7, 2009, 6:16:58 PM12/7/09

to Steven D'Aprano, pytho...@python.org

I don't have a problem with adding this if there's a strong desire for it,
but at the moment I'm leaning towards a wait-and-see approach, for
all the reasons you described.

Geremy Condra

M.-A. Lemburg

unread,

Dec 7, 2009, 6:28:49 PM12/7/09

to geremy condra, pytho...@python.org

geremy condra wrote:
> On Mon, Dec 7, 2009 at 2:51 PM, M.-A. Lemburg <m...@egenix.com> wrote:
>> geremy condra wrote:
>>> How interested are you in a C port of graphine? I haven't had
>>> any specific requests for it, but if its something you need I
>>> can shuffle it towards the top of the to do pile.
>>
>> There are two main reasons for a C implementation:
>>
>> 1. performance
>>
>> 2. memory footprint
>>
>> These are important when using graphs with lots of nodes or
>> when using lots of graphs or operations on graphs in tight
>> loops.
>>
>> However, to get such a new type into the core, you need to start
>> a PEP process and hash out the API some more, esp. with respect
>> to integration with the other core types, such as dictionaries
>> and lists of tuples for both creation and as alternative
>> representation.
>
> I'm happy to start the PEP process, but will need some
> guidance, as I've never written a PEP before.

Some pointers to get you started:

PEPs in general:
http://www.python.org/dev/peps/pep-0001/
http://www.python.org/dev/peps/pep-0009/

Adding modules to the standard lib:
http://www.python.org/dev/peps/pep-0002/

(though, in reality, you'd probably only be patching the
collections.py module, I guess)

>> Some other comments (in no particular order):
>>
>> * the "name" default of using id(node) is not ideal, since
>> id(node) will return the address of the object and that
>> can be subject to reuse within the lifetime of the process;
>> using a global (thread-safe) counter would be safer
>
> Good idea. I'm going to start a branch on the repo to
> handle the changes you mention.
>
>> * Graph.__init__ should be able to take a list or set
>> of nodes and edges as initializer
>
> The format of this will need to be thought all the way
> through before being implemented. To date, we haven't
> come up with anything completely satisfactory, but
> AFAIK everybody involved is still open to suggestions
> on this.

I wasn't thinking of anything clever :-) ...

g = Graph(
[Node("a"), Node("b"), Node("c")],
[Edge(Node("a"), Node("b"), "ab"),
Edge(Node("a"), Node("c"), "ac"),
Edge(Node("b"), Node("c"), "bc"),
])

The main motivation here is to get lists, sets and dicts
play nice together.

>> * Graph.__setitem__ could be mapped to .add_node() for
>> convenience
>
> This may be a question of playing around with it. ATM I'm
> not sold on the idea, but I'll implement it and see how it
> turns out in practice.

Thinking about it some more, I agree, it's not all that useful.

>> * Graph.__length__ could be mapped to .size() for
>> convenience
>
> We decided not to do this due to the ambiguity between
> whether .size() or .order() was the intended operation,
> and looking back I'm not sure that was entirely unjustified.
> Do you see there being any confusion on that score?

There is an ambiguity here, indeed. My thinking was that
the edges define the graph and can be mapped to a dictionary,
e.g.

d = {"ab": ("a", "b"),
"ac": ("a", "c"),
"bc": ("b", "c")}

len(d) == 3
len(g) == 3

A graph without edges is also what you typically call an empty
graph, ie.

if not g:
print 'empty'

http://mathworld.wolfram.com/EmptyGraph.html

Still, perhaps it's better to just not go down this route.

>> * Graph.__iter__ could be mapped to an iterator using
>> the fastest traversal method for the graph nodes
>> (ie. order does not matter, it's only important that
>> all nodes are found as fast as possible)
>
> Again, it seems ambiguous as to whether nodes or
> edges are the intended target here, and while the
> API can obviously dictate that, it seems a bit like
> a case of "in the face of ambiguity, refuse the
> temptation to guess" to me.

Right, but sometimes "practicalty beats purity" ;-) We had
the same situation for dictionaries and then decided that
iteration over keys would be more natural than iterating over
items (key, value) or values.

It's also important to note that:

for n in g: print n

and

n in g

match up in terms of semantics.

Since n in g already uses the "node in graph" approach,
the for-loop should use the same logic.

>> * Graph could also benefit from some bulk update methods,
>> e.g. to update weights on all edges or nodes by passing
>> in a dictionary mapping names to attribute values
>
> Sounds good. Again, the format will need some careful
> thought, but you're right that this would improve it
> greatly.

This is only an optimization, but could lead to some great
performance improvements by avoiding function call overhead.

>> * Graph could benefit from some bulk query methods,
>> such as .get_node_attributes() and .add_edges().
>
> I'm not sure how the first is different from the existing
> .data, what did you have in mind?

The second one was a typo. It should have been
.get_edge_attributes().

In both cases I was thinking of being able to quickly
extract a number of node/edge attributes, e.g.

>>> g.get_edge_attributes('weight', 'color')
{"ab": {"weight": 3, "color": "blue"},
"ac": {"weight": 2, "color": "green"},
"bc": {"weight": 1, "color": "red"}}

Again, the idea is to reduce call overhead and later
on be able to move these lookups to C.

>> * Node.__getitem__ could be mapped to .data.__getitem__
>> (following the approach taken by ElementTree); same
>> for Edge.__getitem__
>
>> * Node.__setitem__ could be mapped to .data.__setitem__
>> (following the approach taken by ElementTree); same
>> for Edge.__setitem__
>
> .data is just a property that returns a dictionary of non
> private members right now, so if you're wanting to just say
> node.this = 'stuff', you can already do that. Or am I
> misreading your intent?

Right, but with attributes you are restricted to Python
identifiers, e.g. keywords (e.g. "from", "pass") are not allowed.
The above approach bypasses this restriction.

>> * I'd add a DirectedEdge alias for Edge and a new
>> UndirectedEdge as subclass which has is_directed=False;
>> makes code more readable
>
> Sounds good, and thanks for the advice!

Thanks,

geremy condra

unread,

Dec 7, 2009, 11:28:05 PM12/7/09

to M.-A. Lemburg, pytho...@python.org

Thanks, I'll go over these a little later.

>>> Some other comments (in no particular order):
>>>
>>> * the "name" default of using id(node) is not ideal, since
>>> id(node) will return the address of the object and that
>>> can be subject to reuse within the lifetime of the process;
>>> using a global (thread-safe) counter would be safer
>>
>> Good idea. I'm going to start a branch on the repo to
>> handle the changes you mention.
>>
>>> * Graph.__init__ should be able to take a list or set
>>> of nodes and edges as initializer
>>
>> The format of this will need to be thought all the way
>> through before being implemented. To date, we haven't
>> come up with anything completely satisfactory, but
>> AFAIK everybody involved is still open to suggestions
>> on this.
>
> I wasn't thinking of anything clever :-) ...
>
> g = Graph(
> [Node("a"), Node("b"), Node("c")],
> [Edge(Node("a"), Node("b"), "ab"),
> Edge(Node("a"), Node("c"), "ac"),
> Edge(Node("b"), Node("c"), "bc"),
> ])
>
> The main motivation here is to get lists, sets and dicts
> play nice together.

Generally, we've tried to discourage people from instantiating
nodes and edges directly, in favor of having them controlled
through the graph. Maybe something along the lines of:

g = Graph(nodes=['a', 'b', 'c'], edges=[('a', 'b'), ('a', 'c'), ('b', 'c')])

?

I'd rather avoid it if possible, since there are many
potential interpretations of this in different contexts.
Again, I'm open to to the idea, but not convinced.

>>> * Graph.__iter__ could be mapped to an iterator using
>>> the fastest traversal method for the graph nodes
>>> (ie. order does not matter, it's only important that
>>> all nodes are found as fast as possible)
>>
>> Again, it seems ambiguous as to whether nodes or
>> edges are the intended target here, and while the
>> API can obviously dictate that, it seems a bit like
>> a case of "in the face of ambiguity, refuse the
>> temptation to guess" to me.
>
> Right, but sometimes "practicalty beats purity" ;-) We had
> the same situation for dictionaries and then decided that
> iteration over keys would be more natural than iterating over
> items (key, value) or values.
>
> It's also important to note that:
>
> for n in g: print n
>
> and
>
> n in g
>
> match up in terms of semantics.
>
> Since n in g already uses the "node in graph" approach,
> the for-loop should use the same logic.

Beware, that doesn't just match nodes.

g = Graph()
g.add_node('a')
g.add_node('b')
g.add_edge('a', 'b', 'ab')
'ab' in g # returns true

>>> * Graph could also benefit from some bulk update methods,
>>> e.g. to update weights on all edges or nodes by passing
>>> in a dictionary mapping names to attribute values
>>
>> Sounds good. Again, the format will need some careful
>> thought, but you're right that this would improve it
>> greatly.
>
> This is only an optimization, but could lead to some great
> performance improvements by avoiding function call overhead.

Agreed.

>>> * Graph could benefit from some bulk query methods,
>>> such as .get_node_attributes() and .add_edges().
>>
>> I'm not sure how the first is different from the existing
>> .data, what did you have in mind?
>
> The second one was a typo. It should have been
> .get_edge_attributes().
>
> In both cases I was thinking of being able to quickly
> extract a number of node/edge attributes, e.g.
>
>>>> g.get_edge_attributes('weight', 'color')
> {"ab": {"weight": 3, "color": "blue"},
> "ac": {"weight": 2, "color": "green"},
> "bc": {"weight": 1, "color": "red"}}
>
> Again, the idea is to reduce call overhead and later
> on be able to move these lookups to C.

Entirely doable, but I'm not sure I see the use case.
Would you mind providing a more realistic example?

>>> * Node.__getitem__ could be mapped to .data.__getitem__
>>> (following the approach taken by ElementTree); same
>>> for Edge.__getitem__
>>
>>> * Node.__setitem__ could be mapped to .data.__setitem__
>>> (following the approach taken by ElementTree); same
>>> for Edge.__setitem__
>>
>> .data is just a property that returns a dictionary of non
>> private members right now, so if you're wanting to just say
>> node.this = 'stuff', you can already do that. Or am I
>> misreading your intent?
>
> Right, but with attributes you are restricted to Python
> identifiers, e.g. keywords (e.g. "from", "pass") are not allowed.
> The above approach bypasses this restriction.

Hmm. Sounds like a plausible use case to me, although I'm
not sure its one that should be encouraged. The bigger
question in my mind is whether all attribute lookups should
have to pay the extra lookup cost to support a somewhat
narrow need. I'll definitely have to talk with the other devs
about this one, and run a little bit of timing besides.

Geremy Condra

geremy condra

unread,

Dec 8, 2009, 3:06:29 AM12/8/09

to M.-A. Lemburg, pytho...@python.org

Just as a note, I've made some of the changes we've been
talking about in a new branch over at graphine.org.

Geremy Condra

Robin Becker

unread,

Dec 8, 2009, 7:27:55 AM12/8/09

to pytho...@python.org

geremy condra wrote:
...........

>
> I don't have a problem with adding this if there's a strong desire for it,
> but at the moment I'm leaning towards a wait-and-see approach, for
> all the reasons you described.
>
> Geremy Condra

I don't want to sound pessimistic, but graph and digraph theory has a lot of
history, especially in computer science. There are already very many
implementations eg

http://code.google.com/p/igraph
http://www.boost.org/doc/libs/release/libs/graph
http://ernst-schroeder.uni.lu/Digraph/doc/
http://code.google.com/p/python-graph
http://compbio.washington.edu/~zach/py_graph/doc/html/public/py_graph-module.html

and many others......some of the above already seem to be very useful.

Is there reason to suppose that any one representation of graphs or digraphs is
so good we need to add it to python?

Even for fairly common algorithms eg Dijkstra's shortest path there doesn't seem
to be complete agreement on how to implement them; for the details of how
nodes/edges/paths should be stored and efficiently manipulated there is huge
variety.

Wait seems like a good policy.
--
Robin Becker

Steven D'Aprano

unread,

Dec 8, 2009, 9:47:44 AM12/8/09

to

On Tue, 08 Dec 2009 03:06:29 -0500, geremy condra wrote:

[snip 215 lines of quoted-quoted-quoted-quoted-quoted text]

In the future, would you mind trimming the unneeded quoting from your
post? There's no need to duplicate the *entire* conversation in *every*
post, and it is awfully AOL-like of you to have 215 lines of text, with
up to five levels of quotation, followed by two lines of new content.

Thank you.

--
Steven

geremy condra

unread,

Dec 8, 2009, 1:35:26 PM12/8/09

to Robin Becker, pytho...@python.org

On Tue, Dec 8, 2009 at 7:27 AM, Robin Becker <ro...@reportlab.com> wrote:
> geremy condra wrote:
> ...........
>>
>> I don't have a problem with adding this if there's a strong desire for it,
>> but at the moment I'm leaning towards a wait-and-see approach, for
>> all the reasons you described.
>>
>> Geremy Condra
>
> I don't want to sound pessimistic, but graph and digraph theory has a lot of
> history, especially in computer science.

Of course it does- the rich set of problem-solving tools provided
by theoretical computer science and mathematical graph theory
are what make graphs so useful, which is why I'm advocating
that they be in the standard library.

>There are already very many
> implementations eg
>
> http://code.google.com/p/igraph
> http://www.boost.org/doc/libs/release/libs/graph
> http://ernst-schroeder.uni.lu/Digraph/doc/
> http://code.google.com/p/python-graph
> http://compbio.washington.edu/~zach/py_graph/doc/html/public/py_graph-module.html
>
> and many others......some of the above already seem to be very useful.

I suspect that part of the reason there are so many implementations
is because graphs are a) useful and b) not in the standard library.

> Is there reason to suppose that any one representation of graphs or digraphs
> is so good we need to add it to python?

I think there are several implementations that would meet the
standards for code quality, and both graphine and python-graph
are full-featured, easy to use, pure python graph libraries. Any
one of them would be a credit to the standard library.

> Even for fairly common algorithms eg Dijkstra's shortest path there doesn't
> seem to be complete agreement on how to implement them; for the details of
> how nodes/edges/paths should be stored and efficiently manipulated there is
> huge variety.

Of course there are. Different authors envision different use
cases, and select their data structures and algorithms
appropriately. That doesn't imply that none of them are
appropriate for the common case, which is what we would
be targeting here.

Geremy Condra

Carl Banks

unread,

Dec 8, 2009, 4:58:49 PM12/8/09

to

On Dec 8, 4:27 am, Robin Becker <ro...@reportlab.com> wrote:
> Is there reason to suppose that any one representation of graphs or digraphs is
> so good we need to add it to python?

One of them bothered to write a PEP proposing its inclusion?

> Even for fairly common algorithms eg Dijkstra's shortest path there doesn't seem
> to be complete agreement on how to implement them; for the details of how
> nodes/edges/paths should be stored and efficiently manipulated there is huge
> variety.
>
> Wait seems like a good policy.

Geremy's team seems to balance open-mindedness with not being a
pushover quite well; given this, and that they took initiative, I am
satisfied that they will do a good job designing it for the general
case.

Also, "Now is better than never."

(And before anyone gives the obvious retort, please consider if you
really think it's happening "right now".)

Carl Banks

Rhodri James

unread,

Dec 8, 2009, 8:42:53 PM12/8/09

to pytho...@python.org

On Tue, 08 Dec 2009 04:28:05 -0000, geremy condra <deba...@gmail.com>
wrote:

> On Mon, Dec 7, 2009 at 6:28 PM, M.-A. Lemburg <m...@egenix.com> wrote:

>> I wasn't thinking of anything clever :-) ...
>>
>> g = Graph(
>> [Node("a"), Node("b"), Node("c")],
>> [Edge(Node("a"), Node("b"), "ab"),
>> Edge(Node("a"), Node("c"), "ac"),
>> Edge(Node("b"), Node("c"), "bc"),
>> ])
>>
>> The main motivation here is to get lists, sets and dicts
>> play nice together.
>
> Generally, we've tried to discourage people from instantiating
> nodes and edges directly, in favor of having them controlled
> through the graph. Maybe something along the lines of:
>
> g = Graph(nodes=['a', 'b', 'c'], edges=[('a', 'b'), ('a', 'c'), ('b',
> 'c')])
>
> ?

That works fine for simple cases like this, but starts getting unpleasant
if you want to initialise with attributes. Under those circumstances
using Node and Edge explicitly is much cleaner. The only extra I'd
suggest is allowing is_directed as a keyword argument, so you can set the
default for all edges if you want to.

g = Graph(
nodes=[Node("a", colour="red"),
Node("b", colour="white"),
Node("c", colour="blue")],
edges=[Edge("a", "b", "ab", weight=2),
Edge("a", "c", "ac", is_directed=True),
Edge("b", "c", "bc", style="dotted")],
is_directed=True)

I could see a use for this tracking a database structure using a constant
graph, hence all set up in one go for preference.

--
Rhodri James *-* Wildebeest Herder to the Masses

geremy condra

unread,

Dec 8, 2009, 10:47:03 PM12/8/09

to Rhodri James, pytho...@python.org

While I agree with the rationale, I think we need to find another way.
Aesthetics aside, directly instantiating edges by giving only node names
requires that the edge be aware of what graph its in to provide expected
behavior, which creates a bit of a chicken-or-the-egg dilemma.

How about this: the constructor can take any type of iterable, and
assumes that it follows my earlier format unless it specifies a .items()
method, in which case it takes the values as follows:

g = Graph(
nodes={'a':{'colour':'red'},
'b':{'colour':'white'},
'c':{'colour':'blue'}},
edges={('a', 'b'):{'name':'ab', 'weight':2},
('a', 'c'):{'name':'ac'},
('b', 'c'):{'name':'bc', 'style':'dotted'}}
)

Geremy Condra

Gabriel Genellina

unread,

Dec 9, 2009, 2:09:24 AM12/9/09

to pytho...@python.org

En Sat, 28 Nov 2009 06:30:44 -0300, Joshua Bronson <jabr...@gmail.com>
escribiï¿œ:
> On Nov 27, 9:36 pm, "Gabriel Genellina" <gags...@yahoo.com.ar>
> wrote:
>> En Fri, 27 Nov 2009 15:12:36 -0300, Francis Carr
>> <coldt...@gmail.com> escribiï¿œ:
>>
>> > After much tinkering, I think I have a simpler solution. Just make
>> > the inverse mapping accessible via an attribute, -AND- bind the
>> > inverse of -THAT- mapping back to the original. The result is a
>> > python dict with NO NEW METHODS except this inverse-mapping
>> > attribute. I have posted it on code.activestate.com as <a
>> > href="http://code.activestate.com/recipes/576968/">Recipe 576968:
>> > Flipdict -- python dict that also maintains a one-to-one inverse
>> > mapping</a>
>
>> Just a couple of comments:
>>
>> Instead of:
>> self._flip = dict.__new__(self.__class__)
>> I'd write:
>> self._flip = self.__class__()
>> unless I'm missing something (but see the next point).
>
> How would this not cause infinite recursion?

That goes under "unless I'm missing something" ;)
You're right, it would cause infinite recursion. Not a good idea...

>> Also, although Python's GC is able to handle them, I prefer to avoid
>> circular references like those between x and x._flip. Making
>> self._flip a weak reference (and dereferencing it in the property)
>> should be enough.
>
> If both self._flip and self._flip._flip are weak references, no strong
> references to the inverse mapping survive leaving the constructor
> scope. Unless I'm missing something, only one of these can be a weak
> reference, and then you'd have to do something like this in the
> property to prevent "TypeError: FlipDict is not callable":
>
> @property
> def flip(self):
> try:
> # we're an inverse, self._flip is a weak reference
> return self._flip()
> except TypeError:
> # we're a forward mapping, self._flip is a strong
> reference
> return self._flip

Yes - although I'd explicitely test for a weakref object:

def flip(self):
_flip = self._flip
if isinstance(_filp, weakref.ref):
return _flip()
return _flip

and probably all those internal references to self._flip should become
self.flip too; I've not tested the code but I think it *should* work...

--
Gabriel Genellina

Rhodri James

unread,

Dec 9, 2009, 6:04:18 PM12/9/09

to pytho...@python.org

On Wed, 09 Dec 2009 03:47:03 -0000, geremy condra <deba...@gmail.com>
wrote:

> On Tue, Dec 8, 2009 at 8:42 PM, Rhodri James
> <rho...@wildebst.demon.co.uk> wrote:
>>
>> g = Graph(
>> nodes=[Node("a", colour="red"),
>> Node("b", colour="white"),
>> Node("c", colour="blue")],
>> edges=[Edge("a", "b", "ab", weight=2),
>> Edge("a", "c", "ac", is_directed=True),
>> Edge("b", "c", "bc", style="dotted")],
>> is_directed=True)
>>
>> I could see a use for this tracking a database structure using a
>> constant
>> graph, hence all set up in one go for preference.
>
> While I agree with the rationale, I think we need to find another way.
> Aesthetics aside, directly instantiating edges by giving only node names
> requires that the edge be aware of what graph its in to provide expected
> behavior, which creates a bit of a chicken-or-the-egg dilemma.

Oops. I missed that, sorry.

> How about this: the constructor can take any type of iterable, and
> assumes that it follows my earlier format unless it specifies a .items()
> method, in which case it takes the values as follows:

isinstance(x, collections.Mapping) is perhaps the right test?

> g = Graph(
> nodes={'a':{'colour':'red'},
> 'b':{'colour':'white'},
> 'c':{'colour':'blue'}},
> edges={('a', 'b'):{'name':'ab', 'weight':2},
> ('a', 'c'):{'name':'ac'},
> ('b', 'c'):{'name':'bc', 'style':'dotted'}}
> )

That's OK for nodes, but for consistency with add_edges I would have
expected the name to be the optional third element of the key tuples. It
works either way, but I can't help feel it's beginning to look a bit ugly.

geremy condra

unread,

Dec 9, 2009, 6:42:13 PM12/9/09

to Rhodri James, pytho...@python.org

On Wed, Dec 9, 2009 at 6:04 PM, Rhodri James

<rho...@wildebst.demon.co.uk> wrote:
> On Wed, 09 Dec 2009 03:47:03 -0000, geremy condra <deba...@gmail.com>
> wrote:
>
>> On Tue, Dec 8, 2009 at 8:42 PM, Rhodri James
>> <rho...@wildebst.demon.co.uk> wrote:
>>>
>>> g = Graph(
>>> nodes=[Node("a", colour="red"),
>>> Node("b", colour="white"),
>>> Node("c", colour="blue")],
>>> edges=[Edge("a", "b", "ab", weight=2),
>>> Edge("a", "c", "ac", is_directed=True),
>>> Edge("b", "c", "bc", style="dotted")],
>>> is_directed=True)
>>>
>>> I could see a use for this tracking a database structure using a constant
>>> graph, hence all set up in one go for preference.
>>
>> While I agree with the rationale, I think we need to find another way.
>> Aesthetics aside, directly instantiating edges by giving only node names
>> requires that the edge be aware of what graph its in to provide expected
>> behavior, which creates a bit of a chicken-or-the-egg dilemma.
>
> Oops. I missed that, sorry.
>
>> How about this: the constructor can take any type of iterable, and
>> assumes that it follows my earlier format unless it specifies a .items()
>> method, in which case it takes the values as follows:
>
> isinstance(x, collections.Mapping) is perhaps the right test?

The code I kludged together last night just tries __getitem__
and it seems to work, so unless theres something I'm missing
I'll probably just leave it at that.

>> g = Graph(
>> nodes={'a':{'colour':'red'},
>> 'b':{'colour':'white'},
>> 'c':{'colour':'blue'}},
>> edges={('a', 'b'):{'name':'ab', 'weight':2},
>> ('a', 'c'):{'name':'ac'},
>> ('b', 'c'):{'name':'bc', 'style':'dotted'}}
>> )
>
> That's OK for nodes, but for consistency with add_edges I would have
> expected the name to be the optional third element of the key tuples. It
> works either way, but I can't help feel it's beginning to look a bit ugly.

I have to admit, I prefer it the other way, but patrick (our test guru and
chief bug squasher) likes your proposal better. I'm going to get in touch
with robbie tonight and see what he says. Since this is not a feature I'll
use much, if he agrees with you then I'll go ahead and implement the
change tonight and merge it back into mainline. If not, I'd appreciate
it if you'd take another look at it and figure out if its something you can
live with, or if theres another syntax you'd prefer, etc. Fair enough?

Geremy Condra

Rhodri James

unread,

Dec 9, 2009, 7:02:22 PM12/9/09

to geremy condra, pytho...@python.org

On Wed, 09 Dec 2009 23:42:13 -0000, geremy condra <deba...@gmail.com>
wrote:

Fair enough. Don't take my word as having much weight; I'm not likely to
use graphs much for graph theory purposes (having skipped the topology
courses in the Maths part of my degree), it just happens to be clearly the
right datastructure for a project I'm fiddling with at home.

Here's a thought: are

g.add_edge("a", "b", "ab")

and

g.add_edge("a", "b", name="ab")

equivalent? If so, there's no reason not to have both forms of the
initialiser. If not, that weighs against having 'name' as a dictionary
key.

M.-A. Lemburg

unread,

Dec 9, 2009, 7:22:52 PM12/9/09

to geremy condra, pytho...@python.org

geremy condra wrote:
> On Mon, Dec 7, 2009 at 6:28 PM, M.-A. Lemburg <m...@egenix.com> wrote:

>>>> * Graph.__init__ should be able to take a list or set
>>>> of nodes and edges as initializer
>>>
>>> The format of this will need to be thought all the way
>>> through before being implemented. To date, we haven't
>>> come up with anything completely satisfactory, but
>>> AFAIK everybody involved is still open to suggestions
>>> on this.
>>

>> I wasn't thinking of anything clever :-) ...
>>
>> g = Graph(
>> [Node("a"), Node("b"), Node("c")],
>> [Edge(Node("a"), Node("b"), "ab"),
>> Edge(Node("a"), Node("c"), "ac"),
>> Edge(Node("b"), Node("c"), "bc"),
>> ])
>>
>> The main motivation here is to get lists, sets and dicts
>> play nice together.
>
> Generally, we've tried to discourage people from instantiating
> nodes and edges directly, in favor of having them controlled
> through the graph. Maybe something along the lines of:
>

> g = Graph(nodes=['a', 'b', 'c'], edges=[('a', 'b'), ('a', 'c'), ('b', 'c')])
>
> ?

That would work as well, but you then miss out on the extra
parameters you can pass to Edge() and Node().

In another email you wrote that Edge() and Node() constructors
should not be used directly, since they have to know the Graph
to which they belong.

Wouldn't it be possible to implement this kind of parent-
referencing in a lazy way, ie. by postponing all the init-
processing until the Graph instance is known ?

> g.add_edge('a', 'b', 'ab')

> 'ab' in g # returns true

I'd avoid such an ambiguity. It could easily hide programming
errors (testing for edges instead of nodes).

OTOH, you could also regard the graph as a set of nodes
and edges (as you apparently already do). In that case,
you'd define

for x in g: print x

as iteration of both nodes and edges in some arbitrary
order and then use the more specific:

for n in g.nodes: print x
for e in g.edges: print x

for iteration over just the nodes or edges.

The idea is that you use the Graph representation of the
data to implement some algorithm e.g. optimizing weights
for example.

The algorithm could be implemented in C to be fast enough
for large data sets.

The above two methods would then be used to quickly push the
original data into the graph and extract it again after
the algorithm has run.

>>>> * Node.__getitem__ could be mapped to .data.__getitem__
>>>> (following the approach taken by ElementTree); same
>>>> for Edge.__getitem__
>>>
>>>> * Node.__setitem__ could be mapped to .data.__setitem__
>>>> (following the approach taken by ElementTree); same
>>>> for Edge.__setitem__
>>>
>>> .data is just a property that returns a dictionary of non
>>> private members right now, so if you're wanting to just say
>>> node.this = 'stuff', you can already do that. Or am I
>>> misreading your intent?
>>
>> Right, but with attributes you are restricted to Python
>> identifiers, e.g. keywords (e.g. "from", "pass") are not allowed.
>> The above approach bypasses this restriction.
>
> Hmm. Sounds like a plausible use case to me, although I'm
> not sure its one that should be encouraged. The bigger
> question in my mind is whether all attribute lookups should
> have to pay the extra lookup cost to support a somewhat
> narrow need. I'll definitely have to talk with the other devs
> about this one, and run a little bit of timing besides.

This would only be an optional second way of accessing
the .data dictionary.

node.data['pass'] = 1
node.data['from'] = 'Las Vegas'
node.data['to'] = 'New York'

would work without such modifications.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Dec 10 2009)

geremy condra

unread,

Dec 9, 2009, 7:27:28 PM12/9/09

to Rhodri James, pytho...@python.org

On Wed, Dec 9, 2009 at 7:02 PM, Rhodri James

> g.add_edge("a", "b", "ab")
>

> and
>
> g.add_edge("a", "b", name="ab")
>
> equivalent? If so, there's no reason not to have both forms of the
> initialiser. If not, that weighs against having 'name' as a dictionary key.

You're right, of course- the obvious way to do this is just to apply
tuple unpacking to each element in the edges argument. I've applied
the change, although testing and documentation will need to be
updated before I can merge it back into mainline, and I doubt I'll
get to that tonight.

Geremy Condra

geremy condra

unread,

Dec 9, 2009, 7:41:11 PM12/9/09

to M.-A. Lemburg, pytho...@python.org

>> Generally, we've tried to discourage people from instantiating
>> nodes and edges directly, in favor of having them controlled
>> through the graph. Maybe something along the lines of:
>>
>> g = Graph(nodes=['a', 'b', 'c'], edges=[('a', 'b'), ('a', 'c'), ('b', 'c')])
>>
>> ?
>
> That would work as well, but you then miss out on the extra
> parameters you can pass to Edge() and Node().

Just pushed a change that will allow that.

> In another email you wrote that Edge() and Node() constructors
> should not be used directly, since they have to know the Graph
> to which they belong.
>
> Wouldn't it be possible to implement this kind of parent-
> referencing in a lazy way, ie. by postponing all the init-
> processing until the Graph instance is known ?

While possible, I'm wary of this. We tried it in an initial
prototype (all nodes and edges descended naturally from
a Universe graph until otherwise stated) and that way lie
madness. Perhaps at some point someone would like to
give another shot at it, though.

>> Beware, that doesn't just match nodes.
>>
>> g = Graph()
>> g.add_node('a')
>> g.add_node('b')

>> g.add_edge('a', 'b', 'ab')

>> 'ab' in g # returns true
>
> I'd avoid such an ambiguity. It could easily hide programming
> errors (testing for edges instead of nodes).
>
> OTOH, you could also regard the graph as a set of nodes
> and edges (as you apparently already do). In that case,
> you'd define
>
> for x in g: print x
>
> as iteration of both nodes and edges in some arbitrary
> order and then use the more specific:
>
> for n in g.nodes: print x
> for e in g.edges: print x
>
> for iteration over just the nodes or edges.

Yup, that's exactly what we do.

>>>>>> g.get_edge_attributes('weight', 'color')
>>> {"ab": {"weight": 3, "color": "blue"},
>>> "ac": {"weight": 2, "color": "green"},
>>> "bc": {"weight": 1, "color": "red"}}
>>>
>>> Again, the idea is to reduce call overhead and later
>>> on be able to move these lookups to C.
>>
>> Entirely doable, but I'm not sure I see the use case.
>> Would you mind providing a more realistic example?
>
> The idea is that you use the Graph representation of the
> data to implement some algorithm e.g. optimizing weights
> for example.
>
> The algorithm could be implemented in C to be fast enough
> for large data sets.
>
> The above two methods would then be used to quickly push the
> original data into the graph and extract it again after
> the algorithm has run.

I can see this, but I think the cleaner way is just to iterate
over nodes and edges. Having said that, if somebody wants
to come up with some timing data and it seems to provide a
big advantage, I think robbie for one would make the change.

>> Hmm. Sounds like a plausible use case to me, although I'm
>> not sure its one that should be encouraged. The bigger
>> question in my mind is whether all attribute lookups should
>> have to pay the extra lookup cost to support a somewhat
>> narrow need. I'll definitely have to talk with the other devs
>> about this one, and run a little bit of timing besides.
>
> This would only be an optional second way of accessing
> the .data dictionary.
>
> node.data['pass'] = 1
> node.data['from'] = 'Las Vegas'
> node.data['to'] = 'New York'
>
> would work without such modifications.

Yes, but the change is not reflected on the node. For
example:

>>> g = Graph(nodes={'a':{'spotted':True}})
>>> g['a'].data['spotted'] = False
>>> g['a'].data['spotted']
True

I can see how this would represent a gotcha, though.
Is there a general opinion on whether this is a plus or
a minus?

Geremy Condra

Bearophile

unread,

Dec 9, 2009, 8:59:34 PM12/9/09

to

Robin Becker:

> There are already very many implementations eg
>

> http://code.google.com/p/igraphhttp://www.boost.org/doc/libs/release/libs/graphhttp://ernst-schroeder.uni.lu/Digraph/doc/http://code.google.com/p/python-graphhttp://compbio.washington.edu/~zach/py_graph/doc/html/public/py_graph...

>
> and many others......some of the above already seem to be very useful.

There's mine too:
http://sourceforge.net/projects/pynetwork/
API:
http://code.activestate.com/recipes/466329/

Bye,
bearophile

geremy condra

unread,

Dec 9, 2009, 11:09:26 PM12/9/09

to Bearophile, pytho...@python.org

Huh, I don't think I've ever seen that before, and I'm pretty
sure I'd remember if I had. With your permission, I'd like to
go ahead and start integrating some of the features from
that into graphine, especially a topo traversal. Do you
mind?

Geremy Condra

geremy condra

unread,

Dec 10, 2009, 12:39:05 AM12/10/09

to Bearophile, pytho...@python.org

On Wed, Dec 9, 2009 at 11:09 PM, geremy condra <deba...@gmail.com> wrote:
> On Wed, Dec 9, 2009 at 8:59 PM, Bearophile <bearoph...@lycos.com> wrote:

> Huh, I don't think I've ever seen that before, and I'm pretty
> sure I'd remember if I had. With your permission, I'd like to
> go ahead and start integrating some of the features from
> that into graphine, especially a topo traversal. Do you
> mind?
>
> Geremy Condra
>

Since that's released under the python license, I'm going to
go ahead and commit the version that includes the topo
traversal, but if you have any objections you only need to
say the word and I'll take it down.

Geremy Condra

Bearophile

unread,

Dec 10, 2009, 5:18:26 AM12/10/09

to

geremy condra:

> Since that's released under the python license, I'm going to
> go ahead and commit the version that includes the topo
> traversal, but if you have any objections you only need to
> say the word and I'll take it down.

No objections :-)

Bye,
bearophile

Tiago de Paula Peixoto

unread,

Dec 10, 2009, 7:48:27 AM12/10/09

to pytho...@python.org

On 12/08/2009 01:27 PM, Robin Becker wrote:
> I don't want to sound pessimistic, but graph and digraph theory has a

> lot of history, especially in computer science. There are already very
> many implementations eg
>

> http://code.google.com/p/igraph
> http://www.boost.org/doc/libs/release/libs/graph
> http://ernst-schroeder.uni.lu/Digraph/doc/
> http://code.google.com/p/python-graph
> http://compbio.washington.edu/~zach/py_graph/doc/html/public/py_graph-module.html

I would like to point out the following two projects as additions to
this list:

http://graph-tool.forked.de (my own project)
http://networkx.lanl.gov

The graph-tool module uses the Boost Graph Library internally to achieve
good numerical performance, while networkx has a more python-only
approach.

Cheers,
Tiago

signature.asc

geremy condra

unread,

Dec 10, 2009, 7:56:45 AM12/10/09

to Bearophile, pytho...@python.org

I appreciate it. I don't seem to be having any luck emailing you
offlist, so please feel free to email me privately if you'd rather
this not be indexed, but is there a particular way you want your
attribution line to read?

Geremy Condra

geremy condra

unread,

Dec 10, 2009, 9:13:40 AM12/10/09

to Tiago de Paula Peixoto, pytho...@python.org

Well, we all seem to have reinvented the wheel differently ;)
Bearophile, Tiago- any interest in trying to combine the
best parts of our libraries, with an eye towards eventual
integration into the standard library?

Geremy Condra

Bearophile

unread,

Dec 10, 2009, 10:57:00 AM12/10/09

to

Geremy Condra:

> is there a particular way you want your attribution line to read?

You can just use my nickname (in all lowercase), with the list of
parts you have used. Don't worry.

> Well, we all seem to have reinvented the wheel differently ;)

Maybe also because they are designed for different purposes.

> Bearophile, Tiago- any interest in trying to combine the
> best parts of our libraries, with an eye towards eventual
> integration into the standard library?

The first thing to do is to ask Guido and Hettinger if they are
willing to put a "good" graph module into the std lib. If their answer
is positive for some definition of "good", then we can think about
doing something.

Several years ago I have suggested to put a graph module in the std
lib, and the answer was something like: "Put the lib online, and if
people use it a lot, we'll see to put it into the std lib." In the
meantime my lib was used by no one and ten other graph libs are used
(networkx seems among the most used), but I think no one of them has
shown a strong usage. (In the meantime Hettinger has written and added
two or three or four GOOD data structures to the std lib using a "fast
lane", avoiding the step of popular usage test).

Bye,
bearophile

Tiago de Paula Peixoto

unread,

Dec 12, 2009, 7:21:00 AM12/12/09

to pytho...@python.org

On 12/10/2009 01:57 PM, Bearophile wrote:
> Geremy Condra:

>> Well, we all seem to have reinvented the wheel differently ;)
>
> Maybe also because they are designed for different purposes.

This is true. For instance, the data structures and most algorithms in
graph-tool are implemented in C++ to achieve good performance, which is
necessary if you are working with very large graphs. I think this
differs from most python graph libraries, which don't have this as a
high priority.

>> Bearophile, Tiago- any interest in trying to combine the
>> best parts of our libraries, with an eye towards eventual
>> integration into the standard library?
>
> The first thing to do is to ask Guido and Hettinger if they are
> willing to put a "good" graph module into the std lib. If their answer
> is positive for some definition of "good", then we can think about
> doing something.

A crucial element in this hypothetical module would be the main graph
data structure. The simplest approach would be to implement it in pure
python, with lists, dicts and such, as many libraries do. However, this
would rule out its use by high-performance code, which would need a
simpler C-based data structure for direct interaction. On the other
hand, I'm not sure if there is a need for a high performance graph
module in python's standard library...

Cheers,
Tiago

signature.asc

geremy condra

unread,

Dec 12, 2009, 5:32:18 PM12/12/09

to Bearophile, pytho...@python.org

Well, I've just concluded a short conversation with Raymond Hettinger,
and I think its fair to characterize him as being opposed to the idea
at present. In addition to the popularity test, he's also noted that
ideally a core CPython dev should be involved in the project. Putting
the two together is, AFAICS, a death knell for any extant graph lib.

Having said that, I'd still like to see how much common ground we
could find among the existing libraries. IMHO, there's a lot more
in common than there is different.

Geremy Condra

Terry Reedy

unread,

Dec 13, 2009, 3:38:12 AM12/13/09

to pytho...@python.org

geremy condra wrote:

> Well, I've just concluded a short conversation with Raymond Hettinger,
> and I think its fair to characterize him as being opposed to the idea
> at present. In addition to the popularity test, he's also noted that
> ideally a core CPython dev should be involved in the project. Putting
> the two together is, AFAICS, a death knell for any extant graph lib.
>
> Having said that, I'd still like to see how much common ground we
> could find among the existing libraries. IMHO, there's a lot more
> in common than there is different.

The large number of current projects practically guarantees that none
will be popular enough, and perhaps than none get the usage needed to
shake out a broadly useful api. So any consolidation wuold be good.

geremy condra

unread,

Dec 13, 2009, 4:31:15 AM12/13/09

to Terry Reedy, pytho...@python.org

While I agree, I think it's going to be extremely difficult to get any
kind of buy in without a great deal of support from within python.
Any devs willing to throw the time required into this?

Geremy Condra

Tiago de Paula Peixoto

unread,

Dec 13, 2009, 7:01:32 AM12/13/09

to pytho...@python.org

On 12/13/2009 08:32 AM, anand jeyahar wrote:
> A crucial element in this hypothetical module would be the main graph
> data structure. The simplest approach would be to implement it in pure
> python, with lists, dicts and such, as many libraries do. However, this
> would rule out its use by high-performance code, which would need a
> simpler C-based data structure for direct interaction. On the other
> hand, I'm not sure if there is a need for a high performance graph
> module in python's standard library...
>

> I disagree...I am not sure of the current need in terms of a precise
> survey.....But IMO, as bearophile pointed out.....networkx is the most
> popular........and from their claims it is targeted at mathematicians,
> physicists, biologists, computer scientists, social scientists.
>
> Given the current trend in the growth of social and professional
> networks..... and social scientists (Disclaimer: i aspire to be one).. i
> do expect a growing demand for graph data structures and high
> performance ones soon enough..
>
> so IMHO if we are going in for it we should go for the high performance
> graphs too..

I certainly think it is important to have a high-performance graph
library, that is why I started to write one. I was talking simply about
its inclusion in the standard library. Since something as basic for
scientific computing as, for instance, numpy does not belong in the
standard library, I don't think it makes sense to push for the inclusion
of a scientific graph library. Could we (or rather should we) even make
it _independent_ of numpy?

But if the idea is to consolidate the existing graph libraries into an
uniform package, I'm certainly not against it in principle. But I think
we should first familiarize ourselves with the existing variants, to see
if there is really a common ground. The basic interface to the graph
data structure should not vary much, I believe, but the implementation
may. For instance, I chose in graph-tool to implement most of it in C++
with the Boost Graph Library...

When I get some time, I'll take a careful look at the implementations
listed in this thread, some of which I don't yet know.

Cheers,
Tiago

signature.asc