complementary problem


Chris Godsil

Jul 12, 2010, 2:49:53 PM
to sage-devel
If G is a graph, A = G.am() is its adjacency matrix, and B =
G.complement().am() is the adjacency matrix of its complement, then
A+B should be a 01-matrix. The code below shows that Sage does not
always share this viewpoint.

Cheers
Chris

#####
## ----------------------------------------------------------------------
## | Sage Version 4.4.4, Release Date: 2010-06-23                        |
## | Type notebook() for the GUI, and license() for information.         |
## ----------------------------------------------------------------------
## sage: load whine.sage
## sage: mcl = McL()
## sage: A = mcl.am()
## sage: B = mcl.complement().am()
## sage: A.fcp()
##
## These are what they should be
##
## (x - 112) * (x + 28)^22 * (x - 2)^252
## sage: B.fcp()
## (x - 162) * (x - 27)^22 * (x + 3)^252
##
## Here's the problem: C should be a 01-matrix (in fact it should be J-I)
##
## sage: C = A+B
## sage: max(C[0].list())
## 2
##
## If I use mcl.relabel() before computing A and B, there are no
## problems. So it's the complicated vertices that are causing the
## problem, I expect.

##
## Here's the content of whine.sage:
##

def switch(G, sub):
    # complement the edges between sub and the rest of the vertex set
    H = G.copy()
    rest = Set(H.vertices()).difference(Set(sub))
    for u in sub:
        for v in rest:
            if H.has_edge(u,v): H.delete_edge(u,v)
            else: H.add_edge(u,v)
    return H

def McL():
    # McLaughlin graph, built from the weight-7 words of the punctured Golay code
    C = ExtendedBinaryGolayCode()
    D = C.punctured([0])
    words = [ Set(it.support()) for it in D if hamming_weight(it) == 7 ]
    MG = Graph([words, lambda a, b: len(a.intersection(b)) == 1])
    MM = MG.copy()
    MM.add_vertices([0..22])
    edges = [ (i, a) for i in [0..22] for a in words if i not in a ]
    MM.add_edges(edges)
    McL = switch(MM, MM[0])   # switch on the neighbours of vertex 0...
    McL.delete_vertex(0)      # ...and then remove it
    return McL

Nathann Cohen

Jul 12, 2010, 7:40:53 PM
to sage-devel, Robert Miller
I think there lies the problem :

sage: mcl_v = mcl.vertices()
sage: mclc_v = mcl.complement().vertices()
sage: mcl_v == mclc_v
False

So they are not equal. Do they contain the same elements ?

sage: Set(mcl_v) == Set(mclc_v)
True

So it seems... But then, what does THAT mean ?

sage: sorted(mclc_v) == sorted(mcl_v)
False

Anyway :

sage: (i for i in xrange(len(mclc_v)) if mcl_v[i] != mclc_v[i]).next()
30
sage: mcl_v[30]
{16, 1, 2, 17, 8, 11, 14}
sage: mclc_v[30]
{0, 17, 2, 22, 6, 8, 15}

Actually, I tried first building the union of mcl and its complement,
which was indeed a clique. Besides, when one builds copies of mcl and
its complement from the adjacency matrices, the graphs are isomorphic
to what they should be. I suspected something wrong with the order in
which the vertices are sorted, which would mean that the coordinates
(i,j) do not represent the same pair of vertices in both matrices.
This seems indeed to be the case, but I really do not like this
difference between the two "sorted" outputs...

Would you have any idea, Robert? :-/

Nathann

Robert Miller

Jul 24, 2010, 9:12:59 AM
to sage-...@googlegroups.com
Chris,

On Mon, Jul 12, 2010 at 7:49 PM, Chris Godsil <cgo...@uwaterloo.ca> wrote:
> ... it's the complicated vertices that are causing the problem, I expect.

The problem is that the comparison operators in Python for sets
implement the subset notion, and thus do not provide a proper sorting
of a list of sets:

http://docs.python.org/library/sets.html#set-objects

IMHO, these would be better as .is_subset() methods, etc.

Sage's adjacency matrix command uses vertices(), which uses comparison
operators to give an ordering of the vertices. However, since "subset"
does not define a total ordering, it does not give a good way of
sorting vertices. One good short term solution is to use tuples
instead, which are ordered lexicographically.
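
A quick illustration in plain Python (not Sage-specific) of both points:
since "<" on sets means subset, two incomparable sets "sort" differently
depending on their input order, while the corresponding sorted tuples
compare lexicographically and always sort the same way.

{{{
>>> a, b = frozenset([1, 4]), frozenset([2, 3])
>>> a < b, b < a            # neither is a subset of the other
(False, False)
>>> sorted([a, b])          # sort leaves the input order unchanged...
[frozenset([1, 4]), frozenset([2, 3])]
>>> sorted([b, a])          # ...so the "sorted" result depends on the input
[frozenset([2, 3]), frozenset([1, 4])]
>>> ta, tb = tuple(sorted(a)), tuple(sorted(b))
>>> sorted([ta, tb]) == sorted([tb, ta])   # tuples: lexicographic, hence consistent
True
}}}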

Some possible solutions:

1. Sage graphs should have a way of specifying a user-defined
comparison function for sorting vertices. Any time you use sets as
vertices, you would also do something like

sage: def cmp(foo, bar):
... # some total ordering on foo and bar...
sage: G.set_cmp_verts(cmp)
sage: G.vertices()
[consistently sorted list of vertices]

Or even easier:

sage: G.set_cmp_verts(sets=True)

2. We could be a bit smarter and provide some alternative sort
function in the vertices() command, but it would seem as if no matter
which function we wrote, a user could come up with another object
which would implement < etc in such a way as to still break it.


Thoughts? Python gurus, is there a set object which has the right kind
of __cmp__ for this use?
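
For what it's worth, here is a rough sketch (not an existing class in
Python or Sage) of what such an object could look like: a frozenset whose
rich comparisons fall back to the sorted tuple of elements, so sorted()
gives one consistent answer. It assumes the elements themselves are
totally ordered, and it gives up the subset meaning of "<", which is
exactly the trade-off under discussion.

{{{
class TotallyOrderedFrozenset(frozenset):
    """A frozenset compared via its sorted tuple of elements
    (hypothetical sketch; assumes the elements are themselves comparable)."""
    def _key(self):
        return tuple(sorted(self))
    def __lt__(self, other):
        return self._key() < other._key()
    def __le__(self, other):
        return self._key() <= other._key()
    def __gt__(self, other):
        return self._key() > other._key()
    def __ge__(self, other):
        return self._key() >= other._key()

L = [TotallyOrderedFrozenset(s) for s in ([2, 3], [1, 2, 3], [1, 4])]
assert sorted(L) == sorted(reversed(L))   # one consistent answer either way
}}}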


--
Robert L. Miller
http://www.rlmiller.org/

Robert Miller

Aug 3, 2010, 4:45:14 PM
to sage-...@googlegroups.com, Chris Godsil
This entire problem is fixed by implementing proper comparison for Sage Sets:

http://trac.sagemath.org/sage_trac/ticket/9677

Nils Bruin

Aug 3, 2010, 6:31:19 PM
to sage-devel
The problem of "<" not providing a total ordering seems to come up
again and again in different situations. There are two cases now
already where it is clear that even in Python "<" does not signify a
total ordering anymore:

- python complex numbers ( < gives an error)
- python sets ( < denotes proper inclusion)

Rather than keep trying to work around this by trying to impose rather
arbitrary total orderings in categories, wouldn't it be more natural
to recognise that sorting doesn't make canonical sense in all
categories and be explicit about the choice of injection from the set
to be sorted into a totally ordered set? The "key" attribute for sort
allows for exactly such a choice.

sage: L=[set([1,2,3]),set([3,2]),set([1,4])]
sage: L
[set([1, 2, 3]), set([2, 3]), set([1, 4])]
sage: L.sort()
sage: L
[set([2, 3]), set([1, 2, 3]), set([1, 4])]
sage: L.sort(key=lambda V: [len(V),sorted(V)])
sage: L
[set([1, 4]), set([2, 3]), set([1, 2, 3])]
sage: L.sort(key=lambda V: sorted(V))
sage: L
[set([1, 2, 3]), set([1, 4]), set([2, 3])]

Robert Miller

Aug 4, 2010, 7:23:48 AM
to sage-devel
I'm not sure what the point of a comparison function is if we don't
implement a total ordering. The main place cmp methods get used is in
sorting. If you have a large list of objects, and, for example, you
want to know whether X is in the list, you might find this out by
looping over the whole list. Or, if the list is sorted, you can use
cmp to do a binary search. Thus testing for containment goes from n to
log(n). Python's own sort function uses cmp, which suggests to me that
it is a silly decision to abandon total orderings. Sage's uniq
function only works if the input objects have a total ordering defined
by their cmp method. I think that __cmp__ isn't something for which
"canonical" in the mathematical sense should be applied. This is a
computer scientific operation, for doing computer scientific things,
such as smart searches through lists and giving unique output.
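
For concreteness, the binary-search use case looks roughly like this
(plain Python; it only works because integers carry a genuine total
ordering):

{{{
import bisect

L = sorted([17, 3, 42, 8, 99, 23])   # a total order makes the sort well defined

def contains(sorted_list, x):
    # binary search: about log2(len(sorted_list)) comparisons instead of a full scan
    i = bisect.bisect_left(sorted_list, x)
    return i < len(sorted_list) and sorted_list[i] == x

assert contains(L, 42) and not contains(L, 5)
}}}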

I think that __cmp__ should give a total ordering whenever possible,
because it makes developing Sage easier. We can guarantee unique
reproducible output in more cases, find elements in lists faster, and
sort lists with completely unknown entries. In an environment where we
are programming for an unknown user where the inputs are simply not
predictable ahead of time (and checking what types they all are
massively slows everything down), it would be handy to be able to
count on something as simple as sorted(L) returning the same thing for
any permutation of L. I really don't think that is too much to ask of
a computer algebra system.

Nils Bruin

Aug 4, 2010, 12:05:10 PM
to sage-devel
On Aug 4, 4:23 am, Robert Miller <rlmills...@gmail.com> wrote:
> I'm not sure what the point of a comparison function is if we don't
> implement a total ordering. The main place cmp methods get used is in
> sorting.

But it is a fact that Python has already abandoned total ordering
semantics for "<".

> If you have a large list of objects, and, for example, you
> want to know whether X is in the list, you might find this out by
> looping over the whole list. Or, if the list is sorted, you can use
> cmp to do a binary search. Thus testing for containment goes from n to
> log(n).

Or you use hashes and it becomes an "O(1)" problem, which is how
python solves membership for sets and dictionaries. (I always have
trouble believing this is really O(1), but it does work very well in
practice)

> Python's own sort function uses cmp, which suggests to me that
> it is a silly decision to abandon total orderings. Sage's uniq
> function only works if the objects input have a total ordering defined
> by their cmp method.

Then that is an implementation error in uniq. Turning the sequence
into a set and then into a sequence should be more efficient, provided
that __hash__() is implemented decently.
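
In other words, a hash-based uniq needs only __hash__ and __eq__, not
"<". A rough sketch (not the actual Sage implementation) that also keeps
the first-occurrence order:

{{{
def uniq_by_hash(seq):
    seen = set()      # O(1) average-case membership tests via hashing
    out = []
    for x in seq:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

assert uniq_by_hash([frozenset([2, 3]), frozenset([1, 2]), frozenset([2, 3])]) \
       == [frozenset([2, 3]), frozenset([1, 2])]
}}}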

I agree it's questionable that sort gives undefined results on python
sets. I would think an error would be preferable but the current
behaviour is documented.

> I think that __cmp__ isn't something for which
> "canonical" in the mathematical sense should be applied. This is a
> computer scientific operation, for doing computer scientific things,
> such as smart searches through lists and giving unique output.

To ease the transition to python 3.0, sage should start migrating to
__lt__ etc. If the method were just __cmp__, I would agree with you.
However, the infix notation a < b translates to a __cmp__ call too,
and in a computer algebra system, I expect "a < b" to have a decidedly
mathematical meaning.

> it would be handy to be able to
> count on something as simple as sorted(L) returning the same thing for
> any permutation of L. I really don't think that is too much to ask of
> a computer algebra system.

But you are already asking more than the underlying computer language,
python, provides. There are some other cases where sage departs from
python semantics (transitivity of ==), but that is for mathematical
reasons. I would be really disappointed if sage decides to be less
mathematical and more computer scientific than python. It is going to
be a Herculean task to avoid a < b < c < a from occurring for any sage
objects a,b,c, which will be necessary for safeguarding sorted(L).

In categories where string representations tend to provide a poor
representation of the identity of an element, you can use explicit
equality tests for doctesting.

Robert Miller

Aug 4, 2010, 4:16:51 PM
to sage-...@googlegroups.com
Nils,

So you want Sage Sets to implement "a < b" to mean "a is a subset of
b"? I'll admit that that is reasonable, and it is a fact that it
follows Python convention. But I think that the Python convention is
bizarre, especially given how they implement sorting lists. I would
also rather the sort function return an error than be undefined,
but as far as I can tell, this "feature" is here to stay. (Actually I
think it would be nice if sorted(L) returned a sort of L, but clearly
that's too much to ask.) I took a look at Python 3.0 and the situation
gets more extreme. Heterogeneous arrays are no longer sortable at all,
and sorted() still uses "<", which is still "subset" for sets. So the
Python devs are valuing sorting less and less and less.

In Sage, we have a bizarre mixture. Python sets act as above. Sage
sets have the horrible A < B unless A = B. Subspaces of a vector
space, however, order lexicographically, and not by inclusion. The
reasons given being a justification for that. I have to say I disagree
with the implicit statement I am reading in your message which is that
"<" can only be "decidedly mathematical" if we define it to mean
subset. Groebner bases are also decidedly mathematical, and define all
sorts of arbitrary total orderings...

Maybe if I explain the use cases I have in mind you will have more
sympathy, or maybe someone can suggest a good long term solution. Many
graphs in algebraic graph theory have interesting vertex sets. Some
can be collections of subsets of a certain set, or subspaces of a
certain space, or even more crazily, sets of sets. As it is right now,
Sage graphs allow anything hashable as vertices, with no restrictions.
I view this as important, since graphs themselves are very flexible
mathematical objects. For things like the adjacency matrix, having a
consistent if arbitrary total ordering of the vertices is important.
Now if we allow Sage objects to implement total orderings in general,
then we can just note to avoid using Python sets as vertex sets, since
we can't do anything about that.

On the other hand, if the __lt__ etc. methods implement orderings that
aren't total orderings, sorting gets very messy. In particular, users
will be complaining about all sorts of Sage objects falling into the
same trap where sorted(L) returns a list which isn't consistent. (Or
maybe I'm the only person here who cares...)

As a thought exercise, let's pursue the partial ordering idea, fixing
Python's frozensets as our vertices. (Frozen so that they can hash,
Python so that we can't really change the "<" convention even if we
wanted to.) Imagine the following situation. We want to define a
consistent ordering on the vertices, so we use their hashes. Now, we
come to a collision. What to do next? We need to define *some*
consistent ordering so that operations like adjacency matrix, etc.
give consistent, sensible answers. It is not clear to me what to do
when we get hash collisions.

Furthermore, Sage graphs will vastly slow down if we implement a
custom sorting function which tries to deal with this case by case.

On another angle, Python has clearly dealt with this problem
internally, but they don't seem willing to share this with their
users. Python sets must contain only hashable elements. This is
because Python is using hashes to test containment and equality. I
love this kind of stuff
{{{
>>> sorted( [ frozenset([2,3]), frozenset([1,2]) ] )
[frozenset([2, 3]), frozenset([1, 2])]
>>> sorted( [ frozenset([1,2]), frozenset([2,3]) ] )
[frozenset([1, 2]), frozenset([2, 3])]
>>> sorted( set( [ frozenset([1,2]), frozenset([2,3]) ] ) )
[frozenset([1, 2]), frozenset([2, 3])]
>>> sorted( set( [ frozenset([2,3]), frozenset([1,2]) ] ) )
[frozenset([1, 2]), frozenset([2, 3])]
}}}

But, they don't seem to have *actually* dealt with the problem:
{{{
>>> a = frozenset([1,2])
>>> b = frozenset([2,3])
>>> c = frozenset([3,4])
>>> A = frozenset([a,b])
>>> B = frozenset([b,c])
>>> sorted(set([A,B])) == [A,B]
True
>>> sorted(set([B,A])) == [B,A]
True
}}}

So, given that we want to have graphs whose vertex sets are sets, or
subspaces, what is to be done? I don't want to implement sorting of
every different type of object in the graph code, I'd much rather have
things be sane from the start.

Robert Miller

Aug 4, 2010, 4:23:06 PM
to sage-...@googlegroups.com
Also, consider the fact that many Sage functions use "return
sorted(output)" to guarantee a consistent ordering of the output. What
you're advocating means that this wouldn't work in many cases...

Nils Bruin

Aug 4, 2010, 5:51:21 PM
to sage-devel
On Aug 4, 1:16 pm, Robert Miller <r...@rlmiller.org> wrote:
> So you want Sage Sets to implement "a < b" to mean "a is a subset of
> b"? I'll admit that that is reasonable, and it is a fact that it
> follows Python convention.

My initial reaction for this is affirmative: If python makes a choice,
then sage must follow because it is a python library.

There are other areas of mathematics where "<" gets used for proper
inclusion. A group theorist is going to be very surprised if for two
groups H,G, the expression "H < G" is valid but does not mean "H is a
subgroup of G".

> But I think that the Python convention is
> bizarre, especially given how they implement sorting lists.

Indeed, but documented http://docs.python.org/library/stdtypes.html#set-types-set-frozenset
: "Since sets only define partial ordering (subset relationships), the
output of the list.sort() method is undefined for lists of sets."

> I have to say I disagree
> with the implicit statement I am reading in your message which is that
> "<" can only be "decidedly mathematical" if we define it to mean
> subset.
In the category of sets (finite sets for that matter), I cannot think
of a meaning for "<". If you have a total ordering on the elements
then "lex sorted" ordering makes sense of course, but now you are
working in the category of "finite subsets of a totally ordered set".

> Groebner bases are also decidedly mathematical, and define all
> sorts of arbitrary total orderings...

Indeed, and computer algebra systems (the good ones anyway) take great
care in making the choice of monomial ordering explicit.

> For things like the adjacency matrix, having a
> consistent if arbitrary total ordering of the vertices is important.

But then an adjacency matrix is something that is defined for a graph
*PLUS* an ordering of its vertices. A graph only has associated to it
an "adjacency class", which is the conjugacy class of one of its
adjacency matrices under permutation matrices, i.e.:

{ P.transpose()* adjacency_matrix * P : P in permutation matrices of
the right size }

I agree that's not something you want to work with in a computer
algebra system, but then perhaps a graph should be given by a
*sequence* of vertices rather than a set. Just as "the 3-dimensional
vector space over R" also comes with an ordered basis whenever you do
computations with it.

> On the other hand, if the __lt__ etc. methods implement orderings that
> aren't total orderings, sorting gets very messy. In particular, users
> will be complaining about all sorts of Sage objects falling into the
> same trap where sorted(L) returns a list which isn't consistent. (Or
> maybe I'm the only person here who cares...)

Python 3 is clear on the matter: If there is not a natural ordering,
then __lt__() is going to return a type error. In addition, for sets
"<" is commandeered for a partial ordering, so routines that assume
that __lt__() is consistent with a total ordering are going to return
undefined results, and this is documented. I for one would not expect
"sorted" to work on arbitrary iterables, so I'd be happy with an error
and disappointed with an undefined result, until I read the relevant
documentation.

> As a thought exercise, let's persue the partial ordering idea, fixing
> Python's frozensets as our vertices.
[...]
> We want to define a
> consistent ordering on the vertices, so we use their hashes.

You could use id() instead if you ensure uniqueness of keys... This is
guaranteed to be unique and unchanged for the lifetime of the object,
so you won't get collisions. In CPython it is implemented as the
address where the object is stored, but a garbage collected
implementation of Python would probably have to do something else.
[OK, so that one doesn't work]

An alternative approach would be to realise that vertices need to be
stored in a sequence that has quick membership testing, for instance a
dictionary {'v1': 0, 'v2':1} etc. Depending on the type of access,
perhaps you would also need to keep the thing as a sequence
['v1','v2']. In magma, this data structure is an "indexed set".
Including an element adds it at the end, if it doesn't already exist
in the indexed set. Probably, deletion should follow list semantics.

I don't know how easy it is to make such a data structure efficient
and what the memory penalty is for having both hashing and ordering.
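
A very rough sketch (plain Python, not Magma's or Sage's actual
implementation) of such an indexed set, keeping a list for the order of
introduction and a dict for fast membership and index lookup; deletion
with list semantics is left out:

{{{
class IndexedSet:
    def __init__(self, items=()):
        self._list = []     # order of introduction
        self._index = {}    # element -> position, hash-based lookup
        for x in items:
            self.add(x)
    def add(self, x):
        # include at the end, unless it is already present
        if x not in self._index:
            self._index[x] = len(self._list)
            self._list.append(x)
    def index(self, x):
        return self._index[x]
    def __contains__(self, x):
        return x in self._index
    def __iter__(self):
        return iter(self._list)

V = IndexedSet([frozenset([1, 2]), frozenset([2, 3]), frozenset([1, 2])])
assert list(V) == [frozenset([1, 2]), frozenset([2, 3])]
assert V.index(frozenset([2, 3])) == 1
}}}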

> On another angle, Python has clearly dealt with this problem
> internally, but they don't seem willing to share this with their
> users.

I haven't checked the implementation, but I suspect that for
membership test and lookup, they use hash-tables.

> I love this kind of stuff
[sorted([frozenset,....]) examples]
This is documented to be undefined.

> So, given that we want to have graphs whose vertex sets are sets, or
> subspaces, what is to be done?

Don't rely on sorting, but use other methods for getting consistent
orderings and efficient membership tests.

> Also, consider the fact that many Sage functions use "return
> sorted(output)" to guarantee a consistent ordering of the output. What
> you're advocating means that this wouldn't work in many cases...

Yes. Personally, I would be in favour of actively randomizing the
order in which results are returned whenever the results are not
supposed to be returned in any particular order. So many subtle bugs
creep into code due to people assuming results are returned in some
particular order and then in some future edition of the code, this
order is changed and the code unexpectedly breaks.

If you need one return value out of many with a particular property,
explicitly select on that property. Otherwise, deal with the fact that
you are working with *some* return value.

I understand that consistent string representation of output is
necessary for doctesting, but then you can just do a sort in the
doctest, in the spirit of:

sage: f = PolynomialRing(ComplexNumbers(100),"x").random_polynomial()
sage: L = f.roots()
sage: sorted(L, key = lambda v: [v[0].abs(),v[0].arg()])

or test on internal consistency

sage: sum(v[1] for v in L) == f.degree() and all( f(r[0]).close_to_zero() for r in L )

In short:
- I don't believe sorting is essential for programming many things
efficiently
- I don't think routines should be going out of their way to sort
their output
- The reality is that Python will not support total ordering in many
cases anymore.

Robert Miller

Aug 4, 2010, 6:14:53 PM
to sage-...@googlegroups.com
On Wed, Aug 4, 2010 at 5:51 PM, Nils Bruin <nbr...@sfu.ca> wrote:
> On Aug 4, 1:16 pm, Robert Miller <r...@rlmiller.org> wrote:
>> So you want Sage Sets to implement "a < b" to mean "a is a subset of
>> b"? I'll admit that that is reasonable, and it is a fact that it
>> follows Python convention.
>
> My initial reaction for this is affirmative: If python makes a choice,
> then sage must follow because it is a python library.

If Python jumped off a cliff...

> There are other areas of mathematics where "<" gets used for proper
> inclusion. A group theorist is going to be very surprised if for two
> groups H,G, the expression "H < G" is valid but does not mean "H is a
> subgroup of G".

But if you ask a good mathematician, given sets A and B, whether A is
less than B, or whether A < B is true, they will ask: what do you mean
by "<"?

> In the category of sets (finite sets for that matter), I cannot think
> of a meaning for "<".

So then you would rather have Sage sets give an error rather than sort
in a list? I can't imagine why this is a good thing. You seem to be
ignoring the sorted() function altogether, rather than admitting it
can ever be useful.

> Indeed, and computer algebra systems (the good ones anyway) take great
> care in making the choice of monomial ordering explicit.

And we could also document our choices well.

>> For things like the adjacency matrix, having a
>> consistent if arbitrary total ordering of the vertices is important.
>
> But then an adjacency matrix is something that is defined for a graph
> *PLUS* an ordering of its vertices. A graph only has associated to it
> an "adjacency class", which is the conjugacy class of one of its
> adjacency matrices under permutation matrices, i.e.:
>
> { P.transpose()* adjacency_matrix * P : P in permutation matrices of
> the right size }

I think that this is entirely too pedantic for the problem at hand. I
would like to pick an arbitrary, consistent ordering and stick with
it, to make the user's life easier, and the developer's.

> Python 3 is clear on the matter: If there is not a natural ordering,
> then __lt__() is going to return a type error. In addition, for sets
> "<" is commandeered for a partial ordering, so routines that assume
> that __lt__() is consistent with a total ordering are going to return
> undefined results, and this is documented. I for one would not expect
> "sorted" to work on arbitrary iterables, so I'd be happy with an error
> and disappointed with an undefined result, until I read the relevant
> documentation.

I think this renders the sort function pretty useless. You can't even
sort a list of objects with the same type. I would much rather have a
Sage which sorts lists of things gladly than errors all over itself
all the time.

>> I love this kind of stuff
> [sorted([frozenset,....]) examples]
> this is documented to be undefined.

I know! You've made your point... The fact is that Python is still
ordering the elements of a set which are sets. The "sorted( set(...)
)" examples are not documented to be undefined, and they work.

>> So, given that we want to have graphs whose vertex sets are sets, or
>> subspaces, what is to be done?
>
> Don't rely on sorting, but use other methods for getting consistent
> orderings and efficient membership tests.

Such as?

>> Also, consider the fact that many Sage functions use "return
>> sorted(output)" to guarantee a consistent ordering of the output. What
>> you're advocating means that this wouldn't work in many cases...
>
> Yes. Personally, I would be in favour of actively randomizing the
> order in which results are returned whenever the results are not
> supposed to be returned in any particular order. So many subtle bugs
> creep into code due to people assuming results are returned in some
> particular order and then in some future edition of the code, this
> order is changed and the code unexpectedly breaks.

I think consistent ordering of outputs is not too much to expect,
*especially* within one run of Sage.

> I understand that consistent string representation of output is
> necessary for doctesting, but then you can just do a sort in the
> doctest, in the spirit of:
>
> sage: f = PolynomialRing(ComplexNumbers(100),"x").random_polynomial()
> sage: L = f.roots()
> sage: sorted(L, key = lambda v: [v[0].abs(),v[0].arg()])

It is very difficult to define such a sort (which is consistent) on
arbitrary objects coming in without becoming *very* inefficient. I'm
not trying to demand that all Sage objects always define a total
ordering, I'm just suggesting that since there is an inconsistent
definition of < for Sage sets, and since they get used frequently as
vertices, that we define a total ordering. You yourself admitted that
you can't think of a natural mathematical meaning for < for arbitrary
sets...

> In short:
>  - I don't believe sorting is essential for programming many things
> efficiently

I hear that loud and clear. But that's no reason to make it impossible
to do so, in favor of a more difficult but also arbitrary decision.

>  - I don't think routines should be going out of their way to sort
> their output

No routines? Ever? What if that's part of the design? Are you making a
broad, sweeping design decision about how everything should always be
done?

>  - The reality is that Python will not support total ordering in many
> cases anymore.

That doesn't mean we can't. Sage sets aren't Python sets, and we don't
have to make the same mistakes. We can still be a Python library
without doing the same exact thing as Python all the time. By that
logic, we should delete the Sage Sets class entirely.


Nils Bruin

Aug 4, 2010, 7:20:25 PM
to sage-devel
On Aug 4, 3:14 pm, Robert Miller <r...@rlmiller.org> wrote:
> If Python jumped off a cliff...
Then Sage might as well :-). Or be reimplemented in CL.

> So then you would rather have Sage sets give an error rather than sort
> in a list? I can't imagine why this is a good thing. You seem to be
> ignoring the sorted() function altogether, rather than admitting it
> can ever be useful.

I think it will be confusing if Sage sets and python sets behave
differently in that respect.
That said, your choice is consistent with how python lists are
compared, so as an attempt at total ordering of sets it's probably the
best one.

> I think that this is entirely too pedantic for the problem at hand. I
> would like to pick an arbitrary, consistent ordering and stick with
> it, to make the user's life easier, and the developer's.

Which was my suggestion: Order the vertices by the order in which they
are added. That is a well-defined, consistent choice that does not
require a non-existing total ordering.

Forcing people to be explicit about their choices often seems
pedantic, until one encounters a situation where the choice
unexpectedly makes a difference.

> > Don't rely on sorting, but use other methods for getting consistent
> > orderings and efficient membership tests.
>
> Such as?

"order of introduction" and hashing.

> > sage: f = PolynomialRing(ComplexNumbers(100),"x").random_polynomial()
> > sage: L = f.roots()
> > sage: sorted(L, key = lambda v: [v[0].abs(),v[0].arg()])
>
> It is very difficult to define such a sort (which is consistent) on
> arbitrary objects coming in without becoming *very* inefficient.

But that's why you're explicit about your choice of "key": You only
have to come up with a method that is consistent for your one example,
so that the doctest does the right thing.

> not trying to demand that all Sage objects always define a total
> ordering, I'm just suggesting that since there is an inconsistent
> definition of < for Sage sets, and since they get used frequently as
> vertices, that we define a total ordering. You yourself admitted that
> you can't think of a natural mathematical meaning for < for arbitrary
> sets...

I think you are patching things the wrong way around: We find that
graphs assume the existence of a total ordering on their vertex set,
but that such an ordering need not be implemented. The more natural
way to solve the problem is by looking why graphs make that assumption
and if they have to. Not by trying to endow everything with a total
ordering.

In terms of cost, I think your solution will be more expensive as
well. Consider the lookup of a set V in a sorted list L of sets. This
will cost log(len(L)) set comparisons. For each comparison V < W you
incur len(V)*log(len(V)) + len(W)*log(len(W)) in sorting and then another
O(len(V)) for finding the smallest element not in both V and W.

So, assuming all sets in L have about the same cardinality as V, we
get a complexity of

O(len(L)*len(V)*log(len(V)))

Compare this to a lookup of a set V in an indexed set L of sets.
Computing the hash of V might take some time; O(len(V)) probably. But
the elements of L should have their hashes already computed and cached
(that would make sense for something like a frozen set). Then you just
have a quick hash table locator, followed by a very small number of
set comparisons (provided a decent hashing strategy is used). So this
is

O(C*len(V))

where C is the number of sets in a hash bin. The "O(1)" theory assumes
hash computation is cheap (hence our "len(V)") and requires that C is
independent of len(L), but I'm not sure this is feasible in practice.

I think this example makes me even more convinced that vertices should
be looked up via hashing and not via sorted lists and hence
strengthens my opinion that "universal total ordering" is a bad thing,
inviting bad programming style.

But then again, I'm not volunteering to implement the required
changes. I think we've had a very healthy discussion and seen some
good arguments on either sides. Now it's up to the painter to choose
the colour of the shed.

Nils Bruin

unread,
Aug 4, 2010, 7:24:05 PM8/4/10
to sage-devel
On Aug 4, 4:20 pm, Nils Bruin <nbr...@sfu.ca> wrote:
> O(len(L)*len(V)*log(len(V)))

sorry. That was a typo. Binary search means

O(log(len(L))*len(V)*log(len(V)))

Nils Bruin

Aug 4, 2010, 8:03:14 PM
to sage-devel
On Aug 4, 3:44 pm, mda_ <donmorri...@gmail.com> wrote:
> >http://docs.python.org/py3k/whatsnew/3.1.html#pep-372-ordered-diction...

Thanks! That makes me want to use Python 3.1.

Robert Miller

Aug 4, 2010, 8:27:38 PM
to sage-...@googlegroups.com
Nils,

First of all, I hope my distaste for the muggy weather hasn't made me
unbearable. 0:-)

When I was referring to things being much more difficult, I was
thinking of implementing a cmp() function that took arbitrary
hashables and defined a total ordering on all of them, consistent
regardless of order of definition or memory location. Not necessary--
see below.

On Wed, Aug 4, 2010 at 7:20 PM, Nils Bruin <nbr...@sfu.ca> wrote:
> On Aug 4, 3:14 pm, Robert Miller <r...@rlmiller.org> wrote:
>> If Python jumped off a cliff...
> Then Sage might as well :-). Or be reimplemented in CL.

Yes! This is the obvious solution! Let's continue this subthread on
sage-flame so we can really express our excitement! ;-)

> Which was my suggestion: Order the vertices by the order in which they
> are added. That is a well-defined, consistent choice that does not
> require a non-existing total ordering.

This is not that objectionable of an approach, given some elbow grease
(see my comments below). The only problem is that if you make a copy
of the graph, for example, and somehow in the process the vertices get
jumbled around (which is the case currently-- somewhere things get
converted to a set or something, and then iterated over), the
adjacency matrix might differ even though the graphs themselves would
be equal.

> Forcing people to be explicit about their choices often seems
> pedantic, until one encounters a situation where the choice
> unexpectedly makes a difference.

I can definitely agree with this sentiment. I think it is what Sage's
graphs should do, *eventually*.

> "order of introduction" and hashing.

Order of introduction makes sense as long as you're careful about it
(see my comment about copying above), but I have a small problem with
hashing -- when you get to collisions, you are back to the same
problem. It is a practical way to make things sort quickly, but the
theoretical/hypothetical problem of how to define the ordering
remains.

>> ordering, I'm just suggesting that since there is an inconsistent
>> definition of < for Sage sets, and since they get used frequently as
>> vertices, that we define a total ordering. You yourself admitted that
>> you can't think of a natural mathematical meaning for < for arbitrary
>> sets...
>
> I think you are patching things the wrong way around: We find that
> graphs assume the existence of a total ordering on their vertex set,
> but that such an ordering need not be implemented. The more natural
> way to solve the problem is by looking why graphs make that assumption
> and if they have to. Not by trying to endow everything with a total
> ordering.

I'm definitely not trying to endow everything with a total ordering.
(I think by saying this we might be converging towards agreement... I
hope) Again, see my comments below.

> In terms of cost, I think your solution will be more expensive as
> well.

I'm not actually advocating using a binary search for anything
specific. I was just bringing it up as an example of why one might
want things sorted. You're right, hashing is much faster.

> I think this example makes me even more convinced that vertices should
> be looked up via hashing and not via sorted lists and hence
> strengthens my opinion that "universal total ordering" is a bad thing,
> inviting bad programming style.
>
> But then again, I'm not volunteering to implement the required
> changes. I think we've had a very healthy discussion and seen some
> good arguments on either sides. Now it's up to the painter to choose
> the colour of the shed.

I appreciate that sentiment greatly. And I want to reiterate the fact
that I am in no way advocating universal total ordering. Let me
summarize what I'd like to do, short term and long term:

For now, I'd like to fix the problem in the short term as in the patch at
http://trac.sagemath.org/sage_trac/ticket/9677

This allows researchers to get on with their research, and me to get
on with my life. It gives the ability to use sets of integers as
vertices in Sage, the way things are now, thus making for much
rejoicing. Perhaps I should also add some caveat to the graph
documentation that a total ordering on the vertices is expected, or
subtle bugs may rear their ugly heads.

I agree with you that it is a bit silly for this assumption to be
there in the first place. However, it is kind of all over the place.
Fixing it should be done incrementally, and over time. There are a
couple facts working in our favor.

1. All vertices need to be hashable. That almost completely defines a
quickly computable arbitrary sort right there, for when it is needed.

2. At least when we are using c_graphs, part of the underlying
implementation includes a bijection from the set of vertices to a
certain set of nonnegative integers, which is defined exactly by order
of insertion. So in that case, we are finished.

3. There are probably several places where we are using sorted() when
we don't really need to be. If you ever decided to grep through the
graphs module and offer any suggestions for improvements, you probably
wouldn't wait too long to see a patch.

Hopefully this all agrees with you, and if not, I guess I can start
learning Lisp...


Tim Daly

Aug 4, 2010, 10:26:56 PM
to sage-...@googlegroups.com

mda_ wrote:
>> Hopefully this all agrees with you, and if not, I guess I can start
>> learning Lisp...
>>
>

> My apologies for the cross-posting (I am not yet approved for sage-
> flame)....
>
> http://www.buayacorp.com/wp-content/uploads/2007/10/john-mccarthy-poster1.jpg
>
The fact that you're "doing it" at all implies the imperative which,
as we all know, is not the way to program (this week).
> (Above: A portrait, and tribute! ;-), to the inventor of lisp!)
>
> http://en.wikipedia.org/wiki/John_McCarthy_(computer_scientist)
>
>


Peter Jeremy

Aug 5, 2010, 5:06:33 PM
to sage-...@googlegroups.com
On 2010-Aug-04 14:51:21 -0700, Nils Bruin <nbr...@sfu.ca> wrote:
>There are other areas of mathematics where "<" gets used for proper
>inclusion. A group theorist is going to be very surprised if for two
>groups H,G, the expression "H < G" is valid but does not mean "H is a
>subgroup of G".

I expect this is primarily due to the restrictions of the ASCII
character set. More appropriate set comparison and manipulation
characters were defined about 50 years ago in APL: ∩ ∪ ⊂ ⊃ ⊆ ⊇
(For those without appropriate glyphs available, these are Unicode
codepoints U+2229, U+222A, U+2282, U+2283, U+2286 and U+2287)

--
Peter Jeremy

Alex Ghitza

Aug 5, 2010, 7:11:08 PM
to Peter Jeremy, sage-...@googlegroups.com

While this is true, I believe Nils' point was that mathematicians
often use H < G to say that H is a subgroup of G, even when they write
such things by hand, where they're not bound by the ASCII character set
(or even Unicode).


Best,
Alex

--
Alex Ghitza -- http://aghitza.org/
Lecturer in Mathematics -- The University of Melbourne -- Australia
