On Mon, Jul 12, 2010 at 7:49 PM, Chris Godsil <cgo...@uwaterloo.ca> wrote:
> ... its the complicated vertices that are causing the problem, I expect.
The problem is that the comparison operators in Python for sets
implement the subset notion, and thus do not provide a proper sorting
of a list of sets:
http://docs.python.org/library/sets.html#set-objects
IMHO, these would be better as .is_subset() methods, etc.
Sage's adjacency matrix command uses vertices(), which uses comparison
operators to give an ordering of the vertices. However, since "subset"
does not define a total ordering, it does not give a good way of
sorting vertices. One good short term solution is to use tuples
instead, which are ordered lexicographically.
Some possible solutions:
1. Sage graphs should have a way of specifying a user-defined
comparison function for sorting vertices. Any time you use sets as
vertices, you would also do something like
sage: def cmp(foo, bar):
... # some total ordering on foo and bar...
sage: G.set_cmp_verts(cmp)
sage: G.vertices()
[consistently sorted list of vertices]
Or even easier:
sage: G.set_cmp_verts(sets=True)
2. We could be a bit smarter and provide some alternative sort
function in the vertices() command, but it would seem as if no matter
which function we wrote, a user could come up with another object
which would implement < etc in such a way as to still break it.
Thoughts? Python gurus, is there a set object which has the right kind
of __cmp__ for this use?
--
Robert L. Miller
http://www.rlmiller.org/
http://trac.sagemath.org/sage_trac/ticket/9677
So you want Sage Sets to implement "a < b" to mean "a is a subset of
b"? I'll admit that that is reasonable, and it is a fact that it
follows Python convention. But I think that the Python convention is
bizarre, especially given how they implement sorting lists. I would
also rather the sort function return an error than being undefined,
but as far as I can tell, this "feature" is here to stay. (Actually I
think it would be nice if sorted(L) returned a sort of L, but clearly
that's too much to ask.) I took a look at Python 3.0 and the situation
gets more extreme. Heterogeneous arrays are no longer sortable at all,
and sorted() still uses "<", which is still "subset" for sets. So the
Python devs are valuing sorting less and less and less.
In Sage, we have a bizarre mixture. Python sets act as above. Sage
sets have the horrible A < B unless A = B. Subspaces of a vector
space, however, order lexiographically, and not by inclusion. The
reasons given being a justification for that. I have to say I disagree
with the implicit statement I am reading in your message which is that
"<" can only be "decidedly mathematical" if we define it to mean
subset. Groebner bases are also decidedly mathematical, and define all
sorts of arbitrary total orderings...
Maybe if I explain the use cases I have in mind you will have more
sympathy, or maybe someone can suggest a good long term solution. Many
graphs in algebraic graph theory have interesting vertex sets. Some
can be collections of subsets of a certain set, or subspaces of a
certain space, or even more crazily, sets of sets. As it is right now,
Sage graphs allow anything hashable as vertices, with no restrictions.
I view this as important, since graphs themselves are very flexible
mathematical objects. For things like the adjacency matrix, having a
consistent if arbitrary total ordering of the vertices is important.
Now if we allow Sage objects to implement total orderings in general,
then we can just note to avoid using Python sets as vertex sets, since
we can't do anything about that.
On the other hand, if the __lt__ etc. methods implement orderings that
aren't total orderings, sorting gets very messy. In particular, users
will be complaining about all sorts of Sage objects falling into the
same trap where sorted(L) returns a list which isn't consistent. (Or
maybe I'm the only person here who cares...)
As a thought exercise, let's persue the partial ordering idea, fixing
Python's frozensets as our vertices. (Frozen so that they can hash,
Python so that we can't really change the "<" convention even if we
wanted to.) Imagine the following situation. We want to define a
consistent ordering on the vertices, so we use their hashes. Now, we
come to a collision. What to do next? We need to define *some*
consistent ordering so that operations like adjacency matrix, etc.
give consistent, sensical answers. It is not clear to me what to do
when we get hash collisions.
Furthermore, Sage graphs will vastly slow down if we implement a
custom sorting function which tries to deal with this case by case.
On another angle, Python has clearly dealt with this problem
internally, but they don't seem willing to share this with their
users. Python sets must contain only hashable elements. This is
because Python is using hashes to test containment and equality. I
love this kind of stuff
{{{
>>> sorted( [ frozenset([2,3]), frozenset([1,2]) ] )
[frozenset([2, 3]), frozenset([1, 2])]
>>> sorted( [ frozenset([1,2]), frozenset([2,3]) ] )
[frozenset([1, 2]), frozenset([2, 3])]
>>> sorted( set( [ frozenset([1,2]), frozenset([2,3]) ] ) )
[frozenset([1, 2]), frozenset([2, 3])]
>>> sorted( set( [ frozenset([2,3]), frozenset([1,2]) ] ) )
[frozenset([1, 2]), frozenset([2, 3])]
}}}
But, they don't seem to have *acutally* dealt with the problem:
{{{
>>> a = frozenset([1,2])
>>> b = frozenset([2,3])
>>> c = frozenset([3,4])
>>> A = frozenset([a,b])
>>> B = frozenset([b,c])
>>> sorted(set([A,B])) == [A,B]
True
>>> sorted(set([B,A])) == [B,A]
True
}}}
So, given that we want to have graphs whose vertex sets are sets, or
subspaces, what is to be done? I don't want to implement sorting of
every different type of object in the graph code, I'd much rather have
things be sane from the start.
If Python jumped off a cliff...
> There are other areas of mathematics where "<" gets used for proper
> inclusion. A group theorist is going to be very surprised if for two
> groups H,G, the expression "H < G" is valid but does not mean "H is a
> subgroup of G".
But if you ask a good mathematician, given sets A and B, is A less
than B or is it true that A < B, they will ask, what do you mean by
"<"?
> In the category of sets (finite sets for that matter), I cannot think
> of a meaning for "<".
So then you would rather have Sage sets give an error rather than sort
in a list? I can't imagine why this is a good thing. You seem to be
ignoring the sorted() function altogether, rather than admitting it
can ever be useful.
> Indeed, and computer algebra systems (the good ones anyway) take great
> care in making the choice of monomial ordering explicit.
And we could also document our choices well.
>> For things like the adjacency matrix, having a
>> consistent if arbitrary total ordering of the vertices is important.
>
> But then an adjacency matrix is something that is defined for a graph
> *PLUS* an ordering of its vertices. A graph only has associated to it
> an "adjacency class", which is the conjugacy class of one of its
> adjacency matrices under permutation matrices, i.e.:
>
> { P.transpose()* adjacency_matrix * P : P in permutation matrices of
> the right size }
I think that this is entirely too pedantic for the problem at hand. I
would like to pick an arbitrary, consistent ordering and stick with
it, to make the user's life easier, and the developer's.
> Python 3 is clear on the matter: If there is not a natural ordering,
> then __lt__() is going to return a type error. In addition, for sets
> "<" is commandeered for a partial ordering, so routines that assume
> that __lt__() is consistent with a total ordering are going to return
> undefined results, and this is documented. I for one would not expect
> "sorted" to work on arbitrary iterables, so I'd be happy with an error
> and disappointed with an undefined result, until I read the relevant
> documentation.
I think this renders the sort function pretty useless. You can't even
sort a list of objects with the same type. I would much rather have a
Sage which sorts lists of things gladly than errors all over itself
all the time.
>> I love this kind of stuff
> [sorted([frozenset,....]) examples]
> this is documented to be undefined.
I know! You've made your point... The fact is that Python is still
ordering the elements of a set which are sets. The "sorted( set(...)
)" examples are not documented to be undefined, and they work.
>> So, given that we want to have graphs whose vertex sets are sets, or
>> subspaces, what is to be done?
>
> Don't rely on sorting, but use other methods for getting consistent
> orderings and efficient membership tests.
Such as?
>> Also, consider the fact that many Sage functions use "return
>> sorted(output)" to guarantee a consistent ordering of the output. What
>> you're advocating means that this wouldn't work in many cases...
>
> Yes. Personally, I would be in favour of actively randomizing the
> order in which results are returned whenever the results are not
> supposed to be returned in any particular order. So many subtle bugs
> creep into code due to people assuming results are returned in some
> particular order and then in some future edition of the code, this
> order is changed and the code unexpectedly breaks.
I think consistent ordering of outputs is not too much to expect,
*especially* within one run of Sage.
> I understand that consistent string representation of output is
> necessary for doctesting, but then you can just do a sort in the
> doctest, in the spirit of:
>
> sage: f = PolynomialRing(ComplexNumbers(100),"x").random_polynomial()
> sage: L = f.roots()
> sage: sorted(L, key = lambda v: [v[0].abs(),v[0].arg()])
It is very difficult to define such a sort (which is consistent) on
arbitrary objects coming in without becoming *very* inefficient. I'm
not trying to demand that all Sage objects always define a total
ordering, I'm just suggesting that since there is an inconsistent
definition of < for Sage sets, and since they get used frequently as
vertices, that we define a total ordering. You yourself admitted that
you can't think of a natural mathematical meaning for < for arbitrary
sets...
> In short:
> - I don't believe sorting is essential for programming many things
> efficiently
I hear that loud and clear. But that's no reason to make it impossible
to do so, in favor of a more difficult but also arbitrary decision.
> - I don't think routines should be going out of their way to sort
> their output
No routines? Ever? What if that's part of the design? Are you making a
broad, sweeping design decision about how everything should always be
done?
> - The reality is that Python will not support total ordering in many
> cases anymore.
That doesn't mean we can't. Sage sets aren't Python sets, and we don't
have to make the same mistakes. We can still be a Python library
without doing the same exact thing as Python all the time. By that
logic, we should delete the Sage Sets class entirely.
First of all, I hope my distaste for the muggy weather hasn't made me
unbearable. 0:-)
When I was referring to things being much more difficult, I was
thinking of implementing a cmp() function that took arbitrary
hashables and defined a total ordering on all of them, consistent
regardless of order of definition or memory location. Not necessary--
see below.
On Wed, Aug 4, 2010 at 7:20 PM, Nils Bruin <nbr...@sfu.ca> wrote:
> On Aug 4, 3:14 pm, Robert Miller <r...@rlmiller.org> wrote:
>> If Python jumped off a cliff...
> Then Sage might as well :-). Or be reimplemented in CL.
Yes! This is the obvious solution! Let's continue this subthread on
sage-flame so we can really express our excitement! ;-)
> Which was my suggestion: Order the vertices by the order in which they
> are added. That is a well-defined, consistent choice that does not
> require a non-existing total ordering.
This is not that objectionable of an approach, given some elbow grease
(see my comments below). The only problem is that if you make a copy
of the graph, for example, and somehow in the process the vertices get
jumbled around (which is the case currently-- somewhere things get
converted to a set or something, and then iterated over), the
adjacency matrix might differ even though the graphs themselves would
be equal.
> Forcing people to be explicit about their choices often seems
> pedantic, until one encounters a situation where the choice
> unexpectedly makes a difference.
I can definitely agree with this sentiment. I think it is what Sage's
graphs should do, *eventually*.
> "order of introduction" and hashing.
Order of introduction makes sense as long as you're careful about it
(see my comment about copying above), but I have a small problem with
hashing -- when you get to collisions, you are back to the same
problem. It is a practical way to make things sort quickly, but the
theoretical/hypothetical problem of how to define the ordering
remains.
>> ordering, I'm just suggesting that since there is an inconsistent
>> definition of < for Sage sets, and since they get used frequently as
>> vertices, that we define a total ordering. You yourself admitted that
>> you can't think of a natural mathematical meaning for < for arbitrary
>> sets...
>
> I think you are patching things the wrong way around: We find that
> graphs assume the existence of a total ordering on their vertex set,
> but that such an ordering need not be implemented. The more natural
> way to solve the problem is by looking why graphs make that assumption
> and if they have to. Not by trying to endow everything with a total
> ordering.
I'm definitely not trying to endow everything with a total ordering.
(I think by saying this we might be converging towards agreement... I
hope) Again, see my comments below.
> In terms of cost, I think your solution will be more expensive as
> well.
I'm not actually advocating using a binary search for anything
specific. I was just bringing it up as an example of why one might
want things sorted. You're right, hashing is much faster.
> I think this example makes me even more convinced that vertices should
> be looked up via hashing and not via sorted lists and hence
> strengthens my opinion that "universal total ordering" is a bad thing,
> inviting bad programming style.
>
> But then again, I'm not volunteering to implement the required
> changes. I think we've had a very healthy discussion and seen some
> good arguments on either sides. Now it's up to the painter to choose
> the colour of the shed.
I appreciate that sentiment greatly. And I want to reiterate the fact
that I am in no way advocating universal total ordering. Let me
summarize what I'd like to do, short term and long term:
For now, I'd like to fix the problem in the short term as in the patch at
http://trac.sagemath.org/sage_trac/ticket/9677
This allows researchers to get on with their research, and me to get
on with my life. It gives the ability to use sets of integers as
vertices in Sage, the way things are now, thus making for much
rejoicing. Perhaps I should also add some caveat to the graph
documentation that a total ordering on the vertices is expected, or
subtle bugs may rear their ugly heads.
I agree with you that it is a bit silly for this assumption to be
there in the first place. However, it is kind of all over the place.
Fixing it should be done incrementally, and over time. There are a
couple facts working in our favor.
1. All vertices need to be hashable. That almost completely defines a
quickly computable arbitrary sort right there, for when it is needed.
2. At least when we are using c_graphs, part of the underlying
implementation includes a bijection from the set of vertices to a
certain set of nonnegative integers, which is defined exactly by order
of insertion. So in that case, we are finished.
3. There are probably several places where we are using sorted() when
we don't really need to be. If you ever decided to grep through the
graphs module and offer any suggestions for improvements, you probably
wouldn't wait too long to see a patch.
Hopefully this all agrees with you, and if not, I guess I can start
learning Lisp...
mda_ wrote:
>> Hopefully this all agrees with you, and if not, I guess I can start
>> learning Lisp...
>>
>
> My apologies for the cross-posting (I am not yet approved for sage-
> flame)....
>
> http://www.buayacorp.com/wp-content/uploads/2007/10/john-mccarthy-poster1.jpg
>
The fact that you're "doing it" at all implies the imperative which, as
we all know,
is not the way to program (this week).
> (Above: A portrait, and tribute! ;-), to the inventor of lisp!)
>
> http://en.wikipedia.org/wiki/John_McCarthy_(computer_scientist)
>
>
I expect this is primarily due to the restrictions of the ASCII
character set. More appropriate set comparison and manipulation
characters were defined about 50 years ago in APL: ∩ ∪ ⊂ ⊃ ⊆ ⊇
(For those without appropriate glyphs available, these are Unicode
codepoints U+2289, U+228A, U+2282, U+2283, U+2286 and U+2287)
--
Peter Jeremy
While this is true, I believe Nils' point was that mathematicians
often use H < G to say that H is a subgroup of G, even when they write
such things by hand, where they're not bound by the ASCII character set
(or even Unicode).
Best,
Alex
--
Alex Ghitza -- http://aghitza.org/
Lecturer in Mathematics -- The University of Melbourne -- Australia