Manifold equality and UniqueRepresentation

186 views
Skip to first unread message

Eric Gourgoulhon

unread,
Nov 4, 2015, 5:10:51 PM11/4/15
to sage-devel
Hi sage-dev,

There is an ongoing discussion at #18529 about defining __eq__ for manifolds and whether one should
use UniqueRepresentation (which implements equality by id). If anybody has any opinion on this,
she/he is welcome to join the discussion.

The context is as follows:

- Manifolds are parents, whose elements are points.
- Manifolds are characterized by a user-defined atlas, which is a list of coordinate charts defined on the
  fly by the user, along with transition maps. "Defined on the fly" means that charts are added to the
  atlas during the working session. In particular, the atlas is not set at the manifold construction.
- Two points are declared equal iff they have the same coordinates in the same chart.
- Mathematically speaking, two manifolds are equal if they have the same maximal atlas, not the
  same "user-defined" atlas. Now maximal atlases cannot be represented in the computer.
- One needs fast equality check for manifolds, since
  (i) they are used as dictionary keys (for instance as the domains in the dictionary of the restrictions of
      some field)
  (ii) check of chart equality involves comparison of chart domains, which are manifolds (as open subset
      of a manifold), and charts are massively used as keys in the dictionary of coordinate expressions
      of scalar fields
  In the current branch of #18529, the manifold class inherits from UniqueRepresentation, so that
  equality is by id and is therefore fast.
 
Examples of use corresponding to #18529 branch are here (they are rather primitive since there are
neither scalar fields nor continuous maps in #18529).

Thanks for your feedback.

Eric.

Travis Scrimshaw

unread,
Nov 4, 2015, 7:26:08 PM11/4/15
to sage-devel
I should note for those not wanting to read through the discussion in full that we arrived at the current design dealing with the following issues:

- We need the manifold to be defined in order for check the charts (as it is the domain of said map).
- We can't just use EqualityById without UniqueRepresentation because this breaks pickling.
- We can only define manifolds by (latex) name, base ring, and dimension, so this is insufficient data for UniqueRepresentation. So we add an additional unique identifier to the cache key, the time from the epoch the object was created.

We would welcome any suggestions for improvements on ways to work around these issues and still preserve a fast equality check.

Best,
Travis

Nils Bruin

unread,
Nov 4, 2015, 7:57:27 PM11/4/15
to sage-devel
On Wednesday, November 4, 2015 at 2:10:51 PM UTC-8, Eric Gourgoulhon wrote:
Hi sage-dev,

There is an ongoing discussion at #18529 about defining __eq__ for manifolds and whether one should
use UniqueRepresentation (which implements equality by id). If anybody has any opinion on this,
she/he is welcome to join the discussion.

Avoid UniqueRepresentation if you can. It requires expensive processing of construction parameters and hence introduces bad overhead and it introduces "global variables" in a way that is much worse than global variables.

The main reason why UniqueRepresentation is necessary in some cases is because the coercion framework requires it to figure out certain things efficiently. However, this mainly comes in for coercion discovery. I would not expect manifolds to act in any non-trivial way in coercions, so I'm pretty sure this is not forcing you to have it.

In general, I don't think there is a real problem if

A = manifold(<construction parameter set 1)
B = manifold(<construction parameter set 1)

has (A is B) equal to False. I would think it's not really a problem if it has A == B equal to False.

Indeed, this would mean that manifolds stored in separate pickles would be non-equal. However, pickle is smart enough to ensure that

loads(dumps([A,A]))

would again be a sequence of two identical manifolds (they might just be distinct from the A you had originally).

Simon King

unread,
Nov 5, 2015, 4:17:35 AM11/5/15
to sage-...@googlegroups.com
Hi Eric,

On 2015-11-04, Eric Gourgoulhon <egourg...@gmail.com> wrote:
> The context is as follows:
>
> - Manifolds are parents, whose elements are points.
> - Manifolds are characterized by a user-defined atlas, which is a list of
> coordinate charts defined on the
> fly by the user, along with transition maps. "Defined on the fly" means
> that charts are added to the
> atlas during the working session. In particular, the atlas is not set at
> the manifold construction.
> - Two points are declared equal iff they have the same coordinates in the
> same chart.
> - Mathematically speaking, two manifolds are equal if they have the same
> maximal atlas, not the
> same "user-defined" atlas. Now maximal atlases cannot be represented in
> the computer.
> - One needs fast equality check for manifolds, since
> (i) they are used as dictionary keys (for instance as the domains in the
> dictionary of the restrictions of
> some field)
> (ii) check of chart equality involves comparison of chart domains, which
> are manifolds (as open subset
> of a manifold), and charts are massively used as keys in the
> dictionary of coordinate expressions
> of scalar fields

Are you serious? Certainly you are aware that there is no algorithm that
would be able to test whether two arbitrary compact 4-manifolds are
homeomorphic. And if I am not mistaken, the Rubinstein-Thompson
algorithm for the recognition of the 3-sphere is still state of the art,
but it involves solving a double-exponential number (in terms of number of
tetrahedra of a one-vertex triangulation) of NP-complete problems.

In such a situation, I believe a UniqueRepresentation does not make
sense. You may have different input data yielding homeomorphic
manifolds, and it is not feasible to solve a computationally super-hard
problem upon creation of each manifold (basically comparing the new
to-be-created manifold with all previously created manifolds, unless you
come up with a normal form for manifolds).

What you COULD use is CachedRepresentation. Then, identical input data
would result in identical output. But still, non-identical input data
would result in two manifolds that may or may not be homeomorphic.

And using manifolds as dictionary keys seems to me like a very bad idea,
given the above-mentioned problems.

Best regards,
Simon

Simon King

unread,
Nov 5, 2015, 4:32:29 AM11/5/15
to sage-...@googlegroups.com
Hi Travis,

On 2015-11-05, Travis Scrimshaw <tsc...@ucdavis.edu> wrote:
> - We need the manifold to be defined in order for check the charts (as it
> is the domain of said map).
> - We can't just use EqualityById without UniqueRepresentation because this
> breaks pickling.
> - We can only define manifolds by (latex) name, base ring, and dimension,
> so this is insufficient data for UniqueRepresentation. So we add an
> additional unique identifier to the cache key, the time from the epoch the
> object was created.

What about a mixed approach: It seems that you have, on the one hand,
nice manifolds that you know by name (S^n, T^n, RR^n, L(p,q), ...). You
want that manifolds that you know by name are identical if and only if
they have the same name. But you also want to be able to build more
complicated manifolds, where you would soon run into non-decidable
homeomorphism tests.

My suggestion would be:
- Start with CachedRepresentation. In that way, manifolds defined with input will
be identical. This would also cover pickling.
- If the manifold has a name or a normal form, then store it as an attribute
(if I recall correctly, CachedRepresentation stores the input data anyway, so,
you could just use what CachedRepresentation gives you).
- Implement a comparison as follows:
* If L and R are named and have the same name, then they are equal.
* If L and R have a known algorithm for computing a normal form, then
compute the normal form and store the result as an attribute (or
just read the attribute if it exists) and do comparison accordingly.
* Otherwise, it depends what you want. You could just stop and raise
an error, saying that you can't decide equality. Or you could start
a heuristics (such as a simulated annealing algorithm on Pachner
moves for PL-manifolds). Or you could attempt a deterministic
algorithm that may take ages.
In the last two cases, it would be good to print a warning to the
user, at least if verbosity is greater than 0.
- Or IF you really want to use them as dictionary keys, then just let
manifolds compare by identity (so, you allow that two homeomorphic
manifolds evaluate as non-equal), and have a separate method
M.is_homeomorphic(N)
that does the real thing (but is only invoked if the user asks for
it).

In that way, you would have super-fast equality-by-identity in the
obvious cases, but would still have a framework for the difficult cases,
where you postpone all unsolvable problems until the user wants them
solved.

Best regards,
Simon


Simon King

unread,
Nov 5, 2015, 4:35:07 AM11/5/15
to sage-...@googlegroups.com
Hi Nils,

On 2015-11-05, Nils Bruin <nbr...@sfu.ca> wrote:
> In general, I don't think there is a real problem if
>
> A = manifold(<construction parameter set 1)
> B = manifold(<construction parameter set 1)
>
> has (A is B) equal to False. I would think it's not really a problem if it
> has A == B equal to False.

+1

> Indeed, this would mean that manifolds stored in separate pickles would be
> non-equal. However, pickle is smart enough to ensure that
>
> loads(dumps([A,A]))
>
> would again be a sequence of two identical manifolds (they might just be
> distinct from the A you had originally).

And if you use CachedRepresentation, then loads(dumps(A)) would be
identical with A.

Best regards,
Simon


Eric Gourgoulhon

unread,
Nov 5, 2015, 4:58:23 AM11/5/15
to sage-devel
Hi Simon,

Thanks for your feedback!

Le jeudi 5 novembre 2015 10:17:35 UTC+1, Simon King a écrit :
Hi Eric,


Are you serious? Certainly you are aware that there is no algorithm that
would be able to test whether two arbitrary compact 4-manifolds are
homeomorphic. And if I am not mistaken, the Rubinstein-Thompson
algorithm for the recognition of the 3-sphere is still state of the art,
but it involves solving a double-exponential number (in terms of number of
tetrahedra of a one-vertex triangulation) of NP-complete problems.


Yes I know this. We cannot decide the mathematical identity of two
manifolds. What I was speaking about was the definition of the __eq__ method
for the objects representing manifolds in Sage.
 
In such a situation, I believe a UniqueRepresentation does not make
sense. You may have different input data yielding homeomorphic
manifolds, and it is not feasible to solve a computationally super-hard
problem upon creation of each manifold (basically comparing the new
to-be-created manifold with all previously created manifolds, unless you
come up with a normal form for manifolds).


This is precisely why in the current setting the string representing the
manifold's name is part of the manifold's identity, thanks to the
UniqueRepresentation mechanism. The idea is that the user does not give
the same name to two different manifolds. In this way, manifolds are
identified by their names and for this UniqueRepresentation seemed a
good solution.

 
What you COULD use is CachedRepresentation. Then, identical input data
would result in identical output. But still, non-identical input data
would result in two manifolds that may or may not be homeomorphic.

And using manifolds as dictionary keys seems to me like a very bad idea,
given the above-mentioned problems.

If you think about the decomposition of manifolds into parallelizable parts and
the storage of the restrictions of vector fields to these parts, then dictionary with
manifolds as keys are quite natural.

Best regards,

Eric.

Eric Gourgoulhon

unread,
Nov 5, 2015, 5:08:14 AM11/5/15
to sage-devel


Le jeudi 5 novembre 2015 10:32:29 UTC+1, Simon King a écrit :

- Or IF you really want to use them as dictionary keys, then just let
  manifolds compare by identity (so, you allow that two homeomorphic
  manifolds evaluate as non-equal), and have a separate method
     M.is_homeomorphic(N)
  that does the real thing (but is only invoked if the user asks for
  it).

In that way, you would have super-fast equality-by-identity in the
obvious cases, but would still have a framework for the difficult cases,
where you postpone all unsolvable problems until the user wants them
solved.


Yes this is precisely what we had in mind: to delegate the mathematical
comparison of manifolds to a specific method is_homeomorphic, which
in most case will answer "I cannot answer" (cf. comments 39 and 40 in
the ticket).
Then if we have both CachedRepresentation and equality-by-identity, then
we have UniqueRepresentation, don't we?


Simon King

unread,
Nov 5, 2015, 5:23:57 AM11/5/15
to sage-...@googlegroups.com
Hi Eric,

On 2015-11-05, Eric Gourgoulhon <egourg...@gmail.com> wrote:
> This is precisely why in the current setting the string representing the
> manifold's name is part of the manifold's identity, thanks to the=20
> UniqueRepresentation mechanism.

No, this is thanks to the CachedRepresentation mechanism only!
UniqueRepresentation = CachedRepresentation + EqualityById.

Best regards,
Simon

Eric Gourgoulhon

unread,
Nov 5, 2015, 5:28:14 AM11/5/15
to sage-devel


Le jeudi 5 novembre 2015 01:57:27 UTC+1, Nils Bruin a écrit :
On Wednesday, November 4, 2015 at 2:10:51 PM UTC-8, Eric Gourgoulhon wrote:
Hi sage-dev,

There is an ongoing discussion at #18529 about defining __eq__ for manifolds and whether one should
use UniqueRepresentation (which implements equality by id). If anybody has any opinion on this,
she/he is welcome to join the discussion.

Avoid UniqueRepresentation if you can. It requires expensive processing of construction parameters

A priori, in a typical working session, not so many manifolds should be constructed, so this might not be a problem
 
and hence introduces bad overhead and it introduces "global variables" in a way that is much worse than global variables.


Ah? OK.
 
The main reason why UniqueRepresentation is necessary in some cases is because the coercion framework requires it to figure out certain things efficiently. However, this mainly comes in for coercion discovery. I would not expect manifolds to act in any non-trivial way in coercions, so I'm pretty sure this is not forcing you to have it.
 
Thanks for these explanations.


In general, I don't think there is a real problem if

A = manifold(<construction parameter set 1)
B = manifold(<construction parameter set 1)

has (A is B) equal to False. I would think it's not really a problem if it has A == B equal to False.

Indeed, this would mean that manifolds stored in separate pickles would be non-equal. However, pickle is smart enough to ensure that

loads(dumps([A,A]))

would again be a sequence of two identical manifolds (they might just be distinct from the A you had originally).

But, as I said in comment 46 on the ticket, there will be a problem with the _test_pickling method, which is checking
loads(dumps(A)) == A. Shall we redefine _test_pickling, which is inherited form SageObject?
 

Simon King

unread,
Nov 5, 2015, 5:30:08 AM11/5/15
to sage-...@googlegroups.com
Hi Eric,

On 2015-11-05, Eric Gourgoulhon <egourg...@gmail.com> wrote:
> Le jeudi 5 novembre 2015 10:32:29 UTC+1, Simon King a =C3=A9crit :
> Yes this is precisely what we had in mind: to delegate the mathematical
> comparison of manifolds to a specific method is_homeomorphic, which
> in most case will answer "I cannot answer" (cf. comments 39 and 40 in=20
> the ticket).=20
> Then if we have both CachedRepresentation and equality-by-identity, then
> we have UniqueRepresentation, don't we?

*Will* you have equality-by-identity?

Certainly, if "A is B" then "A==B" will be True. Actually I am not sure
if A.__eq__ would be invoked at all if you compare to identical objects
(i.e., I don't know what Python does).

But if A is not B and ask "A==B", then you could say that you will do SOME
effort to test for equality in the sense of homeomorphism. I don't know
if you want to do some heuristics here.

Anyway, if you think that equality-by-identity is the right way of
comparison in your applications, then UniqueRepresentation might be the
correct tool.

Best regards,
Simon

Jeroen Demeyer

unread,
Nov 5, 2015, 6:08:15 AM11/5/15
to sage-...@googlegroups.com
On 2015-11-05 01:26, Travis Scrimshaw wrote:
> So we
> add an additional unique identifier to the cache key, the time from the
> epoch the object was created.

That's an extremely ugly hack. If you absolutely must use a unique
identifier, generate a sufficiently large random number.

Travis Scrimshaw

unread,
Nov 5, 2015, 10:22:10 AM11/5/15
to sage-devel
Hey Simon,

   I would then be advocating for using UniqueRepresentation if that was the only issue. However the problem I see is this (even for using CachedRepresentation). Suppose you created a 2-sphere called 'S', and then you created a manifold called 'M' and worked with that for a while, took a lunch break, came back and did some more work. Then you wanted to create a genus 3 surface called 'S' in the same session. All of a sudden, to your surprise, you have charts that tell you 'S' is a sphere (because you forgot you had one previously defined). This also means we have to have `_clear_cache_()` calls all throughout the doctests.

   So I'm somewhat inclined to use EqualityById and let the pickling be broken (and redefining _test_pickling to do nothing (i.e., passes) with a docstring warning that it is intentionally broken). I'm starting to become really convinced that this is the way to go.

Best,
Travis

Travis Scrimshaw

unread,
Nov 5, 2015, 10:27:28 AM11/5/15
to sage-devel

I agree, it is an ugly hack, but it was one possible solution that I had that would keep pickling and everything else we wanted. I'm guessing a random large number is better because it won't stop working in 2038?

Best,
Travis

Jeroen Demeyer

unread,
Nov 5, 2015, 11:08:11 AM11/5/15
to sage-...@googlegroups.com
On 2015-11-05 16:27, Travis Scrimshaw wrote:
> I'm guessing a
> random large number is better because it won't stop working in 2038?

Mostly, it's better because there is a much smaller chance of
collisions: there is a hidden assumption that there will not be 2
manifolds created with the same timestamp. The chance of this assumption
being true also depends on the accuracy of the Python time function and
the underlying system call, which are big unknowns.

Jeroen.

Nils Bruin

unread,
Nov 5, 2015, 12:54:40 PM11/5/15
to sage-devel
On Thursday, November 5, 2015 at 7:22:10 AM UTC-8, Travis Scrimshaw wrote:
   So I'm somewhat inclined to use EqualityById and let the pickling be broken (and redefining _test_pickling to do nothing (i.e., passes) with a docstring warning that it is intentionally broken). I'm starting to become really convinced that this is the way to go.

I completely agree. There's really no problem if calling "TwoSphere()" two times yields non-equal two-spheres: You've just created two manifolds that are homeomorphic, but since they are different manifolds, there is not a canonical homeomorphism between them, so treating them as distinct objects is quite justified. If you want to keep talking about *the same* manifold.

The fact that in sage QQ['x','y'] and QQ['x','y'] gives back equal polynomial rings is because the correspondence in naming of the generators suggests a distinguished isomorphism between the two polynomial rings (well, that's a mathematical justification. It's mainly convenient). Sage goes a step further by ensuring that the rings are *identical*, which has turned out to have pros and cons.

Clearly, for a manifold there is not a small set of parameters given at construction that, when in agreement, suggest a distinguished isomorphism. So I think the basis for any "CachedRepresentation" is rather weak (and will lead to code the behaviour of which is hard to predict).

Incidentally, I guess your "charts" need to be immutable if you use then as keys, but do your manifolds have to be?

EqualityById objects have relatively little use for being hashable. Certainly as dictionary keys they are hardly useful:

D[eq_by_id_obj] could always be spelled as eq_by_id_obj.D , since you have to have the exact object in hand to look it up in a dictionary anyway.

If you allow for atlasses to be refined and updates, it would be more natural to leave manifolds mutable and hence unhashable (because clearly you're mutating them). By making manifolds hashable you're entering into a contract that doesn't sound so natural (although EqualityById guarantees you'll keep your contract).

Eric Gourgoulhon

unread,
Nov 5, 2015, 1:42:47 PM11/5/15
to sage-devel
For the benefit of the discussion, here is the list of dictionaries involved in the manifold tickets colllected at #18528. Each dictionary is presented in the form

- values | key: the key

- chart restrictions | key: open subset (i.e. manifold)
- transition maps | key: pair of charts
- point coordinates | key: chart
- intersections of a given subset with other subsets | key: string (subset name)
- unions of a given subset with other subsets | key: string (subset name)
- coordinate expressions of a map | key: pair of charts
- map restrictions | key: open subset (i.e. manifold)
- coordinate expressions of a scalar field | key: chart
- scalar field restrictions | key: open subset (i.e. manifold)

For differentiable manifolds, we have in addition

- changes of vector frames | key: pair of vector frames
- modules of vector fields along the manifold | key: differentiable map
- coordinate expressions of the differential of a diff. map | key: pair of charts
- cache of the Lie derivatives of a scalar field | key: id(vector field)
- bases of a tangent space induced by vector frames | key: vector frame
- tensor field restrictions | key: open subset (i.e. manifold)
- components of a tensor field on a parallelizable manifold | key: vector frame
- cache of the Lie derivatives of tensor fields | key: id(tensor field)
- vector frame restrictions | key: open subset (i.e. manifold)
- affine connection coefficients | key: vector frame
- affine connection restrictions | key: open subset (i.e. manifold)
- affine connection 1-forms | key: vector frame
- affine connection torsion 2-forms | key: vector frame
- affine connection curvature 2-forms | key: vector frame
- determinants of a pseudo-Riemannian metric | key: vector frame
- square roots of the absolute values of the determinants of a pseudo-Riemannian metric | key: vector frame

So we see that the objects used as keys are charts, vector frames and manifolds.

Nils Bruin

unread,
Nov 5, 2015, 2:06:09 PM11/5/15
to sage-devel
On Thursday, November 5, 2015 at 10:42:47 AM UTC-8, Eric Gourgoulhon wrote:

So we see that the objects used as keys are charts, vector frames and manifolds.
 
I think that's a very strong reason to go with equality-by-id and *not* unique representation: You really don't want to risk that for such structural information you end up with a key that is accidentally equal to another one.


Eric Gourgoulhon

unread,
Nov 5, 2015, 5:35:57 PM11/5/15
to sage-devel
Thanks for your suggestions.


Le jeudi 5 novembre 2015 18:54:40 UTC+1, Nils Bruin a écrit :
On Thursday, November 5, 2015 at 7:22:10 AM UTC-8, Travis Scrimshaw wrote:
   So I'm somewhat inclined to use EqualityById and let the pickling be broken (and redefining _test_pickling to do nothing (i.e., passes) with a docstring warning that it is intentionally broken). I'm starting to become really convinced that this is the way to go.

I completely agree. There's really no problem if calling "TwoSphere()" two times yields non-equal two-spheres: You've just created two manifolds that are homeomorphic, but since they are different manifolds, there is not a canonical homeomorphism between them, so treating them as distinct objects is quite justified. If you want to keep talking about *the same* manifold.

I agree.
 

The fact that in sage QQ['x','y'] and QQ['x','y'] gives back equal polynomial rings is because the correspondence in naming of the generators suggests a distinguished isomorphism between the two polynomial rings (well, that's a mathematical justification. It's mainly convenient). Sage goes a step further by ensuring that the rings are *identical*, which has turned out to have pros and cons.

Clearly, for a manifold there is not a small set of parameters given at construction that, when in agreement, suggest a distinguished isomorphism.

Indeed.
 
So I think the basis for any "CachedRepresentation" is rather weak (and will lead to code the behaviour of which is hard to predict).

Incidentally, I guess your "charts" need to be immutable if you use then as keys,

Yes charts are immutable: a chart is defined by two parameters, which are passed to the constructor: its domain (a manifold) and its coordinates (a list of symbolic variables). These two parameters do not change during the object life. I think that using UniqueRepresentation for charts is then well justified. Same thing for vector frames.
 
but do your manifolds have to be?

EqualityById objects have relatively little use for being hashable. Certainly as dictionary keys they are hardly useful:

D[eq_by_id_obj] could always be spelled as eq_by_id_obj.D , since you have to have the exact object in hand to look it up in a dictionary anyway.

I am afraid I don't quite understand the above. Let me take an example: the addition of two vector fields, a and b say, on the 2-sphere S^2. To decompose S^2 into parallelizable parts (on which the addition can be performed by adding the vector components in some frame), let us introduce the open cover of S^2 formed by (i) the open subset U, domain of the stereographic chart from the North pole, and (ii) the open subset V, domain of the stereographic chart from the South pole. The addition s = a + b is then performed as follows:

s._restrictions[U] = a._restrictions[U] + b._restrictions[U]                     
s._restrictions[V] = a._restrictions[V] + b._restrictions[V]

Indeed, each vector field on S^2 is represented by its restriction to U and its restriction to V. On U and V, which are endowed with global vector frames (the frames associated with stereographic coordinates), the vector fields are represented by their components w.r.t. these vector frames, so that each of the two above additions is performed by adding the components.

The above example illustrates the use of the dictionary VectorField._restrictions with the manifolds U and V as keys.

Considering your example (taking into account that U is an eq-by-id object), how could we spell a._restrictions[U] as something like U.(a_restrictions) ?


If you allow for atlasses to be refined and updates, it would be more natural to leave manifolds mutable and hence unhashable (because clearly you're mutating them).

Technically speaking, yes, we are mutating them, by adding charts and vector frames. But if we make them eq-by-id, we guarantee that they don't change: they are recognized only by their id, which is constant during all their lives. So why not using them as dictionary keys as in the example above?

By making manifolds hashable you're entering into a contract that doesn't sound so natural (although EqualityById guarantees you'll keep your contract).

Yes we keep the contract. If this does not sound so natural, which alternative would you propose to deal with the restrictions of a vector field to parallelizable open subsets?
 

Nils Bruin

unread,
Nov 5, 2015, 6:16:26 PM11/5/15
to sage-devel
On Thursday, November 5, 2015 at 2:35:57 PM UTC-8, Eric Gourgoulhon wrote:
Yes charts are immutable: a chart is defined by two parameters, which are passed to the constructor: its domain (a manifold) and its coordinates (a list of symbolic variables). These two parameters do not change during the object life. I think that using UniqueRepresentation for charts is then well justified. Same thing for vector frames. 

It indicates they could be if they needed to be. However, making things UniqueRepresentation or CachedRepresentation has a very real cost and makes the job of the garbage collector *much* harder. You can get very subtle memory leaks via weak references once global caches are involved.

So what specifically do you *gain* from making them UniqueRepresentation? Is it ever the case that two charts are constructed independently from each other (with the same construction parameters!) and that it is important that the resulting charts are actually identical? Given that one of the construction parameters is a manifold, which is going to be EqualityById anyway, I expect this will not happen. It'll be much more straightforward to look the chart up on the manifold (where it'll be stored anyway, right?) if you need to retrieve it.
 
EqualityById objects have relatively little use for being hashable. Certainly as dictionary keys they are hardly useful:

D[eq_by_id_obj] could always be spelled as eq_by_id_obj.D , since you have to have the exact object in hand to look it up in a dictionary anyway.

I am afraid I don't quite understand the above.

That's just a general remark that might sometimes be useful to consider. There might be performance/logic considerations that might lead you to prefer using such an object as a key anyway.
 

Considering your example (taking into account that U is an eq-by-id object), how could we spell a._restrictions[U] as something like U.(a_restrictions) ?

Technically, by having a dict or a list on U that, for each vector field in existence, has the restriction to a on it. So you could write it as

U.restrictions_of_vector_fields[s] = U.restrictions_of_vector_fields[a] + V.restrictions_of_vector_fields[b]

I won't claim that doing so is a good idea (in fact, it looks like a horrible idea). However, there are other scenarios where this transposition of index and indexee can be fruitful to consider.

Technically speaking, yes, we are mutating them, by adding charts and vector frames. But if we make them eq-by-id, we guarantee that they don't change: they are recognized only by their id, which is constant during all their lives. So why not using them as dictionary keys as in the example above?

By making manifolds hashable you're entering into a contract that doesn't sound so natural (although EqualityById guarantees you'll keep your contract).

Yes we keep the contract. If this does not sound so natural, which alternative would you propose to deal with the restrictions of a vector field to parallelizable open subsets?

At least one of the alternatives looks horrible, so I would not propose any alternative. It does force your hand on EqById, basically, because otherwise it is nontrivial to keep the contract.

Nils Bruin

unread,
Nov 5, 2015, 6:21:56 PM11/5/15
to sage-devel
On Thursday, November 5, 2015 at 2:30:08 AM UTC-8, Simon King wrote:
 
Certainly, if "A is B" then "A==B" will be True. Actually I am not sure
if A.__eq__ would be invoked at all if you compare to identical objects
(i.e., I don't know what Python does).

This is famous in Python:

sage: NAN = float("NaN")
sage: NAN == NAN
False

Dictionary lookup shortcuts equality testing  on identical keys, leading to the surprising:

sage: D={NAN: 1}
sage: D[NAN]
1
sage: D[float("NaN")]
KeyError: nan
sage: { NAN, NAN, float("NaN")}
{nan, nan}

(i.e., even for a key that is not equal to itself, it is still possible to retrieve the value under that key using *exactly* that key)

Eric Gourgoulhon

unread,
Nov 6, 2015, 5:58:24 AM11/6/15
to sage-devel
Hi,

Le vendredi 6 novembre 2015 00:16:26 UTC+1, Nils Bruin a écrit :
On Thursday, November 5, 2015 at 2:35:57 PM UTC-8, Eric Gourgoulhon wrote:
Yes charts are immutable: a chart is defined by two parameters, which are passed to the constructor: its domain (a manifold) and its coordinates (a list of symbolic variables). These two parameters do not change during the object life. I think that using UniqueRepresentation for charts is then well justified. Same thing for vector frames. 

It indicates they could be if they needed to be. However, making things UniqueRepresentation or CachedRepresentation has a very real cost and makes the job of the garbage collector *much* harder. You can get very subtle memory leaks via weak references once global caches are involved.

So what specifically do you *gain* from making them UniqueRepresentation?

From what you say, not much I presume... Actually, making charts UniqueRepresentation was a quick way to ensure both equality by id and passed _test_pickling. I understand that we should make them only EqualityById (and redefine _test_pickling). As you see from the above list of dictionaries, it is important to have fast __eq__ for charts and EqualityById is offering this.

However, your example with NAN in the reply to Simon shows that dictionary lookup shortcuts equality testing  on identical keys. If I understand correctly, this means that even if we had a slow __eq__ for charts, as soon as we write
point._coordinates[X] with the chart X that is the same object as the one used to create the entry in the dictionary point._coordinates, then we would have fast access to the coordinates? Could this depend on the Python version/implementation? (i.e. can we rely on this for the future?). If yes, we may even abandon EqualityById for charts.
 
Is it ever the case that two charts are constructed independently from each other (with the same construction parameters!) and that it is important that the resulting charts are actually identical?

No, you are right.

Given that one of the construction parameters is a manifold, which is going to be EqualityById anyway, I expect this will not happen. It'll be much more straightforward to look the chart up on the manifold (where it'll be stored anyway, right?

Yes
 
) if you need to retrieve it.
 

I won't claim that doing so is a good idea (in fact, it looks like a horrible idea). However, there are other scenarios where this transposition of index and indexee can be fruitful to consider.


Thanks for these explanations.

Best wishes,
Eric. 

mmarco

unread,
Nov 6, 2015, 6:51:40 AM11/6/15
to sage-devel
But the random number generator suffers from the same issues: how random it really is deppends on the platform. I would suggest joining both ideas: random number plus time stamp.

Simon King

unread,
Nov 6, 2015, 7:39:16 AM11/6/15
to sage-...@googlegroups.com
Hi Travis,

On 2015-11-05, Travis Scrimshaw <tsc...@ucdavis.edu> wrote:
> I would then be advocating for using UniqueRepresentation if that was
> the only issue.

It really depends whether in comparing manifolds you would prefer to do *some*
heuristics to detect homeomorphic manifolds, or prefer to consider
manifolds unequal if they were created with different input data.

> However the problem I see is this (even for using
> CachedRepresentation). Suppose you created a 2-sphere called 'S', and then
> you created a manifold called 'M' and worked with that for a while, took a
> lunch break, came back and did some more work. Then you wanted to create a
> genus 3 surface called 'S' in the same session. All of a sudden, to your
> surprise, you have charts that tell you 'S' is a sphere (because you forgot
> you had one previously defined). This also means we have to have
> `_clear_cache_()` calls all throughout the doctests.

Pardon? The variable name of a manifold will certainly not be part of the input
data.

What I rather mean is this:
The three-dimensional sphere can be constructed, for example,
- as the boundary of the four-dimensional unit ball,
- by gluing two 3-balls together,
- by a Heegaard diagram (infinitely many possibilities)

1. Do you want that all possible ways to construct a 3-sphere yield the same
object? Not feasible.

2. Do you want that distinct constructions yield distinct (but potentially
homeomorphic) objects, whereas two equal Heegaard diagrams will result in
the same object? Feasible.
Then, do you want that comparison of manifolds does some heuristic of
solving the homeomorphism problem?
a) If you do, then use CachedRepresentation
b) If you don't, use UniqueRepresentation

3. Assume that you construct a three-manifold out of a Heegaard diagram,
and then do the same construction again. Would you accept that the
result of the two constructions are two distinct objects? Then don't
use CachedRepresentation. Do you, in addition, accept that the two
distinct objects evaluate unequal, even though they obviously are
equal? Then you can use EqualityById.

Best regards,
Simon

Travis Scrimshaw

unread,
Nov 6, 2015, 10:21:44 AM11/6/15
to sage-devel
Hey Simon,


On 2015-11-05, Travis Scrimshaw <tsc...@ucdavis.edu> wrote:
>    I would then be advocating for using UniqueRepresentation if that was
> the only issue.

It really depends whether in comparing manifolds you would prefer to do *some*
heuristics to detect homeomorphic manifolds, or prefer to consider
manifolds unequal if they were created with different input data.

IMO, such checks are too expensive to be done at creation. For example, you want do so some tests over all n-spheres.
   Granted, there are nice and well-defined inputs for 2d and 3d manifolds, which would make good sense to do Cached/UniqueRepresentation around these structures. For a general (topological) manifold where there are not necessarily such nice inputs, what would you give as input data?

Side note, I think it would be good to have a class for Heegaard diagrams and the associated 3 manifolds if there was a feasible way of working with these on a computer.

Best,
Travis


Nils Bruin

unread,
Nov 6, 2015, 11:32:49 AM11/6/15
to sage-devel
On Friday, November 6, 2015 at 2:58:24 AM UTC-8, Eric Gourgoulhon wrote:

However, your example with NAN in the reply to Simon shows that dictionary lookup shortcuts equality testing  on identical keys. If I understand correctly, this means that even if we had a slow __eq__ for charts, as soon as we write
point._coordinates[X] with the chart X that is the same object as the one used to create the entry in the dictionary point._coordinates, then we would have fast access to the coordinates? Could this depend on the Python version/implementation? (i.e. can we rely on this for the future?). If yes, we may even abandon EqualityById for charts.
 
I'm pretty sure that this is an implementation detail of CPython. Also, the dict in CPython is designed to do very few equality tests if you look up a key with a hash value that only occurs once or not at all for the keys present in the dictionary, but the number is not zero.

Furthermore, if X and Y happen to be non-identical but "equal" charts, would you want points._coordinates[X] be identical to points_coordinates[Y] ? Would that ever be useful? EqualityById is easy and cheap to have. It basically restores the default equality- and hashing properties of Python's objects. I don't think it's something you should particularly avoid having.

Eric Gourgoulhon

unread,
Nov 6, 2015, 2:46:18 PM11/6/15
to sage-devel

Le vendredi 6 novembre 2015 17:32:49 UTC+1, Nils Bruin a écrit :

Furthermore, if X and Y happen to be non-identical but "equal" charts, would you want points._coordinates[X] be identical to points_coordinates[Y] ? Would that ever be useful?

Not much: in the current setting, we should not have two copies of the same mathematical chart on the same manifold, i.e. if we have X == Y this is because X is Y.
 
EqualityById is easy and cheap to have. It basically restores the default equality- and hashing properties of Python's objects. I don't think it's something you should particularly avoid having.

OK. Let us use EqualityById for charts then!
Thanks.

Eric.

Simon King

unread,
Nov 7, 2015, 9:16:38 AM11/7/15
to sage-...@googlegroups.com
Hi Travis,

On 2015-11-06, Travis Scrimshaw <tsc...@ucdavis.edu> wrote:
>> On 2015-11-05, Travis Scrimshaw <tsc...@ucdavis.edu <javascript:>> wrote:
>> > I would then be advocating for using UniqueRepresentation if that was
>> > the only issue.
>>
>> It really depends whether in comparing manifolds you would prefer to do
>> *some*
>> heuristics to detect homeomorphic manifolds, or prefer to consider
>> manifolds unequal if they were created with different input data.
>>
>
> IMO, such checks are too expensive to be done at creation. For example, you
> want do so some tests over all n-spheres.

I did not say anything about whether the tests should be done at
creation time or not. I was talking about "comparing manifolds", which
obviously means "comparing manifolds AFTER creation"-

If you want that two distinct homeomorphic manifolds may evaluate equal then
you must not use UniqueRepresentation. That's all.

> Granted, there are nice and well-defined inputs for 2d and 3d
> manifolds, which would make good sense to do Cached/UniqueRepresentation
> around these structures. For a general (topological) manifold where there
> are not necessarily such nice inputs, what would you give as input data?

Whatever you currently give. An atlas. A simplicial complex. A system of
multivariate polynomial equations.

Here is an example in terms of "polynomial equations" as input data.

The first question is to you whether you want that the same set of equations
yields identical manifolds or distinct manifolds. The second question is whether
you want that the manifolds are identical if and only if the two systems of
equations have the same solutions. It depends on your answers whether to
use CachedRepresentation, UniqueRepresentation or nothing.

Best regards,
Simon

Travis Scrimshaw

unread,
Nov 7, 2015, 6:28:36 PM11/7/15
to sage-devel
Hey Simon,


>> >    I would then be advocating for using UniqueRepresentation if that was
>> > the only issue.
>>
>> It really depends whether in comparing manifolds you would prefer to do
>> *some*
>> heuristics to detect homeomorphic manifolds, or prefer to consider
>> manifolds unequal if they were created with different input data.
>>
>
> IMO, such checks are too expensive to be done at creation. For example, you
> want do so some tests over all n-spheres.

I did not say anything about whether the tests should be done at
creation time or not. I was talking about "comparing manifolds", which
obviously means "comparing manifolds AFTER creation"-

Ah, I'm sorry, I misunderstood what you were asking.
 
If you want that two distinct homeomorphic manifolds may evaluate equal then
you must not use UniqueRepresentation. That's all.

IMO, homeomorphism is too strong of a test for __eq__, which should be kept relatively fast.

>    Granted, there are nice and well-defined inputs for 2d and 3d
> manifolds, which would make good sense to do Cached/UniqueRepresentation
> around these structures. For a general (topological) manifold where there
> are not necessarily such nice inputs, what would you give as input data?

Whatever you currently give. An atlas. A simplicial complex. A system of
multivariate polynomial equations.

Here is an example in terms of "polynomial equations" as input data.

The first question is to you whether you want that the same set of equations
yields identical manifolds or distinct manifolds. The second question is whether
you want that the manifolds are identical if and only if the two systems of
equations have the same solutions. It depends on your answers whether to
use CachedRepresentation, UniqueRepresentation or nothing.

   I would probably take UniqueRepresentation and have a method is_homeomorphic() as the equality check would probably be too computationally expensive. I'm thinking what graphs do with a faster equality check an a separate is_isomorphic method.

Best,
Travis


David Roe

unread,
Nov 8, 2015, 1:32:48 AM11/8/15
to sage-devel
I think Nils' point about the downsides of having a cache is one reason not to use UniqueRepresentation, both because it can generate references to objects you might want to forget, and because sometimes you just want to start with two three dimensional manifolds, specifying distinct atlases as you go.  If you use unique representation, you might accidentally get the same object.

 Why not just inherit from :class:`~sage.misc.fast_methods.WithEqualityById` ?  That seems to have what you need.  Yes, the standard pickling test will be broken, but I think that's fine for your application.
David

Best,
Travis
 


--
You received this message because you are subscribed to the Google Groups "sage-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+...@googlegroups.com.
To post to this group, send email to sage-...@googlegroups.com.
Visit this group at http://groups.google.com/group/sage-devel.
For more options, visit https://groups.google.com/d/optout.

Eric Gourgoulhon

unread,
Nov 8, 2015, 4:00:34 AM11/8/15
to sage-devel, roed...@gmail.com
Hi,


Le dimanche 8 novembre 2015 07:32:48 UTC+1, David Roe a écrit :


I think Nils' point about the downsides of having a cache is one reason not to use UniqueRepresentation, both because it can generate references to objects you might want to forget, and because sometimes you just want to start with two three dimensional manifolds, specifying distinct atlases as you go.  If you use unique representation, you might accidentally get the same object.

 Why not just inherit from :class:`~sage.misc.fast_methods.WithEqualityById` ?  That seems to have what you need.  Yes, the standard pickling test will be broken, but I think that's fine for your application.
David

Yes, from the above discussions, I am convinced that this is the route to go: have manifolds and charts inherit from WithEqualityById, possibly adding a method is_isomorphic(), which could return True in a few easy cases, keeping in mind that most cases cannot be solved.

Best wishes,

Eric.
 

Eric Gourgoulhon

unread,
Nov 9, 2015, 6:07:31 AM11/9/15
to sage-devel, roed...@gmail.com
Thank you all for the discussion and recommendations ! (it's nice to develop within such a responsive community)

I've updated the ticket #18529 accordingly, removing UniqueRepresentation for manifolds and charts and letting them inherit directly from WithEqualityById. See comment 52 on the ticket for details.

I've also updated the follow-up tickets along the same lines:

- #18640 : the algebra of scalar fields on a topological manifold does no longer inherit from UniqueRepresentation, but from WithEqualityById
- #18725 : same thing for the manifold homsets

Best wishes,

Eric.
Reply all
Reply to author
Forward
0 new messages