Caching considered controversial

204 views
Skip to first unread message

Nils Bruin

unread,
Sep 13, 2012, 12:19:55 PM9/13/12
to sage-devel
There seems to be a big push to use caching in a lot of places in
sage. There, a problem arises from the fact that sage's conventions
for equality tests are rather liberal.

Obviously, one should only cache functions that solely depend on the
equality class of their argument. That's probably less often true than
you think it is, due to sage's liberal approach to defining when
things are equal:

sage: a=3
sage: b=GF(5)(3)
sage: @cached_function
....: def test(a):
....: return a^2
....:
sage: [test(a),test(b)]
[9, 9]
sage: test.clear_cache()
sage: [test(b),test(a)]
[4, 4]

For imprecise objects this is particularly tricky because equality is
usually only up to the lower common precision:

sage: R.<t>=QQ[[]]
sage: A=matrix(R,2,2,[1+O(t),1+O(t),1+O(t),1+t+O(t)])
sage: A.set_immutable()
sage: B=matrix(R,2,2,[1+O(t^2),1+O(t^2),1+O(t^2),1+t+O(t^2)])
sage: B.set_immutable()
sage: A==B
True

The fact that this seems to work:

sage: @cached_function
....: def DET(M):
....: return det(M)
....:
sage: DET(A)
O(t^1)
sage: DET(B)
t + O(t^2)

is only because hashes for these objects do not respect equality:

sage: hash(A)
0
sage: hash(B)
44546084519852655

However, as soon as the cache of DICT gets resized so that those two
hashes are mapped to the same bucket, it'll be a toss-up which result
DET will give, for both of them.

Simon King

unread,
Sep 13, 2012, 12:45:33 PM9/13/12
to sage-...@googlegroups.com
Hi Nils,

On 2012-09-13, Nils Bruin <nbr...@sfu.ca> wrote:
> There seems to be a big push to use caching in a lot of places in
> sage. There, a problem arises from the fact that sage's conventions
> for equality tests are rather liberal.

This of course isn't an argument against caching in general. Actually,
from the example you gave with data that evaluate equal but have unequal
hash, I'd rather conclude that such data must not be hashable.

However, using GF(2)(1) and GF(3)(1) together with ZZ(1) as keys for the
cache is a bad idea, because equality is not necessarily transitive.

I think, if one wants to create a unique parent, then either the __init__
method should raise an error if an integer is expected but a finite field
element is given, or one should pre-process the arguments. That is
possible by customising UniqueRepresentation.__classcall__, or by using
UniqueFactory. The pre-processing should turn the finite field elements
into proper integers.

As you probably know, the problem of "bad keys" was one of the reasons for
#715 and #12313: In coercion, it is a good idea to use dictionaries that
compare their keys by identity and not equality. The main reason was to fix
memory leaks, but that is a different story.

Best regards,
Simon

Volker Braun

unread,
Sep 13, 2012, 12:51:02 PM9/13/12
to sage-...@googlegroups.com
On Thursday, September 13, 2012 5:19:57 PM UTC+1, Nils Bruin wrote:
sage: A==B
True
sage: hash(A)
0
sage: hash(B)
44546084519852655 

Really, the issue is that the objects violate the contract on object.__hash__():

"The only required property is that objects which compare equal have the same hash value"


Nils Bruin

unread,
Sep 13, 2012, 6:06:48 PM9/13/12
to sage-devel
On Sep 13, 9:51 am, Volker Braun <vbraun.n...@gmail.com> wrote:
> On Thursday, September 13, 2012 5:19:57 PM UTC+1, Nils Bruin wrote:
> > sage: A==B
> > True
> > sage: hash(A)
> > 0
> > sage: hash(B)
> > 44546084519852655
>
> Really, the issue is that the objects violate the contract on
> object.__hash__():

Well, that is a problem, but not the main one I wanted to point out.
It's not so far-fetched to think that some-one might think it's a good
idea to put a caching determinant function on a matrix ring. After
all, people might well create non-identical but equal matrices and
then it would be great if determinant computations can be reused,
right? That idea would probably even work quite well (except for the
cases where someone wants to create lots of non-equal matrices and
compute their determinants. Then there's just a memory leak without
computational advantage), until someone instantiates the matrix ring
over, say, a power series ring. Apart from the hashing irregularity,
now people may well get wrong answers back! (the most dangerous being
a high precision matrix followed by a matrix with insufficient
precision, but equal to the previous one).

The more appropriate design (and the one followed at the moment, I
think) is that the determinant gets cached on the matrix element
itself (only if it's an immutable matrix, I hope?).

Caching can be of great benefit, but it needs to be applied carefully
because you are creating state where none was before. Essentially you
need to put a proof with it that this additional state will never
change outcomes. And as the example shows, that is a little hairier
that one might initially expect.

Identity-based caching as applied in the coercion framework is a
little easier to reason about (although, as we found, still quite
tricky to implement), but that's only meaningful in the presence of
guaranteed global uniqueness, which we establish by ordinary (weak)
caching.

Consider:

sage: K=Qp(5)
sage: P.<x>=K[]
sage: f=x^2-5
sage: g=x^2-O(5)
sage: f==g
True

sage: L=K.extension(f,names='a')
sage: M=K.extension(g,names='a')

Presently, the second one fails, as you'd hope. However, note that we
are trying to create M with parameters that are equal to the ones used
by L, so according to the "UniqueRepresentation" doctrine, we should
get `M is L`. We're "saved" here because f and g don't have the same
hash, but it's only an (admittedly rather likely) fluke that they
don't end up in the same hashing bucket.

Volker Braun

unread,
Sep 13, 2012, 7:48:40 PM9/13/12
to sage-...@googlegroups.com
On Thursday, September 13, 2012 11:06:50 PM UTC+1, Nils Bruin wrote:
> Really, the issue is that the objects violate the contract on
> object.__hash__():
Well, that is a problem, but not the main one I wanted to point out.

I claim that it is the only problem here. Really, objects that implement "fuzzy ==" must not implement __hash__(). This solves the caching issue, because you can't cache non-hashable arguments.

The UniqueRepresentation issue is slightly more tricky. If you follow the "no __hash__" route then you can't use them as arguments to objects with unique representation. We either demand that == always means identical (often not convenient at the command line) or introduce some convention that a "fuzzy ==" object must implement an is_identical(self, other) method. Then UniqueRepresentation could be extended to do the right thing (TM) with such arguments.


 

Nils Bruin

unread,
Sep 13, 2012, 10:19:51 PM9/13/12
to sage-devel
On Sep 13, 4:48 pm, Volker Braun <vbraun.n...@gmail.com> wrote:
> The UniqueRepresentation issue is slightly more tricky. If you follow the
> "no __hash__" route then you can't use them as arguments to objects with
> unique representation. We either demand that == always means identical
> (often not convenient at the command line) or introduce some convention
> that a "fuzzy ==" object must implement an is_identical(self, other)
> method. Then UniqueRepresentation could be extended to do the right thing
> (TM) with such arguments.

This last one might be manageable without demanding an is_identical
method. Usually, parent constructors have to normalize their arguments
before doing a cache lookup anyway. For "fuzzy" constructor data, the
normalization step could include extracting the "fuzzy" information
(in the given example, that would mean extracting the precisions as a
vector of integers or something like it). Of course, even that can be
defeated by hiding the fuzziness one level deeper:

Qp(5)[t][x]/(x^2-5)

versus

Qp(5)[t][x]/(x^2-O(5))

and that example would be very hard to handle properly with an
is_identical method too (*Everybody* would need one, because you need
to tell the system somehow that in recursive situations, the stronger
equality should be used).

So "no hash for fuzzies" might just be necessary, but that would break
a lot of code too.

Simon King

unread,
Sep 14, 2012, 1:17:15 AM9/14/12
to sage-...@googlegroups.com
On 2012-09-14, Nils Bruin <nbr...@sfu.ca> wrote:
> On Sep 13, 4:48 pm, Volker Braun <vbraun.n...@gmail.com> wrote:
>> The UniqueRepresentation issue is slightly more tricky. If you follow the
>> "no __hash__" route then you can't use them as arguments to objects with
>> unique representation.

+1, as I have already stated in my previous post.

> We either demand that == always means identical

No, that would be bad. I think having unique parents is a good idea
(though perhaps not always practical), but having unique *elements* (and
elements may easily be used as cache keys!) is not good.

>> (often not convenient at the command line) or introduce some convention
>> that a "fuzzy ==" object must implement an is_identical(self, other)
>> method.

What's wrong with Python's "is"?

>> Then UniqueRepresentation could be extended to do the right thing
>> (TM) with such arguments.
>
> This last one might be manageable without demanding an is_identical
> method. Usually, parent constructors have to normalize their arguments
> before doing a cache lookup anyway.

If you use UniqueRepresentation, then caching happens before the
__init__ method of the parent is even called. So, it would not help to
normalize the arguments in __init__.

It would be possible to override the __classcall__ method by a method
that first normalises the arguments and then calls
UniqueRepresentation.__classcall__, though.

A more radical idea: We could modify the @cached_function and
@cached_method decorators, so that the arguments are normalised *before*
looking up the cache and *before* passing the arguments to the wrapped
function.

I am not sure if that would be doable, though.

Cheers,
Simon


Simon King

unread,
Sep 14, 2012, 1:25:35 AM9/14/12
to sage-...@googlegroups.com
Hi Nils,

On 2012-09-13, Nils Bruin <nbr...@sfu.ca> wrote:
> Identity-based caching as applied in the coercion framework is a
> little easier to reason about (although, as we found, still quite
> tricky to implement), but that's only meaningful in the presence of
> guaranteed global uniqueness

No, that's not true. Actually, the opposite is true.

Namely, I already met situations where a coercion from A to B was
sought, but a coercion from A' to B' with A'==A and B'==B was found.
That will naturally happen if one uses equality and not identity tests
in the coercion caches. However, at some point it is verified in the
coercion model that the coercion map has "the correct" domain and
codomain -- where "correct" is tested by "is" and not by "==".

So, what happens if you use TripleDict and MonoDict with unique keys, as
in #715 and #12313, but have parents A==A'? Well, if you look for a
coercion from A to B then you are *guaranteed* to get a map from A to B,
and not a map from A' to B.

In other words, non-unique parents are only meaningful, if the coercion
dictionaries compare parents by identity and not equality.

Cheers,
Simon


Volker Braun

unread,
Sep 14, 2012, 6:14:45 AM9/14/12
to sage-...@googlegroups.com
We don't have to break any existing @cached_method code, we just have to change @cached_... to skip the cache lookup if the argument doesn't have a __hash__. 

And is_identical() would of course have to work recursively, so for two polynomials you'd have to compare the coefficients with is_identical(). The default implementation would just use == if the argument has a __hash__ method, an "is" otherwise. 

I know that we would have to change UniqueRepresentation.__classcall__, though that at least is manageable. The mangled arguments, which are the hash keys, would have to be a in proxy object that implements == in terms of is_identical() of the arguments.

Simon King

unread,
Sep 14, 2012, 6:50:45 AM9/14/12
to sage-...@googlegroups.com
Hi Volker,

On 2012-09-14, Volker Braun <vbrau...@gmail.com> wrote:
> ------=_Part_466_20242037.1347617685588
> Content-Type: text/plain; charset=ISO-8859-1
>
> We don't have to break any existing @cached_method code, we just have to
> change @cached_... to skip the cache lookup if the argument doesn't have a
> __hash__.

The data are stored in a dictionary. Hence, if the arguments have no
hash, you'd get a TypeError - which I think is fine.

> I know that we would have to change UniqueRepresentation.__classcall__,
> though that at least is manageable. The mangled arguments, which are the
> hash keys, would have to be a in proxy object that implements == in terms
> of is_identical() of the arguments.

I believe it is a bad idea to compare *everything* by identity.
Namely, 5 is not 5 in Sage, hence, you would have GF(5) is not GF(5)
is not GF(int(5)).

But hang on. Do you suggest that the cached method fed with unhashable arguments
does return a result, but won't cache it? That might be very reasonable.
Then, it would only remain to make power series and friends unhashable.

Best regards,
Simon

Volker Braun

unread,
Sep 14, 2012, 8:03:00 AM9/14/12
to sage-...@googlegroups.com
On Friday, September 14, 2012 11:51:03 AM UTC+1, Simon King wrote:
But hang on. Do you suggest that the cached method fed with unhashable arguments
does return a result, but won't cache it? That might be very reasonable.
Then, it would only remain to make power series and friends unhashable.

Yes, thats what I'm proposing.
 
On Friday, September 14, 2012 11:51:03 AM UTC+1, Simon King wrote:
I believe it is a bad idea to compare *everything* by identity. 

I agree. But in my proposal, ZZ(5) is exact and would keep its __hash__ method. So the default is_identical() implementation would just use == to compare integers. Hence nothing changes, GF(5) is still unique.

Though I just realized that this won't fix the following:

sage: @cached_function
....: def add_one(x):
....:         return x+1
....: 
sage: add_one(GF(5)(4))
0
sage: 
sage: [ add_one(i) for i in range(10) ]
[1, 2, 3, 4, 0, 6, 7, 8, 9, 10]

rjf

unread,
Sep 14, 2012, 3:08:47 PM9/14/12
to sage-...@googlegroups.com

I was surprised to learn that Sage has a notion of equal that is not transitive.  While you are of course free to make up rules and define new relations that fall in the gap between mathematics and computer representation, it might help to look at what other computer algebra systems do. In particular I suggest that you NOT do what Mathematica does.  You might look at what Common Lisp does, which has a number of equality predicates, equal, eq, eql, =.
Note that in IEEE 754 floating point std, NaNs are not equal to each other or even themselves. Note that some items like "Interval(0,1)" may be considered not-necessarily-equal to another "interval(0,1)".   However, if you compare such an item to itself  (that is, the same memory location) it can be arguably equal.  Maple makes the choice of having unique representations (via hashing/caching) for any expression and so it is more or less impossible to store "Interval(0,1)" in two locations.

Comparing 2 numbers of different precision by rounding the less precise leads to VERY bad consequences:  an iterative computation can converge either by reaching an accurate numerically meaningful fixed point, or it can converge by losing more and more precision.  (This is what Mathematica can sometimes do to you.)

Caching the results of pure functions (no global state, exact inputs and outputs) may work but maybe not.

You would not be happy with a deli that had cached exactly one sandwich of each type, so that two customers got THE SAME pastrami sandwich..

RJF

Simon King

unread,
Sep 15, 2012, 8:13:13 AM9/15/12
to sage-...@googlegroups.com
Hi!

I think we are talking about two different topics here, that should be
kept separate.

1) The hash must obey the law. If you find two objects that evaluate
equal but have different hash, then it is a bug. It can be fixed by
either changing the hash so that it only relies on the data that are
responsible for making the objects evaluate equal, or it can be fixed by
making the objects unhashable.

2) While == is supposed to be transitive (a==b and b==c implies a==c),
if a,b,c are defined as elements of the same parent (if not, it is a
bug). However, unless we want to drop Python, we must also allow
comparison of elements that do not belong to the same parent ("==" is
supposed to never raise an error, in Python).

If, e.g., a=GF(2)(1) and b=ZZ['x'](1), then some CAS would say that a!=b
because they belong to totally different rings. Of course, doing so
keeps == transitive.

Other CAS (Sage, but also Magma, IIRC) would say that "a!=b for elements
of different parent" is *impractical*. Coercion offers a mathematically
sober approach to compare elements of different algebraic structures:
If there is a *canonical* parent (called "pushout" in Sage) into which
both a and b coerce, then compare the elements *there*.

As long as the pushout is canonical (which might be difficult to prove, so,
there could be bugs in it as well), it totally makes sense to say:
"a==b holds, because R=GF(2)['x'] is the pushout of GF(2) and ZZ['x'],
and R(a)==R(b)."

What does 2) mean for the topic of this thread?

While I believe that "comparison by pushout" is a good idea, I think it
is impossible to avoid == being intransitive on elements a,b,c that
belong to three *different* parents that have no common pushout. That
would be the price to pay.

Being intransitive, elements are bad cache keys with the usual ==. One
could argue that users should be aware of it and take precautions. But
it would be nice to support the user by, say, a version of
cached_function that uses a different key comparison.

IIRC, a "parent" in Sage is any type, and any instance of
sage.structure.parent.Parent. So, the notion "parent" includes the
notion of a type. As has been mentioned in this thread, it
makes sense to compare parents by identity. Given this definition of a
parent, one has following notion of an "element": If X is instance
of a type and X is not an instance of sage.structure.parent.Parent, then
it is an element. The parent which this element belongs to, is either
X.parent() or type(X).
That's what is done if you call parent(X), by the way.

It would be easy to define a "cached_by_id" decorator, that returns a
cached function, so that parents on the argument list are compared by
identity, and everything else (in particular, any element) is compared
in a coercion-free way: "parent(X) is parent(Y) and X==Y". Namely, if
parent(X) is parent(Y) then X==Y will not rely on coercion.

What do people think? Should a coercion-free container and a
cached_function version using this container be implemented?

Cheers,
Simon


Volker Braun

unread,
Sep 15, 2012, 10:18:21 AM9/15/12
to sage-...@googlegroups.com
Just for the record, in Python there is nothing wrong with different hash yet equality or with non-transitive equality as long as you don't put them into a set or dict. Generally this issue just doesn't come up since you usually don't put stuff from different parents together. Except that this does matter for caching and UniqueRepresentation.

I agree that we need a cache with a stronger equality. Testing "parent(X) is parent(Y) and X==Y" is a good start, but not always sufficient. If the elements have a varying precision then the precision also needs to be compared. Because right now we have

sage: R.<x> = QQ[[]]
sage: x + O(x^2) == x + x^3 + O(x^4)
True

So maybe this could be the default implementation of an is_identical() method, then it could be overridden as needed. 


Nils Bruin

unread,
Sep 15, 2012, 11:32:38 AM9/15/12
to sage-devel
[Disclaimer: "comparison" here only means testing for equality here,
not
"comparison" in the sense of testing for a (partial) ordering.]

On Sep 15, 5:13 am, Simon King <simon.k...@uni-jena.de> wrote:
> Hi!
>
> I think we are talking about two different topics here, that should be
> kept separate.
>
> 1) The hash must obey the law. If you find two objects that evaluate
> equal but have different hash, then it is a bug.

But then the only bug-less hash functions on the integers are the
trivial ones, due to GF(p)(a) == a+p.

> Other CAS (Sage, but also Magma, IIRC) would say that "a!=b for elements
> of different parent" is *impractical*.

Magma draws a different conclusion, though. Comparison of ZZ(1) and
GF(p)(1) raises an error. There is another operator 'cmpeq' that
returns false if 'eq' would give an error and the result of 'eq'
otherwise. For instance, a dictionary (associative array) needs a
universe declared (parent for the keys) so that it's clear what
equality means.

Floats are not a main concern in Magma, however, and some different
choices are made there:

> RealField(10)!(1/3) eq RealField(20)!(1/3);
true
> RealField(10)!(1/3) eq 1/3;
true
> 1/3 eq 0.3;
false

> As long as the pushout is canonical (which might be difficult to prove, so,
> there could be bugs in it as well), it totally makes sense to say:
>  "a==b holds, because R=GF(2)['x'] is the pushout of GF(2) and ZZ['x'],
>   and R(a)==R(b)."

That does break transitivity of equality rather thoroughly. If you
want to insist hashing is coarser than equality (along *any* path) you
quickly arrive at only trivial hash functions, due to 0 == GF(3)(0) ==
3 == GF(2)(1) == 1 and the fact that hash equality is necessarily
transitive.

I suspect this is because we allow quotient operations in automatic
coercions as well.

> [equality] Being intransitive, elements are bad cache keys with the usual ==. One
> could argue that users should be aware of it and take precautions. But
> it would be nice to support the user by, say, a version of
> cached_function that uses a different key comparison.

You've just made RJF's weekend, and given an indication that
Greenspun's tenth rule can probably be extended beyond C and Fortran.
This is why lisp has a whole bunch of different equality test (equal,
eq, eql, equalp). Whatever you're going to add, it probably needs
universal infrastructure

> It would be easy to define a "cached_by_id" decorator, that returns a
> cached function, so that parents on the argument list are compared by
> identity, and everything else (in particular, any element) is compared
> in a coercion-free way: "parent(X) is parent(Y) and X==Y". Namely, if
> parent(X) is parent(Y) then X==Y will not rely on coercion.

That would not solve the "fuzzy" problem, which I think is the most
likely to cause insidious bugs. I think within parent, "hash" (if it
exists) should be strictly coarser than equality.

At the same time, one gains a lot of (untrustworthy) functionality by
letting equality testing only compare to the lower precision on, for
instance, power series and p-adic numbers (henceforth "fuzzies"). As
long as you make sure that you start out with enough precision to
distinguish mathematical inequalities throughout your (approximate)
computations, your answer is probably correct, and often you can
reason that any precision left in your answer must be correct (just
probably not optimal).

This is incredibly useful in practice. Yes, the proper approach is to
rewrite your algorithm with numerical considerations in every step,
but this is often not practical. Simple example: Many computations
with p-adic points on elliptic curves work quite well with "lowest
precision" equality and conventional algorithms. The cases where they
don't are exactly the ones where elementary p-adic analytic
descriptions start converging, so they complement eachother
wonderfully. It would be a pain to have to reimplement all the
elliptic curve stuff.

I'm not so sure that being hashable is so essential, so my initial bet
is that for fuzzies, dropping hashability is the preferred option. The
increased introduction of caching functions is restricting
non-hashable types more, though.

The other problem, that dictionaries with keys from different parents
are not trustworthy, is I think a smaller one, provided that people
are aware of it. Since cached_functions use a dictionary implicitly,
it's harder for people to be aware that their function inputs are
stuffed in a dict at some point.

If the "stronger comparison" is only to take into account the parents
as well (that means ignoring the hashing problems of fuzzies), you can
just use

(parent(a),a)

as key (for suitable definition for "parent"). I would expect the
parent hashes to be pretty trustworthy and as long as hashes within
parent behave, '==' of such pairs would imply hash equality.

Please don't use is_identical(). Identity already has a meaning in
python and we're talking about a weaker concept.

Simon King

unread,
Sep 15, 2012, 11:58:31 AM9/15/12
to sage-...@googlegroups.com
Hi Volker,

On 2012-09-15, Volker Braun <vbrau...@gmail.com> wrote:
> ------=_Part_608_31003705.1347718701217
> Content-Type: text/plain; charset=ISO-8859-1
> Because right now we have
>
> sage: R.<x> = QQ[[]]
> sage: x + O(x^2) == x + x^3 + O(x^4)
> True

Yes, and I think it must not be hashable, then. Or the hash must be
constant.

Cheers,
Simon


Simon King

unread,
Sep 15, 2012, 12:50:56 PM9/15/12
to sage-...@googlegroups.com
Hi Nils,

On 2012-09-15, Nils Bruin <nbr...@sfu.ca> wrote:
> But then the only bug-less hash functions on the integers are the
> trivial ones, due to GF(p)(a) == a+p.

ZZ coerces into any unital ring, and for its image would hold the same.
That is really not nice.

>> Other CAS (Sage, but also Magma, IIRC) would say that "a!=b for elements
>> of different parent" is *impractical*.
>
> Magma draws a different conclusion, though. Comparison of ZZ(1) and
> GF(p)(1) raises an error.

Really? Then I am sorry for my wrong statement. I recall that I was told
that Sage's coercion model is inspired by Magma's (hopefully I recall at
least one detail correctly), and I thought that *arithmetic* involving
different parents wouldn't make much sense without *comparison* involving
different parents.

> There is another operator 'cmpeq' that
> returns false if 'eq' would give an error and the result of 'eq'
> otherwise. For instance, a dictionary (associative array) needs a
> universe declared (parent for the keys) so that it's clear what
> equality means.

That would actually be another possibility to make our coercions better,
without the need to work with trivial hash functions: Implement a
container type "TypedDict", say, that tests whether all keys belong to a
given domain. If a key from a different domain is given, then either it
is converted into the expected domain, or an error is raised (I don't
know what would be better).

Example:

sage: DR = TypedDict(Fields())()
sage: DZ = TypedDict(ZZ)()

sage: DR[QQ] = 1
sage: DZ[1] = 1

And now I don't know what would be better:
sage: DR[ZZ]
1 # because Fields()(ZZ) is QQ
or
TypeError # because ZZ not in Fields()

Similarly, I don't know whether DZ[GF(3)(2)]=2 should result in a
TypeError, or should silently convert GF(3)(2) into the integer 2. Note,
that this conversion would solve the hash problem, because then
DZ[GF(3)(2)] does exactly the same as DZ[ZZ(2)].

Actually it could be made even more flexible, say: TypedDict(ZZ) accepts
any tuple of integers as key, while TypedDict((ZZ,Rings(),GF(3),int)) only
accepts quadruples, formed by one integer, one ring, one element
of GF(3), and one Python int.

Do people think such containers would be useful for creating more
reliable caches?

> You've just made RJF's weekend

It is a pleasure for me to make people happy.

>> It would be easy to define a "cached_by_id" decorator, that returns a
>> cached function, so that parents on the argument list are compared by
>> identity, and everything else (in particular, any element) is compared
>> in a coercion-free way: "parent(X) is parent(Y) and X==Y". Namely, if
>> parent(X) is parent(Y) then X==Y will not rely on coercion.
>
> That would not solve the "fuzzy" problem, which I think is the most
> likely to cause insidious bugs.

Sure, and neither would the second suggestion (TypedDict).

> The other problem, that dictionaries with keys from different parents
> are not trustworthy, is I think a smaller one, provided that people
> are aware of it.

Interesting. I would rather argue: People are likely to be aware of
"fuzzy" data being bad cache keys. They are less likely to be aware of
subtleties of Sage's coercion model and its implications for caching.
So, it is more important for us to solve the coercion problem than the
fuzzy problem, to avoid pit falls for the user.

> Since cached_functions use a dictionary implicitly,
> it's harder for people to be aware that their function inputs are
> stuffed in a dict at some point.

You think so? If I was aware that something is cached, then I would be
surprised if things were *not* stuffed into a dict behind the scenes.

Best regards,
Simon


Nils Bruin

unread,
Sep 16, 2012, 12:47:48 AM9/16/12
to sage-devel
On Sep 15, 9:51 am, Simon King <simon.k...@uni-jena.de> wrote:

> You think so? If I was aware that something is cached, then I would be
> surprised if things were *not* stuffed into a dict behind the scenes.

The key is in the "If". People can easily use functions without
knowing whether or not they are cached. Caching is an implementation
detail.

Volker Braun

unread,
Sep 17, 2012, 11:11:56 AM9/17/12
to sage-...@googlegroups.com
On Saturday, September 15, 2012 4:32:41 PM UTC+1, Nils Bruin wrote:
Please don't use is_identical(). Identity already has a meaning in
python and we're talking about a weaker concept.

Fair enough, but maybe you should suggest a better name. How about .is_strongly_equal()

Python doesn't have any additional comparison operators that one could overload, but one could extend the preparser to parse a===b, say, into a.is_strongly_equal(b). Note that === is the "equal in value and type" operator in javascript.

Robert Bradshaw

unread,
Sep 18, 2012, 12:56:44 AM9/18/12
to sage-...@googlegroups.com
On Sat, Sep 15, 2012 at 9:50 AM, Simon King <simon...@uni-jena.de> wrote:
> Hi Nils,
>
> On 2012-09-15, Nils Bruin <nbr...@sfu.ca> wrote:
>> But then the only bug-less hash functions on the integers are the
>> trivial ones, due to GF(p)(a) == a+p.
>
> ZZ coerces into any unital ring, and for its image would hold the same.
> That is really not nice.
>
>>> Other CAS (Sage, but also Magma, IIRC) would say that "a!=b for elements
>>> of different parent" is *impractical*.
>>
>> Magma draws a different conclusion, though. Comparison of ZZ(1) and
>> GF(p)(1) raises an error.
>
> Really? Then I am sorry for my wrong statement. I recall that I was told
> that Sage's coercion model is inspired by Magma's (hopefully I recall at
> least one detail correctly),

The idea of Parents comes from Magma, but coercion is a new concept.
(IIRC Magma one simply implements the + operator manually for all
pairs of types that one thinks makes sense).

> and I thought that *arithmetic* involving
> different parents wouldn't make much sense without *comparison* involving
> different parents.

Yes, though comparison is trickier because one can't, for example,
return a reduced-precision "True" or "False"
I think the right solution for caching is to let the key be
(parent(x), x), or even ((id(parent(x)), x). This would be really fast
to hash and compare at the cost of unlikely cache misses (well, int
vs. Integer might be a significant overlap). TypedDict might be
useful, but probably overkill for this. The fuzziness issue is
separate, but I think people are less likely to be "accidentally"
bitten by that.

I've occasionally wondered if the inverse of coercion maps wouldn't be
better to use for comparisons, e.g. GF(p) -> ZZ and RealField(53) ->
RealField(100). The "intersection" rather than "pushout" would get
used as a domain to compare two elements. Of course this would be an
extremely backwards-incompatible change, and sometimes less convenient
(e.g. it's handy to be able to compare finite field elements with -1).

- Robert
Reply all
Reply to author
Forward
0 new messages