Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Message from discussion Caching considered controversial
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Nils Bruin  
View profile  
 More options Sep 15 2012, 11:32 am
From: Nils Bruin <nbr...@sfu.ca>
Date: Sat, 15 Sep 2012 08:32:38 -0700 (PDT)
Local: Sat, Sep 15 2012 11:32 am
Subject: Re: Caching considered controversial
[Disclaimer: "comparison" here only means testing for equality here,
not
"comparison" in the sense of testing for a (partial) ordering.]

On Sep 15, 5:13 am, Simon King <simon.k...@uni-jena.de> wrote:

> Hi!

> I think we are talking about two different topics here, that should be
> kept separate.

> 1) The hash must obey the law. If you find two objects that evaluate
> equal but have different hash, then it is a bug.

But then the only bug-less hash functions on the integers are the
trivial ones, due to GF(p)(a) == a+p.

> Other CAS (Sage, but also Magma, IIRC) would say that "a!=b for elements
> of different parent" is *impractical*.

Magma draws a different conclusion, though. Comparison of ZZ(1) and
GF(p)(1) raises an error. There is another operator 'cmpeq' that
returns false if 'eq' would give an error and the result of 'eq'
otherwise. For instance, a dictionary (associative array) needs a
universe declared (parent for the keys) so that it's clear what
equality means.

Floats are not a main concern in Magma, however, and some different
choices are made there:

  > RealField(10)!(1/3) eq RealField(20)!(1/3);
  true
  > RealField(10)!(1/3) eq 1/3;
  true
  > 1/3 eq 0.3;
  false

> As long as the pushout is canonical (which might be difficult to prove, so,
> there could be bugs in it as well), it totally makes sense to say:
>  "a==b holds, because R=GF(2)['x'] is the pushout of GF(2) and ZZ['x'],
>   and R(a)==R(b)."

That does break transitivity of equality rather thoroughly. If you
want to insist hashing is coarser than equality (along *any* path) you
quickly arrive at only trivial hash functions, due to 0 == GF(3)(0) ==
3 == GF(2)(1) == 1 and the fact that hash equality is necessarily
transitive.

I suspect this is because we allow quotient operations in automatic
coercions as well.

> [equality] Being intransitive, elements are bad cache keys with the usual ==. One
> could argue that users should be aware of it and take precautions. But
> it would be nice to support the user by, say, a version of
> cached_function that uses a different key comparison.

You've just made RJF's weekend, and given an indication that
Greenspun's tenth rule can probably be extended beyond C and Fortran.
This is why lisp has a whole bunch of different equality test (equal,
eq, eql, equalp). Whatever you're going to add, it probably needs
universal infrastructure

> It would be easy to define a "cached_by_id" decorator, that returns a
> cached function, so that parents on the argument list are compared by
> identity, and everything else (in particular, any element) is compared
> in a coercion-free way: "parent(X) is parent(Y) and X==Y". Namely, if
> parent(X) is parent(Y) then X==Y will not rely on coercion.

That would not solve the "fuzzy" problem, which I think is the most
likely to cause insidious bugs. I think within parent, "hash" (if it
exists) should be strictly coarser than equality.

At the same time, one gains a lot of (untrustworthy) functionality by
letting equality testing only compare to the lower precision on, for
instance, power series and p-adic numbers (henceforth "fuzzies"). As
long as you make sure that you start out with enough precision to
distinguish mathematical inequalities throughout your (approximate)
computations, your answer is probably correct, and often you can
reason that any precision left in your answer must be correct (just
probably not optimal).

This is incredibly useful in practice. Yes, the proper approach is to
rewrite your algorithm with numerical considerations in every step,
but this is often not practical. Simple example: Many computations
with p-adic points on elliptic curves work quite well with "lowest
precision" equality and conventional algorithms. The cases where they
don't are exactly the ones where elementary p-adic analytic
descriptions start converging, so they complement eachother
wonderfully. It would be a pain to have to reimplement all the
elliptic curve stuff.

I'm not so sure that being hashable is so essential, so my initial bet
is that for fuzzies, dropping hashability is the preferred option. The
increased introduction of caching functions is restricting
non-hashable types more, though.

The other problem, that dictionaries with keys from different parents
are not trustworthy, is I think a smaller one, provided that people
are aware of it. Since cached_functions use a dictionary implicitly,
it's harder for people to be aware that their function inputs are
stuffed in a dict at some point.

If the "stronger comparison" is only to take into account the parents
as well (that means ignoring the hashing problems of fuzzies), you can
just use

(parent(a),a)

as key (for suitable definition for "parent"). I would expect the
parent hashes to be pretty trustworthy and as long as hashes within
parent behave, '==' of such pairs would imply hash equality.

Please don't use is_identical(). Identity already has a meaning in
python and we're talking about a weaker concept.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.