Should matrix spaces keep violating the basic assumption on hash functions, or should they be unique parents?

11 views
Skip to first unread message

Simon King

unread,
Jan 11, 2012, 4:25:47 AM1/11/12
to sage-devel
Hi!

Matrix spaces of dense and sparse matrices are equal and are thus not
unique parents:
sage: M1 = MatrixSpace(ZZ,2,sparse=True)
sage: M2 = MatrixSpace(ZZ,2,sparse=False)
sage: M1 is M2
False
sage: M1==M2
True

Fine. But what should be really bad: They violate the assumption that
equal objects have equal hash.
sage: hash(M1)==hash(M2)
False

By consequence, the two equal matrix spaces will correspond to
different items of dictionaries:
sage: D = {M1:1,M2:2}
sage: len(D)
2

I think that it is an apparent bug. Therefore, at #12290, I am fixing
the hash values (and moreover I speed-up the hash of matrix spaces
considerably).

However, it seems that the coercion framework relies on the bug.
Namely, when fixing it, one gets
sage: A = matrix(ZZ, 5, range(30), sparse=False)
sage: B = matrix(ZZ, 5, range(30), sparse=True)
sage: C = matrix(QQ, 5, range(30), sparse=True)
sage: A.elementwise_product(C).is_dense()
True
sage: B.elementwise_product(C).is_sparse()

---------------------------------------------------------------------------
TypeError Traceback (most recent
call last)
...
TypeError: no common canonical parent for objects with parents:
'Full MatrixSpace of 5 by 6 sparse matrices over Integer Ring' and
'Full MatrixSpace of 5 by 6 sparse matrices over Rational Field'

The problem can be fixed by making matrix spaces unique parents (but I
didn't check yet whether it creates other problems).

Question to you: Do you see an obvious reason why matrix spaces should
not be unique parents?

Best regards,
Simon

Nicolas M. Thiery

unread,
Jan 11, 2012, 5:50:36 AM1/11/12
to sage-...@googlegroups.com
On Wed, Jan 11, 2012 at 01:25:47AM -0800, Simon King wrote:
> Matrix spaces of dense and sparse matrices are equal and are thus not
> unique parents:
> sage: M1 = MatrixSpace(ZZ,2,sparse=True)
> sage: M2 = MatrixSpace(ZZ,2,sparse=False)
> sage: M1 is M2
> False
> sage: M1==M2
> True
>
> Fine. But what should be really bad: They violate the assumption that
> equal objects have equal hash.
> sage: hash(M1)==hash(M2)
> False
>
> By consequence, the two equal matrix spaces will correspond to
> different items of dictionaries:
> sage: D = {M1:1,M2:2}
> sage: len(D)
> 2
>
> I think that it is an apparent bug.

+1

> Therefore, at #12290, I am fixing the hash values (and moreover I
> speed-up the hash of matrix spaces considerably).

Cool.

> However, it seems that the coercion framework relies on the bug.
> Namely, when fixing it, one gets
> sage: A = matrix(ZZ, 5, range(30), sparse=False)
> sage: B = matrix(ZZ, 5, range(30), sparse=True)
> sage: C = matrix(QQ, 5, range(30), sparse=True)
> sage: A.elementwise_product(C).is_dense()
> True
> sage: B.elementwise_product(C).is_sparse()
>
> ---------------------------------------------------------------------------
> TypeError Traceback (most recent
> call last)
> ...
> TypeError: no common canonical parent for objects with parents:
> 'Full MatrixSpace of 5 by 6 sparse matrices over Integer Ring' and
> 'Full MatrixSpace of 5 by 6 sparse matrices over Rational Field'
>
> The problem can be fixed by making matrix spaces unique parents (but I
> didn't check yet whether it creates other problems).
>
> Question to you: Do you see an obvious reason why matrix spaces should
> not be unique parents?

If at all possible, I for myself would rather have them as unique
parents.

On the other hand, I remember it was argued against it for
VectorSpaces: indeed users might want to create several copies of a
given vector space, in particular to use different scalar products.
I don't know if there are similar needs/use for matrix spaces.


In the case of VectorSpaces, I would be more in favor of having unique
parents, and making the scalar product a part of the data to construct
the vector space. That is:

VectorSpace(QQ,3, scalar_product=...)

would model QQ^3, *endowed* with the given scalar product, which is
different from QQ^3.

Also, we could still allow the user to create his/her private vector
space by specifying an extra key to the constructor (that's what
CombinatorialFreeModule does). But that might require some refactoring
and might break backward compatibility.

Cheers,
Nicolas
--
Nicolas M. Thi�ry "Isil" <nth...@users.sf.net>
http://Nicolas.Thiery.name/

Simon King

unread,
Jan 11, 2012, 6:46:00 AM1/11/12
to sage-devel
Hi Nicolas, hi all,

On 11 Jan., 11:50, "Nicolas M. Thiery" <Nicolas.Thi...@u-psud.fr>
wrote:
> > Question to you: Do you see an obvious reason why matrix spaces should
> > not be unique parents?
>
> If at all possible, I for myself would rather have them as unique
> parents.
>
> On the other hand, I remember it was argued against it for
> VectorSpaces: indeed users might want to create several copies of a
> given vector space, in particular to use different scalar products.
> I don't know if there are similar needs/use for matrix spaces.

Here, the only extra structure is "sparse versus dense". Namely,
currently, a matrix space is determined as unique object by
base_ring, nrows, ncols, sparse
but equality is tested by
base_ring, nrows, ncols
only.

Of course, "dense versus sparse" is not a mathematical property, but
only concerns implementation (so, it is different from a scalar
product, which carries mathematical information).

From my perspective, it only matters whether one can do arithmetic
involving a sparse and a dense matrix -- that's to say, whether the
coercion model is easily able to handle it. And for the coercion
model, it is easier when the parents are unique.

Two remarks:
1) With my patch from #12290, the time for returning the hash of a
matrix space drops from 13.8 µs to 547 ns. But when using
UniqueRepresentation as a base class of matrix spaces, then one even
comes to 489 ns. So, if you mainly care for speed then using
UniqueRepresentation is the way to go.

2) A more general consideration:
The coercion model prefers to have unique parents. But many people
think that "A == B" should mean "A and B are canonically isomorphic",
and not just "A is B". That could be solved by making the coercion
model consequently use containers that compare by "is" and not by
"==". For example, consequently use
sage.structure.coerce_dict.TripleDict and not dict.

In that way, we could both strengthen the coercion model and make the
people happy. But it would probably involve some tedious work...

Best regards,
Simon

Marco Streng

unread,
Jan 11, 2012, 7:30:07 AM1/11/12
to sage-...@googlegroups.com
On 11/01/2012 11:46, Simon King wrote:
> 2) A more general consideration: The coercion model prefers to have
> unique parents. But many people think that "A == B" should mean "A
> and B are canonically isomorphic", and not just "A is B". That could
> be solved by making the coercion model consequently use containers
> that compare by "is" and not by "==".


I'm not sure what you mean by ``making the coercion model use containers that compare by "is" and not by "=="'', but I'll just chime in as I think it could be related to something that I found.

Currently, if two parents A and B are equal according to "==" and no good coercion map is found, then A.coerce_map_from(B) returns A._generic_convert_map(B). In case of number fields, this means that if two number fields have same generator name and same defining polynomial, then the generators are identified by coercion. There is a NumberField example where this is incompatible with maps that I would consider natural. So in the NumberField case, I would prefer not to have a coercion in this case, but only a conversion.

Marco

PS. More specifically, the example is

sage: K.<a> = NumberField(x^2-2, embedding=-1)
sage: L.<b> = NumberField(x^2-2, embedding=1)
sage: xK = K['x'].gen()
sage: xL = L['x'].gen()
sage: M.<c> = NumberField(xK^2-3)
sage: N.<d> = NumberField(xL^2-3)
sage: O = M.absolute_field('e')
sage: P = N.absolute_field('e')
sage: b_in_a = K(0)+b
sage: map1 = O.structure()[1]
sage: map2 = P.structure()[1]
sage: b_in_a in map1.domain()
# True
sage: b in map2.domain()
# True
sage: map1(b_in_a) - map2(b)
# e^3 - 9*e, which is non-zero, so the maps don't commute!

The maps O <--> M <-- K <--> L --> N <--> P are natural (structure morphisms and coercions).
The fields O and P have same generator name and same defining polynomials, hence O==P is True. However, O.structure() and P.structure() are distinct, which makes O._generic_convert_map(P) incompatible with the natural maps O <--> M <-- K <--> L --> N <--> P

So I would prefer there not to be a coercion from O to P. I'm fine with conversion.





Simon King

unread,
Jan 11, 2012, 8:25:54 AM1/11/12
to sage-devel
Hi Marco,

On 11 Jan., 13:30, Marco Streng <marco.str...@gmail.com> wrote:
> I'm not sure what you mean by ``making the coercion model use containers
> that compare by "is" and not by "=="'',

If you have a dictionary D with keys A and B such that "A==B" but "A
is not B", then you will always have "D[A] is D[B]" -- of course
unless you screw that hash functions of A and B, which currently is
the case for matrix spaces. This is simply how Python dictionaries
work.

However, sage.structure.coerce_dict.TripleDict has the property that
(1) each key is a triple and (2) the three parts of the key are
compared by identity and not by equality. Hence, "D[A,A,A] is not
D[A,B,A]".

> but I'll just chime in as I
> think it could be related to something that I found.

The topic is certainly (weakly, at least :) related.

> sage: K.<a> = NumberField(x^2-2, embedding=-1)
> sage: L.<b> = NumberField(x^2-2, embedding=1)
> sage: xK = K['x'].gen()
> sage: xL = L['x'].gen()
> sage: M.<c> = NumberField(xK^2-3)
> sage: N.<d> = NumberField(xL^2-3)
> sage: O = M.absolute_field('e')
> sage: P = N.absolute_field('e')

Yep, non-uniqueness strikes again:
sage: O==P
True
sage: O is P
False

But I think in this case there should be other ways to ensure that the
conversion from O to P is really just used as conversion and not as
coercion.
Cheers,
Simon

Simon King

unread,
Jan 11, 2012, 8:44:35 AM1/11/12
to sage-devel
Hi!

#12290 is now ready for review.

For reference, the ticket contains one not-totally-finished patch that
only improves the hash.
However, the patch that is supposed to be reviewed goes beyond and
introduces UniqueRepresentation as a base class of matrix spaces.

First feature: All tests pass, the hash problem is fixed and in
addition the hash is even faster than with the old patch.

Second feature: Coercion of matrices is a lot nicer now, IMHO!!

Namely, in unpatched sage, the parent of the sum of a dense and a
sparse matrix implicitly depends on the summation order:
sage: M = MatrixSpace(ZZ, 10)
sage: N = MatrixSpace(ZZ, 10,sparse=True)
sage: a = M.random_element()
sage: b = N.random_element()
sage: (a+b).parent()
Full MatrixSpace of 10 by 10 dense matrices over Integer Ring
sage: (b+a).parent()
Full MatrixSpace of 10 by 10 sparse matrices over Integer Ring

With my patch, the parent of b+a is the same as the parent of a+b.

Cheers,
Simon
Reply all
Reply to author
Forward
0 new messages