Comparison of vectors is O(n) even in the simple cases

30 views
Skip to first unread message

Florent Hivert

unread,
May 8, 2012, 12:33:01 AM5/8/12
to Sage Devel
Hi There,

Comparison of large vectors in Sage is slow in a surprising way: even if all
the entries of the vectors are different, the cost is proportional to the
length of the vector instead of having constant cost !

sage: l = 1000; m0 = vector(ZZ, [0]*l); m1 = vector(ZZ, [1]*l)
sage: %timeit m0 == m1
625 loops, best of 3: 656 �s per loop
sage: l = 10000; m0 = vector(ZZ, [0]*l); m1 = vector(ZZ, [1]*l)
sage: %timeit m0 == m1
125 loops, best of 3: 5.66 ms per loop
sage: l = 100000; m0 = vector(ZZ, [0]*l); m1 = vector(ZZ, [1]*l)
sage: %timeit m0 == m1
5 loops, best of 3: 59.1 ms per loop

It happens whatever ring you are using. The reason is the following:
The various vector classes inherits from FreeModuleElement which defines

def __richcmp__(left, right, int op):
[...]
return (<Element>left)._rich_to_bool(op, (
<FreeModuleElement>left)._cmp_same_ambient_c(right))
[...]

ultimately calling the following very inadequate solution:

cdef int _cmp_same_ambient_c(left, FreeModuleElement right) except -2:
return cmp(left.list(copy=False), right.list(copy=False))

On the other hand over ZZ, the vector are instances of Vector_integer_dense
which defines:

cdef int _cmp_c_impl(left, Element right) except -2:
cdef Py_ssize_t i
cdef int c
for i from 0 <= i < left.degree():
c = mpz_cmp(left._entries[i], (<Vector_integer_dense>right)._entries[i])
if c < 0:
return -1
elif c > 0:
return 1
return 0

But it is never called. I'd like to have confirmation that adding to all
vectors classes (following sage/structure/element.pyx line 881 (5.0.rc0))

def __richcmp__(left, right, int op):
return (<Element>left)._richcmp(right, op)

is the correct solution. Removing the offending __richcmp__ in
FreeModuleElement seems to solve the problem once for all for all rings but
introduce a lot of other problems.

Also it is very likely that the same problem is occuring in different places
(not only the various vector classes) and I'm not completely sure where to
look. I'd rather have an automatic way to find all the places where this fails
and also add the check to TestSuite if possible. Does anyone have a suggestion
for that ? Is it possible to check from Python that any subclass of Element
providing _cmp_c_impl (cdef) does also properly define __richcmp__ or does it
have to be checked on sources with the drawback that if someone introduce the
same mistake it could remain unnoticed for quite a long time ?

Cheers,

Florent, which always have trouble with Cython comparison and which seems not
to be the only one !

Simon King

unread,
May 8, 2012, 2:28:55 AM5/8/12
to sage-...@googlegroups.com
Hi Florent,

On 2012-05-08, Florent Hivert <Florent...@lri.fr> wrote:
> But it is never called. I'd like to have confirmation that adding to all
> vectors classes (following sage/structure/element.pyx line 881 (5.0.rc0))
>
> def __richcmp__(left, right, int op):
> return (<Element>left)._richcmp(right, op)
>
> is the correct solution.

Yes, the following advices given in some comment in
sage/structure/element.pyx are still valid (and, I agree, difficult
to keep in mind):

####################################################################
# For a derived Cython class, you **must** put the following in
# your subclasses, in order for it to take advantage of the
# above generic comparison code. You must also define
# either _cmp_c_impl (if your subclass is totally ordered),
# _richcmp_c_impl (if your subclass is partially ordered), or both
# (if your class has both a total order and a partial order;
# then the total order will be available with cmp(), and the partial
# order will be available with the relation operators; in this case
# you must also define __cmp__ in your subclass).
# This is simply how Python works.
#
# For a *Python* class just define __cmp__ as always.
# But note that when this gets called you can assume that
# both inputs have identical parents.
#
# If your __cmp__ methods are not getting called, verify that the
# canonical_coercion(x,y) is not throwing errors.
#
####################################################################

IIRC, the problem is that (by design of Cython) it is not possible to
inherit __(rich)cmp__ if there is also a __hash__ method. Anyway,
copying the code for __(rich)cmp__ from sage/structure/element.pyx is
necessary, inheritance wouldn't work, as I've learnt the hard way...

And then, _cmp_c_impl or _richcmp_c_impl are supposed to provide the
actual implementation of comparison.

> Removing the offending __richcmp__ in
> FreeModuleElement seems to solve the problem once for all for all rings but
> introduce a lot of other problems.

What other problems emerge? Given the comment from sage/structure/element.pyx, I think
that method should be renamed into _richcmp_c_impl.

> Also it is very likely that the same problem is occuring in different places
> (not only the various vector classes) and I'm not completely sure where to
> look. I'd rather have an automatic way to find all the places where this fails
> and also add the check to TestSuite if possible. Does anyone have a suggestion
> for that ?

I am not sure what you are trying to do. Do you want to detect Cython
classes that inherit from sage.structure.element.Element and do not copy
their __cmp__ method as they should? I am not sure how this could be
done without source inspection.

Cheers,
Simon


Florent Hivert

unread,
May 8, 2012, 9:29:28 AM5/8/12
to sage-...@googlegroups.com
Hi Simon,

> > Removing the offending __richcmp__ in
> > FreeModuleElement seems to solve the problem once for all for all rings but
> > introduce a lot of other problems.
>
> What other problems emerge?

Actually, I was wrong saying that this solves the problem. The result is much
faster but incorrect.

> Given the comment from sage/structure/element.pyx, I think
> that method should be renamed into _richcmp_c_impl.

This doesn't work either with seemingly the same error as if I remove
__richcmp__.

> > Also it is very likely that the same problem is occuring in different
> > places (not only the various vector classes) and I'm not completely sure
> > where to look. I'd rather have an automatic way to find all the places
> > where this fails and also add the check to TestSuite if possible. Does
> > anyone have a suggestion for that ?
>
> I am not sure what you are trying to do. Do you want to detect Cython
> classes that inherit from sage.structure.element.Element and do not copy
> their __cmp__ method as they should?

Sorry, I wasn't clear. What you describe is exactly what I want to do.

> I am not sure how this could be done without source inspection.

Precisely! I'd like to do it from inside the Python interpreter by
introspection. Unfortunately, I think that this can't be done.

This is now #12923

Cheers,

Florent

Florent Hivert

unread,
May 8, 2012, 10:07:50 AM5/8/12
to sage-...@googlegroups.com
Hi Simon,
And it's not the end of the story: you also need to overload __hash__ if you
do that. Cython assumes that if you change equality, you have to change
__hash__ as well. __hash__ isn't inherited.

Florent
Reply all
Reply to author
Forward
0 new messages