Dig through stack for printing an object?

59 views
Skip to first unread message

Nils Bruin

unread,
May 14, 2013, 12:39:22 PM5/14/13
to sage-devel
On http://trac.sagemath.org/sage_trac/ticket/14579 I noticed that sage
is doing something very fancy to produce a short "repr" of a large
matrix: It doesn't print the entries (I knew that), but it even goes
delving in the stack to see if it can find a name to which the matrix
might possibly be bound! That's a very expensive operation:

Example code:

def t(depth,N,n):
if depth:
return t(depth-1,N,n)
else:
G = matrix([[2 for u in range(N)] for v in range(N)])
for i in range(n):
print repr(G)

Already for printing a single matrix the result shows up in timings:
sage: %time t(0,19,1)
[2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]
... removed lines for brevity ...
[2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
sage: %time t(0,20,1)
20 x 20 dense matrix over Integer Ring
CPU times: user 0.03 s, sys: 0.00 s, total: 0.03 s
Wall time: 0.03 s

This routine uses sage.misc.sageinspect.sage_getvariablename, which
goes digging in the globals of the entire stack to find the variable
name. Since globals dictionaries tend to be rather well-populated
(certainly in sage), this is a rather expensive operation (also
because assembling a "stack frame object" is a rather expensive
operation):

sage: timeit('t(100,19,10)')
125 loops, best of 3: 3.09 ms per loop
sage: timeit('t(100,20,10)')
5 loops, best of 3: 2.46 s per loop

As you see, roughly 1000x slower(!).

I understand it's mildly convenient to get:

sage: G = matrix([[2 for u in range(20)] for v in range(20)])
sage: print repr(G)
20 x 20 dense matrix over Integer Ring (type 'print G.str()' to see
all of the entries)

but the price of printing "G" is excessive. OK, it's printing
operation, but as a rule I'd say one should only go digging in stacks
for debugging, not as part of normal operation.

How about just changing this to "(use str(...) to see all entries)",
where we rely on the intelligence of the user to interpret "..." to
mean 'whatever access you have to the matrix'.

Note that this expensive trick always gets executed and frequently has
no useful result anyway:

sage: G = [matrix([[2 for u in range(20)] for v in range(20)])]
sage: print G
[20 x 20 dense matrix over Integer Ring]

In these cases, a static message about 'str' would be more useful
anyway.

Nils Bruin

unread,
May 14, 2013, 12:54:37 PM5/14/13
to sage-devel
Oh, and sage_getvariablename doesn't quite live up to its advertised
API either:

sage: L=["hi"]
sage: _T=L[0]
sage:
sage.misc.sageinspect.sage_getvariablename(L[0],omit_underscore_names=True)
'_T'

Volker Braun

unread,
May 14, 2013, 1:02:43 PM5/14/13
to sage-...@googlegroups.com
I have no opinion on whether or not to do that on each _repr_ call. It doesn't do anything for me, but I don't see what the point of flooding the terminal even faster is. 

But I do think the functionality to dig for a variable name from an object is useful, and should be kept.

John H Palmieri

unread,
May 14, 2013, 1:50:35 PM5/14/13
to sage-...@googlegroups.com


On Tuesday, May 14, 2013 9:39:22 AM UTC-7, Nils Bruin wrote:
How about just changing this to "(use str(...) to see all entries)",
where we rely on the intelligence of the user to interpret "..." to
mean 'whatever access you have to the matrix'.


In principle a change like that sounds okay, but:

    sage: M = random_matrix(ZZ, 30, 30)
    sage: M
    30 x 30 dense matrix over Integer Ring (type 'print M.str()' to see all of the entries)
    sage: str(M)
    "30 x 30 dense matrix over Integer Ring (type 'print M.str()' to see all of the entries)"

So you have to call M.str(), not str(M).

--
John
 

Nils Bruin

unread,
May 14, 2013, 1:54:38 PM5/14/13
to sage-devel
On May 14, 10:02 am, Volker Braun <vbraun.n...@gmail.com> wrote:
> I have no opinion on whether or not to do that on each _repr_ call. It
> doesn't do anything for me, but I don't see what the point of flooding the
> terminal even faster is.

Not all "repr"s end up on a terminal, though. They can end up in
exception messages that get caught, processed, and never printed. The
coercion/conversion framework might try to convert via a string (repr
would be the more appropriate, since that should be the more "machine
readable" version). It could try that, catch the exception generated
from a failure, and try something else.

At least historically, string representations would get used for
hashing every now and again. This already causes:

sage: G=matrix([[2 for u in range(20)] for v in range(20)])
sage: G.set_immutable()
sage: repr(G)
"20 x 20 dense matrix over Integer Ring (type 'print G.str()' to see
all of the entries)"
sage: A=G
sage: repr(G)
"20 x 20 dense matrix over Integer Ring (type 'print obj.str()' to see
all of the entries)"
sage: del G
sage: repr(A)
"20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see
all of the entries)"

which is a mild violation of G being immutable (but not without
precedent. There are plenty of properties of "immutable" objects that
can still change).

Incidentally, the hash for matrices seems a bit prone to collision:

sage: L=[matrix([[2**j for u in range(20)] for v in range(20)]) for j
in range(10)]
sage: [m.set_immutable() for m in L];
sage: [hash(m) for m in L]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

> But I do think the functionality to dig for a variable name from an object
> is useful, and should be kept.

Agreed. The functionality is sometimes useful, mainly for debugging.

I think one has to be very cautious to reach for "inspection" tools
during *normal* code execution, though, because the information one
accesses via these is usually not really *meant* to be available at
the place where you retrieve it. You'll usually be violating
(implicit) locality and scope assumptions many people will be making
when reasoning about your code (if you go digging in the stack and
retrieve a "globals" dictionary from an outer frame from you, you're
definitely reaching into information that wasn't originally passed to
you).

Volker Braun

unread,
May 14, 2013, 2:35:01 PM5/14/13
to sage-...@googlegroups.com
On Tuesday, May 14, 2013 6:54:38 PM UTC+1, Nils Bruin wrote:
Not all "repr"s end up on a terminal, though. They can end up in
exception messages that get caught, processed, and never printed.

Don't do that. Exceptions should be short and to the point. And not do any string processing. Don't convert passed values to strings unless you don't care about performance.

Nils Bruin

unread,
May 14, 2013, 4:48:21 PM5/14/13
to sage-devel
On May 14, 11:35 am, Volker Braun <vbraun.n...@gmail.com> wrote:
> Don't do that. Exceptions should be short and to the point. And not do any
> string processing. Don't convert passed values to strings unless you don't
> care about performance.

I agree, but it does happen. And as you see, matrix.__repr__ takes the
whole performance loss a couple of orders of magnitude further with
its stack inspection thing. Would you want to consider that as a
feature, to make suboptimal implementations more noticeable?

Also, is it always reasonable to expect error messages to be without
string processing? For instance:

sage: G=matrix(20,20,[GF(3)(1)]*400)
sage: G+GF(5)(1)
TypeError: unsupported operand parent(s) for '+': 'Full MatrixSpace of
20 by 20 dense matrices over Finite Field of size 3' and 'Finite Field
of size 5'

This is the style of error message Python itself generates in these
situations. Missing out on that info would be quite painful.

I think someone was considering using a "lazy" object as "message"
that would only materialize to a full string if it's really asked for.
Did we ever get to that?

Simon King

unread,
May 14, 2013, 5:13:38 PM5/14/13
to sage-...@googlegroups.com
Hi Nils,

On 2013-05-14, Nils Bruin <nbr...@sfu.ca> wrote:
> I think someone was considering using a "lazy" object as "message"
> that would only materialize to a full string if it's really asked for.
> Did we ever get to that?

There is a lazy object specially designed for attribute errors. See
sage.structure.misc.AttributeErrorMessage.

Best regards,
Simon

Nils Bruin

unread,
May 14, 2013, 5:52:30 PM5/14/13
to sage-devel
On May 14, 2:13 pm, Simon King <simon.k...@uni-jena.de> wrote:
> There is a lazy object specially designed for attribute errors. See
> sage.structure.misc.AttributeErrorMessage.

Cool. Doesn't the coercion framework end up trying all kinds of things
that may generate errors which get caught? Aren't those also different
errors that AttributeError? Shouldn't we be using a similar approach?

def LazyError(Error):
def __init__(self,template,data):
self._template = template
self._data = data
@property
def message(self):
return self._template%self._data

You may have a better feeling for whether having something like this
would offer a worthwhile improvement.
[this is a little different from AttributeErrorMessage: There you're
even recycling the string allocation. Here, we'd just be postponing
some possibly expensive "repr" calls and string interpolation]

Nils Bruin

unread,
May 14, 2013, 8:09:57 PM5/14/13
to sage-devel
On May 14, 2:52 pm, Nils Bruin <nbr...@sfu.ca> wrote:
> On May 14, 2:13 pm, Simon King <simon.k...@uni-jena.de> wrote:
>
> > There is a lazy object specially designed for attribute errors. See
> > sage.structure.misc.AttributeErrorMessage.

We already have a tool for delaying string assembly and it indeed
makes a difference in creating Exception objects:

sage: from sage.misc.lazy_string import lazy_string
sage: f=lambda op,A,B:"unsupported operand parent(s) for '%s': '%s'
and '%s'"%(op,A,B)
sage: R= GF(5)
sage: S= GF(3)
sage: %timeit Exception(lazy_string(f, '+', R, S))
1000000 loops, best of 3: 872 ns per loop
sage: %timeit Exception("unsupported operand parent(s) for '%s': '%s'
and '%s'"%('+',R,S))
100000 loops, best of 3: 4.35 us per loop

We don't seem to be making any use of it. Perhaps we should.

Simon King

unread,
May 15, 2013, 1:01:11 AM5/15/13
to sage-...@googlegroups.com
Hi Nils,

On 2013-05-15, Nils Bruin <nbr...@sfu.ca> wrote:
> We already have a tool for delaying string assembly and it indeed
> makes a difference in creating Exception objects:
>
> sage: from sage.misc.lazy_string import lazy_string
> sage: f=lambda op,A,B:"unsupported operand parent(s) for '%s': '%s'
> and '%s'"%(op,A,B)
> sage: R= GF(5)
> sage: S= GF(3)
> sage: %timeit Exception(lazy_string(f, '+', R, S))
> 1000000 loops, best of 3: 872 ns per loop
> sage: %timeit Exception("unsupported operand parent(s) for '%s': '%s'
> and '%s'"%('+',R,S))
> 100000 loops, best of 3: 4.35 us per loop
>
> We don't seem to be making any use of it. Perhaps we should.

Indeed, when I created the AttributeErrorMessage, it was suggested on
trac to use LazyString instead. It turns out that a special purpose
cythoned AttributeErrorMessage is faster, but I agree that in the
general case (when it makes no sense to have a special purpose object)
it totally makes sense to be a bit lazy.

Best regards,
Simon

Nils Bruin

unread,
May 15, 2013, 1:42:46 AM5/15/13
to sage-devel
On May 14, 10:01 pm, Simon King <simon.k...@uni-jena.de> wrote:
> Indeed, when I created the AttributeErrorMessage, it was suggested on
> trac to use LazyString instead. It turns out that a special purpose
> cythoned AttributeErrorMessage is faster, but I agree that in the
> general case (when it makes no sense to have a special purpose object)
> it totally makes sense to be a bit lazy.

Interesting! It turns out that the current implementation of
LazyFormat is slower than lazy_string. Both implementations can be
cythonized and significantly sped up if required:

sage: %timeit Exception(LazyFormat("unsupported operand parent(s) for
'%s': '%s' and '%s'")%('+',R,S))
1000000 loops, best of 3: 1.77 us per loop

It's even worse when you reuse the string (this is clear from the
implementation):

sage: gg=LazyFormat("unsupported operand parent(s) for '%s': '%s' and
'%s'")
sage: %timeit Exception(gg%('+',R,S))
100000 loops, best of 3: 11.5 us per loop

structure.parent and structure.element do use LazyFormat, so it may be
worthwhile fixing its implementation.

Simon King

unread,
May 15, 2013, 4:37:44 AM5/15/13
to sage-...@googlegroups.com
Hi Nils,

On 2013-05-14, Nils Bruin <nbr...@sfu.ca> wrote:
> Cool. Doesn't the coercion framework end up trying all kinds of things
> that may generate errors which get caught? Aren't those also different
> errors that AttributeError? Shouldn't we be using a similar approach?
>
> def LazyError(Error):
> ...

It would mean that all errors that we want to be raised quickly needed
to inherit from LazyError. But I am sure that we sometimes want to raise
an AttributeError or a TypeError or a ValueError quickly, and of course
we then want that "except TypeError:" works.

Hence, rather than creating a new class for Errors, I think it would be
better to focus on making the creation of the error message faster.

Did you open a ticket for cythoning sage.misc.lazy_format and
sage.misc.lazy_string (and perhaps make them *one* thing)?

Best regards,
Simon

Simon King

unread,
May 15, 2013, 5:39:28 AM5/15/13
to sage-...@googlegroups.com
Hi Nils,

On 2013-05-15, Simon King <simon...@uni-jena.de> wrote:
> Did you open a ticket for cythoning sage.misc.lazy_format and
> sage.misc.lazy_string (and perhaps make them *one* thing)?

Since I didn't find a ticket by searching trac, I created a new one: See
#14585.

Best regards,
Simon

Simon King

unread,
May 15, 2013, 6:47:29 AM5/15/13
to sage-...@googlegroups.com
On 2013-05-15, Simon King <simon...@uni-jena.de> wrote:
> Hi Nils,
>
> On 2013-05-15, Simon King <simon...@uni-jena.de> wrote:
>> Did you open a ticket for cythoning sage.misc.lazy_format and
>> sage.misc.lazy_string (and perhaps make them *one* thing)?
>
> Since I didn't find a ticket by searching trac, I created a new one: See
> #14585.

I posted two patches.

With the first patch, creating an error with the (lazy) message "unsupported
operand parent(s) for '+': 'Finite Field of size 5' and 'Finite Field of
size 3'" drops from 1.47 us (Python version of lazy_string) respectively
from 5.04 us (a usual format string) respectively from 23.1 us (LazyFormat)
to only 595 ns.

With the second patch, it further drops to 480 ns. But it might be
considered problematic that in the second patch I am using a dummy
message, that is changed *in-place*.
A similar approach did work for the attribute errors raised in the
coercion framework. So, perhaps it will similarly work for other errors
in the coercion framework?

Best regards,
Simon


Nils Bruin

unread,
Aug 11, 2013, 4:41:20 PM8/11/13
to sage-...@googlegroups.com
Reply all
Reply to author
Forward
0 new messages