WeakValueDictionary: items are deleted too soon


Jeroen Demeyer

Mar 12, 2018, 5:58:39 AM
to sage-devel
There is some very bad behaviour related to CachedRepresentation caching
that I'm observing on #24742 (but this is otherwise unrelated to that
ticket):

sage: timeit('MatrixSpace(ZZ,3,3)')
625 loops, best of 3: 117 µs per loop

Now, we try again but we first create a strong reference:

sage: S = MatrixSpace(ZZ,3,3)
sage: timeit('MatrixSpace(ZZ,3,3)')
625 loops, best of 3: 4.13 µs per loop

This is much faster the second time! In the first example, the caching
of CachedRepresentation.__classcall__ is pointless since there is no
strong reference to the entry in the cache, so the entry gets deleted
as soon as the temporary MatrixSpace(ZZ,3,3) object itself is deleted.

This is just the usual Py_DECREF of Python objects; it has nothing to do
with the cyclic garbage collector: the behaviour remains the same even
with gc.disable().
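
Here is a minimal pure-Python sketch of the same effect outside Sage (the
class name is just a stand-in):

import gc
import weakref

class Parent:
    pass

gc.disable()                       # rule out the cyclic collector entirely
cache = weakref.WeakValueDictionary()

p = Parent()
cache['key'] = p
print('key' in cache)              # True: `p` keeps the entry alive

del p                              # a plain Py_DECREF, no cycles involved
print('key' in cache)              # False: the entry is gone immediately
gc.enable()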

This makes me think that we might need a version of CachedRepresentation
which keeps semi-strong references: these would only be deleted by the
cyclic garbage collector but not by a simple Py_DECREF().


Jeroen.

Jeroen Demeyer

Mar 12, 2018, 6:58:15 AM
to sage-...@googlegroups.com

Simon King

Mar 12, 2018, 8:07:21 AM
to sage-...@googlegroups.com
Hi Jeroen,

On 2018-03-12, Jeroen Demeyer <J.De...@UGent.be> wrote:
> This is just the usual Py_DECREF of Python objects, it has nothing to do
> with the cyclic garbage collector: the behaviour remains the same even
> with gc.disable().
>
> This makes me think that we might need a version of CachedRepresentation
> which keeps semi-strong references: these would only be deleted by the
> cyclic garbage collector but not by a simple Py_DECREF().

Would that remotely be possible without massive changes to Python
infrastructure?

Best regards,
Simon

Simon King

Mar 12, 2018, 8:38:41 AM
to sage-...@googlegroups.com
It seems that nothing deeper than a monkey patch to gc.collect is needed;
see #24954.

Best regards,
Simon

Erik Bray

Mar 12, 2018, 8:39:24 AM
to sage-devel
I'm not exactly sure what you have in mind here, but you might be
interested in PEP-442 regardless:
https://www.python.org/dev/peps/pep-0442/

Nils Bruin

Mar 12, 2018, 6:07:05 PM
to sage-devel
This is extremely easy to do: let the object store a reference to itself. That makes it part of a cycle, so a simple decref won't delete it.
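
A toy sketch of that trick (all names made up):

import gc
import weakref

class SemiStrong:
    def __init__(self):
        self._self = self              # deliberate reference cycle

cache = weakref.WeakValueDictionary()
cache['key'] = SemiStrong()            # no external strong reference kept

print('key' in cache)                  # True: the cycle survives the decref
gc.collect()                           # only the cyclic collector reclaims it
print('key' in cache)                  # False: now the entry is gone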

It doesn't sound like the right solution to me though: if you think the system benefits from keeping certain recently constructed objects around, they should probably be part of an LRU cache.

Your example doesn't convince me at all that we need this change though. You should only consider making a change if you have a real-world example that would significantly benefit and if you can show that degradation in other normal use is minimal. A simplistic "timeit" statement, although instructive, does not necessarily illustrate real-world usage.

I'm happy to see certain objects are now quickly disappearing from memory! We usually have the opposite problem, and that really grinds things to a halt.
 


Simon King

Mar 13, 2018, 2:47:23 AM
to sage-...@googlegroups.com
On 2018-03-12, Nils Bruin <nbr...@sfu.ca> wrote:
> Your example doesn't convince me at all that we need this change though.
> You should only consider making a change if you have a real-world example
> that would significantly benefit and if you can show that degradation in
> other normal use is minimal. A simplistic "timeit" statement, although
> instructive, does not necessarily illustrate real-world usage.

+1

Kind regards,
Simon

Jeroen Demeyer

Mar 13, 2018, 4:37:49 AM
to sage-...@googlegroups.com
On 2018-03-12 23:07, Nils Bruin wrote:
> Your example doesn't convince me at all that we need this change though.
> You should only consider making a change if you have a real-world
> example that would significantly benefit and if you can show that
> degradation in other normal use is minimal. A simplistic "timeit"
> statement, although instructive, does not necessarily illustrate
> real-world usage.

I can imagine a small function constructing a matrix (for which it needs
to construct the MatrixSpace) to do some computation but not keeping
that matrix in memory. For example, the output of that function could be
the determinant of the matrix. If this function is called from a loop,
you end up in the scenario that I described.
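
In other words, something like this made-up sketch:

def small_determinant(n):
    # builds a fresh matrix, and with it MatrixSpace(ZZ, n, n), but keeps neither
    M = matrix(ZZ, n, n, range(n * n))
    return M.det()

for _ in range(10**5):
    small_determinant(3)   # the parent is rebuilt from scratch on every call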

Dima Pasechnik

Mar 13, 2018, 5:38:05 AM
to sage-devel
This example, assuming the matrix size is a parameter, would easily become
a memory leak: just imagine calling this function millions of times, for 10
different fields, with random matrix sizes.
 

Nils Bruin

Mar 13, 2018, 7:00:12 AM
to sage-devel


On Tuesday, March 13, 2018 at 9:38:05 AM UTC, Dima Pasechnik wrote:
>> I can imagine a small function constructing a matrix (for which it needs
>> to construct the MatrixSpace) to do some computation but not keeping
>> that matrix in memory. For example, the output of that function could be
>> the determinant of the matrix. If this function is called from a loop,
>> you end up in the scenario that I described.
>
> This example, assuming the matrix size is a parameter, would easily become
> a memory leak: just imagine calling this function millions of times, for 10
> different fields, with random matrix sizes.
 
Indeed. The strong LRU/FIFO cache needs to be placed very carefully. It produces the kind of optimization that a memory pool gives, but in the latter there's a natural granularity (block size), whereas here it's just time since last use.

The random matrix sizes give a good example where this caching is counterproductive: by keeping parents around that aren't used anymore, you're preventing quick memory reuse, so you would actually be degrading runtime (even before leaking becomes a real problem) because you'd be making life harder for the processor's memory cache.

Note that, without the LRU/FIFO, there is a viable strategy for counteracting the overhead: profile the code, notice that some parent construction is a bottleneck, identify which parent (that would be clear from the call graph), then construct that parent before the loop and keep a strong reference to it, as sketched below.
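
Concretely, the workaround would look roughly like this (the variable names are made up):

MS = MatrixSpace(ZZ, 3, 3)            # construct once, keep a strong reference
for entries in list_of_entries:       # `list_of_entries` is hypothetical
    d = MS(entries).det()             # any MatrixSpace(ZZ, 3, 3) call elsewhere
                                      # now finds the cached parent still alive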

Counterproductive behaviour of the LRU/FIFO would be much harder to detect and even harder to work around.

I recall that, precisely to avoid expensive parent creation, we used to defer full parent initialization of matrix spaces, exactly for the scenario where a matrix only gets created as a temporary thing, for instance for taking its determinant. The advantage of that approach is that you even gain on the first ephemeral use. Did we lose that for a reason? Perhaps full parent initialization isn't as expensive as it used to be? (I have trouble believing that, given the introduction of the category framework.)

So: the implementation of the CachedWeakValueDictionary on the ticket is fine, but I doubt that it's actually the appropriate solution for the problems sketched. We'd need to have actual instances of the problem to see. Probably some modular forms code somewhere will be quite linear algebra intensive. The code on https://trac.sagemath.org/ticket/15113 would be another example.

Erik Bray

Mar 13, 2018, 7:14:41 AM
to sage-devel
Doesn't Sage allow creating mutable matrices? In a case like that you
should mutate the object rather than go through expensive object
creation over and over.

Nils Bruin

Mar 13, 2018, 7:43:30 AM
to sage-devel
On Tuesday, March 13, 2018 at 11:14:41 AM UTC, Erik Bray wrote:
> Doesn't Sage allow creating mutable matrices?  In a case like that you
> should mutate the object rather than go through expensive object
> creation over and over.

I think the scenario would be one where the discriminant of a polynomial gets computed by something along the lines of:

f.sylvester_matrix(f.derivative()).determinant()

(and now imagine that in a loop for various f; possibly of varying degrees)
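
For instance, a hypothetical loop of that kind:

R.<x> = ZZ[]
for d in range(2, 50):
    f = R.random_element(degree=d)
    disc = f.sylvester_matrix(f.derivative()).determinant()
    # each degree d needs a different (2d-1) x (2d-1) matrix space over ZZ,
    # and none of them survives past the determinant computation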

One could optimize that by explicitly coding all the steps, and that would probably be quite fast, but it would also be quite onerous. If profiling showed that parent creation is a significant cost in this process, it would be nice to do something about it. We'd have to see examples.

(Of course, the better solution here is to compute the discriminant directly, via a method that's faster than computing a generic determinant, but it's a nice example of a case where a matrix is created and having its parent fully initialized brings no benefit.)

Simon King

Mar 13, 2018, 7:48:10 AM
to sage-...@googlegroups.com
Hi Dima,
No, the idea is to only keep the 64 (or so) most recently created items in
memory (Jeroen's approach), or alternatively to keep things in memory until
the next cyclic garbage collection occurs (my approach). So, if you run this
function millions of times, there will be at most 64 matrix spaces kept in
memory by the cache at any time.

Cheers,
Simon

Jeroen Demeyer

Mar 13, 2018, 8:58:56 AM
to sage-...@googlegroups.com
On 2018-03-13 10:38, Dima Pasechnik wrote:
> this example, assuming the matrix size is a parameter, would easily
> become a memory leak

You mean with my code from #24954? My idea is to cache strongly only a
fixed finite number (currently 128) of CachedRepresentation instances.
So even if you create a million temporary matrix spaces that way, at
most 128 would be kept needlessly in memory. I don't see the problem there.
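
Roughly, the idea amounts to something like this sketch (this is only an
illustration, not the actual code on the ticket):

from collections import deque
from weakref import WeakValueDictionary

class BoundedStrongCache:
    """Weak-value cache plus a bounded FIFO of strong references to the
    most recently inserted values (illustrative name and code only)."""
    def __init__(self, maxstrong=128):
        self._weak = WeakValueDictionary()
        self._strong = deque(maxlen=maxstrong)   # oldest strong ref drops out

    def __setitem__(self, key, value):
        self._weak[key] = value
        self._strong.append(value)               # keep the newest values alive

    def __getitem__(self, key):
        return self._weak[key]                   # KeyError once the value died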

Jeroen.

Jeroen Demeyer

Mar 13, 2018, 9:02:07 AM
to sage-...@googlegroups.com
On 2018-03-13 12:00, Nils Bruin wrote:
> you'd be making life harder for the memory cache of the processor.

IMHO, if you worry about that, you shouldn't be writing Python.

Erik Bray

Mar 13, 2018, 9:42:32 AM
to sage-devel
On Tue, Mar 13, 2018 at 12:43 PM, Nils Bruin <nbr...@sfu.ca> wrote:
> On Tuesday, March 13, 2018 at 11:14:41 AM UTC, Erik Bray wrote:
>>
>>
>> Doesn't Sage allow creating mutable matrices? In a case like that you
>> should mutate the object rather than go through expensive object
>> creation over and over.
>
>
> I think the scenario would be one where the discriminant of a polynomial
> gets computed by something along the lines of:
>
> f.sylvester_matrix(f.derivative()).determinant()
>
> (and now imagine that in a loop for various f; possibly of varying degrees)

That makes sense. Cases like this *could* still be optimized
somewhat, by keeping around a pool of pre-allocated matrix objects (as
long as they're of the same dimensions, base ring, etc.). But it
sounds like that's more or less what Jeroen is proposing (as long as
it's not caching matrices by value?).

CPython itself uses this in several (simpler) areas to avoid object
allocation/initialization overhead for types that are used heavily;
for example, it does this for method objects and frame objects using a
"free list". This requires at least a little hacking around with
tp_alloc, but it can be done from Cython. For something like matrices,
maybe it could be done at least for small matrices over common base rings.
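
As a toy pure-Python illustration of the idea (a real free list hooks
allocation/deallocation in C instead of relying on an explicit release call;
names here are made up):

_FREE_LIST = []      # released instances waiting to be reused
_MAX_FREE = 64       # cap the pool so it cannot grow without bound

class PooledObject:
    def __new__(cls):
        if _FREE_LIST:
            return _FREE_LIST.pop()   # reuse an instance, skipping allocation
        return object.__new__(cls)

    def release(self):
        if len(_FREE_LIST) < _MAX_FREE:
            _FREE_LIST.append(self)   # hand the instance back to the pool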

Erik Bray

Mar 13, 2018, 9:52:12 AM
to sage-devel
While I definitely agree with that statement 100% in principle, when
it comes to writing the Python interpreter itself, it is a valid
concern. And while we're not doing that, we are working at a level
much closer to the interpreter (and much closer to the machine) than
most people writing pure Python code, and we really are in a position
to think about things like efficient cache use.

But I also believe that with most performance concerns, the one
bringing the concern has the burden of proof--that is, to demonstrate
that one particular area of code has noticeable effects on the
efficiency of some other code--before taking any drastic actions to
improve the situation :)

Erik Bray

Mar 13, 2018, 9:52:52 AM
to sage-devel
+1

Jeroen Demeyer

Mar 13, 2018, 10:10:08 AM
to sage-...@googlegroups.com
On 2018-03-13 14:42, Erik Bray wrote:
> But it
> sounds like that's more or less what Jeroen is proposing (as long as
> it's not caching matrices by value?)

I'm not caching the Element (Matrix) but the Parent (MatrixSpace).

Erik Bray

Mar 13, 2018, 10:16:04 AM
to sage-devel
Great--sorry I missed that (I haven't read the full thread in detail)
but that makes perfect sense to me.