Simon King, Jan 17, 2012, 5:44:58 AM

to sage-devel

Hi!

Currently, I am trying to fix several memory leaks, namely at the trac
tickets #715, #11521, #12215 and #12313. The common idea is to use weak
references for the caches (in particular for caching coercions and
actions), so that objects can be collected once they are no longer used.
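To illustrate the general idea (only a rough sketch with made-up names,
not the actual code on the tickets):

import weakref

# Rough sketch, not the actual code from #715: the cache holds its
# values only through weak references, so a cached coercion or action
# can be garbage collected as soon as no parent (or other object)
# keeps a strong reference to it any more.
_action_cache = weakref.WeakValueDictionary()

def cached_action(key, construct):
    try:
        return _action_cache[key]
    except KeyError:
        action = construct()   # build the coercion/action on a cache miss
        _action_cache[key] = action
        return action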

But of course it would be easy to overdo it. For example, with the

current patch from #715, the following computation involves the

creation of 481 actions, while it only needs 320 actions without the

patch:

E = J0(46).endomorphism_ring()

g = E.gens()

So, in that example, actions seem to be garbage collected although
they will be used again. However, other examples involving elliptic
curves (e.g., EllipticCurve('960d1').prove_BSD()) are fine: the same
number of actions is created with or without the patch.

Now I wonder: what would you recommend?

* Is there a Python package that provides "soft" references (that is,
something stronger than a weak reference but weaker than an ordinary
one)? That would be a good solution, but Google has not been able to
find one.

* Should I try to trace down "E.gens()" and add a reference to some

parent in an appropriate location, so that the 160 actions in the

example above are prevented from being collected? That would solve the

particular problem, but there is no guarantee that it is the only

example of that type.

* Should I modify my weak version of

`sage.structure.coerce_dict.TripleDict` so that strong references are

used in the first place, but so that the oldest references are

"weakened" when the dictionary grows bigger than a certain threshold?

That is likely to fix the example above and other types of examples as

well, but may create an overhead.

* Or should I simply consider it the price to pay for fixing memory leaks?

Best regards,

Simon

William Stein, Jan 17, 2012, 11:39:03 AM

to sage-...@googlegroups.com

Just out of curiosity, what is the impact on speed?

Simon King, Jan 17, 2012, 12:09:22 PM

to sage-devel

Hi William,

On 17 Jan., 17:39, William Stein <wst...@gmail.com> wrote:
> Just out of curiosity, what is the impact on speed?

There are some timings for #715. Jean-Pierre Flori reported on the ticket:

"""

Running "make ptestlong" gave me:

* sage-5.0.prealpha1 vanilla: 1397.9 sec

* sage-5.0.prealpha1 + 715: 1415.0 sec

"""

I only compared `sage -t sage/schemes`, and the total time for the

tests went up from 647.3 seconds to 658 seconds. So, both Jean-

Pierre's and my timings show a regression of less than 2%. Since the

changes are in coercion and since coercion is ubiquitous, I think this

is a fair benchmark. One could argue that 2% is quite little, given

that it fixes memory leaks of the following type:

sage: K = GF(1<<55,'t')
sage: for i in range(50):
....:     a = K.random_element()
....:     E = EllipticCurve(j=a)
....:     P = E.random_point()
....:     Q = 2*P

(currently, it would fill up all memory, even if the j-invariant was

always the same)
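For the record, one can watch the growth with get_memory_usage(); a
rough variant of the loop above:

sage: m0 = get_memory_usage()
sage: for i in range(50):
....:     a = K.random_element()
....:     E = EllipticCurve(j=a)
....:     P = E.random_point()
....:     Q = 2*P
....:     if i % 10 == 0:
....:         print i, get_memory_usage() - m0  # keeps growing while the cache leaks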

Since the number of actions created during `sage -t sage/schemes` went

up from 41381 to 46157 (11%), I could imagine that one could get rid

of the remaining regression as well. The question is: How, and how

difficult?

#12215, #11521 and #12313 deal with other leaks, and I don't have

timings yet.

Cheers,

Simon

Volker Braun, Jan 17, 2012, 1:28:33 PM

to sage-...@googlegroups.com

First of all, thank you for tackling this issue! I think the speed regression is mild enough that it is not too pressing a concern.

On Tuesday, January 17, 2012 2:44:58 AM UTC-8, Simon King wrote:
> * Should I modify my weak version of
> `sage.structure.coerce_dict.TripleDict` so that strong references are
> used in the first place, but so that the oldest references are
> "weakened" when the dictionary grows bigger than a certain threshold?
> That is likely to fix the example above and other types of examples as
> well, but may create an overhead.

As far as implementation goes, you just have to put every newly-generated entry also in a fixed-size ring buffer. I don't think that this would incur measurable overhead.
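Roughly like this (just a sketch, names made up; collections.deque with
maxlen gives you a fixed-size ring buffer for free):

from collections import deque

# Sketch only: keep strong references to the most recently inserted
# cache values.  Anything older than the last `maxlen` insertions is
# then only weakly referenced by the cache and can be collected.
_recent = deque(maxlen=1000)

def record_insertion(value):
    # call on every insertion into the weak TripleDict; appending to a
    # full deque silently drops the oldest strong reference
    _recent.append(value)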

Simon King, Jan 17, 2012, 1:55:32 PM

to sage-devel

Hi Volker,

On 17 Jan., 19:28, Volker Braun <vbraun.n...@gmail.com> wrote:

> As far as implementation goes, you just have to put every newly-generated

> entry also in a fixed-size ring buffer.

Sure, this is what I had imagined (although I didn't know the term
"ring buffer").

> I don't think that this would incur

> measurable overhead.

Creating a new entry in the dictionary would be slower, because you
would also need to update the ring buffer. But reading an entry from
the dictionary would be unaffected. And that's good, because the
purpose of TripleDict (as far as I understand) is optimized reading
speed.

Cheers,

Simon

Robert Bradshaw, Jan 17, 2012, 4:10:13 PM

to sage-...@googlegroups.com

My original intent with respect to caching was that the parents would

be the root objects, and everything else would be lazily referenced

except through them, but just pushing things through was a large

enough task that I never got this done. In particular, what I was

thinking is that a coercion or action involving R and S would have a

lifetime of min(lifetime(R), lifetime(S)) and not keep R or S alive.

If R + S -> R', then this action would keep R' alive, and once R or S
were deallocated, R' would be as well (assuming it wasn't referenced
elsewhere). This would reflect actual use well, but could be quadratic
in the number of live parents if the full cross product of interactions
were explored.
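In plain Python the bookkeeping could look roughly like this (a sketch
with made-up names, ignoring the actual TripleDict machinery):

import weakref

_cache = {}

def cache_action(R, S, action):
    # the entry keeps `action` (and hence the codomain R') alive, but
    # holds R and S only weakly; when either one is collected, the
    # callback removes the whole entry, so the entry's lifetime is
    # min(lifetime(R), lifetime(S)).
    key = (id(R), id(S))
    def expire(ref, key=key):
        _cache.pop(key, None)
    _cache[key] = (weakref.ref(R, expire), weakref.ref(S, expire), action)

def get_action(R, S):
    entry = _cache.get((id(R), id(S)))
    return entry[2] if entry is not None else None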

+1 to the ring buffer idea as well. Though not as complete a solution,
the buffer could be made reasonably large and still prevent memory
leaks. In terms of performance, I wonder whether sage/schemes does a
lot of large, external operations (e.g. PARI calls, factoring, or
arithmetic that doesn't require coercion). What's the slowdown for
ZZ * RDF for this patch?

- Robert

Simon King, Jan 17, 2012, 4:21:44 PM

to sage-devel

Hi Robert,

On 17 Jan., 22:10, Robert Bradshaw <rober...@math.washington.edu>
wrote:
> In particular, what I was
> thinking is that a coercion or action involving R and S would have a
> lifetime of min(lifetime(R), lifetime(S)) and not keep R or S alive.
> If R + S -> R', then this action would keep R' alive, and once R or S
> were deallocated, R' would be as well (assuming it wasn't referenced
> elsewhere).

Yes, that is what I am aiming at.

> What's the slowdown for ZZ * RDF for this patch?

Do you mean "how long does it take to get the multiplication action of
ZZ on RDF out of the cache", or "how long does it take to multiply an
integer with a real"?

Cheers,
Simon

Robert Bradshaw, Jan 17, 2012, 6:20:53 PM

to sage-...@googlegroups.com

Both, but primarily the latter. It's a microbenchmark, but a loop like

a = Integer(10)
b = QQ(20)
s = RDF(30)
for x in range(10**n):
    s += a*b*x

should give us an upper bound on how expensive any changes could be.
(And yes, people write code like this...) Maybe a similar test with a
tower of small finite fields. The impact could be greater for more
complicated rings, but the relative impact probably smaller.

- Robert

Simon King, Jan 18, 2012, 6:56:45 AM

to sage-devel

Hi Robert,

Hooray!

On 18 Jan., 00:20, Robert Bradshaw <rober...@math.washington.edu>
wrote:
> Both, but primarily the latter. It's a microbenchmark, but a loop like
>
> a = Integer(10)
> b = QQ(20)
> s = RDF(30)
> for x in range(10**n):
>     s += a*b*x
>
> should give us an upper bound on how expensive any changes could be.

I did the following on my laptop:

sage: def test(n):
....:     a = Integer(10)
....:     b = QQ(20)
....:     s = RDF(30)
....:     for x in xrange(10**n):
....:         s += a*b*x

And then, sage-5.0.prealpha0+#11780 yields

sage: %time test(6)
CPU times: user 7.25 s, sys: 0.04 s, total: 7.29 s
Wall time: 7.31 s

whereas adding #715 yields

sage: %time test(6)
CPU times: user 7.29 s, sys: 0.01 s, total: 7.31 s
Wall time: 7.31 s

So, no difference whatsoever!

> (And yes, people write code like this...) Maybe a similar test with a
> tower of small finite fields.

I don't understand what that would look like.

I'll update the trac ticket with your example.

Thank you,
Simon

Jean-Pierre Flori, Jan 18, 2012, 10:06:15 AM

to sage-devel

By the way, the "small" regression in my timings you posted above is

actually non-existent.

Those were the first timings I ran (and posted on #715). Afterwards, I
ran further "make ptestlong" runs, and the variance was big enough for
the above difference to be meaningless: I sometimes got a longer time
from a subsequent vanilla-only run than from the initial vanilla+#715
run, and a shorter time from a vanilla+#715 run than from the initial
vanilla-only run.

The test proposed by Robert and your implementation of it confirm that.
As I just posted on #715, I guess that we do indeed delete some actions
that could get reused during the test suite, but we also access the
dictionaries quickly, so there is no regression.
