Could we remove "SR.symbols"?

Diego Sejas

Mar 13, 2021, 5:02:25 PM

to sage-devel

Any symbolic ring in Sage has a dictionary called "symbols". (I'll refer specifically to "SR" for the following.) "SR.symbols" contains all the previously defined variables through "SR.var()" and/or "SR.symbol()". However, this generates a problem: Many symbols remain stored there even when they have been overwritten for other purposes. For example, when one starts a new Sage session, Sage executes the line

from sage.calculus.predefined import x

This is done in order to have "x" as a default predefined symbolic variable. However, the whole file "sage/calculus/predefined.py" is executed which calls "SR.var()" on every single letter of the alphabet (lowercase and uppercase). So, when one has a freshly started Sage session (without any additional execution), "SR.symbols" already has length 48:

SR.symbols = {'a':a, ...'z':z, 'A':A, ...,'Z':Z}

Since "x" is the only predefined variable, this should be simply "SR.symbols={'x':x}".

An even bigger problem is seen with the following code:

SR.var('foo')

# Do something meaningful with "foo" until you don't need it anymore

foo = 2

# Do something meaningful with the new meaning of "foo"

This keeps "foo" listed as a symbolic variable in "SR.symbols" even though it is now an Integer. In time, "SR.symbols" gets cluttered with non-existing symbolic variables.
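
The retention described above can be sketched in plain Python. This is only a toy model, not Sage's implementation: "ToyRing" and its "symbols" attribute are hypothetical stand-ins for "SR" and "SR.symbols".

```python
# Toy model only: ToyRing is a hypothetical stand-in for SR, not Sage code.
class ToyRing:
    def __init__(self):
        self.symbols = {}  # maps a symbol name to its symbol object

    def var(self, name):
        # Create the symbol on first use; afterwards return the stored object.
        if name not in self.symbols:
            self.symbols[name] = object()
        return self.symbols[name]


ring = ToyRing()
foo = ring.var('foo')  # the Python name "foo" points to the stored symbol
foo = 2                # rebinding the Python name does not touch the registry
still_registered = 'foo' in ring.symbols
```

In this sketch "still_registered" is True even though the name "foo" now refers to an integer, which mirrors the situation in the post.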

Do we really need this behavior? Is "SR.symbols" necessary at all? Could we remove it?

Note: The motivation for this proposal comes from this Ask SageMath question, where @tmonteil proposed a command "SR.symbols()" for defining symbolic variables. The current "SR.symbols" could be renamed to "SR._symbols_" (this complies with Python's conventions on private variables), but perhaps it is unnecessary to have "SR._symbols_" at all.

(Sorry for the long post.)

Michael Orlitzky

Mar 13, 2021, 8:01:46 PM

to sage-...@googlegroups.com

On Sat, 2021-03-13 at 14:00 -0800, Diego Sejas wrote:
> Any symbolic ring in Sage has a dictionary called "symbols". (I'll refer
> specifically to "SR" for the following.) "SR.symbols" contains all the
> previously defined variables through "SR.var()" and/or "SR.symbol()".
> However, this generates a problem: Many symbols remain stored there even
> when they have been overwritten for other purposes. For example, when one
> starts a new Sage session, Sage executes the line
>
> from sage.calculus.predefined import x
>
> This is done in order to have "x" as a default predefined symbolic
> variable.

This defines two things: a symbol named "x" that is some kind of Python
object, and a Python variable "x" that points to the symbol object. The
distinction is important but you're right regardless.

> However, the whole file "sage/calculus/predefined.py" is executed
> which calls "SR.var()" on every single letter of the alphabet (lowercase
> and uppercase). So, when one has a freshly started Sage session (without
> any additional execution), "SR.symbols" already has length 48:
>
> SR.symbols = {'a':a, ...'z':z, 'A':A, ...,'Z':Z}
>
> Since "x" is the only predefined variable, this should be simply
> "SR.symbols={'x':x}".

If anything, that file should be calling SR.symbol on each letter, or
SR.var on the collection of letters. But I agree with your guess on
ask.sagemath that this is a premature optimization and a waste of
startup time/memory in most cases.

>
> An even bigger problem is seen with the following code:
>
> SR.var('foo')
> # Do something meaningful with "foo" until you don't need it anymore
> foo = 2
> # Do something meaningful with the new meaning of "foo"
>
> This keeps "foo" listed as a symbolic variable in "SR.symbols" even though
> it is now an Integer. In time, "SR.symbols" gets cluttered with
> non-existing symbolic variables.

This is okay. The symbol object still exists even though you've
clobbered the name that refers to it. Having "foo" in SR.symbols allows
you to retrieve the underlying symbol object again (keeping its domain
intact, for example) with foo = SR.symbol("foo"). You don't need to use
the variable name "foo" to point to the "foo" symbol object, of course.
If you ever start to feel that programming in sage is too easy, try
defining x = SR.symbol("y") and y = SR.symbol("x").
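
The difference between a Python name and the symbol it points to can be
sketched without Sage. "ToySymbol" below is a hypothetical class used
only for illustration; it is not how Sage represents symbols.

```python
# Toy model only: ToySymbol is hypothetical and not part of Sage.
class ToySymbol:
    def __init__(self, name):
        self.name = name  # the symbol's own name, used when printing

    def __repr__(self):
        return self.name


# The Python names are deliberately swapped relative to the symbol names.
x = ToySymbol('y')
y = ToySymbol('x')

# Printing uses the symbol names, not the Python variable names.
expr = f"{x!r} + 2*{y!r}"
```

Here "expr" is the string "y + 2*x": the Python name "x" displays as "y"
and the other way around.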

I say this is "okay" because that's how it's intended to work. Maybe on
average that prevents us from garbage collecting a bunch of names for
symbol objects that will never be referenced again. Who knows.

> Do we really need this behavior? Is "SR.symbols" necessary at all? Could
> we remove it?

I have no idea why the dictionary is public in the first place. It
looks to me like it should be private, and that you should use
SR.symbol("foo") to retrieve the stored object rather than
SR.symbols["foo"].

And yes we should avoid predefining the entire alphabet.

Diego Sejas

Mar 14, 2021, 7:42:33 AM

to sage-devel

On Saturday, March 13, 2021 at 9:01:46 PM UTC-4 Michael Orlitzky wrote:

> On Sat, 2021-03-13 at 14:00 -0800, Diego Sejas wrote:
> > Any symbolic ring in Sage has a dictionary called "symbols". (I'll refer
> > specifically to "SR" for the following.) "SR.symbols" contains all the
> > previously defined variables through "SR.var()" and/or "SR.symbol()".
> > However, this generates a problem: Many symbols remain stored there even
> > when they have been overwritten for other purposes. For example, when one
> > starts a new Sage session, Sage executes the line
> >
> > from sage.calculus.predefined import x
> >
> > This is done in order to have "x" as a default predefined symbolic
> > variable.
>
> This defines two things: a symbol named "x" that is some kind of Python
> object, and a Python variable "x" that points to the symbol object. The
> distinction is important but you're right regardless.

Aha! This is a good point! Your observation is indeed correct, and it makes @tmonteil's proposed terminology in the Ask SageMath question I referenced in my original post even more important. In this case, it is valuable to distinguish in language between "variables" (or "Python names") and "symbols".

You are right! The symbol still exists because it is stored in "SR.symbols". Perhaps saying "non-existent symbolic variables" was a poor choice of words on my part.

In any case, I am wondering if storing (caching?) all previously defined variables (like "foo" in my example) has indeed some value. I am thinking of the following points here:

1. I believe it is not very common to need a symbol "foo" (as in "var('foo')") to do some stuff and then reassign the name with something like "foo = 2" while still needing to do symbolic computations with "foo". If that were the case, it would perhaps be better or clearer to keep "foo" as a symbol and use some other name to store the number "2".

2. If one defines "var('foo')", then overwrites it with "foo = 2", and for some reason needs the symbol "foo" again, is there any meaningful advantage in retrieving it from the cache "SR.symbols" instead of redefining it again?

3. On one hand, having "SR.symbols" means every call to "SR.var()" and "SR.symbol()" must check whether the symbol already exists, which has a tiny unnecessary computational cost when the symbol is new (which is most cases); on the other hand, not having "SR.symbols" has another tiny unnecessary computational cost when the variable was previously defined.

4. The everyday user is not aware of the "SR.symbols" dictionary or of the retrieval of previously defined symbols. (I made a quick search of Sage's documentation, and there seems to be no reference to it there, but please correct me if I'm wrong.) This could cause a problem; let me explain. By default, "foo = SR.var('foo')" creates a symbol (and its related Python name) with complex domain. Having the "SR.symbols" dictionary could lead to confusion. For example, consider the following workflow:

foo = SR.var('foo', domain='real')

# Do something meaningful with "foo".

# Do some other things not related to "foo".

# At this point, you forgot that "foo" exists, and now you need a symbol with complex domain.

foo = SR.var('foo')

After this code is executed, I would wrongly assume "foo" is complex. However, since it was retrieved from the cache "SR.symbols", it still has real domain. For the sake of illustration, if after this code you execute "solve(foo^2==-1)", you get "[ ]".
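
The surprise in this workflow can be reproduced with a small plain-Python model. This is a sketch under stated assumptions, not Sage's code: "ToyRing", "ToySymbol" and "_cache" are hypothetical names, and the model simply returns the stored object for a known name. What Sage does when an explicit "domain" is passed for an existing name is not modeled here.

```python
# Toy model only: hypothetical classes, not Sage's symbolic ring.
class ToySymbol:
    def __init__(self, name, domain):
        self.name = name
        self.domain = domain


class ToyRing:
    def __init__(self):
        self._cache = {}  # maps a symbol name to its ToySymbol

    def var(self, name, domain=None):
        # A known name returns the stored object; "domain" is only
        # consulted when the symbol is created for the first time.
        if name not in self._cache:
            self._cache[name] = ToySymbol(name, domain or 'complex')
        return self._cache[name]


ring = ToyRing()
first = ring.var('foo', domain='real')
# ... much later, having forgotten that "foo" already exists ...
second = ring.var('foo')
```

In this model "second" is the very same object as "first", so "second.domain" is 'real' rather than the default 'complex' that the caller expected.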

> > Do we really need this behavior? Is "SR.symbols" necessary at all? Could
> > we remove it?
>
> I have no idea why the dictionary is public in the first place. It
> looks to me like it should be private, and that you should use
> SR.symbol("foo") to retrieve the stored object rather than
> SR.symbols["foo"].
>
> And yes we should avoid predefining the entire alphabet.

Thank you for your insight! For now, I will rename "SR.symbols" to "SR._symbols_" (so it is a private variable), and I will stop predefining the entire alphabet. It should be in the git repository in a couple of hours.

kcrisman

Mar 14, 2021, 1:51:59 PM

to sage-devel

> This is done in order to have "x" as a default predefined symbolic variable. However, the whole file "sage/calculus/predefined.py" is executed which calls "SR.var()" on every single letter of the alphabet (lowercase and uppercase). So, when one has a freshly started Sage session (without any additional execution), "SR.symbols" already has length 48:
>
> SR.symbols = {'a':a, ...'z':z, 'A':A, ...,'Z':Z}

I wonder if that is a remnant from the short period of time when Sage did predefine all those letters as symbolic variables. (Several competitors have different approaches to this; the compromise that lasted for us - in my view, correctly - was to only predefine x.) I'd be interested in any other archaeology around that; otherwise it seems that this could be shortened, as you say. Variables like phi are more likely than e, for instance. I assume that this SR does not clobber the predefined constant e.

Nils Bruin

Mar 14, 2021, 3:25:21 PM

to sage-devel

On Saturday, March 13, 2021 at 5:01:46 PM UTC-8 Michael Orlitzky wrote:

> This is okay. The symbol object still exists even though you've
> clobbered the name that refers to it. Having "foo" in SR.symbols allows
> you to retrieve the underlying symbol object again (keeping its domain
> intact, for example) with foo = SR.symbol("foo"). You don't need to use
> the variable name "foo" to point to the "foo" symbol object, of course.
> If you ever start to feel that programming in sage is too easy, try
> defining x = SR.symbol("y") and y = SR.symbol("x").
>
> I say this is "okay" because that's how it's intended to work. Maybe on
> average that prevents us from garbage collecting a bunch of names for
> symbol objects that will never be referenced again. Who knows.

I don't think the "symbols" list would be the only reference that prevents symbols from being garbage collected. Once a symbol has been translated to another system (through an "expect" or a library interface), a dictionary for back-and-forth translation is maintained. Since this straddles *two* memory managers, it is usually very hard to determine when an object there is ready for collection, so collection tends not to happen. I also don't know if pynac/ginac even supports symbol memory deallocation. So having an immortal symbols list on SR may also just be a record of what is happening in reality anyway. If data structures describing a symbol are never going to be deleted anyway, it's probably better to maintain a link to them.
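
The point about other references can be illustrated in plain Python. This is a sketch only: "symbols" and "interface_table" are hypothetical stand-ins for a collectable registry and an interface translation dictionary, and the immediate cleanup shown relies on CPython's reference counting.

```python
import gc
import weakref


# Toy model only: hypothetical names, not Sage's data structures.
class ToySymbol:
    def __init__(self, name):
        self.name = name


symbols = weakref.WeakValueDictionary()  # registry that allows collection
interface_table = {}                     # translation table, strong references

foo = ToySymbol('foo')
symbols['foo'] = foo
interface_table['foo'] = foo  # e.g. after the symbol was sent to another system

del foo  # drop the user's own reference
alive_after_del = 'foo' in symbols    # the translation table still holds it

interface_table.clear()  # only now is the last strong reference gone
gc.collect()
alive_after_clear = 'foo' in symbols
```

Even with a weak registry, the symbol survives "del foo" as long as the translation table refers to it; in this model it disappears only after that table is cleared.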

Michael Orlitzky

Mar 14, 2021, 5:46:18 PM

to sage-...@googlegroups.com

On Sun, 2021-03-14 at 04:42 -0700, Diego Sejas wrote:
> 2. If one defines "var('foo')", then overwrites it with "foo = 2", and for
> some reason needs the symbol "foo" again, is there any meaningful advantage
> in retrieving it from the cache "SR.symbols" instead of redefining it again?

I think the answers to all of the other questions hinge on this one,
and personally, I don't know. I'm tempted to say "for speed" but I
really doubt that it is faster unless you're creating new symbols in a
tight loop (why?).

The backend for our symbolics, however, is written in C. Unlike in
Python, it's common for a C library to say something like "it is an
error to call this function twice with the same symbol name," and you
are expected to Just Not Do That. Storing the existing symbols in a
dictionary may therefore be a way to keep the underlying C library
happy and consistent.

>
> 4. The everyday user is not aware of the "SR.symbols" dictionary and the
> retrieving of previously defined symbols.... For example, consider the following
> workflow:
>
> foo = SR.var('foo', domain='real')
> # Do something meaningful with "foo".
> # Do some other things not related to "foo".
> # At this point, you forgot that "foo" exists, and now you
> # need a symbol with complex domain.
> foo = SR.var('foo')
>
> After this code is executed, I would wrongly assume "foo" is complex.
> However, since it was retrieved from the cache "SR.symbols", it still has
> real domain. For the sake of illustration, if after this code you execute
> "solve(foo^2==-1)", you get "[ ]".
>

We have nothing but bad choices here. Here's another example:

sage: x = SR.var('x', domain='complex')
sage: f = sqrt(x)
sage: x = SR.var('x', domain='real')

If you now expect the underlying symbol object to change its domain,
then you've covertly changed the meaning of "f". Or if you expect to
get back a new, real symbol named x... then you've got two symbols
alive with the same name that live in different spaces.

Some of these options may not be possible given the constraints imposed
by the underlying C library, but what I'm trying to say is, be careful
what you wish for!
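
Both outcomes can be sketched in plain Python. This is a toy model
under stated assumptions, not Sage or its backend: "ToySymbol" is a
hypothetical class, and a tuple stands in for a symbolic expression.

```python
# Toy model only: hypothetical class, not Sage's symbols or expressions.
class ToySymbol:
    def __init__(self, name, domain):
        self.name = name
        self.domain = domain


# Option A: keep one object per name and change its domain in place.
x = ToySymbol('x', 'complex')
f = ('sqrt', x)                 # "expression" holding a reference to x
x.domain = 'real'
f_domain_after = f[1].domain    # f now refers to a real symbol

# Option B: hand back a fresh object for the same name.
x_old = ToySymbol('x', 'complex')
g = ('sqrt', x_old)
x_new = ToySymbol('x', 'real')
same_name = x_old.name == x_new.name
same_object = x_old is x_new
```

In option A the meaning of "f" changes without "f" being touched; in
option B two distinct objects share the name "x" while living in
different domains.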

Nils Bruin

Mar 15, 2021, 1:01:21 AM

to sage-devel

On Sunday, March 14, 2021 at 2:46:18 PM UTC-7 Michael Orlitzky wrote:

> On Sun, 2021-03-14 at 04:42 -0700, Diego Sejas wrote:
> > 2. If one defines "var('foo')", then overwrites it with "foo = 2", and for
> > some reason needs the symbol "foo" again, is there any meaningful advantage
> > in retrieving it from the cache "SR.symbols" instead of redefining it again?
>
> I think the answers to all of the other questions hinge on this one,
> and personally, I don't know. I'm tempted to say "for speed" but I
> really doubt that it is faster unless you're creating new symbols in a
> tight loop (why?).

I'd say there's a benefit in that the symbol object is probably already kept alive in the interface dictionaries. It's better to have it available for reuse at the Python level than to create another Python wrapper object that may or may not satisfy the equality properties that you'd want it to have. This is something that could be further investigated, though.
