GSoC '21 Proposal

Mayank Raj

Apr 10, 2021, 6:47:57 PM
to sympy
Hello everyone, I've made a proposal for GSoC '21 on implementing the Simplex method for solving LPP. It would be really helpful if someone could provide feedback on this. Here you can find my proposal.
Thanks

With regards
Mayank Raj

David Bailey

Apr 11, 2021, 11:26:10 AM
to sy...@googlegroups.com
Dear group,

Recently Bruce Allen discussed a problem that he had after pickling a
list. However, I think he revealed a deeper problem that is nothing to
do with pickling.

w=sin(Symbol("x", positive=True))*cos(Symbol("x"))

       sin(x)*cos(x)

test=Integral(w,x)

        Integral(sin(x)*cos(x), x)

test.doit()

        sin(x)*sin(x)

OK, that was a bit contrived, but it does show how worryingly easy it is
to generate multiple symbols with the same name in the same Python context.

Clearly when processing the first statement, SymPy inserts x into its
data base as a symbol with an assumption. After that it creates a
distinct symbol x without that assumption!

I would have thought it would be kinder to cause an exception in that
case to prevent the ensuing confusion.

David

Oscar Benjamin

Apr 11, 2021, 1:18:11 PM
to sympy
On Sun, 11 Apr 2021 at 16:26, David Bailey <da...@dbailey.co.uk> wrote:
>
> Dear group,
>
> Recently Bruce Allen discussed a problem that he had after pickling a
> list. However, I think he revealed a deeper problem that is nothing to
> do with pickling.
>
> w=sin(Symbol("x", positive=True))*cos(Symbol("x"))
>
> sin(x)*cos(x)
>
> test=Integral(w,x)
>
> Integral(sin(x)*cos(x), x)
>
> test.doit()
>
> sin(x)*sin(x)
>
> OK, that was a bit contrived, but it does show how worryingly easy it is
> to generate multiple symbols with the same name in the same Python context.

The solution is straightforward: just call Symbol once for each
symbol that you want to use e.g.:

x = Symbol('x')
w = sin(x)*cos(x)

These kinds of problems come from wanting to construct the symbol in
different places and have it be the same symbol like in:

def func1():
    x = Symbol('x')
    expr = x**2 + 1
    return func2(expr)

def func2(expr):
    x = Symbol('x')
    return expr.subs(x, 2)

Although that approach sometimes works it won't always work with
assumptions and it's bad practice anyway. The robust solution here is
to create the symbol once and pass the symbol from func1 to func2:

def func1():
    x = Symbol('x')
    expr = x**2 + 1
    return func2(expr, x)

def func2(expr, x):
    return expr.subs(x, 2)
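
A minimal sketch of why passing the symbol matters once assumptions are involved. The variant of func1 below, which returns both the expression and its symbol, is a hypothetical illustration and not code from this thread:

```python
from sympy import Symbol

def func1():
    # The symbol carries an assumption here.
    x = Symbol('x', positive=True)
    expr = x**2 + 1
    return expr, x

expr, x = func1()

# Recreating "x" without the assumption yields a different symbol,
# so the substitution silently leaves the expression unchanged.
unchanged = expr.subs(Symbol('x'), 2)

# Passing the original symbol object substitutes as intended.
replaced = expr.subs(x, 2)
```

No error is raised in the first case, which is what makes the mistake easy to miss.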

While it can be convenient in interactive use I never write code that
depends on being able to reconstruct a symbol (although this is widely
used in the sympy test suite e.g. for the integration constants
returned by dsolve).

> Clearly when processing the first statement, SymPy inserts x into its
> data base as a symbol with an assumption. After that it creates a
> distinct symbol x without that assumption!

Don't think of there being any "database". There is a cache but it is
not supposed to have any noticeable effect except when you compare
objects like "a is b". Demonstration:

In [11]: x1 = Symbol('x')

In [12]: other_symbols = symbols('y:10000')  # create other symbols to blow the cache

In [13]: x2 = Symbol('x')

In [14]: x1 is x2
Out[14]: False

In [15]: x1 == x2
Out[15]: True

Here x1 and x2 are distinct Python objects (x1 is x2 -> False). They
compare as equal in SymPy (x1 == x2 -> True) because they have the
same name and assumptions. SymPy does not keep any registry of the
symbols created, it just compares their properties.
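
As a small sketch of that comparison rule (the variable names a, b and p are only for illustration):

```python
from sympy import Symbol

# Same name, same assumptions: the symbols compare equal.
a = Symbol('x')
b = Symbol('x')

# Same name, different assumptions: a different symbol.
p = Symbol('x', positive=True)

same = (a == b)
differ = (a != p)

# Because a and p differ, their product is not collapsed into x**2
# and the expression contains two free symbols that both print as x.
product = a * p
n_symbols = len(product.free_symbols)
```

Here same and differ are both true, and n_symbols is 2 even though the product prints as x*x.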

> I would have thought it would be kinder to cause an exception in that
> case to prevent the ensuing confusion.

I'm not sure when you would expect that the exception would be raised
but I don't think there's any way to implement this without causing a
lot of unjustifiable breakage for code that uses sympy as a library.
Suppose I make a library that depends on sympy and defines a function
called do_stuff which internally creates symbols. The do_stuff
function might be like:

def do_stuff():
    x = Symbol('x')
    r1, r2 = solve(x**2 - 2, x)
    return r1, r2

Now a user of this library might do something like:

from sympy import Symbol
from library import do_stuff

x = Symbol('x', positive=True)
roots = do_stuff()

Now at the point when the do_stuff function calls Symbol there already
exists a Symbol with the same name. However these symbols are used in
different parts of the code and have no possibility of interacting. We
couldn't raise an exception at the moment Symbol('x') is created
because that would break all kinds of things.

Alternatively there could be a check for whether symbols with the same
name but different assumptions are present in a given expression.
Checking for that as part of every arithmetic operation would slow
things down considerably though.


Oscar

Bruce Allen

Apr 12, 2021, 1:53:27 AM
to sy...@googlegroups.com
Hi David,

Thanks for picking this up. I wanted to comment that in your example,
the two symbols are defined differently:

Symbol("x", positive=True)
Symbol("x")

that is to say, with different assumptions. For the issue that I was
reporting, the two symbols were defined with identical assumptions
but were nevertheless being treated differently.

I do agree that if it's possible for a user to declare two different
symbols that have the same name (as the first argument of Symbol) but
which are not the same, this is likely to lead to confusion. On the
other hand, there must be a notion of the scope of a declaration.
It's helpful (within a function for example) to be able to define a
symbol which has its scope limited to that function, even if (perhaps
unknown to the programmer) the Symbol shares its first argument name
with another Symbol declared in a different scope. Unfortunately I
don't know enough about python and sympy to know if my old-fashioned
"procedural" notions of scope apply here.

Cheers,
Bruce

mit videos

Apr 12, 2021, 3:53:03 AM
to sympy
Hello everyone, I've made a proposal for GSoC '21 on implementing a neural network to solve symbolic integration problems. It would be really helpful if someone could provide feedback on this.


Open above and you can find my proposal.
Thanks

with regards
Haoyu Z

David Bailey

Apr 12, 2021, 6:50:15 AM
to 'Bruce Allen' via sympy
On 12/04/2021 06:53, 'Bruce Allen' via sympy wrote:
> Hi David,
>
> Thanks for picking this up.  I wanted to comment that in your example,
> the two symbols are defined differently:
>
> Symbol("x", positive=True)
> Symbol("x")
>
> that is to say, with different assumptions. For the issue that I was
> reporting, the two symbols were defined with identical
> assumptions, but were nevertheless being treated differently.

Well yes, my example was contrived, as I said. It mimicked what you
encountered because of the pickling fault.

However, my feeling is that some proportion of SymPy users will work
interactively - in one scope - without defining any Python functions. So
they might calculate a polynomial without regard to any assumptions, and
then wish to apply an assumption for one specific calculation and hit
the problem you encountered.

>
> I do agree that if it's possible for a user to declare two different
> symbols that have the same name (as the first argument of Symbol) but
> which are not the same, this is likely to lead to confusion. On the
> other hand, there must be a notion of the scope of a declaration.
I think all you need to do is redefine x using inconsistent assumptions.
This can be done in the same scope - there is no analogy with C or
Fortran, where you can't declare a variable twice in the same scope.
Indeed, you can call Symbol as many times as you like - it doesn't
define anything, it just sets up an instance of a class.

I would have thought that it would not be hard for the SymPy developers
to provide a switch that could fault the situation where a symbol is
redefined with the same name but different assumptions. I fully accept
that this would be mostly useful for interactive work.

I feel that gotchas of this sort could cause some real disaster because
it might be buried in a mass of algebra, and not be noticed before
papers or theses had been written.

David

Bruce Allen

Apr 12, 2021, 7:03:51 AM
to sy...@googlegroups.com
Hi David,

> However, my feeling is that some proportion of SymPy users will work
> interactively - in one scope - without defining any Python functions. So
> they might calculate a polynomial without regard to any assumptions, and
> then wish to apply an assumption for one specific calculation and hit
> the problem you encountered.

Agreed.

>> On the
>> other hand, there must be a notion of the scope of a declaration.

> I think all you need to do is redefine x using inconsistent assumptions.
> This can be done in the same scope - there is no analogy with C or
> Fortran, where you can't declare a variable twice in the same scope.
> Indeed, you can call Symbol as many times as you like - it doesn't
> define anything, it just sets up an instance of a class.

Unfortunately my OO background/knowledge is not strong enough to
understand this properly. Roughly speaking, I know what your words
mean, but not what the implications are.

> I would have thought that it would not be hard for the SymPy developers
> to provide a switch that could fault the situation where a symbol is
> redefined with the same name but different assumptions. I fully accept
> that this would be mostly useful for interactive work.

IMO it would be good to flag ANY instance of a symbol redefinition that
creates a different object than the original one. But if I understood
Oscar's reply to your message, this could add a lot of overhead. I
suppose there is a function which is called each time a new Symbol is
instantiated. If that's only called for code that looks like

x = Symbol('x', ...)

then I would not expect a big overhead. But Oscar's reply suggests that
it could be called much more frequently.

> I feel that gotchas of this sort could cause some real disaster because
> it might be buried in a mass of algebra, and not be noticed before
> papers or theses had been written.

Certainly the behavior surprised me, and it means that some of the sympy
code I have written in the past month is wrong. Even though it appears
to work correctly, that's accidental, not by design.

Cheers,
Bruce

Oscar Benjamin

Apr 12, 2021, 8:01:29 AM
to sympy
On Mon, 12 Apr 2021 at 12:03, 'Bruce Allen' via sympy
<sy...@googlegroups.com> wrote:
>
> Hi David,
>
> > However, my feeling is that some proportion of SymPy users will work
> > interactively - in one scope - without defining any Python functions. So
> > they might calculate a polynomial without regard to any assumptions, and
> > then wish to apply an assumption for one specific calculation and hit
> > the problem you encountered.
>
> Agreed.
>
> >> On the
> >> other hand, there must be a notion of the scope of a declaration.
>
<snip>
>
> > I would have thought that it would not be hard for the SymPy developers
> > to provide a switch that could fault the situation where a symbol is
> > redefined with the same name but different assumptions. I fully accept
> > that this would be mostly useful for interactive work.
>
> IMO it would be good to flag ANY instance of a symbol redefinition that
> creates a different object than the original one. But if I understood
> Oscar's reply to your message, this could add a lot of overhead. I
> suppose there is a function which is called each time a new Symbol is
> instantiated. If that's only called for code that looks like
>
> x = Symbol('x', ...)
>
> then I would not expect a big overhead. But Oscar's reply suggests that
> it could be called much more frequently.

It wouldn't be hard to make any new definition of a Symbol with the
same name as a previously created symbol raise an error but it would
break the assumption that it is okay to define a symbol that is only
used local to some context and that assumption is depended on by many
users and downstream libraries and is also used internally by sympy
itself.

I just checked with grep and there are hundreds of lines in sympy's
internal code using either Symbol or symbols to define some symbol.
That's an underestimate because there are other functions that can
create symbols as well. I expect that without some changes it wouldn't
even be possible to import sympy if an error was raised any time two
symbols have the same name.

It would also mean that if you import the sympy.abc module then that
would conflict with defining any single letter symbol name and giving
it any assumptions e.g.:

from sympy import Symbol
from sympy.abc import x
y = Symbol('y', real=True)  # This would raise because abc already defined Symbol('y')

What I said about the overhead is that when creating an expression like

x = Symbol('x')
x2 = Symbol('x', positive=True)
expr = x*x2

it would be possible to check here that expr contains two different
symbols having the same name. However that would need to be checked in
the evaluation of x*x2 and then also in the evaluation of cos(x) +
3*sin(x2) etc. We would need to walk the expression tree looking for
symbols with the same name every time any operation constructs a new
expression which would be too expensive.
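
A check of this kind can still be run on demand against a single finished expression, rather than on every operation. A rough sketch follows; find_name_clashes is a hypothetical helper written for illustration and is not part of SymPy:

```python
from collections import defaultdict
from sympy import Symbol, cos, sin

def find_name_clashes(expr):
    """Return {name: symbols} for names shared by several distinct symbols.

    Hypothetical helper, not a SymPy function.
    """
    by_name = defaultdict(set)
    for sym in expr.free_symbols:
        by_name[sym.name].add(sym)
    return {name: syms for name, syms in by_name.items() if len(syms) > 1}

x = Symbol('x')
x2 = Symbol('x', positive=True)
y = Symbol('y')

# Two different symbols named x appear in the first expression only.
clashing = find_name_clashes(cos(x) + 3*sin(x2) + y)
clean = find_name_clashes(cos(x) + y)
```

Note that this sketch would also report distinct Dummy symbols that happen to share a name, which may or may not be what you want.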

> > I feel that gotchas of this sort could cause some real disaster because
> > it might be buried in a mass of algebra, and not be noticed before
> > papers or theses had been written.
>
> Certainly the behavior surprised me, and it means that some of the sympy
> code I have written in the past month is wrong. Even though it appears
> to work correctly, that's accidental, not by design.

I think that the root of the surprise here is the fact that it
sometimes works. Actually it is better to never depend on independent
calls to Symbol giving interchangeable results regardless of the names
of the symbols. All symbols in your calculation should just be defined
in one place and then if you need to use them somewhere else (e.g. in
a function) then you should pass them through.

Of course pickle does need to depend on this and so we need to fix the
pickling code to do this correctly. I sent a pull request here that
fixes the original issue that Bruce had although it disables pickling
with older versions of the pickle protocol:
https://github.com/sympy/sympy/pull/21260


Oscar

Bruce Allen

Apr 12, 2021, 8:23:08 AM
to sy...@googlegroups.com
Hi Oscar,

> It wouldn't be hard to make any new definition of a Symbol with the
> same name as a previously created symbol raise an error but it would
> break the assumption that it is okay to define a symbol that is only
> used local to some context and that assumption is depended on by many
> users and downstream libraries and is also used internally by sympy
> itself.

This context is what I meant when I was talking about the 'scope' of a
variable, and now it makes sense to me. Is it right that this function
is "broken":

def my_power(n):
    x = Symbol('x')
    expr = x**n
    return expr

because the scope of x is lost on function return? But this function is OK:

def my_power(x, n):
    expr = x**n
    return expr

because when the second function is called, the variable x is already
defined in the scope/context of the calling function?

> What I said about the overhead is that when creating an expression like
>
> x = Symbol('x')
> x2 = Symbol('x', positive=True)
> expr = x*x2
>
> it would be possible to check here that expr contains two different
> symbols having the same name. However that would need to be checked in
> the evaluation of x*x2 and then also in the evaluation of cos(x) +
> 3*sin(x2) etc. We would need to walk the expression tree looking for
> symbols with the same name every time any operation constructs a new
> expression which would be too expensive.

Wouldn't it be enough if the second line above:
x2 = Symbol('x', positive=True)
issued a warning message to the user, saying that "Symbol('x', ...) was
called more than once with the same name"? Or is that what would break
existing code/libraries?

>> Certainly the behavior surprised me, and it means that some of the sympy
>> code I have written in the past month is wrong. Even though it appears
>> to work correctly, that's accidental, not by design.

> I think that the root of the surprise here is the fact that it
> sometimes works. Actually it is better to never depend on independent
> calls to Symbol giving interchangeable results regardless of the names
> of the symbols. All symbols in your calculation should just be defined
> in one place and then if you need to use them somewhere else (e.g. in
> a function) then you should pass them through.

I'll take a closer look at this code. What you write makes sense to me
for code with independent scope/context. What I had not appreciated was
that this could cause an issue in code that shares a context (for
example, in a single interactive session).

> Of course pickle does need to depend on this and so we need to fix the
> pickling code to do this correctly. I sent a pull request here that
> fixes the original issue that Bruce had although it disables pickling
> with older versions of the pickle protocol:
> https://github.com/sympy/sympy/pull/21260

I'm sorry that I have not tested this yet. I'm trying hard to get a
paper finished, and that makes it hard to focus on other things.

Cheers,
Bruce

Oscar Benjamin

Apr 12, 2021, 9:41:19 AM
to sympy
On Mon, 12 Apr 2021 at 13:23, 'Bruce Allen' via sympy
<sy...@googlegroups.com> wrote:
>
> Hi Oscar,
>
> > It wouldn't be hard to make any new definition of a Symbol with the
> > same name as a previously created symbol raise an error but it would
> > break the assumption that it is okay to define a symbol that is only
> > used local to some context and that assumption is depended on by many
> > users and downstream libraries and is also used internally by sympy
> > itself.
>
> This context is what I meant when I was talking about the 'scope' of a
> variable, and now it makes sense to me.

I think you misunderstand how this works in Python. I'm going to guess
that you are more familiar with C and describe this in those terms.

In Python there are objects and then there are names. The Python
expression Symbol('x') creates an object. The Python statement y =
Symbol('x') binds the name y to that object within the current scope
or namespace e.g.:

# bind the name y in the module scope:
y = Symbol('x')

def f():
    # bind the name t in the local scope of the function f
    t = Symbol('x')
    return t

z = f()

Variable names are scoped but objects are not and reside in a global
space. Each call to Symbol('x') adds a new object to that global space
(ignoring SymPy's cache for the moment). In C terms the object itself
is a heap-allocated struct. Binding a name is like making a pointer
point at the struct. Returning from a function actually returns the
pointer. In the above t and z are different pointers to the same
struct referenced by different names in different scopes.

The code implementing Symbol('x') has no way to know the scope of the
variable (pointer) that it is being assigned to. There is no way in
SymPy to know that y and t above are names in different scopes.
Likewise in C I can make a function that returns a pointer to a
heap-allocated object but then there is no way for me to keep track of
what a user does with that pointer:

object *x = make_heap_object();
object *y = x; /* the author of make_heap_object has no control over this */

Within SymPy we can not distinguish where in the Python code SymPy
expressions are being used. We can only look at the values stored in
the "struct" when a user calls a SymPy function.
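
The same point can be sketched directly in Python. Returning id(t) from the function is only a device for this illustration, since in CPython id() gives the address of the object:

```python
from sympy import Symbol

def f():
    t = Symbol('x')      # bind the local name t to a new object
    return t, id(t)      # return the object along with its identity

z, t_id = f()            # the name t is gone, the object is not

# z refers to the very object that t referred to inside f.
same_object = (id(z) == t_id)

# The object stores only its own data, nothing about the names t or z.
stored_name = z.name
```

The symbol still reports its name as 'x', whichever Python name happens to point at it.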

Note that I describe this in terms of structs and pointers as an
analogy but if you use the standard CPython interpreter then that is a
C program and this is literally how it is implemented under the hood.

> Is it right that this function
> is "broken":
>
> def my_power(n):
>     x = Symbol('x')
>     expr = x**n
>     return expr
>
> because the scope of x is lost on function return? But this function is OK:
>
> def my_power(x, n):
>     expr = x**n
>     return expr
>
> because when the second function is called, the variable x is already
> defined in the scope/context of the calling function?

It's not necessarily broken. That depends on the context. Within the
SymPy codebase this would be considered bad practice because there's
no way to know if the user is already using a symbol called `x`.
Instead either the user should be able to specify the symbol or at
least a Dummy symbol should be used e.g.:

In [4]: minpoly(sqrt(2))
Out[4]:
 2
x  - 2

In [5]: [sym] = minpoly(sqrt(2)).free_symbols

In [6]: sym
Out[6]: x

In [7]: type(sym)
Out[7]: sympy.core.symbol.Dummy

In [8]: sym == Symbol('x')
Out[8]: False

In [9]: sym == Dummy('x') # Dummy behaves different to Symbol
Out[9]: False

In [10]: minpoly(sqrt(2), y)
Out[10]:
 2
y  - 2
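
Applied to the my_power example, that pattern might look roughly like this. The optional x parameter is an assumption made for this sketch, not something proposed earlier in the thread:

```python
from sympy import Dummy, Symbol

def my_power(n, x=None):
    """Return x**n, using a private Dummy symbol when no symbol is given.

    Hypothetical variant of the my_power function quoted above.
    """
    if x is None:
        x = Dummy('x')
    return x**n

user_x = Symbol('x', positive=True)

# The caller supplies the symbol, so the result uses exactly that symbol.
explicit = my_power(3, user_x)

# Without a symbol, the result uses a Dummy that cannot clash with
# any Symbol('x') the caller may already be using.
private = my_power(3)
[private_sym] = private.free_symbols
```

Every call to Dummy('x') gives a symbol that is unequal to all others, so private_sym is not equal to Symbol('x') or to a freshly created Dummy('x').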

> > What I said about the overhead is that when creating an expression like
> >
> > x = Symbol('x')
> > x2 = Symbol('x', positive=True)
> > expr = x*x2
> >
> > it would be possible to check here that expr contains two different
> > symbols having the same name. However that would need to be checked in
> > the evaluation of x*x2 and then also in the evaluation of cos(x) +
> > 3*sin(x2) etc. We would need to walk the expression tree looking for
> > symbols with the same name every time any operation constructs a new
> > expression which would be too expensive.
>
> Wouldn't it be enough if the second line above:
> x2 = Symbol('x', positive=True)
> issued a warning message to the user, saying that "Symbol('x', ...) was
> called more than once with the same name"? Or is that what would break
> existing code/libraries?

I think that would lead to a whole load of warnings from sympy itself
let alone other libraries. This would give out warnings for code that
works perfectly fine. I would consider this a breaking change.


Oscar

Aaron Meurer

Apr 12, 2021, 6:27:42 PM
to sympy
The best way to avoid this issue is to be hygienic in how you define
Symbols. My recommended best practices would be

- Always define symbols at the top of your file/notebook, or top of
the function if your use of sympy is restricted to a single function.
- Assign symbols to variables. Don't inline Symbol('x') in an
expression, but rather define x = symbols('x') first. Note that you
can define multiple symbols with the same assumptions on a single line
with the symbols() function.
- Always keep the same assumptions for any given symbol name. So for
instance if t = Symbol('t', positive=True) in one place, you should
always make it positive. Which variables will have which assumptions
will depend on your application.
- Name your symbol variables the same as your symbols, or as something
reasonably close if the symbol name isn't a valid variable name.
- Never overwrite a variable name assigned to a symbol. For example,
don't do this

a = Symbol('a')
...
<some stuff>
...
a = 1.1

This makes it impossible to access the symbol 'a' without recreating
it again. Use a different variable name, or consider storing values in
a dictionary, like {a: 1.1} (note that you can pass dictionaries to
subs()).
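
A short sketch of the dictionary approach. The symbol b and the numeric values are made up for illustration:

```python
from sympy import symbols

a, b, x = symbols('a b x', real=True)
expr = a*x**2 + b

# Numeric values live in a dictionary keyed by the symbols themselves,
# so the variables a and b keep pointing at the symbols.
values = {a: 2, b: 3}

# subs() accepts the dictionary directly.
evaluated = expr.subs(values)
```

After this, expr is still fully symbolic and evaluated is 2*x**2 + 3.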

There's also a confusion I've noticed with this, where people will do
something like this

a, x = symbols('a x', real=True)
a = 1.1
expr = a*x**2

The a = symbols(...) here does nothing, because it is immediately
overwritten by a = 1.1. The fact that this sets 'a' as real is
irrelevant. More correct code would be

x = symbols('x', real=True)
a = 1.1
expr = a*x**2

Or if you wish for 'a' to be symbolic for part of the calculation then
later replaced with a numeric value,

a, x = symbols('a x', real=True)
expr = a*x**2
<symbolic calculations>
evaluated_expr = expr.subs(a, 1.1)

As Oscar noted, SymPy doesn't have a "database" or anything like that.
Symbols are defined as independent objects. On the line 'a = 1.1', the
symbol 'a' is deleted and the variable 'a' is set to 1.1. Creating a
symbol with certain assumptions has no effect on any other object.
SymPy functions are, generally speaking, side-effect free.

Remember that SymPy objects, including symbols, do not and cannot know
anything about the Python variable names they are assigned to. They
are just objects, and Python variables are names that point to them.
Read https://docs.sympy.org/latest/tutorial/gotchas.html and
https://nedbatchelder.com/text/names.html if you are confused about
this.

Aaron Meurer

Chris Smith

Apr 13, 2021, 4:51:49 PM
to sympy
And if you have any doubt about whether clashing symbols are being used, you can use `disambiguate`:

```
>>> from sympy import var
>>> eq=var('x')*var('x',positive=1)
>>> eq
x*x
>>> from sympy.core.symbol import disambiguate
>>> disambiguate(eq)
(x*x_1,)
```
/c