Support for using Symbols as Pandas indexes


Nowan Ilfideme

Feb 9, 2018, 4:51:26 PM
to: sympy
Hi, I ran into problems while trying to use Symbols as the columns of a pandas.DataFrame *.
This is probably not relevant to SymPy development itself, only to how SymPy is used.
The problem seems to come from both frameworks treating callable objects specially:

>>> import pandas as pd
>>> from sympy.abc import x, y

>>> gg = pd.DataFrame([[1,2],[3,3],[4,5]], columns=[x,y])

>>> print(gg[[x]]) # works fine, but returns a DataFrame
   x
0  1
1  3
2  4
>>> print(gg[x])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "...\pandas\core\frame.py", line 1941, in __getitem__
    key = com._apply_if_callable(key, self)
  File "...\pandas\core\common.py", line 441, in _apply_if_callable
    return maybe_callable(obj, **kwargs)
  File "...sympy\core\symbol.py", line 158, in __call__
    return Function(self.name)(*args)
  File "...\sympy\core\function.py", line 761, in __new__
    obj = super(AppliedUndef, cls).__new__(cls, *args, **options)
  File "...\sympy\core\function.py", line 431, in __new__
    pr = max(cls._should_evalf(a) for a in result.args)
  File "...\sympy\core\function.py", line 431, in <genexpr>
    pr = max(cls._should_evalf(a) for a in result.args)
  File "...\sympy\core\function.py", line 449, in _should_evalf
    if arg.is_Float:
  File "...\pandas\core\generic.py", line 3081, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'is_Float'
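
The pandas frames in the traceback show the mechanism: `__getitem__` passes the key through `_apply_if_callable`, which calls any callable key with the frame as its argument instead of using it as a label. A minimal pure-Python sketch of that dispatch follows. `apply_if_callable`, `FakeSymbol`, and the plain dict standing in for the DataFrame are simplified stand-ins written for illustration, not the actual pandas or SymPy code.

```python
def apply_if_callable(maybe_callable, obj, **kwargs):
    """Simplified stand-in for the pandas helper seen in the traceback."""
    if callable(maybe_callable):
        return maybe_callable(obj, **kwargs)
    return maybe_callable


class FakeSymbol:
    """Hypothetical stand-in for a Symbol: usable as a label, but callable."""

    def __init__(self, name):
        self.name = name

    def __call__(self, *args):
        # A real Symbol would try to build an applied function here;
        # this stand-in just rejects an argument it cannot interpret.
        raise AttributeError("argument has no attribute 'is_Float'")


frame = {"a": [1, 3, 4]}  # stands in for the DataFrame
x = FakeSymbol("x")

# A plain, non-callable key passes through untouched.
plain_key = apply_if_callable("a", frame)

# A list is not callable either, so it also passes through untouched.
list_key = apply_if_callable([x], frame)

# A callable key is invoked with the frame instead of being used as a label.
try:
    apply_if_callable(x, frame)
    outcome = "used as label"
except AttributeError as exc:
    outcome = "called: {}".format(exc)
```

Under these assumptions, this matches the observation that `gg[[x]]` goes through while `gg[x]` fails inside `Symbol.__call__`.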

The way to get a pd.Series would be:
>>> gg[[x]].iloc[:,0] # super uncomfortable...
0    1
1    3
2    4
Name: x, dtype: int64


Is there any reasonable way this interaction could be worked around? It seems like a problem on the Pandas end, or maybe even designed functionality, since functions work the same way (but fail faster):

>>> def f(): pass
...
>>> def g(): pass
...
>>> fg = pd.DataFrame([[1,2],[3,3],[4,5]], columns=[f,g])
>>> fg[[f]]
   
<function f at 0x000000000F79BEB8>
0                                   1
1                                   3
2                                   4
>>> fg[f]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "...\pandas\core\frame.py", line 1941, in __getitem__
    key = com._apply_if_callable(key, self)
  File "...\pandas\core\common.py", line 441, in _apply_if_callable
    return maybe_callable(obj, **kwargs)
TypeError: f() takes no arguments (1 given)

But is there any way I can check whether the inputs are Symbols and "freeze" them so that they become uncallable? Maybe some other workaround? Thanks in advance.



* Why am I doing this? Partially due to curiosity, and partially because my users might end up doing the same. It turns out that using Symbols as variable placeholders in a data science workflow might be useful.

Aaron Meurer

Feb 9, 2018, 5:18:10 PM
to: sy...@googlegroups.com
Symbol objects being callable is something that we've wanted to remove
for some time, but I don't know if it will happen any time soon.
https://github.com/sympy/sympy/issues/3539

I don't know of any simple workarounds. My first thought was to use a
subclass, but I can't figure out how to make a subclass of a
callable() class not callable(). And you can't del Symbol.__call__
because it uses __slots__.
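
The subclass difficulty can be reproduced without SymPy. In the sketch below, `Base` and `Sub` are hypothetical classes made up for illustration. Overriding `__call__` with `None` in the subclass does not change what `callable()` reports (at least in CPython); it only makes the call itself fail.

```python
class Base:
    """Hypothetical callable class, standing in for Symbol."""

    def __call__(self, *args):
        return "called"


class Sub(Base):
    # Attempt to "switch off" calling in the subclass.
    __call__ = None


obj = Sub()

# callable() only checks that the type provides __call__, so it still says yes.
still_callable = callable(obj)

# The call itself now fails, which does not help a caller that checks
# callable() first and then calls the object.
try:
    obj(1)
    call_error = None
except TypeError as exc:
    call_error = exc
```

So a library that dispatches on `callable()` would still try to call such an object and would hit a `TypeError` instead.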

The only thing I can recommend is to use any other kind of expression
than Symbol. Even Function('f')(x) isn't callable.
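
Another option in the same spirit, not something proposed in this thread, would be to wrap each label in a thin object that is hashable but not callable. The sketch below is pure Python and has not been tested against pandas; `NonCallableKey` and `CallableLabel` are hypothetical names, and a dict stands in for the column lookup.

```python
class NonCallableKey:
    """Hypothetical wrapper: hashable and comparable, but not callable."""

    __slots__ = ("wrapped",)

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def __hash__(self):
        return hash(self.wrapped)

    def __eq__(self, other):
        if isinstance(other, NonCallableKey):
            return self.wrapped == other.wrapped
        return NotImplemented

    def __repr__(self):
        # Display like the wrapped object so printed labels stay readable.
        return repr(self.wrapped)


class CallableLabel:
    """Stand-in for a callable label such as a Symbol."""

    def __init__(self, name):
        self.name = name

    def __call__(self, *args):
        raise TypeError("not meant to be called")

    def __repr__(self):
        return self.name


x = CallableLabel("x")
key = NonCallableKey(x)

# The wrapper works as a mapping key, and a fresh wrapper around the same
# label finds the same entry.
columns = {key: [1, 3, 4]}
looked_up = columns[NonCallableKey(x)]
```

The cost of this design is that every lookup has to wrap the label first, so it trades the callable problem for some extra ceremony at each access.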

It would be nice if pandas had some way to indicate that an object
shouldn't be considered callable. I looked at the source and it
doesn't look like there is one (it just checks callable(),
https://github.com/pandas-dev/pandas/blob/6485a36483884fb817800a8380a4a4197d6df4ad/pandas/core/common.py#L470).
Perhaps you should open a pandas issue about it.

Aaron Meurer

Nowan Ilfideme

Feb 12, 2018, 5:21:55 AM
to: sympy
Thanks, understood. I wanted to be able to use symbols like "variables" and, at some point, allow transformations via SymPy. As a terrible example:
z = x * log(y)
smart_func(df, z)
print(df[z])
That's not terribly important, however, mostly just convenience.

I submitted an issue here, maybe it will prove a simple fix: https://github.com/pandas-dev/pandas/issues/19654