eval(repr(df)) <- can this be feasible?

Jacob Stevens-Haas

May 14, 2021, 11:04:26 PM
to PyData
Hi all,

Would it be desirable for pandas to let repr follow the eval(repr(obj)) convention when an option is set for it, at least for simple arrays?

I built an integration test for some processing steps that take a fairly large input, clean it, train a model on it, and make predictions.  The output is a dataframe that would be difficult to hand-enter into our test module.  I was hoping to debug, copy the repr of `result`, and paste it into `expected = eval(<copied string>)`

Obviously, this didn't work; I think that's because pandas uses repr primarily to prepare output for display.  Nevertheless, could it be possible, at least for simple arrays?  I tried:

```
import re

import numpy
import pandas as pd

temp = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'], index=['i1', 'i2'])


def optimistic_repr(df):
    col_rep = repr(df.columns)
    ind_rep = repr(df.index)
    arr_rep_raw = repr(df.values)
    # Prefix every numpy name (array, int64, ...) with "numpy." so eval() can resolve it.
    replace = lambda m: 'numpy.' + m.group(1)
    # (?!=) skips names used as keyword arguments, e.g. the "dtype" in "dtype=int64".
    pattern = '(' + '(?!=)|'.join(numpy.__dict__.keys()) + '(?!=))'
    arr_rep = re.sub(pattern, replace, arr_rep_raw)
    return f"DataFrame({arr_rep}, columns={col_rep}, index={ind_rep})"


temp_repr = optimistic_repr(temp)
```
temp is:

    A  B
i1  1  2
i2  3  4

the optimistic repr is:
"DataFrame(numpy.array([[1, 2],\n [3, 4]], dtype=numpy.int64), columns=Index(['A', 'B'], dtype='object'), index=Index(['i1', 'i2'], dtype='object'))"

Then the following code works as expected:
```
from pandas import DataFrame, Index
import numpy
eval(temp_repr)
```
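
To check that the round trip really reproduces the frame, the evaluated result can be compared with `pandas.testing.assert_frame_equal`. This is only a sketch: the string below is hand-written in the same shape as the optimistic repr (not generated by `optimistic_repr`), and the names `frame_repr`, `original` and `rebuilt` are made up for illustration.

```python
import numpy
import pandas as pd
from pandas import DataFrame, Index

# Reference frame, built with explicit dtypes so the comparison does not
# depend on platform or pandas-version defaults.
original = DataFrame(
    numpy.array([[1, 2], [3, 4]], dtype=numpy.int64),
    columns=Index(["A", "B"], dtype="object"),
    index=Index(["i1", "i2"], dtype="object"),
)

# Hand-written string in the same shape as the optimistic repr above.
frame_repr = (
    "DataFrame(numpy.array([[1, 2], [3, 4]], dtype=numpy.int64), "
    "columns=Index(['A', 'B'], dtype='object'), "
    "index=Index(['i1', 'i2'], dtype='object'))"
)

# eval() resolves DataFrame, Index and numpy from the imports above.
rebuilt = eval(frame_repr)

# Raises AssertionError if values, dtypes, index or columns differ.
pd.testing.assert_frame_equal(rebuilt, original)
```

As always with eval, this should only be applied to strings you produced yourself.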

More complicated objects would be trickier (this doesn't even work for a MultiIndex alone, let alone a DataFrame with a MultiIndex).
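
For the MultiIndex case, one possible workaround is to emit a `MultiIndex.from_tuples(...)` call instead of relying on the built-in repr. The helper `multiindex_repr` below is hypothetical (it is not part of pandas), and the sketch assumes the level values themselves have eval-able reprs, as plain strings do.

```python
import pandas as pd
from pandas import MultiIndex


def multiindex_repr(mi):
    """Hypothetical helper: build a string that eval() turns back into an equal MultiIndex."""
    # list(mi) gives one tuple per row; names are passed through unchanged.
    return f"MultiIndex.from_tuples({list(mi)!r}, names={list(mi.names)!r})"


mi = pd.MultiIndex.from_tuples(
    [("a", "x"), ("a", "y"), ("b", "x")], names=["outer", "inner"]
)

mi_repr = multiindex_repr(mi)

# eval() resolves MultiIndex from the import above.
rebuilt_mi = eval(mi_repr)
```

Here `rebuilt_mi.equals(mi)` should hold; level values whose repr is not valid Python on its own (timestamps, for example) would need extra handling.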

-Jake

Dennis O'Brien

May 15, 2021, 1:16:04 PM
to PyData
Hi Jake,

Is your test framework able to read from static files?  If so, saving the dataframe as a Parquet file might be your best approach.  Deserializing from the repr is interesting, but there are more robust ways to do serialization/deserialization that avoid string representations.

This doesn't help with your original question, which _is_ an interesting one.

cheers,
Dennis

Jacob Stevens-Haas

May 15, 2021, 6:13:43 PM
to PyData
Thanks Dennis!  I've serialized data to disk as Parquet before, but that loses the readability advantages of the string representation.  To clarify: The docs for repr() indicate that the idiom

eval(repr(obj))

is useful and often desired.  My use case was the first time I actually wanted to use it, but I can't imagine I'm the first person to ask pandas about it.  Didn't want to start with a GH issue without asking people here first, though.
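
In the meantime, a readable literal that can be pasted into a test module is available for simple frames via `DataFrame.to_dict(orient="split")`. This is a sketch of one possible workaround, not the repr behaviour asked about above; the names `result`, `literal` and `expected` are illustrative, and it assumes the cell values are plain ints or strings so that their reprs are valid Python.

```python
import pandas as pd

result = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"], index=["i1", "i2"])

# "split" orientation yields a dict with the keys 'index', 'columns' and 'data',
# which happen to match the keyword arguments of the DataFrame constructor.
literal = repr(result.to_dict(orient="split"))

# In a test module, the copied literal would be pasted in place of eval(literal).
expected = pd.DataFrame(**eval(literal))

# Raises AssertionError if the rebuilt frame differs from the original.
pd.testing.assert_frame_equal(result, expected)
```

The literal stays human-readable, but dtypes are re-inferred on construction, so frames with non-default dtypes may not round-trip exactly.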
