Finding non-numerical values in a pandas Series


Michael Schmidt

Oct 27, 2011, 7:14:20 PM
to pystatsmodels
I'm having some trouble interpolating each of the columns of a pandas DataFrame that I have.  It seems that pandas thinks that 3 out of 4 columns have some kind of non-numerical objects stored in them - I'm guessing this is the case because when I use DataFrame.values to export the data as a numpy array, the dtype of that array is 'object', and because the interpolate function fails on 3 out of 4 columns by raising a TypeError.

Is there a way to quickly check a DataFrame/Series to make sure that all values are of type float/int?  Is there a way to force that to be the case?  I figured that doing a fillna() on my DataFrame would fill all the holes, but I guess that's not the case.
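One quick way to flag the offending entries in a Series (a sketch using `pd.to_numeric`, which coerces anything unparseable to NaN; the sample values here are hypothetical stand-ins for the weather data):

```python
import pandas as pd

# Hypothetical series mixing floats with stray strings,
# similar to the "Dew Point" column described below.
s = pd.Series([45.0, "M", 46.2, None, "47.1"])

# Coerce to numeric; unparseable entries become NaN.
coerced = pd.to_numeric(s, errors="coerce")

# Entries that were non-null but failed to parse are the offenders.
offenders = s[coerced.isna() & s.notna()]
print(offenders)  # index 1, value "M"
```

Checking `df.dtypes` for `object` columns is the faster first pass; the coercion trick then pinpoints which rows are responsible.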

I've attached a shelve containing a DataFrame (called 'df') to this email.  The offending columns in this are: "Dew Point", "Relative Humidity" and "Wet Bulb", while "Dry Bulb" works just fine for interpolation.  Keep in mind that I've already done fillna() on a previous DataFrame object to obtain this one.

I'd like to ultimately be able to export the data contained in this DataFrame as a structured numpy array, in order to maintain backwards compatibility with some old code of mine.

Thanks,
Mike

[attachment: pandas_example]

Wes McKinney

Nov 5, 2011, 6:57:44 PM
to pystat...@googlegroups.com

Hi Michael,

Well, indeed: if you have a mixed-type DataFrame, grabbing frame.values
will return an object array.

In this case, maybe the best solution is just to cast back to float:

df = df.astype(float)

In [19]: df
Out[19]:
<class 'pandas.core.frame.DataFrame'>
DateRange: 10824 entries, 2010-06-01 00:00:00 to 2011-08-25 23:00:00
offset: <1 Hour>
Data columns:
Dew Point 10824 non-null values
Dry Bulb 10824 non-null values
Relative Humidity 10824 non-null values
Wet Bulb 10824 non-null values
dtypes: float64(1), object(3)

In [20]: df.astype(float)
Out[20]:
<class 'pandas.core.frame.DataFrame'>
DateRange: 10824 entries, 2010-06-01 00:00:00 to 2011-08-25 23:00:00
offset: <1 Hour>
Data columns:
Dew Point 10824 non-null values
Dry Bulb 10824 non-null values
Relative Humidity 10824 non-null values
Wet Bulb 10824 non-null values
dtypes: float64(4)

I would like to add a function to do type inference on object columns
to convert columns back to the right types. I have all the machinery
to do that already (from file parsing, etc.) but not sure what the API
should look like.
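One way such inference could be sketched (this helper is hypothetical, not the pandas API Wes is describing; it leans on `pd.to_numeric` and only accepts a cast that loses no data):

```python
import pandas as pd

def infer_numeric_columns(df):
    """Try to coerce each object column back to a numeric dtype,
    keeping the original column if coercion would destroy values.
    A rough, hypothetical sketch of per-column type inference."""
    out = df.copy()
    for col in out.columns:
        if out[col].dtype == object:
            coerced = pd.to_numeric(out[col], errors="coerce")
            # Accept the cast only if no real values became NaN.
            if coerced.isna().sum() == out[col].isna().sum():
                out[col] = coerced
    return out

df = pd.DataFrame({"a": ["1", "2"], "b": ["x", "y"]})
print(infer_numeric_columns(df).dtypes)  # "a" numeric, "b" still object
```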

Note that you can do:

df.to_records()

to convert that data to a record array. It has some other options;
check the docstring (e.g. you can exclude the index from the returned
array).
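A small illustration of both variants (the frame here is a made-up stand-in for the weather data above):

```python
import pandas as pd

# Hypothetical hourly frame standing in for the attached data.
df = pd.DataFrame(
    {"Dry Bulb": [71.0, 70.0], "Dew Point": [55.0, 54.0]},
    index=pd.date_range("2010-06-01", periods=2, freq="h"),
)

rec = df.to_records()             # the index becomes the first field
rec_no_index = df.to_records(index=False)

print(rec.dtype.names)            # ('index', 'Dry Bulb', 'Dew Point')
print(rec_no_index.dtype.names)   # ('Dry Bulb', 'Dew Point')
```

The result is a NumPy record array, so field access like `rec["Dry Bulb"]` works as it would with any structured array.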

- Wes

Wes McKinney

Nov 5, 2011, 6:59:41 PM
to pystat...@googlegroups.com

Created an issue here to remind me:

https://github.com/wesm/pandas/issues/339
