Bug: TypeError when plotting TimeSeries with NaN data

meelmaar

Oct 23, 2013, 10:19:19 AM
to pyd...@googlegroups.com
I get a TypeError when I try to plot a pandas TimeSeries object that contains NaNs. Oddly, this does not happen when the series contains 'inf' values, or when a normal Series with an integer index is used. All of a sudden (no recent updates that I can think of), all my plotting scripts raise the TypeError, although they worked fine before. I can call dropna() first, but then the missing data is interpolated in the plot, which I do not want; the data is, after all, missing. The following code reproduces the bug:

import numpy as np

import pandas as pd


s = pd.Series(np.arange(10.))

rng = pd.date_range('2012-12-12', periods=10, freq='H')
ts = pd.Series(np.arange(10.), index=rng)


ts.plot() and s.plot() work fine.

Now, if I replace one value with a NaN:

ts[2] = np.NaN

ts.plot() gives the following error:

File "C:\Python27\lib\site-packages\pandas\core\series.py", line 981, in reshape
    return ndarray.reshape(self, newshape, order)

If I set ts[2] = np.inf instead, ts.plot() works.


I am using pandas 0.12 with matplotlib 1.3 on Python(x,y) 2.7.5.1 on Windows 7.
Upgrading to matplotlib 1.3.1 or downgrading to pandas 0.11 did not solve the problem.
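
To check whether a given installation is affected, the steps above can be wrapped in a small script. This is only a sketch under some assumptions: the helper name plot_error is made up for illustration, the non-interactive Agg backend is assumed so that no window is opened, and .iloc and np.nan are used in place of the ts[2] = np.NaN spelling because newer pandas and NumPy releases prefer them.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_error(series):
    """Return None if series.plot() succeeds, else the exception class name."""
    fig, ax = plt.subplots()
    try:
        series.plot(ax=ax)
        fig.canvas.draw()  # force rendering; some errors only appear at draw time
        return None
    except Exception as exc:
        return type(exc).__name__
    finally:
        plt.close(fig)


s = pd.Series(np.arange(10.0))
# Hourly index as in the report; a Timedelta is used instead of a string alias
# because the spelling of the hourly alias differs between pandas versions.
rng = pd.date_range("2012-12-12", periods=10, freq=pd.Timedelta(hours=1))
ts = pd.Series(np.arange(10.0), index=rng)

ts_nan = ts.copy()
ts_nan.iloc[2] = np.nan  # the case reported as failing
ts_inf = ts.copy()
ts_inf.iloc[2] = np.inf  # the case reported as working

for name, obj in [("s", s), ("ts", ts), ("ts_nan", ts_nan), ("ts_inf", ts_inf)]:
    print(name, plot_error(obj))
```

A result of None means the plot was drawn; anything else is the name of the exception that was raised. The output depends on the installed pandas and matplotlib versions.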

Happy to provide more details. Looking forward to your help!

Maarten

Jeff

Oct 23, 2013, 10:39:14 AM
to pyd...@googlegroups.com
Can you post an issue on GitHub with this example? Thanks.

meelmaar

Oct 24, 2013, 4:36:00 AM
to pyd...@googlegroups.com
I have posted it as an issue on GitHub; https://github.com/pydata/pandas/issues/5310

I understand now that this is the preferred platform for discussing and solving this type of problem. Sorry for bothering the PyData group with this.

Cheers

dartdog

Oct 25, 2013, 10:20:18 PM
to pyd...@googlegroups.com
I think the "not a time" value for time series in pandas is NaT; using that should fix it.

meelmaar

Oct 28, 2013, 6:37:55 AM
to pyd...@googlegroups.com
Good idea, dartdog! I just tried to get a NaT into my Series index but did not manage. I could get a None in the index, or put a NaT in a Series' data and use that as an index, which gave the date 0001-255-255 00:00:00 and then caused a TypeError similar to the problem I posted above:

File "C:\Python27\lib\site-packages\numpy\core\numeric.py", line 320, in asarray
    return array(a, dtype, copy=False, order=order)
TypeError: float() argument must be a string or a number

The GitHub issue is closed, but maybe we should open a new one for plotting a TimeSeries with NaT, NaN or None in its index? I suggest the behavior should be to omit the data point from the plot, because one of its x, y coordinates is missing.
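
Until something like that exists in pandas, the suggested behavior can be approximated on the user side. The following is a minimal sketch, not pandas' own behavior: the helper name drop_missing_index and the example data are hypothetical, and only points whose index label is missing are dropped (NaN values are kept).

```python
import pandas as pd


def drop_missing_index(series):
    """Return the series without the points whose index label is missing."""
    keep = series.index.notna()  # boolean array, False where the label is NaT/NaN
    return series[keep]


# Hypothetical example data: the third index label is missing (NaT).
idx = pd.DatetimeIndex(["2013-01-01", "2013-01-02", None, "2013-01-04"])
y = pd.Series([0.0, 1.0, 2.0, 3.0], index=idx)

cleaned = drop_missing_index(y)
print(cleaned)
```

Note that plotting the cleaned series joins the neighbors of the dropped point with a line, so this omits the point but does not leave a visible gap.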

Jeffrey Tratner

Oct 28, 2013, 8:28:56 AM
to pyd...@googlegroups.com

2 things -

1. If this issue isn't resolved, it's fine to reopen it.
2. Can you post the full traceback of that last error? That would make it much easier to determine what's going on...

dartdog

Oct 29, 2013, 9:20:18 AM
to pyd...@googlegroups.com
Did you convert as follows?
In [16]: to_datetime(Series(['Jul 31, 2009', '2010-01-10', None]))

0   2009-07-31 00:00:00
1   2010-01-10 00:00:00
2                   NaT
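
The same conversion as a self-contained script, as a rough sketch: both dates are written in ISO format here (an assumption made for portability, since mixed date formats may need extra arguments in newer pandas releases), and the variable names are only illustrative.

```python
import pandas as pd

# None marks the missing entry; to_datetime turns it into NaT.
raw = pd.Series(["2009-07-31", "2010-01-10", None])
dates = pd.to_datetime(raw)

print(dates)
print(dates.isna().tolist())  # True marks the position of the NaT
```

isna() is the explicit way to find the NaT entries afterwards.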

meelmaar

Oct 30, 2013, 11:15:51 AM
to pyd...@googlegroups.com
Sorry, I can't recall how I did it, but I repeated your method with two more dates:

In [18]: x = pd.to_datetime(pd.Series(['Jul 31, 2009', '2010-01-10', None, '2010-07-31', '2011-10-10']))

In [19]: x
Out[19]:
0   2009-07-31 00:00:00
1   2010-01-10 00:00:00
2                   NaT
3   2010-07-31 00:00:00
4   2011-10-10 00:00:00
dtype: datetime64[ns]


Now x.plot() gives a good plot, with the lines ending at x=1 and x=3 (the gap is not filled).


In [20]: y = pd.Series(data=x.index, index=x.values)

In [21]: y
Out[21]:
2009-07-31    0
2010-01-10    1
NaT           2
2010-07-31    3
2011-10-10    4
dtype: int64


This is good: I have a NaT in the Series index (and not the date 0001-255-255 00:00:00 I got earlier). Evidently the pd.to_datetime() function does its job better than whatever method I used before.
Calling y.plot():

In [22]: y.plot()
Traceback (most recent call last):
  File "<ipython-input-22-2d08882c171f>", line 1, in <module>
    y.plot()
  File "C:\Python27\lib\site-packages\pandas\tools\plotting.py", line 1730, in plot_series
    plot_obj.generate()
  File "C:\Python27\lib\site-packages\pandas\tools\plotting.py", line 856, in generate
    self._make_plot()
  File "C:\Python27\lib\site-packages\pandas\tools\plotting.py", line 1268, in _make_plot
    newline = plotf(*args, **kwds)[0]
  File "C:\Python27\lib\site-packages\matplotlib\axes.py", line 4138, in plot
    self.add_line(line)
  File "C:\Python27\lib\site-packages\matplotlib\axes.py", line 1497, in add_line
    self._update_line_limits(line)
  File "C:\Python27\lib\site-packages\matplotlib\axes.py", line 1508, in _update_line_limits
    path = line.get_path()
  File "C:\Python27\lib\site-packages\matplotlib\lines.py", line 743, in get_path
    self.recache()
  File "C:\Python27\lib\site-packages\matplotlib\lines.py", line 420, in recache
    x = np.asarray(xconv, np.float_)
  File "C:\Python27\lib\site-packages\numpy\core\numeric.py", line 320, in asarray
    return array(a, dtype, copy=False, order=order)
TypeError: float() argument must be a string or a number


Using pylab's plot() or plot_date() on y, or on pairs of y.index.values and y.values, gives a figure, but with all the data points plotted, which I don't think is proper default behavior.

My conclusions:
1. Using a Series containing NaT as a Series.index works well if one uses the pd.to_datetime() function. Other methods produce strange results.
2. NaT values in Series.index are not properly handled by the pandas plot() function.
3. The error may already be caused by funny matplotlib behavior.

Further remarks: I am using matplotlib 1.3.0.

Goyo

Nov 4, 2013, 5:21:18 AM
to pyd...@googlegroups.com


On Wednesday, October 30, 2013 16:15:51 UTC+1, meelmaar wrote:

My conclusions:
1. Using a Series containing NaT as a Series.index works well if one uses the pd.to_datetime() function. Other methods produce strange results.

Not sure what happened here, but it might be related to the fact that numpy.datetime64 does not provide an NA value.
 
2. NaT values in Series.index are not properly handled with pandas plot() function
3. Error may already be caused by funny matplotlib behavior

In my own tests both pandas and matplotlib fail when there are NA values in a date-like X axis. See http://nbviewer.ipython.org/7300408
Not sure how/if both issues are related though.

Goyo

Jeffrey Tratner

Nov 4, 2013, 7:08:11 AM
to pyd...@googlegroups.com

Do you mean pandas' NaT(datetime) or literal nan (float)?

Goyo

Nov 5, 2013, 7:56:14 AM
to pyd...@googlegroups.com

On Monday, November 4, 2013 13:08:11 UTC+1, Jeff Tratner wrote:

Do you mean pandas' NaT(datetime) or literal nan (float)?


By "date-like X axis" I mean a DatetimeIndex with NA values, so pandas' NaT.

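import matplotlib.pyplot as plt
import pandas as pd

y = list(range(7))  # placeholder values (an assumption; the real data is in the linked notebook)
t = pd.DatetimeIndex(
    ['2013-01-01', '2013-01-02', '2013-01-03', None, '2013-01-05', '2013-01-06', '2013-01-07']
)  # None becomes NaT in the index.
plt.plot(t, y)  # This call works, but the actual drawing raises an exception.
plt.draw()  # Raises an exception.
pd.Series(data=y, index=t).plot()  # Raises an exception.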

See the linked notebook for backtraces
http://nbviewer.ipython.org/7300408

The issue with bare matplotlib is actually to be expected: the functions in matplotlib.dates always expect valid dates and do not deal with any kind of NA values.
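
One possible way around this, sketched under assumptions: convert only the valid dates to matplotlib's float date numbers and put nan in place of NaT, since matplotlib breaks a line at nan coordinates. The helper name to_plot_numbers and the y values are hypothetical, and the Agg backend is assumed so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def to_plot_numbers(index):
    """Convert a DatetimeIndex to matplotlib date numbers, with nan for NaT."""
    valid = index.notna()
    numbers = np.full(len(index), np.nan)
    # Only the valid dates are handed to matplotlib.dates.
    numbers[valid] = mdates.date2num(index[valid].to_pydatetime())
    return numbers


t = pd.DatetimeIndex(
    ["2013-01-01", "2013-01-02", "2013-01-03", None,
     "2013-01-05", "2013-01-06", "2013-01-07"]
)
y = np.arange(len(t), dtype=float)  # placeholder values

x = to_plot_numbers(t)
fig, ax = plt.subplots()
ax.plot(x, y)      # the nan in x splits the line into two segments
ax.xaxis_date()    # show the float axis as dates
fig.canvas.draw()
plt.close(fig)
```

The point with the missing date is not drawn and the line is interrupted there, rather than being joined across the gap.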

Of course, datetime64('NaT') is meant to be a sort of NA value for datetime64, but not in the same sense as nan (for example, it is equal to itself). It is not the same thing as pandas' NaT either. Sorry for not being clearer before.
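
Because the equality semantics of these sentinels differ between objects and, as far as I know, between library versions (newer NumPy releases treat NaT comparisons like nan), an explicit check is safer than ==. A small sketch using np.isnat and pd.isna, with illustrative variable names:

```python
import numpy as np
import pandas as pd

np_nat = np.datetime64("NaT")          # NumPy's not-a-time value
t = pd.DatetimeIndex(["2013-01-01", None])  # None becomes pandas' NaT

print(np.isnat(np_nat))      # NumPy-level check for datetime64/timedelta64
print(pd.isna(np_nat))       # pandas recognizes NumPy's NaT as missing
print(pd.isna(pd.NaT))       # pandas' own NaT
print(pd.isna(float("nan"))) # ordinary float nan
print(t.isna())              # element-wise check on an index
```

All of these checks avoid relying on whether a missing value compares equal to itself.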

Goyo