zipline with pandas datareader


Fab6

Dec 13, 2012, 3:25:52 PM
to zip...@googlegroups.com
Hello

I am testing the pandas datareader together with the dual_moving_average-example (please see attached example):

    data=DataReader( 'AAPL' , 'yahoo' , start= pd.datetime(1999, 1, 1, 0, 0, 0, 0, pytz.utc))

Unfortunately I get this error message:

Error message
====================================================================================================
Traceback (most recent call last):
  File "dual_moving_average_pandas.py", line 77, in <module>
    results = dma.run(data)
  File "build\bdist.win32\egg\zipline\algorithm.py", line 186, in run
  File "build\bdist.win32\egg\zipline\utils\factory.py", line 114, in create_trading_environment
  File "build\bdist.win32\egg\zipline\finance\trading.py", line 86, in __init__
  File "datetime.pyx", line 405, in pandas.lib._Timestamp.__richcmp__ (pandas\src\tseries.c:31147)
  File "datetime.pyx", line 455, in pandas.lib._Timestamp._assert_tzawareness_compat (pandas\src\tseries.c:31754)
Exception: Cannot compare tz-naive and tz-aware timestamps

Compilation exited abnormally with code 1 at Thu Dec 13 21:18:52


Does anyone have a hint what I am doing wrong?
Best Regards
Fab
dual_moving_average_pandas.py

Thomas Wiecki

Dec 13, 2012, 5:11:29 PM
to Fab6, zip...@googlegroups.com
Hi Fab,

I think that's an out of date pandas. Can you try upgrading to 0.9.1? For good measure, try updating numpy as well (if you haven't). Let us know if that doesn't help!

Thomas



Fabian Braennstroem

Dec 14, 2012, 12:20:48 AM
to zip...@googlegroups.com, Thomas Wiecki
Hello Thomas,

Thanks for the quick reply! I tried this just now. Unfortunately, after updating pandas to 0.9.1 and numpy to 1.7.0b1, I now get this error:


====================================================================================================
Traceback (most recent call last):
  File "dual_moving_average_pandas.py", line 78, in <module>
    results = dma.run(data)
  File "build\bdist.win32\egg\zipline\algorithm.py", line 186, in run
  File "build\bdist.win32\egg\zipline\utils\factory.py", line 114, in create_trading_environment
  File "build\bdist.win32\egg\zipline\finance\trading.py", line 86, in __init__
  File "datetime.pyx", line 397, in pandas.lib._Timestamp.__richcmp__ (pandas\src\tseries.c:31200)
TypeError: can't compare offset-naive and offset-aware datetimes

Do you have an idea?

Best Regards
Fab
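
For reference, the TypeError text in the traceback above is the same message Python's own datetime type gives when a naive value is ordered against an aware one. A minimal sketch of that situation, using only the standard library and Python 3 syntax (the thread itself runs Python 2.7); the variable names are made up for illustration:

```python
from datetime import datetime, timezone

naive = datetime(2012, 1, 3)                        # no tzinfo attached
aware = datetime(2012, 1, 3, tzinfo=timezone.utc)   # tz-aware, UTC

try:
    naive < aware                                   # ordering naive against aware
    comparison_failed = False
    message = ""
except TypeError as exc:
    comparison_failed = True
    message = str(exc)

# Attaching UTC to the naive value makes the two comparable.
fixed = naive.replace(tzinfo=timezone.utc)
same_instant = (fixed == aware)
```

The point is only that both sides of a comparison need to agree on whether they carry a timezone; which side zipline expects is discussed in the replies.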

Michael Wills

Dec 14, 2012, 11:29:50 AM
to Fabian Braennstroem, zip...@googlegroups.com, Thomas Wiecki
I had that issue when I pulled in my own data before it could be used. You're only using zipline data though, correct?



Fabian Braennstroem

Dec 14, 2012, 2:28:32 PM
to Thomas Wiecki, zip...@googlegroups.com
Hello Thomas,

Thanks for the help! However, I now get the error below. Do you have an idea?


====================================================================================================
Traceback (most recent call last):
  File "dual_moving_average_pandas.py", line 73, in <module>
    data.tz_convert('utc')
  File "d:\Python27\lib\site-packages\pandas-0.9.1-py2.7-win32.egg\pandas\core\generic.py", line 946, in tz_convert
    new_ax = ax.tz_convert(tz)
  File "d:\Python27\lib\site-packages\pandas-0.9.1-py2.7-win32.egg\pandas\tseries\index.py", line 1290, in tz_convert
    raise Exception('Cannot convert tz-naive timestamps, use '
Exception: Cannot convert tz-naive timestamps, use tz_localize to localize

Best Regards
Fabian

On 14.12.2012 13:12, Thomas Wiecki wrote:
Hm, then it probably is an honest tz mismatch. Can you try .tz_convert('utc') on your dataframe before you pass it? If that works we should probably do that conversion by default.
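
The difference between the two calls can be sketched as follows. This assumes a recent pandas release (where the failure is a TypeError rather than the plain Exception shown in the traceback above), and it uses a small hand-built frame with placeholder values as a stand-in for the Yahoo download:

```python
import pandas as pd

# Hypothetical stand-in for the downloaded data: a tz-naive daily index
# with placeholder values, not real prices.
idx = pd.date_range("2012-01-03", periods=3, freq="D")
data = pd.DataFrame({"AAPL": [1.0, 2.0, 3.0]}, index=idx)

try:
    data.tz_convert("UTC")        # convert needs an index that already has a tz
    convert_error = None
except TypeError as exc:          # recent pandas raises TypeError here
    convert_error = str(exc)

# tz_localize attaches a timezone to the naive index without shifting the values.
localized = data.tz_localize("UTC")
```

tz_convert moves timestamps from one known zone to another, so it has nothing to work with on a naive index; tz_localize is the step that declares which zone the naive values are in.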

Fabian Braennstroem

Dec 14, 2012, 2:30:24 PM
to Michael Wills, zip...@googlegroups.com, Thomas Wiecki
Hello Michael,

I do not understand... I am actually planning to read in hdf5 data, but as I got similar problems there I reduced it to the pandas reader.
What do you mean by using zipline data?

Best Regards
Fabian

Michael Wills

Dec 15, 2012, 12:20:08 AM
to Fabian Braennstroem, zip...@googlegroups.com, Thomas Wiecki
It's basically what Thomas said, but you probably need the localize step as well. My broker data is in US/Eastern, but pydata didn't detect that automatically, so before I could convert it to UTC I had to localize it first so the converter would know how to handle the conversion. So in my case I needed:

datetime_index.tz_localize('US/Eastern').tz_convert('UTC')

Fabian Braennstroem

Dec 15, 2012, 12:45:32 AM
to Michael Wills, zip...@googlegroups.com, Thomas Wiecki
Hello Michael,

Thanks for the quick help! It seems that it is not working yet... I think I need to start understanding the time handling in pandas by reading a bit more.
If you have more hints, though, I am happy to read them.

Best Regards
Fabian

Michael Wills

Dec 15, 2012, 1:24:53 AM
to Fabian Braennstroem, zip...@googlegroups.com, Thomas Wiecki
Time handling is great in pandas, but you do have to convert your data to what it expects. My data is OHLC data with a non-standard time format, split across two columns, one for the date and one for the time. I load the data with pandas, and this is what I go through:

loaded_data has a Date and a Time column

    from datetime import datetime
    import pandas as pd

    # make the raw date and time strings Python datetimes, one per row
    loaded_data['dt'] = [datetime.strptime(d + " " + t + ":00", '%m-%d-%Y %H:%M:%S')
                         for d, t in zip(loaded_data['Date'], loaded_data['Time'])]
    del loaded_data['Date']  # cleanup
    del loaded_data['Time']
    loaded_data['Date'] = loaded_data['dt']
    # make the datetimes the index, then get the timezone right: localize first, then convert
    loaded_data.index = pd.DatetimeIndex(loaded_data['Date']).tz_localize('US/Eastern').tz_convert('UTC')
    loaded_data[symbol] = loaded_data.Close
    loaded_data.save('data.dat')  # cache it for later (DataFrame.save in pandas 0.9; newer releases use to_pickle)

I hope that helps. I'm still new to pandas and haven't had much time to work with it yet as I will be testing it with currencies instead of stocks. Leverage, margin, liquidity are a bit different so I have some work yet to get it to run as expected.

Let me know if that helps or if it gets you closer. Otherwise, could you post any data that resembles what you're working with (just matching the formatting) and the other code in use?
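
The same steps can probably be done without going row by row. The following is only a sketch under assumptions: a recent pandas, a frame laid out as described above (month-day-year dates and hour:minute times as strings), and placeholder rows invented for illustration:

```python
import pandas as pd

# Hypothetical rows in the described layout; the Close values are placeholders.
loaded_data = pd.DataFrame({
    "Date": ["12-14-2012", "12-14-2012"],
    "Time": ["09:30", "09:31"],
    "Close": [1.0, 2.0],
})

# Parse both string columns in one vectorized call.
stamps = pd.to_datetime(loaded_data["Date"] + " " + loaded_data["Time"],
                        format="%m-%d-%Y %H:%M")

# Declare the source timezone first, then convert to UTC.
loaded_data.index = pd.DatetimeIndex(stamps).tz_localize("US/Eastern").tz_convert("UTC")
loaded_data = loaded_data.drop(columns=["Date", "Time"])
```

Passing an explicit format keeps the month-first dates from being guessed, and the localize-then-convert order matches the one-liner given earlier in the thread.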

Thomas Wiecki

Dec 15, 2012, 10:05:33 AM
to Michael Wills, Fabian Braennstroem, zipline
I think you can just call .tz_localize('utc') instead of .tz_convert() on the pandas dataframe. Does that not work?

Fabian Braennstroem

Dec 16, 2012, 9:13:12 AM
to Thomas Wiecki, Michael Wills, zipline
Hello,

it works now with this part:
    symbol='AAPL'
    data=DataReader( 'AAPL' , 'yahoo' , start= pd.datetime(2012, 1, 1, 0, 0, 0, 0),end=pd.datetime(2012,12,1,0,0,0,0))
    print 100*"-"
    data=pd.DataFrame(data['Close'],columns=['AAPL'])
    # data = load_from_yahoo(stocks=[symbol], indexes={}, start= pd.datetime(2012, 1, 1, 0, 0, 0, 0, pytz.utc), end= pd.datetime(2012, 12, 14, 0, 0, 0, 0, pytz.utc))
    data=data.tz_localize('UTC')

Thanks again for the help!

Best Regards
Fabian

Michael Wills

Dec 18, 2012, 1:10:45 AM
to Thomas Wiecki, Fabian Braennstroem, zipline
I'll have to check if that works for my data set. I didn't think it would since there is no way it can know it's EST when the data is loaded.
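
That concern can be illustrated with a small sketch, assuming a recent pandas and two made-up intraday timestamps that were recorded in US/Eastern but stored without a timezone:

```python
import pandas as pd

# Hypothetical naive timestamps that were actually recorded in US/Eastern.
naive = pd.DatetimeIndex(["2012-12-14 09:30", "2012-12-14 16:00"])

# Option 1: label the values as UTC directly (assumes they already are UTC).
labelled_utc = naive.tz_localize("UTC")

# Option 2: state the real source zone, then convert to UTC.
converted = naive.tz_localize("US/Eastern").tz_convert("UTC")

# In December US/Eastern is five hours behind UTC, so the results differ.
shift = converted[0] - labelled_utc[0]
```

For daily bars stamped at midnight, labelling the index as UTC may be acceptable, but for intraday data in another zone the first option silently moves every bar by the zone offset.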