Zipline with OHLCV data

HG

Nov 19, 2013, 4:53:47 PM
to zip...@googlegroups.com
Hi everyone,
I'm trying to understand how zipline works. It seems there isn't that much documentation (I'm following the one on readthedocs.com).

Can the backtester work with OHLC data? As far as I saw in the examples and in some SO answers, the dataframes are used like this:

data['mysymbol'] = array

where array is either some 'close' values or 'open' ones, whichever you want.
But can we work with five arrays: OHLC + V?


For example, when we load some data from yahoo (taken from the doc)

In [83]:
from datetime import datetime
import pytz
from zipline.utils.factory import load_from_yahoo
start = datetime(1990, 1, 1, 0, 0, 0, 0, pytz.utc)
end = datetime(1991, 1, 1, 0, 0, 0, 0, pytz.utc)
data = load_from_yahoo(stocks=['AAPL'], indexes={}, start=start,
                       end=end)
data
AAPL
Out[83]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 253 entries, 1990-01-02 00:00:00+00:00 to 1990-12-31 00:00:00+00:00
Data columns:
AAPL    253  non-null values
dtypes: float64(1)

There is only one data column, namely 'AAPL', and I guess it contains the 'close' price?

Thanks!

Thomas Wiecki

Nov 21, 2013, 10:54:55 AM
to zipline
Hi,

Yes, see load_bars_from_yahoo(), which returns OHLC data that you can pass into the .run() method as well. data['price'] will then be the closing price, but 'open' etc. will also be available.
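
As a rough illustration of the reply above, a handler might read several fields of the same bar. This is only a sketch: it assumes attribute-style access (bar.open, bar.price, and so on) with field names matching the columns mentioned, and it uses plain stand-in objects rather than zipline itself, so `fake_data` and the attributes set on `context` are hypothetical.

```python
from types import SimpleNamespace


def handle_data(context, data):
    """Read more than one field from the current bar of a symbol."""
    bar = data["AAPL"]
    # Per the reply above, 'price' holds the close; the other field names
    # are assumed to match the OHLC column names.
    context.bar_range = bar.high - bar.low
    context.open_to_close = bar.price - bar.open


# Stand-in objects so the sketch runs without zipline (values are made up).
context = SimpleNamespace()
fake_data = {"AAPL": SimpleNamespace(open=17.5, high=18.0, low=17.0,
                                     close=17.25, price=17.25, volume=1000)}
handle_data(context, fake_data)
print(context.bar_range, context.open_to_close)
```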


Thomas


Bill Li

Aug 4, 2014, 10:18:35 PM
to zip...@googlegroups.com
Hi,
 
I'm new to zipline and I have a similar question. Rather than loading data from Yahoo (blocked by my firm), I created data = OrderedDict(), also with OHLC and a 'price' field. A snippet of my code:
index - an array of datetime objects
columns = ['high','low','open','close','price']
stock -
data[stock] = pd.DataFrame(record,index=index,columns=columns)
 
Then I tried to run the data object through the moving average and pair trade sample codes and got the following errors in both cases:
    results = dma.run(data)
  File "C:\Python34\lib\site-packages\zipline-0.7.0-py3.4.egg\zipline\algorithm.py", line 387, in run
    all_sids = [sid for s in self.sources for sid in s.sids]
  File "C:\Python34\lib\site-packages\zipline-0.7.0-py3.4.egg\zipline\algorithm.py", line 387, in <listcomp>
    all_sids = [sid for s in self.sources for sid in s.sids]
AttributeError: 'OrderedDict' object has no attribute 'sids'
 
I tried setting a 'sids' attribute to an array of the stocks in the data object but still got the same error.  What is that attribute error referring to?  Please help.  Thanks.
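
The failing line in the traceback can be reproduced outside zipline. The sketch below assumes (this is not confirmed anywhere in the thread) that run() did not recognise the OrderedDict as price data and treated it as a ready-made data source; `sources` is a stand-in for `self.sources`.

```python
from collections import OrderedDict

data = OrderedDict()
data["AAPL"] = "placeholder for a DataFrame"

# Assumption: the unrecognised object ends up as the only entry in the
# list of sources, so the comprehension asks the dict itself for .sids.
sources = [data]

try:
    all_sids = [sid for s in sources for sid in s.sids]
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

If that assumption holds, the error is about the container type passed to run() rather than about the columns inside each DataFrame.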

 

Richard Prokopyshen

Aug 5, 2014, 10:53:09 AM
to zip...@googlegroups.com
A workaround I use to get non-Yahoo data into zipline is to extract my data from wherever it lives, format it as a CSV file, and name it per the zipline cache file convention. Find your cache directory; on my Linux workstation it is ~/.zipline/cache. When data/loader.py runs, it checks for a cache file first. I have some more implementation details at http://quant.prokopyshen.com

r

Bill Li

Aug 5, 2014, 9:06:55 PM
to zip...@googlegroups.com
Thank you Richard.  That is useful.
 

Carl Wells

Sep 11, 2015, 7:15:20 AM
to Zipline Python Opensource Backtester
I don't have a direct answer to your question. However, I have just got the dual moving average example to work with data from my own database (not Yahoo). I would suggest that you have to get your data into exactly the format that zipline requires. In your case, you are missing the volume column. Whilst I have not tried this, I suspect that adding a volume column containing NA (or all zeros) would solve your problem. Note also that the columns have to be in exactly the right order, and likely with exactly the same column names given in the zipline tutorial examples.

I imagine you have got this to work by now, but thought that I would comment in case another new user happens upon this thread. Essentially, you need to convert your stock data into a data frame for each stock, manipulate your columns into the correct format (OHLCVP, as discussed above; example below), parse your dates into the correct date format (ISO, I believe), and then create a panel of your data frames.

                            open   high    low  close   volume   price
Date                                                                  
2002-07-01 00:00:00+00:00  17.71  17.88  17.05  17.06  3688900  260.06
2002-07-02 00:00:00+00:00  17.03  17.15  16.83  16.94  5119199  258.23
2002-07-03 00:00:00+00:00  16.81  17.68  16.75  17.55  3167800  267.53
2002-07-05 00:00:00+00:00  17.71  18.75  17.71  18.74  2600400  285.67
2002-07-08 00:00:00+00:00  18.52  18.61  17.68  18.01  3404000  274.54
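
A minimal sketch of the volume suggestion, assuming a frame that has OHLC and 'price' columns but no volume. The OHLC values come from the first two rows of the sample; 'price' is simply set equal to the close for illustration, and whether zipline accepts a zero volume column is untested here as well.

```python
import pandas as pd

# Two bars without a volume column (column order deliberately scrambled).
index = pd.to_datetime(["2002-07-01", "2002-07-02"]).tz_localize("UTC")
df = pd.DataFrame(
    {
        "high": [17.88, 17.15],
        "low": [17.05, 16.83],
        "open": [17.71, 17.03],
        "close": [17.06, 16.94],
        "price": [17.06, 16.94],  # illustrative: set equal to close
    },
    index=index,
)

# Add the missing column; zeros are a placeholder, not real volume.
df["volume"] = 0

# Reorder to the layout shown in the sample.
df = df[["open", "high", "low", "close", "volume", "price"]]
print(df)
```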

Note that you will want to adjust your open, high, low, and close prices. This is done by load_bars_from_yahoo when using Yahoo data (see https://github.com/quantopian/zipline/blob/master/zipline/data/loader.py, line 315 onwards). Essentially, the adjusted close (which we have renamed to 'price' above) adjusts the price for dividends, splits, etc., without which the correct stock returns (and hence backtesting results) cannot be calculated. Some algorithms (e.g. risk management involving stops) operate on the open/high/low, so it's important to do this should you continue to use your own data source going forwards.
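
A minimal sketch of that kind of adjustment, assuming each bar is scaled by the ratio of adjusted close to raw close. This is one common approach and is not copied from loader.py, so check the linked source for the exact logic; `adjust_bar` and the sample numbers are hypothetical.

```python
def adjust_bar(bar):
    """Scale open/high/low/close by adjusted close / raw close.

    `bar` is a dict with 'open', 'high', 'low', 'close' and 'price'
    (the adjusted close). A new dict is returned; the input is untouched.
    Volume is left as-is in this sketch.
    """
    ratio = bar["price"] / bar["close"]
    adjusted = dict(bar)
    for field in ("open", "high", "low", "close"):
        adjusted[field] = bar[field] * ratio
    return adjusted


# Made-up bar where the adjusted close is half the raw close.
raw = {"open": 22.0, "high": 24.0, "low": 18.0, "close": 20.0,
       "volume": 1000, "price": 10.0}
adjusted = adjust_bar(raw)
print(adjusted)
```

After the adjustment the 'close' and 'price' fields agree, so returns computed from any of the four price fields are on the same scale.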

Note that zipline/quantopian uses a trading calendar, which means that it adjusts for public holidays.  Outside the US, holidays are different.  Therefore, a different library needs to be used, or the code must be modified to stop the US library being used.  For example, if backtesting European data, you might have a significant stock movement on a day that is a US holiday, and thus it will not be factored into zipline's backtest.  This is an issue I only recently became aware of and need to look into further myself.
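
One way to see how much this matters for a given data set is to list the bars that fall on days the calendar does not treat as sessions. The sketch below uses hand-written date lists; `bars_outside_calendar` is a hypothetical helper, and in practice the sessions would come from whatever calendar the backtest actually uses.

```python
from datetime import date


def bars_outside_calendar(bar_dates, calendar_sessions):
    """Return bar dates that the calendar does not list as sessions."""
    sessions = set(calendar_sessions)
    return sorted(d for d in bar_dates if d not in sessions)


# Hypothetical European bars, including 4 July 2002 (a US market holiday).
european_bars = [date(2002, 7, 3), date(2002, 7, 4), date(2002, 7, 5)]
# Hand-written US sessions for the same week.
us_sessions = [date(2002, 7, 1), date(2002, 7, 2), date(2002, 7, 3),
               date(2002, 7, 5)]

skipped = bars_outside_calendar(european_bars, us_sessions)
print(skipped)
```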

Snippets of code to do some of the things above are pasted here:

import pytz
import pandas as pd
from collections import OrderedDict

df.rename(columns={'adjclose': 'price'}, inplace=True)
df.drop('mycode', axis=1, inplace=True)

df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')
# Quantopian/zipline's dates include timezone data. I need to look into this more, but UTC is the default. This may need to be changed to suit the relevant jurisdiction.
df.index = df.index.tz_localize(pytz.UTC)

# Get dataframe columns into the correct order
df = df[['open', 'high', 'low', 'close', 'volume', 'price']]

# Create panel
dataDict = OrderedDict()
dataDict['AAPL'] = df
data = pd.Panel(dataDict)

# Set minor axis (major is already set with the date index); the labels must be in the same order as the dataframe columns above
data.minor_axis = ['open', 'high', 'low', 'close', 'volume', 'price']
