Discrepancy between yahoo bundles and data from yahoo


Dave Gilbert

Nov 30, 2016, 4:46:02 AM
to Zipline Python Opensource Backtester
I have been using yahoo bundles for fund data and, by chance, found that there seems to be a discrepancy between the bundle data and data obtained directly from yahoo.

To reproduce the result:

1. Get bundle data using

from datetime import datetime

import pandas as pd
import pytz
from zipline import run_algorithm
from zipline.api import symbols

def initialize(context):
    context.securities = symbols('FAGIX')
    context.lookback = 1 * 252 + 1  # one year of trading days plus one
    print('initializing')

def before_trading_start(context, data):
    # Rewritten every day, so the file left on disk holds the window
    # as of the last simulated day.
    history = data.history(context.securities, 'price', context.lookback, '1d')
    history.to_pickle("data.pkl")

capital_base = 10000
start = datetime(2009, 12, 31, 0, 0, 0, 0, pytz.utc)
end = datetime(2011, 1, 1, 0, 0, 0, 0, pytz.utc)
result = run_algorithm(start=start, end=end, initialize=initialize,
                       before_trading_start=before_trading_start,
                       capital_base=capital_base,
                       bundle='custom_fund_bundle')

prices1 = pd.read_pickle("data.pkl")

2. Get Yahoo data using

prices2 = pd.read_csv('http://chart.finance.yahoo.com/table.csv?s=FAGIX&a=11&b=31&c=2009&d=0&e=1&f=2012&g=d&ignore=.csv',
                      parse_dates=True, index_col='Date').sort_index(ascending=True)['Adj Close']

3. Calculate monthly returns

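# Align the two series by date rather than by position. The bundle index is
# tz-aware (UTC); drop the timezone so it matches Yahoo's naive dates.
prices = pd.DataFrame({'bundle': prices1.iloc[:, 0]})
prices.index = prices.index.tz_convert(None)
prices['yahoo'] = prices2

# resample(..., how='last') is deprecated; call .last() on the resampler instead
monthly_returns = prices.resample('M').last().pct_change() * 100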

The results:

I would have expected the results to be the same (or very close), which they clearly aren't.
The worrying thing is that for some funds they ARE the same, while for others, like the example above, they differ considerably.

Can anyone shed some light on this?

Ed Bartosh

Dec 4, 2016, 12:02:30 PM
to Dave Gilbert, Zipline Python Opensource Backtester
Hi Dave,

It looks like the data that comes to the algo is not adjusted for dividends and splits.
I sketched a simple algo that prints the price coming to the algo alongside 'Close' and 'Adj Close' from yahoo.finance:

import pandas as pd

from zipline.api import symbol, get_datetime

URL = 'http://chart.finance.yahoo.com/table.csv?s=FAGIX&a=11&b=31&c=2009&d=1&e=1&f=2012&g=d&ignore=.csv'

def initialize(context):

    context.stock = symbol('FAGIX')

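    # Yahoo data as {'Close': {date: value}, 'Adj Close': {date: value}}, keyed by 'YYYY-MM-DD' strings
    context.prices_yahoo = pd.read_csv(URL, index_col='Date')[['Close', 'Adj Close']].to_dict()


def handle_data(context, data):

    date = str(get_datetime().date())

    # Price seen by the algo, then Yahoo's raw and adjusted close for the same date
    print(date, data.current(context.stock, 'price'),
          context.prices_yahoo['Close'][date], context.prices_yahoo['Adj Close'][date])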


Here is the output:

2009-12-31 8.61 8.62 5.718514
2010-01-04 8.62 8.69 5.764951
2010-01-05 8.69 8.76 5.81139
2010-01-06 8.76 8.8 5.837926
2010-01-07 8.8 8.83 5.857828
2010-01-08 8.83 8.85 5.871096
2010-01-11 8.85 8.85 5.871096
2010-01-12 8.85 8.82 5.851194
2010-01-13 8.82 8.83 5.857828
2010-01-14 8.83 8.83 5.857828
2010-01-15 8.83 8.81 5.84456
2010-01-19 8.81 8.81 5.84456
2010-01-20 8.81 8.79 5.831292
2010-01-21 8.79 8.74 5.798122
2010-01-22 8.74 8.67 5.751684
2010-01-25 8.67 8.64 5.731782
2010-01-26 8.64 8.63 5.725148
2010-01-27 8.63 8.62 5.718514
2010-01-28 8.62 8.61 5.711879
2010-01-29 8.61 8.58 5.722551

...

Note that the two series are shifted by one day: the price the algo sees on a given date equals Yahoo's 'Close' for the previous trading day.
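One way to confirm such an offset is to test which lag makes the two series agree. Below is a minimal pandas sketch using the first six rows of the output above; `best_lag` is a hypothetical helper written for this illustration, not part of zipline or pandas.

```python
import pandas as pd

# First six rows of the output above: price seen by the algo vs. Yahoo 'Close'.
dates = pd.to_datetime(['2009-12-31', '2010-01-04', '2010-01-05',
                        '2010-01-06', '2010-01-07', '2010-01-08'])
algo = pd.Series([8.61, 8.62, 8.69, 8.76, 8.80, 8.83], index=dates)
yahoo_close = pd.Series([8.62, 8.69, 8.76, 8.80, 8.83, 8.85], index=dates)


def best_lag(a, b, max_lag=3):
    """Return the lag k (in rows) for which a[t] most often equals b[t - k]."""
    scores = {}
    for k in range(-max_lag, max_lag + 1):
        shifted = b.shift(k)          # shifted[t] == b[t - k]
        mask = shifted.notna()        # ignore rows pushed off the end
        if mask.any():
            matches = (a[mask] - shifted[mask]).abs() < 1e-9
            scores[k] = matches.mean()
    return max(scores, key=scores.get)


print(best_lag(algo, yahoo_close))
```

A lag of 1 means each algo price matches Yahoo's close one row earlier, which is the one-day shift described above.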

My guess is that you don't see this difference for other funds because they don't pay dividends.
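To see why a dividend alone can change the returns, here is a minimal backward-adjustment sketch in pandas. It assumes each dividend ratio is applied multiplicatively to every close before the ex-date, the usual convention for back-adjusted prices. `backward_adjust` is a hypothetical helper; the closes come from the output above, and the ratio for 2010-01-29 is the one stored in the bundle's dividends table quoted later in this thread.

```python
import pandas as pd


def backward_adjust(closes, ratios):
    """Scale each close by the product of all dividend ratios whose
    ex-date falls after that close's date (backward adjustment)."""
    factor = pd.Series(1.0, index=closes.index)
    for ex_date, ratio in ratios.items():
        factor.loc[closes.index < ex_date] *= ratio
    return closes * factor


# Yahoo 'Close' values around the 2010-01-29 ex-date, from the output above.
closes = pd.Series([8.62, 8.61, 8.58],
                   index=pd.to_datetime(['2010-01-27', '2010-01-28', '2010-01-29']))
# Ratio for 2010-01-29 (epoch 1264723200) in the bundle's dividends table.
ratios = {pd.Timestamp('2010-01-29'): 0.994663573085847}

adjusted = backward_adjust(closes, ratios)
raw_return = closes.iloc[-1] / closes.iloc[-2] - 1      # about -0.35%
adj_return = adjusted.iloc[-1] / adjusted.iloc[-2] - 1  # about +0.19%
print(f"raw {raw_return:.4%}  adjusted {adj_return:.4%}")
```

The unadjusted series shows a loss across the ex-date while the adjusted one shows a small gain, close to what Yahoo's 'Adj Close' gives (5.711879 to 5.722551). A fund paying monthly dividends accumulates this gap every month.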

Regards,

Ed



--
You received this message because you are subscribed to the Google Groups "Zipline Python Opensource Backtester" group.
To unsubscribe from this group and stop receiving emails from it, send an email to zipline+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
BR,
Ed

Dave Gilbert

Dec 5, 2016, 4:38:44 AM
to Zipline Python Opensource Backtester
OK, I get it, thanks.

But this is inconsistent with Quantopian data, isn't it? Shouldn't 'price' include adjustments for splits and dividends?
Also, if I get 'splits' and 'dividends' via the bundle, they all appear to be NaN.
So how does one adjust for splits and dividends?

Regards,
Dave

Ed Bartosh

Dec 5, 2016, 11:44:29 AM
to Dave Gilbert, Zipline Python Opensource Backtester
Hi Dave,

> but this is inconsistent with Quantopian data, isn't it? Shouldn't 'price' include adjustments for splits and dividends?
I thought it should. Now I'm not sure.

> Also, if I get 'splits' and 'dividends' via the bundle, they all appear to be nan.
How do you 'get' them? I can see dividends in the bundle:

$ sqlite3 ~/.zipline/data/fagix_bundle/2016-12-04T15\;28\;45.236282/adjustments.sqlite 'select * from dividends' |head -5

0|1262217600|0.98906976744186|0
1|1264723200|0.994663573085847|0
2|1267142400|0.99546511627907|0
3|1269993600|0.995067264573991|0
4|1272585600|0.99549945115258|0
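The second column holds epoch seconds; decoded, the five rows above fall on 2009-12-31, 2010-01-29, 2010-02-26, 2010-03-31 and 2010-04-30 (UTC), i.e. month-end ex-dates. A small Python sketch to read and decode the table follows. It assumes the column order is (index, effective_date, ratio, sid), as the sqlite3 CLI output suggests; `read_dividend_ratios` is a hypothetical helper and the demo table name `idx` is made up for illustration.

```python
import sqlite3
from datetime import datetime, timezone


def read_dividend_ratios(conn):
    """Return (ex_date, ratio) pairs from a zipline adjustments.sqlite.

    ASSUMPTION: columns are (index, effective_date, ratio, sid), as the
    sqlite3 CLI output suggests; verify with 'PRAGMA table_info(dividends)'.
    """
    rows = conn.execute("SELECT * FROM dividends").fetchall()
    return [(datetime.fromtimestamp(ts, tz=timezone.utc).date(), ratio)
            for _idx, ts, ratio, _sid in rows]


# Demo on an in-memory table holding the first two rows shown above.
# For a real bundle: sqlite3.connect('<bundle dir>/adjustments.sqlite')
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dividends "
             "(idx INTEGER, effective_date INTEGER, ratio REAL, sid INTEGER)")
conn.executemany("INSERT INTO dividends VALUES (?, ?, ?, ?)",
                 [(0, 1262217600, 0.98906976744186, 0),
                  (1, 1264723200, 0.994663573085847, 0)])

for ex_date, ratio in read_dividend_ratios(conn):
    print(ex_date, ratio)
```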

> So how does one adjust for splits and dividends?

Just a wild guess: could it be that the price data is adjusted only if accessed through the Pipeline API?

Regards,

Ed




Ed Bartosh

Dec 6, 2016, 10:46:52 AM
to Dave Gilbert, Zipline Python Opensource Backtester
Hi Dave,

data.current returns NaN for any field it doesn't know about.
You can run data.current(fund, ['bla1', 'bla2']) and get the same result.

2016-12-06 14:39 GMT+00:00 Dave Gilbert <scub...@gmail.com>:
> I used data.current(fund, ['splits', 'dividends'])
>
> Rgds,
> Dave



--
BR,
Ed