zipline ingest bundle error


Dario

Jan 30, 2017, 9:26:27 AM
to Zipline Python Opensource Backtester


Hi Everyone,

I'm trying to create a bundle with FTSE MIB stocks downloaded from Yahoo Finance.
Here is my extension.py code:

from zipline.data.bundles import register, yahoo_equities

# these are the tickers you would like data for
equities = {
    'A2A.MI',
    'ANIM.MI',
    'ATL.MI',
    'AZM.MI',
    'BMED.MI',
    'BMPS.MI',
    'BP.MI',
    'BPE.MI',
    'BZU.MI',
    'CNHI.MI',
    'CPR.MI',
    'ENEL.MI',
    'ENI.MI',
    'EXO.MI',
    'FBK.MI',
    'FCA.MI',
    'FNC.MI',
    'G.MI',
    'ISP.MI',
    'IT.MI',
    'LUX.MI',
    'MB.MI',
    'MONC.MI',
    'MS.MI',
    'PMI.MI',
    'PRY.MI',
    'PST.MI',
    'RACE.MI',
    'SFER.MI',
    'SPM.MI',
    'SRG.MI',
    'STM.MI',
    'TEN.MI',
    'TIT.MI',
    'TRN.MI',
    'UBI.MI',
    'UCG.MI',
    'UNI.MI',
    'US.MI',
    'YNAP.MI',
}
register(
    'ftse-mib-bundle',  # name this whatever you like
    yahoo_equities(
        equities,
    ),
)

Unfortunately, when I try to ingest the bundle, zipline downloads the data and then fails with the following error:

Downloading Yahoo adjustment data:   [####################################]  100%
Traceback (most recent call last):
  File "C:\Users\DALLDA01\AppData\Local\Continuum\Anaconda2\Scripts\zipline-script.py", line 11, in <module>
    load_entry_point('zipline==1.0.2', 'console_scripts', 'zipline')()
  File "C:\Users\DALLDA01\AppData\Local\Continuum\Anaconda2\lib\site-packages\click\core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\DALLDA01\AppData\Local\Continuum\Anaconda2\lib\site-packages\click\core.py", line 696, in main
    rv = self.invoke(ctx)
  File "C:\Users\DALLDA01\AppData\Local\Continuum\Anaconda2\lib\site-packages\click\core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Users\DALLDA01\AppData\Local\Continuum\Anaconda2\lib\site-packages\click\core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\DALLDA01\AppData\Local\Continuum\Anaconda2\lib\site-packages\click\core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "C:\Users\DALLDA01\AppData\Local\Continuum\Anaconda2\lib\site-packages\zipline\__main__.py", line 306, in ingest
    show_progress,
  File "C:\Users\DALLDA01\AppData\Local\Continuum\Anaconda2\lib\site-packages\zipline\data\bundles\core.py", line 443, in ingest
    pth.data_path([name, timestr], environ=environ),
  File "C:\Users\DALLDA01\AppData\Local\Continuum\Anaconda2\lib\site-packages\zipline\data\bundles\yahoo.py", line 177, in ingest
    adjustment_writer.write(splits=splits, dividends=dividends)
  File "C:\Users\DALLDA01\AppData\Local\Continuum\Anaconda2\lib\site-packages\zipline\data\us_equity_pricing.py", line 1150, in write
    self.write_dividend_data(dividends, stock_dividends)
  File "C:\Users\DALLDA01\AppData\Local\Continuum\Anaconda2\lib\site-packages\zipline\data\us_equity_pricing.py", line 1062, in write_dividend_data
    dividend_ratios = self.calc_dividend_ratios(dividends)
  File "C:\Users\DALLDA01\AppData\Local\Continuum\Anaconda2\lib\site-packages\zipline\data\us_equity_pricing.py", line 985, in calc_dividend_ratios
    sid, prev_close_date, 'close')
  File "C:\Users\DALLDA01\AppData\Local\Continuum\Anaconda2\lib\site-packages\zipline\data\us_equity_pricing.py", line 707, in get_value
    ix = self.sid_day_index(sid, day)
  File "C:\Users\DALLDA01\AppData\Local\Continuum\Anaconda2\lib\site-packages\zipline\data\us_equity_pricing.py", line 679, in sid_day_index
    day, sid))
zipline.data.session_bars.NoDataBeforeDate: No data on or before day=2005-05-06 00:00:00+00:00 for sid=19

I think the error comes from how bundle registration handles stocks that have no data before a specific date.

For instance, not all the stocks in the current FTSE MIB index were listed back on 2005-05-06.

Any suggestions for getting rid of the error?

Best regards
Dario

Dario

Jan 31, 2017, 6:41:24 AM
to Zipline Python Opensource Backtester
I looked a bit at the implementation of the yahoo.py data bundle and found this in the docstring:
Notes
    -----
    The sids for each symbol will be the index into the symbols sequence.
    """

So I shortened the list of symbols considerably, cutting everything from element 19 onward (inclusive). But now I get the same error for sid 5, which was previously OK. Any clue how to solve this issue?
from zipline.data.bundles import register, yahoo_equities

# these are the tickers you would like data for
equities = {
    'A2A.MI',
    'ANIM.MI',
    'ATL.MI',
    'AZM.MI',
    'BMED.MI',
    'BMPS.MI',
    'BP.MI',
    'BPE.MI',
    'BZU.MI',
    'CNHI.MI',
    'CPR.MI',
    'ENEL.MI',
    'ENI.MI',
    'EXO.MI',
    'FBK.MI',
    'FCA.MI',
    'FNC.MI',
    'G.MI',
    # 'ISP.MI',
    # 'IT.MI',
    # 'LUX.MI',
    # 'MB.MI',
    # 'MONC.MI',
    # 'MS.MI',
    # 'PMI.MI',
    # 'PRY.MI',
    # 'PST.MI',
    # 'RACE.MI',
    # 'SFER.MI',
    # 'SPM.MI',
    # 'SRG.MI',
    # 'STM.MI',
    # 'TEN.MI',
    # 'TIT.MI',
    # 'TRN.MI',
    # 'UBI.MI',
    # 'UCG.MI',
    # 'UNI.MI',
    # 'US.MI',
    # 'YNAP.MI',
}
register(
    'ftse-mib-bundle',  # name this whatever you like
    yahoo_equities(
        equities,
    ),
)
  File "C:\Users\DALLDA01\AppData\Local\Continuum\Anaconda2\lib\site-packages\zipline\data\us_equity_pricing.py", line 679, in sid_day_index
    day, sid))
zipline.data.session_bars.NoDataBeforeDate: No data on or before day=2005-05-06 00:00:00+00:00 for sid=5

Dario

Jan 31, 2017, 9:07:09 AM
to Zipline Python Opensource Backtester
I also tried limiting the date range to download by modifying my extension.py as follows:

import pandas as pd

register(
    'ftse-mib-bundle',  # name this whatever you like
    yahoo_equities(
        equities,
        pd.Timestamp('2006-01-02', tz='utc'),
        pd.Timestamp('2017-01-02', tz='utc'),
    ),
)

But I still get the same odd error, now for a different sid:

  File "C:\Users\DALLDA01\AppData\Local\Continuum\Anaconda2\lib\site-packages\zipline\data\us_equity_pricing.py", line 679, in sid_day_index
    day, sid))
zipline.data.session_bars.NoDataBeforeDate: No data on or before day=2005-05-06 00:00:00+00:00 for sid=2

fva...@quantopian.com

Feb 24, 2017, 5:31:58 PM
to Zipline Python Opensource Backtester
Hey Dario,

So there are two things I'll address here: getting errors on various equities, and the NoDataBeforeDate exception.

For the errors on multiple equities: can you change your list of equities from a set to a tuple? You can do this just by changing the curly braces to parentheses. Tuples are ordered, whereas sets are not, so the sid that a given symbol ends up with can change whenever the contents of the set change. I'm asking because I ran your code locally with the same equities, and although it looked like the error was coming from multiple equities, it was actually only coming from CPR.MI.

You can verify this by:
  • Running the code you originally posted, with a print statement added that prints out the equities before ingesting
  • Counting up from 0 through the printed equities until you reach the sid from the error message
  • Writing that equity down
  • Converting your set of equities to a tuple, repeating the first three steps, and comparing the equities that raise the error
(there might be a better way to do that, but this is what I did quickly to get my results)

If they are the same, then it's just one equity that gives you the error, which is what I'm thinking is occurring.
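
The counting step can be sketched in a few lines. This is only an illustration, assuming (as the yahoo.py docstring quoted earlier in the thread says) that each sid is the index of its symbol in the sequence passed to yahoo_equities. The names symbols and sid_to_symbol are made up for this example.

```python
# Symbols in the order they appear in the original extension.py.
symbols = (
    'A2A.MI', 'ANIM.MI', 'ATL.MI', 'AZM.MI', 'BMED.MI', 'BMPS.MI', 'BP.MI', 'BPE.MI',
    'BZU.MI', 'CNHI.MI', 'CPR.MI', 'ENEL.MI', 'ENI.MI', 'EXO.MI', 'FBK.MI', 'FCA.MI',
    'FNC.MI', 'G.MI', 'ISP.MI', 'IT.MI', 'LUX.MI', 'MB.MI', 'MONC.MI', 'MS.MI',
    'PMI.MI', 'PRY.MI', 'PST.MI', 'RACE.MI', 'SFER.MI', 'SPM.MI', 'SRG.MI', 'STM.MI',
    'TEN.MI', 'TIT.MI', 'TRN.MI', 'UBI.MI', 'UCG.MI', 'UNI.MI', 'US.MI', 'YNAP.MI',
)

# With a tuple the position of each symbol is fixed, so the mapping is stable.
sid_to_symbol = dict(enumerate(symbols))

# Assumption: sid == index into the sequence, per the yahoo.py docstring.
for sid, symbol in sorted(sid_to_symbol.items()):
    print("{:2d} {}".format(sid, symbol))

# A set has no guaranteed order: iterating over set(symbols) may yield a
# different sequence, and the sequence can change when elements are removed.
# That is why the sid in the error message appeared to move around.
```

In the tuple as written, CPR.MI sits at index 10, so if it is the only problematic equity the error should consistently report that same sid.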

In regards to NoDataBeforeDate, we don't yet have something that handles that exception, so the ingest just stops after that exception is raised. 
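
Until that is handled, one possible way to spot the offending symbol before ingesting is sketched below. It is based only on the traceback above, where calc_dividend_ratios looks up the close for prev_close_date: a dividend dated on or before a symbol's first price bar has no earlier close to look up. The function name and the sample dates are hypothetical and not taken from zipline or from real data.

```python
from datetime import date


def dividends_without_prior_close(first_bar, ex_dates):
    """Return the ex-dates that have no price bar strictly before them.

    first_bar: date of the first price bar available for the symbol.
    ex_dates:  iterable of dividend ex-dates for the same symbol.
    """
    return [d for d in ex_dates if d <= first_bar]


# Hypothetical example: prices start on 2005-05-09, but one dividend is
# dated 2005-05-06, before any price bar exists.
first_bar = date(2005, 5, 9)
ex_dates = [date(2005, 5, 6), date(2006, 5, 22)]

orphans = dividends_without_prior_close(first_bar, ex_dates)
print(orphans)
```

A symbol for which this list is non-empty would be a candidate for removal from the bundle, or for trimming its dividend history, before running the ingest.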

Peter Harrington

Feb 28, 2017, 7:05:58 PM
to Zipline Python Opensource Backtester
Are you sure Yahoo Finance has data for all of those tickers?  
The Yahoo loader will just fail if you pass it a ticker for which there is no data.  That is what is going on here.  
I have written another script to check the tickers before passing them to yahoo_equities().  

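import requests
from time import sleep

# NOTE: YAHOO_URL is not defined in this post. It must be set to a URL
# template containing one "{}" placeholder for the ticker before running.

# Read candidate tickers, one per line, and write out those that respond OK.
with open("list_of_tickers_to_check.txt") as fr, \
        open("write_good_tickers.txt", "w") as fw:
    for line in fr:
        tkr = line.strip()
        if not tkr:
            continue  # skip blank lines
        print("trying: " + tkr)
        try:
            r = requests.get(YAHOO_URL.format(tkr))
            if r.status_code == 200:
                fw.write("{}\n".format(tkr))
            else:
                print("PROBLEM WITH: " + tkr)
            sleep(1)  # pause between requests
        except requests.RequestException:
            # network-level failures are treated as a bad ticker
            print("PROBLEM WITH: " + tkr)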
fva...@quantopian.com

Mar 10, 2017, 11:50:23 AM
to Zipline Python Opensource Backtester
Update on this: it looks like we have this fixed on master, where the ingest no longer stops when there's missing data. The next zipline release should therefore include the fix.

If you want to install zipline from master, feel free to do that (https://github.com/quantopian/zipline).

Thanks,
Freddie

