minute_bar_writer and weird pandas Timestamp index KeyError


Hugo Koopmans

Dec 4, 2017, 3:32:59 AM
to Zipline Python Opensource Backtester
Hi,

I am trying to create a custom bundle from csv.

I use the code from here:


When I run the bundle ingest I get:

$ zipline ingest -b csv
entering kraken.  tuSymbols= ('XETHZEUR',)
about to return ingest function
entering ingest and creating blank dfMetadata
dfMetadata <class 'pandas.core.frame.DataFrame'>
<bound method NDFrame.describe of   start_date   end_date auto_close_date symbol
0 1970-01-01 1970-01-01      1970-01-01   None>

S= XETHZEUR IFIL= /home/hugo/workspace-jupyter/kraken-api/XETHZEUR20171121.csv
read_csv dfData <class 'pandas.core.frame.DataFrame'> length 839

start_date <class 'pandas.tslib.Timestamp'> 2017-11-21 09:42:00 None
end_date <class 'pandas.tslib.Timestamp'> 2017-11-21 23:40:00 None
ac_date <class 'pandas.tslib.Timestamp'> 2017-11-22 23:40:00 None
liData <class 'list'> length 1
Now calling minute_bar_writer
Traceback (most recent call last):
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/pandas/indexes/base.py", line 1945, in get_loc
    return self._engine.get_loc(key)
  File "pandas/index.pyx", line 538, in pandas.index.DatetimeEngine.get_loc (pandas/index.c:11140)
  File "pandas/index.pyx", line 558, in pandas.index.DatetimeEngine.get_loc (pandas/index.c:10701)
KeyError: Timestamp('2017-11-21 23:40:00+0000', tz='UTC')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hugo/anaconda3/envs/krakenex35/bin/zipline", line 11, in <module>
    load_entry_point('zipline==1.1.1', 'console_scripts', 'zipline')()
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/zipline/__main__.py", line 312, in ingest
    show_progress,
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/zipline/data/bundles/core.py", line 451, in ingest
    pth.data_path([name, timestr], environ=environ),
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/zipline/data/bundles/viacsv.py", line 115, in ingest
    minute_bar_writer.write(liData, show_progress=False)
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/zipline/data/minute_bars.py", line 697, in write
    write_sid(*e, invalid_data_behavior=invalid_data_behavior)
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/zipline/data/minute_bars.py", line 730, in write_sid
    self._write_cols(sid, dts, cols, invalid_data_behavior)
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/zipline/data/minute_bars.py", line 810, in _write_cols
    latest_min_count = all_minutes.get_loc(last_minute_to_write)
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/pandas/tseries/index.py", line 1422, in get_loc
    return Index.get_loc(self, key, method, tolerance)
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/pandas/indexes/base.py", line 1947, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/index.pyx", line 538, in pandas.index.DatetimeEngine.get_loc (pandas/index.c:11140)
  File "pandas/index.pyx", line 558, in pandas.index.DatetimeEngine.get_loc (pandas/index.c:10701)
KeyError: Timestamp('2017-11-21 23:40:00+0000', tz='UTC')


Seems like a pandas timeseries index issue?
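[Editor's note: the inner KeyError here is plain pandas behaviour. `DatetimeIndex.get_loc` raises `KeyError` whenever the requested timestamp is not present in the index. A minimal sketch with a synthetic index (not the actual bundle data):]

```python
import pandas as pd

# A minute index that ends at 23:39 UTC -- one bar short of the data.
idx = pd.date_range('2017-11-21 09:42', '2017-11-21 23:39',
                    freq='min', tz='UTC')

ts = pd.Timestamp('2017-11-21 23:40', tz='UTC')
try:
    idx.get_loc(ts)         # position lookup for a timestamp not in the index
except KeyError:
    print('KeyError:', ts)  # same failure mode as in the traceback
```

So the question is why zipline's `all_minutes` index does not contain the last bar of the file.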

Any suggestions?

thx

hugo

Hugo Koopmans

Dec 4, 2017, 3:35:30 AM
to Zipline Python Opensource Backtester
btw: it always complains about the LAST line in the csv file...

On Monday, December 4, 2017 at 09:32:59 UTC+1, Hugo Koopmans wrote:

Richard P

Dec 4, 2017, 7:48:15 AM
to zip...@googlegroups.com
Hugo -

The technique in that post works only for minute bar data that conforms to the NYSE trading calendar.

There are some other posts in this group that describe how to use bundles with different trading calendars.

You will need to find one (or make your own) to match the data you have.
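[Editor's note: the calendar mismatch can be illustrated with synthetic indexes (a sketch, not zipline's actual internals). The NYSE calendar only contains session minutes, roughly 14:31-21:00 UTC on that date (9:30-16:00 EST), so a 23:40 UTC crypto bar has no position in it:]

```python
import pandas as pd

# Roughly the NYSE session minutes for 2017-11-21, expressed in UTC.
nyse_minutes = pd.date_range('2017-11-21 14:31', '2017-11-21 21:00',
                             freq='min', tz='UTC')

# A 24/7 crypto bar outside those session hours:
bar = pd.Timestamp('2017-11-21 23:40', tz='UTC')
print(bar in nyse_minutes)   # False -> get_loc on such an index raises KeyError
```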

Richard

Kaveh Vakili

Dec 4, 2017, 4:20:00 PM
to Zipline Python Opensource Backtester

Hugo,

I'm learning too so maybe I'm wrong on this. But I just had exactly the same error.
The solution was pretty simple.

Look at this line (in the code you linked to):

dfData.index.tz_localize('US/Eastern')


Did you modify it to adapt it to the timezone of your data?
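[Editor's note: for a UTC-stamped csv such as Kraken's, the localization would look like this. A sketch on a synthetic frame; the original code uses `US/Eastern` because the post it was adapted from dealt with NYSE data:]

```python
import pandas as pd

df = pd.DataFrame({'close': [1.0, 2.0]},
                  index=pd.to_datetime(['2017-11-21 09:42',
                                        '2017-11-21 09:43']))

# The file's timestamps are UTC; tell pandas that instead of US/Eastern.
df.index = df.index.tz_localize('UTC')
print(df.index.tz)   # UTC
```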


Kind regards,

Hugo Koopmans

Dec 30, 2017, 9:32:05 AM
to Zipline Python Opensource Backtester
Yep, the data is from Kraken, so it is in UTC.

Now it works!

To get it to work:
1) downgrade to py35
2) use the POLONIEX calendar in the extension's register call, like so:

from zipline.data.bundles.viacsv import viacsv

eqSym = {
    "XETHZEUR"
}

register(
    'csv',    # name this whatever you like
    viacsv(eqSym),
    calendar_name='POLONIEX',
    minutes_per_day=24*60
)

3) the bundle code has to go in a weird place, in my opinion: it has to go into the virtual environment that anaconda created.
For me that is: /home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/zipline/data/bundles
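[Editor's note: the module does not strictly have to live inside site-packages. In zipline 1.x the `register` call can go in `~/.zipline/extension.py`, which the CLI loads on every run; the bundle module then only needs to be importable, e.g. via `PYTHONPATH`. A hedged sketch of such an `extension.py` (the module name `viacsv` and its location are assumptions):]

```python
# ~/.zipline/extension.py -- loaded by the zipline CLI on startup
from zipline.data.bundles import register

from viacsv import viacsv  # assumes viacsv.py is somewhere on PYTHONPATH

register(
    'csv',
    viacsv({'XETHZEUR'}),
    calendar_name='POLONIEX',
    minutes_per_day=24 * 60,
)
```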

viacsv.py looks like:

#
# Ingest stock csv files to create a zipline data bundle
#

import os

import numpy  as np
import pandas as pd
import datetime
from pytz import timezone

boDebug=True # Set True to get trace messages

from zipline.utils.cli import maybe_show_progress

def viacsv(symbols,start=None,end=None):

    # store this in memory so that we can iterate over it more than once
    # (because it could be a generator, which can only be consumed once)
    tuSymbols = tuple(symbols)

    if boDebug:
        print("entering kraken.  tuSymbols=",tuSymbols)

    # Define our custom ingest function
    def ingest(environ,
               asset_db_writer,
               minute_bar_writer,
               daily_bar_writer,
               adjustment_writer,
               calendar,
               cache,
               show_progress,
               output_dir,
               # pass these as defaults to make them 'nonlocal' in py2
               start=start,
               end=end):

        if boDebug:
            print("entering ingest and creating blank dfMetadata")

        dfMetadata = pd.DataFrame(np.empty(len(tuSymbols), dtype=[
            ('start_date', 'datetime64[ns]'),
            ('end_date', 'datetime64[ns]'),
            ('auto_close_date', 'datetime64[ns]'),
            ('symbol', 'object'),
        ]))

        if boDebug:
            print("dfMetadata",type(dfMetadata))
            print(dfMetadata.describe)
            print()

        # We need to feed something that is iterable - like a list or a generator -
        # that is a tuple with an integer for sid and a DataFrame for the data to
        # daily_bar_writer

        liData=[]
        iSid=0
        for S in tuSymbols:
            # IFIL="~/machina_contest/machina_mini.csv"
            IFIL="/home/hugo/workspace-jupyter/kraken-api/XETHZEUR.csv"
            if boDebug:
               print("S=",S,"IFIL=",IFIL)
            dfData=pd.read_csv(IFIL,index_col='time',parse_dates=True).sort_index() # hk
            # The csv timestamps are in UTC, but pandas doesn't know that
            # yet, so tell it explicitly.
            dfData.index=dfData.index.tz_localize('utc')
            # zipline needs the data in UTC, so convert it (a no-op here,
            # but it keeps the recipe correct for non-UTC csv files).
            dfData.index=dfData.index.tz_convert('UTC')
            # But the zipline ingest machinery wants naive timestamps,
            # so strip the tzinfo again.
            dfData.index=dfData.index.tz_convert(None)
            if boDebug:
               print("read_csv dfData",type(dfData),"length",len(dfData))
               print("first index entry:",dfData.index[0])
               print()
            # time,open,high,low,close,vwap,volume,count
            dfData.rename(
                columns={
                    'open': 'open',
                    'high': 'high',
                    'low': 'low',
                    'close': 'close',
                    'volume': 'volume',
                },
                inplace=True,
            )
            liData.append((iSid,dfData))

            # the start date is the date of the first trade and
            start_date = dfData.index[0]
            if boDebug:
                print("start_date",type(start_date),start_date,start_date.tzinfo)

            # the end date is the date of the last trade
            end_date = dfData.index[-1]
            if boDebug:
                print("end_date",type(end_date),end_date,end_date.tzinfo)

            # The auto_close date is the day after the last trade.
            ac_date = end_date + pd.Timedelta(days=1)
            if boDebug:
                print("ac_date",type(ac_date),ac_date,end_date.tzinfo)

            # Update our meta data
            dfMetadata.iloc[iSid] = start_date, end_date, ac_date, S

            iSid += 1

        if boDebug:
            print("liData",type(liData),"length",len(liData))
            print("Now calling minute_bar_writer")

        # daily_bar_writer.write(liData, show_progress=False)
        minute_bar_writer.write(liData, show_progress=False)

        # Hardcode the exchange to "POLONIEX" for all assets; "POLONIEX"
        # is registered (in the extension) to resolve to the POLONIEX
        # 24/7 calendar, which matches this crypto data.
        dfMetadata['exchange'] = "POLONIEX"

        if boDebug:
            print("returned from minute_bar_writer")
            print("calling asset_db_writer")
            print("dfMetadata",type(dfMetadata))
            print(dfMetadata)
            print()

        # Not sure why symbol_map is needed
        symbol_map = pd.Series(dfMetadata.symbol.index, dfMetadata.symbol)
        if boDebug:
            print("symbol_map",type(symbol_map))
            print(symbol_map)
            print()

        asset_db_writer.write(equities=dfMetadata)

        if boDebug:
            print("returned from asset_db_writer")
            print("calling adjustment_writer")

        adjustment_writer.write()

        if boDebug:
            print("returned from adjustment_writer")
            print("now leaving ingest function")


    if boDebug:
       print("about to return ingest function")
    return ingest
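[Editor's note: one way to catch the calendar mismatch before `minute_bar_writer.write` blows up is to diff the data's index against the calendar's minutes. A hedged sketch in plain pandas; inside the real ingest function the `calendar` argument exposes the valid minutes, approximated here by a synthetic 24/7 range:]

```python
import pandas as pd

# Stand-in for the calendar's minutes (24/7 here, like POLONIEX).
all_minutes = pd.date_range('2017-11-21', '2017-11-23',
                            freq='min', tz='UTC')

# Stand-in for the csv data's UTC-localized index.
data_minutes = pd.date_range('2017-11-21 09:42', '2017-11-21 23:40',
                             freq='min', tz='UTC')

# Bars the calendar does not know about would trigger the KeyError.
missing = data_minutes.difference(all_minutes)
if len(missing):
    print('bars outside the calendar:', list(missing[:5]))
else:
    print('all', len(data_minutes), 'bars fall inside the calendar')
```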


On Monday, December 4, 2017 at 22:20:00 UTC+1, Kaveh Vakili wrote: