minute_bar_writer and weird pandas Timestamp index KeyError


Hugo Koopmans

Dec 4, 2017, 3:32:59 AM
to Zipline Python Opensource Backtester
Hi,

I am trying to create a custom bundle from csv.

I use the code from here:


When I run the bundle ingest I get:

$ zipline ingest -b csv
entering kraken.  tuSymbols= ('XETHZEUR',)
about to return ingest function
entering ingest and creating blank dfMetadata
dfMetadata <class 'pandas.core.frame.DataFrame'>
<bound method NDFrame.describe of   start_date   end_date auto_close_date symbol
0 1970-01-01 1970-01-01      1970-01-01   None>

S= XETHZEUR IFIL= /home/hugo/workspace-jupyter/kraken-api/XETHZEUR20171121.csv
read_csv dfData <class 'pandas.core.frame.DataFrame'> length 839

start_date <class 'pandas.tslib.Timestamp'> 2017-11-21 09:42:00 None
end_date <class 'pandas.tslib.Timestamp'> 2017-11-21 23:40:00 None
ac_date <class 'pandas.tslib.Timestamp'> 2017-11-22 23:40:00 None
liData <class 'list'> length 1
Now calling minute_bar_writer
Traceback (most recent call last):
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/pandas/indexes/base.py", line 1945, in get_loc
    return self._engine.get_loc(key)
  File "pandas/index.pyx", line 538, in pandas.index.DatetimeEngine.get_loc (pandas/index.c:11140)
  File "pandas/index.pyx", line 558, in pandas.index.DatetimeEngine.get_loc (pandas/index.c:10701)
KeyError: Timestamp('2017-11-21 23:40:00+0000', tz='UTC')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hugo/anaconda3/envs/krakenex35/bin/zipline", line 11, in <module>
    load_entry_point('zipline==1.1.1', 'console_scripts', 'zipline')()
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/zipline/__main__.py", line 312, in ingest
    show_progress,
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/zipline/data/bundles/core.py", line 451, in ingest
    pth.data_path([name, timestr], environ=environ),
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/zipline/data/bundles/viacsv.py", line 115, in ingest
    minute_bar_writer.write(liData, show_progress=False)
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/zipline/data/minute_bars.py", line 697, in write
    write_sid(*e, invalid_data_behavior=invalid_data_behavior)
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/zipline/data/minute_bars.py", line 730, in write_sid
    self._write_cols(sid, dts, cols, invalid_data_behavior)
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/zipline/data/minute_bars.py", line 810, in _write_cols
    latest_min_count = all_minutes.get_loc(last_minute_to_write)
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/pandas/tseries/index.py", line 1422, in get_loc
    return Index.get_loc(self, key, method, tolerance)
  File "/home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/pandas/indexes/base.py", line 1947, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/index.pyx", line 538, in pandas.index.DatetimeEngine.get_loc (pandas/index.c:11140)
  File "pandas/index.pyx", line 558, in pandas.index.DatetimeEngine.get_loc (pandas/index.c:10701)
KeyError: Timestamp('2017-11-21 23:40:00+0000', tz='UTC')


Seems like a pandas timeseries index issue?
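[Editor's note: the inner KeyError here is plain pandas behaviour. `DatetimeIndex.get_loc` raises `KeyError` whenever the requested timestamp is not present in the index. A minimal sketch with a synthetic index (not the actual bundle data):]

```python
import pandas as pd

# A minute index that ends at 23:39 UTC -- one bar short of the data.
idx = pd.date_range('2017-11-21 09:42', '2017-11-21 23:39',
                    freq='min', tz='UTC')

ts = pd.Timestamp('2017-11-21 23:40', tz='UTC')
try:
    idx.get_loc(ts)         # position lookup for a timestamp not in the index
except KeyError:
    print('KeyError:', ts)  # same failure mode as in the traceback
```

So the question is why zipline's `all_minutes` index does not contain the last bar of the file.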

Any suggestions?

thx

hugo

Hugo Koopmans

Dec 4, 2017, 3:35:30 AM
to Zipline Python Opensource Backtester
btw: it always complains about the LAST line in the csv file...

On Monday, December 4, 2017 at 09:32:59 UTC+1, Hugo Koopmans wrote:

Richard P

Dec 4, 2017, 7:48:15 AM
to zip...@googlegroups.com
Hugo -

The technique in that post works only for minute bar data that conforms to the NYSE trading calendar.

There are some other posts in this group that describe how to use bundles with different trading calendars.

You will need to find one (or make your own) to match the data you have.
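[Editor's note: the calendar mismatch can be illustrated with synthetic indexes (a sketch, not zipline's actual internals). The NYSE calendar only contains session minutes, roughly 14:31-21:00 UTC on that date (9:30-16:00 EST), so a 23:40 UTC crypto bar has no position in it:]

```python
import pandas as pd

# Roughly the NYSE session minutes for 2017-11-21, expressed in UTC.
nyse_minutes = pd.date_range('2017-11-21 14:31', '2017-11-21 21:00',
                             freq='min', tz='UTC')

# A 24/7 crypto bar outside those session hours:
bar = pd.Timestamp('2017-11-21 23:40', tz='UTC')
print(bar in nyse_minutes)   # False -> get_loc on such an index raises KeyError
```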

Richard

Kaveh Vakili

Dec 4, 2017, 4:20:00 PM
to Zipline Python Opensource Backtester

Hugo,

I'm learning too so maybe I'm wrong on this. But I just had exactly the same error.
The solution was pretty simple.

Look at this line (in the code you linked to):

dfData.index.tz_localize('US/Eastern')


Did you modify it to adapt it to the timezone of your data?
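[Editor's note: for a UTC-stamped csv such as Kraken's, the localization would look like this. A sketch on a synthetic frame; the original code uses `US/Eastern` because the post it was adapted from dealt with NYSE data:]

```python
import pandas as pd

df = pd.DataFrame({'close': [1.0, 2.0]},
                  index=pd.to_datetime(['2017-11-21 09:42',
                                        '2017-11-21 09:43']))

# The file's timestamps are UTC; tell pandas that instead of US/Eastern.
df.index = df.index.tz_localize('UTC')
print(df.index.tz)   # UTC
```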


Kind regards,

Hugo Koopmans

Dec 30, 2017, 9:32:05 AM
to Zipline Python Opensource Backtester
Yep, the data is from Kraken, so it is in UTC.

Now it works!

To get it to work:
1) downgrade to py35
2) use the POLONIEX calendar in the extension's register call, like so:

from zipline.data.bundles.viacsv import viacsv

eqSym = {
    "XETHZEUR"
}

register(
    'csv',    # name this whatever you like
    viacsv(eqSym),
    calendar_name='POLONIEX',
    minutes_per_day=24*60
)

3) the bundle code has to go in a weird place, in my opinion: it has to go into the virtual environment that anaconda created.
For me that is: /home/hugo/anaconda3/envs/krakenex35/lib/python3.5/site-packages/zipline/data/bundles
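[Editor's note: the module does not strictly have to live inside site-packages. In zipline 1.x the `register` call can go in `~/.zipline/extension.py`, which the CLI loads on every run; the bundle module then only needs to be importable, e.g. via `PYTHONPATH`. A hedged sketch of such an `extension.py` (the module name `viacsv` and its location are assumptions):]

```python
# ~/.zipline/extension.py -- loaded by the zipline CLI on startup
from zipline.data.bundles import register

from viacsv import viacsv  # assumes viacsv.py is somewhere on PYTHONPATH

register(
    'csv',
    viacsv({'XETHZEUR'}),
    calendar_name='POLONIEX',
    minutes_per_day=24 * 60,
)
```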

viacsv.py looks like:

#
# Ingest stock csv files to create a zipline data bundle
#

import os

import numpy  as np
import pandas as pd
import datetime
from pytz import timezone

boDebug=True # Set True to get trace messages

from zipline.utils.cli import maybe_show_progress

def viacsv(symbols,start=None,end=None):

    # store this in memory so that we can iterate over it more than once
    # (because it could be a generator, which can only be consumed once)
    tuSymbols = tuple(symbols)

    if boDebug:
        print("entering kraken.  tuSymbols=",tuSymbols)

    # Define our custom ingest function
    def ingest(environ,
               asset_db_writer,
               minute_bar_writer,
               daily_bar_writer,
               adjustment_writer,
               calendar,
               cache,
               show_progress,
               output_dir,
               # pass these as defaults to make them 'nonlocal' in py2
               start=start,
               end=end):

        if boDebug:
            print("entering ingest and creating blank dfMetadata")

        dfMetadata = pd.DataFrame(np.empty(len(tuSymbols), dtype=[
            ('start_date', 'datetime64[ns]'),
            ('end_date', 'datetime64[ns]'),
            ('auto_close_date', 'datetime64[ns]'),
            ('symbol', 'object'),
        ]))

        if boDebug:
            print("dfMetadata",type(dfMetadata))
            print(dfMetadata.describe)
            print()

        # We need to feed something that is iterable - like a list or a generator -
        # that is a tuple with an integer for sid and a DataFrame for the data to
        # daily_bar_writer

        liData=[]
        iSid=0
        for S in tuSymbols:
            # IFIL="~/machina_contest/machina_mini.csv"
            IFIL="/home/hugo/workspace-jupyter/kraken-api/XETHZEUR.csv"
            if boDebug:
               print("S=",S,"IFIL=",IFIL)
            dfData=pd.read_csv(IFIL,index_col='time',parse_dates=True).sort_index() # hk
            # The csv timestamps are in UTC, but pandas doesn't know that
            # yet, so tell it explicitly.
            dfData.index=dfData.index.tz_localize('utc')
            # zipline needs the data in UTC, so convert it (a no-op here,
            # but it keeps the recipe correct for non-UTC csv files).
            dfData.index=dfData.index.tz_convert('UTC')
            # But the zipline ingest machinery wants naive timestamps,
            # so strip the tzinfo again.
            dfData.index=dfData.index.tz_convert(None)
            if boDebug:
               print("read_csv dfData",type(dfData),"length",len(dfData))
               print("first index entry:",dfData.index[0])
               print()
            # time,open,high,low,close,vwap,volume,count
            dfData.rename(
                columns={
                    'open': 'open',
                    'high': 'high',
                    'low': 'low',
                    'close': 'close',
                    'volume': 'volume',
                },
                inplace=True,
            )
            liData.append((iSid,dfData))

            # the start date is the date of the first trade and
            start_date = dfData.index[0]
            if boDebug:
                print("start_date",type(start_date),start_date,start_date.tzinfo)

            # the end date is the date of the last trade
            end_date = dfData.index[-1]
            if boDebug:
                print("end_date",type(end_date),end_date,end_date.tzinfo)

            # The auto_close date is the day after the last trade.
            ac_date = end_date + pd.Timedelta(days=1)
            if boDebug:
                print("ac_date",type(ac_date),ac_date,end_date.tzinfo)

            # Update our meta data
            dfMetadata.iloc[iSid] = start_date, end_date, ac_date, S

            iSid += 1

        if boDebug:
            print("liData",type(liData),"length",len(liData))
            print("Now calling minute_bar_writer")

        # daily_bar_writer.write(liData, show_progress=False)
        minute_bar_writer.write(liData, show_progress=False)

        # Hardcode the exchange to "POLONIEX" for all assets; "POLONIEX"
        # is registered (in the extension) to resolve to the POLONIEX
        # 24/7 calendar, which matches this crypto data.
        dfMetadata['exchange'] = "POLONIEX"

        if boDebug:
            print("returned from minute_bar_writer")
            print("calling asset_db_writer")
            print("dfMetadata",type(dfMetadata))
            print(dfMetadata)
            print()

        # Not sure why symbol_map is needed
        symbol_map = pd.Series(dfMetadata.symbol.index, dfMetadata.symbol)
        if boDebug:
            print("symbol_map",type(symbol_map))
            print(symbol_map)
            print()

        asset_db_writer.write(equities=dfMetadata)

        if boDebug:
            print("returned from asset_db_writer")
            print("calling adjustment_writer")

        adjustment_writer.write()

        if boDebug:
            print("returned from adjustment_writer")
            print("now leaving ingest function")


    if boDebug:
       print("about to return ingest function")
    return ingest
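[Editor's note: one way to catch the calendar mismatch before `minute_bar_writer.write` blows up is to diff the data's index against the calendar's minutes. A hedged sketch in plain pandas; inside the real ingest function the `calendar` argument exposes the valid minutes, approximated here by a synthetic 24/7 range:]

```python
import pandas as pd

# Stand-in for the calendar's minutes (24/7 here, like POLONIEX).
all_minutes = pd.date_range('2017-11-21', '2017-11-23',
                            freq='min', tz='UTC')

# Stand-in for the csv data's UTC-localized index.
data_minutes = pd.date_range('2017-11-21 09:42', '2017-11-21 23:40',
                             freq='min', tz='UTC')

# Bars the calendar does not know about would trigger the KeyError.
missing = data_minutes.difference(all_minutes)
if len(missing):
    print('bars outside the calendar:', list(missing[:5]))
else:
    print('all', len(data_minutes), 'bars fall inside the calendar')
```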


On Monday, December 4, 2017 at 22:20:00 UTC+1, Kaveh Vakili wrote: