Re: Running custom algorithm with custom data


Andreas Clenow

Sep 14, 2018, 4:04:23 AM
to Zipline Python Opensource Backtester
They are phasing out support for supplying a panel directly. You should make a bundle instead. If your data is in CSV format, use the csv bundle that's included in the distribution.



On Thursday, September 13, 2018 at 6:07:04 PM UTC+2, Mosfiqur Rahman wrote:
Hi, 
    I'm new to zipline. I was trying to run this custom pairs-trading algorithm using my own data from a local CSV. Here's the format:

input: 
  
import pandas as pd
import pytz
from collections import OrderedDict

data = OrderedDict()
data['A'] = pd.read_csv('ohlcfetch.csv', index_col=1, parse_dates=['date'])
data['A'].fillna(method="ffill", inplace=True)
print data['A'].head()



Output:

           symbol   open    high    low  close   volume
date                                                   
2017-09-01      A  64.90  65.180  64.21  64.38  1224667
2017-09-05      A  64.02  64.480  63.81  64.29   910613
2017-09-06      A  64.56  64.810  64.05  64.71   974511
2017-09-07      A  64.85  65.245  64.49  65.14  1075757
2017-09-08      A  65.15  65.680  64.83  65.02  1588439

Link to the algorithm: https://github.com/bartchr808/Quantopian_Pairs_Trader/blob/master/algo.py


I was trying something like this but couldn't figure it out after importing the data:

perf = zipline.run_algorithm(start=pd.to_datetime('2017-09-01').tz_localize(pytz.utc),
                      end=pd.to_datetime('2017-10-01').tz_localize(pytz.utc),
                      initialize=initialize,
                      capital_base=100000,
                      handle_data=my_handle_data,
                      data=panel)

Can someone please help me with how I should run it now?

Mosfiqur Rahman

Sep 14, 2018, 4:34:39 AM
to Zipline Python Opensource Backtester


Did you mean ingesting data from csv files like the following?
I have added this code in my ~/.zipline/extension.py:

import pandas as pd

from zipline.data.bundles import register
from zipline.data.bundles.csvdir import csvdir_equities

start_session = pd.Timestamp('2017-9-1', tz='utc')
end_session = pd.Timestamp('2018-9-1', tz='utc')

register(
    'custom-csvdir-bundle',
    csvdir_equities(
        ['daily'],
        '/home/mosfiqur/Documents/csvdir',
    ),
    calendar_name='NYSE', # US equities
    start_session=start_session,
    end_session=end_session
)

Then when I ran $ zipline ingest -b custom-csvdir-bundle, it didn't work at all. Is it because of the way the data is formatted in my csv or the timing?

csv data:

date       symbol   open    high    low  close   volume
                                                   
2017-09-01      A  64.90  65.180  64.21  64.38  1224667
2017-09-05      A  64.02  64.480  63.81  64.29   910613
2017-09-06      A  64.56  64.810  64.05  64.71   974511
2017-09-07      A  64.85  65.245  64.49  65.14  1075757
2017-09-08      A  65.15  65.680  64.83  65.02  1588439

Can you please point me to what might be going wrong?




Ed Bartosh

Sep 14, 2018, 5:45:45 AM
to where.is....@gmail.com, Zipline Python Opensource Backtester
Hi,

> Then when I ran $ zipline ingest -b custom-csvdir-bundle, it didn't work at all. Is it because of the way the data is formatted in my csv or the timing?

Where did you put your csv files and how did you name them?
This documentation may help you to proceed further: http://www.zipline.io/bundles.html#ingesting-data-from-csv-files
There you can see the expected CSV format.
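
For reference, the example in that documentation section uses a header that starts with `date`, followed by `open`, `high`, `low`, `close`, `volume`, and then `dividend` and `split`. Below is a minimal sketch of a header check using only the Python 3 standard library. The function name `check_csvdir_header` is hypothetical and not part of zipline, and treating `dividend` and `split` as optional is an assumption rather than something this thread confirms.

```python
import csv
import io

# Column names taken from the documentation example; the split into
# required and optional columns is an assumption of this sketch.
REQUIRED = ["date", "open", "high", "low", "close", "volume"]
OPTIONAL = ["dividend", "split"]


def check_csvdir_header(text):
    """Return a list of problems found in the header row of a CSV string."""
    header = next(csv.reader(io.StringIO(text)), [])
    header = [name.strip().lower() for name in header]
    problems = []
    if not header or header[0] != "date":
        problems.append("first column should be 'date', found %r" % (header[:1],))
    for name in REQUIRED[1:]:
        if name not in header:
            problems.append("missing column %r" % name)
    for name in header:
        if name not in REQUIRED + OPTIONAL:
            problems.append("unexpected column %r" % name)
    return problems


# A header with the symbol first, as the first post's index_col=1 suggests.
sample = "symbol,date,open,high,low,close,volume\n"
for problem in check_csvdir_header(sample):
    print(problem)
```

An empty list means the header matches the documented layout as far as this sketch checks it; it says nothing about the values in the rows.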

I hope that helps.

Regards,
Ed

Fri, Sep 14, 2018 at 11:34, Mosfiqur Rahman <where.is....@gmail.com> wrote:
--
You received this message because you are subscribed to the Google Groups "Zipline Python Opensource Backtester" group.
To unsubscribe from this group and stop receiving emails from it, send an email to zipline+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
BR,
Ed

Mosfiqur Rahman

Sep 14, 2018, 6:43:36 AM
to Zipline Python Opensource Backtester
Hi, I have followed the same document you just mentioned. I have put my CSV files in the daily folder inside csvdir, i.e. in csvdir/daily/. The only difference between my CSV and the sample ones is that I don't have dividend and split columns. Is that an issue?

I'm getting the following error when I'm trying to ingest.

mosfiqur@xps:~$ zipline ingest -b custom-csvdir-bundle
Loading custom pricing data:   [####################################]  100% | a: sid 0


Traceback (most recent call last):
  File "/usr/local/bin/zipline", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/zipline/__main__.py", line 348, in ingest
    show_progress,
  File "/usr/local/lib/python2.7/dist-packages/zipline/data/bundles/core.py", line 451, in ingest
    pth.data_path([name, timestr], environ=environ),
  File "/usr/local/lib/python2.7/dist-packages/zipline/data/bundles/csvdir.py", line 94, in ingest
    self.csvdir)
  File "/usr/local/lib/python2.7/dist-packages/zipline/data/bundles/csvdir.py", line 156, in csvdir_bundle
    show_progress=show_progress)
  File "/usr/local/lib/python2.7/dist-packages/zipline/data/us_equity_pricing.py", line 257, in write
    return self._write_internal(it, assets)
  File "/usr/local/lib/python2.7/dist-packages/zipline/data/us_equity_pricing.py", line 319, in _write_internal
    for asset_id, table in iterator:
  File "/usr/local/lib/python2.7/dist-packages/click/_termui_impl.py", line 259, in next
    rv = next(self.iter)
  File "/usr/local/lib/python2.7/dist-packages/zipline/data/us_equity_pricing.py", line 248, in <genexpr>
    (sid, self.to_ctable(df, invalid_data_behavior))
  File "/usr/local/lib/python2.7/dist-packages/zipline/data/bundles/csvdir.py", line 193, in _pricing_iter
    ac_date = end_date + Timedelta(days=1)
TypeError: cannot concatenate 'str' and 'Timedelta' objects

 
I still can't figure it out.  
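
The last frame of the traceback adds one day to `end_date`, and the error message says `end_date` is a `str`. One plausible reading (an assumption; the thread never confirms the cause) is that the column used as the index was not parsed as dates, for example because `date` is not the first column of the file. The sketch below reproduces the same kind of failure with the standard library, using `datetime.timedelta` as a stand-in for pandas' `Timedelta`; `next_day` is a hypothetical helper, not zipline code.

```python
from datetime import datetime, timedelta


def next_day(last_index_value):
    """Mimic the failing step: add one day to the last index value."""
    return last_index_value + timedelta(days=1)


as_text = "2017-09-08"                             # an unparsed index value
as_date = datetime.strptime(as_text, "%Y-%m-%d")   # the same value parsed as a date

try:
    next_day(as_text)
except TypeError as exc:
    print("string index value fails:", exc)

print("parsed index value works:", next_day(as_date).date())
```

If this reading is right, rewriting the file so that `date` is the first column should let the ingest get past this line.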

Ed Bartosh

Sep 14, 2018, 7:17:11 AM
to where.is....@gmail.com, Zipline Python Opensource Backtester
> the only difference for my csv from the sample ones is, I don't have a dividend and split column, is that an issue?

It could be an issue, but you haven't got to it yet :)

Where did you put your csv files and how did you name them?
Can you show your csv file? I doubt it has the same structure as the examples.

Regards,
Ed


Mosfiqur Rahman

Sep 14, 2018, 7:26:35 AM
to Zipline Python Opensource Backtester
I'm attaching the csv.
Now, how should I use this specific CSV file when testing my algorithm with zipline? And what if I want to ingest multiple CSV files for multiple symbols and use them all together?
a.csv

Ed Bartosh

Sep 14, 2018, 7:44:05 AM
to Mosfiqur Rahman, Zipline Python Opensource Backtester
> Now, how should I use this specific csv file while testing my algorithm using zipline?
You should copy it to /home/mosfiqur/Documents/csvdir/daily/A.csv and run zipline ingest -b custom-csvdir-bundle

I'll investigate later why it doesn't work for you. The file looks ok to me.

> What if I want to ingest multiple csv for multiple symbols and use them all together?
You should put them into /home/mosfiqur/Documents/csvdir/daily/<symbol>.csv one file per symbol and run zipline ingest -b custom-csvdir-bundle
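
If the prices for several symbols sit in one file with a `symbol` column (the layout printed in the first post), the one-file-per-symbol layout can be produced with a short script. This is a sketch using only the Python 3 standard library; `split_by_symbol` is a hypothetical helper, and the output columns assume the `date,open,high,low,close,volume` layout discussed above.

```python
import csv
import os

PRICE_COLUMNS = ["date", "open", "high", "low", "close", "volume"]


def split_by_symbol(src_path, out_dir):
    """Write out_dir/<SYMBOL>.csv for every symbol found in src_path."""
    rows_by_symbol = {}
    with open(src_path, newline="") as src:
        for row in csv.DictReader(src):
            rows_by_symbol.setdefault(row["symbol"], []).append(row)

    os.makedirs(out_dir, exist_ok=True)
    for sym, rows in rows_by_symbol.items():
        rows.sort(key=lambda r: r["date"])  # ISO dates sort correctly as text
        path = os.path.join(out_dir, sym + ".csv")
        with open(path, "w", newline="") as dst:
            # extrasaction="ignore" drops the 'symbol' column from the output.
            writer = csv.DictWriter(dst, fieldnames=PRICE_COLUMNS, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(rows)
    return sorted(rows_by_symbol)
```

With the paths from this thread, a call would look like `split_by_symbol('ohlcfetch.csv', '/home/mosfiqur/Documents/csvdir/daily')`, followed by the same `zipline ingest -b custom-csvdir-bundle` command.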

Regards,
Ed


Mosfiqur Rahman

Sep 14, 2018, 7:46:53 AM
to Zipline Python Opensource Backtester
I get the part about ingesting one file per symbol. The part I didn't get is: how should I import the data while using zipline, and how will zipline know which CSV is for which symbol? Can you please give me an example?

Ed Bartosh

Sep 14, 2018, 7:52:06 AM
to Mosfiqur Rahman, Zipline Python Opensource Backtester
> how should I import the data while using zipline and how will zipline know which csv is for which symbol? Can you please give me an example?

It contains data for 4 symbols: AAPL, IBM, KO, MSFT

Can you try to put those files into  /home/mosfiqur/Documents/csvdir/daily/ and run zipline ingest -b custom-csvdir-bundle ?
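
Judging by the ingest output earlier in the thread (`a: sid 0` for an attachment named a.csv), the symbol appears to come from the file name, and each symbol gets a numeric sid. The sketch below mirrors that mapping with the standard library. It is not zipline's own code, `symbols_from_csvdir` is a hypothetical name, and numbering the symbols in sorted order is an assumption.

```python
import os


def symbols_from_csvdir(frequency_dir):
    """Return {symbol: sid} inferred from the *.csv file names in a directory."""
    names = sorted(
        os.path.splitext(name)[0]
        for name in os.listdir(frequency_dir)
        if name.endswith(".csv")
    )
    return {symbol: sid for sid, symbol in enumerate(names)}
```

For a directory holding AAPL.csv, IBM.csv, KO.csv and MSFT.csv this returns one entry per file, and those names are what `symbol('AAPL')` and friends would refer to inside an algorithm. A sid of 0 for the first symbol is therefore expected and not an error in itself.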

Regards,
Ed


Mosfiqur Rahman

Sep 14, 2018, 8:53:31 AM
to Zipline Python Opensource Backtester
I get that. Here's what I have done so far.
Step 1: The name of the CSV file is A.csv
Step 2: Then I ran zipline ingest -b custom-csvdir-bundle
Step 3: Here's my code in a Jupyter notebook

from datetime import datetime

import pandas as pd
import pytz
import zipline
from zipline.api import order, record, symbol

def initialize(context):
    pass


def handle_data(context, data):
    order(symbol('A'), 10)
    record(A=data.current(symbol('A'), 'price'))

zipline.run_algorithm(start=pd.to_datetime('2017-09-01').tz_localize(pytz.utc),
                      end=pd.to_datetime('2018-09-01').tz_localize(pytz.utc),
                      initialize=initialize,
                      capital_base=100000,
                      handle_data=handle_data)

So, my question was how I should import it now in the notebook. 
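
The `run_algorithm` signature that appears in a traceback later in this thread includes a `bundle` parameter, which is the usual way to point a backtest at ingested data instead of passing `data=panel`. Below is a minimal sketch under the assumption that the bundle registered in ~/.zipline/extension.py has been ingested successfully. The dates are copied from the post above, `run_backtest` is a hypothetical wrapper, and the guarded import exists only so the sketch can be loaded on a machine without zipline.

```python
try:
    import pandas as pd
    import zipline
    from zipline.api import order, record, symbol
except ImportError:
    # Lets the sketch be loaded for reading on a machine without zipline.
    zipline = None

# Name registered in ~/.zipline/extension.py earlier in this thread.
BUNDLE_NAME = "custom-csvdir-bundle"


def initialize(context):
    pass


def handle_data(context, data):
    order(symbol("A"), 10)
    record(A=data.current(symbol("A"), "price"))


def run_backtest():
    """Run the algorithm against the ingested bundle instead of a Panel."""
    if zipline is None:
        raise RuntimeError("zipline is not installed")
    return zipline.run_algorithm(
        start=pd.Timestamp("2017-09-01", tz="utc"),
        end=pd.Timestamp("2018-09-01", tz="utc"),
        initialize=initialize,
        handle_data=handle_data,
        capital_base=100000,
        bundle=BUNDLE_NAME,  # prices come from the ingested bundle
    )
```

In a notebook cell, `perf = run_backtest()` would then return the performance DataFrame, provided the date range is covered by the ingested data.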

Mosfiqur Rahman

Sep 14, 2018, 10:46:32 AM
to Zipline Python Opensource Backtester
Hi, 
I have pretty much got it working. Now, I'm getting the following error. I have checked the date in the data, and I'm not sure what's causing this.

HistoryWindowStartsBeforeData: History window extends before 2017-09-01. To use this history window, start the backtest on or after 2018-11-09.

I'm attaching the code.
pairs_trading_algo.html

Ed Bartosh

Sep 14, 2018, 1:32:53 PM
to Mosfiqur Rahman, Zipline Python Opensource Backtester
You're trying to request historical data on the first bar of available data. You need to skip the first 300 bars to be able to do that.
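
The error message suggests starting on or after 2018-11-09, which is roughly 300 trading days after the first bar on 2017-09-01. Since the bundle registered earlier in the thread ends on 2018-09-01, a 300-bar window cannot fit inside that data at all; either more history or a shorter window is needed. The sketch below works out the earliest usable start date from a list of sessions. It is a simplified model: `first_start_with_history` is a hypothetical helper, the weekday-only calendar ignores holidays, and whether zipline counts the window in exactly this way is an assumption.

```python
from datetime import date, timedelta


def first_start_with_history(sessions, bar_count):
    """Return the earliest session that has bar_count sessions before it.

    sessions: sorted list of trading dates present in the data.
    Returns None when the data is too short for the window.
    """
    if bar_count >= len(sessions):
        return None
    return sessions[bar_count]


# Hypothetical data: 260 weekdays starting on the first bar from the thread.
sessions = []
day = date(2017, 9, 1)
while len(sessions) < 260:
    if day.weekday() < 5:  # Monday=0 ... Friday=4; holidays are ignored here
        sessions.append(day)
    day += timedelta(days=1)

print(first_start_with_history(sessions, 300))  # None, because only 260 bars exist
print(first_start_with_history(sessions, 20))
```

The same idea applies inside an algorithm: either start the backtest late enough, or skip bars until enough history has accumulated.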


Mosfiqur Rahman

Sep 14, 2018, 1:34:12 PM
to Zipline Python Opensource Backtester
Thanks a lot. It's working now.

Mosfiqur Rahman

Sep 14, 2018, 3:05:35 PM
to Zipline Python Opensource Backtester
Somehow, when I change the CSV files, it shows an index-out-of-bounds error. I tried other CSV files and got the same issue. Not sure why. You mentioned skipping the bars earlier; I have resolved that, but is this error related to that fix?
pairs_trading_algo.html
BP.csv
MRO.csv

Suraj Thorat

Oct 16, 2019, 7:34:48 AM
to Zipline Python Opensource Backtester
I have the same issue. Please tell me if you resolved it.

Truong Pham Manh

Oct 16, 2019, 8:21:19 AM
to Suraj Thorat, Zipline Python Opensource Backtester
Can you show the error log?


Suraj Thorat

Oct 16, 2019, 8:24:56 AM
to Zipline Python Opensource Backtester
I have also posted a comment in GitHub for the same issue. The link is https://github.com/quantopian/zipline/issues/2346 .
This is the code.
 
from datetime import datetime
from zipline.api import order, symbol, record, order_target, set_benchmark
from zipline.algorithm import TradingAlgorithm
import zipline
from trading_calendars.exchange_calendar_twentyfourhr import TwentyFourHR

ticker = 'TCS'

#code
def initialize(context):
    context.security = symbol(ticker)
    set_benchmark(symbol(ticker))

#code
def handle_data(context, data):
    price_hist_25 = data.history(context.security, 'price', 25, '1m')  
    price_hist_50 = data.history(context.security, 'price', 50, '1m')  
    MA1 = price_hist_25.mean()  
    MA2= price_hist_50.mean()
    print(price_hist_25.head())
    print(price_hist_50.head())
    current_price = data.current(context.security, 'price') 
    current_positions = context.portfolio.positions[symbol(ticker)].amount
    cash = context.portfolio.cash
    value = context.portfolio.portfolio_value
    current_pnl = context.portfolio.pnl
    #code (this will come under handle_data function only)
    if (MA1 > MA2) and current_positions == 0:
        number_of_shares = int(cash/current_price)
        order(context.security, number_of_shares)
        record(MA1 = MA1, MA2 = MA2, Price=current_price,status="buy",shares=number_of_shares,PnL=current_pnl,cash=cash,value=value)
    elif (MA1 < MA2) and current_positions != 0:
        order_target(context.security, 0)
        record(MA1 = MA1, MA2 = MA2, Price= current_price,status="sell",shares="--",PnL=current_pnl,cash=cash,value=value)
    else:
        record(MA1 = MA1, MA2 = MA2, Price= current_price,status="--",shares="--",PnL=current_pnl,cash=cash,value=value)  


# initializing trading environment
perf = zipline.run_algorithm(start=datetime(2019, 10, 14, 3, 45, 0, 0, pytz.utc),
                              end=datetime(2019, 10, 15, 9, 59, 0, 0, pytz.utc),
                              initialize=initialize,
                              capital_base=100000,
                              handle_data=handle_data,
                              trading_calendar=TwentyFourHR(),
                              data_frequency ='minute',
                              data=panel)

This is the error log.
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-14-be6f14382239> in <module>
     45                               trading_calendar=TwentyFourHR(),
     46                               data_frequency ='minute',
---> 47                               data=panel)

/usr/local/lib/python3.5/site-packages/zipline/utils/run_algo.py in run_algorithm(start, end, initialize, capital_base, handle_data, before_trading_start, analyze, data_frequency, data, bundle, bundle_timestamp, trading_calendar, metrics_set, default_extension, extensions, strict_extensions, environ, blotter)
    428         local_namespace=False,
    429         environ=environ,
--> 430         blotter=blotter,
    431     )

/usr/local/lib/python3.5/site-packages/zipline/utils/run_algo.py in _run(handle_data, initialize, before_trading_start, analyze, algofile, algotext, defines, data_frequency, capital_base, data, bundle, bundle_timestamp, start, end, output, trading_calendar, print_algo, metrics_set, local_namespace, environ, blotter)
    186             trading_calendar=trading_calendar,
    187             trading_day=trading_calendar.day,
--> 188             trading_days=trading_calendar.schedule[start:end].index,
    189         )
    190         choose_loader = None

/usr/local/lib/python3.5/site-packages/zipline/finance/trading.py in __init__(self, load, bm_symbol, exchange_tz, trading_calendar, trading_day, trading_days, asset_db_path, future_chain_predicates, environ)
    101             trading_day,
    102             trading_days,
--> 103             self.bm_symbol,
    104         )
    105 

/usr/local/lib/python3.5/site-packages/zipline/data/loader.py in load_market_data(trading_day, trading_days, bm_symbol, environ)
    154         last_date,
    155         now,
--> 156         environ,
    157     )
    158 

/usr/local/lib/python3.5/site-packages/zipline/data/loader.py in ensure_treasury_data(symbol, first_date, last_date, now, environ)
    263 
    264     data = _load_cached_data(filename, first_date, last_date, now, 'treasury',
--> 265                              environ)
    266     if data is not None:
    267         return data

/usr/local/lib/python3.5/site-packages/zipline/data/loader.py in _load_cached_data(filename, first_date, last_date, now, resource_name, environ)
    321         try:
    322             data = from_csv(path)
--> 323             if has_data_for_dates(data, first_date, last_date):
    324                 return data
    325 

/usr/local/lib/python3.5/site-packages/zipline/data/loader.py in has_data_for_dates(series_or_df, first_date, last_date)
     84     if not isinstance(dts, pd.DatetimeIndex):
     85         raise TypeError("Expected a DatetimeIndex, but got %s." % type(dts))
---> 86     first, last = dts[[0, -1]]
     87     return (first <= first_date) and (last >= last_date)
     88 

/usr/local/lib/python3.5/site-packages/pandas/core/indexes/datetimelike.py in __getitem__(self, key)
    294             attribs['freq'] = freq
    295 
--> 296             result = getitem(key)
    297             if result.ndim > 1:
    298                 # To support MPL which performs slicing with 2 dim

IndexError: index 0 is out of bounds for axis 0 with size 0
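
The traceback ends in `has_data_for_dates`, called from `_load_cached_data` for the 'treasury' resource, where `dts[[0, -1]]` fails because the loaded index has size 0. That suggests the cached file zipline read contains no data rows; why it is empty is not established in this thread. The sketch below looks for such files with the standard library. `find_empty_csv_files` is a hypothetical helper, and the cache directory to inspect (commonly `~/.zipline/data`) is an assumption.

```python
import os


def find_empty_csv_files(directory):
    """Return paths of *.csv files in directory that hold no data rows."""
    empty = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".csv"):
            continue
        path = os.path.join(directory, name)
        with open(path) as handle:
            # A file with only a header line (or nothing at all) has no data rows.
            data_rows = sum(1 for line in handle if line.strip()) - 1
        if data_rows <= 0:
            empty.append(path)
    return empty
```

A call such as `find_empty_csv_files(os.path.expanduser('~/.zipline/data'))` would list candidates to inspect. Whether removing or refreshing them resolves the error is untested here.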
 





Suraj Thorat

Oct 16, 2019, 8:29:03 AM
to Zipline Python Opensource Backtester
Also you can add this code before my code to get the data. You can run the entire thing yourself like I did.
 
from collections import OrderedDict
from pprint import pprint

import matplotlib.pyplot as plt
import pandas as pd
from alpha_vantage.timeseries import TimeSeries


ts = TimeSeries(key='YOUR_API_KEY', output_format='pandas')  # API key redacted
data, meta_data = ts.get_intraday(symbol='NSE:INFY',interval='1min', outputsize='full')

#$%^$&^^(*&*)
# df = data_15min.loc[data_15min['Symbol']==sorted.index[0], ['Open', 'High', 'Low', 'Close', 'Volume']]
data.drop_duplicates(keep = 'first', inplace = True) 
data.columns = ['close','open','low','volume','high']
data = data[['open', 'high', 'low', 'close', 'volume']]

#$%^$&^^(*&*)

datatobacktest2 = data
single_symboldata2 = OrderedDict()
single_symboldata2['TCS'] = datatobacktest2

import pytz
#code
panel = pd.Panel(single_symboldata2)
panel.minor_axis = ['open', 'high', 'low', 'close', 'volume']
panel.major_axis = panel.major_axis.tz_localize('US/Eastern').tz_convert('UTC')
ist = pytz.timezone('Asia/Calcutta')




Luigi

Mar 19, 2020, 9:34:23 PM
to Zipline Python Opensource Backtester
I have a similar problem. I cannot ingest 1-minute-frequency futures data:
- csv file with correct format
- utc timezone
- resampled to fill missing minutes
- trading calendar: CME

Here is my extension.py:
(does not work with or without the # parts)

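import pandas as pd

from zipline.data.bundles import register
from zipline.data.bundles.csvdir import csvdir_equities

start_session = pd.Timestamp('2017-06-26', tz='UTC')
end_session = pd.Timestamp('2017-06-28', tz='UTC')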
#minutes_per_day=1440,
#calendar_name='CME',
#start_session=None,
#end_session=None
register(
    'custom-csvdir-bundle',
    csvdir_equities(
        ["minute"],
        '/home/luigi',
    ),
    calendar_name='CME', 
    start_session=start_session,
    end_session=end_session
)

After fixing the timezone, missing minutes, trading_calendar, and file location, the ingest runs without error messages (KeyError or otherwise), but I still get sid = 0.
!zipline ingest -b custom-csvdir-bundle
 | NQ_2days_UTC: sid 0

Running a simple sample algo then fails, probably because the bundle is empty.
Any ideas or suggestions on how to ingest CSV minute data?
Thanks!
NQ_2days_UTC.csv

Noved

May 19, 2020, 4:42:39 PM
to Zipline Python Opensource Backtester
Luigi, Did you also encounter the "index 0 is out of bounds for axis 0 with size 0" with your ingest? If yes, do you mind sharing your fix? Many thanks

Luigi

May 20, 2020, 8:56:22 AM
to Zipline Python Opensource Backtester
No, I did not have the "index 0 is out of bounds for axis 0 with size 0" error.
Also, I could not ingest my data into zipline, so I moved to another backtester.