Zipline with CSV file example

William Wong

Mar 12, 2014, 4:00:09 AM
to zip...@googlegroups.com
I'm new to this group and was trying to find examples of how zipline can read from a CSV file. After an extensive search, I found this example: http://www.snip2code.com/Snippet/3908/example-datasource-of-zipline

I now have a complete example that reads daily data from an external CSV file (barchart.com) in this format:

AAPL,04/10/2012,639.93,644,626,628.44,31774600
AAPL,04/11/2012,636.2,636.87,623.34,626.2,24878898
AAPL,04/12/2012,625,631.33,620.5,622.77,21940301
AAPL,04/13/2012,624.11,624.7,603.51,605.23,30701498

Here is the complete code for those who are interested:

"""
leverage work of briancappello and quantopian team
(especcially twiecki, eddie, and fawce)
"""
import pandas as pd
from zipline.gens.utils import hash_args
from zipline.sources.data_source import DataSource
import datetime
import csv
import numpy as np
from zipline.algorithm import TradingAlgorithm
from pandas.tseries.tools import to_datetime
import matplotlib.pyplot as plt

def get_time(time_str):
    """Parse an "HH:MM" string into a datetime.time."""
    time_array = [int(part) for part in time_str.split(":")]
    assert len(time_array) == 2
    assert time_array[0] < 24 and time_array[1] < 60
    return datetime.time(time_array[0], time_array[1])


def gen_ts(date, time):
    return pd.Timestamp(datetime.datetime.combine(date, time))


class DatasourceCSVohlc(DataSource):
    """ expects dictReader for a csv file
     with the following columns in the  header
    dt, symbol, open, high, low, close, volume
    dt expected in ISO format and order does not matter"""
    def __init__(self, filename, **kwargs):
        self.filename = filename
        # Unpack config dictionary with default values.
        self.sids = kwargs.get('symbols')  # None means "accept all symbols"
        self.tz_in = kwargs.get('tz_in', "US/Eastern")
        self.start = pd.Timestamp(to_datetime(kwargs.get('start'))).tz_localize('utc')
        self.end = pd.Timestamp(to_datetime(kwargs.get('end'))).tz_localize('utc')
        self._raw_data = None
        self.arg_string = hash_args(filename, **kwargs)

    @property
    def instance_hash(self):
        return self.arg_string

    def raw_data_gen(self):
        # Accumulate per-sid volume and price*volume across all rows so
        # that vwap is a running volume-weighted average price (in the
        # original post these dicts were reset on every row, which made
        # vwap always equal to the row's close).
        volumes = {}
        price_volumes = {}
        with open(self.filename, 'rb') as csvfile:
            self.data = csv.reader(csvfile, delimiter=',', quotechar='|')
            for row in self.data:
                ts = pd.Timestamp(to_datetime(row[1])).tz_localize('utc')
                if ts < self.start or ts > self.end:
                    continue
                sid = row[0]
                if self.sids is None or sid in self.sids:
                    if sid not in volumes:
                        volumes[sid] = 0
                        price_volumes[sid] = 0
                    event = {"sid": sid, "type": "TRADE", "symbol": sid}
                    event["dt"] = ts
                    event["price"] = float(row[5])
                    event["volume"] = row[6]
                    volumes[sid] += float(event["volume"])
                    price_volumes[sid] += event["price"] * float(event["volume"])
                    event["vwap"] = price_volumes[sid] / volumes[sid]
                    event["open"] = row[2]
                    event["high"] = row[3]
                    event["low"] = row[4]
                    event["close"] = row[5]
                    yield event

    @property
    def raw_data(self):
        if not self._raw_data:
            self._raw_data = self.raw_data_gen()
        return self._raw_data

    @property
    def mapping(self):
        return {
            'sid': (lambda x: x, 'sid'),
            'dt': (lambda x: x, 'dt'),
            'open': (float, 'open'),
            'high': (float, 'high'),
            'low': (float, 'low'),
            'close': (float, 'close'),
            'price': (float, 'price'),
            'volume': (int, 'volume'),
            'vwap': (lambda x: x, 'vwap')
        }


class BuyApple(TradingAlgorithm):  # inherit from TradingAlgorithm
    """This is the simplest possible algorithm that does nothing but
    buy 1 apple share on each event.
    """
    def initialize(self):
        pass

    def handle_data(self, data):  # overload handle_data() method
        self.order('AAPL', 1)  # order 1 share of AAPL on every event


if __name__ == "__main__":
    source = DatasourceCSVohlc('barchart.csv', symbols=['AAPL'],
                               start='2013-01-01', end='2014-01-01')
    simple_algo = BuyApple()
    results = simple_algo.run(source)
    results.portfolio_value.plot()
    plt.show()
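As a quick sanity check, independent of zipline, the same headerless barchart layout can be parsed with the plain csv module to confirm the column indices used above (a minimal sketch; the helper name is my own):

```python
import csv
import io

# Two rows in the barchart.com layout used above:
# symbol, date, open, high, low, close, volume
SAMPLE = """AAPL,04/10/2012,639.93,644,626,628.44,31774600
AAPL,04/11/2012,636.2,636.87,623.34,626.2,24878898
"""

def parse_rows(text):
    """Parse headerless barchart-style rows into event-like dicts."""
    events = []
    for row in csv.reader(io.StringIO(text)):
        events.append({
            "sid": row[0],
            "dt": row[1],
            "open": float(row[2]),
            "high": float(row[3]),
            "low": float(row[4]),
            "close": float(row[5]),
            "volume": int(row[6]),
        })
    return events

print(parse_rows(SAMPLE)[0]["close"])  # 628.44
```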

William Wong

Mar 12, 2014, 11:47:17 AM
to zip...@googlegroups.com
I'm trying to plot the custom data source:

if __name__ == "__main__":
    source = DatasourceCSVohlc('AAPL.csv', symbols='AAPL', start='2013-01-01', end='2014-01-01')
    source['AAPL'].plot

but got this error

TypeError: 'DatasourceCSVohlc' object has no attribute '__getitem__'

Do I need to implement __getitem__ ?
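(A side note on this error, as a hedged sketch: a DataSource subclass is consumed as a generator of events, not indexed like a DataFrame, so `source['AAPL']` raises. One workaround is to load the CSV into pandas directly and plot that; the column names below are assumptions matching the barchart layout from the first post.)

```python
import io
import pandas as pd

# Hypothetical headerless CSV in the symbol,date,open,high,low,close,volume
# layout from the first post.
CSV_TEXT = """AAPL,04/10/2012,639.93,644,626,628.44,31774600
AAPL,04/11/2012,636.2,636.87,623.34,626.2,24878898
"""

cols = ["symbol", "dt", "open", "high", "low", "close", "volume"]
df = pd.read_csv(io.StringIO(CSV_TEXT), names=cols, parse_dates=["dt"])
df = df.set_index("dt")

print(df["close"].iloc[0])  # 628.44
# df["close"].plot() would then draw the series (note: .plot without
# parentheses only references the method, it never draws anything).
```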

Thomas Wiecki

Mar 12, 2014, 10:34:26 PM
to William Wong, zipline
Hi William

I believe we already have a csv data-source. Have you seen this:
https://github.com/quantopian/zipline/blob/master/zipline/sources/data_source_csv.py?

Thomas


--
You received this message because you are subscribed to the Google Groups "Zipline Python Opensource Backtester" group.
To unsubscribe from this group and stop receiving emails from it, send an email to zipline+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

right...@gmail.com

Mar 12, 2014, 10:47:28 PM
to Thomas Wiecki, zipline
Thanks Thomas. Yes, I have seen it. I was trying to use it and got a bunch of errors which I believe were related to the CSV file format. Is there a sample CSV that works with the csv data-source? Thanks.

Thomas Wiecki

Mar 13, 2014, 3:28:51 PM
to right...@gmail.com, zipline
Can you post the errors you are getting, and perhaps a short CSV that replicates the issue, as a GitHub issue?

Thomas

Khanh Nguyen

Apr 13, 2014, 10:07:09 AM
to zip...@googlegroups.com, right...@gmail.com
Hi,

Here is the error from my attempt at using data_source_csv.py.

1) Download data_source_csv.py to my working directory

2) Prepare my CSV file (no header)

2014-04-11,VNINDEX,597.87,602.01,594.96,600.57,104068010
2014-04-10,VNINDEX,605.4,608.89,600.43,601.33,110558340
2014-04-08,VNINDEX,600.71,603.25,598.52,603.25,109210640
2014-04-07,VNINDEX,595.72,602.55,595.35,600.57,108560420
2014-04-04,VNINDEX,589.9,593.72,588.34,593.04,92353580
2014-04-03,VNINDEX,584.45,590.91,582.83,589.44,99127880

3) Then

from data_source_csv import DatasourceCSVohlc
feed = DatasourceCSVohlc('vnindex_iso.csv',symbol='VNINDEX')

4) This gives me the error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-3-f0af12107791> in <module>()
----> 1 data = DatasourceCSVohlc('vnindex_iso.csv',symbol='VNINDEX')

/home/knguyen/money_maker/data_source_csv.pyc in __init__(self, data, **kwargs)
     39         self.tz_in = kwargs.get('tz_in', "US/Eastern")
     40         self.start = pd.Timestamp(np.datetime64(kwargs.get('start')))
---> 41         self.start = self.start.tz_localize('utc')
     42         self.end = pd.Timestamp(np.datetime64(kwargs.get('end')))
     43         self.end = self.end.tz_localize('utc')

/usr/local/lib/python2.7/dist-packages/pandas-0.13.1-py2.7-linux-x86_64.egg/pandas/tslib.so in pandas.tslib.Timestamp.tz_localize (pandas/tslib.c:7415)()

/usr/local/lib/python2.7/dist-packages/pandas-0.13.1-py2.7-linux-x86_64.egg/pandas/tslib.so in pandas.tslib.Timestamp.to_pydatetime (pandas/tslib.c:7893)()

RuntimeError: NumPy datetime metadata is corrupted with invalid base unit

Can you please help? Thanks.

-k

Michael S

Apr 14, 2014, 9:17:32 AM
to zip...@googlegroups.com, right...@gmail.com
It's asking for arguments for when you want to start and end trading. They should have been required.
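In other words, the traceback comes from the source building its bounds from `kwargs.get('start')` returning `None`, so `np.datetime64(None)` produces the invalid timestamp that `tz_localize` chokes on. A minimal sketch of constructing valid bounds (the dates here are assumptions taken from the sample rows above):

```python
import pandas as pd

# When start/end are omitted, the source localizes a timestamp built from
# None, which is what raises the RuntimeError above in pandas 0.13.
# Passing explicit ISO dates gives valid tz-aware bounds instead:
start = pd.Timestamp(pd.to_datetime('2014-04-03')).tz_localize('utc')
end = pd.Timestamp(pd.to_datetime('2014-04-11')).tz_localize('utc')

print(start < end)  # True

# The corrected call would then look like (not run here, zipline required):
# feed = DatasourceCSVohlc('vnindex_iso.csv', symbol='VNINDEX',
#                          start='2014-04-03', end='2014-04-11')
```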