Any example of how to use pipeline offline?


Tao Luo

Dec 11, 2015, 11:45:18 PM
to Zipline Python Opensource Backtester
There are some posts on quantopian.com introducing pipeline in the online Algorithm environment, but I am wondering how to modify an algorithm to run pipeline offline.

Can anyone provide a short example? I would really appreciate any help you could provide.

Tao Luo

Michael Bennett

Dec 12, 2015, 4:30:16 AM
to Zipline Python Opensource Backtester
I'm looking at this as well; if I get any further forward with my research I'll post the findings here.

It's also worth posting this question on the wiki on GitHub, since it's probably got more people looking at it.

Tao Luo

Dec 13, 2015, 4:49:23 AM
to Zipline Python Opensource Backtester
OK, I posted it as an issue on GitHub. I hope it will draw the developers' attention.

Scott Sanderson

Dec 13, 2015, 12:54:55 PM
to Zipline Python Opensource Backtester
I replied on the GitHub issue, but copy/pasting here as well:

The best place to look for examples on how the pipeline machinery works is the pipeline test suite, which lives in tests/pipeline.


There isn't a great short answer to the question of "how do I use Pipeline with real data", because the Pipeline API exists primarily for simplifying computations on large point-in-time datasets, and there aren't many such datasets freely available for public use. The most promising one that I'm aware of is Quandl's WIKI dataset, which contains a couple thousand assets and includes dividends and splits. I have a branch lying around somewhere that started building machinery for creating a zipline-compatible asset database from there.


The long answer to your question is that, to run an algorithm using the Pipeline machinery, you need to write a function, get_loader, that takes a pipeline dataset column (e.g. USEquityPricing.close) and returns an object implementing a method named load_adjusted_array whose signature is

def load_adjusted_array(self, columns, dates, sids, mask):

load_adjusted_array should return a dictionary mapping the entries in columns to instances of AdjustedArray containing data for the requested dates and sids (sids is a term for asset_ids in zipline for historical reasons).
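
To make the shape of that contract concrete, here is a minimal sketch that runs without Zipline installed. Everything in it apart from the names get_loader and load_adjusted_array is a hypothetical stand-in: Column plays the role of a dataset column such as USEquityPricing.close, InMemoryLoader and LOADERS are invented for illustration, and nested lists stand in for the AdjustedArray instances a real loader would return.

```python
class Column:
    """Hypothetical stand-in for a pipeline dataset column."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "Column(%r)" % self.name


class InMemoryLoader:
    """Hypothetical loader exposing the load_adjusted_array signature."""

    def __init__(self, table):
        # table maps column -> {(date, sid): value}
        self.table = table

    def load_adjusted_array(self, columns, dates, sids, mask):
        # A real loader returns AdjustedArray instances; a dates x sids
        # nested list stands in for each one here. mask is ignored.
        return {
            column: [
                [self.table[column].get((date, sid)) for sid in sids]
                for date in dates
            ]
            for column in columns
        }


close = Column("close")
LOADERS = {
    close: InMemoryLoader(
        {close: {("2015-01-05", 1): 10.0, ("2015-01-05", 2): 20.0}}
    )
}


def get_loader(column):
    """Return the loader registered for a column, or fail loudly."""
    try:
        return LOADERS[column]
    except KeyError:
        raise ValueError("No loader registered for %r" % (column,))


result = get_loader(close).load_adjusted_array(
    [close], ["2015-01-05"], [1, 2], mask=None
)
```

The dictionary lookup is just one way to dispatch; any callable that maps a column to a loader object would satisfy the description above.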

If the dataset you want to use is small enough to hold in memory all at once, then you can use the built-in DataFrameLoader class from zipline.pipeline.loaders.frame for your loaders. The docstring for that class describes its functionality fairly well:

    """
    A PipelineLoader that reads its input from DataFrames.

    Mostly useful for testing, but can also be used for real work if your data
    fits in memory.

    Parameters
    ----------
    column : zipline.pipeline.data.BoundColumn
        The column whose data is loadable by this loader.
    baseline : pandas.DataFrame
        A DataFrame with index of type DatetimeIndex and columns of type
        Int64Index.  Dates should be labelled with the first date on which a
        value would be **available** to an algorithm.  This means that OHLCV
        data should generally be shifted back by a trading day before being
        supplied to this class.

    adjustments : pandas.DataFrame, default=None
        A DataFrame with the following columns:
            sid : int
            value : any
            kind : int (zipline.pipeline.loaders.frame.ADJUSTMENT_TYPES)
            start_date : datetime64 (can be NaT)
            end_date : datetime64 (must be set)
            apply_date : datetime64 (must be set)

        The default of None is interpreted as "no adjustments to the baseline".
    """

The adjustments frame is used to represent events that retroactively change our view of history. Most commonly, these are splits and dividends, which apply a backward-looking multiplier to the baseline array. If your dataset already uses "adjusted" prices/volumes, then you probably just want to pass None here.
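
The "backward-looking multiplier" idea can be shown with plain Python. This is only a sketch of the arithmetic, not of how Zipline applies adjustments internally; apply_split and the sample prices are hypothetical.

```python
def apply_split(prices, dates, apply_date, ratio):
    """Scale every price dated before apply_date by ratio."""
    return [
        price * ratio if date < apply_date else price
        for price, date in zip(prices, dates)
    ]


# ISO-formatted date strings compare correctly as plain strings.
dates = ["2015-01-05", "2015-01-06", "2015-01-07"]
baseline = [100.0, 102.0, 51.5]  # a 2-for-1 split takes effect on 2015-01-07

# Seen from 2015-01-07 onward, earlier prices are halved so the series
# is continuous across the split.
adjusted = apply_split(baseline, dates, "2015-01-07", 0.5)
```

Before the apply date an algorithm would still see the unadjusted values, which is what makes the data point-in-time.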

Olivier Van Parys

Feb 20, 2018, 8:45:42 AM
to Zipline Python Opensource Backtester
Hi Scott/Everyone

Would it be possible to see sample code for, say, a simple "Buy and Hold" strategy to understand how this is implemented?
Also, how can we use the offline approach on a different set of data not provided by Quantopian (say EUR USD forex)? Again, an example would really be great.

Scott Sanderson

Feb 20, 2018, 9:36:34 PM
to Zipline Python Opensource Backtester
There's an example Buy and Hold strategy in zipline/examples/buy_and_hold.py.  You can ignore the `_test_args` function, which we use in the test suite to ensure that the example algorithms all run as expected.  The core of the algorithm is:

def handle_data(context, data):
    if not context.has_ordered:
        for stock in context.stocks:
            order(symbol(stock), 100)
        context.has_ordered = True
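
For readers who want to see the ordering guard run on its own, here is a sketch that works without Zipline installed. The order and symbol functions below are hypothetical stand-ins for the ones in zipline.api, the initialize body and the tickers are assumptions about what the excerpt relies on, and the loop at the end imitates the backtester calling handle_data once per bar.

```python
from types import SimpleNamespace

orders = []  # records every call made to the stand-in order()


def symbol(ticker):
    # Stand-in for zipline.api.symbol: just pass the ticker through.
    return ticker


def order(asset, amount):
    # Stand-in for zipline.api.order: record the request instead of trading.
    orders.append((asset, amount))


def initialize(context):
    # Assumed setup: the excerpt needs these two attributes to exist.
    context.has_ordered = False
    context.stocks = ["AAPL", "MSFT"]  # hypothetical tickers


def handle_data(context, data):
    if not context.has_ordered:
        for stock in context.stocks:
            order(symbol(stock), 100)
        context.has_ordered = True


context = SimpleNamespace()
initialize(context)
for _ in range(3):  # three simulated bars
    handle_data(context, data=None)
```

Because has_ordered flips to True on the first bar, the later calls place no further orders, which is the whole of the "buy and hold" behaviour.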

We don't really have much support for trading asset classes other than futures and equities. Some users have had success loading data for other asset classes and just telling Zipline that they're equities, which can work okay as long as you don't expect any special accounting or handling for your asset class.  The best place to look for adding your own data to zipline is probably the documentation for writing a new bundle.

- Scott

Olivier Van Parys

Feb 21, 2018, 5:10:14 AM
to Scott Sanderson, Zipline Python Opensource Backtester
Thanks Scott. This is really useful, I have just invested in a pretty decent desktop so keen to see how it can handle this :-)
Have a fabulous day


Regards,
Olivier Van Parys
_________
PhD, MBA
