The best place to look for examples of how the Pipeline machinery works is the pipeline test suite, which lives in tests/pipeline.
There isn't a great short answer to the question of "how do I use Pipeline with real data", because the Pipeline API exists primarily to simplify computations on large point-in-time datasets, and there aren't many such datasets freely available for public use. The most promising one I'm aware of is Quandl's WIKI dataset, which covers a couple of thousand assets and includes dividends and splits. I have a branch lying around somewhere that started building machinery for creating a zipline-compatible asset database from that dataset.
The long answer to your question is that, to run an algorithm using the Pipeline machinery, you need to write a function, get_loader, that takes a pipeline dataset column (e.g. USEquityPricing.close) and returns an object implementing a method named load_adjusted_array, whose signature is:

    def load_adjusted_array(self, columns, dates, sids, mask):
load_adjusted_array should return a dictionary mapping the entries in columns to instances of AdjustedArray containing data for the requested dates and sids ("sid" is zipline's term for an asset id, for historical reasons).
If the dataset you want to use is small enough to hold in memory all at once, you can use the built-in DataFrameLoader class from zipline.pipeline.loaders.frame for your loaders. The docstring for that class describes its functionality fairly well:
    """
    A PipelineLoader that reads its input from DataFrames.

    Mostly useful for testing, but can also be used for real work if your data
    fits in memory.

    Parameters
    ----------
    column : zipline.pipeline.data.BoundColumn
        The column whose data is loadable by this loader.
    baseline : pandas.DataFrame
        A DataFrame with index of type DatetimeIndex and columns of type
        Int64Index. Dates should be labelled with the first date on which a
        value would be **available** to an algorithm. This means that OHLCV
        data should generally be shifted back by a trading day before being
        supplied to this class.
    adjustments : pandas.DataFrame, default=None
        A DataFrame with the following columns:
            sid : int
            value : any
            kind : int (zipline.pipeline.loaders.frame.ADJUSTMENT_TYPES)
            start_date : datetime64 (can be NaT)
            end_date : datetime64 (must be set)
            apply_date : datetime64 (must be set)

        The default of None is interpreted as "no adjustments to the baseline".
    """
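The "first date on which a value would be available" rule is easy to get backwards. My reading is that a close observed at the end of session T should be labelled T + 1, since that's the first day an algorithm could act on it. Here's a small sketch of that relabelling; label_by_availability is a hypothetical helper (not part of zipline), and the list of sessions stands in for a real trading calendar. With a pandas frame indexed by trading sessions, baseline.shift(1) should give the same labelling, except that the first row becomes NaN where this helper simply has no entry.

```python
from datetime import date


def label_by_availability(rows):
    """Relabel sorted (session, value) pairs with the next session's date.

    A close observed at the end of session T can first be used on session
    T + 1, so each value moves to the following session's label. The last
    value has no next session in `rows`, so it is dropped.
    """
    sessions = [s for s, _ in rows]
    return [(sessions[i + 1], value) for i, (_, value) in enumerate(rows[:-1])]


raw_closes = [
    (date(2016, 1, 4), 10.0),
    (date(2016, 1, 5), 10.5),
    (date(2016, 1, 6), 11.0),
]
# The 10.0 close from Jan 4 is labelled Jan 5, and 10.5 is labelled Jan 6.
print(label_by_availability(raw_closes))
```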
The adjustments frame is used to represent events that retroactively change our view of history. Most commonly, these are splits and dividends, which apply a backward-looking multiplier to the baseline array. If your dataset already uses "adjusted" prices/volumes, then you probably just want to pass None here.
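As a rough model of how one adjustments row changes what an algorithm sees, here's a simplified sketch for the multiply kind only, using a 2:1 split. MultiplyAdjustment and view_as_of are hypothetical names, and the rule they implement (once apply_date is reached, multiply every baseline value dated between start_date and end_date) is my simplified reading of the columns listed in the docstring, not zipline's actual AdjustedArray code. It also ignores the one-day availability shift to keep the dates easy to follow.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class MultiplyAdjustment:
    """Hypothetical mirror of one 'multiply' row of the adjustments frame."""
    sid: int
    value: float
    start_date: Optional[date]  # None plays the role of NaT: "from the start"
    end_date: date
    apply_date: date


def view_as_of(baseline, adjustments, sid, as_of):
    """Return {date: value} for `sid` as an algorithm would see it on `as_of`.

    `baseline` is {date: {sid: value}}. Adjustments whose apply_date has been
    reached multiply every value dated within [start_date, end_date]; the
    baseline itself is never modified.
    """
    view = {d: row[sid] for d, row in sorted(baseline.items()) if d <= as_of}
    for adj in adjustments:
        if adj.sid != sid or adj.apply_date > as_of:
            continue  # not this asset, or not yet known on `as_of`
        for d in view:
            if (adj.start_date is None or d >= adj.start_date) and d <= adj.end_date:
                view[d] *= adj.value
    return view


# A 2:1 split that takes effect on 2016-01-06.
baseline = {
    date(2016, 1, 4): {1: 100.0},
    date(2016, 1, 5): {1: 102.0},
    date(2016, 1, 6): {1: 51.0},
}
split = MultiplyAdjustment(sid=1, value=0.5, start_date=None,
                           end_date=date(2016, 1, 5), apply_date=date(2016, 1, 6))

print(view_as_of(baseline, [split], 1, date(2016, 1, 5)))  # 100.0, 102.0 (unadjusted)
print(view_as_of(baseline, [split], 1, date(2016, 1, 6)))  # 50.0, 51.0, 51.0
```

The point of storing adjustments separately from the baseline is visible here: the same stored history yields different values depending on the date you look from, which is exactly what a point-in-time backtest needs.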
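    from zipline.api import order, symbol

    def handle_data(context, data):
        # Place a one-time order of 100 shares for each ticker in context.stocks
        # (context.stocks and context.has_ordered are assumed to be set in initialize).
        if not context.has_ordered:
            for stock in context.stocks:
                order(symbol(stock), 100)
            context.has_ordered = True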
--
You received this message because you are subscribed to a topic in the Google Groups "Zipline Python Opensource Backtester" group.