How to create a new DataSet like USEquityPricing?


Guy Zyskind

May 12, 2017, 1:54:33 PM
to Zipline Python Opensource Backtester
Hi,

I'm trying to create a custom dataset package that can be used in a pipeline, something like USEquityPricing, which is defined here: https://github.com/quantopian/zipline/blob/master/zipline/pipeline/data/equity_pricing.py. However, this file seems to contain only a stripped-down version of the actual implementation.

Is there a real example somewhere of how to achieve this? Since all the datasets are private, I couldn't find a code sample. Note that I'm not interested in the data itself, but rather in how to wrap my own dataset so that it can be used the same way USEquityPricing, Morningstar's data, etc. are used.

Thanks!

Scott Sanderson

May 12, 2017, 3:17:46 PM
to Guy Zyskind, Zipline Python Opensource Backtester
The file you linked is the full definition of USEquityPricing. One of the design ideas behind Pipeline is separating an abstract description of a dataset from a concrete source for that dataset. We achieve that separation by distinguishing between DataSets and **loaders** for those datasets. DataSets are just abstract collections of sentinel values describing the columns of a dataset and their types. A loader is an object which, given a request for a particular chunk of a dataset, can actually produce the requested data. We associate loaders with datasets by passing a dispatching function to SimplePipelineEngine (https://github.com/quantopian/zipline/blob/master/zipline/pipeline/engine.py); the engine calls that function to find the loader responsible for each column it needs.
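To make the DataSet/loader split concrete, here is a toy model in plain Python. It is not zipline's code: `Column`, `DataSet`, `InMemoryLoader`, `run_pipeline` and `get_loader` below are hypothetical stand-ins that only mirror the roles described above (in real zipline you would subclass `zipline.pipeline.data.DataSet` and declare `Column` attributes, and the engine is `SimplePipelineEngine`). The dataset holds no data at all; the dispatch function is the only place that knows where values come from.

```python
class Column:
    """Sentinel describing one column: a dtype, but no data."""

    def __init__(self, dtype):
        self.dtype = dtype
        self.name = None
        self.dataset = None

    def __set_name__(self, owner, name):
        # Runs when the owning class is created; records name and owner.
        self.name = name
        self.dataset = owner


class DataSet:
    """Abstract collection of Column sentinels. Holds no data itself."""

    @classmethod
    def columns(cls):
        return [v for v in vars(cls).values() if isinstance(v, Column)]


class MyDataSet(DataSet):
    # Hypothetical custom dataset.
    sentiment = Column(float)
    mentions = Column(int)


class InMemoryLoader:
    """A concrete source: given columns, dates and assets, produce the values."""

    def __init__(self, table):
        self.table = table  # {(column_name, date, asset): value}

    def load(self, columns, dates, assets):
        return {
            col.name: [[self.table.get((col.name, d, a)) for a in assets] for d in dates]
            for col in columns
        }


def run_pipeline(columns, dates, assets, get_loader):
    """Toy engine: group columns by the loader that owns them, then load each group."""
    groups = {}
    for col in columns:
        groups.setdefault(get_loader(col), []).append(col)
    results = {}
    for loader, cols in groups.items():
        results.update(loader.load(cols, dates, assets))
    return results


table = {
    ("sentiment", "2017-05-01", "ASSET_A"): 0.7,
    ("mentions", "2017-05-01", "ASSET_A"): 12,
}
loader = InMemoryLoader(table)


def get_loader(column):
    """Dispatching function: map each column to the loader that can produce it."""
    if column.dataset is MyDataSet:
        return loader
    raise LookupError(f"no loader registered for {column.name!r}")


result = run_pipeline(MyDataSet.columns(), ["2017-05-01"], ["ASSET_A", "ASSET_B"], get_loader)
print(result)  # {'sentiment': [[0.7, None]], 'mentions': [[12, None]]}
```

Because the engine only ever talks to `get_loader`, pointing the same dataset at a different source means changing that one function, not the dataset definition.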

The loader used on the Quantopian platform for USEquityPricing is the USEquityPricingLoader class defined in https://github.com/quantopian/zipline/blob/master/zipline/pipeline/loaders/equity_pricing_loader.py. It, in turn, mostly delegates to lower-level subsystems that know how to fetch pricing data in the default formats used by Zipline (bcolz for pricing data, and SQLite for split/merger/dividend data). You could get data in these formats by using (or implementing) a zipline bundle, as described at http://www.zipline.io/bundles.html#discovering-available-bundles.
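The delegation pattern can be sketched as a loader that combines one reader for raw prices with another for split data stored in SQLite. This is a hypothetical toy, not USEquityPricingLoader: a plain dict stands in for the bcolz bar reader, the `splits` table layout is made up for the example, and the convention assumed here is that a 2-for-1 split is stored as ratio 0.5 and multiplied into prices before the effective date. Adjustments are applied only if they are known as of the query date, so the past is never rewritten with future information.

```python
import sqlite3

# Hypothetical raw (unadjusted) daily closes, keyed by (sid, date).
RAW_CLOSES = {
    (1, "2017-05-01"): 100.0,
    (1, "2017-05-02"): 102.0,
    (1, "2017-05-03"): 51.5,  # first session after a 2-for-1 split
}


def make_adjustments_db():
    """Build a toy SQLite splits table: (sid, effective_date, ratio)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE splits (sid INTEGER, effective_date TEXT, ratio REAL)")
    # 2-for-1 split stored as ratio 0.5: earlier prices get multiplied by it.
    conn.execute("INSERT INTO splits VALUES (1, '2017-05-03', 0.5)")
    return conn


class ToyPricingLoader:
    """Delegates raw bars to one reader and splits to another, then combines them."""

    def __init__(self, bars, adjustments_conn):
        self.bars = bars
        self.conn = adjustments_conn

    def load_close(self, sid, dates, as_of):
        """Closes for `sid` on `dates`, split-adjusted as seen on `as_of`."""
        splits = self.conn.execute(
            "SELECT effective_date, ratio FROM splits "
            "WHERE sid = ? AND effective_date <= ?",
            (sid, as_of),
        ).fetchall()
        out = []
        for d in dates:
            price = self.bars[(sid, d)]
            for effective_date, ratio in splits:
                if d < effective_date:  # only pre-split prices are adjusted
                    price *= ratio
            out.append(price)
        return out


loader = ToyPricingLoader(RAW_CLOSES, make_adjustments_db())
dates = ["2017-05-01", "2017-05-02", "2017-05-03"]
print(loader.load_close(1, dates[:2], as_of="2017-05-02"))  # [100.0, 102.0]
print(loader.load_close(1, dates, as_of="2017-05-03"))      # [50.0, 51.0, 51.5]
```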

Note that while USEquityPricingLoader only knows how to load the USEquityPricing dataset, it's perfectly possible to write a different loader for the same dataset and use that loader instead (we do this in many places in Zipline's test suite).
From a Pipeline API perspective, there's nothing essential about the bcolz format for pricing data other than that it's what we use for Quantopian and it's been pretty well-optimized for speed.
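A minimal sketch of swapping loaders for the same dataset, assuming a toy setup rather than zipline's classes: `EquityPricing`, `StoreLoader`, `ConstantLoader` and `compute` are hypothetical names. The dataset definition is untouched; only the dispatch callback changes, which is the kind of substitution a test suite can use to avoid real data files.

```python
class Column:
    """Minimal sentinel: records its name and owning dataset, holds no data."""

    def __set_name__(self, owner, name):
        self.name = name
        self.dataset = owner


class EquityPricing:
    # Hypothetical stand-in for a pricing dataset such as USEquityPricing.
    close = Column()


class StoreLoader:
    """'Production' loader reading from a store (stands in for bcolz/SQLite readers)."""

    def __init__(self, store):
        self.store = store  # {(column_name, date, asset): value}

    def load(self, column, dates, assets):
        return [[self.store[(column.name, d, a)] for a in assets] for d in dates]


class ConstantLoader:
    """Test loader: returns one fixed value everywhere, no files required."""

    def __init__(self, value):
        self.value = value

    def load(self, column, dates, assets):
        return [[self.value for _ in assets] for _ in dates]


def compute(column, dates, assets, get_loader):
    """Toy engine: its only knowledge of data sources is the get_loader callback."""
    return get_loader(column).load(column, dates, assets)


store = {("close", "2017-05-01", "ASSET_A"): 10.0}
production = compute(EquityPricing.close, ["2017-05-01"], ["ASSET_A"],
                     lambda col: StoreLoader(store))
testing = compute(EquityPricing.close, ["2017-05-01"], ["ASSET_A"],
                  lambda col: ConstantLoader(1.0))
print(production, testing)  # [[10.0]] [[1.0]]
```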

There are lots of other loaders defined in https://github.com/quantopian/zipline/tree/master/zipline/pipeline/loaders, including, for example, a loader that can just take a pandas DataFrame (https://github.com/quantopian/zipline/blob/master/zipline/pipeline/loaders/frame.py), which might be useful if your dataset is small enough to hold in RAM all at once.
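As I understand frame.py, zipline's DataFrameLoader serves a single column from a "baseline" DataFrame indexed by date with one column per asset; check the file for the exact signature in your version. The sketch below imitates that idea with plain dicts so it runs without pandas; `FrameLikeLoader` is a hypothetical name. The whole table lives in memory, which is exactly the size trade-off mentioned above.

```python
import math


class FrameLikeLoader:
    """Serves one column from a baseline table held entirely in memory.

    The baseline is {date: {asset: value}}: dates as rows and assets as
    columns, roughly the shape a pandas DataFrame baseline would have.
    """

    def __init__(self, column_name, baseline):
        self.column_name = column_name
        self.baseline = baseline

    def load(self, column_name, dates, assets):
        if column_name != self.column_name:
            raise KeyError(f"this loader only serves {self.column_name!r}")
        # Missing (date, asset) cells come back as NaN, as in a float array.
        return [
            [self.baseline.get(d, {}).get(a, math.nan) for a in assets]
            for d in dates
        ]


baseline = {
    "2017-05-01": {"ASSET_A": 1.5, "ASSET_B": 2.0},
    "2017-05-02": {"ASSET_A": 1.6},
}
loader = FrameLikeLoader("sentiment", baseline)
rows = loader.load("sentiment", ["2017-05-01", "2017-05-02"], ["ASSET_A", "ASSET_B"])
print(rows)  # [[1.5, 2.0], [1.6, nan]]
```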

One particularly powerful family of loaders lives in zipline.pipeline.loaders.blaze, and uses Blaze (http://blaze.readthedocs.io/en/latest/index.html) to automatically generate datasets and loaders for any kind of "morally tabular" data. We use blaze loaders on Quantopian to load data from a variety of sources, including SQL databases, in-memory dataframes, and HDF5 files. There's a fairly lengthy description of how the blaze loader works at the top of `zipline/pipeline/loaders/blaze/core.py`.
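The "generate a dataset from a table" idea can be sketched in plain Python: build a dataset class from a column schema, and serve each query date the latest record known on or before it. This is a hypothetical toy (`dataset_from_schema`, `AsOfLoader` and the record layout are made up here), not the blaze loader's API. The real blaze loaders handle more than this, for example separating the date a record describes from the time it became available; this sketch collapses both into a single `asof_date`.

```python
from bisect import bisect_right


class Column:
    """Sentinel for one column; only records its dtype and name."""

    def __init__(self, dtype):
        self.dtype = dtype

    def __set_name__(self, owner, name):
        self.name = name


def dataset_from_schema(name, schema):
    """Generate a dataset class from a {column_name: dtype} schema."""
    return type(name, (), {col: Column(dtype) for col, dtype in schema.items()})


class AsOfLoader:
    """Serves, for each query date, the latest record with asof_date <= that date."""

    def __init__(self, records):
        # Group records by asset, sorted by asof_date, and keep a parallel
        # list of dates so lookups can use binary search.
        grouped = {}
        for rec in sorted(records, key=lambda r: r["asof_date"]):
            grouped.setdefault(rec["sid"], []).append(rec)
        self.index = {
            sid: ([r["asof_date"] for r in recs], recs) for sid, recs in grouped.items()
        }

    def load(self, column, dates, sids):
        out = []
        for d in dates:
            row = []
            for sid in sids:
                keys, recs = self.index.get(sid, ([], []))
                i = bisect_right(keys, d)  # number of records with asof_date <= d
                row.append(recs[i - 1][column.name] if i else None)
            out.append(row)
        return out


# Hypothetical table: two revenue records for asset 1, nothing for asset 2.
records = [
    {"sid": 1, "asof_date": "2017-01-31", "revenue": 10.0},
    {"sid": 1, "asof_date": "2017-04-30", "revenue": 12.0},
]
Fundamentals = dataset_from_schema("Fundamentals", {"revenue": float})
loader = AsOfLoader(records)
result = loader.load(Fundamentals.revenue, ["2017-01-15", "2017-02-01", "2017-05-01"], [1, 2])
print(result)  # [[None, None], [10.0, None], [12.0, None]]
```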

Hope that helps,
- Scott

--
You received this message because you are subscribed to the Google Groups "Zipline Python Opensource Backtester" group.
To unsubscribe from this group and stop receiving emails from it, send an email to zipline+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Rohit Swaroop

Feb 27, 2018, 11:20:06 AM
to Zipline Python Opensource Backtester
Scott, are any of the loaders you mentioned able to incorporate fundamental data (e.g., Morningstar)? Thanks

Su Kai

Jul 2, 2018, 6:10:47 AM
to Zipline Python Opensource Backtester
I found a branch that implements a fundamentals DataSet: https://github.com/bartosh/zipline/blob/fundamentals/zipline/pipeline/loaders/fundamentals_loader.py

On Wednesday, February 28, 2018 at 12:20:06 AM UTC+8, Rohit Swaroop wrote: