The file you linked is the full definition of USEquityPricing. One of the design ideas for Pipeline is separating an abstract description of a dataset from a concrete source for that dataset. We achieve that separation by distinguishing between DataSets and **loaders** for those datasets. DataSets are just abstract collections of sentinel values describing the columns/types for a dataset. A loader is an object which, given a request for a particular chunk of a dataset, can actually produce the requested data. We associate loaders with datasets by passing a dispatching function to SimplePipelineEngine.
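To make the dataset/loader split concrete, here is a minimal toy model in plain Python. It does not use Zipline's real classes: `Column`, `ToyPricing`, `DictLoader`, `ToyEngine`, and the `get_loader` signature are all hypothetical names invented for illustration, and the real engine's interface is richer than this. The only point is the shape of the idea: the dataset holds sentinels, the loader holds data, and a dispatching function connects the two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Sentinel describing one column: owning dataset, name, and dtype."""
    dataset: str
    name: str
    dtype: type


class ToyPricing:
    """Abstract dataset (hypothetical): no data, only column descriptions."""
    close = Column("ToyPricing", "close", float)
    volume = Column("ToyPricing", "volume", float)


class DictLoader:
    """Concrete source (hypothetical): serves columns from an in-memory dict."""

    def __init__(self, data):
        # Layout: {column name: {asset: [one value per day]}}
        self._data = data

    def load(self, column, assets, start, end):
        rows = self._data[column.name]
        return {asset: rows[asset][start:end] for asset in assets}


class ToyEngine:
    """Resolves each requested column to a loader via a dispatching function."""

    def __init__(self, get_loader):
        self._get_loader = get_loader

    def run(self, columns, assets, start, end):
        return {
            col.name: self._get_loader(col).load(col, assets, start, end)
            for col in columns
        }


pricing_loader = DictLoader({
    "close": {"AAPL": [10.0, 11.0, 12.0], "MSFT": [20.0, 21.0, 22.0]},
    "volume": {"AAPL": [100.0, 110.0, 120.0], "MSFT": [200.0, 210.0, 220.0]},
})


def get_loader(column):
    """Dispatching function: map a column to the loader that can produce it."""
    if column.dataset == "ToyPricing":
        return pricing_loader
    raise ValueError(f"No loader registered for {column}")


engine = ToyEngine(get_loader)
result = engine.run([ToyPricing.close], ["AAPL"], 0, 2)
```

Note that `ToyPricing` never touches the data, and `ToyEngine` never knows where the data lives. The dispatching function is the only place the two are tied together.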
Note that while USEquityPricingLoader only knows how to load the USEquityPricing dataset, it's perfectly possible to write a different loader for the same dataset and use that loader instead (we do this in many places in Zipline's test suite).
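As a rough illustration of swapping loaders, the sketch below defines two interchangeable loaders behind the same `load()` method. Again this is plain Python rather than Zipline's actual loader interface: `CsvTextLoader`, `ConstantLoader`, `mean_close`, and the `load(column, n_days)` signature are assumptions made up for the example.

```python
import csv
import io


class CsvTextLoader:
    """Hypothetical loader: reads column values from CSV text."""

    def __init__(self, text):
        self._rows = list(csv.DictReader(io.StringIO(text)))

    def load(self, column, n_days):
        return [float(row[column]) for row in self._rows[:n_days]]


class ConstantLoader:
    """Hypothetical test double: the same value for every day of any column."""

    def __init__(self, value):
        self._value = value

    def load(self, column, n_days):
        return [self._value] * n_days


def mean_close(loader, n_days):
    """Consumer code: depends only on the column name and the load() interface."""
    values = loader.load("close", n_days)
    return sum(values) / len(values)


csv_loader = CsvTextLoader("close,volume\n10.0,100\n12.0,110\n14.0,120\n")
from_csv = mean_close(csv_loader, 2)                # averages the first two rows
from_constant = mean_close(ConstantLoader(5.0), 2)  # same code, different source
```

The computation in `mean_close` is identical in both calls; only the source of the data changes, which is what makes a stand-in loader convenient in tests.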
From a Pipeline API perspective, there's nothing essential about the bcolz format for pricing data other than that it's what we use for Quantopian and it's been pretty well-optimized for speed.
One particularly powerful family of loaders lives in zipline.pipeline.loaders.blaze, and uses Blaze (http://blaze.readthedocs.io/en/latest/index.html) to automatically generate datasets and loaders for any kind of "morally tabular" data. We use blaze loaders on Quantopian to load data from a variety of sources, including SQL databases, in-memory dataframes, and HDF5 files. There's a fairly lengthy description of how the blaze loader works at the top of `zipline/pipeline/loaders/blaze/core.py`.