pipeline

Philip Schrader

Mar 13, 2018, 3:18:44 AM

to Zipline Python Opensource Backtester

Hi all, can anyone point me to some examples of using the zipline pipeline for research? I've used the Quantopian research environment a bit, but things don't seem to transfer easily to zipline. In particular, how can I build and run a pipeline in zipline?

Peter Harrington

Mar 13, 2018, 1:43:50 PM

to Zipline Python Opensource Backtester

It takes a little more work to use Pipeline in Zipline but it is totally possible.

Here is a basic script that should get you going without spending days digging through the source code.

from zipline.pipeline.data.equity_pricing import USEquityPricing
from zipline.pipeline.engine import SimplePipelineEngine
from zipline.pipeline import Pipeline
from zipline.pipeline.loaders import USEquityPricingLoader
from zipline.data.bundles.core import load
import os
import pandas as pd
from ALPHAS101 import Alpha101  # external module; swap in any factor you have


if __name__ == '__main__':
    # Load a previously ingested bundle.
    bundle_data = load('quantopian-quandl', os.environ, None)

    my_pipeline = Pipeline(
        columns={
            'MyFactor': Alpha101(),
        }
    )

    # Loader that serves USEquityPricing columns from the bundle's daily bars
    # and adjustments.
    pipeline_loader = USEquityPricingLoader(
        bundle_data.equity_daily_bar_reader,
        bundle_data.adjustment_reader,
    )

    def choose_loader(column):
        # Tell the engine which loader supplies a given column.
        if column in USEquityPricing.columns:
            return pipeline_loader
        raise ValueError("No PipelineLoader registered for column %s." % column)

    # Restrict the bundle's trading calendar to the sessions of interest.
    cal = bundle_data.equity_daily_bar_reader.trading_calendar.all_sessions
    cal2 = cal[(cal >= '2012-01-03') & (cal <= '2012-03-01')]

    spe = SimplePipelineEngine(get_loader=choose_loader,
                               calendar=cal2,
                               asset_finder=bundle_data.asset_finder)

    results = spe.run_pipeline(my_pipeline,
                               pd.to_datetime('2012-01-04', utc=True),
                               pd.to_datetime('2012-03-01', utc=True))

    # print() with parentheses works on both Python 2 and Python 3.
    print(results.head())

    print("all done, boss!")

Philip Schrader

Mar 13, 2018, 11:51:31 PM

to Zipline Python Opensource Backtester

Thanks very much Peter. You have indeed saved me days digging through source, time which I will now use looking through your alpha compiler examples and docs.

FWIW, in order to run that code I had to use a different factor (I don't have alphas101 yet) and needed parentheses around the print arguments.
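
For readers without the ALPHAS101 module, the substitution described above might look like the following minimal sketch. It assumes zipline's built-in SimpleMovingAverage factor as the stand-in; the thread does not say which factor was actually used, and the helper name make_pipeline and the 10-day window are illustrative choices.

```python
def make_pipeline(window_length=10):
    """Build a one-column pipeline from a built-in factor (hypothetical helper)."""
    # Imported inside the function so the sketch can be loaded even where
    # zipline is not installed; normally these go at the top of the file.
    from zipline.pipeline import Pipeline
    from zipline.pipeline.data import USEquityPricing
    from zipline.pipeline.factors import SimpleMovingAverage

    # Stand-in for Alpha101(): a simple moving average of the daily close.
    sma = SimpleMovingAverage(
        inputs=[USEquityPricing.close],
        window_length=window_length,
    )
    return Pipeline(columns={'MyFactor': sma})
```

The returned object can be passed to spe.run_pipeline in place of my_pipeline. A windowed factor needs history before the start date, and as far as I can tell the engine only looks back through the sessions in the calendar it was given, so the cal2 slice would probably have to start at least window_length sessions earlier.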

Philip Schrader

Mar 15, 2018, 3:39:22 AM

to Zipline Python Opensource Backtester

Hi Peter, I've been playing around with this a bit and reading through your alpha compiler blog and I have a few questions.

It seems that when zipline ingests the Quandl WIKI data set it ignores columns such as ex-dividend and split_ratio, or at least I can't see how to access them in USEquityPricing. Would it be relatively easy to write an ingest function that brings this data in? The zipline documentation on custom bundles is not particularly helpful. I've had a look at your alphacompiler data loaders; I'm guessing a 'dump' is a download of all the data via the Quandl API? Do you have any example scripts showing how to do that?

Thanks again.

peter....@insead.edu

Mar 15, 2018, 7:26:43 AM

to Zipline Python Opensource Backtester

Hi Pete,

A lot of the things you do here, like getting the calendar and the bundle, are already known when the algo is running. Should we extend the pipeline function so that it looks the way it does in Quantopian? The real question is: what do we have to develop to make it work like Quantopian, so that porting algos becomes easier?

Peter

Joe Jevnik

Mar 15, 2018, 7:30:06 AM

to peter....@insead.edu, Zipline Python Opensource Backtester

Which parts are you missing? Out of the box, zipline can run pipelines using the USEquityPricing dataset, which pulls pricing data out of the bundle. The corporate fundamentals and other third-party datasets are not in Zipline because you would need your own personal license to use them. However, you can add your own datasets to pipeline by using the blaze pipeline loader. If that is what you are looking to do, I can explain that in more detail.

--
You received this message because you are subscribed to the Google Groups "Zipline Python Opensource Backtester" group.
To unsubscribe from this group and stop receiving emails from it, send an email to zipline+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Peter Bakker

Mar 15, 2018, 7:40:20 AM

to Joe Jevnik, Zipline Python Opensource Backtester

Cool. I haven't played with it much; I was just wondering why there were so many steps to get the pipeline to work in Peter's script.

--

Regards,

Peter Bakker

m: +61435174700
https://www.linkedin.com/in/peterbakker/

Joe Jevnik

Mar 15, 2018, 7:43:09 AM

to Peter Bakker, Zipline Python Opensource Backtester

Peter is showing the steps needed to run a pipeline outside of the context of an algorithm. If you want to run pipelines in an algorithm the API looks the same as Quantopian, but instead of `quantopian.pipeline` it is `zipline.pipeline`.
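
Inside an algorithm, the usual pattern is to attach the pipeline in initialize and read its output in before_trading_start. A minimal sketch, assuming the attach_pipeline and pipeline_output functions from zipline.api; the pipeline name 'my_pipeline' and the single 'close' column are placeholders.

```python
# Imports sit inside the functions only so the sketch loads without zipline;
# in a real algorithm file they would normally go at the top.

def make_pipeline():
    from zipline.pipeline import Pipeline
    from zipline.pipeline.data import USEquityPricing

    # One column holding the most recent daily close for every asset.
    return Pipeline(columns={'close': USEquityPricing.close.latest})


def initialize(context):
    from zipline.api import attach_pipeline

    # Register the pipeline under a name so it can be retrieved later.
    attach_pipeline(make_pipeline(), 'my_pipeline')


def before_trading_start(context, data):
    from zipline.api import pipeline_output

    # DataFrame of pipeline results for the current session, indexed by asset.
    context.pipeline_results = pipeline_output('my_pipeline')
```

Apart from the import paths, this is the same shape an algorithm would have on Quantopian.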

Joe Jevnik

Mar 15, 2018, 7:45:01 AM

to Peter Bakker, Zipline Python Opensource Backtester

I just realized I only read the latest email, not the OP. I apologize. The steps Peter showed are correct for setting up a pipeline engine for research-like use. Basically, the setup to create the `SimplePipelineEngine` object is needed so that you can use the `run_pipeline` method. This method is basically what `quantopian.research.run_pipeline` is.
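
One way to get closer to the Quantopian research workflow is to wrap that setup in a helper once and reuse it. The sketch below is a convenience wrapper built on assumptions, not part of zipline: make_run_pipeline is a hypothetical name, and it passes the same constructor arguments as Peter's script, which may differ in other zipline versions.

```python
import os


def make_run_pipeline(bundle_name='quantopian-quandl'):
    """Build the engine once and return a run_pipeline(pipeline, start, end) function."""
    # Imported here so the sketch can be loaded without zipline installed.
    import pandas as pd
    from zipline.data.bundles.core import load
    from zipline.pipeline.data.equity_pricing import USEquityPricing
    from zipline.pipeline.engine import SimplePipelineEngine
    from zipline.pipeline.loaders import USEquityPricingLoader

    bundle_data = load(bundle_name, os.environ, None)
    pricing_loader = USEquityPricingLoader(
        bundle_data.equity_daily_bar_reader,
        bundle_data.adjustment_reader,
    )

    def choose_loader(column):
        if column in USEquityPricing.columns:
            return pricing_loader
        raise ValueError("No PipelineLoader registered for column %s." % column)

    # Use the full calendar so lookback windows have history available.
    engine = SimplePipelineEngine(
        get_loader=choose_loader,
        calendar=bundle_data.equity_daily_bar_reader.trading_calendar.all_sessions,
        asset_finder=bundle_data.asset_finder,
    )

    def run_pipeline(pipeline, start_date, end_date):
        return engine.run_pipeline(
            pipeline,
            pd.to_datetime(start_date, utc=True),
            pd.to_datetime(end_date, utc=True),
        )

    return run_pipeline
```

With this in place, run_pipeline = make_run_pipeline() followed by run_pipeline(my_pipeline, '2012-01-04', '2012-03-01') should roughly mimic the research-environment call.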

Peter Harrington

Mar 15, 2018, 9:54:16 AM

to Zipline Python Opensource Backtester

Hi Philip,

I have never accessed splits and dividends from USEquityPricing. I believe that under the hood zipline applies these two items to the OHLC bars when needed. I remember seeing some code that did that, but it has been a while. They are ingested (in the loader) using adjustment_writer.write(dividends=dfd). The SEP dataset only has dividends (splits are pre-computed by the vendor; I have checked that). You can see an example of both splits and dividends being written using the adjustment_writer here. If you wanted to test this, you could build a bundle from a small sample of data, then pull the data out using the pipeline above. You could then rebuild the bundle with the dividends or splits deleted for one security, and check that the data coming out of pipeline has indeed changed.
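
To make the idea of applying adjustments to the bars concrete, here is a small worked example in plain Python. It is only an illustration of back-adjustment arithmetic under stated assumptions: the dividend ratio is taken as 1 - amount / previous close (believed to match how zipline derives it, but not confirmed in this thread), a 2-for-1 split is represented by a ratio of 0.5, and all prices are made up.

```python
def dividend_ratio(previous_close, amount):
    """Adjustment ratio for a cash dividend (assumed formula, see note above)."""
    return 1.0 - amount / previous_close


def apply_adjustment(closes, effective_index, ratio):
    """Scale every close before effective_index by ratio; later bars are untouched."""
    return [
        close * ratio if i < effective_index else close
        for i, close in enumerate(closes)
    ]


# Hypothetical prices: the stock pays 2.00 and opens 2.00 lower on the ex-date.
raw_closes = [100.0, 100.0, 98.0]
ratio = dividend_ratio(previous_close=raw_closes[1], amount=2.0)
adjusted = apply_adjustment(raw_closes, effective_index=2, ratio=ratio)

# Hypothetical 2-for-1 split taking effect on the third bar.
split_adjusted = apply_adjustment([200.0, 200.0, 100.0], effective_index=2, ratio=0.5)
```

In both cases the adjusted series no longer shows an artificial jump on the effective date, which is the difference the rebuild-and-compare test should reveal.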

The bundle loaders that I have written all pull data from a "dump" which, yes, is just a text file on my hard drive. You could write code to pull data directly from an API, store it in main memory and then build it. Based on my experience of working on "big data", this is an extremely bad practice. If anything goes wrong, poof, your input is gone and you are left guessing. Having a human-readable text file on hand makes debugging so easy. If I remember correctly, the Zacks dump was ~3.5GB compressed; it would be really painful to download that 30-40 times while debugging. Imagine the CRSP dataset, which has data going back to 1925.
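
The download-once, build-many-times workflow can be sketched with nothing but the standard library. load_dump and fake_fetch are hypothetical names, and the CSV row is a placeholder rather than real data; in practice fetch would be whatever function downloads the vendor file.

```python
import os
import tempfile


def load_dump(path, fetch):
    """Return the text stored at path, calling fetch() only if the file is missing."""
    if not os.path.exists(path):
        text = fetch()
        # Write to a temporary name first so an interrupted download never
        # leaves a half-written dump behind.
        partial_path = path + ".part"
        with open(partial_path, "w") as handle:
            handle.write(text)
        os.replace(partial_path, path)
    with open(path) as handle:
        return handle.read()


fetch_calls = []


def fake_fetch():
    # Stand-in for a slow API download; records how often it is invoked.
    fetch_calls.append(1)
    return "ticker,date,close\nXYZ,2012-01-03,10.0\n"


dump_path = os.path.join(tempfile.mkdtemp(), "dump.csv")
first = load_dump(dump_path, fake_fetch)   # downloads and writes the file
second = load_dump(dump_path, fake_fetch)  # served from disk, no download
```

Every later debugging run reads the same file from disk, so a failed bundle build can be retried without touching the network.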

Most of the datasets on Quandl have a link that says "download the full dataset here". Here is the one for SEP.

I hope that helps,

Peter

Rohit Swaroop

Mar 29, 2018, 9:46:52 AM

to Zipline Python Opensource Backtester

Hi Joe - Thanks for your comments on the blaze loader in this discussion and here (https://groups.google.com/forum/#!searchin/zipline/blaze%7Csort:date/zipline/ME3GipmSro4/UwdhDf_GAAAJ). It would be great if you could provide a more detailed explanation or a working example.

Many thanks,

Rohit
