Library for fundamental data on Zipline


Peter Harrington

Jan 25, 2018, 3:02:47 PM
to Zipline Python Opensource Backtester
I wrote some code to use fundamental data on Zipline; it can actually be used with any time-sparse data.  

I spent a little bit of time documenting it.  

I hope someone finds this useful.  

Peter

Rohit Swaroop

Feb 19, 2018, 12:01:48 PM
to Zipline Python Opensource Backtester
Hi Peter,

Thanks, this is very useful.

Have you done any work on ingesting fundamental data sets other than from Quandl, e.g. locally?

Rohit

Peter Harrington

Feb 19, 2018, 3:52:03 PM
to Zipline Python Opensource Backtester
I have not done that before, but it is totally possible; it would actually be easier than pulling from an API.  
If the files were in the right format you could skip the populate_raw_data() step and start with pack_sparse_data().

Rohit Swaroop

Feb 21, 2018, 12:38:40 PM
to Zipline Python Opensource Backtester
Thanks. Have you by any chance updated the compiler for Python 3.6?

Peter Harrington

Feb 21, 2018, 2:57:06 PM
to Zipline Python Opensource Backtester
I haven't checked the code on 3.6. I know some people have used it, but I'm not sure whether they were on 2.X or 3.X.  
If you want to update the code to work on both versions and submit a PR, I would be happy to accept it.  

Rohit Swaroop

Feb 22, 2018, 11:02:28 AM
to Zipline Python Opensource Backtester
Thanks Peter, I'll do that.

Sorry, one more thing - could you help me understand what the right format is for the fundamental data?

Peter Harrington

Feb 22, 2018, 12:45:31 PM
to Zipline Python Opensource Backtester
If you are using your own data, it should live in a folder called raw, specifically alphacompiler/data/raw/.  
Inside that raw folder there should be one .csv file per equity; it is easy to produce these with pandas DataFrame.to_csv().  Each column name should be the name of the fundamental you are going to use, for example CAPEX, EBIT, etc.  Last, and most important: the file name of each .csv should be the SID of the asset from your data bundle.  How are you supposed to know the ticker-to-SID mapping?  I could explain that, but instead I have provided support code that does it for you in a function called get_ticker_sid_dict_from_bundle(); see it in use here. To summarize, alphacompiler/data/raw/ should contain a bunch of files with integer names like 1.csv, 2.csv, 269.csv, etc.[1]

Basically, the steps outlined above are what populate_raw_tickers() does, except it pulls the DataFrames from Quandl.  
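As a rough sketch of that file layout for local data (the ticker-to-SID mapping is hard-coded here purely for illustration; in practice it would come from get_ticker_sid_dict_from_bundle(), and the column names and values are made up):

```python
import os
import pandas as pd

# Hypothetical ticker-to-SID mapping; in practice use
# get_ticker_sid_dict_from_bundle('<your-bundle>') from alphacompiler.
ticker_to_sid = {"AAPL": 1, "MSFT": 2}

raw_dir = "alphacompiler/data/raw"  # folder the packing step reads from
os.makedirs(raw_dir, exist_ok=True)

for ticker, sid in ticker_to_sid.items():
    # One row per reporting date; column names are the fundamentals you use.
    df = pd.DataFrame(
        {"CAPEX": [100.0, 110.0], "EBIT": [50.0, 55.0]},
        index=pd.to_datetime(["2017-03-31", "2017-06-30"]),
    )
    df.index.name = "Date"
    # The file name must be the asset's SID, e.g. 1.csv, 2.csv.
    df.to_csv(os.path.join(raw_dir, "%d.csv" % sid))
```

With files in this shape you can skip the API-pulling step entirely and go straight to packing.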


Notes:
[1] The SIDs could in theory change each time you pull a bundle or load your own bundle, so every time a bundle is built the fundamentals should be rebuilt as well.  This tightly couples the fundamental data to the bundle, but it gives us a large performance advantage.  In production I would recommend a simple Python or bash script, e.g. rebuild_data, that rebuilds both at the same time.  
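A minimal sketch of such a rebuild script (the bundle name and the packing script name below are placeholders for whatever you actually use to ingest your bundle and pack the fundamentals):

```shell
#!/usr/bin/env bash
# rebuild_data.sh: rebuild the price bundle and the fundamentals together,
# so the SID <-> fundamentals mapping stays in sync.
set -e

# 1. Re-ingest the OHLCV bundle (bundle name is a placeholder).
zipline ingest -b quandl

# 2. Re-pack the fundamentals against the fresh bundle
#    (script name is a placeholder for your packing step).
python pack_my_fundamentals.py
```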

Rohit Swaroop

Feb 26, 2018, 12:55:06 PM
to Zipline Python Opensource Backtester
Thanks Peter. I'm getting the following error when running the example that incorporates fundamental data into the pipeline. Just wondering if you'd come across this and found a solution? Happy to provide more detail.

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

Peter Harrington

Feb 26, 2018, 3:36:30 PM
to Zipline Python Opensource Backtester
Can you please send me the stack trace or post it here?

Rohit Swaroop

Feb 26, 2018, 4:25:43 PM
to Zipline Python Opensource Backtester
Thanks:

 Traceback (most recent call last):
  File "C:\Users\rohit\AppData\Local\Programs\Python\Python36\Lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\rohit\AppData\Local\Programs\Python\Python36\Lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\rohit\PycharmProjects\Basics\venv\Scripts\zipline.exe\__main__.py", line 9, in <module>
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\click\core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\click\core.py", line 697, in main
    rv = self.invoke(ctx)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\click\core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\click\core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\click\core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\__main__.py", line 97, in _
    return f(*args, **kwargs)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\click\decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\__main__.py", line 240, in run
    environ=os.environ,
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\utils\run_algo.py", line 179, in _run
    overwrite_sim_params=False,
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\algorithm.py", line 709, in run
    for perf in self.get_generator():
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\gens\tradesimulation.py", line 237, in transform
    algo.before_trading_start(self.current_data)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\algorithm.py", line 453, in before_trading_start
    self._before_trading_start(self, data)
  File "mompip.py", line 72, in before_trading_start
    context.pipeline_data = pipeline_output('my_pipeline')
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\utils\api_support.py", line 57, in wrapped
    return getattr(algo_instance, f.__name__)(*args, **kwargs)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\utils\api_support.py", line 104, in wrapped_method
    return method(self, *args, **kwargs)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\algorithm.py", line 2457, in pipeline_output
    return self._pipeline_output(p, chunks)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\algorithm.py", line 2499, in _pipeline_output
    pipeline, today, next(chunks),
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\algorithm.py", line 2546, in _run_pipeline
    self.engine.run_pipeline(pipeline, start_session, end_session), \
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\pipeline\engine.py", line 311, in run_pipeline
    initial_workspace,
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\pipeline\engine.py", line 505, in compute_chunk
    mask,
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\pipeline\mixins.py", line 214, in _compute
    compute(date, masked_assets, out_row, *inputs, **params)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\alphacompiler\util\sparse_data.py", line 92, in compute
    self.cold_start(today, assets)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\alphacompiler\util\sparse_data.py", line 71, in cold_start
    self.time_index[asset] = self.bs_sparse_time(asset)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\alphacompiler\util\sparse_data.py", line 56, in bs_sparse_time
    return self.bs(non_nan_dates) - 1
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\alphacompiler\util\sparse_data.py", line 41, in bs
    if self.curr_date < arr[mid]:
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices


Peter Harrington

Feb 28, 2018, 9:56:16 AM
to Zipline Python Opensource Backtester
Did you happen to update the N value in sf1_fundamentals.py?  
I'm not sure what bundle you are using for OHLCV data, or for what time period, but if that value is not synced up with your bundle you may get this error.  

Also, if your SF1.npy file is not too big, you could send it to me or put it somewhere I can download it and debug.  

Costantino

Mar 5, 2018, 5:33:49 AM
to Zipline Python Opensource Backtester
Hi Peter,

some good news: the vendor of SF1 (Sharadar) now also offers a dataset with historical price data, including delisted companies:
https://www.quandl.com/databases/SEP
It's not free, but the fee is a lot cheaper than the Zacks dataset.

Peter Harrington

Mar 15, 2018, 12:30:18 PM
to Zipline Python Opensource Backtester
Hi Rohit,
Sorry for the late reply.  
Did you happen to update the file alphacompiler/data/sf1_fundamentals.py?  It is not best practice, but after you build the .npy file you need to update that file and reinstall alphacompiler.  Alternatively, you could just include the code from sf1_fundamentals.py in your algorithm, or give the algorithm access to it.  

I am working on improving this.  
Peter

Rohit Swaroop

Mar 29, 2018, 10:09:57 AM
to Zipline Python Opensource Backtester
Hi Peter - Yes, it's all working well now - thanks very much, it's a great package. I'm actually using it for some custom fundamental data that I have.  

I wondered if I could ask your thoughts on a couple of tangential things: (i) did you consider using the Blaze loader for fundamental data? It would be great to understand any pros/cons. (ii) Is there a way of setting the pipeline screen to filter by stock sector in Zipline?

Rohit

Peter Harrington

Mar 29, 2018, 1:11:49 PM
to Zipline Python Opensource Backtester
Hi Rohit,

(i) Regarding the Blaze loader: I have read some of the docs but have never needed to use it directly.  When you use the provided bundle-ingestion code, Blaze is called under the hood.  I think it would make sense if you were testing many "alternative" signals that changed rapidly and you needed to access them together; for my alpha models a handful of "alternative" signals is good enough.

(ii) Yes, you can easily set up a pipeline screen to filter stocks by sector.  Factors can all be compared with inequalities (>, ==, !=, <).  
Look at Pipeline->Basic Usage here (pasted below); instead of (sma_10 < 5) you can use (my_sectorcode == 3).  This assumes you have loaded the sector codes somehow, but if you are using my SparseFactor class to load fundamental data, I'm sure you can load sector codes too.  


    # Create and attach an empty Pipeline.
    pipe = Pipeline()
    pipe = attach_pipeline(pipe, name='my_pipeline')

    # Construct Factors.
    sma_10 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=10)
    sma_30 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=30)

    # Construct a Filter.
    prices_under_5 = (sma_10 < 5)

    # Register outputs.
    pipe.add(sma_10, 'sma_10')
    pipe.add(sma_30, 'sma_30')

    # Remove rows for which the Filter returns False.
    pipe.set_screen(prices_under_5)
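A screen is just a per-day boolean mask over the pipeline's output rows. The same idea can be illustrated in plain pandas (the asset names, column names, and sector codes below are all made up):

```python
import pandas as pd

# Made-up pipeline-style output: one row per asset for a single day.
frame = pd.DataFrame(
    {"sma_10": [4.2, 6.8, 3.1], "my_sectorcode": [3, 3, 7]},
    index=["AAA", "BBB", "CCC"],
)

# Equivalent of pipe.set_screen(my_sectorcode == 3): keep only sector 3.
screened = frame[frame["my_sectorcode"] == 3]
print(sorted(screened.index))  # -> ['AAA', 'BBB']
```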