Library for fundamental data on Zipline


Peter Harrington

Jan 25, 2018, 3:02:47 PM
to Zipline Python Opensource Backtester
I wrote some code to use fundamental data on Zipline; it can actually be used with any time-sparse data.  

I spent a little bit of time documenting it.  

I hope someone finds this useful.  

Peter

Rohit Swaroop

Feb 19, 2018, 12:01:48 PM
to Zipline Python Opensource Backtester
Hi Peter,

Thanks, this is very useful.

Have you done any work on ingesting fundamental data sets other than from Quandl, e.g. locally?

Rohit

Peter Harrington

Feb 19, 2018, 3:52:03 PM
to Zipline Python Opensource Backtester
I have not done that before, but it is totally possible; it would actually be easier than pulling from an API.  
If the files were in the right format you could skip the populate_raw_data() step and start with pack_sparse_data().

Rohit Swaroop

Feb 21, 2018, 12:38:40 PM
to Zipline Python Opensource Backtester
Thanks. Have you by any chance updated the compiler for Python 3.6?

Peter Harrington

Feb 21, 2018, 2:57:06 PM
to Zipline Python Opensource Backtester
I haven't checked the code on 3.6. I know some people have used it, but I'm not sure whether they were on 2.X or 3.X.  
If you want to update the code to work on both versions and submit a PR, I would be happy to accept it.  

Rohit Swaroop

Feb 22, 2018, 11:02:28 AM
to Zipline Python Opensource Backtester
Thanks Peter, I'll do that.

Sorry, one more thing - could you help me understand what the right format is for the fundamental data?

Peter Harrington

Feb 22, 2018, 12:45:31 PM
to Zipline Python Opensource Backtester
If you are using your own data, it should live in a folder called raw, specifically alphacompiler/data/raw/.  
Inside that raw folder there should be one .csv file per equity; it is easy to produce these with pandas DataFrame.to_csv().  Each column name should be the name of the fundamental you are going to use, for example CAPEX, EBIT, etc.  Last, and most important: the file name of each .csv should be the SID of the asset from your data bundle.  How are you supposed to know the ticker-to-SID mapping?  I could explain that, but instead I have provided support code that does it for you in a function called get_ticker_sid_dict_from_bundle(); see it in use here. To summarize, alphacompiler/data/raw/ should contain a bunch of files with integer names like 1.csv, 2.csv, 269.csv, etc.[1]

Basically, the steps outlined above are what populate_raw_tickers() does, except it pulls the DataFrames from Quandl.  
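As a rough sketch of that file layout for local data (the ticker-to-SID mapping is hard-coded here purely for illustration; in practice it would come from get_ticker_sid_dict_from_bundle(), and the column names and values are made up):

```python
import os
import pandas as pd

# Hypothetical ticker-to-SID mapping; in practice use
# get_ticker_sid_dict_from_bundle('<your-bundle>') from alphacompiler.
ticker_to_sid = {"AAPL": 1, "MSFT": 2}

raw_dir = "alphacompiler/data/raw"  # folder the packing step reads from
os.makedirs(raw_dir, exist_ok=True)

for ticker, sid in ticker_to_sid.items():
    # One row per reporting date; column names are the fundamentals you use.
    df = pd.DataFrame(
        {"CAPEX": [100.0, 110.0], "EBIT": [50.0, 55.0]},
        index=pd.to_datetime(["2017-03-31", "2017-06-30"]),
    )
    df.index.name = "Date"
    # The file name must be the asset's SID, e.g. 1.csv, 2.csv.
    df.to_csv(os.path.join(raw_dir, "%d.csv" % sid))
```

With files in this shape you can skip the API-pulling step entirely and go straight to packing.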


Notes:
[1] The SIDs could in theory change each time you pull a bundle or load your own bundle, so every time a bundle is built the fundamentals should be rebuilt as well.  This tightly couples the fundamental data to the bundle, but it gives us a large performance advantage.  In production I would recommend a simple Python or bash script, e.g. rebuild_data, that rebuilds both at the same time.  
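A minimal sketch of such a rebuild script (the bundle name and the packing script name below are placeholders for whatever you actually use to ingest your bundle and pack the fundamentals):

```shell
#!/usr/bin/env bash
# rebuild_data.sh: rebuild the price bundle and the fundamentals together,
# so the SID <-> fundamentals mapping stays in sync.
set -e

# 1. Re-ingest the OHLCV bundle (bundle name is a placeholder).
zipline ingest -b quandl

# 2. Re-pack the fundamentals against the fresh bundle
#    (script name is a placeholder for your packing step).
python pack_my_fundamentals.py
```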

Rohit Swaroop

Feb 26, 2018, 12:55:06 PM
to Zipline Python Opensource Backtester
Thanks Peter. I'm getting the following error when running the example that incorporates fundamental data into the pipeline. Just wondering if you'd come across this and found a solution? Happy to provide more detail.

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

Peter Harrington

Feb 26, 2018, 3:36:30 PM
to Zipline Python Opensource Backtester
Can you please send me the stack trace or post it here?

Rohit Swaroop

Feb 26, 2018, 4:25:43 PM
to Zipline Python Opensource Backtester
Thanks:

 Traceback (most recent call last):
  File "C:\Users\rohit\AppData\Local\Programs\Python\Python36\Lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\rohit\AppData\Local\Programs\Python\Python36\Lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\rohit\PycharmProjects\Basics\venv\Scripts\zipline.exe\__main__.py", line 9, in <module>
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\click\core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\click\core.py", line 697, in main
    rv = self.invoke(ctx)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\click\core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\click\core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\click\core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\__main__.py", line 97, in _
    return f(*args, **kwargs)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\click\decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\__main__.py", line 240, in run
    environ=os.environ,
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\utils\run_algo.py", line 179, in _run
    overwrite_sim_params=False,
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\algorithm.py", line 709, in run
    for perf in self.get_generator():
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\gens\tradesimulation.py", line 237, in transform
    algo.before_trading_start(self.current_data)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\algorithm.py", line 453, in before_trading_start
    self._before_trading_start(self, data)
  File "mompip.py", line 72, in before_trading_start
    context.pipeline_data = pipeline_output('my_pipeline')
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\utils\api_support.py", line 57, in wrapped
    return getattr(algo_instance, f.__name__)(*args, **kwargs)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\utils\api_support.py", line 104, in wrapped_method
    return method(self, *args, **kwargs)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\algorithm.py", line 2457, in pipeline_output
    return self._pipeline_output(p, chunks)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\algorithm.py", line 2499, in _pipeline_output
    pipeline, today, next(chunks),
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\algorithm.py", line 2546, in _run_pipeline
    self.engine.run_pipeline(pipeline, start_session, end_session), \
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\pipeline\engine.py", line 311, in run_pipeline
    initial_workspace,
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\pipeline\engine.py", line 505, in compute_chunk
    mask,
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\zipline\pipeline\mixins.py", line 214, in _compute
    compute(date, masked_assets, out_row, *inputs, **params)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\alphacompiler\util\sparse_data.py", line 92, in compute
    self.cold_start(today, assets)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\alphacompiler\util\sparse_data.py", line 71, in cold_start
    self.time_index[asset] = self.bs_sparse_time(asset)
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\alphacompiler\util\sparse_data.py", line 56, in bs_sparse_time
    return self.bs(non_nan_dates) - 1
  File "c:\users\rohit\pycharmprojects\basics\venv\lib\site-packages\alphacompiler\util\sparse_data.py", line 41, in bs
    if self.curr_date < arr[mid]:
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices


Peter Harrington

Feb 28, 2018, 9:56:16 AM
to Zipline Python Opensource Backtester
Did you happen to update the N value in sf1_fundamentals.py?  
I'm not sure what bundle you are using for OHLCV data, or for what time period, but if that value is not synced up with your bundle you may get this error.  

Also, if your SF1.npy file is not too big, you could send it to me or put it somewhere I can download it and debug.  

Costantino

Mar 5, 2018, 5:33:49 AM
to Zipline Python Opensource Backtester
Hi Peter,

some good news: the vendor of SF1 (Sharadar) now also offers a dataset with historical price data, including delisted companies:
https://www.quandl.com/databases/SEP
It's not free, but the fee is a lot cheaper than the Zacks dataset.

Peter Harrington

Mar 15, 2018, 12:30:18 PM
to Zipline Python Opensource Backtester
Hi Rohit,
Sorry for the late reply.  
Did you happen to update the file alphacompiler/data/sf1_fundamentals.py?  It is not best practice, but after you build the .npy file you need to update that file and reinstall alphacompiler.  Alternatively, you could just include the code from sf1_fundamentals.py in your algorithm, or give the algorithm access to it.  

I am working on improving this.  
Peter

Rohit Swaroop

Mar 29, 2018, 10:09:57 AM
to Zipline Python Opensource Backtester
Hi Peter - Yes, it's all working well now - thanks very much, it's a great package. I'm actually using it for some custom fundamental data that I have.  

I wondered if I could ask your thoughts on a couple of tangential things: (i) did you consider using the Blaze loader for fundamental data? It would be great to understand any pros/cons. (ii) Is there a way of setting the pipeline screen to filter by stock sector in Zipline?

Rohit

Peter Harrington

Mar 29, 2018, 1:11:49 PM
to Zipline Python Opensource Backtester
Hi Rohit,

(i) Regarding the Blaze loader: I have read some of the docs but have never needed to use it directly.  When you use the provided bundle-ingestion code, Blaze is called under the hood.  I think it would make sense if you were testing many "alternative" signals that changed rapidly and you needed to access them together; for my alpha models a handful of "alternative" signals is good enough.

(ii) Yes, you can easily set up a pipeline screen to filter stocks by sector.  Factors can all be compared with inequalities (>, ==, !=, <).  
Look at Pipeline->Basic Usage here (pasted below); instead of (sma_10 < 5) you can use (my_sectorcode == 3).  This assumes you have loaded the sector codes somehow, but if you are using my SparseFactor class to load fundamental data, I'm sure you can load sector codes too.  


    # Create and attach an empty Pipeline.
    pipe = Pipeline()
    pipe = attach_pipeline(pipe, name='my_pipeline')

    # Construct Factors.
    sma_10 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=10)
    sma_30 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=30)

    # Construct a Filter.
    prices_under_5 = (sma_10 < 5)

    # Register outputs.
    pipe.add(sma_10, 'sma_10')
    pipe.add(sma_30, 'sma_30')

    # Remove rows for which the Filter returns False.
    pipe.set_screen(prices_under_5)
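A screen is just a per-day boolean mask over the pipeline's output rows. The same idea can be illustrated in plain pandas (the asset names, column names, and sector codes below are all made up):

```python
import pandas as pd

# Made-up pipeline-style output: one row per asset for a single day.
frame = pd.DataFrame(
    {"sma_10": [4.2, 6.8, 3.1], "my_sectorcode": [3, 3, 7]},
    index=["AAA", "BBB", "CCC"],
)

# Equivalent of pipe.set_screen(my_sectorcode == 3): keep only sector 3.
screened = frame[frame["my_sectorcode"] == 3]
print(sorted(screened.index))  # -> ['AAA', 'BBB']
```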