Zipline performance / optimization


Lucas Serpa Silva

Feb 8, 2015, 2:50:45 PM
to zip...@googlegroups.com
Hi all,

I have connected zipline to some machine learning code, but right now, on a low-end computer, it takes around 40 seconds to run a 5-year simulation with daily data. On another platform the same run was in the milliseconds range.

I did some profiling (https://docs.google.com/spreadsheets/d/1i_0B3xr8d_2PBL4lTfQ3yoTeqOXVwBb6aM4C2DpGqT0/edit?usp=sharing) and noticed that the performance tracker takes a huge chunk of the execution time, especially all the to_dict method calls.

Why is performance computed during the simulation and not afterwards? Given the historical prices and the positions, I think we should be able to compute all the metrics after the simulation, right? Then we could also select which metrics to compute.

Is there a way for me to disable performance tracking, or to keep only minimal tracking (say, just returns)? Is there a way to postpone the performance calculation until after the simulation?

Has anyone tried using Cython? I tried, but since my machine has a bunch of Python versions installed, Cython did not like it. If the improvement from Cython is significant, I might invest more time there.
 
I have seen in the latest commits that there have been some performance improvements. Is there a branch or something where I could read about them?

Thanks,
Lucas

Eddie Hebert

Feb 9, 2015, 11:43:29 AM
to Lucas Serpa Silva, zipline
Lucas:

Thanks for digging into these areas of the code base! And you are right, we are currently looking at some performance improvements. There is a branch at https://github.com/dalejung/zipline/tree/history_perf_test_matrix where we are working on some fixes to history, but there is currently no overarching performance branch.

Minimal tracking has been requested before, and it should be possible to wire in some flags to enable that mode.

To set some context on what we focus on as maintainers: minute backtest mode, i.e. the pairing of minute data with daily performance emission, is the primary case for runtime optimization, since that is the backtest case on Quantopian.

When using minute bars, the profile changes so that the performance metrics are a marginal cost compared to data sources, position tracking, and features like history.
That said, we are amenable to fixes to the overall performance metrics calculations, since a rising tide lifts all boats.

It should be possible to postpone the calculation of metrics as you mentioned; that would involve setting a flag in tracker.handle_market_close_daily to disable the risk metric creation.

We have looked at using some Cython, and we have started using it internally. It's a great tool for attacking painful bottlenecks, and I would expect more tight loops to be converted over to Cython soon. If you have patches doing so, we would welcome them!

Thanks!

- Eddie


Lucas S. Silva

Feb 14, 2015, 4:40:34 PM
to Eddie Hebert, zipline
Thank you for the reply Eddie,

I managed to cut some methods down to their minimum. In the tracker I replaced handle_market_close_daily with:

def handle_market_close_daily(self):
    # update today's performance from the latest positions/prices
    self.update_performance()
    completed_date = normalize_date(self.market_close)
    self.day_count += 1.0
    # keep only the daily returns series
    self.returns[completed_date] = self.todays_performance.returns
    # roll the calendar forward to the next trading session
    self.market_open, self.market_close = \
        trading.environment.next_open_and_close(self.market_open)
    # emit a minimal message instead of the full perf dict
    # (note: stores the daily return under the 'portfolio_value' key)
    return {'daily_perf': {'portfolio_value': self.todays_performance.returns}}

I also disabled these two calls in tradesimulation#get_message:

  self.algo.updated_portfolio()
  self.algo.updated_account()


That should not affect the returns in the performance output, right?

The good news is that I managed to drop the runtime from 40 seconds to 1.2 seconds on my laptop; on my server it should be in the milliseconds range.

The to_dict conversion was eating a lot of CPU cycles. It might be worth keeping the data as a DataFrame or Panel and avoiding the conversions.
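As a toy illustration of the idea (not zipline code; the field names are made up): accumulate plain records per bar and build a single frame once at the end, instead of converting to a nested dict on every bar.

import pandas as pd

# lightweight per-bar records...
records = []
for day in range(5):
    records.append({'day': day, 'portfolio_value': 10000 + day * 10})

# ...converted to a DataFrame exactly once, after the loop
perf = pd.DataFrame(records).set_index('day')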

I tried to optimize strategies with minute data before, and for my infrastructure it is just infeasible, but one optimization that would probably help Quantopian is TA/history caching. In my case, since I run the optimization of the "same" strategy repeatedly, I pre-computed the TAs and history, kept them in memory, and reused them. For Quantopian it might be worth keeping a pool of the latest history and TA calls: if data, symbol, and params match, just grab the result from memory.
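Something like this, as a minimal sketch of the idea outside of zipline (the cache key fields and the helper function are mine, not an existing API; a real version would also need to bound memory and invalidate stale entries):

_indicator_cache = {}

def cached_indicator(symbol, bar_count, field, compute):
    # memoize an expensive history/TA computation keyed on its inputs
    key = (symbol, bar_count, field)
    if key not in _indicator_cache:
        _indicator_cache[key] = compute()  # only evaluated on the first call
    return _indicator_cache[key]

# during repeated optimization runs of the "same" strategy, later calls with a
# matching (symbol, bar_count, field) are served from memory, e.g.:
# sma20 = cached_indicator('AAPL', 20, 'price', lambda: prices[-20:].mean())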

Another suggestion, on the development side: it would be nice to introduce the Gerrit code review framework (https://code.google.com/p/gerrit/). It makes contributing more efficient. With Gerrit, people can just push their changes, and if someone reviews a change and gives it +2, it can be merged into the main branch.

Cheers,
Lucas

Eddie Hebert

Feb 15, 2015, 6:02:18 PM
to Lucas S. Silva, zipline
Those are some impressive gains, Lucas!

The to_dict conversion may be something that both daily and minute modes can benefit from cutting out.

If you have time, could you post the full information needed to reproduce your results? I.e.:
- A GitHub branch on your fork, or a patch.
- The algorithm code (it need not be your actual strategy if you want to keep that private) and the simulation parameters you are using.

I've heard good things about Gerrit, and we should open up a separate thread to discuss it.  In particular, I'd be interested to see Thomas's opinion on it. One thing to consider in that conversation is that at Quantopian we need to balance efficiency versus compatibility.

- Eddie

Lucas S. Silva

Feb 16, 2015, 1:46:52 AM
to zipline
Hi,

I will try to make a nice patch and make it configurable, maybe defining a handle_market_close_daily_minimal and a handle_market_close_min_minimal.

What I have done so far is kind of hard to integrate since I mainly removed code,
but everything I did should be here: https://github.com/lssilva/zipline/commits/master

It removes all computation of performance, risk metrics, and account usage; it only computes
the portfolio value for each day and the return for each iteration. From the portfolio value I should be
able to compute most of the risk/performance metrics. Maybe we could even compute them in the handle_simulation_end method.
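For example, something along these lines, assuming portfolio_value is a pd.Series of daily portfolio values indexed by date (the 252 trading days per year and the 0% risk-free rate are just conventions, and the function is only a sketch of what the post-processing could look like):

import numpy as np

def post_run_metrics(portfolio_value):
    # rough post-simulation metrics from a daily portfolio value pd.Series
    returns = portfolio_value.pct_change().dropna()
    total_return = portfolio_value.iloc[-1] / portfolio_value.iloc[0] - 1
    annual_return = (1 + total_return) ** (252.0 / len(returns)) - 1
    sharpe = np.sqrt(252) * returns.mean() / returns.std()
    max_drawdown = (portfolio_value / portfolio_value.cummax() - 1).min()
    return {'total_return': total_return,
            'annual_return': annual_return,
            'sharpe': sharpe,
            'max_drawdown': max_drawdown}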


Cheers,
Lucas 

Laur Läänemets

Feb 26, 2015, 6:48:32 AM
to zip...@googlegroups.com
Hey Lucas,

I am after a similar thing, as I want to use recurrent neural networks, but Zipline is way too slow to use for calculating the fitness function.

I tried your fork and here is what I got:

1) Creating the algo object used to take 15 sec, now down to 1 sec:

# Create algorithm object passing in initialize and
# handle_data functions
algo_obj = TradingAlgorithm(initialize=initialize, 
                            handle_data=handle_data,
                            data_frequency='daily',
                            capital_base=10000)

2) Running the backtest used to take 100 sec, now down to 50 sec, BUT daily_stats comes back as an empty DataFrame.

# Run algorithm
daily_stats = algo_obj.run(data)

I will dig into tweaking the code (mostly removing what is not needed) today, and I was just wondering at what stage you left it 12 days ago; was .run(data) returning the portfolio value for you?

cheers,
Laur

Fabian Braennstroem

Feb 26, 2015, 3:11:05 PM
to zip...@googlegroups.com
Hello,

This sounds very interesting! Actually, I would like to use deap for the optimization.
Unfortunately it is not clear to me how to install two different zipline versions.
Do you have a hint for me?

Thanks in advance!
Best Regards
Fabian

Ken Hersey

Mar 4, 2015, 10:51:48 PM
to zip...@googlegroups.com
A safe way might be to use virtualenv to set up a separate Python environment.

-Ken

Fabian Braennstroem

Mar 5, 2015, 2:50:26 PM
to zip...@googlegroups.com
Hello Ken,

yes, good info, I forgot this.
Thanks!

Best Regards
Fabian

Florent Chandelier

Mar 6, 2015, 9:01:11 PM
to zip...@googlegroups.com
Hi all,
I've looked at Lucas's suggestions and worked them into a parameter in Zipline that can be kept alongside further improvements to Zipline.

I've added one parameter to the 'sim_params' variable and propagated it to the relevant classes, as suggested by Lucas.

# Performance Backtest: bypassing any performance and risk metrics
# -> track variable 'fast_backtest' to see implications
# -> Modified files:
#        -> algorithm.py
#        -> tracker.py
#        -> tradesimulation.py

fast_backtest = True
algo = TradingAlgorithm(initialize=initialize,
                        handle_data=handle_data,
                        capital_base=10000,
                        fast_backtest=fast_backtest)

I have reached a 19x decrease in processing time (from 60 seconds to 3 seconds) with my demo code.
The demo code I've used is here: https://github.com/florentchandelier/zipline2quantopian/tree/master/example/paired_switching_strategy

The modified zipline code is here (checking out the relevant branch): https://github.com/florentchandelier/zipline/tree/wip-ImproveBacktestSpeed

Thanks Lucas.

Florent Chandelier

Mar 17, 2015, 9:42:25 PM
to zip...@googlegroups.com
As a simple follow-up: using the modification suggested below and aggregating portfolio_value in a pandas.Series in handle_data() preserves the speed improvement while still allowing the full range of strategy metrics that Zipline naturally provides to be computed. I will release a complete example for it on my GitHub.

Florent Chandelier

Mar 17, 2015, 10:11:27 PM
to zip...@googlegroups.com
Done: https://github.com/florentchandelier/zipline/tree/wip-ImproveBacktestSpeed
So this branch tracks the current Zipline repo and simply adds the fast_backtest input parameter, which increases backtest speed by a factor of 15 to 19 in my current tests.
I will publish the code for tracking portfolio_value through a pd.Series and post-processing it for CAGR, Sharpe, and drawdown analysis...

Thomas Wiecki

Mar 18, 2015, 1:19:39 PM
to Florent Chandelier, zipline
This looks great, thanks for putting that up.

We just discussed internally that such a flag would be a great addition to zipline and the required code change doesn't look that bad.

Florent Chandelier

Mar 18, 2015, 8:20:39 PM
to zip...@googlegroups.com, flo.cha...@gmail.com
Hi Thomas,
Indeed, the change is minimal, and the speed improvement is crazy good.

Then I simply track context.portfolio.portfolio_value in def handle_data(context, data), in a pd.Series that I post-process for risk analysis after the backtest is done... that series would be a great default for algo.run to return when fast_backtest is active.
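For reference, the shape of it is roughly this (a sketch only: context.portfolio.portfolio_value and the handle_data(context, data) signature come from zipline, while the series name is mine and I'm assuming context.get_datetime() is available for the current simulation time):

import pandas as pd

def initialize(context):
    # collect daily portfolio values here instead of relying on the tracker
    context.pv_series = pd.Series()

def handle_data(context, data):
    # ... trading logic ...
    # record the portfolio value under the current simulation datetime
    # (appending to a dict and building the Series once after the run
    # would be faster, but this mirrors what I described above)
    context.pv_series[context.get_datetime()] = context.portfolio.portfolio_value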

I will not open a pull request, as I'm pretty sure your team will find a cleaner way to integrate it. Let me know when it's officially out.

HTH

Florent Chandelier

Mar 18, 2015, 8:26:59 PM
to zip...@googlegroups.com, flo.cha...@gmail.com
I have created this issue as a tracker:
https://github.com/quantopian/zipline/issues/538


John Fawcett

Mar 18, 2015, 8:42:55 PM
to Florent Chandelier, zip...@googlegroups.com
Slick idea - thanks for contributing this to zipline!


Lucas Serpa Silva

Mar 25, 2015, 3:25:13 AM
to zip...@googlegroups.com
Thanks, that is super!
Are you using history or TAs?

My next step in the optimization was to cache the history/TA calls. During strategy optimization, the history and TAs are computed multiple times for the same parameters most of the time. I wanted to create a big hash table so that each TA/history result is computed only once.

Of course if the History and TA parameters are dynamic it won't work that well.

I will also check how much work it is to replace the dictionaries with one of the pandas data structures so we can avoid the to_dict calls, which are eating CPU cycles.

I will fork your change and try to chop the run time down a bit more.

Cheers,
Lucas

Florent Chandelier

Mar 25, 2015, 1:51:51 PM
to zip...@googlegroups.com
;-) ... thanks for suggesting that path in the first place!
I'm using History().