Advice on Optimizing


Marcus Williford

unread,
Jan 15, 2013, 6:01:01 PM1/15/13
to jbook...@googlegroups.com
I wanted to see if anyone would post additional advice on optimization workflows that work, and perhaps help me explore ways to improve my own.  Here are some things that work for me so far, but I'm not yet satisfied with the efficiency.  

What works for me:
-  I come up with a theory in my head about what might improve my trades.  I do this by observing prior trades (backtest utility), and I try to spot obvious things the Strategy didn't get right.  For this, I love the "chart" feature of the backtest data.
-  I then code some changes and watch the backtest again, knowing that it isn't optimized, but I look for some improvement from any algo changes.
-  Then I run divide-and-conquer optimization across a broad range of min/max.  But I never can tell at what scale I want to operate.  How fast is a fast EMA period, etc.?  I'm always a bit unsure whether even my broad range is broad enough, but I eventually settle on something.
-  If the divide-and-conquer method starts to look promising, I note (and hope to see) hot spots in the heat map.  Unfortunately, it is usually too sparse to really see and understand what is going on.  So, I take a guess at some ranges for each param.
-  Finally, if all went well, I set up an overnight run (usually around 3,000,000+ "combinations"), which takes forever on my blazing fast new MacBook Retina with 16GB RAM, etc.  Usually 20 hours with the fan on high.  Maybe I need AppleCare after all.
-  Now maybe I have something, so I take the best island area in this process, backtest it, and start to study trades again.  Mostly, this puts me back into a loop of trying to make another Strategy change to improve it further.

So, this is the process I came up with so far.  It works, I get improvements, but it is very slow going.
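The divide-and-conquer step above (a broad, sparse pass followed by a dense pass around the winner) might be sketched like this for a single parameter; the evaluate function stands in for a full backtest score and is purely hypothetical, not JBookTrader's actual optimizer API:

```java
// Toy sketch of coarse-to-fine parameter search: scan a broad range sparsely,
// then re-scan a narrow window around the best point densely.
import java.util.function.DoubleUnaryOperator;

public class CoarseToFine {
    // Scan [min, max] at 'steps'+1 evenly spaced points; return the best value found.
    static double bestInRange(DoubleUnaryOperator evaluate, double min, double max, int steps) {
        double best = min, bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i <= steps; i++) {
            double x = min + (max - min) * i / steps;
            double score = evaluate.applyAsDouble(x);
            if (score > bestScore) { bestScore = score; best = x; }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy objective with a peak at 42, standing in for a backtest score.
        DoubleUnaryOperator score = x -> -Math.abs(x - 42);
        double coarse = bestInRange(score, 0, 200, 20);          // broad, sparse pass
        double fine = bestInRange(score, coarse - 10, coarse + 10, 100); // dense refinement
        System.out.println(fine); // prints 42.0
    }
}
```

The real optimizer searches all parameters at once, of course; the sketch only shows the single-parameter shape of the idea.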

What might need improvement in either my process or the jbooktrader:
-  I wish to see good/bad trades fast!  The chart graphs everything, and hence takes forever.  I am thinking of a "trade view" chart that shows only the area around a trade, in great detail, and then advances quickly to the next trade.  This would speed up my review of trades.
-  The optimizer: figuring out why it takes so long.  I know, 3,000,000 combinations is a lot of work.  But I feel like maybe studying this in a profiler and trying to make some improvements.  Never mind the wild ideas I have about farming out work to a dynamic cluster of EC2 servers using Hadoop.
-  Indicator graphing.  I have placed indicators I don't even use into Strategies, just to see them in the graph and get a feel for whether they make sense.  Maybe people use some external program for this?  If so, how do you tell whether your indicator logic is correct?
-  I considered trying to wire up ScalaLab (and learn it) to my Java strategies, so I can play with more advanced analysis without rolling my own code for math routines.  For example, write a ScalaLab adapter to run my strategies, and use it like MATLAB to make improvements.  I could then use the same Java code for both the math software package and trading.  A cool idea, but I only got as far as installing ScalaLab, never mind learning it.
-  Move this onto a giant Linux box in my home, and offload the optimization for now.

As you can see, I'm all over the place; any advice?  Did anyone else have any of these ideas?  Maybe you did, and already executed on them?  Since I have so many thoughts, each of which requires a lot of work, I'm seeking feedback from people who have been doing this for years.  I'm a newbie.  Maybe I just need to learn how to use what we have better?

Meanwhile, running a giant CL optimization, maybe I'll have something to trade soon.

Marcus

 




Eugene Kononov

unread,
Jan 15, 2013, 8:01:31 PM1/15/13
to jbook...@googlegroups.com
Marcus,

My optimization workflow is the same as yours. Here are some comments:

1. If the backtest chart takes too long to render, it's probably because you set the "bar size" at a high resolution. On my machine, rendering the 2-year chart with the "1-hr" bar size takes a few seconds. If you'd like a higher resolution, you can specify a specific backtesting period and then create the chart with that.

2. With regards to the speed of the optimizer, I profiled and optimized the hell out of it. If I run it through a profiler now, the only bottlenecks it shows are evenly distributed in the low-level JDK libraries. In other words, I don't think there is anything left to optimize. Both the "divide-and-conquer" and the "brute force" optimizers engage all CPU cores. On my mid-level machine (i7, 4 cores, Windows 64-bit, 8GB RAM), the optimizer speed for typical strategies is about 150 million samples per second. That is to say, if my data file has 150 million 1-second samples and I optimize 60 strategies, it would take 60 seconds.

Just like you, I do overnight runs to optimize millions of strategies. Yes, that's a lot of work, and a lot of trial and error. One thing worth experimenting with is the "strategies per processor" setting in the advanced optimization dialog. The default is 50, but on specific hardware and OS, some other number may improve the throughput. Some people have also explored parallel optimization using GridGain; JBT optimization can be easily distributed to different machines.

All in all, the historical market depth data files have a *lot* of data, so a lot of computational power is required to crunch through all the numbers. I've done so much optimization with JBT over the years that I can tell what the optimizer is doing just by listening to the fan speed. I am not kidding!
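The throughput arithmetic above can be checked with a one-liner; the numbers are the ones quoted in the post and will of course vary by machine:

```java
// Back-of-envelope run-time estimate for a brute-force optimization job.
public class OptTimeEstimate {
    // seconds = (samples per strategy * number of strategies) / samples evaluated per second
    static long estimateSeconds(long samplesPerStrategy, long strategies, long samplesPerSecond) {
        return samplesPerStrategy * strategies / samplesPerSecond;
    }

    public static void main(String[] args) {
        // 150M 1-second samples, 60 strategies, ~150M samples/sec throughput -> 60 s
        System.out.println(estimateSeconds(150_000_000L, 60, 150_000_000L)); // prints 60
    }
}
```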

3. Optimization itself is a delicate process. The dangers of over-optimization (or over-fitting) are well publicized. I look for several things in the optimization results:
a) What you call a "best island" (I call it a "high plateau") must be broad enough in all dimensions. That is, instead of a random "spike" in performance, it must be a wide area of elevated performance. Try setting the parameter ranges 15% away from the center of the plateau. How much degradation in performance is there as you move away from the center?
b) The number of trades must be high enough. It's very easy to get infinitely high performance metrics (such as PF and PI) if the number of trades is low. This is just a game of permutations. The larger the number of trades, the more significant the results. In fact, I think this statistical significance scales as the square root of the number of trades.
c) The data file must cover a sufficiently long period of time. I prefer at least 1 year.
d) The number of parameters must be low. I prefer below 5.
e) Almost exclusively, I use PI as my optimization selection criterion. I believe this metric is superior to the other ones in JBT.
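The plateau test in (a) can be mechanized: perturb each parameter 15% away from the plateau center and look at the worst relative drop in the score. The sketch below does exactly that; the score function is a stand-in for a real backtest metric such as PI, not JBT code:

```java
// Sketch of the "high plateau" robustness check: move each parameter
// +/-15% from the center and measure the worst fractional score drop.
import java.util.function.Function;

public class PlateauCheck {
    // Returns the worst fractional degradation over all single-parameter +/-15% moves.
    static double worstDegradation(Function<double[], Double> score, double[] center) {
        double base = score.apply(center);
        double worst = 0;
        for (int i = 0; i < center.length; i++) {
            for (double f : new double[]{0.85, 1.15}) {
                double[] p = center.clone();
                p[i] *= f;
                worst = Math.max(worst, (base - score.apply(p)) / base);
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        // Flat toy surface: score barely changes near the center -> tiny degradation.
        Function<double[], Double> flat = p -> 100.0 - 0.01 * Math.abs(p[0] - 10);
        // Spiky toy surface: score collapses away from the center -> large degradation.
        Function<double[], Double> spiky = p -> 100.0 - 50.0 * Math.abs(p[0] - 10);
        System.out.println(worstDegradation(flat, new double[]{10}));
        System.out.println(worstDegradation(spiky, new double[]{10}));
    }
}
```

A broad plateau shows a small worst-case degradation; a random spike shows a large one.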

4. After the optimization job is completed, I select the optimal set of parameters and run a backtest with that set. Next, I pop up the chart and look at all the *losing* trades. Here is where I attempt to improve the candidate strategy. Is there a commonality among all the losing trades? Is there a single precondition in my strategy which would have prevented these losers? Next, I add a precondition and run the next optimization job. And so it goes.

5. Oftentimes, there are too many indicators and preconditions. The strategy performs well, but what contributes to the good performance? Here is where I often do factor analysis. Eliminate an indicator or a condition, and run the optimization job again. By how much did the performance degrade? If it's only by a little, then this particular indicator or condition is probably worthless.
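The factor-analysis step might be summarized as a sketch like this; the indicator names and scores are made up purely for illustration:

```java
// Sketch of the ablation step: given the full strategy's score and the score
// after removing each indicator in turn, flag the indicators whose removal
// barely hurts performance.
import java.util.*;

public class Ablation {
    // Returns indicators whose removal degrades the score by less than 'threshold'.
    static List<String> likelyWorthless(double fullScore, Map<String, Double> scoreWithout, double threshold) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Double> e : scoreWithout.entrySet()) {
            double degradation = (fullScore - e.getValue()) / fullScore;
            if (degradation < threshold) out.add(e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Double> without = new LinkedHashMap<>();
        without.put("emaCross", 60.0);    // removing it costs 40% -> keep it
        without.put("volumeSpike", 98.0); // removing it costs 2% -> probably worthless
        System.out.println(likelyWorthless(100.0, without, 0.05)); // prints [volumeSpike]
    }
}
```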

6. After I identify the candidate for trading, I typically forward test it, to make sure I have not missed anything gross.

7. Now the candidate is ready for trading. You think you are done? Nope! Now comes the part where you need to continuously monitor the live performance over time, and decide when to re-optimize, and when to discontinue this strategy, in case the market has shifted away from the mode for which the strategy was well suited. 

Hope this is not too discouraging. I am not aware of an easy way around it.

--
You received this message because you are subscribed to the Google Groups "JBookTrader" group.
To post to this group, send email to jbook...@googlegroups.com.
To unsubscribe from this group, send email to jbooktrader...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/jbooktrader?hl=en.

Marcus Williford

unread,
Jan 15, 2013, 8:33:18 PM1/15/13
to jbook...@googlegroups.com
No, this is not discouraging, but it does underscore the reality that it really does take some effort (measured in CPU and man-hours) to get a Strategy that works for "real reasons", not just some hacked over-optimized junk against too small of a dataset.

Thanks for your insight, it just helps me confirm that the hours I'm spending are in the right direction based on your experience.

Marcus

Eugene Kononov

unread,
Jan 15, 2013, 8:49:34 PM1/15/13
to jbook...@googlegroups.com
For long-running optimizations, you may squeeze out another 15% performance improvement by eliminating these lines of code in OptimizerRunner.java:

                    if (worker % divider == 0) {
                        Collections.sort(optimizationResults, resultComparator);
                        optimizerDialog.setResults(optimizationResults);
                    }

The intent of this code is to indicate the progress of the optimization by incrementally updating the results in the optimization results table. But if you are running millions of strategies, it takes a lot of CPU cycles to update a JTable with 3 million rows. If you comment out this code, the optimizer will work exactly as before, but no interim progress will be shown.

Another easy performance improvement is to specify "profitable strategies" in the "inclusion criteria". This will lighten the workload for the JTable updater. Other than that (and the "strategies per processor" setting that I mentioned above), there is little you can do short of distributing the workload. However, feel free to profile to see if you can find something; I used JProfiler for this.
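An alternative to deleting the progress update entirely would be to throttle it by wall-clock time, so the sort and table refresh run at most once per interval no matter how many workers finish. This is only a sketch of the idea, not the actual OptimizerRunner code:

```java
// Minimal time-based throttle: callers ask shouldRun() before doing the
// expensive table update, and it answers true at most once per interval.
public class UiThrottle {
    private final long intervalMs;
    private long last;

    UiThrottle(long intervalMs) {
        this.intervalMs = intervalMs;
        this.last = -intervalMs; // so the very first call is allowed through
    }

    // True at most once per interval; callers skip the expensive update otherwise.
    boolean shouldRun(long nowMs) {
        if (nowMs - last >= intervalMs) { last = nowMs; return true; }
        return false;
    }

    public static void main(String[] args) {
        UiThrottle t = new UiThrottle(1000);
        System.out.println(t.shouldRun(0));    // prints true  (first call runs)
        System.out.println(t.shouldRun(500));  // prints false (too soon, skipped)
        System.out.println(t.shouldRun(1200)); // prints true  (interval elapsed)
    }
}
```

Inside the optimizer loop, the `worker % divider == 0` test would be replaced by `shouldRun(System.currentTimeMillis())`, keeping interim progress visible while capping the JTable cost.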

new_trader

unread,
Jan 16, 2013, 2:24:14 AM1/16/13
to jbook...@googlegroups.com
Technical advice:
The Java implementation from Oracle/Sun on Apple OS X is very slow compared to Windows; on Linux I have no experience.

A year or so ago I did a performance benchmark, and the JVM running on Windows was nearly twice as fast as the JVM running on OS X on the same machine.

Borg Alexander

unread,
Jan 16, 2013, 8:12:18 AM1/16/13
to jbook...@googlegroups.com
Try to reduce the number of parameters.

I recently checked out GC and found that there is, for all promising islands, a linear relationship between entry and exit, and between scale and entry, and therefore between scale and exit, but not between period and scale. Aside from the fact that I don't yet know what that really means or whether it has any significance pertaining to the market, I could theoretically reduce the 4 parameters to 2 (period and scale) and calculate the other 2 (entry and exit).
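The reduction described above amounts to fitting the observed linear relationship once, then deriving the dependent parameters instead of searching over them. A toy sketch with made-up (entry, exit) pairs:

```java
// Sketch of parameter reduction: fit exit = a + b*entry over points taken
// from the promising islands, then let the optimizer search only 'entry'
// and derive 'exit'. The sample pairs below are invented for illustration.
public class ParamReduction {
    // Ordinary least-squares fit y = a + b*x; returns {a, b}.
    static double[] fitLine(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) { sx += x[i]; sy += y[i]; sxx += x[i] * x[i]; sxy += x[i] * y[i]; }
        double b = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double a = (sy - b * sx) / n;
        return new double[]{a, b};
    }

    public static void main(String[] args) {
        // Hypothetical (entry, exit) pairs observed across promising islands.
        double[] entry = {10, 20, 30, 40};
        double[] exit  = {25, 45, 65, 85}; // happens to lie on exit = 5 + 2*entry
        double[] ab = fitLine(entry, exit);
        // The optimizer now searches only 'entry'; 'exit' is computed from it.
        System.out.println(ab[0] + ab[1] * 35);
    }
}
```

This halves the dimensionality of the search, which shrinks a brute-force run quadratically, at the cost of assuming the fitted relationship holds outside the sampled islands.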

Klaus

unread,
Jan 16, 2013, 2:06:12 PM1/16/13
to jbook...@googlegroups.com
I also ran some benchmarks a while back, and this does not correspond to my results. I found the Mac VM to be basically identical to Windows (which makes sense, as it should be the same implementation except for some low-level calls).

However, some VM options have a significant impact on performance. Be sure to always use the server (i.e., HotSpot) version of the VM; otherwise the performance may easily deviate by a factor of two.

Eugene Kononov

unread,
Jan 16, 2013, 2:22:44 PM1/16/13
to jbook...@googlegroups.com
Marcus, are you using the "aggressiveHeap" option?

Marcus Williford

unread,
Jan 16, 2013, 3:59:27 PM1/16/13
to jbook...@googlegroups.com
Thanks for all the feedback.  

I am not using the "server" VM; I may try that.  I am using aggressiveHeap.  I think I am just running with too many parameters for my own good.
I may need to get a separate Linux box for running optimizations: cheaper, and less wear and tear on my brand-new MacBook Retina.  I'm not sure if Apple tested these running full CPU for days!

Marcus


On Wed, Jan 16, 2013 at 11:22 AM, Eugene Kononov <eugene....@gmail.com> wrote:
Marcus, are you using the "aggressiveHeap" option?
