NHL Trend Aggregator


summerlink

Jan 28, 2016, 10:05:23 AM
to SportsDataBase
Hi everyone,

I've collected a set of 37 NHL trends ( some made by me, some taken from the SBR NHL Situational plays thread -> see nhlsdql attachment ). All of them are profitable in their own right.
I thought I'd do something to see how they work together.

So I put together a small tool that takes all the valid plays in the sports database suggested by all the trends and calculates the ROI and Yield for the "aggregated" super system.
A play in this aggregated system is any play of a particular trend that is not contradicted by another trend.

The ROI is calculated based on $100 bets and all the plays are made on the ML. The results below are from the beginning of the database up until today ( 27th January 2016 included ).

Period       YIELD(%)   ROI($)    Win   Loss

2006-2007     20.31     14244     329    214
2007-2008     18.64     12722     334    241
2008-2009      9.76      6945     308    272
2009-2010     11.40      7881     316    263
2010-2011     14.89     10753     346    260
2011-2012     14.98      9913     321    232
2012-2013     15.99      6768     208    150   ( lock-out shortened season, that's why there are only 358 plays )
2013-2014     16.43     11628     340    249
2014-2015     12.17      8497     319    249
2015-2016      1.50       563     159    151   ( see attachment for the list of plays )

Overall       14.11     89914    2980   2281   ( 56.64% hit rate, not necessarily that relevant since all the plays are on the ML )


As can be seen, after 9 strong years averaging almost 100 units of profit per year with Yield around 10% or better, this year has produced just 1.5% Yield and 5.6 u profit.

I'm sure that the set of trends I am using is not perfect, and I kinda need your help to polish the trends I use ( either add yours or improve the existing ones ).

I will get back sometime in the next couple of days with a mini version of the tool you guys can use for yourself.

Of course, with what I've done so far it's easy to transition to other sports and to over/under categories.
I will most likely be doing something similar for MLB before the season starts and then for NFL and NBA in the summer. So again, your input there will be valuable.


Cheers everyone
20152016plays.txt
nhlsdql

JJ 21

Feb 4, 2016, 2:49:02 PM
to SportsDataBase
I'd be interested to take a look at the tool you're building that helps rule out contradictory trends/systems. Does it come in the form of an SDQL code that we're able to use with any of our own trends?

Pcg

Feb 4, 2016, 7:44:15 PM
to SportsDataBase
The key thing now will be to see how this mass aggregation does moving forward.

A little trick to save some time is to remove random samples from your "research" pool, find your systems and then just enter the sample you randomly removed previously to sort of move forward in time.
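
A minimal Java sketch of that hold-out trick, under stated assumptions: the games are identified by plain strings, and the class name, method name and the 20% fraction are made up for illustration. It only shows the random split; finding the systems on the research pool and re-testing them on the held-out games is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class HoldOutSketch {
    /** Shuffles a copy of the game IDs and returns two lists: [research pool, held-out sample]. */
    static List<List<String>> split(List<String> gameIds, double holdOutFraction, long seed) {
        List<String> shuffled = new ArrayList<>(gameIds);
        // A fixed seed keeps the split reproducible between runs.
        Collections.shuffle(shuffled, new Random(seed));
        int holdOutSize = (int) Math.round(shuffled.size() * holdOutFraction);
        List<String> holdOut = new ArrayList<>(shuffled.subList(0, holdOutSize));
        List<String> research = new ArrayList<>(shuffled.subList(holdOutSize, shuffled.size()));
        List<List<String>> parts = new ArrayList<>();
        parts.add(research);
        parts.add(holdOut);
        return parts;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            ids.add("game" + i); // placeholder IDs, not real games
        }
        List<List<String>> parts = split(ids, 0.2, 42L);
        System.out.println("research pool: " + parts.get(0));
        System.out.println("held out:      " + parts.get(1));
    }
}
```

The held-out list must stay untouched while the systems are being built, otherwise it stops working as a stand-in for future games.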

summerlink

Feb 6, 2016, 5:25:45 PM
to SportsDataBase
It's code-based.

The aggregating algorithm is this:

1) I'm taking each trend, one at a time, and populating two maps ( an "on" map and a "fade" map ). More on these two in the notes section.

2) I'm building an auxiliary map, an exact duplicate of the "on" map.

3) For each game in the "on" map, I'm checking whether its ID also exists in the fade map.
If it does, I am removing it from the auxiliary map.

4) After the whole "on" map has been iterated through, the auxiliary map contains the list of all the games that exist only in the "on" map and do not exist in the fade map.

Notes:
- a gameID is, for example: "20160206:Devils-Capitals"
- the fade map is built by reversing each game ID from the "on" map.
For example, if the first trend says "20160206:Devils-Capitals" is a play, then I add "20160206:Capitals-Devils" to the fade map and "20160206:Devils-Capitals" to the on map.

I'm almost sure there must be a more efficient way to filter out the contradictory games ( or to optimize my own ), but the method I just described does what it's supposed to and it doesn't have a performance penalty.
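
The steps above boil down to a set difference. Here is a minimal Java sketch of that idea; it is not the actual TrendAggregator source. The class and method names are made up for illustration, plain sets stand in for the maps, and it assumes team names never contain ":" or "-".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class AggregatorSketch {
    /** "20160206:Devils-Capitals" -> "20160206:Capitals-Devils" */
    static String reverse(String gameId) {
        String[] dateAndTeams = gameId.split(":", 2);
        String[] teams = dateAndTeams[1].split("-", 2);
        return dateAndTeams[0] + ":" + teams[1] + "-" + teams[0];
    }

    /** Keeps every play suggested by at least one trend and contradicted by none. */
    static Set<String> aggregate(List<List<String>> playsPerTrend) {
        Set<String> on = new LinkedHashSet<>();
        Set<String> fade = new HashSet<>();
        for (List<String> plays : playsPerTrend) {   // step 1: fill both collections
            for (String gameId : plays) {
                on.add(gameId);
                fade.add(reverse(gameId));
            }
        }
        Set<String> result = new LinkedHashSet<>(on); // step 2: duplicate of "on"
        result.removeAll(fade);                       // steps 3 and 4: drop contradicted games
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> trends = Arrays.asList(
                Arrays.asList("20160206:Devils-Capitals", "20160206:Bruins-Sabres"),
                Arrays.asList("20160206:Capitals-Devils"));
        // Devils-Capitals and Capitals-Devils contradict each other, so only Bruins-Sabres survives.
        System.out.println(aggregate(trends));
    }
}
```

Because the result is a set, a play suggested by several trends still appears only once.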

Any decent junior high computer science student should be able to not only come up with the steps above, but also translate them into actual code.

I have tried to remove the worst performing trends from the set, but invariably there was an ROI and Yield penalty. Even the worst performing trends in the set I use have 7% ROI and give a couple of dozen units profit.

I'm more concerned with not having the best trends or having too much correlation between them. The main idea here is to come up with the correct set of trends. That's why I posted here, a place with SDQL gurus. That's why I need your help.

ps: I am aware there can be other types of aggregating strategies ( like only considering trend consensus ( at least 2 separate trends that give the same play and are not contradicted by some other trend ) ) and staking plans that can be used.

JJ, I am working on making it as easy to use as possible. I will get back when I have a version I am comfortable sharing.
 

Sydney

Feb 7, 2016, 5:16:00 PM
to SportsDataBase
Did you try to calculate the Sharpe ratio for your trends? The Sharpe Ratio is from the financial domain. Calculate the average return per season and the standard deviation. Then divide the average return by the standard deviation. Everything greater than 1 is pretty good. You can use my tool to calculate it: https://blooming-stream-4451.herokuapp.com/ (might be slow to start).

summerlink

Feb 10, 2016, 10:47:44 AM
to SportsDataBase
- Please read How To Use before getting your hands dirty

- Only NHL and MLB are supported so far ( only moneyline performance calculation is done so far, I still need to do spread based calculations for NBA and NFL )

- Please get back with feedback or problems you find

- Don't forget to share your own trends, as much as you are willing.

Enjoy. :)

ps: @Sydney: I have yet to even scratch the surface with all the available statistical tools that can show the actual value of the trends. That part will come in the future of course.


TrendAggregator.jar
How to Use.txt

vitor marcos

Mar 19, 2016, 2:58:20 PM
to SportsDataBase
Summer,

can you share the MLB trends you have in your aggregator file?

summerlink

Mar 20, 2016, 4:56:36 AM
to SportsDataBase
I'm just in the midst of crunching down some 4K + posts on SBR Forums MLB situational plays threads from the past 2 years. I already have about 60 trends and I hope I will have as many as possible to then be able to select the very very best. I will include the Sharpe Ratio and Z score to each trend and supply a new version of the aggregator in the following days.

I will most likely not be sharing my findings if they are extremely valuable in terms of ROI.
I have already given you the tool that does the "hardest" part ( integrating with SportsDatabase API and aggregating the trends ). I gave some decent NHL trends. I even told you where I'm mining the trends from.

Your job is simple, put the puzzle together. Feed your trends to the machine, and hope for positive ROI over big sample sizes.

Vitor, you will see that in life the biggest joy comes from the things you worked the hardest for.
 



SystemSeeker

Mar 24, 2016, 5:52:15 PM
to SportsDataBase
So am I correct in assuming this Portfolio takes all of the qualifying teams from multiple queries and gives you the total results for each play?
 For example, If I have 3 systems and they all 3 pick team A, and one system picks team B and one picks team C, and A, B and C all win, it will show a total of 5 wins (A,A,A,B,C), or just 3 wins (A,B,C)? Really liking what I've seen so far and want to make sure I have the correct instructions on how to use it.

Thanks




SystemSeeker

Mar 24, 2016, 5:59:33 PM
to SportsDataBase
Guess in a sense, what I am trying to say is, does it count the same team in multiple systems as multiple wins, or just as one win?




Message has been deleted

Sydney

Mar 26, 2016, 11:43:01 AM
to SportsDataBase
Can you tell what kind of adjustment you made? Are these trends impacted by the 3-on-3 overtime rule change?

Thanks

On Saturday, March 26, 2016 at 1:49:52 PM UTC+1, DrFill28 wrote:
This season is a severe outlier when it comes to several trends. A quarter of the way through the season my ROI was similar to yours; I was able to make the necessary adjustments after extensive troubleshooting and my ROI is at 8% this season. I would be happy to show you some of my analysis and trends.

By the way, you say you extract your trends from SBR situational threads? Could you link that forum? I have developed all of my own trends and would like to compare.



summerlink

Mar 27, 2016, 11:42:45 AM
to SportsDataBase
@ SystemSeeker : just 3 wins.

I mentioned in a previous post that the staking plan is the simplest possible :

No matter how many trends suggest a particular play, as long as there is consensus ( that means no other trend contradicts the prediction ) the prediction will be played with a flat bet of 1 unit. Simply put, if 100 trends all say team A will win, and no other trend says it will lose, the stake on team A is 1 unit.

@ drfill : I cannot link to any other forum. Just google SBR Forum and you should be able to find it easily.

Since posting the original set of NHL trends, I have adjusted them also, I have extracted and polished the best of the best ( now there are only 24 of them )
My set of trends outperformed yours ( 17.3% ROI this season ), although I suspect the sample size and P/L are smaller ( just 232 plays all season so far ).

The results are below :

                                                        ROI    Profit   Wins   Losses   Pushes
Aggregated System Stats ( all time )                  23.16     68376   1497     1065        0
-----------------------------------------------------------
Aggregated System Time Frame (20150707 - 20160707)    17.37      4618    133       99        0


The best part of this 24 trend powerset is the fact that since the 2010/2011 season, every single year, the ROI was somewhere between 16-26%.



SystemSeeker

Mar 27, 2016, 11:53:59 AM
to SportsDataBase
Thanks for the clarification.
Message has been deleted

summerlink

Mar 28, 2016, 1:44:58 AM
to SportsDataBase
Yes DrFill, I don't even consider more than 80% of the trends from SBR. In particular the ones that filter out bad seasons ( e.g. "... and season > 2010" ), those that contain player names and team names, those that have way too low a sample size, those that are overfitted ( too many query params ) and last but not least, the ones that don't make sense.

Usually when I see something I like, I try to slightly modify it to see if it drops off. If for example "some condition + p:runs>3" gives ROI 11%, and "some condition + p:runs>4" gives ROI -2%, that's likely going to be discarded by me. Another thing I do is "relax" the trend. That means I try to increase the sample size and see if the ROI is still good. I found that to be possible something like 20% of the time. An example here is relaxing p:AFL to one of p:AL, p:AF, or p:FL.

All of the trends in the powerset have at least 15% ROI ( 2562 occurrences ).
It has 16 trends with at least 20% ROI, which combine for 1722 occurrences ( out of the total 2562 ).
The >25% ROI group has 839 occurrences and the >30% group has 305.
Message has been deleted

summerlink

Mar 29, 2016, 7:45:48 AM
to SportsDataBase
Pretty heavy stuff you have going on DrFill :)

Some more thoughts of mine below, which I really think can be game changers when it comes to perfecting trend sets:

First, the likelihood problem. I would want to know what is the probability that a random data set would give at least x% ROI over y samples.

My proposal on how to calculate it:

For ATS betting I see the following benchmark: coin-flip betting at -110 odds ( both sides, equal wagers ), which gives a mean of exactly -5 per bet ( -0.05 units at 100 per unit ). Since every data point is either -110 or +100, every individual deviation from the mean is 105, so the variance is always 105^2 = 11025 and the St. deviation value is 105.
The next step would be to use my particular model and calculate the sum over all "n" data points of (xi-mean)/stDev. Each such data point will have a weight of 1/rootSquare(n) in the final score ( Z score ), so that a random data set gives a score with a standard deviation of 1.
Finally, I would need to calculate the probability for a random data set of size "n" to be inside Z score standard deviations from the mean. For the normal distribution it's done here : https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule. I would need to come up with the correct mathematical model for this ATS distribution.

For ML betting, I can't really use coin flips, since the flipping has to be skewed towards the favorite. So I would set the benchmark to be "optimal staking to lose the minimum amount of money", something like how one would split the stake for arbitrage betting. For example, a 2:1 favorite ( -200 vs +190 ) will have close to 2/3 of the stake.
Of course, everything I said above for ATS will become significantly more complex for ML models.

The second ( clearer but much longer ) path to a correct answer would be to generate all 2^n possible combinations of game outcomes and calculate how many of these produce a smaller ROI than the one my particular model has. That would give the % of random models my set just beat. Needless to say, quite a computationally heavy approach, especially for large sample sized sets.
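
For the flat -110 ATS benchmark, both paths can be sketched in a few lines of Java. This is an illustration under stated assumptions, not a finished model: every bet risks 110 to win 100, the benchmark is a fair coin flip, the per-bet standard deviation is taken as 105 ( each outcome sits 105 away from the mean of -5 ), and the Z score uses the usual 1/sqrt(n) scaling. The class and method names are made up. With flat bets the ROI depends only on the number of wins, so counting the outcome combinations among the 2^n that do at least as well reduces to a binomial tail and no enumeration is needed.

```java
class CoinFlipBenchmarkSketch {
    static final double WIN = 100.0;                  // profit on a winning -110 bet
    static final double LOSS = -110.0;                // profit on a losing -110 bet
    static final double MEAN = (WIN + LOSS) / 2.0;    // -5 per bet for a coin flipper
    static final double ST_DEV = (WIN - LOSS) / 2.0;  // 105 per bet

    /** Z score of a wins-losses record against the coin-flip benchmark. */
    static double zScore(int wins, int losses) {
        int n = wins + losses;
        double profit = wins * WIN + losses * LOSS;
        return (profit - n * MEAN) / (ST_DEV * Math.sqrt(n));
    }

    /** Probability that n fair coin flips produce at least the given number of wins. */
    static double chanceOfAtLeast(int wins, int n) {
        double logPmf = n * Math.log(0.5); // log P(exactly 0 wins)
        double tail = 0.0;
        for (int k = 0; k <= n; k++) {
            if (k >= wins) {
                tail += Math.exp(logPmf);
            }
            if (k < n) {
                // C(n, k+1) = C(n, k) * (n - k) / (k + 1)
                logPmf += Math.log(n - k) - Math.log(k + 1);
            }
        }
        return tail;
    }

    public static void main(String[] args) {
        // A hypothetical 60-40 record, not taken from any real trend.
        System.out.println("Z score: " + zScore(60, 40));
        System.out.println("Chance a coin flipper does at least as well: " + chanceOfAtLeast(60, 100));
    }
}
```

Moneyline plays have a different payout on every bet, so neither shortcut carries over to them as is.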

My second idea is to build a random trend generator. Crazy as it seems, why not have a silent process run in the background which randomly uses sdql parameters & values and spits out only >10% ROI trends ? :)







Message has been deleted

summerlink

Mar 30, 2016, 1:52:58 AM
to SportsDataBase
Yes, I studied maths extensively during high school and then computer science in college. Now I'm a full time software developer. And part time number cruncher. :)
Message has been deleted

summerlink

May 1, 2016, 3:46:22 AM
to SportsDataBase
@drfill:

sumer...@yahoo.com

@ rest of you degens out there : I know it's a long post, but it's an interesting read, I promise :)

A month has passed in MLB and the 65 trend set I use went 41-41 with 2.44 units of profit and 2.61% ROI. A far, far cry from the lifetime results of 6156-3423 ( 30.65% ROI & 3414.55 units profit ).

The NHL trend set ( which I actually refined a bit since the first post ) is 141-116 this season, 37.95 units up ( 12.97% ROI ). Lifetime it's 1425-1059 with 636 units up and 22% ROI.

If you are curious, I did follow the MLB system by placing bets at the beginning of the season ( first couple of weeks ). After being approx. 10 units in the negative ( MLB poor start combined with an NHL bad stretch, and with a negative start on the over/under plays in MLB given by the trends ( yes, I tried THAT too ) ), I reverted to my old habits of poor bankroll management and diverted from the SDQL systems. You kinda know what happened next... ;) ( damn you OKC :) )
I should have been following only the MLB season, and only with moneyline plays. I'm not prepared yet to follow a system to the letter ( not disciplined enough ), but I hope one day I will be.

As a note, I found some problems I should have anticipated with following such a system to the letter. There are days on which I place bets at 5PM local time ( with the first games starting at 8PM ), and by the time the games actually start, the line movement makes those plays ineligible. Conversely, there are situations where line movement turns a non-play into an active play. Of course, in the long run such situations should even out to 50-50 ( wins-losses ), but I was on the bad end of multiple such situations at the beginning of the season, which added to the frustration.

Also, line shopping is critical ( to say the least ) ( I did anticipate this ). I recommend having at least 3 bookies ( 2 are simply not enough IMO ) where you can find the best line.

As far as the project goes, I haven't worked on it much in the past few weeks, but I have some tricks up my sleeve I want to try. Coming soon... :)

Best regards
Remus


summerlink

May 23, 2016, 9:17:10 AM
to SportsDataBase
quick update:

I started working on the random trend generator. It's easier than I first thought... ;)
Using a micro set of possible query parameters ( see below ), MLB as a sport of choice and letting the "beast" run for 30 minutes....

String[] elements1 = {"A", "H", "F", "D"};
String[] elements2 = {"p:H", "p:A", "p:W", "p:L", "p:D", "p:F"};
String[] elements3 = {"p:DAY", "p:NGT", "p:X"};
String[] elements4 = {"SG=1", "SG=2", "SG=3", "1<SG<4", "1<=SG<4", "n:SG=1"};
String[] elements5 = {"DIV", "C", "not DIV", "not C"};

 ... returned no fewer than 107 different trends with ROI of > 4% ! ( albeit 7 of them had tiny sample sizes of less than 12 ). 
39 of them have ROI > 10%

The "beast" randomly picks an element from each of the arrays ( elements1 .... elements5 ) and concatenates them to form a query. So all queries have 5 parameters. 
For Example : 
F and p:F and p:DAY and 1<SG<4 and not C 
ROI : 4.03 Profit : 1343 ( 13.4u ) 137W - 82L

There are 1728 total combinations for these 5 arrays and I suspect that most of them were covered since my initial result list had a lot of duplications ( :-) )
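
A minimal sketch of how such a generator can build its queries, using the five arrays from this post. The class and method names are made up for illustration, and the part that sends each query to the database and filters on ROI is not shown.

```java
import java.util.Random;

class RandomTrendSketch {
    // The five parameter arrays from the post, one row per array.
    static final String[][] ELEMENTS = {
        {"A", "H", "F", "D"},
        {"p:H", "p:A", "p:W", "p:L", "p:D", "p:F"},
        {"p:DAY", "p:NGT", "p:X"},
        {"SG=1", "SG=2", "SG=3", "1<SG<4", "1<=SG<4", "n:SG=1"},
        {"DIV", "C", "not DIV", "not C"}
    };

    /** Picks one random element from each array and joins them with " and ". */
    static String randomQuery(Random rng) {
        StringBuilder query = new StringBuilder();
        for (String[] options : ELEMENTS) {
            if (query.length() > 0) {
                query.append(" and ");
            }
            query.append(options[rng.nextInt(options.length)]);
        }
        return query.toString();
    }

    /** Number of distinct queries: the product of the array lengths. */
    static int totalCombinations() {
        int total = 1;
        for (String[] options : ELEMENTS) {
            total *= options.length;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalCombinations()); // 4 * 6 * 3 * 6 * 4 = 1728
        System.out.println(randomQuery(new Random()));
    }
}
```

With only 1728 combinations, looping over all of them once would avoid the duplicates that random picking produces.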

Next steps ... cover all the query parameters supported by SDQL, and introduce Z scores as filter, alongside ROI. Also will need a way to introduce trend correlation calculation into the fold.

summerlink

May 26, 2016, 4:20:03 AM
to SportsDataBase
I want to add more aggregation strategies, so I'm rewriting the code that performs the aggregation logic.

I'm thinking about the following strategies :

1) Unanimity of n ( Un ) , which means there has to be a consensus among all trends that have an "opinion" on a game, and the consensus should be of at least n trends. 
2) Majority of p ( Mp ) , which means there has to be a majority of at least p% among all trends that have an "opinion" on a game. This means trends that contradict the majority are allowed as long as they don't surpass the 1-p percentage limit.

Currently, the strategy used is U1 ( at least 1 trend "for" and no trends against )
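
Both strategies reduce to a simple check once the trends for and against a side have been counted. A minimal Java sketch, assuming those two tallies already exist for the game; the class and method names are made up for illustration.

```java
class ConsensusSketch {
    /** Un: no trend against, and at least n trends for. */
    static boolean unanimity(int forCount, int againstCount, int n) {
        return againstCount == 0 && forCount >= n;
    }

    /** Mp: the trends for make up at least p percent of all trends with an opinion. */
    static boolean majority(int forCount, int againstCount, int p) {
        int total = forCount + againstCount;
        // Integer arithmetic avoids rounding issues: for/total >= p/100.
        return total > 0 && 100 * forCount >= p * total;
    }

    public static void main(String[] args) {
        System.out.println(unanimity(1, 0, 1)); // U1, the current strategy: true
        System.out.println(unanimity(3, 1, 2)); // one trend against breaks unanimity: false
        System.out.println(majority(3, 1, 70)); // 75% for is at least 70%: true
        System.out.println(majority(3, 1, 80)); // 75% for is below 80%: false
    }
}
```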

What other aggregation strategies do you think can be valuable , besides Un and Mp ?

ps: I've added average line calculation to the mix, so Z score is next ( based on the thread that discussed Z scores ).

summerlink

May 28, 2016, 9:46:08 AM
to SportsDataBase
Things are getting really interesting... :)

After successfully implementing Un and Mp, adding Z scores to the output and enhancing the random trend generator for MLB ( now it has 7 params it considers ), I can safely chillax while the laptop is running and doing the scavenging work. In ~ 30 minutes it spat out a half dozen trends with a Z score of > 2.5, which is not bad at all in my estimation.

Of course, I would like something better than just random generation. So my next idea will be to somehow make this dummy generator a bit smarter, to be able to self-refine/adjust trends that show potential.

And the search for a way to automatically generate optimal trend sets can and should have another feature: for a given set of trends with big Z scores ( TZ ), find the optimal subset, meaning the one containing the optimal trends along with the optimal strategy ( Un for a given n, or Mp for a given p ).

This is computationally expensive to say the least ( there are (2^k)*(n+p) possible combinations, with n and p here standing for the number of candidate values of each ). I figure that the partial trend set TZ will have at least a couple dozen trends ( k ~= 40 ), n would take values from 1 to 5 and p values in { 51, 60, 70, 80, 90 }, which means somewhere along the lines of 10^13 ( does 10 trillion sound better? ) combinations of TZ subsets and strategies to be analyzed. To find the very "fucking" best one of course.
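
The size of that search space is quick to check. A small sketch with made-up names, taking k = 40 trends, 5 candidate values for n and 5 for p:

```java
class SearchSpaceSketch {
    /** (2^k) subsets of trends times the number of candidate strategies; valid while the result fits in a long. */
    static long combinations(int k, int nChoices, int pChoices) {
        return (1L << k) * (nChoices + pChoices);
    }

    public static void main(String[] args) {
        System.out.println(combinations(40, 5, 5)); // prints 10995116277760, about 1.1 * 10^13
    }
}
```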


summerlink

Jun 1, 2016, 9:08:09 AM
to SportsDataBase
After 2 months of MLB, my trend set has performed way below the expected results from previous years. YoY results are below ( 2016 includes games up until and including 31st of May ). Is this set due for a regression to the mean? Or did something happen in 2014 with the game itself, so that most trends that held up until then started reversing?
Who knows....

Season     ROI   Zscore      AVG   Profit   Wins   Losses   Pushes
2005     29.92     8.79    103      25800    473      269        0
2006     30.68     8.74    103.4    24915    451      251        0
2007     21.93     6.57   -101.5    19895    460      301        0
2008     26.44     7.77    102      22995    469      280        0
2009     30.69     8.99    101      26485    478      258        0
2010     31.85     9.24    103.6    26815    461      251        0
2011     23        6.68   -100.8    19430    444      276        0
2012     30.52     8.82   -100.9    25697    464      247        0
2013     21.68     6.36    104.1    18756    448      303        0
2014      9.68     2.84    102.9     8451    414      350        0
2015      8.57     2.49    101.5     7367    402      343        0
2016     -6.72    -1.02   -103      -1664     96      111        0
 

Ognj3n

Jun 1, 2016, 2:28:32 PM
to SportsDataBase
Well You seem to have picked up a lot of stuff ( queries ) that used to work, but the bookies have caught up to over time.
The betting market is a dynamic environment and if something works for a while, more and more people start using it until it influences the market prices ( lines ).

I will give You an example of a simple system that got outdated :

http://sportsdatabase.com/nhl/query?output=default&sdql=A+and+wins%3Co%3Awins+and+season&submit=++S+D+Q+L+!++


( sort it by season )

Plain betting the worse team in an away situation worked flat out until 2009 then the market caught up til 2011 and it got very unreliable.
This system can be optimized, for example if You add the opponent coming off the loss :

http://sportsdatabase.com/nhl/query?output=default&sdql=A+and+wins+%3C+o%3Awins+and+op%3AL+and+season&submit=++S+D+Q+L+!++

The ROI jumps, but if You add "and season" it still shows the same unreliability after the lines adjusted slowly 2009>>2011.

That seems to have happened to Your dataset. The cumulative ROI goes from 30ish in 2005>>2012
to 9-10 in 2014>>15 to going kerplunk this year.

"and season" is a very important parameter to use in addition to the z-score to see if stuff makes sense in the end, and if it is valid.




Sydney

Jun 1, 2016, 5:24:38 PM
to SportsDataBase
@summerlink: Try to calculate the Sharpe Ratio for your system by adding "and season"; then the formula is pretty simple: average profit by season / standard deviation of profit by season. A Sharpe Ratio > 1 is considered pretty good.

For the first trend Ognj3n posted, the profit by season is:
-2203
-4886
-294
-729
-754
1158
2646
2975
3783
7259
Average: 895.5
Standard Deviation: 3433.340947
Sharpe Ratio: 0.260824664
which is poor.

The second trend is better but not great with a Sharpe Ratio of 0.55
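
The calculation for the first trend can be reproduced with a few lines of Java. This is a sketch with made-up class and method names; it uses the sample standard deviation ( dividing by n-1 ), which is the variant that matches the 3433.34 figure above.

```java
class SharpeSketch {
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Sample standard deviation (divides by n - 1). */
    static double sampleStdDev(double[] values) {
        double m = mean(values);
        double squares = 0.0;
        for (double v : values) {
            squares += (v - m) * (v - m);
        }
        return Math.sqrt(squares / (values.length - 1));
    }

    /** Average profit per season divided by its standard deviation. */
    static double sharpe(double[] profitBySeason) {
        return mean(profitBySeason) / sampleStdDev(profitBySeason);
    }

    public static void main(String[] args) {
        // Profit by season for the first trend, copied from the list above.
        double[] profits = {-2203, -4886, -294, -729, -754, 1158, 2646, 2975, 3783, 7259};
        System.out.printf("Average: %.1f%n", mean(profits));
        System.out.printf("Standard deviation: %.2f%n", sampleStdDev(profits));
        System.out.printf("Sharpe ratio: %.4f%n", sharpe(profits));
    }
}
```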

Ognj3n

Jun 2, 2016, 4:23:59 AM
to SportsDataBase
To make it clearer let's add "and p:H" then :

http://sportsdatabase.com/nhl/query?output=default&sdql=A+and+p%3AH+and+wins%3C%3Do%3Awins+and+op%3AL+and+season&submit=++S+D+Q+L+!++

I was trying to point out the decline over time: in the first five years You'll find a Sharpe ratio of ~1.3, in the last five much less. That was the point.
We had to cut the sample size in half, though. Here it makes sense, but that is not often the case.

JJ 21

Jun 29, 2016, 2:06:20 PM
to SportsDataBase
Wow, this stuff is intense guys. Very impressive. I've been following along and once in a while just like to read back through and try to absorb the theories. One thing that can't be left out of the equation is good old fashioned handicapping. Knowledge of the game, line shopping, 'feel', homework (checking injuries, etc) -- When all of the numbers point one direction and then the capping notes line up, that's something that (hopefully) goes above what the bookies and consensus bettors are leaning on.

Sometimes when we see a once-strong angle that worked from say, 1989-2009, and then fell off, we also have to consider how the game has changed. There have been some pretty big shifts in the way NFL and NHL games are played between 2000 and 2016. 

Anyway, continued success in your hard work. I'm inspired to want to take a computer science course and/or programming course one of these days to advance my knowledge. 

Cheers!

summerlink

Jul 6, 2016, 11:00:11 AM
to SportsDataBase
Completely agree with what JJ said above. 
 
Just an example : I just cannot ( in good faith ) put money on Cincy @ Cubbies, when Cody Reed ( one of the biggest fades in recent memory ) goes up against a prolific offense and arguably the best team in MLB, which just happens to play at home after a road sweep. All these arguments led me to CHC ( which I correctly backed ), even though my Trend Aggregator said to happily put money on Cincy at +200.
 
Nevertheless, I want to finish what I started. There will be a tool made available to this forum sometime in the near future. 
But big cautions/warnings will have to be addressed before believing it will be (anything remotely like) a money-making machine.

Steve S

Oct 24, 2016, 11:09:43 PM
to SportsDataBase
Apologies for such a late reply, but I just tried the TrendAggregator, and got a java exception:

$ java -version
java version "1.8.0_112"
Java(TM) SE Runtime Environment (build 1.8.0_112-b16)
Java HotSpot(TM) 64-Bit Server VM (build 25.112-b16, mixed mode)

$ java -jar TrendAggregator.jar nhl nhltrends
java.net.UnknownHostException: proxy.houston.hp.com
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
        ...


Suggestions?


Thanks.

Steve
