Realistic long-term winning percentage using SDQL based trends


Coach Mike

Jan 9, 2017, 11:07:59 AM
to SportsDataBase
Hello all,

I'm curious what people who have actually used these trends over 300+ plays have found to be a realistic winning percentage on standard plays (spreads and totals). I have just started messing with the SDQL stuff in the last couple of months, and so far I am hovering around 55-56% in NBA and CBB, but the sample size is still small (just over 100 games).

Thanks

Ognj3n

Jan 10, 2017, 8:56:46 AM
to SportsDataBase
It depends on the particular query, really.

There have been other threads on this topic; try a search for "z-score" in this group. You will also find it is a bit controversial :).
The sportsdatabase is a research tool, the best there is. It gives You info about how certain criteria have performed in the past,
but the queries do not necessarily translate into future performance or hold predictive value.

There are, in my experience, two different approaches/schools of thought to solving predictivity in a dynamic environment.
Keep in mind You have to evaluate competing minds and organisations, not fruit flies:
(ok, apart from Ray Rice, Tyreek Hill ... but that's another topic)

A) Approach: small sample size, but sound logic from observing and knowing the sport:

example 1 : A friend told me he noticed that Tom Coughlin, who was on the same staff as Parcells and Belichick in the 1990s,
seems to have their number due to being familiar with their playbooks and coaching styles. Let's check:
http://sportsdatabase.com/nfl/query?output=default&sdql=coach%3DTom+Coughlin+and+o%3Acoach+in+%5B+Bill+Belichick%2C+Bill+Parcells%5D&submit=++S+D+Q+L+%21++

example 2 : Andy Reid is known for being too pedantic (the kid who keeps on writing after the test is over, until the paper gets pulled away from him)
and for hating time pressure, on several occasions forgetting that he still has timeouts left on the last drive of the game.
But when he is prepared, he is prepared, performing exceptionally well off bye weeks. Let's check:
http://sportsdatabase.com/nfl/query?output=default&sdql=coach%3D+Andy+Reid+and+rest%3E10%3Eo%3Arest&submit=++S+D+Q+L+%21++

example 3 : Army and Navy both play wishbone offenses because they cannot recruit giant linemen for pass protection
( no offense intended, but you can't put a 6-6, 330-pound guy into a tank or have him blocking paths on a submarine ), so they have to rely on cut-blocking running schemes.
As a result, the play clock gets used up really fast, which is reason to expect an under. Let's check:
http://sportsdatabase.com/ncaafb/query?output=default&sdql=team%3DARMY+and+o%3Ateam%3DNAVY&submit=++S+D+Q+L+%21++

Although these are very small sample sizes, You cannot neglect such histories/queries when capping that particular game.
And there are bettors/cappers who keep a collection of those and have success with this.

B) Approach: large sample size, high z-score. I will use large sample sizes here for the purpose of differentiating.
When constructing a query, make sure that every parameter has predictive value and that, when You link them, they do not antagonize each other.
Let's start with, say, college basketball. One of many differences from the NBA is that there are not enough athletic 7-footers for every one of the 300+ teams.
Say You want Your team to have size, which translates into recent:
more blocks : http://sportsdatabase.com/ncaabb/query?output=default&sdql=p%3Ablocks%3Eop%3Ablocks%2B1&submit=++S+D+Q+L+%21++
allowing fewer offensive rebounds by the opponent: http://sportsdatabase.com/ncaabb/query?output=default&sdql=po%3Aoffensive+rebounds%3Copo%3Aoffensive+rebounds&submit=++S+D+Q+L+%21++
some rim protection : http://sportsdatabase.com/ncaabb/query?output=default&sdql=po%3Afield+goals+made%3Copo%3Afield+goals+made&submit=++S+D+Q+L+%21++

So You have three criteria, and we want to make sure that each of the three pairwise combinations correlates in a positive way, meaning it translates into a higher percentage:
http://sportsdatabase.com/ncaabb/query?output=default&sdql=p%3Ablocks%3Eop%3Ablocks%2B1+and+po%3Aoffensive+rebounds%3Copo%3Aoffensive+rebounds&submit=++S+D+Q+L+%21++
http://sportsdatabase.com/ncaabb/query?output=default&sdql=p%3Ablocks%3Eop%3Ablocks%2B1+and+po%3Afield+goals+made%3Copo%3Afield+goals+made&submit=++S+D+Q+L+%21++
http://sportsdatabase.com/ncaabb/query?output=default&sdql=po%3Aoffensive+rebounds%3Copo%3Aoffensive+rebounds+and+po%3Afield+goals+made%3Copo%3Afield+goals+made&submit=++S+D+Q+L+%21++

And in the final step You want all three combined to translate into a higher percentage than all of the queries before:
http://sportsdatabase.com/ncaabb/query?output=default&sdql=p%3Ablocks%3Eop%3Ablocks%2B1+and+po%3Aoffensive+rebounds%3Copo%3Aoffensive+rebounds+and+po%3Afield+goals+made%3Copo%3Afield+goals+made&submit=++S+D+Q+L+%21++

You see how the number of checks You have to make grows exponentially with the number of criteria: with n criteria there are 2^n - 1 non-empty combinations to check, i.e. 7 checks for 3 criteria, 15 for four, 31 for five ...
This is why this approach is incompatible with a high number of criteria; in practical use, 4 or 5 is the limit.
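For anyone who wants to enumerate those checks mechanically, here is a minimal Python sketch. The criteria strings are placeholders standing in for the SDQL conditions above; the counting logic is the point:

```python
from itertools import combinations

def subset_checks(criteria):
    """List every non-empty combination of criteria that would need
    its own percentage check: 2**n - 1 subsets for n criteria."""
    checks = []
    for size in range(1, len(criteria) + 1):
        for combo in combinations(criteria, size):
            checks.append(" and ".join(combo))
    return checks

# Three placeholder criteria mirroring the size example above
criteria = [
    "p:blocks > op:blocks + 1",
    "po:offensive rebounds < opo:offensive rebounds",
    "po:field goals made < opo:field goals made",
]
print(len(subset_checks(criteria)))  # -> 7
```

With four criteria the same function yields 15 checks, with five 31, which is why the manual workflow caps out around 4 or 5 criteria.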
To cut the story short, You can get queries like these ( line!=None is just for cleanup ):
http://sportsdatabase.com/ncaabb/query?output=default&sdql=line%21%3DNone+and+A+and+o%3Arank%3DNone+and+p%3Ablocks%3Eop%3Ablocks%2B1+and+po%3Afield+goals+made%3Copo%3Afield+goals+made+and+po%3Aoffensive+rebounds%3Copo%3Aoffensive+rebounds+++&submit=++S+D+Q+L+%21++
sample size 2100, approximate z-score (1137 - 918) / sqrt(1137 + 918) = 4.8
Further, for reliability, break it down with a season-by-season check:
http://sportsdatabase.com/ncaabb/query?output=default&sdql=season+and+line%21%3DNone+and+A+and+o%3Arank%3DNone+and+p%3Ablocks%3Eop%3Ablocks%2B1+and+po%3Afield+goals+made%3Copo%3Afield+goals+made+and+po%3Aoffensive+rebounds%3Copo%3Aoffensive+rebounds+++&submit=++S+D+Q+L+%21++
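The z-score arithmetic above is simple enough to script. A minimal sketch, using the 1137-918 record quoted for the query above:

```python
import math

def z_score(wins, losses):
    """Approximate z-score of a W-L record against a 50/50 baseline:
    (W - L) / sqrt(W + L)."""
    return (wins - losses) / math.sqrt(wins + losses)

# Record from the three-criteria query above
print(round(z_score(1137, 918), 1))  # -> 4.8
```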

I have seen both approaches work, so the proper answer is:
the more You know the better, in the sense of risk diversification; the side that has the advantage always looks for ways to diversify risk.
In the long term, 300 plays are nothing. Queries that worked for years have stopped working because the books caught up, and new ones have come up.

And a better knowledge of the SDQL certainly helps.
For example, if You have an NBA overs query: now that the average total has skyrocketed in two years from 200, to 204 last year, to 208,
knowledge of summatives will make it easy for You to adjust it against A(total@season) instead of a fixed number like 200.
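The idea of anchoring a threshold to the season average instead of a hard-coded 200 can be mimicked outside SDQL too. A sketch with made-up totals (the numbers below are hypothetical, not real NBA data); the function plays the role of SDQL's A(total@season):

```python
def season_average_total(totals):
    """Average closing total over a slice of season games; the
    Python analogue of SDQL's A(total@season) summative."""
    return sum(totals) / len(totals)

# Hypothetical closing totals for illustration only
season_totals = [208.5, 211.0, 205.5, 214.0, 209.0]
avg = season_average_total(season_totals)

# Flag games whose total sits above the season average,
# instead of comparing against a fixed number like 200
high_total_games = [t for t in season_totals if t > avg]
print(round(avg, 1), high_total_games)
```

The query then stays calibrated as scoring environments drift, rather than silently matching more and more games as totals rise.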


Russ

Jan 10, 2017, 9:07:21 AM
to sportsdatabase
That was an extremely good piece, and I enjoyed reading it. Thanks for sharing. Although I don't agree with all of it, I wouldn't totally disagree with any of it. Probably the best and most detailed post I have seen here regarding this subject: well written, well thought out, well illustrated. Great job!



Coach Mike

Jan 10, 2017, 6:28:58 PM
to SportsDataBase

Thanks, Ognj3n. That was very helpful. I will look through previous posts on z-score, but what would generally be considered good? Also, can you elaborate on what you mean by checks?

For anyone that has tracked their actual record following their SDQL trends for a long period of time, I would still like to know what your recorded winning percentage has been.

Ognj3n

Jan 11, 2017, 7:20:38 AM
to SportsDataBase
To follow up: this z-score ( https://en.wikipedia.org/wiki/Standard_score ) is the simplified check we use to see whether a query carries predictive value, taking sample size into account.
It is not mathematically precise, as we do not examine fruit flies, as I said, but roughly right is good enough here for starters.
As for what is considered good: eh, the more the better; try for a z-score of 3+ and 300-400+ samples for starters.

I will walk You through these checks using Your own query from the other post: 10 < p:ou margin < 27 and conference = AAC
We have three parameters here, looking for unders:
A) conference=AAC
B) 10< p:ou margin
C) p:ou margin < 27

First check: each parameter's percentage has to be below the database average of 49.7% ( http://sportsdatabase.com/ncaabb/query?output=default&sdql=W&submit=++S+D+Q+L+%21++ )
A)     45.9%                        good
B)     49.5%                        good ( not great but ok )
C)     49.8%                        bad ( discard )

The next round of checks is whether the following combinations push the percentage further in the favorable direction:
AB)   38.4%                       good
AC)                discarded
BC)                discarded

I would cut the query down to: 10 < p:ou margin and conference = AAC
It is a good query, but I would look for better parameters than p:ou margin > 10; there are better ones out there.
Also, the sample size is kind of small, but it is definitely something to keep an eye on. The best thing about the database is that its sample size grows from year to year.
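The first-round discard logic of this walkthrough can be written down as a small filter. A sketch using the percentages quoted above (looking for unders, so a parameter survives only if its percentage sits below the 49.7% database average); the parameter names are copied from the query, everything else is illustration:

```python
BASELINE = 49.7  # database-average percentage quoted above

def prune(params):
    """Keep only parameters whose percentage is below the baseline,
    i.e. those that lean toward the under on their own."""
    return {name: pct for name, pct in params.items() if pct < BASELINE}

params = {
    "conference = AAC": 45.9,
    "10 < p:ou margin": 49.5,
    "p:ou margin < 27": 49.8,  # above baseline -> discarded
}
survivors = prune(params)
print(sorted(survivors))
```

Only the survivors then go into the pairwise-combination round, which keeps the number of SDQL checks down.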

10 < p:ou margin < 19 and conference = A10 and date > 20110808
date > 20110808 to me looks like pure backfitting.
conference = A10 comes down essentially to non-conference games if You look at the ATS record;
intra-conference games are doubled, giving 50%.
So let's have a look: http://sportsdatabase.com/ncaabb/query?output=default&sdql=++conference+%3D+A10+and+o%3Aconference%21%3Dconference&submit=++S+D+Q+L+%21++
49.7% ... discard. Try to construct a query for the Big East instead, for example.

>I would still like to know what your recorded winning percentage has been
Honestly, I don't think You will get a trustworthy answer here or anywhere else on the internet ( superheroes all over the place, as we all know ).
It is roughly equivalent to asking someone to go to the ATM and post a scan of the balance ...
The closest thing to an independently recorded win percentage would be the Covers contests, maybe, but not everyone is betting flat.

Don't ever get discouraged: the more research You invest, the more You learn, and the more confident You will be in the market.

@Russ

Thanks a lot. I wish I had taken the time to correct the punctuation; some of it is ayayay :/

Coach Mike

Jan 11, 2017, 4:32:26 PM
to SportsDataBase
Thanks a lot. Very helpful stuff here.

Edwin Meyer

Jan 12, 2017, 5:34:41 AM
to SportsDataBase
Coach Mike,  

Here is my two cents...

The SDQL is a tool and the results will vary widely depending on the skill of the user.  Blaming the SDQL for a bad result is like blaming the English language for a terrible poem.

The key to handicapping success is to COMBINE analytical skills with the SDQL.

At the SDQL seminar in Vegas, Joe Meyer said something like, "Every handicapper is looking for gold; the SDQL doesn't give you the gold, it just gives you a better shovel than everyone else."

The SDQL tagline says it well, "Agile Access to Sports Data"

Cheers!

Ed

Ognj3n

Jan 12, 2017, 6:32:33 PM
to SportsDataBase
I agree with the post above, but I must add something here:
... "skill of the user" ... "analytical skills" ...

I am confident that "handicapping skill" is a learned skill and, as such, can be transferred to someone willing to learn.
Nobody is born a natural handicapper; I do not believe in geniuses. All I know is that some people are more driven by curiosity than others.
Some of that "skill" can be attributed to experience, but experience is almost like a large database of saved trends.
So for the transfer of that "skill", knowledge of the SDQL on both sides can be very helpful.

You can write a half-page article about Coors Field in MLB, the altitude and the thinner air,
and how the management of such a club is inclined to acquire power hitters, plus pitchers with higher ground ball/fly ball ratios, to use that advantage ...
Or query "team=Rockies and H", save the trend, see whether it is still over/underrated via a season-by-season breakdown,
and use it as a LEGO brick for a more sophisticated query.
For example:
We've got Coors Field: http://sportsdatabase.com/mlb/query?output=default&su=1&ou=1&sdql=H+and+team%3DRockies&submit=++S+D+Q+L+%21++
and another over trend, where both teams combined have left a lot of players on base in the last game :
http://sportsdatabase.com/mlb/query?output=default&su=1&ou=1&sdql=H+and+p%3ATLOB%2Bop%3ATLOB%3E18++&submit=++S+D+Q+L+%21++
and see whether the percentage increases when You combine the trends:
http://sportsdatabase.com/mlb/query?output=default&su=1&ou=1&sdql=H+and+team%3DRockies+and+p%3ATLOB%2Bop%3ATLOB%3E18++&submit=++S+D+Q+L+%21++
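The keep-or-drop decision for a combined trend is a one-line comparison. A sketch with hypothetical percentages (the numbers are invented for illustration, not taken from the queries above):

```python
def combination_improves(pct_a, pct_b, pct_combined):
    """True if combining two over-trends raised the hit rate
    above both individual trends."""
    return pct_combined > max(pct_a, pct_b)

# Hypothetical percentages for illustration only
print(combination_improves(54.0, 55.5, 58.2))  # -> True
```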

In handicapping You want several trends pointing strongly to Your pick.
I will use an anecdote: the last World Cup final in soccer.
Unfortunately no SDQL is available there, but You will be able to follow.
In conversations with friends, I got asked what the smart play was,
and I said: look, since 1990 the WC final results have been 1-0, 0-0, 3-0, 2-0, 1-1, 0-0.
That is a strong under trend, because it's a 1-game playoff, and whoever makes the first mistake loses;
such is the nature of the sport. Soccer is a fantastic sport to play, but a pain to watch.
Teams coming from different continents/sporting cultures, unfamiliar with each other, are even more cautious.
Besides, the most dangerous striker on the field, Leo Messi, probably won't show up;
he has developed a choking history in big games, and Argentina is now on, hm, a run of 7 lost finals.
Taking all of this into account, the under seemed to me to be the right play.
My answer was: play u2.5 at 1.60, u1.5 looks good, Messi not to score at 1.50, whatever feels good to You.
So after the game (0-0 in regular time) I asked around: who's up for a drink? ( It's summer; You watch it downtown with friends. )
Guess what: out of 20 people, ten had played Argentina and ten had played Germany to win in regular time anyway.
Of course they did, and You can't blame them.

My point is, what people call "skill" is actually curiosity.
Be curious about what really works and what doesn't, and sooner or later You'll learn it or find it out.
The queries can point You to trends that You will understand on a closer look,
and confirm observations You made by watching. The two are closely related.

smc chandler

Mar 11, 2019, 9:31:17 PM
to SportsDataBase
wow, this is great stuff ... thank you :)

I think people over-think things. I can't see any reason why people would go OVER on Army-Navy. It seems like free money the way they play, and the key, as the great poster above mentioned, is that it's two teams playing a clock-eating, lowish-scoring style ( stout run D too ).

Reminds me of FSU-Miami for a fair number of years. They couldn't really get their offenses going at all ( against each other and, to a large degree, overall ). Games went UNDER even with a lot of special-teams big plays or points ... but people tell me it's square analysis to suggest "this game hasn't sniffed this year's TOTAL in prior years, even with Miami having Devin Hester, the best KR/PR ever, and he's gone".

smc chandler

Mar 12, 2019, 10:46:34 PM
to SportsDataBase

Something related to this that I think is interesting:

In almost every sport I've looked at, this seems to be universally true: LOW TOTALS go UNDER, HIGH TOTALS go OVER ... I'm not sure how strong this is statistically, but I see it over and over again.

But my gut instinct would basically be to go the other way; it seems much more natural to go the opposite way, i.e. OVER on a 40-point college total.

Anyway, I think with the Army-Navy total, the market would think it was weird if the total came out at 32 points or something like that. So it comes out at 47 ( I think it did for a few minutes ) and works its way down very fast at first, followed by a slow decline to game day.