Here is the reddit thread that prompted the creation of this group:
http://www.reddit.com/r/dfsports/comments/2qtwmm/looking_to_team_up_on_a_dfs_statisticswebsite/
Beyond projections
In the DFS world, everyone thinks they have what it takes to come up with better projections than everyone else. As a result, the market is saturated with projections. Many of these are publicly available. Therefore, any value added by coming up with projections is only marginal, as it must be compared to the value you could get for free from others' projections. Furthermore, while the advantage of more accurate projections is self-evident, I think it is smaller than people realize. Having a slightly more accurate projection will lead to a higher win rate, but given the volatility of fantasy sports, it may not be that much higher. For these reasons, I want to pioneer a strategy that goes beyond projections. I want projections to be not the endpoint of a DFS strategy, but rather the starting point.
What is beyond projections, you ask? Probability distributions. Whereas a projection is a single number per player (typically the expected or most likely outcome), a probability distribution estimates the likelihood of all possible outcomes. Having probability distributions on hand allows one to come up with more advanced strategies than the status-quo approach of maximizing a lineup's projected score. In particular, I believe a distribution-based approach is well suited to constructing portfolios consisting of many unique lineups in a single slate. I also believe it holds more promise for GPPs than for cash games.
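To make the difference concrete, here is a minimal sketch of two hypothetical players with the same projection but different distributions. The normal shape, the player names, and every number are assumptions chosen purely for illustration; real fantasy-point distributions are unlikely to be normal.

```python
from statistics import NormalDist

# Two hypothetical players with the same projection (mean) but different
# volatility. The normal shape and all numbers are illustrative assumptions.
steady = NormalDist(mu=15.0, sigma=3.0)
boom_or_bust = NormalDist(mu=15.0, sigma=9.0)


def prob_at_least(dist, threshold):
    """Probability that a player's score is at least `threshold`."""
    return 1.0 - dist.cdf(threshold)


for name, dist in [("steady", steady), ("boom_or_bust", boom_or_bust)]:
    p = prob_at_least(dist, 25.0)
    print(f"{name}: projection={dist.mean:.1f}, P(score >= 25) = {p:.4f}")
```

A projection-only approach cannot tell these two players apart, while the distributions show that one of them is far more likely to have a huge game.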
Note: I don't intend to discourage anyone from working on projections (in the traditional sense of producing a single number per player). I can certainly see the appeal in that. But I want to make it clear that producing better projections is explicitly not the purpose of this group. If you produce better projections, we will use them, but that is not the reason I put this group together.
A note about different sports
I believe this strategy is sport-independent. However, as a disclaimer I will add that my interest lies mainly in the NFL. While I believe we can keep many of the analytical tools sport-agnostic, this will not always be possible. For example, when it comes to procuring data, there will not be a sport-independent solution. In these cases, my personal efforts will focus on the NFL. However, I don't mean that to be part of this group's mission; I encourage everyone to produce sport-agnostic work where possible, and otherwise to focus on their personal interests.
A portfolio strategy for GPPs
A portfolio strategy is the approach of entering many unique lineups in a single slate (not necessarily on a single site). There are many informal ways of describing this strategy. The way I like to think of it is that we are searching for players who have huge games. On any given slate, there are only so many of these players. To win a GPP, you need several of them in a single lineup. So, a portfolio strategy might be to consider all the players that could potentially have huge games, and enter all (or many) possible combinations of these players. The hope is that eventually you will "hit" with at least one lineup. You only need one first-place finish to cover thousands of entries.
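As a rough sketch of the "enter all possible combinations" idea, the snippet below enumerates every lineup that can be built from a small pool of candidates. The pool, the salaries, the three-player lineup size, and the salary cap are all hypothetical; real lineups are larger and have positional rules that this ignores.

```python
from itertools import combinations

# Hypothetical pool of players who could have a huge game: (name, salary).
pool = [("A", 9000), ("B", 8500), ("C", 7000), ("D", 6500), ("E", 5000)]
LINEUP_SIZE = 3      # assumption: real lineups are larger, with positions
SALARY_CAP = 22000   # assumption: illustrative cap for a 3-player lineup


def build_portfolio(pool, lineup_size, salary_cap):
    """Return every combination of players whose total salary fits the cap."""
    portfolio = []
    for combo in combinations(pool, lineup_size):
        if sum(salary for _, salary in combo) <= salary_cap:
            portfolio.append(tuple(name for name, _ in combo))
    return portfolio


portfolio = build_portfolio(pool, LINEUP_SIZE, SALARY_CAP)
print(len(portfolio), "lineups:", portfolio)
```

The number of combinations grows very quickly with the size of the pool, which is one reason entering "many" rather than "all" of them will usually be the practical choice.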
That is a basic approach. Of course, it will get much more complicated as you estimate probability distributions for every player. It is no longer just about having some pool of players that could hit big, but about taking into account each player's particular likelihood of doing so. And, importantly, about understanding the way different players' outcomes are correlated with each other.
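To illustrate why correlation matters, here is a small Monte Carlo sketch that estimates how often two players both have a huge game, for different levels of correlation between their scores. The normal distributions, the mean, the standard deviation, and the threshold are assumptions for illustration only, not fitted values.

```python
import math
import random


def joint_boom_probability(rho, threshold=25.0, mean=15.0, sd=9.0,
                           trials=100_000, seed=0):
    """Estimate P(both players score >= threshold) by simulation.

    Both scores are drawn from the same normal distribution (an
    illustrative assumption) with correlation `rho` between them.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        z1 = rng.gauss(0.0, 1.0)
        # Mix z1 with independent noise so that corr(z1, z2) == rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        if mean + sd * z1 >= threshold and mean + sd * z2 >= threshold:
            hits += 1
    return hits / trials


for rho in (-0.6, 0.0, 0.6):
    print(f"rho={rho:+.1f}: P(both >= 25) ~ {joint_boom_probability(rho):.4f}")
```

Under these assumptions, positively correlated players hit together more often than independent ones, and negatively correlated players hit together less often, even though each player's individual distribution is unchanged.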
Goals
The dream of many an analytically-minded DFS player is to automate lineup creation. That is not our immediate goal, for two reasons. First, we want to stay realistic: if we start with a goal that is too far off, we will get discouraged along the way. Better to work incrementally toward smaller goals. Second, and the flip side of that, we want to mesh organically with our existing DFS workflows. We're not looking to replace these workflows, but rather to augment them. With that in mind, my general, long-term goals are:
- to develop strategies for estimating the probability distributions of player outcomes;
- to use these distributions to generate useful information, in table and graphical form, as aids in lineup construction;
- to develop strategies for constructing lineup portfolios; and
- to generate useful information, in table and graphical form, about lineup portfolios.
Steps for action
I will post a separate discussion thread for each of these.
- Secure data. There are four categories of data I have considered, in order of importance:
  - Performances. The statistics from the players' actual performances. In many cases the fantasy-point scores will be sufficient, but complete data sets should be sought in order to produce tools that can be applied to any arbitrary fantasy-point system.
  - Projections. If we are to consider projections as the starting point, we need projections. Current projections are easy enough to obtain, but if we want to fit our model, we'll also need historical projections. This is the challenge. Since historical projections are of limited availability, it is critically important to begin archiving as many projections as possible.
  - Salaries. Past salaries, for this year at least, seem to be available on a few different sites. Still, we should archive them so we're not at the mercy of these sites.
  - Ownership percentages from past games. These may also not be available historically, but it is probably possible to begin archiving them. I haven't yet brought up ownership percentages, but they are something we should think about at some point.
- Develop statistical models for estimating fantasy-point probability distributions. This is the hardest part, I think, and the part that will require the most brainstorming and trial-and-error. This is where we need people with training in statistics.
- Develop and test strategies for constructing portfolios.
- Develop a programming API for manipulating data and hypothetical lineups. My language of choice is Python, and there is already some infrastructure in the Python ecosystem. But I don't think there's anything wrong with exploring other directions if someone wants to.
- Develop visualizations and interactive tools. I've already started this with IPython widgets, although the code is too messy at this point.
- Put these interactive tools online. I listed the other action items in no particular order, but I think this one goes last. I don't want to put anything online until we have figured out what we're offering to the casual user. But there's no harm in putting up prototypes of various tools for our internal use, so that people who are interested in helping out but are not developers can see what we're doing. This is not a priority for me, but someone may be interested in setting it up.
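On the point about seeking complete performance data rather than just fantasy-point scores: with raw statistics in hand, any scoring system can be applied after the fact. Here is a minimal sketch of what that could look like. The statistic names and both sets of scoring weights are hypothetical and do not represent any particular site's rules.

```python
# Hypothetical scoring systems: points awarded per unit of each raw stat.
# These weights are illustrative and not taken from any real site.
scoring_a = {"passing_yards": 0.04, "passing_tds": 4.0, "interceptions": -1.0}
scoring_b = {"passing_yards": 0.05, "passing_tds": 6.0, "interceptions": -2.0}


def fantasy_points(stat_line, scoring):
    """Score one raw stat line under a given fantasy-point system.

    Stats missing from the stat line count as zero, and stats that the
    scoring system does not mention are ignored.
    """
    return sum(weight * stat_line.get(stat, 0)
               for stat, weight in scoring.items())


# One hypothetical performance, scored two different ways.
stat_line = {"passing_yards": 300, "passing_tds": 2, "interceptions": 1}
print("system A:", round(fantasy_points(stat_line, scoring_a), 2))
print("system B:", round(fantasy_points(stat_line, scoring_b), 2))
```

If we archived only the fantasy-point totals from one system, the second calculation would not be possible.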
I'll leave this thread open for discussing any of the points I've raised, or for other big-picture discussions.