Made this post on another forum. Thought I should share it here.
TL;DR: I now have a system in mind for generating a GPP portfolio, using stochastic integer optimization. All I need to continue is a model that accurately predicts not just the means/medians but the distribution of FPs.
This week didn't end so well... something from AP would have been a boost to my top lineup but still, multiple lineups in the top 1K out of 100,000 is fine by me. Finished just above even for the week.
Still, a bit frustrating as I felt I had a lot of hits on value plays, but most of my lineups had some big misses as well. I feel like intuitively I should have done better; what I'm thinking is that I need a way to spread out my combinations of players more evenly. Creating lineups by hand, one at a time, I get multiple players in my head at once. I'm making lineups and I'm trying to increase my exposure to Benny Cunningham at the same time as I'm trying to get more Frank Gore, for example. Hit on one but it's canceled out. Even if I'm aware of these combinations and trying to ensure variety I think I end up with more "clumpiness" than is optimal.
So, I've been thinking about developing a better strategy. At the same time I've been watching the developments of various DFS strategy sites over the past year and the proliferation of "lineup optimizer" tools. These all seem to be doing linear programming: given a set of linear constraints (for example, the sum of salaries must be less than or equal to X, the number of RBs must be exactly 2, etc.), find the lineup that maximizes total projected points. Strictly speaking it is integer programming, since each player is either in the lineup or not. This type of problem can be solved easily, so the application of linear programming to DFS is cropping up as a web service at many sites that offer fantasy-point projections.
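To make the "lineup optimizer" idea concrete, here is a minimal sketch of the deterministic problem. The player pool, salaries, projections, cap, and roster shape are all made up for illustration, and it brute-forces every legal lineup rather than calling an LP solver, which only works because the pool is tiny.

```python
from itertools import combinations, product

# Hypothetical toy pool: (name, position, salary, projected points).
PLAYERS = [
    ("QB_A", "QB", 8000, 20.0),
    ("QB_B", "QB", 6000, 16.0),
    ("RB_A", "RB", 7000, 18.0),
    ("RB_B", "RB", 5000, 13.0),
    ("RB_C", "RB", 4000, 11.0),
    ("RB_D", "RB", 3000, 7.0),
]
SALARY_CAP = 17000            # made-up cap for the toy roster
ROSTER = {"QB": 1, "RB": 2}   # exactly this many players per position


def best_lineup(players, roster, cap):
    """Return (total_points, lineup) maximizing projected points under the cap."""
    by_pos = {pos: [p for p in players if p[1] == pos] for pos in roster}
    # All ways to fill each position, then every cross-position combination.
    choices = [combinations(by_pos[pos], n) for pos, n in roster.items()]
    best = (float("-inf"), None)
    for groups in product(*choices):
        lineup = [p for group in groups for p in group]
        if sum(p[2] for p in lineup) > cap:
            continue  # violates the salary constraint
        points = sum(p[3] for p in lineup)
        if points > best[0]:
            best = (points, lineup)
    return best


points, lineup = best_lineup(PLAYERS, ROSTER, SALARY_CAP)
print(points, [p[0] for p in lineup])
```

On this toy pool the winner takes the cheaper QB so it can afford the top RB. A real slate has far too many combinations for brute force, which is why the sites use a solver.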
However, the true problem of setting lineups is not a linear optimization problem; it is a stochastic optimization problem. There is uncertainty in some of the elements of the problem: the salaries are known, but the fantasy points are not. Going off the mean or median expected fantasy points produces good lineups for cash games but offers no way to truly take advantage of the tails of the expected point distributions, which is where you need to hit to win a GPP.
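A quick way to see the mean-versus-tails point is to simulate two lineups. This is only a sketch: the numbers are invented, and each player's score is drawn as an independent normal, which is a simplification (real scores are skewed and teammates are correlated).

```python
import random


def simulate_lineup(players, n_sims, rng):
    """Draw n_sims lineup totals; players is a list of (mean, sd) pairs."""
    return [sum(rng.gauss(mean, sd) for mean, sd in players)
            for _ in range(n_sims)]


def tail_prob(scores, target):
    """Fraction of simulated totals that reach the target score."""
    return sum(s >= target for s in scores) / len(scores)


rng = random.Random(0)          # fixed seed so the run is repeatable
N_SIMS = 20000
TARGET = 150.0                  # hypothetical score needed to win a GPP

# Hypothetical lineups: six players each, given as (mean, sd).
SAFE = [(20.0, 3.0)] * 6        # higher mean, low variance
VOLATILE = [(19.0, 8.0)] * 6    # lower mean, high variance

safe_scores = simulate_lineup(SAFE, N_SIMS, rng)
volatile_scores = simulate_lineup(VOLATILE, N_SIMS, rng)

for name, scores in (("safe", safe_scores), ("volatile", volatile_scores)):
    mean = sum(scores) / len(scores)
    print(name, round(mean, 1), tail_prob(scores, TARGET))
```

A mean-based optimizer would pick the safe lineup every time, yet in this setup the volatile one reaches the target far more often.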
I have been doing some reading on stochastic linear optimization and I think I finally figured out a system for creating an optimal portfolio of GPP lineups. It is easy to implement and I am excited to try it out. The one piece missing, though, is projections in terms of probability distributions as opposed to medians or means (I need to generate Monte Carlo simulations of all fantasy point scores). I am firmly a believer in taking advantage of the work of others (which is not to say being a leech; standing on the shoulders of giants and all that), so I have little desire to make my own projection model when there are so many out there. Even though I have academic experience with mixed-effect regression, I don't think my time would be well spent putting together my own model of fantasy point performances. So many people are doing it that, even if they aren't doing it right, I think there's little edge there. The edge, I think, is in working out a better method for how to use projections; that is, how to go from the projections to the lineups. If at all possible I'd like to get into this area. (And just to be clear, I am only thinking about implementing a stochastic optimization strategy for myself and collaborators, maybe posting lineups it generates, but I have no intention of making a web service in the foreseeable future.)
Basically, my approach will formalize the intuition behind a "spray and pray" GPP strategy.
So, here are my options:
- Make my own model. Of course I have thought about doing this but for the reasons above it is a last resort. I do have some ideas in this area that I think are unique but I don't think it's worth my time to perfect them.
- Make a model of probability distributions using other projections scraped from the web as input. Basically, model the variance but not the means.
The following sites provide some index of variability along with their projections:
- numberfire.com offers a confidence interval around their projections. This seems to be based on a normal distribution. Not empirically accurate but at least there is a metric of variability that could be used.
- bayesff.com offers full probability distributions. This is a promising approach but I'm not happy with the lack of progress from the site. All players at the same position have the same shape distribution (only the scale parameter is fitted to the player) and the week 1 projections are based only on ADP. Sure, they will get better as the season progresses, but still not quite good enough.
- fantasycruncher.com projections include Floor, Ceiling, and Consistency columns. This seems to be the most helpful thing out there, as far as I can tell.
I think my best approach would be the second one, and use as inputs the variability columns from fantasycruncher and numberfire. The latter I saved for every player from every week last year so there at least I have a reasonable size dataset.
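As a rough sketch of what "model the variance but not the means" could look like with an interval as input: if the interval is treated as a symmetric normal interval, its width gives a standard deviation, and that plus someone else's projection is enough to draw simulated scores. The 95% level, the function name, and the example numbers are all assumptions on my part, not anything the sites document.

```python
from statistics import NormalDist


def sd_from_interval(lower, upper, level=0.95):
    """Back out a normal SD from a symmetric two-sided interval.

    ASSUMPTION: the interval is a `level` normal interval centered on the
    projection. The actual level a site uses would need to be checked.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for 95%
    return (upper - lower) / (2.0 * z)


# Hypothetical player: projection of 15 points, interval of 10 to 20.
projection, lower, upper = 15.0, 10.0, 20.0
sd = sd_from_interval(lower, upper)

# Monte Carlo draws of this player's fantasy points (seeded for repeatability).
draws = NormalDist(projection, sd).samples(1000, seed=1)
print(round(sd, 3), round(min(draws), 1), round(max(draws), 1))
```

The normal shape is the weak point, as noted above. A Floor/Ceiling pair could be plugged in the same way by treating it as the interval, though what percentiles those columns represent is also something that would have to be estimated from the data.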
Why am I posting this? I'm wondering if anyone wants to collaborate. I'm not sure if anyone else is interested in going to this level, but if you already have a projection model that does reasonably well, we could work together to create probability distributions from it (if it doesn't generate them already). Alternatively, I could use help making a more realistic model of the expected points distributions using the numberfire SD as a predictor of variability (among others). I've got a lot going on in my life so I really only have the weekends to work on this. With the right help I could get this done quickly, I think. The stochastic optimization method I want to use is deceptively simple. The hard part is getting the projections in the form of distributions in the first place.
If anyone is interested in helping, please PM me. Please don't PM me if you just want to get on board but don't plan on contributing. I promise I will come back and share the strategy once it is coded up and paired with a model.
Also, I mentioned in the other thread that I have been working on tools to scrape lots of data sources, mostly for archival purposes (what I would need the data for is only just becoming clear to me). I don't want to make this code public but I have no problem sharing it with you all. PM me a GitHub username and I will give you read access to the repository. Currently I have the following modules. Each runs from the command line and saves to a CSV or HDF file.
code:
fantasypros.season
fantasypros.week
numberfire.season
numberfire.week
4for4.season*
4for4.week*
bayesff.week*
rotoworld.depth
rotoworld.injuries
* requires login information. I would like to add FantasyCruncher but $20/month is a bit steep for my budget. I at least grabbed the HTML from their week 1 pages for archival purposes, so I have another week to decide.
Anyways, those are my thoughts and plans after week 1. Looking forward to hearing from some of you, even if it's just to mock me and my nerdiness (and my abysmal dollars-per-hour return on the time I've spent working on NFL DFS).