The Netflix Prize was an open competition for the best collaborative filtering algorithm to predict user ratings for films, based on previous ratings without any other information about the users or films, i.e. without the users being identified except by numbers assigned for the contest.
The competition was held by Netflix, a video streaming service, and was open to anyone who was neither connected with Netflix (current and former employees, agents, close relatives of Netflix employees, etc.) nor a resident of certain blocked countries (such as Cuba or North Korea).[1] On September 21, 2009, the grand prize of US$1,000,000 was given to the BellKor's Pragmatic Chaos team, which bested Netflix's own algorithm for predicting ratings by 10.06%.[2]
For each movie, the title and year of release are provided in a separate dataset. No information at all is provided about users. In order to protect the privacy of the customers, "some of the rating data for some customers in the training and qualifying sets have been deliberately perturbed in one or more of the following ways: deleting ratings; inserting alternative ratings and dates; and modifying rating dates."[2]
There was some controversy as to the choice of RMSE as the defining metric. It has been claimed that even as small an improvement as 1% RMSE results in a significant difference in the ranking of the "top-10" most recommended movies for a user.[6]
Prizes were based on improvement over Netflix's own algorithm, called Cinematch, or over the previous year's score if a team had improved beyond a certain threshold. A trivial algorithm that predicts for each movie in the quiz set its average grade from the training data produces an RMSE of 1.0540. Cinematch uses "straightforward statistical linear models with a lot of data conditioning."[7]
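The metric and the trivial per-movie-average baseline can be illustrated with a minimal sketch. The function names, the tuple layout, the fallback to a global mean for unseen movies, and the toy ratings are all assumptions made for illustration; none of them come from the contest materials.

```python
import math
from collections import defaultdict


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def movie_mean_baseline(training):
    """Build a predictor that returns each movie's mean training rating.

    `training` is an iterable of (user_id, movie_id, rating) tuples.
    Movies absent from training fall back to the global mean
    (an assumption of this sketch).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _user, movie, rating in training:
        totals[movie] += rating
        counts[movie] += 1
    global_mean = sum(totals.values()) / sum(counts.values())
    means = {movie: totals[movie] / counts[movie] for movie in totals}
    return lambda movie: means.get(movie, global_mean)


# Toy data, invented for illustration only.
train = [(1, "A", 5), (2, "A", 3), (1, "B", 2), (3, "B", 4), (2, "C", 1)]
held_out = [(3, "A", 4), (2, "B", 3), (1, "C", 2)]

predict = movie_mean_baseline(train)
preds = [predict(movie) for _user, movie, _rating in held_out]
truth = [rating for _user, _movie, rating in held_out]
score = rmse(preds, truth)
```

Because the baseline ignores the user entirely, every user receives the same prediction for a given movie; the contest rewarded models that reduced the error below this kind of starting point.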
Using only the training data, Cinematch scores an RMSE of 0.9514 on the quiz data, roughly a 10% improvement over the trivial algorithm. Cinematch has a similar performance on the test set, 0.9525. In order to win the grand prize of $1,000,000, a participating team had to improve this by another 10%, to achieve 0.8572 on the test set.[2] Such an improvement on the quiz set corresponds to an RMSE of 0.8563.
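The thresholds quoted above follow from simple arithmetic on the published RMSE values. The short sketch below reproduces them; the helper names are hypothetical, and only the three RMSE constants are taken from the text.

```python
def percent_improvement(baseline_rmse, new_rmse):
    """Relative RMSE reduction, expressed in percent of the baseline."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


def target_rmse(baseline_rmse, percent):
    """RMSE that corresponds to a given percent improvement over a baseline."""
    return baseline_rmse * (1.0 - percent / 100.0)


# RMSE values quoted in the text.
TRIVIAL_QUIZ = 1.0540
CINEMATCH_QUIZ = 0.9514
CINEMATCH_TEST = 0.9525

# Cinematch versus the trivial per-movie average: a little under 10%.
cinematch_gain = percent_improvement(TRIVIAL_QUIZ, CINEMATCH_QUIZ)

# A further 10% improvement over Cinematch on each set.
grand_prize_test = target_rmse(CINEMATCH_TEST, 10.0)  # 0.9525 * 0.9
grand_prize_quiz = target_rmse(CINEMATCH_QUIZ, 10.0)  # 0.9514 * 0.9
```

The products are 0.85725 and 0.85626, which correspond to the quoted thresholds of 0.8572 on the test set and 0.8563 on the quiz set.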
As long as no team won the grand prize, a progress prize of $50,000 was awarded every year for the best result thus far. However, in order to win this prize, an algorithm had to improve the RMSE on the quiz set by at least 1% over the previous progress prize winner (or over Cinematch, in the first year). If no submission succeeded, the progress prize was not to be awarded for that year.
To win a progress or grand prize, a participant had to provide source code and a description of the algorithm to the jury within one week after being contacted by them. Following verification, the winner also had to provide a non-exclusive license to Netflix. Netflix would publish only the description, not the source code, of the system. (To keep their algorithm and source code secret, a team could choose not to claim a prize.) The jury also kept their predictions secret from other participants. A team could send as many attempts to predict grades as they wished. Originally submissions were limited to once a week, but the interval was quickly modified to once a day. A team's best submission so far counted as their current submission.
Once one of the teams succeeded in improving the RMSE by 10% or more, the jury would issue a last call, giving all teams 30 days to send their submissions. Only then was the team with the best submission asked for the algorithm description, source code, and non-exclusive license and, after successful verification, declared the grand prize winner.
The contest would last until the grand prize winner was declared. Had no one received the grand prize, it would have lasted for at least five years (until October 2, 2011). After that date, the contest could have been terminated at any time at Netflix's sole discretion.
By October 15, 2006, three teams had beaten Cinematch, one of them by 1.06%, enough to qualify for the annual progress prize.[9] By June 2007, over 20,000 teams from over 150 countries had registered for the competition, and 2,000 teams had submitted over 13,000 prediction sets.[3]
On November 13, 2007, team KorBell (formerly BellKor) was declared the winner of the $50,000 Progress Prize with an RMSE of 0.8712 (8.43% improvement).[14] The team consisted of three researchers from AT&T Labs, Yehuda Koren, Robert Bell, and Chris Volinsky.[15] As required, they published a description of their algorithm.[16]
The 2008 Progress Prize was awarded to the team BellKor. Their submission, combined with that of a different team, BigChaos, achieved an RMSE of 0.8616 with 207 predictor sets.[17] The joint team consisted of two researchers from Commendo Research & Consulting GmbH, Andreas Töscher and Michael Jahrer (originally team BigChaos), and three researchers from AT&T Labs, Yehuda Koren, Robert Bell, and Chris Volinsky (originally team BellKor).[18] As required, they published a description of their algorithm.[19][20]
This was the final Progress Prize because obtaining the required 1% improvement over the 2008 Progress Prize would be sufficient to qualify for the Grand Prize. The prize money was donated to the charities chosen by the winners.
On June 26, 2009, the team "BellKor's Pragmatic Chaos", a merger of teams "Bellkor in BigChaos" and "Pragmatic Theory", achieved a 10.05% improvement over Cinematch (a Quiz RMSE of 0.8558). The Netflix Prize competition then entered the "last call" period for the Grand Prize. In accordance with the rules, teams had thirty days, until July 26, 2009 18:42:37 UTC, to make submissions that would be considered for this prize.[23]
On July 25, 2009, the team "The Ensemble", a merger of the teams "Grand Prize Team" and "Opera Solutions and Vandelay United", achieved a 10.09% improvement over Cinematch (a Quiz RMSE of 0.8554).[21][22]
The final standing of the Leaderboard at the close of the last-call period showed that two teams met the minimum requirements for the Grand Prize: "The Ensemble", with a 10.10% improvement over Cinematch on the Qualifying set (a Quiz RMSE of 0.8553), and "BellKor's Pragmatic Chaos", with a 10.09% improvement over Cinematch on the Qualifying set (a Quiz RMSE of 0.8554).[25][26] The Grand Prize winner was to be the one with the better performance on the Test set.
On September 18, 2009, Netflix announced team "BellKor's Pragmatic Chaos" as the prize winner (a Test RMSE of 0.8567), and the prize was awarded to the team in a ceremony on September 21, 2009.[27] "The Ensemble" team had matched BellKor's result, but since BellKor had submitted their results 20 minutes earlier, the rules awarded the prize to BellKor.[22][28]
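The tie-break described here can be expressed as a simple ordering rule: the lower test RMSE wins, and the earlier submission wins when the scores are equal. The sketch below is a hypothetical illustration, not the contest's actual judging code. The `Submission` class, the comparison at four decimal places, and the base timestamp `t0` are assumptions; only the matching Test RMSE of 0.8567 and the 20-minute gap come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Submission:
    team: str
    test_rmse: float
    submitted_at: datetime


def pick_winner(submissions):
    """Lowest test RMSE wins; the earlier submission breaks a tie.

    Scores are compared at four decimal places, which is an assumption
    of this sketch about how results were reported.
    """
    return min(submissions, key=lambda s: (round(s.test_rmse, 4), s.submitted_at))


# Placeholder base time: only the 20-minute gap reflects the text.
t0 = datetime(2009, 7, 26, 18, 0, 0, tzinfo=timezone.utc)
entries = [
    Submission("The Ensemble", 0.8567, t0 + timedelta(minutes=20)),
    Submission("BellKor's Pragmatic Chaos", 0.8567, t0),
]

winner = pick_winner(entries)
```

Sorting on the pair (score, time) keeps the rule in one place: the timestamp only matters when the scores compare equal.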
The joint team "BellKor's Pragmatic Chaos" consisted of two Austrian researchers from Commendo Research & Consulting GmbH, Andreas Töscher and Michael Jahrer (originally team BigChaos); two researchers from AT&T Labs, Robert Bell and Chris Volinsky, and Yehuda Koren from Yahoo! (originally team BellKor); and two researchers from Pragmatic Theory, Martin Piotte and Martin Chabbert.[29] As required, they published a description of their algorithm.[30]
On March 12, 2010, Netflix announced that it would not pursue a second Prize competition that it had announced the previous August. The decision was in response to a lawsuit and Federal Trade Commission privacy concerns.[31]
Although the data sets were constructed to preserve customer privacy, the Prize has been criticized by privacy advocates. In 2007 two researchers from The University of Texas at Austin (Vitaly Shmatikov and Arvind Narayanan) were able to identify individual users by matching the data sets with film ratings on the Internet Movie Database.[32][33]
On December 17, 2009, four Netflix users filed a class action lawsuit against Netflix, alleging that Netflix had violated U.S. fair trade laws and the Video Privacy Protection Act by releasing the datasets.[34] There was public debate about privacy for research participants. On March 19, 2010, Netflix reached a settlement with the plaintiffs, after which the plaintiffs voluntarily dismissed the lawsuit.
Netflix awarded a $1 million prize to a developer team in 2009 for an algorithm that increased the accuracy of the company's recommendation engine by 10 percent. But it doesn't use the million-dollar code, and has no plans to implement it in the future, Netflix announced on its blog Friday. The post goes on to explain why: a combination of too much engineering effort for the results, and a shift from movie recommendations to the "next level" of personalization caused by the transition of the business from mailed DVDs to video streaming.
Earlier this week, Netflix, the online movie rental service, announced it will award $1 million to anyone who can come up with an algorithm that improves the accuracy of its movie recommendation service.
JB: We trained Cinematch on 100 million ratings and asked it to predict what the other 3 million would be. We compared ours with the actual answers. We do that every day. We get about 2 million ratings per day and we track the daily fluctuations of the system. We expect to measure submissions to the contest [the same way]. The actual prize dataset is 103 million ratings, but we only released 100 million of them.
JB: If you go to the website and rate 100 movies for us, the red stars shown under each movie are personalized for you. We use these ratings to adjust the prediction away from the average recommendation, according to your taste. A three-percent difference, for instance, might make a difference of one-quarter star. We have millions of people rating millions of DVDs, and that quarter-star difference helps us sort the list. The individual movie recommendation might not get so much better, but, overall, the set of recommended movies is very different. Move a battleship a little bit, and it makes a huge difference.
A year into the competition, the Korbell team won the first Progress Prize with an 8.43% improvement. They reported more than 2000 hours of work in order to come up with the final combination of 107 algorithms that gave them this prize. And, they gave us the source code. We looked at the two underlying algorithms with the best performance in the ensemble: Matrix Factorization (which the community generally called SVD, Singular Value Decomposition) and Restricted Boltzmann Machines (RBM). SVD by itself provided a 0.8914 RMSE (root mean squared error), while RBM alone provided a competitive but slightly worse 0.8990 RMSE. A linear blend of these two reduced the error to 0.88. To put these algorithms to use, we had to work to overcome some limitations, for instance that they were built to handle 100 million ratings, instead of the more than 5 billion that we have, and that they were not built to adapt as members added more ratings. But once we overcame those challenges, we put the two algorithms into production, where they are still used as part of our recommendation engine.
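The "linear blend" mentioned in this passage combines the outputs of two models with fixed weights. The sketch below shows one common way to choose such weights, ordinary least squares on known ratings, using synthetic data. It is not the KorBell or Netflix implementation: the function names, noise levels, and random seed are illustrative assumptions, and the two "models" are simply true ratings plus independent noise.

```python
import math
import random


def rmse(pred, actual):
    """Root mean squared error between predictions and true ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


def fit_blend_weights(pred_a, pred_b, actual):
    """Least-squares weights (w_a, w_b) minimising sum((w_a*a + w_b*b - y)^2).

    Solves the 2x2 normal equations with Cramer's rule.
    """
    saa = sum(a * a for a in pred_a)
    sbb = sum(b * b for b in pred_b)
    sab = sum(a * b for a, b in zip(pred_a, pred_b))
    say = sum(a * y for a, y in zip(pred_a, actual))
    sby = sum(b * y for b, y in zip(pred_b, actual))
    det = saa * sbb - sab * sab
    if det == 0:
        raise ValueError("predictors are collinear; weights are not unique")
    return (say * sbb - sby * sab) / det, (sby * saa - say * sab) / det


# Synthetic stand-ins for two models' predictions (not contest data).
rng = random.Random(0)
truth = [rng.uniform(1, 5) for _ in range(2000)]
model_a = [y + rng.gauss(0, 0.9) for y in truth]
model_b = [y + rng.gauss(0, 1.0) for y in truth]

w_a, w_b = fit_blend_weights(model_a, model_b, truth)
blend = [w_a * a + w_b * b for a, b in zip(model_a, model_b)]

rmse_a = rmse(model_a, truth)
rmse_b = rmse(model_b, truth)
rmse_blend = rmse(blend, truth)
```

On the data used to fit the weights, the blend can never score worse than either input, because weights of (1, 0) and (0, 1) are among the candidates the fit considers. In practice the weights would be fitted on one held-out set and evaluated on another, and the gain depends on how different the two models' errors are.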