is an Elo-based formula that uses Bayesian inference (Elo being what chess uses). It takes into account the rank of the players, in addition to the outcome of the game, when updating their respective ranks. Our ranking implementation is based on the one done by the online Dominion server, isotropic. We wrote our own matchmaking implementation based on the formulas provided by Microsoft Research.
How do I interpret the score and rating numbers?
I will use my own bot as an example:

My rating is 54.323 ± 0.481. Here 54.323 is my estimated rating and 0.481 is the uncertainty about that rating. We report the score as the conservative estimate of the rating, so 54.323 - 0.481 = 53.842. Ranks are based on scores. I am currently ranked 187th on the leaderboard.
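A minimal Python sketch of that arithmetic, and of why ranking by score rather than by raw rating matters. The function name and the second bot ("newcomer") are hypothetical and only there for illustration; the first bot uses the numbers from the example above.

```python
def conservative_score(rating: float, uncertainty: float) -> float:
    """Score shown on the leaderboard: estimated rating minus its uncertainty."""
    return rating - uncertainty


# (rating, uncertainty) pairs; "newcomer" is a made-up bot with few games played,
# so its uncertainty is assumed to still be large.
bots = {
    "steady": (54.323, 0.481),
    "newcomer": (55.0, 3.0),
}

# Ranks are based on scores, highest score first.
leaderboard = sorted(bots, key=lambda name: conservative_score(*bots[name]), reverse=True)

print(f"{conservative_score(54.323, 0.481):.3f}")  # 53.842
print(leaderboard)  # ['steady', 'newcomer']
```

Even though the hypothetical newcomer has the higher estimated rating, its larger uncertainty gives it the lower score, so it ranks below the well-established bot until more games narrow that uncertainty.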
How do you use this information to create matches?
First, we take the bot that has played the fewest games. Then we take the n bots ranked directly above it and the n bots ranked directly below it. We use the TrueSkill formula to calculate a match quality score for each possible opponent. These scores range from 0 to 1. A game is then played between the bot with the fewest games and the bot, among those within n ranks of it, that has the highest match quality score. We have several threads constantly pulling opponents and playing games. Even if several new bots are created, they quickly catch up to the other bots in number of games and bubble into their proper rank. The older bots continue to get new games as well.
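The selection steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code running on the server: it uses the two-player match quality formula published by Microsoft Research for TrueSkill, assumes the displayed uncertainty can be used directly as the standard deviation, and assumes a BETA value equal to the TrueSkill default (25/6), which may not match the parameters used here. The names Bot, match_quality, and pick_match are hypothetical.

```python
import math
from dataclasses import dataclass

BETA = 25.0 / 6.0  # assumed performance-variation parameter (TrueSkill default)


@dataclass
class Bot:
    name: str
    rating: float       # estimated skill
    uncertainty: float  # assumed to be usable as the standard deviation
    games: int          # number of games played so far

    @property
    def score(self) -> float:
        # Conservative estimate used for ranking.
        return self.rating - self.uncertainty


def match_quality(a: Bot, b: Bot, beta: float = BETA) -> float:
    """Two-player TrueSkill match quality, a value in (0, 1]."""
    denom = 2 * beta**2 + a.uncertainty**2 + b.uncertainty**2
    spread = math.sqrt(2 * beta**2 / denom)
    closeness = math.exp(-((a.rating - b.rating) ** 2) / (2 * denom))
    return spread * closeness


def pick_match(bots: list[Bot], n: int) -> tuple[Bot, Bot]:
    """Pair the bot with the fewest games with its best-quality neighbour."""
    if len(bots) < 2 or n < 1:
        raise ValueError("need at least two bots and n >= 1")
    ranked = sorted(bots, key=lambda b: b.score, reverse=True)
    # Index of the bot that has played the fewest games.
    i = min(range(len(ranked)), key=lambda k: ranked[k].games)
    seeker = ranked[i]
    # Up to n bots ranked above and up to n bots ranked below the seeker.
    neighbours = ranked[max(0, i - n):i] + ranked[i + 1:i + 1 + n]
    opponent = max(neighbours, key=lambda b: match_quality(seeker, b))
    return seeker, opponent
```

Each worker thread would call something like pick_match repeatedly; because the seeker is always the bot with the fewest games, new bots are served first until their game counts catch up.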
Why did you move away from your old ranking system?
"You may need to consider a better way to rank players - percentage of games won is problematic (as was discovered during the Tron Google AI challenge) and something like TrueSkill or bayeselo would be more robust."
We received many comments along the lines of this one from day one. Honestly, we didn't expect so many people to submit bots; with a smaller set of submissions, win percentage seemed to level out properly once enough games had been played. We quickly realized that we needed to switch systems for the rankings to make sense. With TrueSkill, fewer games are required before a bot is reasonably ranked.