Chess Engine Vs Human


Tijuana Strauhal

Aug 4, 2024, 1:38:16 PM
to rayworjelo
Maia is an AlphaZero/Leela-like deep learning framework that learns from online human games instead of self-play. Maia is trained on millions of games and tries to predict the human move played in each position it sees.

We trained 9 versions of Maia, one for each Elo milestone between 1100 and 1900. Maia 1100 was trained only on games between 1100-rated players, and so on. Each version learned from 12 million human games and captures how chess is typically played at its specific level.
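To make the rating-band setup concrete, here is a hypothetical sketch (not the project's actual pipeline) of how a game could be assigned to a Maia training band from its PGN headers, assuming a game qualifies for a band only when both players' ratings fall inside it:

```python
import re

# Hypothetical sketch of the rating-band filtering described above:
# a game goes into the Maia-1100 training set only if both players'
# ratings fall in the 1100-1199 band, and so on up through 1900.

BANDS = range(1100, 2000, 100)  # 1100, 1200, ..., 1900

def band_for(white_elo, black_elo):
    """Return the Elo milestone whose band contains both players, else None."""
    for lo in BANDS:
        if lo <= white_elo < lo + 100 and lo <= black_elo < lo + 100:
            return lo
    return None

def elos_from_pgn_headers(pgn_text):
    """Pull WhiteElo/BlackElo out of a PGN header block."""
    white = int(re.search(r'\[WhiteElo "(\d+)"\]', pgn_text).group(1))
    black = int(re.search(r'\[BlackElo "(\d+)"\]', pgn_text).group(1))
    return white, black

game = '[WhiteElo "1143"]\n[BlackElo "1188"]\n'
print(band_for(*elos_from_pgn_headers(game)))  # 1100 (both in 1100-1199)
```

A game with players in different bands (say 1143 vs 1250) would simply be excluded under this scheme.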


We tested each Maia on 9 sets of 500,000 positions that arose in real human games, one for each rating level between 1100 and 1900. Every Maia made a prediction for every position, and we measured its resulting move-matching accuracy on each set.
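The move-matching metric above reduces to a simple average: for each position, check whether the model's top predicted move equals the move the human actually played. A minimal illustration (not the paper's evaluation code), with moves in UCI notation:

```python
# Move-matching accuracy: the fraction of positions where the model's
# predicted move equals the move the human actually played.

def move_matching_accuracy(predicted_moves, played_moves):
    assert len(predicted_moves) == len(played_moves)
    hits = sum(p == a for p, a in zip(predicted_moves, played_moves))
    return hits / len(played_moves)

predicted = ["e2e4", "g1f3", "f1c4", "e1g1"]
played    = ["e2e4", "g1f3", "b1c3", "e1g1"]
print(move_matching_accuracy(predicted, played))  # 0.75
```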


As a comparison, we looked at how depth-limited Stockfish does on the same prediction task. We ran various depth limits, ranging from only considering the current board (D01) to letting it search 15 plies ahead (D15). Depth-limited Stockfish is the most popular engine to play against for fun (e.g. the "Play with the Computer" feature on Lichess).


Stockfish and Leela models don't predict human moves as well as Maia. Equally importantly, they don't match a targeted skill level: the curves in the graph are relatively flat across a wide range of human skill levels.


Maia is particularly good at predicting human mistakes. The move-matching accuracy of any model increases with the quality of the move, since good moves are easier to predict. But even when players make horrific blunders, Maia correctly predicts the exact blunder they make around 25% of the time. This ability to understand how and when people are likely to make mistakes can make Maia a very useful learning tool.


You can play against Maia yourself on Lichess! You can play Maia 1100, Maia 1500, and Maia 1900. Maia is an ongoing research project using chess as a case study for how to design better human-AI interactions. We hope Maia becomes a useful learning tool and is fun to play against. Our research goals include personalizing Maia to individual players, characterizing the kinds of mistakes made at each rating level, running Maia on your games to spot repeated, predictable mistakes, and more.


We are going to be releasing beta versions of learning tools, teaching aids, and experiments based on Maia (analyses of your games, personalized puzzles, Turing tests, etc.). If you want to be the first to know, you can sign up for our email list here.


The code for training Maia can be found on our GitHub repo.

Abstract

As artificial intelligence becomes increasingly intelligent--in some cases, achieving superhuman performance--there is growing potential for humans to learn from and collaborate with algorithms. However, the ways in which AI systems approach problems are often different from the ways people do, and thus may be uninterpretable and hard to learn from. A crucial step in bridging this gap between human and artificial intelligence is modeling the granular actions that constitute human behavior, rather than simply matching aggregate human performance. We pursue this goal in a model system with a long history in artificial intelligence: chess. The aggregate performance of a chess player unfolds as they make decisions over the course of a game. The hundreds of millions of games played online by players at every skill level form a rich source of data in which these decisions, and their exact context, are recorded in minute detail. Applying existing chess engines to this data, including an open-source implementation of AlphaZero, we find that they do not predict human moves well. We develop and introduce Maia, a customized version of AlphaZero trained on human chess games, that predicts human moves at a much higher accuracy than existing engines, and can achieve maximum accuracy when predicting decisions made by players at a specific skill level in a tuneable way. For a dual task of predicting whether a human will make a large mistake on the next move, we develop a deep neural network that significantly outperforms competitive baselines. Taken together, our results suggest that there is substantial promise in designing artificial intelligence systems with human collaboration in mind by first accurately modeling granular human decision-making.


All our data is from the wonderful archive at database.lichess.org. We converted the raw PGN data dumps into CSV, and have made the CSV we used for testing available at csslab.cs.toronto.edu/datasets.


Many thanks to Lichess.org for providing the human games that we trained on and hosting our Maia models that you can play against. Ashton Anderson was supported in part by an NSERC grant, a Microsoft Research gift, and a CFI grant. Jon Kleinberg was supported in part by a Simons Investigator Award, a Vannevar Bush Faculty Fellowship, a MURI grant, and a MacArthur Foundation grant.


I find it hard and even counterproductive to learn chess by playing a computer, since I get used to a limited range of computer responses to my moves. After a year of practicing with my computer, I was startled when I played actual people on Chess.com, because humans make moves that my computer chess games would never, ever make. I obviously need to play against human beings more often, but I wondered: is there any computer software that is good at playing the surprising moves a human would actually make, rather than always responding to my every move in a very predictable way? Thanks.


While no engine is human (duh!), a small number of engines play much more believable moves than most. The more natural engines tend to be commercial (that is, not free). I tended to like HIARCS and Delfi Trainer (DT is no longer available, except for a free demo that only plays at 1000 Elo and at full strength). There are a few free ones that aren't too bad, if you're willing to hunt them down and tweak them.


Something that I've heard about (but haven't tried) is the Leela Chess Zero engine (Lc0) with small neural nets made from actual amateur games. Maybe I'll check it out someday when I'm willing to mess around in Windows again.


What's especially irritating are the opening situations in which some games ALWAYS respond the same way to a given move. Whenever I separate the opposing bishop from my king with a defended knight in 3D Chess, the bishop will ALWAYS take the knight, and never choose to back off or stay where it is. I would think that, instead of programming every move to be played at a certain level, like, say, 1200, the game could be programmed to play some moves at an 1800 level and some at an 800 level (to balance out on average to a 1200 game) -- thereby inserting a little variety into the performance. But I guess it's hard to make the program smart enough to play stupidly, or even inconsistently.
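The mixed-strength idea above can be sketched in a few lines. One detail: for 1800-level and 800-level moves to average out to 1200, the split has to be 40/60 rather than 50/50 (0.4 x 1800 + 0.6 x 800 = 1200). A toy sketch, with the strength values and probabilities purely illustrative:

```python
import random

# Hypothetical sketch of the idea: instead of playing every move at a
# fixed 1200 level, draw a per-move strength so moves are sometimes
# strong and sometimes weak but average out near 1200.
# 0.4 * 1800 + 0.6 * 800 = 1200.

def per_move_strength(rng):
    """Pick the strength level for a single move."""
    return 1800 if rng.random() < 0.4 else 800

rng = random.Random(0)  # seeded for reproducibility
levels = [per_move_strength(rng) for _ in range(10000)]
print(sum(levels) / len(levels))  # close to 1200 on average
```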


I wasn't able to get the Lc0/Maia combo working in Linux, so, against my better judgement, I set the combo up in Windows and got it working there in the Arena GUI. There are 9 different Maia nets (or weights), ranging from Maia 1100 to Maia 1900. Three of the Maia bots can be played on Lichess (1100, 1500, and 1900). The Maia 1100 net is probably somewhere close to the strength of Delfi Trainer 5.4 set between 1000 and 1100 elo, and Stockfish set at level 0 or 1.


Lc0 is the closest you will come to seeing a computer play like a human. The calculation-based engines (e.g. Stockfish, Komodo, Houdini) calculate millions of positions a second to ensure they never make a tactical mistake (or rather, any tactical mistake is so far in the future that no human would see it). Leela tends to favor piece activity, which leads to more human-ish maneuvers.


But to the core of your question: you should not play computers in hopes of getting better at playing humans. Computers do not play at an 800 or 1000 level. They play at a 3800-level with random odd blunders thrown in. Humans do not play like that. An 800-level human is going to make a lot of moves that do not fit well (logically) together. In general, you are not going to have someone play 15 moves of theory in the KID and then leave their queen, rook, bishops, and knights all hanging on consecutive moves.


The "bad" calculation-based engines might play like that, but the better-designed "dumbed down" engines never play at a 3800 level. They implement an algorithm that simply looks at a small number of nodes (based on the strength being simulated), and then they also throw in a "randomness" factor to simulate occasional blunders. I think Crafty was one of the early engines to do this.
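That weakening scheme can be sketched as follows. This is a toy illustration of the two knobs described above (a node limit and a blunder probability), not Crafty's actual code; the move names and scores are made up:

```python
import random

# Toy sketch of a "dumbed down" engine: search only a handful of
# candidate moves (the node limit), then with some probability pick
# a worse move instead of the best one found (the simulated blunder).

def pick_move(scored_moves, node_limit, blunder_rate, rng):
    """scored_moves: list of (move, score) pairs, higher score = better.
    Only the first `node_limit` candidates are 'searched' at all."""
    seen = scored_moves[:node_limit]            # limited search
    seen.sort(key=lambda ms: ms[1], reverse=True)
    if len(seen) > 1 and rng.random() < blunder_rate:
        return rng.choice(seen[1:])[0]          # simulated blunder
    return seen[0][0]                           # best move actually seen

rng = random.Random(42)
moves = [("Nf3", 0.3), ("e4", 0.5), ("h4", -0.8), ("d4", 0.4)]
print(pick_move(moves, node_limit=3, blunder_rate=0.0, rng=rng))  # e4
```

With a tiny node limit the engine can "miss" the best move entirely (node_limit=1 here returns Nf3), and raising blunder_rate makes it occasionally play a worse move it did see -- roughly the two failure modes the post describes.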
