
Maia Chess

A human-like neural network chess engine

Capturing human style in chess

Maia’s goal is to play the human move — not necessarily the best move. As a result, Maia has a more human-like style than previous engines, matching moves played by human players in online games over 50% of the time.

[Figure: observer diagram]
During training, Maia is given a position that occurred in a real human game and tries to predict which move was made. After seeing hundreds of millions of positions, Maia accurately captures how people at different levels play chess.

Maia is an AlphaZero/Leela-like deep learning framework that learns from online human games instead of self-play. Maia is trained on millions of games, and tries to predict the human move played in each position seen.

We trained 9 versions of Maia, one for each Elo milestone between 1100 and 1900. Maia 1100 was trained only on games between 1100-rated players, and so on. Each version learned from 12 million human games and captures how chess is typically played at its specific level.
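As a rough illustration of how games could be routed to the nine versions, here is a minimal sketch. It assumes each milestone covers a 100-point rating bin and that both players must fall in the same bin; the page only says "games between 1100-rated players", so the bin width and the function name `training_bucket` are our assumptions, not the project's actual pipeline.

```python
from typing import Optional

# The nine Elo milestones named on this page: 1100, 1200, ..., 1900.
MILESTONES = range(1100, 2000, 100)


def training_bucket(white_elo: int, black_elo: int, width: int = 100) -> Optional[int]:
    """Return the milestone a game belongs to, or None if it fits no bucket.

    Assumption: a game counts for a milestone only when both players'
    ratings round down to that same milestone.
    """
    white_bin = (white_elo // width) * width
    black_bin = (black_elo // width) * width
    if white_bin == black_bin and white_bin in MILESTONES:
        return white_bin
    return None


# Hypothetical ratings, for illustration only.
same_bin = training_bucket(1130, 1180)    # both players in the 1100 bin
mixed_bin = training_bucket(1190, 1210)   # players straddle two bins
```

Under these assumptions a game between a 1190 and a 1210 player is used by no model, which keeps each training set tied to a single skill level.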

We measure “move-matching accuracy”: how often Maia’s predicted move is the same as the human move played in real online games.
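The metric is simple enough to sketch in a few lines. The function name and the sample moves below are hypothetical; the only thing taken from this page is the definition itself, the fraction of positions where the predicted move equals the move the human played.

```python
from typing import Sequence


def move_matching_accuracy(predicted: Sequence[str], played: Sequence[str]) -> float:
    """Fraction of positions where the predicted move equals the human move."""
    if len(predicted) != len(played):
        raise ValueError("predicted and played must have the same length")
    if not predicted:
        raise ValueError("need at least one position")
    matches = sum(p == h for p, h in zip(predicted, played))
    return matches / len(predicted)


# Hypothetical moves in UCI notation for four positions.
predicted = ["e2e4", "g1f3", "b5b6", "e1g1"]
played = ["e2e4", "g1f3", "b5a6", "e1g1"]
accuracy = move_matching_accuracy(predicted, played)  # 3 of 4 match
```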

Because we trained 9 different versions of Maia, each at a targeted skill level, we can begin to algorithmically capture what kinds of mistakes players at specific skill levels make – and when people stop making them.

In this example, the Maias predict that people stop playing the tempting but wrong move b6 at around 1500.

[Figure: example board]
In this position, Maia levels 1100–1400 correctly predict that White will play the tempting but wrong move b6 (the move played in the game). It threatens the Queen, but after …Qxc5 White’s big advantage is mostly gone. Maia levels 1500–1900 predict that, on average, players rated 1500 and above will play the correct bxa6, forcing the Queenside open to decisive effect.
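One way to read off the "people stop playing it at around 1500" point is to scan the per-level predictions for the first level that no longer picks the mistake. The sketch below uses the predictions described for this position (b6 for levels 1100 to 1400, bxa6 for 1500 to 1900); the helper `first_level_avoiding` is a hypothetical name, not part of the Maia code.

```python
from typing import Dict, Optional


def first_level_avoiding(predictions: Dict[int, str], mistake: str) -> Optional[int]:
    """Lowest rating level whose predicted move is not the given mistake."""
    for level in sorted(predictions):
        if predictions[level] != mistake:
            return level
    return None  # every level still predicts the mistake


# Predictions for the example position, as described in the text above.
predictions = {
    level: ("b6" if level < 1500 else "bxa6")
    for level in range(1100, 2000, 100)
}
transition = first_level_avoiding(predictions, "b6")
```

This simple scan assumes the predictions switch once and stay switched; a position where levels alternate between moves would need a more careful definition of the transition point.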

Maia captures human style at targeted skill levels

We tested each Maia on 9 sets of 500,000 positions that arose in real human games, one for each rating level between 1100 and 1900. Every Maia made a prediction for every position, and we measured its resulting move-matching accuracy on each set.

Each Maia captures human style at its targeted skill level. Lower Maias best predict moves played by lower-rated players, whereas higher Maias predict moves made by higher-rated players.
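The evaluation described above amounts to filling in a grid with one row per model and one column per rating level. Here is a minimal sketch of that bookkeeping. The two "models" are toy lookup tables and the test sets hold two made-up positions each, so the numbers are placeholders, not results from our experiments; `evaluate_grid` and `best_level` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

Position = str  # stand-in for a board encoding such as a FEN string
Move = str
Predictor = Callable[[Position], Move]
TestSet = List[Tuple[Position, Move]]  # (position, move the human played)


def evaluate_grid(
    models: Dict[str, Predictor], test_sets: Dict[int, TestSet]
) -> Dict[str, Dict[int, float]]:
    """Move-matching accuracy of every model on every rating level's test set."""
    grid: Dict[str, Dict[int, float]] = {}
    for name, predict in models.items():
        grid[name] = {}
        for level, positions in test_sets.items():
            hits = sum(predict(pos) == move for pos, move in positions)
            grid[name][level] = hits / len(positions)
    return grid


def best_level(row: Dict[int, float]) -> int:
    """Rating level at which a model's accuracy peaks."""
    return max(row, key=row.get)


# Toy stand-ins: each "model" is just a lookup table from position to move.
models = {
    "toy_low": {"pos_a": "b6", "pos_b": "Qh5"}.get,
    "toy_high": {"pos_a": "bxa6", "pos_b": "Nf3"}.get,
}
test_sets = {
    1100: [("pos_a", "b6"), ("pos_b", "Qh5")],
    1900: [("pos_a", "bxa6"), ("pos_b", "Qh5")],
}
grid = evaluate_grid(models, test_sets)
peaks = {name: best_level(row) for name, row in grid.items()}
```

A model that captures a targeted skill level shows up in such a grid as a row whose peak sits at its own rating level, while a flat row indicates a model that does not distinguish between levels.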

As a comparison, we looked at how depth-limited Stockfish does on the same prediction task. We ran various depth limits, ranging from only considering the current board (D01) to letting it search 15 plies ahead (D15). Depth-limited Stockfish is the most popular engine to play against for fun (e.g. the "Play with the Computer" feature on Lichess).

We also compared against a variety of Leela chess models, ranging from the very weak 800-rated version to a 3200-rated version.

Stockfish and Leela models don't predict human moves as well as Maia. Equally importantly, they don't match a targeted skill level: the curves in the graph are relatively flat across a wide range of human skill levels.

Predicting mistakes

Maia is particularly good at predicting human mistakes. The move-matching accuracy of any model increases with the quality of the move, since good moves are easier to predict. But even when players make horrific blunders, Maia correctly predicts the exact blunder they make around 25% of the time. This ability to understand how and when people are likely to make mistakes can make Maia a very useful learning tool.

Personalizing to individual players

[Figure: Transfer Maia]
By targeting a specific player we can get even higher move prediction accuracy compared to Maias targeting just the player's rating.

In current work, we are pushing the modeling of human play to the next level: can we predict the moves a particular human player would make? It turns out that personalizing Maia gives us our biggest performance gains. We achieve these results by fine-tuning Maia: starting with a base Maia, say Maia 1900, we update the model by continuing training on an individual player’s games. This plot shows that personalized Maias achieve up to 65% accuracy at predicting particular players' moves.

The paper for this work is available as a preprint here.

Play Maia and more

You can play against Maia yourself on Lichess! You can play Maia 1100, Maia 1500, and Maia 1900.

Maia is an ongoing research project using chess as a case study for how to design better human-AI interactions. We hope Maia becomes a useful learning tool and is fun to play against. Our research goals include personalizing Maia to individual players, characterizing the kinds of mistakes that are made at each rating level, running Maia on your games and spotting repeated, predictable mistakes, and more.

This is work in progress and we’d love to hear what you think. Please let us know if you have any feedback or questions by email or Twitter.


Read the full research paper on Maia, which was published in the 2020 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2020).

You can read a blog post about Maia from the Computational Social Science Lab or Microsoft Research.

We are going to be releasing beta versions of learning tools, teaching aids, and experiments based on Maia (analyses of your games, personalized puzzles, Turing tests, etc.). If you want to be the first to know, you can sign up for our email list here.

If you want to see some more examples of Maia's predictions we have a tool here to see where the different models disagree.

The code for training Maia can be found on our Github Repo.


As artificial intelligence becomes increasingly intelligent, in some cases achieving superhuman performance, there is growing potential for humans to learn from and collaborate with algorithms. However, the ways in which AI systems approach problems are often different from the ways people do, and thus may be uninterpretable and hard to learn from. A crucial step in bridging this gap between human and artificial intelligence is modeling the granular actions that constitute human behavior, rather than simply matching aggregate human performance. We pursue this goal in a model system with a long history in artificial intelligence: chess. The aggregate performance of a chess player unfolds as they make decisions over the course of a game. The hundreds of millions of games played online by players at every skill level form a rich source of data in which these decisions, and their exact context, are recorded in minute detail. Applying existing chess engines to this data, including an open-source implementation of AlphaZero, we find that they do not predict human moves well. We develop and introduce Maia, a customized version of AlphaZero trained on human chess games, that predicts human moves at a much higher accuracy than existing engines, and can achieve maximum accuracy when predicting decisions made by players at a specific skill level in a tuneable way. For a dual task of predicting whether a human will make a large mistake on the next move, we develop a deep neural network that significantly outperforms competitive baselines. Taken together, our results suggest that there is substantial promise in designing artificial intelligence systems with human collaboration in mind by first accurately modeling granular human decision-making.


All our data is from the wonderful archive at database.lichess.org. We converted the raw PGN data dumps into CSV, and have made the CSV we used for testing available at csslab.cs.toronto.edu/datasets.
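To give a feel for what a PGN-to-CSV conversion involves, here is a minimal standard-library sketch that writes one row per game. It is not the converter we used: the column choice, the function names, and the sample games are all assumptions for illustration, and a real pipeline would use a full PGN parser and likely emit one row per position rather than per game.

```python
import csv
import io
import re
from typing import Dict, Iterable, Iterator, List

# Matches PGN tag pairs such as: [WhiteElo "1512"]
TAG_RE = re.compile(r'^\[(\w+) "(.*)"\]$')
# Assumed column layout; the published CSV schema may differ.
COLUMNS = ["White", "Black", "WhiteElo", "BlackElo", "Result", "Moves"]


def iter_games(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per game: its header tags plus the raw movetext."""
    tags: Dict[str, str] = {}
    movetext: List[str] = []
    for raw in lines:
        line = raw.strip()
        match = TAG_RE.match(line)
        if match:
            if movetext:  # a tag line after movetext starts a new game
                yield {**tags, "Moves": " ".join(movetext)}
                tags, movetext = {}, []
            tags[match.group(1)] = match.group(2)
        elif line:
            movetext.append(line)
    if tags or movetext:
        yield {**tags, "Moves": " ".join(movetext)}


def pgn_to_csv(pgn_text: str) -> str:
    """Convert PGN text to CSV text with the assumed columns."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=COLUMNS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for game in iter_games(pgn_text.splitlines()):
        writer.writerow(game)
    return out.getvalue()


# Two made-up games in PGN form.
SAMPLE = '''[White "alice"]
[Black "bob"]
[WhiteElo "1512"]
[BlackElo "1498"]
[Result "1-0"]

1. e4 e5 2. Nf3 Nc6 1-0

[White "carol"]
[Black "dave"]
[WhiteElo "1105"]
[BlackElo "1130"]
[Result "0-1"]

1. d4 d5 0-1
'''
csv_text = pgn_to_csv(SAMPLE)
```

Tags outside the assumed column list are ignored, and a game missing one of the listed tags gets an empty cell for it.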


Reid McIlroy-Young

University of Toronto, Computer Science

Ashton Anderson

University of Toronto, Computer Science

Siddhartha Sen

Microsoft Research

Jon Kleinberg

Cornell University, Computer Science

Russell Wang

University of Toronto, Computer Science


Many thanks to Lichess.org for providing the human games that we trained on and hosting our Maia models that you can play against. Ashton Anderson was supported in part by an NSERC grant, a Microsoft Research gift, and a CFI grant. Jon Kleinberg was supported in part by a Simons Investigator Award, a Vannevar Bush Faculty Fellowship, a MURI grant, and a MacArthur Foundation grant.