I think Avalon is the best hidden identity game ever created, so much so that I ran an AI tournament for it, and wrote up some thoughts on the state of AI research around it given a recent academic paper.
With the Merlin/Assassin variant in Avalon, both teams have something to hide and something to figure out; even if your identity is known, you can still be working to solve the puzzle up until the last minute. Merlin knows the Spies and is trying to help the other Resistance players identify them, but if the Assassin can guess which player is Merlin at the end, the Spies still win.
Mark is the Assassin, trying to guess at the end of the game which player is Merlin, the character who knew all the Spies. Mark mentions to the other Spies that Henry went out of his way to avoid having him on a team, and a battle of wits begins. Henry blurts out: "Oh, actually the reason I did that was because... oh shoot, I shouldn't talk to help the Spies." This is, of course, a classic Merlin quadruple-bluff gambit.
Bluff Level One: Henry could have genuinely forgotten he shouldn't talk in the endgame when it can only help the Spies guess who Merlin is, and accidentally made an admission that only a non-Merlin player would make, so Mark should conclude Henry is not Merlin.
Bluff Level Two: Henry correctly deduced that Mark would not believe for a minute that Henry would make a mistake like Level One, therefore Henry must have made the statement on purpose. Because the naïve interpretation of the statement makes it less likely Henry is Merlin, he must actually be Merlin trying to pretend to have different reasons than his actual reasons, so Mark should guess Henry as Merlin.
Bluff Level Three: Henry correctly deduces that Mark would not believe that Henry would make a mistake like Level One, and also deduces that Mark would correctly deduce that Henry would expect him to see the action as deliberate. This means both parties are aware that their counterpart knows that Henry planted the statement as a trap, meaning we cannot be in Level Two, where one party hoped the other was unaware of the trap being set. Once you know a trap was planted with the expectation of being spotted, the obvious conclusion is that Henry leaked the statement on purpose to look like a Merlin trying to avoid suspicion, so Mark should not guess Henry as Merlin.
Bluff Level Sixteen: Mark starts with the initial conditions of the universe and quickly constructs a high-fidelity simulation leading up to the present moment and deduces, perhaps incorrectly, that Henry was self-evidently planting a fifteen-level-deep bluff to convince Mark that he was not Merlin, and that the winning play is to go sixteen levels deep and see through the bluff, so Mark correctly assassinates Henry for the win.
The complexity of the hidden information characters are trying to uncover and share can also be ratcheted up, as seen in the following diagram (this diagram skips the Morgana Spy role, which gives Percival another ambiguous option for who Merlin could be):
The game can also be played with the base Resistance set with no special roles other than the Spies, with Plot cards that give information to team leaders at different points of the game, and with extra modules like the Hunter module, but in my experience the Avalon variant is the most balanced and strategically deep of them.
Most of this piece will be spent on what we can learn about the game's fundamentals from AI simulations, an approach that is more objective but also limited in what it can say definitively.
Their strongest AI, DeepRole, evolves strategies through counterfactual regret minimization. It was tested against bots that play randomly, a LogicBot programmed to avoid proposing teams as Resistance that must contain a Spy, and bots that derived a strategy from Monte Carlo Tree Search. DeepRole proved superior to all of them.
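For intuition, counterfactual regret minimization builds on regret matching: each action is played with probability proportional to how much you regret not having played it in the past. Here is a minimal regret-matching sketch on rock-paper-scissors; DeepRole's actual algorithm is far more elaborate, and all the names here are my own illustration, not from the paper.

```python
import random

random.seed(0)

ACTIONS = 3  # rock, paper, scissors
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # payoff for row action vs column action

def strategy_from_regrets(regrets):
    # Play each action with probability proportional to its positive regret.
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    if total == 0:
        return [1.0 / ACTIONS] * ACTIONS  # uniform when nothing is regretted yet
    return [p / total for p in positives]

def train(iterations, opponent=(0.4, 0.3, 0.3)):
    regrets = [0.0] * ACTIONS
    strategy_sum = [0.0] * ACTIONS
    for _ in range(iterations):
        strategy = strategy_from_regrets(regrets)
        for a in range(ACTIONS):
            strategy_sum[a] += strategy[a]
        my_action = random.choices(range(ACTIONS), weights=strategy)[0]
        opp_action = random.choices(range(ACTIONS), weights=opponent)[0]
        # Accumulate regret: how much better each alternative would have scored.
        for a in range(ACTIONS):
            regrets[a] += PAYOFF[a][opp_action] - PAYOFF[my_action][opp_action]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]  # the average strategy is what converges

avg = train(100_000)  # drifts heavily toward paper against this rock-heavy opponent
```

Against a fixed opponent the average strategy converges toward a best response; run in self-play, the same update converges toward a Nash equilibrium, which is the property CFR scales up to imperfect-information games like Avalon.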
Rejecting the fifth proposal. Rejecting the fifth proposal is a game-losing condition for Resistance. In one of their sample games, the Resistance team loses as a result of DeepRole Resistance bots voting this way.1
Rejecting teams the entire table knows are good. This can happen in situations where, if any Spies had been on the team, the game would already be over, since those Spies could have failed a third mission.
Taking actions as a Spy that are never optimal for Spies. In one game, the AI plays a success as the only Spy on what ends up being the third successful mission, forcing the game to a Merlin guess.
It sometimes makes sense to bluff a success on early missions or when paired with another Spy, but there is zero benefit to bluffing over failing here, as the game is over the moment the bluff could have any impact.
The ProAvalon website describes some of these AI behaviors as being bannable offenses if done by a human player, as they are considered to be game-throwing. At the very least, I would predict that an AI which did not make visibly poor Resistance plays would fare better alongside human players than one that did, holding all else equal, and that allowing a third mission to succeed as a lone Spy should fall out of a dominant strategy long-term.
Given the existence of a shared public communication channel, a Resistance player could suggest that all players make use of a public/private key encryption algorithm, generate a public/private key pair for themselves, and announce their public keys over the shared public channel.
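To make the shape of that concrete, here is a toy textbook-RSA round trip with deliberately tiny primes: trivially breakable and purely illustrative, where real play would use an actual cryptographic library. A player announces (e, n) over the public channel and keeps d secret.

```python
# Toy textbook RSA; the primes are absurdly small, so this is illustrative only.
p, q = 61, 53
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)
e = 17                       # public exponent, announced to the table
d = pow(e, -1, phi)          # private exponent: e * d = 1 (mod phi), kept secret

def encrypt(message, public_key):
    exponent, modulus = public_key
    return pow(message, exponent, modulus)

def decrypt(ciphertext, private_key):
    exponent, modulus = private_key
    return pow(ciphertext, exponent, modulus)

# Another Resistance player can now send a claim (encoded as a small number)
# over the fully public channel that only the key's owner can decrypt.
claim = 42
assert decrypt(encrypt(claim, (e, n)), (d, n)) == claim
```

The modular-inverse form of `pow` used for `d` requires Python 3.8 or later.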
Using full-blown cryptography to break the game is obviously a degenerate example, but I think there is a conceivable Nash Equilibrium where the Resistance team is aware of this strategy, exploits it ruthlessly, and no one has an incentive to switch off it as it always benefits the majority.
This is optimal against naïve Spies that always fail, but questionable in other contexts. My intuition is that the benefits of changing successful teams (outside of the two-fail mission) are usually outweighed by the risks of switching off a good team. There may be outlier scenarios, like Percival scrapping, by executive fiat, a team that has to contain both Merlin and Morgana.
Despite avoiding some of the unusual failure modes of DeepRole, the RandomPlayer bot clearly underperforms DeepRole in its ability to coordinate a Resistance team or exploit information in the 5 player Avalon variant.
This also demonstrates that the game is most favorable to Resistance at 5/7/9 players, where mission sizes and the ratio of Spies to Resistance are most favorable. Our group tends to add additional mechanics like Lady of the Lake or the Trapper module at 8 or 10 players, to nudge the balance back towards the Resistance team.
Also for reference, for a while I recorded data on games I played in real life with different modules and game sizes, in a group with a fairly stable set of players. None of these variants were covered by either simulation discussed in this article.
10 players having the highest win rate is counter-intuitive, and may be due to self-selecting variants for balance; a 10 player game is strictly harder for Resistance than the 9 player variant due to the presence of an additional Spy.
Ordinarily combines the strategies of AggressiveMerlin and MerlinDetector. If it is currently the most accurate player at opposing the Spies, it makes an incorrect decoy vote or proposal so that someone else ends up as the most likely Merlin target at the end of the game.
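A sketch of how that decoy logic might look; the names and structure are my own illustration, not the actual tournament code, and it assumes each player carries a running accuracy score for how well their votes have opposed the Spies.

```python
def choose_vote(self_id, accuracy, honest_vote):
    """Vote honestly unless we are currently the top Merlin suspect."""
    top_suspect = max(accuracy, key=accuracy.get)
    if top_suspect == self_id:
        return not honest_vote  # decoy: throw one vote to shed suspicion
    return honest_vote

accuracy = {"A": 0.9, "B": 0.6, "C": 0.4}
choose_vote("A", accuracy, True)  # A is the top suspect, so A decoys: False
choose_vote("B", accuracy, True)  # B is safe and votes honestly: True
```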
I coded all of the above implementations for this analysis, drawing insights from other strategies, but this bot is taken unmodified from our AI tournament champion and fellow Avalon enthusiast Arman Erfanar, who also reviewed this post.
The interaction between some of these bots, like MerlinDetector and AggressiveMerlin, is understated by these overall win rates, which average out impacts on both teams because team composition is mixed between the bots. When a single bot type controls an entire team, the differences can be quite large. The table below shows Resistance win rates for the bot in the row when playing against the bot in the column as Spies, in games fully partitioned by team.
AggressiveMerlin players bump the Resistance win rate up to 43.1% against SelfPromotingPlayer Spies. A single player always voting and proposing teams correctly as Merlin causes a big swing in win rates over the course of the game, even when no other Resistance player is paying attention to who might be Merlin and is just leaning into successful teams.
When AggressiveMerlin players are pitted against MerlinDetectorPlayers as Spies, AggressiveMerlin players only win as Resistance 4.2% of the time due to the Spies correctly guessing Merlin. AggressiveMerlin bots playing Merlin are essentially annihilated by an Assassin bot that simply tracks the most accurate Merlin candidate.
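The detection side needs very little machinery. Here is a minimal version of that Assassin heuristic with hypothetical data structures: score each candidate by how often their votes were "correct" (rejecting teams that contained a Spy, approving clean ones), then shoot the top scorer.

```python
def pick_merlin(vote_history, team_had_spy, candidates):
    # vote_history: one {player: approved?} dict per proposal
    # team_had_spy: one bool per proposal, known to the Spies after the fact
    scores = {c: 0 for c in candidates}
    for round_votes, had_spy in zip(vote_history, team_had_spy):
        for player, approved in round_votes.items():
            if player in scores and approved != had_spy:
                scores[player] += 1  # rejected a spy team, or approved a clean one
    return max(scores, key=scores.get)

votes = [{"A": False, "B": True}, {"A": True, "B": True}]
pick_merlin(votes, [True, False], ["A", "B"])  # A voted correctly twice, B once: "A"
```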
The ArmanBot annihilates every other bot type when playing as Spies solely because it is an AggressiveSpy player that pushes teams with Spies; there would be a similar detection arms race with bots trying to spot this behavior (which the ArmanBot itself does). Its Resistance win rates are more complicated and are discussed later.
An important caveat: the bots above are mostly proof-of-concept implementations, and I think the algorithms are very exploitable (if you know Merlin is trying to hide, do you just always shoot the second most likely player, or try to detect votes made to duck detection?). Still, I would suspect that optimal play approaches something similar as a mixed strategy, with Merlin in an arms race against the Assassin to hide their actions, and the Assassin trying to guess who is acting within the threshold of random noise.
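One hedged guess at what that mixed strategy looks like on the Assassin's side: rather than deterministically shooting the top candidate, which Merlin can dodge by deliberately ranking second, sample the target from a softmax over suspicion scores so that every sufficiently suspicious player carries real risk. This is my own sketch, not any bot from the tournament.

```python
import math
import random

def sample_target(scores, temperature=1.0, rng=random):
    # Higher temperature flattens the distribution, spreading the threat wider.
    names = list(scores)
    weights = [math.exp(scores[name] / temperature) for name in names]
    return rng.choices(names, weights=weights)[0]

# With a large score gap the top suspect is still almost always shot,
# but a Merlin hiding just below the leader is no longer perfectly safe.
```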