In the previous post, we looked at a heuristic-based general game player, which worked for single player and two player zero-sum games. There are a few problems with this approach though: We need to come up with a good heuristic for the game at hand but more importantly, heuristics exploit local properties of states (properties that do not depend on the game tree as a whole) and for many games, there may not be any correlation between these local properties and global success. It would be better if we didn’t have to come up with a heuristic; if the heuristic was somehow built into the search algorithm itself i.e., the algorithm were itself capable of ordering the moves, and the ordering depended on the whole game tree instead of local state properties.
In this post, we discuss Monte Carlo Search (MCS), the simplest algorithm in the class of probabilistic search algorithms. The basic idea behind MCS is quite simple: To estimate the value of a non-terminal state, we make some probes (by selecting random moves for all players) from that state till the end of the game. The estimated utility for that state is then taken as the average reward across all probes. These estimated utilities can then be used for comparing states and selecting actions.
Typically, MCS is used as an evaluation function for heuristic search i.e., you explore the tree until some fixed depth is reached and then estimate the value of non-terminal states by making a fixed number of random probes from each non-terminal to a terminal state and computing the average rewards.
Assuming the same game interface that we used in the previous post:
Let’s see how MCS can be implemented:
Let’s take a moment to appreciate what this algorithm does. It works for all games (true general game playing) which can implement the given interface and it doesn’t require us to supply a heuristic function i.e., it just works out of the box. Effectively it chooses the move which has the highest chance of leading to a win, if everyone was playing randomly.
However, that comes with its own problems. A major problem with MCS is that it assumes that the other players are playing randomly, which is obviously a false assumption. Another problem is that it does not consider the structure of the game tree while making random probes. However, even with these problems, MCS is a very powerful technique and provides significant improvements in gameplay. It can even be combined with other heuristics to get the best of both worlds.
In the next post, we’ll discuss a more powerful variation of MCS, called Monte Carlo Tree Search (MCTS), which addresses some of its problems.