It’s been a long time since my last post. Lets hope this one goes alright. Alpha Beta Search is typically used for two-player competitive (fixed sum) games and is a variant of naive Minimax search. It can save almost 99.9% of the work while still computing the same move as found by the minimax search. Lets look at how it works.
There are two players, lets call them max (maximizing player: you) and min (minimizing player: your opponent). Instead of using separate position evaluation scores for both players, it is quite common to use a single score with greater values being advantageous for max player and lower values being advantages for min player. The max player aims to maximize this score while the min player tries to minimize it. Since in a two-player game, each of your decision must depend on all the choices your opponent can make, the players take a pessimistic approach, assuming the worst from their opponent and deciding accordingly. The analysis of game tree proceeds in levels alternating between play for max player and min player.
Alpha Beta Search attempts to decrease the number of nodes that it must examine by stopping to evaluate a move as soon as at least one possibility has been found that proves the move to be worse than a previously examined move. It does so by maintaining two dynamically computed bounds: alpha and beta.
- Alpha is the minimum value that the max player is guaranteed (regardless of what the min player does) through another path through the game tree. This value is used to perform cutoffs (pruning) at the minimizing levels. When the min player has discovered that the score of a min node would necessarily be less than alpha, it need not evaluate any more choices from that node because the max player already has a better move (the one which has value alpha).
- Beta is the maximum value that the min player is guaranteed and is used to perform cutoffs at the maximizing levels. When the max player has discovered that the score of a max node would necessarily be greater than beta, it can stop evaluating any more choices from that node because the min player would not allow it to take this path since the min player already has a path that guarantees a value of beta.
Alpha Beta Search is typically implemented recursively as follows:
This algorithm works for any two-player zero-sum game. However, game specific knowledge has to be supplied in the form of functions for next move generation, checking terminal states, computing heuristic values for non-terminal states etc.
Improvements and Optimizations:
If almost all computer players use the same algorithm, then how come some of them are much better than the others? Clearly, there must be scope for improvements:
- Improving the heuristic (or evaluation) function: A better heuristic function means a better player. A heuristic function is the single most important part of a computer player, which distinguishes a good player from a bad one. It allows to incorporate game-specific knowledge into the player for guiding the search process. For all but the most trivial games, it is impossible to search the entire game tree starting from the initial state. A heuristic function provides a proxy for the value of a game node (the value which that node would have if the entire subtree rooted at that node could be evaluated). Heuristic functions should be fast to compute yet accurate measures of the actual values, involving a tradeoff between the two.
- Improving the game state representation: Game state representation provides the second major scope of improvement. A game state should contain all the information required to test terminality, generate successor states and compute heuristic value (without having to traverse the path from this state to the root). A bitboard representation can provide significant speedup, while reducing memory footprint, over a more natural representation.
- Improving utilization of the available time: In typical competition settings, computer players are given a fixed amount of time for each move. Choosing the appropriate depth for which the implementation will never timeout can be a headache and since we are only considering the worst possible scenario for deciding this depth cutoff, we would be wasting a lot of precious available time for the majority of other moves (under-utilization). In order to utilize all the available time and never timeout, we can use a technique called iterative deepening, wherein we call the same move selection procedure for increasing depths starting from depth=1 until time remains. This looks like a wastage because we are searching the same nodes over and over again for different depths. The key insight is that majority of the work is done at the last level and iterative deepening does only twice as much work for the same largest depth. Iterative deepening provides another benefit: the value of each node traversed can be stored and used as a heuristic during the next higher depth traversal. This value, called history heuristic, is likely to be more accurate than the heuristic evaluation of a node alone. Another important improvement is that we don’t necessarily have to start the iterative deepening procedure at depth=1. We can start it at any depth value for which the implementation is guaranteed never to timeout.
- Move ordering: The order in which alpha beta search considers moves decides the pruning efficiency and ultimately the performance of the player. If better moves are considered first, moves which are worse off can be quickly pruned away. Instead of using a fixed move ordering, we can order the moves based on the (decreasing) heuristic values of nodes resulting from them. This is again a tradeoff between spending more time in heuristic evaluation and saving more time by pruning. To get the best of both worlds, we should use a different heuristic function which is fast to evaluate but not necessarily as accurate. Values of nodes computed during lower-depth traversals in iterative deepening are typically used as ordering heuristics. If no such heuristic is available, it’s still a good idea to use a random move order i.e., randomize the order of next moves.
- Searching for Quiescence: Better computer players do not use a fixed depth cutoff but a variable cutoff depending upon how quiescent and beneficial a particular move sequence is. Quiescence means inactivity or quietness i.e., not much change is going on the heuristic values of successive states. It is desirable to reach quiescent positions before applying the depth cutoff in order to prevent horizon effects (being unable to see a bad move just because it lies just ahead of the depth cutoff). For example, it is unwise to cause a depth cutoff in the middle of a capturing sequence in chess. The player may think that it is winning a queen but if it could only look 2 moves ahead, it would have seen that it loses a queen and a rook in return.
Here’s a great video on the step by step working of Alpha Beta Pruning.
Some examples of good heuristic functions:
- Heuristic Function for Othello
- Heuristic Function for Tic Tac Toe
- Heuristic Function for Nine Mens’ Morris
Build players for different interesting games and compete against other computer players on HackerRank.