 So far we’ve talked about uninformed and informed search algorithms. That should be it, right? Both of them combined should cover the space of all search algorithms, no? Well, unfortunately not. We’ve made some strong assumptions in everything discussed so far. We had assumed that search problems had following properties:

• Single agent: There is only one agent (player).
• Deterministic actions: There is no randomness i.e., an action in a given state has the same effect every time.
• Fully observed state: The agent has perfect information about the world.
• Discrete state space: States evolve only in response to the actions taken by the agent and not as a function of time.

Let’s relax one of these constraints and allow multiple agents instead of only one. A major class of multiple-agent search problems is the space of fixed-sum games, which are necessarily adversarial (since the total utility is fixed, the players must compete with each-other to maximize their utilities). An important special case of adversarial games is two-player, zero-sum, deterministic, perfect-information games, e.g., chess, checkers, tic-tac-toe etc.

We can formally define a deterministic, perfect-information game as follows:

• A set of states: S (with start state s0)
• A set of players: P = {1, 2, …, N}, who usually take turns
• A set of actions: A (may depend on player/state)
• A transition function: S x A → S
• A terminal test: S → {t, f}
• A terminal utility function: S x P → R
• Solution for a player is a policy: S → A

This is very similar to the formulation for single-agent search problems, with the following differences:

• We now have a set of players, who will take turns to make moves, and the set of actions may depend on the player. This can be simplified by having an extra action “no-op” and requiring all players to make a move at each step. Every player, except the one with the turn, must return “no-op”. An action will then be an N-tuple, specifying the move chosen by each player. This formulation has the added benefit of supporting simultaneous move games like rock-paper-scissors, prisoners’ dilemma etc.
• The successor function is now called the transition function and the goal test has been called the terminal test.
• There is a terminal utility function, which accepts a state and a player and returns utility (reward) for that player if the state is terminal, and 0 otherwise.
• Each player has to come up with a policy of what action to take in a given state.

A zero-sum game will have two players with opposite utilities. We can think of the utility as being a single value, which one will try to maximize and the other minimize.

Let’s define the value of a state as the best achievable outcome (utility) from that state. For terminal states, the value is part of the problem description. For a non-terminal state s, V(s) = max {s’ ∈ successor(s): V(s’)}, i.e., the maximum value of any successor state. In adversarial search problems, it’s not just in our hands to achieve this value because our opponent would not allow it. What we can achieve, however, is called the minimax value, which is the best achievable outcome (utility) assuming that our adversary plays optimally (i.e., always chooses the action which minimizes our utility).

In the above game tree, assume that you are the green player and your opponent is the red player. The value of the root state for you is +5 but you can’t achieve it because if you take the right action, your opponent will take the left action, leaving you with a utility of -7. The minimax value of this state for you is -5 and you can achieve it by taking the left action.

This gives rise to the following recursive definition of minimax value:

• For terminal states, V(s) is known.
• For states under agent’s control, V(s) = max {s’ ∈ successor(s): V(s’)}
• For states under opponent’s control, V(s) = min {s’ ∈ successor(s): V(s’)}

Minimax Search

[gist https://gist.github.com/kartikkukreja/99417a4262433e456697 /]

It’s easy to prove that this algorithm computes the minimax value. However, it’s infeasible for anything except toy problems. If the game has reversible moves, the corresponding search graph will have loops and the search tree will be infinite. So this algorithm is not complete. Typically, we limit the search depth to a certain number of moves and use an evaluation function to evaluate non-terminal states. We can also use iterative deepening for an any-time algorithm.

Properties of minimax search:

• Completeness: No.
• Optimality: Against an optimal player, yes. Fixed depth version, however, is not optimal.
• Time complexity: O(bm), where b is the branching factor and m is the maximum depth or depth limit.
• Space complexity: O(bm).

Alpha Beta Search

[gist https://gist.github.com/kartikkukreja/d61f9f4c571157604dee /]

This algorithm prunes the search tree by not exploring nodes for which a better alternative has already been observed. This pruning has no effect on the minimax value computed for the root. The value of intermediate nodes, however, might be wrong. In particular, children of the root may have the wrong value so a naive implementation won’t let us do action selection. Typically, we solve this issue by running alpha beta search for each successor of the root and then choose the action that leads to the successor with the maximum value.

Similar to minimax search, alpha beta search, as described above, works only for very small problems that don’t contain loops. Typically, we limit the search depth to a certain number of moves and use an evaluation function to evaluate non-terminal states. We can also use iterative deepening for an any-time algorithm.

Properties of alpha beta search:

• Completeness: No.
• Optimality: Against an optimal player, yes. Fixed depth version, however, is not optimal.
• Time complexity: With perfect child ordering, O(bm/2), where b is the branching factor and m is the maximum depth or depth limit. Child ordering has a huge impact on the effectiveness of pruning.
• Space complexity: O(bm).

I’ve written a more detailed post on alpha beta search and its various optimizations.

Here’s a Java implementation of iterative-deepening alpha beta search that I wrote for my general game player.

Evaluation Functions

Evaluation functions score non-terminal states in depth-limited search. They are often also called heuristic functions but that can be confused with the heuristic functions used in single-agent search problems. Heuristic functions used in single-agent search have a different meaning; they return an estimate of the distance of a given state from a goal. Evaluation functions, on the other hand, return an estimate of the minimax value of a state. Typically, they are implemented as a weighted linear sum of some features of the state.

Some examples of evaluation functions:

In the future posts, we’ll relax more constraints on the type of search problems we can solve and start to look at general games.

References: