In the last two posts, we talked about adversarial search and built a bot for checkers. It’s now time to relax yet another constraint on the type of search problems we are able to solve, in particular, the determinism of actions i.e., we’ll no longer require that an action from a given state leads to the same state each time and will allow some randomness/uncertainty in the states that can be reached from a given state by taking a given action.
You may be wondering why we wouldn’t know the next state, given a state and an action, and what could we possibly accomplish by introducing randomness in the choice of next state. Well, this may happen for the following reasons:
- Perhaps the search problem (e.g., a game of backgammon) has an element of chance, like a coin toss or a dice throw. We’d be overly pessimistic to model this chance node as an adversary (e.g., in minimax/alpha-beta).
- Actions are rarely deterministic in the real world and always have some inherent uncertainty in the outcome e.g., a robot arm, when given the command “move left”, may only move left 99% of the time and fail to move the rest 1%. We’ll be able to model the real world much better if we incorporate these uncertainties in our model.
We’ll model this uncertainty as a chance node (as opposed to min/max nodes) and think of it as computing the expected (weighted average) value (as opposed to the min/max value).
In the above search tree, we can take two actions: left and right, both leading to a chance node. If we assume that successors of the chance nodes are equiprobable, then the action “right” leads to greater utility (0.5 * 1 + 0.5 * 10 = 5.5) than the action “left” (0.5 * 5 + 0.5 * 5 = 5). If we had used min nodes instead of chance nodes, we would’ve selected the action “left” instead of “right” and missed out on the greater expected utility.
Expectimax (or Expectiminimax) Search
We can use max, min and chance nodes to model the player, the adversary and the environment/uncertainty respectively, to give rise to what is called the expectiminimax tree.
This looks very similar to minimax and unsurprisingly has the same properties:
- Completeness: No.
- Optimality: Against an optimal player, yes. Fixed depth version, however, is not optimal.
- Time complexity: O(bm), where b is the branching factor and m is the maximum depth or depth limit.
- Space complexity: O(bm).
Problems with Expectimax search:
- Pruning does not work. All successors of a chance node must be evaluated.
The unevaluated node could be < 10, in which case we didn’t need to evaluate it, or it could be >= 10 and should have been evaluated but we wouldn’t know which case is it unless we evaluate it. Hence, we can’t prune away nodes easily.
- In practice, we have to use a depth-limited version of Expectimax with an evaluation function. The evaluation function, however, has special requirements for Expectimax to work. With minimax, the scale of values didn’t matter (as long as the nodes maintain the same ordering of utility). We could, for example, add a constant value to the utility of each node or square the utility of each node and minimax would still work. Expectimax, however, requires the values to be meaningful.
In the search tree on the left, the optimal action is left (expected utility 6 instead of 5.5) but if we square the utilities, the optimal action becomes right (with expected utility 50.5 instead of 37). So the evaluation function must closely model the terminal utilities if Expectimax is to work.