In the previous post, we talked about general game playing. Now let’s build a simple general game player, using the algorithms that we’ve already seen (Iterative deepening DFS and Iterative Deepening Alpha Beta Search).
If the games are small enough, we could solve them completely during the start clock and just play the cached moves during the play clock. However, not all games will be small enough to be completely solvable. We’ll have to terminate the search after some search depth or after a certain amount of time has passed. If we are searching up to some depth cutoff, we’ll need a suitable heuristic function to evaluate the non-terminal nodes and choose the best move. A common strategy is to use the start clock to come up with a heuristic function and then use that heuristic function, combined with an iterative deepening algorithm, to choose the next move, in each play clock.
There are several approaches to finding a suitable heuristic function:
- Some heuristics (like action mobility – a measure of the number of actions available in a given state) are applicable across a majority of games. We could use a linear combination of such heuristics, with weights selected by playing multiple games with different random combinations during start clock.
- Discovering heuristics with propositional nets
- Discovering heuristics with Answer set programming
Let’s use the simplest of these approaches (a linear combination of general heuristics) in this post. Some of the general heuristics are:
- Action Mobility
- Mobility is a measure of the number of things a player can do. Action mobility is a measure of the number of actions available in a given state or n steps away from the given state, and can be calculated as (ActionsCount * 100) / MAX_POSSIBLE_ACTIONS.
- State Mobility
- It’s a measure of the number of unique states that can be reached from a given state in n steps, and can be calculated as (UniqueStatesCount * 100) / MAX_POSSIBLE_STATES.
- Action Focus
- Focus is a measure of the narrowness of the search space. It is the inverse of mobility. Sometimes it’s good to focus to cut down on search space. Action focus can be calculated as (100 – ActionMobility).
- State Focus
- State Focus can be calculated as (100 – StateMobility).
- Goal Value
- Goal values are part of the game description. We’ll only get goal values for terminal states. For non-terminals, the goal value would be 0.
We can build a linear combination of these 5 heuristics by playing multiple matches with randomly selected weights during the start clock and then choosing the weights that yield the maximum utility. We can then use this composite heuristic to evaluate nodes during actual play.
Let’s assume the following simplified interface for a game:
We can now implement a linear combination heuristic as follows:
The weights can be found out by playing multiple matches with different random weight combinations during the start clock and choosing the combination that performs the best. Here’s a Java implementation for this that I wrote for my general game player.
We can now plug-in this heuristic function in iterative deepening DFS (for single player games) and iterative deepening Alpha Beta search (for two-player zero-sum games). Our player can now play, although somewhat poorly, two large classes of games. In the next posts, we’ll learn about some algorithms that perform much better, don’t require any heuristics and are able to guide the search on their own.