A simple heuristic-based General Game Player

In the previous post, we talked about general game playing. Now let’s build a simple general game player, using the algorithms that we’ve already seen (Iterative deepening DFS and Iterative Deepening Alpha Beta Search).

If the games are small enough, we could solve them completely during the start clock and just play the cached moves during the play clock. However, not all games will be small enough to be completely solvable. We’ll have to terminate the search after some search depth or after a certain amount of time has passed. If we are searching up to some depth cutoff, we’ll need a suitable heuristic function to evaluate the non-terminal nodes and choose the best move. A common strategy is to use the start clock to come up with a heuristic function and then use that heuristic function, combined with an iterative deepening algorithm, to choose the next move, in each play clock.

There are several approaches to finding a suitable heuristic function:

Let’s use the simplest of these approaches (a linear combination of general heuristics) in this post. Some of the general heuristics are:

Action Mobility
Mobility is a measure of the number of things a player can do. Action mobility is a measure of the number of actions available in a given state or n steps away from the given state, and can be calculated as (ActionsCount * 100) / MAX_POSSIBLE_ACTIONS.
State Mobility
It’s a measure of the number of unique states that can be reached from a given state in n steps, and can be calculated as (UniqueStatesCount * 100) / MAX_POSSIBLE_STATES.
Action Focus
Focus is a measure of the narrowness of the search space. It is the inverse of mobility. Sometimes it’s good to focus to cut down on search space. Action focus can be calculated as (100 – ActionMobility).
State Focus
State Focus can be calculated as (100 – StateMobility).
Goal Value
Goal values are part of the game description. We’ll only get goal values for terminal states. For non-terminals, the goal value would be 0.

We can build a linear combination of these 5 heuristics by playing multiple matches with randomly selected weights during the start clock and then choosing the weights that yield the maximum utility. We can then use this composite heuristic to evaluate nodes during actual play.

Let’s assume the following simplified interface for a game:

We can now implement a linear combination heuristic as follows:

The weights can be found out by playing multiple matches with different random weight combinations during the start clock and choosing the combination that performs the best. Here’s a Java implementation for this that I wrote for my general game player.

We can now plug-in this heuristic function in iterative deepening DFS (for single player games) and iterative deepening Alpha Beta search (for two-player zero-sum games). Our player can now play, although somewhat poorly, two large classes of games. In the next posts, we’ll learn about some algorithms that perform much better, don’t require any heuristics and are able to guide the search on their own.


One thought on “A simple heuristic-based General Game Player

  1. Pingback: GGP: Monte Carlo Search | Everything Under The Sun

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s