Early Access

3.4 Extensions to the core model

So far we have seen how to construct a probabilistic model of games played between pairs of players in which each game results in a win for one of the players. However, in order to build an industrial-strength system for Xbox Live, applicable to a wide variety of games, we need to extend our model to deal with a number of additional complexities, as discussed in the introduction to this chapter. In particular, real games can end in draws, they can involve more than two players, and they can be played between teams of people. We will now show how our simple model can easily be extended to take account of these complexities. This flexibility nicely illustrates the power of a model-based approach to machine learning.

The original Xbox online gaming service used the Elo algorithm to estimate the skills of players. However, in spite of its popularity, the Elo system suffers from some significant limitations. In particular:

  • Elo does not handle draws
  • Elo does not handle team games
  • Elo does not handle games with more than two players

Various modifications to the basic Elo system, mostly heuristic in nature, have been proposed to deal with the issues of draws, team games, and multiple players. The fundamental problem is that Elo is an algorithm rather than a model, and so it is difficult to see what assumptions are being made and therefore difficult to extend the algorithm to new situations. We therefore turn to extensions of TrueSkill, to see how a model-based approach allows such extensions to be incorporated in a transparent way, giving rise to a solution which can handle all of the above complexities.

What if a game can end in a draw?

In our current model, the player with the higher performance value on a particular game is the winner of that game. For games which can also end in a draw, we can modify this assumption by introducing the concept of a draw margin, such that a player is the winner only if their performance exceeds that of the other player by at least the value of the draw margin. Mathematically this can be expressed as

 \mbox{if} & \var{Jperf} > \var{Fperf} + \var{drawMargin} & \mbox{Jill wins} \nonumber \\ \mbox{else if} & \var{Fperf} > \var{Jperf} + \var{drawMargin} & \mbox{Fred wins} \nonumber  \\ \mbox{else} & & \mbox{game drawn} \label{eq:draw-factor}

This is illustrated in Figure 3.32.

Figure 3.32Schematic illustration of the regions in performance space where Jill is the winner, where Fred is the winner, and where the game ends in a draw.

We have therefore modified Assumption 3 to read:

  1. The player with the higher performance value wins the game, unless the difference between their performance and that of their opponent is less than the draw margin, in which case the game is drawn.

The value of the draw margin represents a new parameter in our model, and we may not know the appropriate value. This is particularly true if we introduce a new type of game, or if we modify the rules for an existing game, where we have yet to see any game results. To solve this problem we simply treat the draw margin as a random variable drawMargin whose value is to be learned from data. Because drawMargin is a continuous variable, it is chosen to be a Gaussian. This can be expressed as a factor graph, as shown in Figure 3.33.

observed int variable
Whether Jill wins, Fred wins, or the game is a draw
outcome
double variable
Jill's skill
Jskill
double variable
Fred's skill
Fskill
double variable
Jill's performance
Jperf
double variable
Fred's performance
Fperf
double variable
The maximum performance difference which gives rises to a draw
drawMargin
Gaussian(120, 40²)Gaussian(100, 5²)Gaussian(•, 5²)Gaussian(•, 5²)Gaussian(1,10)WinLoseDraw
Figure 3.33TrueSkill model for a game between two players which includes the possibility of a draw.

The variable Jwins is replaced by outcome which is a discrete variable that takes one of the values JillWins, Draw, or FredWins. The WinLoseDraw factor is simply a function whose value is 1 if the three values of Jperf, drawMargin, and Fperf are consistent with the value of outcome and is 0 otherwise. We need to make some modifications to the evaluation of messages sent out from this factor. These will not be discussed in detail here, and instead the interested reader can refer to this excellent blog post and Herbrich et al. [2007].

In order to simplify the subsequent discussion of other extensions to the core model, we will ignore the draw modification in the remaining factor graphs in this chapter, although all subsequent models can be similarly modified to include draws if required.

What if we have more than two players in a game?

Games with more than two players require a more complex model

Suppose we now have more than two players in a game, such as in the Halo game ‘Free for All’ in which eight players simultaneously play against each other. The outcome of such a game is now an ordering amongst the players involved in the game. With our model-based approach, incorporating a change such as this involves just making a suitable assumption, constructing the corresponding factor graph and then running our inference algorithm. Our new Assumption 3 can be stated as

  1. The order of players in the game outcome is the same as the ordering of their performance values in that game.

If there are N players in the game then this assumption can be captured in a factor graph using N-1 GreaterThan factors to describe the player ordering. This is illustrated for the case of three players in Figure 3.34.

(D)(A) (B)(C)
observed bool variable
Whether player1 beat player2
p1Wins
observed bool variable
Whether player2 beat player3
p2Wins
double variable
Player 1's skill
p1Skill
double variable
Player 2's skill
p2Skill
double variable
Player 3's skill
p3Skill
double variable
Player 1's performance
p1Perf
double variable
Player 2's performance
p2Perf
double variable
Player 3's performance
p3Perf
Gaussian(120, 40²)Gaussian(120, 40²)Gaussian(120, 40²)Gaussian(•, 20²)Gaussian(•, 20²)Gaussian(•, 20²)>>
Figure 3.34Factor graph for a game involving three players. Also shown are some of the messages which arise in the use of expectation propagation applied to this graph.

Note that we could have introduced a separate ’greater-than’ factor for each possible pair of players. For N players there are N(N-1)/2 such factors. However, these additional factors contain only redundant information and lead to an un-necessarily complex graph. The ordering of N players can be expressed using N-1 greater than factors, provided these are chosen to connect the pairs of adjacent players in the ordering sequence. In effect, because we know the outcome of the game, we can choose a relatively simple graph which captures this.

Inference deep-dive

In this optional section, we show why the use of expectation propagation, even for a tree-structured graph, can require iterative solution. If you want to focus on modelling, feel free to skip this section.

The extension to more than two players introduces an interesting effect related to our expectation propagation algorithm. We saw in section 2.2 that if our factor graph has a tree structure then belief propagation gives exact marginal distributions after a single sweep through the graph (with one message passed in each direction across every link). Similarly, if we now apply expectation propagation to the two-player graph of Figure 3.10 this again requires only a single pass in each direction. This is because the ‘context’ messages for the expectation propagation approximation are fixed. However, the situation becomes more complex when we have more than two players. The graph of Figure 3.34 has a tree structure, with no loops, and so exact belief propagation would require only a single pass. However, consider the evaluation of outgoing message (A) using expectation propagation. This requires the incoming message (D) to provide the ‘context’ for the approximation. However, message (D) depends on message (C) which itself is evaluated using expectation propagation using message (B) as context, and message (B) in turn depends on message (A). Expectation propagation therefore requires that we iterate these messages until we reach some suitable convergence criterion (in which the changes to the messages fall below some threshold). We therefore modify our message-passing schedule so that we first pass messages downwards from the skill nodes to the performance nodes (as before), then we perform multiple passes back and forth amongst the performance nodes until we achieve convergence, and then finally pass messages upwards to the skill nodes in order to evaluate posterior skill marginals.

Remember that in Figure 3.29 and Figure 3.30 we saw how the shift of the distributions between prior and posterior were larger in the case where the weaker player (Fred) won the game. Now we repeat the experiment, except with a third player (Steve), whose prior skill distribution is Gaussian(140,40^2) (keeping Jill as Gaussian(120,20^2) and Fred as Gaussian(100,40^2) as before), and run the multi-player TrueSkill model on a single game with the outcome Jill 1st, Fred 2nd, Steve 3rd. The results of this are shown in Figure 3.35. Firstly, note that since Steve was expected to be the strongest player, but in fact came last, his posterior mean has moved markedly downwards (to below the other two players). Secondly, note that the changes in the means of Jill and Fred are in the same direction as in Figure 3.29, but are more pronounced than before. This again is because the overall game result is more surprising.

CSV
Figure 3.35The result of applying the TrueSkill model for a three player game between Jill (blue), Fred (red), and Steve (green) for the case where Jill is the winner, Fred comes second and Steve comes last. The prior distributions are shown as dashed curves, and the corresponding posterior distributions are shown as solid curves.

In Figure 3.36 we begin with the same priors, but now observe a single game with outcome Jill 2nd, Fred 1st, Steve 3rd. Note here that since Fred and Steve’s prior skills were equidistant from Jill’s, and their prior variances were equal, their posterior means have shifted by the same amount, and so they appear to have “swapped” places. A further consequence of this symmetry is that since Jill neither won nor lost, her skill mean has not moved, although the variance of her skill is reduced due to the new information provided by the game outcome.

CSV
Figure 3.36As in Figure 3.35 but for the case where Fred is the winner, followed by Jill and then Steve came last.

What if the games are played by teams?

Many of the games available on Xbox Live can be played by teams of players. For example in Halo another type of game is played between two teams each consisting of eight players. The outcome of the game simply says which team is the winner and which team is the loser, and the challenge is to use this outcome information to revise the skill distributions for each of the individual players. This is an example of a credit assignment problem in which we have to work out how the credit for a victory (or blame for a defeat) should be attributed to individual players when only the outcome for the overall team is given. The solution is similar to the last two situations: we make an assumption about how the individual player skills combine to affect the game outcome, we construct a probabilistic model which encodes this assumption and then run inference to update the skill distributions. There is no need to invent new algorithms or design new heuristics.

The performance of a team depends on the skills of the individual players.

Here is one suitable assumption which we could use when modelling team games, which would replace Assumption 3:

  1. The performance of a team is the sum of the performances of its members, and the team with the highest performance value wins the game.

We can now build a factor graph corresponding to this assumption. For example, consider a game between two teams, each of which involves two players. The factor graph for this is shown in Figure 3.37.

observed bool variable
Whether Team 1 won the game
team1Wins
double variable
Player 1's skill
skill1
double variable
Player 2's skill
skill2
double variable
Player 3's skill
skill3
double variable
Player 4's skill
skill4
double variable
Player 1's performance
perf1
double variable
Player 2's performance
perf2
double variable
Player 3's performance
perf3
double variable
Player 4's performance
perf4
double variable
Team 1's performance
team1Perf
double variable
Team 2's performance
team2Perf
Gaussian(120, 40²)Gaussian(120, 40²)Gaussian(120, 40²)Gaussian(120, 40²)Gaussian(•, 20²)Gaussian(•, 20²)Gaussian(•, 20²)Gaussian(•, 20²)++>
Figure 3.37A factor graph of the TrueSkill model for two teams. The first team consists of players 1 and 2, and the second team consists of players 3 and 4.

The performance of a team is determined by the performance of the players who comprise that team. Our assumption above was that the team performance is given by the sum of the performances of the individual players. This might be appropriate for collaborative team games such as Halo. However, other assumptions might be appropriate in other kinds of game. For example, in a race where only the fastest player determines the team outcome, we might make the alternative assumption

  1. The performance of a team is equal to the highest performance of any of its members, and the team with the highest performance value wins the game.

In this section we have discussed various modifications to the core TrueSkill model, namely the inclusion of draws, the extension to multiple players, and the extension to team games. These modifications can be combined as required, for example to allow a game between multiple teams that includes draws, by constructing the appropriate factor graph and then running expectation propagation. This highlights not only the flexibility of the model-based approach to machine learning, but also the ease with which modifications can be incorporated. As long as the model builder is able to describe the process by which the data is generated, it is usually straightforward to formulate the corresponding model. By contrast, when a solution is expressed simply as an algorithm it may be far from clear how the algorithm should be modified to account for changes in the problem specification. In the next section we conclude our discussion of the online game matchmaking problem by a further modification to the model in which we relax the assumption that the skills of the players are fixed.

Self assessment 3.4

The following exercises will help embed the concepts you have learned in this section. It may help to refer back to the text or to the concept summary below.

  1. Sketch out a factor graph for a model which allows draws, two-player teams and multiple teams. You will need to combine the factor graphs of Figure 3.33, Figure 3.34 and Figure 3.37. Your sketch can be quite rough – for example, you should name factors (e.g. “Gaussian”) but there is no need to provide any numbers for factor parameters.
  2. Extend your Infer.NET model from the previous self assessment to have three players and reproduce the results from Figure 3.35 and Figure 3.35.
  3. [Project idea] There is a wide variety of sports results data available on the web. Find a suitable set of data and build an appropriate model to infer the skills of the teams or players involved. Rank the teams or players by the inferred skill and decide if you think the model has inferred a good ranking. If not, diagnose why not and explore modifications to your model to address the issue.

Review of concepts introduced on this page

credit assignment problemThe problem of allocating a reward amongst a set of entities, such as people, all of which have contributed to the outcome.

References

[Herbrich et al., 2007] Herbrich, R., Minka, T., and Graepel, T. (2007). TrueSkill(TM): A Bayesian Skill Rating System. In Advances in Neural Information Processing Systems 20, pages 569–576. MIT Press.