3.4 Extensions to the core model
So far we have seen how to construct a probabilistic model of games played between pairs of players in which each game results in a win for one of the players. However, in order to build an industrial-strength system for Xbox Live, applicable to a wide variety of games, we need to extend our model to deal with a number of additional complexities, as discussed in the introduction to this chapter. In particular, real games can end in draws, they can involve more than two players, and they can be played between teams of people. We will now show how our simple model can easily be extended to take account of these complexities. This flexibility nicely illustrates the power of a model-based approach to machine learning.
The original Xbox online gaming service used the Elo algorithm to estimate the skills of players. However, in spite of its popularity, the Elo system suffers from some significant limitations. In particular:
- Elo does not handle draws
- Elo does not handle team games
- Elo does not handle games with more than two players
Various modifications to the basic Elo system, mostly heuristic in nature, have been proposed to deal with the issues of draws, team games, and multiple players. The fundamental problem is that Elo is an algorithm rather than a model, and so it is difficult to see what assumptions are being made and therefore difficult to extend the algorithm to new situations. We therefore turn to extensions of TrueSkill, to see how a model-based approach allows such extensions to be incorporated in a transparent way, giving rise to a solution which can handle all of the above complexities.
What if a game can end in a draw?
In our current model, the player with the higher performance value on a particular game is the winner of that game. For games which can also end in a draw, we can modify this assumption by introducing the concept of a draw margin, such that a player is the winner only if their performance exceeds that of the other player by at least the value of the draw margin. Mathematically this can be expressed as
This is illustrated in Figure 3.32.
We have therefore modified Assumption 3 to read:
- The player with the higher performance value wins the game, unless the difference between their performance and that of their opponent is less than the draw margin, in which case the game is drawn.
The value of the draw margin represents a new parameter in our model, and we may not know the appropriate value. This is particularly true if we introduce a new type of game, or if we modify the rules for an existing game, where we have yet to see any game results. To solve this problem we simply treat the draw margin as a random variable drawMargin whose value is to be learned from data. Because drawMargin is a continuous variable, it is chosen to be a Gaussian. This can be expressed as a factor graph, as shown in Figure 3.33.
The variable Jwins is replaced by outcome which is a discrete variable that takes one of the values JillWins, Draw, or FredWins. The WinLoseDraw factor is simply a function whose value is 1 if the three values of Jperf, drawMargin, and Fperf are consistent with the value of outcome and is 0 otherwise. We need to make some modifications to the evaluation of messages sent out from this factor. These will not be discussed in detail here, and instead the interested reader can refer to this excellent blog post and Herbrich et al. .
In order to simplify the subsequent discussion of other extensions to the core model, we will ignore the draw modification in the remaining factor graphs in this chapter, although all subsequent models can be similarly modified to include draws if required.
What if we have more than two players in a game?
Suppose we now have more than two players in a game, such as in the Halo game ‘Free for All’ in which eight players simultaneously play against each other. The outcome of such a game is now an ordering amongst the players involved in the game. With our model-based approach, incorporating a change such as this involves just making a suitable assumption, constructing the corresponding factor graph and then running our inference algorithm. Our new Assumption 3 can be stated as
- The order of players in the game outcome is the same as the ordering of their performance values in that game.
If there are players in the game then this assumption can be captured in a factor graph using GreaterThan factors to describe the player ordering. This is illustrated for the case of three players in Figure 3.34.
Note that we could have introduced a separate ’greater-than’ factor for each possible pair of players. For players there are such factors. However, these additional factors contain only redundant information and lead to an un-necessarily complex graph. The ordering of players can be expressed using greater than factors, provided these are chosen to connect the pairs of adjacent players in the ordering sequence. In effect, because we know the outcome of the game, we can choose a relatively simple graph which captures this.
In this optional section, we show why the use of expectation propagation, even for a tree-structured graph, can require iterative solution. If you want to focus on modelling, feel free to skip this section.
The extension to more than two players introduces an interesting effect related to our expectation propagation algorithm. We saw in section 2.2 that if our factor graph has a tree structure then belief propagation gives exact marginal distributions after a single sweep through the graph (with one message passed in each direction across every link). Similarly, if we now apply expectation propagation to the two-player graph of Figure 3.10 this again requires only a single pass in each direction. This is because the ‘context’ messages for the expectation propagation approximation are fixed. However, the situation becomes more complex when we have more than two players. The graph of Figure 3.34 has a tree structure, with no loops, and so exact belief propagation would require only a single pass. However, consider the evaluation of outgoing message (A) using expectation propagation. This requires the incoming message (D) to provide the ‘context’ for the approximation. However, message (D) depends on message (C) which itself is evaluated using expectation propagation using message (B) as context, and message (B) in turn depends on message (A). Expectation propagation therefore requires that we iterate these messages until we reach some suitable convergence criterion (in which the changes to the messages fall below some threshold). We therefore modify our message-passing schedule so that we first pass messages downwards from the skill nodes to the performance nodes (as before), then we perform multiple passes back and forth amongst the performance nodes until we achieve convergence, and then finally pass messages upwards to the skill nodes in order to evaluate posterior skill marginals.
Remember that in Figure 3.29 and Figure 3.30 we saw how the shift of the distributions between prior and posterior were larger in the case where the weaker player (Fred) won the game. Now we repeat the experiment, except with a third player (Steve), whose prior skill distribution is Gaussian (keeping Jill as Gaussian and Fred as Gaussian as before), and run the multi-player TrueSkill model on a single game with the outcome Jill 1st, Fred 2nd, Steve 3rd. The results of this are shown in Figure 3.35. Firstly, note that since Steve was expected to be the strongest player, but in fact came last, his posterior mean has moved markedly downwards (to below the other two players). Secondly, note that the changes in the means of Jill and Fred are in the same direction as in Figure 3.29, but are more pronounced than before. This again is because the overall game result is more surprising.
In Figure 3.36 we begin with the same priors, but now observe a single game with outcome Jill 2nd, Fred 1st, Steve 3rd. Note here that since Fred and Steve’s prior skills were equidistant from Jill’s, and their prior variances were equal, their posterior means have shifted by the same amount, and so they appear to have “swapped” places. A further consequence of this symmetry is that since Jill neither won nor lost, her skill mean has not moved, although the variance of her skill is reduced due to the new information provided by the game outcome.
What if the games are played by teams?
Many of the games available on Xbox Live can be played by teams of players. For example in Halo another type of game is played between two teams each consisting of eight players. The outcome of the game simply says which team is the winner and which team is the loser, and the challenge is to use this outcome information to revise the skill distributions for each of the individual players. This is an example of a credit assignment problem in which we have to work out how the credit for a victory (or blame for a defeat) should be attributed to individual players when only the outcome for the overall team is given. The solution is similar to the last two situations: we make an assumption about how the individual player skills combine to affect the game outcome, we construct a probabilistic model which encodes this assumption and then run inference to update the skill distributions. There is no need to invent new algorithms or design new heuristics.
Here is one suitable assumption which we could use when modelling team games, which would replace Assumption 3:
- The performance of a team is the sum of the performances of its members, and the team with the highest performance value wins the game.
We can now build a factor graph corresponding to this assumption. For example, consider a game between two teams, each of which involves two players. The factor graph for this is shown in Figure 3.37.
The performance of a team is determined by the performance of the players who comprise that team. Our assumption above was that the team performance is given by the sum of the performances of the individual players. This might be appropriate for collaborative team games such as Halo. However, other assumptions might be appropriate in other kinds of game. For example, in a race where only the fastest player determines the team outcome, we might make the alternative assumption
- The performance of a team is equal to the highest performance of any of its members, and the team with the highest performance value wins the game.
In this section we have discussed various modifications to the core TrueSkill model, namely the inclusion of draws, the extension to multiple players, and the extension to team games. These modifications can be combined as required, for example to allow a game between multiple teams that includes draws, by constructing the appropriate factor graph and then running expectation propagation. This highlights not only the flexibility of the model-based approach to machine learning, but also the ease with which modifications can be incorporated. As long as the model builder is able to describe the process by which the data is generated, it is usually straightforward to formulate the corresponding model. By contrast, when a solution is expressed simply as an algorithm it may be far from clear how the algorithm should be modified to account for changes in the problem specification. In the next section we conclude our discussion of the online game matchmaking problem by a further modification to the model in which we relax the assumption that the skills of the players are fixed.
The following exercises will help embed the concepts you have learned in this section. It may help to refer back to the text or to the concept summary below.
- Sketch out a factor graph for a model which allows draws, two-player teams and multiple teams. You will need to combine the factor graphs of Figure 3.33, Figure 3.34 and Figure 3.37. Your sketch can be quite rough – for example, you should name factors (e.g. “Gaussian”) but there is no need to provide any numbers for factor parameters.
- Extend your Infer.NET model from the previous self assessment to have three players and reproduce the results from Figure 3.35 and Figure 3.35.
- [Project idea] There is a wide variety of sports results data available on the web. Find a suitable set of data and build an appropriate model to infer the skills of the teams or players involved. Rank the teams or players by the inferred skill and decide if you think the model has inferred a good ranking. If not, diagnose why not and explore modifications to your model to address the issue.
credit assignment problemThe problem of allocating a reward amongst a set of entities, such as people, all of which have contributed to the outcome.
[Herbrich et al., 2007] Herbrich, R., Minka, T., and Graepel, T. (2007). TrueSkill(TM): A Bayesian Skill Rating System. In Advances in Neural Information Processing Systems 20, pages 569–576. MIT Press.