How can machine learning solve my problem?
During the last few years, the field of machine learning has moved to centre stage in the world of technology. Today, thousands of scientists and engineers are applying machine learning to an extraordinarily broad range of domains. However, making effective use of machine learning in practice can be daunting, especially for newcomers to the field. Here are some of the principal challenges encountered when trying to solve real-world problems using machine learning:
“I am overwhelmed by the choice of machine learning methods and techniques. There’s too much to learn!”
“I don’t know which algorithm to use or why one would be better than another for my problem.”
“My problem doesn’t seem to fit with any standard algorithm.”
In this book we look at machine learning from a fresh perspective which we call model-based machine learning. This viewpoint helps to address all of these challenges, and makes the process of creating effective machine learning solutions much more systematic. It is applicable to the full spectrum of machine learning techniques and application domains, and will help guide you towards building successful machine learning solutions without requiring that you master the huge literature on machine learning.
What is model-based machine learning?
Over the last five decades, researchers have created literally thousands of machine learning algorithms. Traditionally an engineer wanting to solve a problem using machine learning must choose one or more of these algorithms to try, or otherwise attempt to invent a new one. In practice, their choice of algorithm may be constrained by those algorithms they happen to be familiar with, or by the availability of specific software, and may not be the best choice for their problem.
By contrast the model-based approach seeks to create a bespoke solution tailored to each new application. Instead of having to transform your problem to fit some standard algorithm, in model-based machine learning you design the algorithm precisely to fit your problem.
The core idea at the heart of model-based machine learning is that all the assumptions about the problem domain are made explicit in the form of a model. In fact a model is just made up of this set of assumptions, expressed in a precise mathematical form. These assumptions include the number and types of variables in the problem domain, which variables affect each other, and what the effect of changing one variable is on another variable. For example, in the next chapter we build a model to help us solve a simple murder mystery. The assumptions of the model include the list of suspected culprits, the possible murder weapons, and the tendency for particular weapons to be preferred by different suspects. This model is then used to create a model-specific algorithm to solve the specific machine learning problem. Model-based machine learning can be applied to pretty much any problem, and its general-purpose approach means you don’t need to learn a huge number of machine learning algorithms and techniques.
So why do the assumptions of the model play such a key role? Well it turns out that machine learning cannot generate solutions from data alone. There are always assumptions built into any algorithm, although usually these assumptions are far from explicit. Different algorithms correspond to different sets of assumptions, and when the assumptions are implicit the only way to decide which algorithm is likely to give the best results is to compare them empirically. This is time-consuming and inefficient, and it requires software implementations of all of the algorithms being compared. And if none of the algorithms tried gives good results it is even harder to work out how to create a better algorithm.
Models versus algorithms
Let’s look more closely at the relationship between models and algorithms. We can think of a standard machine learning algorithm as a monolithic box which takes in data and produces results. The algorithm must necessarily make assumptions since it is these assumptions that distinguish a particular algorithm from the thousands of others out there. However, in an algorithm those assumptions are implicit and opaque.
Now consider the model-based view. The model comprises the set of assumptions we are making about the problem domain. To get from the model to a set of predictions we need to take the data and compute those variables whose values we wish to know. This computational process we shall call inference. There are several techniques available for doing inference, as we shall discuss during the course of this book. The combination of the model and the inference procedure defines a machine learning algorithm, as illustrated in Figure 0.1.
Although there are various choices for the inference method, by decoupling the model from the inference we are able to apply the same inference method to a wide variety of models. For example, most of the case studies discussed in this book are solved using just one inference method.
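As a minimal sketch of this decoupling (not the inference method or the framework used in this book), the Python below pairs one generic inference routine, exact enumeration over a small set of hypotheses, with two unrelated toy models. The function name `infer_posterior`, the suspect labels, and every probability are illustrative assumptions chosen for the example.

```python
from typing import Callable, Dict, Hashable, Sequence


def infer_posterior(
    prior: Dict[Hashable, float],
    likelihood: Callable[[object, Hashable], float],
    observations: Sequence[object],
) -> Dict[Hashable, float]:
    """Generic inference by enumeration.

    Works for any model given as a prior over a finite set of hypotheses
    plus a likelihood function; it knows nothing about the problem domain.
    """
    unnormalised = {}
    for hypothesis, prior_prob in prior.items():
        weight = prior_prob
        for obs in observations:
            weight *= likelihood(obs, hypothesis)
        unnormalised[hypothesis] = weight
    total = sum(unnormalised.values())
    return {h: w / total for h, w in unnormalised.items()}


# Model 1 (hypothetical numbers): who is the culprit, given the weapon found?
suspect_prior = {"suspect_a": 0.3, "suspect_b": 0.7}
weapon_given_suspect = {
    "suspect_a": {"revolver": 0.9, "dagger": 0.1},
    "suspect_b": {"revolver": 0.2, "dagger": 0.8},
}


def weapon_likelihood(weapon, suspect):
    return weapon_given_suspect[suspect][weapon]


# Model 2 (hypothetical numbers): is a coin fair or biased towards heads?
bias_prior = {0.5: 0.5, 0.9: 0.5}


def flip_likelihood(flip, bias):
    return bias if flip == "H" else 1.0 - bias


# The same inference routine is applied to both models.
culprit_posterior = infer_posterior(suspect_prior, weapon_likelihood, ["revolver"])
coin_posterior = infer_posterior(bias_prior, flip_likelihood, ["H", "H", "H"])
print(culprit_posterior)
print(coin_posterior)
```

Only the model definitions differ between the two problems. Changing an assumption, such as a prior probability or an entry in the weapon table, changes the results without touching the inference code.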
Model-based machine learning can be used to do any standard machine learning task, such as classification (Chapter 4) or clustering (Chapter 6), whilst providing additional insight and control over how these tasks are performed. Solving these tasks using model-based machine learning provides a way to handle extensions to the task or to improve accuracy, by making changes to the model – we will look at an example of this in Chapter 4. Additionally, the assumptions you are making about the problem domain are laid out clearly in the model, so it is easier to work out why one model works better than another, to communicate to someone else what a model is doing, and to understand what’s happening when things go wrong. Using models also makes it easier to share other people’s solutions in order to adapt, extend, or combine them.
An example: predicting skills
Suppose you wish to track the changing skill of a player in an online gaming service (this is the problem we will explore in detail in Chapter 3). A machine learning textbook might tell you that there is an algorithm called a Kalman filter [Kalman, 1960] which can be used for these kinds of problems. Suppose you decide to try and make use of some Kalman filter software to predict how a player’s skill evolves over time. First you will have to work out how to convert the skill prediction task into the form of a standard Kalman filter. Having done that, if you are lucky, the software might give a sufficiently good solution. However, the results from using an off-the-shelf algorithm often fail to reach the accuracy level required by real applications. How will you modify the algorithm, and the corresponding software, to achieve better results? It seems you will have to become an expert on the Kalman filter algorithm, and to delve into the software implementation, in order to make progress.
Contrast this with the model-based approach. You begin by listing the assumptions which your solution must satisfy. This defines your model. You then use this model to create the corresponding machine-learning algorithm, which is a mechanical process that can be automated. If your assumptions happen to correspond to those which are implicit in the Kalman filter, then your algorithm will correspond precisely to the Kalman filtering algorithm (and this will happen even if you have never heard of a Kalman filter). Perhaps, however, the model for your particular application has somewhat different assumptions. In this case you will obtain a variant of the Kalman filter, appropriate to your application. Whether this variant already exists, or whether it is a novel algorithm, is irrelevant if your goal is to find the best solution to your problem. Suppose you try your model-based algorithm, and the results again fall short of your requirements. Now you have a framework for improving the results by examining and modifying the assumptions to produce a better model, along with the corresponding improved algorithm. As a domain expert it is far easier and more intuitive to understand and change the assumptions than it is to modify a machine learning algorithm directly. Even if your goal is simply to understand the Kalman filter, starting with the model assumptions is by far the clearest and simplest way to derive the filtering algorithm, and to understand what Kalman filters are all about.
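To make the claim about the Kalman filter concrete, here is a hedged sketch in Python for a deliberately simplified one-dimensional case. The three assumptions listed in the comments (a Gaussian initial belief, skill drifting as a Gaussian random walk, and a directly observed noisy performance score) are ours for illustration and are not the model developed in Chapter 3; the names `Gaussian`, `predict`, `update`, and `track_skill` are hypothetical. Under these assumptions, exact Bayesian inference reduces to the two steps below, which coincide with the predict and update equations of the scalar Kalman filter.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Gaussian:
    mean: float
    variance: float


# Illustrative model assumptions (not taken from the book):
#   1. The initial belief about the skill is Gaussian.
#   2. Between observations the skill changes by zero-mean Gaussian noise
#      with variance `drift_variance` (a random walk).
#   3. Each observation equals the current skill plus zero-mean Gaussian
#      noise with variance `noise_variance`.


def predict(belief: Gaussian, drift_variance: float) -> Gaussian:
    """Assumption 2: adding independent Gaussian drift adds its variance."""
    return Gaussian(belief.mean, belief.variance + drift_variance)


def update(belief: Gaussian, observation: float, noise_variance: float) -> Gaussian:
    """Assumption 3: combine the Gaussian belief with a Gaussian likelihood."""
    gain = belief.variance / (belief.variance + noise_variance)
    mean = belief.mean + gain * (observation - belief.mean)
    variance = (1.0 - gain) * belief.variance
    return Gaussian(mean, variance)


def track_skill(
    prior: Gaussian,
    observations: Sequence[float],
    drift_variance: float,
    noise_variance: float,
) -> List[Gaussian]:
    """Return the belief about the skill after each observation."""
    belief = prior
    history = []
    for y in observations:
        belief = predict(belief, drift_variance)
        belief = update(belief, y, noise_variance)
        history.append(belief)
    return history


# Example with made-up numbers: two noisy scores and no drift.
beliefs = track_skill(Gaussian(0.0, 1.0), [2.0, 2.0],
                      drift_variance=0.0, noise_variance=1.0)
print(beliefs[-1])
```

If an assumption changes, for example if only the win or loss of a game is observed rather than a numerical score, the `update` step would have to change with it, and the belief would in general no longer remain exactly Gaussian.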
Tools for model-based machine learning
The decomposition of algorithms into a model and a separate inference method has another powerful consequence. It becomes possible to create a software framework which will generate the machine learning algorithm automatically given only the definition of the model and a choice of inference method. This allows the applications developer to focus on the creation of the model, which is domain-specific, and frees them from the need to be an expert on the inner workings of the inference procedure.
For more than ten years we have been working on such a software framework at Microsoft Research, called Infer.NET [Minka et al., 2014]. Because a model consists simply of a set of assumptions it can be expressed in very compact code, which is relatively easy to understand and modify. The corresponding code for the algorithm, which is generally much more complex, is then produced automatically. All of the models in this book were created using Infer.NET, and the corresponding model source code is available online. However, these solutions could equally be implemented by hand or by using an alternative model-based framework – they are not specific to Infer.NET. Examples of alternative software frameworks that implement the model-based machine learning philosophy include BUGS [Lunn et al., 2000], Church [Goodman et al., 2008], and Stan [Stan Development Team, 2014].
Who is this book for?
This book is rather unusual for a machine learning textbook in that we do not review dozens of different algorithms. Instead we introduce all of the key ideas through a series of case studies involving real-world applications. Case studies play a central role because it is only in the context of applications that it makes sense to discuss modelling assumptions. Each chapter therefore introduces one case study which is drawn from a real-world application that has been solved using a model-based approach. The exception is the first chapter which explores a simple fictional problem involving a murder mystery.
Each chapter also serves to introduce a variety of machine learning concepts, not as abstract ideas, but as concrete techniques motivated by the needs of the application. You can think of these concepts as the building blocks for constructing models. Although you will need to invest some time to understand these concepts fully, you will soon discover that a huge variety of models can be constructed from a relatively small number of building blocks. By working through the case studies in this book you will learn how to use these components, and will hopefully gain a sufficient appreciation of the power and flexibility of the model-based approach to allow you to solve your machine learning problem.
This book is intended for any technical person who wants to use machine learning to solve a real-world problem – the focus of the book is on designing models to solve problems. However, some readers will also want to understand the mathematical details of how models are turned into inference algorithms. We have separated these parts of the book, which require more advanced mathematics, into inference deep-dive sections, which will be marked with panels like this one.
Deep-dive sections are optional – you can read the book without them. If you are planning on using a software framework like Infer.NET or just want to focus on modelling, you can skip these sections.
How to read this book
Each case study in this book describes a journey from problem statement to solution. You probably do not want to follow this journey in a single sitting, so each case study is split into sections – we recommend reading a section at a time and pausing to digest what you have learned at the end of each section. To help with this, the machine learning concepts introduced in a section will be highlighted like this and will be reviewed at the end of each section (as you can see below). We aim to provide enough details of each concept to allow the case studies to be understood, along with links to external sources, such as [Bishop, 2006], where you can get more details if you are interested in a particular topic.
Now, on to the first case study!
model-based machine learning: An approach to machine learning where all the assumptions about the problem domain are made explicit in the form of a model. This model is then used to create a model-specific algorithm to learn or reason about the domain. The algorithm creation part of this process can be automated.
model: A set of assumptions about a problem domain, expressed in a precise mathematical form, that is used to create a machine learning solution.
Infer.NET: A software framework developed at Microsoft Research Cambridge which can do model-based machine learning automatically given a model definition. Available for download at the Infer.NET website.
[Kalman, 1960] Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Transactions of the American Society of Mechanical Engineers, Series D, Journal of Basic Engineering, 82:35–45.
[Minka et al., 2014] Minka, T., Winn, J., Guiver, J., Webster, S., Zaykov, Y., Yangel, B., Spengler, A., and Bronskill, J. (2014). Infer.NET 2.6. Microsoft Research Cambridge. http://research.microsoft.com/infernet.
[Lunn et al., 2000] Lunn, D., Thomas, A., Best, N., and Spiegelhalter, D. (2000). WinBUGS – a Bayesian modelling framework. Statistics and Computing, 10:325–337. MRC Biostatistics Unit. http://www.mrc-bsu.cam.ac.uk/software/bugs.
[Goodman et al., 2008] Goodman, N., Mansinghka, V. K., Roy, D. M., Bonawitz, K., and Tenenbaum, J. B. (2008). Church. MIT. http://projects.csail.mit.edu/church/wiki/Church.
[Bishop, 2006] Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.