Chapter 4

Uncluttering Your Inbox

More and more people are becoming overwhelmed by their email. It is not unusual for a busy person to receive many hundreds of new emails every day. As a result, people are having to spend longer processing their emails – and are more likely to miss an important email because it gets lost in a constant stream of new emails. Can model-based machine learning help to reduce this information overload?

The average office worker spends almost three hours a day processing their email. About 90% of this time is spent either reading incoming email or managing existing email – only the remaining 10% is spent writing or replying to emails [Outlook team, 2008]. An automatic tool to speed up reading and managing email would free up a lot of people’s time, allowing them to focus on important tasks and avoid the stress of information overload.

Microsoft Exchange is an email server used to power more than 300 million mailboxes worldwide [Radicati and Hoang, 2010]. The Exchange team are keen to use machine learning to help people to manage their mail and improve their productivity. In this chapter, we will look at how model-based machine learning was used by the Exchange team to separate out the clutter from a user’s inbox, allowing users to focus on their important emails and reducing the time taken to process incoming email.

The idea was to infer whether a user considers an email to be clutter from the actions the user takes on that email. For example, emails that are never read or are quickly deleted are likely to be considered clutter by the user. Now suppose we had a machine learning system that could predict what actions a user would take on a new email – for example, whether the user would reply to it, delete it or leave it unread. Given such a system, we could then remove from the inbox those emails that are unlikely to be read or acted upon. Such clutter emails could be placed in a separate location where they could be easily reviewed and processed in one go, at a time convenient for the user.
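As a rough illustration of this idea, the sketch below routes an email to the inbox or to a separate clutter location based on predicted action probabilities. It is a minimal sketch under stated assumptions, not the Exchange team's method: the `ActionPrediction` structure, the `route_email` function and the threshold value are all hypothetical, and the probabilities are made up for the example rather than produced by a trained model.

```python
from dataclasses import dataclass

# Hypothetical threshold, chosen only for illustration.
ACTION_THRESHOLD = 0.2


@dataclass
class ActionPrediction:
    """Predicted probabilities of user actions on one email (hypothetical structure)."""
    p_read: float
    p_reply: float


def route_email(prediction: ActionPrediction, threshold: float = ACTION_THRESHOLD) -> str:
    """Return 'inbox' if the user is likely to read or reply to the email, else 'clutter'."""
    likely_acted_on = max(prediction.p_read, prediction.p_reply) >= threshold
    return "inbox" if likely_acted_on else "clutter"


# Made-up predictions for two emails; a real system would compute these.
newsletter = ActionPrediction(p_read=0.05, p_reply=0.01)
project_update = ActionPrediction(p_read=0.9, p_reply=0.3)

print(route_email(newsletter))
print(route_email(project_update))
```

In this sketch the routing rule is deliberately simple; how the predicted probabilities should be turned into a clutter decision is a design choice that depends on the user experience being aimed for.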

To achieve this goal, the team needed a system that could take a number of older emails that a user had already taken action on and learn which actions the user would be likely to take on emails with different characteristics. The system was to consider many aspects of the email: who sent the email, who was on the To and Cc lines, what the subject was, what was written in the email, whether there were any attachments and so on. The trained system was then to be applied to incoming emails to predict the probability of the user performing various actions on each email. The Exchange team considered it essential that the system make personalised predictions. Unlike junk mail, clutter is a personal matter: what is clutter for one user might not be clutter for another. For example, a project update email might be clutter for someone not on the project but might be important to read for someone who is working on the project.
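To make the aspects listed above concrete, the following sketch shows one way an email could be described relative to a particular user. It is only an illustration under assumptions: the `Email` structure, the `describe_email` function and the specific characteristics it returns are hypothetical and are not taken from the system built by the Exchange team.

```python
from dataclasses import dataclass, field


@dataclass
class Email:
    """A simplified, hypothetical representation of an email."""
    sender: str
    to: list[str]
    cc: list[str]
    subject: str
    body: str
    attachments: list[str] = field(default_factory=list)


def describe_email(email: Email, user: str) -> dict:
    """Summarise aspects of an email from the point of view of one user."""
    return {
        "sender": email.sender.lower(),
        "user_on_to_line": user in email.to,
        "user_on_cc_line": user in email.cc,
        "subject_word_count": len(email.subject.split()),
        "body_word_count": len(email.body.split()),
        "has_attachments": len(email.attachments) > 0,
    }


# Made-up example email and addresses.
update = Email(
    sender="Alice@example.com",
    to=["team@example.com"],
    cc=["bob@example.com"],
    subject="Project update",
    body="The release is on track for next week.",
)

print(describe_email(update, user="bob@example.com"))
```

Because the description depends on the user as well as the email, the same message can look different to different recipients, which is one ingredient of personalised predictions.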

In this chapter, we’ll use model-based machine learning to develop a personalised system that meets the needs of the Exchange team. We will focus on building a system to predict whether a user will reply to an incoming email. However, the resulting system will be general enough to predict many other kinds of actions and so can be used to predict whether or not a user will consider an email to be clutter. In particular, we will see how to:

  • Manage email data and privacy issues,
  • Develop a model for predicting actions that is personalised to each user,
  • Use information about an email to drive the model,
  • Evaluate the model both in numerical terms and in terms of user experience,
  • Extend the model to address various problems as they arise.


[Outlook team, 2008] Outlook team (2008). Internal email study.

[Radicati and Hoang, 2010] Radicati, S. and Hoang, Q. (2010). Microsoft Exchange Server and Outlook Market Analysis, 2010-2014. Technical report, The Radicati Group, Inc.