A Prediction Model for Sports Match Results – Part 2

Introduction

As outlined in the first post, this installment of the series is about the model’s estimates and predictions. I will also touch on how to compare predictions between models. For all of that to work, I will have to introduce a few mathematical definitions. As in the first post, I will give both an intuitive and a mathematical explanation of every concept.


The Estimates section will begin with a brief introduction to Bayesian inference, followed by an explanation of how the model fits the data (with a few graphs showing the estimates for last season’s NBA). Next, the Model Comparisons section will define the predictive likelihood of a model and show how to calculate it. I will also discuss a few different ways to compare predictions. Finally, in the Discussion section, I will offer critiques of the work presented in this post.

Estimates

To begin, let’s quickly review what we are supposed to estimate with this model: the quality of offense, the quality of defense, and the home court advantage of every team in the league. I will derive those estimates through Bayesian estimation, which is part of Bayesian inference.


The Wikipedia article contains an adequate explanation of what Bayesian inference is, but it is too wordy for the purposes of this post. I will therefore define Bayesian inference in a simpler, faster way:

Bayesian inference consists of combining the knowledge we have before analyzing our dataset (known as prior knowledge, or just the “prior”) with what we learn from analyzing the data (which is captured by the likelihood function). We combine the two to reach the greatest certainty we can about the quantities of interest, although we are constrained, sometimes severely, by the assumptions we make about the data and by the model we use.
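In symbols, this is just Bayes’ theorem, writing $\theta$ for the parameters we want to estimate and $y$ for the observed data:

$$p(\theta \mid y) \;\propto\; p(y \mid \theta)\, p(\theta),$$

where $p(\theta)$ is the prior, $p(y \mid \theta)$ is the likelihood, and $p(\theta \mid y)$ is the posterior distribution from which all the estimates below are drawn.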

Having defined Bayesian inference, let’s move on to the actual estimates. For each team there are three parameters (offense, defense and home court advantage) and, therefore, three estimates. These are posterior estimates, obtained after conditioning on the number of points scored by both teams in each game (as well as which team played at home). Figures 1 through 3 show last season’s estimates for every team (Fig. 1 for offense, Fig. 2 for defense, and Fig. 3 for home court advantage). The estimates appear in increasing order: the first team shown has the weakest point estimate and the last has the strongest.
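For readers who want to see the mechanics, here is a minimal sketch of how point estimates like the ones in the figures can be pulled from MCMC output. The array shapes, sizes and team names are illustrative assumptions, not the actual code behind the model:

```python
import numpy as np

# Hypothetical MCMC output: one row per posterior draw, one column
# per team, for a single parameter (e.g. offense).
# Shape: (n_draws, n_teams)
offense_draws = np.random.normal(loc=0.0, scale=0.1, size=(4000, 30))
teams = [f"Team {i}" for i in range(30)]  # placeholder names

# The point estimate for each team is its posterior mean, and a
# 95% credible interval comes from the quantiles of the draws.
point_est = offense_draws.mean(axis=0)
lower, upper = np.percentile(offense_draws, [2.5, 97.5], axis=0)

# Sort teams in increasing order of point estimate, which is how
# Figures 1-3 are arranged.
for i in np.argsort(point_est):
    print(f"{teams[i]}: {point_est[i]:.3f} "
          f"[{lower[i]:.3f}, {upper[i]:.3f}]")
```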

After looking through the graphs, we find some interesting takeaways, such as the Raptors having both a top-10 defense and a top-10 offense, the only team to do so. The Cavaliers and the Suns are the only teams in the bottom 10 for all three parameters. There are also teams that are weak at scoring both at home and away but make up for it with their defense, like the Heat and the Pacers. On the other end of the spectrum, two teams were top 10 on offense and home court advantage but weaker on defense: the Clippers and the Sixers.

There are many other points of interest, but let’s save them for the Discussion section. I would like to touch on the convergence of the chains in this model. This isn’t a statistical paper, so I will not dive too deeply into convergence analysis (but this link will help those interested). I will, however, show one good example of convergence and one bad one, with trace plots of individual parameters.

The bad example comes from the Pacers’ offense, in Figure 4, where we can see some evidence to doubt that the Markov chain has converged. Figure 5, showing the Spurs’ defense, provides enough evidence that we can probably go forward with the assumption that the chain has converged.
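For those who want a numeric diagnostic to go with the trace plots, below is a sketch of the classic Gelman-Rubin statistic, $\hat{R}$, which compares between-chain and within-chain variance; values close to 1 are consistent with convergence. This is the textbook version, not necessarily the exact check used for the figures:

```python
import numpy as np

def gelman_rubin(chains):
    """Basic Gelman-Rubin R-hat for one scalar parameter.

    chains: array of shape (n_chains, n_samples), the post-warmup
    draws of a single parameter from several independent chains.
    """
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    # Between-chain variance (B) and mean within-chain variance (W).
    B = n * chain_means.var(ddof=1)
    W = chains.var(axis=1, ddof=1).mean()
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * W + B / n
    return np.sqrt(var_hat / W)

# Example: four well-mixed chains should give R-hat close to 1.
rng = np.random.default_rng(0)
good_chains = rng.normal(size=(4, 1000))
print(gelman_rubin(good_chains))  # ~1.00
```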

Model Comparisons

With estimates for the qualities of each team, we can try to predict game results. However, to achieve good results, it is not enough to simply generate predictions from one model; we need a way to compare the quality of its predictions with those of other models. The most common way to do this is to check whether a model “hit” on the outcome of a game, and then count how many “hits” the model had in a given week (a model “hits” when the result that happened was the result to which it had given the highest odds).

There are, though, several other ways to measure the predictive quality of a model. One alternative is to look at the probability a model assigned to what actually happened, given what was known before the game. In this method of comparison, that probability is called the predictive likelihood; for those interested, a mathematical definition can be found here. It is worth pointing out that, though I will use the phrase “predictive likelihood” throughout this piece, I will actually be referring to the approximate predictive likelihood, since the exact likelihood cannot be calculated in this model.
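Concretely, if $p_g(r_g)$ denotes the probability a model assigned to the result $r_g$ that actually occurred in game $g$, then over $n$ games the predictive likelihood is

$$\text{PL} = \prod_{g=1}^{n} p_g(r_g).$$

Higher is better: a model that consistently puts high probability on the results that end up happening earns a larger product.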

Now, to illustrate how these methods work, a small set of games will be used:

Home Team | Away Team | Score
A         | B         | 100 x 99
C         | D         | 100 x 80
E         | F         | 90 x 100

Let’s define three models to compare. The first one will be a regular prediction model. The second (which we will call the simple model) sets the odds of each team winning from league averages as of the time the games occurred; that is, the probability the model gives that a home team will win its game is equal to the proportion of home wins in the league so far. Thus, the odds given by the simple model for wins by A, C and E are all equal.

The simple model is so named because it is clear from the way it calculates its probabilities that the only factor influencing its outcome predictions is which team is the home team and which is the away team in each game; a one-line formula is given below.
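In other words, at any point in the season the simple model assigns every home team the same winning probability,

$$\hat{p}_{\text{home}} = \frac{\text{number of home wins so far}}{\text{number of games played so far}},$$

and the away team gets the complement, $1 - \hat{p}_{\text{home}}$.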

These are the probabilities for the prediction model:

Home Team | Away Team | Home Team wins | Away Team wins
A         | B         | 55%            | 45%
C         | D         | 70%            | 30%
E         | F         | 45%            | 55%

And these are for the simple model:

Home Team | Away Team | Home Team wins | Away Team wins
A         | B         | 60%            | 40%
C         | D         | 60%            | 40%
E         | F         | 60%            | 40%

The third and last model, which we will term the null model, decides every game by flipping a coin: both results have probability 50%. It is called the null model because it embodies the idea that there is no need for prediction models at all, since game outcomes are taken to have nothing to do with the quality of the teams.

Now to the model comparison. Under the first method, there is no way for the null model to “hit” on a result, since it gives every outcome the same probability, so hit counting can only be applied to the other two models. The predictive likelihood (the PL column), however, can be calculated for all three. Since the null model is, by definition, the presumptive worst model at predicting anything, I will use it as the standard: each model’s predictive likelihood is divided by the null model’s, so any value above 1 means the model predicted the observed results better than the null model. I will call that new number the normalized predictive likelihood (the NPL column).
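As a sanity check, here is a minimal Python sketch that reproduces every number in the table below from the three games and the probabilities given above:

```python
import numpy as np

# Probability each model gave to a HOME win in games A-B, C-D and
# E-F, plus what actually happened.
home_win_probs = {
    "Prediction Model": [0.55, 0.70, 0.45],
    "Simple Model":     [0.60, 0.60, 0.60],
    "Null Model":       [0.50, 0.50, 0.50],
}
home_won = [True, True, False]  # E lost at home to F

null_pl = 0.5 ** 3  # the null model's predictive likelihood
for name, probs in home_win_probs.items():
    # Probability the model assigned to the observed result of each game.
    observed = [p if won else 1 - p for p, won in zip(probs, home_won)]
    pl = float(np.prod(observed))
    if all(p == 0.5 for p in probs):
        hits = "NA"  # with tied odds, the null model cannot "hit"
    else:
        hits = sum((p > 0.5) == won for p, won in zip(probs, home_won))
    print(f"{name}: hits={hits}, PL={pl:.5f}, NPL={pl / null_pl:.3f}")
```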

Model            | Hits | PL      | NPL
Prediction Model | 3    | 0.21175 | 1.694
Simple Model     | 2    | 0.144   | 1.152
Null Model       | NA   | 0.125   | 1

Comparing the models according to the values in the table is straightforward. The Hits column barely separates the prediction model from the simple one (a single game out of three). The predictive likelihood, though, quantifies what can be seen without any calculation: the prediction model is better, seeming to evaluate the inherent qualities of each team more accurately, even pointing toward team F being the better team despite playing away from home.

With the explanations done, we can now compare real models. For this section, I will compare my model to the null model, to the simple model, and to FiveThirtyEight’s predictions from their (now retired) CARMELO model.

In the following table, there is a row for each model, as well as a column for how many games were analyzed (997 for all of them, since I used the games of the first five weeks of the season to train my model), one for how many “hits” each model got (divided by the number of games analyzed, so I will label the column “Hit rate”), another two for the predictive likelihood (PL) and the normalized predictive likelihood (NPL), and a final column that I will call the standardized predictive likelihood (SPL). The standardized predictive likelihood is each model’s predictive likelihood raised to the power of the inverse of the number of games it predicted, so that all the models are on the same per-game scale; the formula appears below.
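In symbols, for a model evaluated on $n$ games,

$$\text{SPL} = \text{PL}^{1/n} = \left( \prod_{g=1}^{n} p_g(r_g) \right)^{1/n},$$

which is the geometric mean of the probabilities the model gave to the observed results. Because it is a per-game number, it also allows a fair comparison between models evaluated on different numbers of games, which will matter for the soccer table further down.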

Model        | Games analyzed | Hit rate | PL                | NPL             | SPL
Static Model | 997            | 0.655    | 3.08 x 10^{-269}  | 4.125 x 10^{31} | 0.538
CARMELO      | 997            | 0.675    | 2.739 x 10^{-265} | 3.668 x 10^{35} | 0.543
Simple Model | 997            | 0.587    | 8.07 x 10^{-295}  | 1.081 x 10^{6}  | 0.507
Null Model   | 997            | NA       | 7.466 x 10^{-301} | 1               | 0.5
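A practical aside about those tiny numbers: a predictive likelihood like 7.466 x 10^{-301} sits near the smallest value a 64-bit float can hold, so multiplying nearly a thousand probabilities directly can underflow to zero. Any implementation of these tables needs to accumulate in log space, along these lines (the per-game probabilities here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical per-game probabilities a model gave to the observed
# results of 997 games.
observed_probs = rng.uniform(0.3, 0.9, size=997)

log_pl = np.sum(np.log(observed_probs))     # log of the PL
spl = np.exp(log_pl / len(observed_probs))  # SPL = PL^(1/n)
log_npl = log_pl - 997 * np.log(0.5)        # vs. the null model

print(f"log10(PL) = {log_pl / np.log(10):.1f}, SPL = {spl:.3f}")
print(f"log10(NPL) = {log_npl / np.log(10):.1f}")
```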

And, for good measure, here is the table I showed at the end of the last post, now better explained. It is important to remember that the models in this table were applied to soccer data, where each game has three possible outcomes (home win, draw, away win); the null model therefore assigns probability 1/3 to each result, which is why its SPL is 0.333 rather than 0.5.

Model           | Games analyzed | Hit rate | PL                | NPL             | SPL
Static Model    | 419            | 0.511    | 6.268 x 10^{-181} | 5.14 x 10^{19}  | 0.3715
FiveThirtyEight | 260            | 0.527    | 1.139 x 10^{-112} | 1.283 x 10^{12} | 0.3711
Simple Model    | 419            | 0.501    | 1.585 x 10^{-189} | 1.3 x 10^{11}   | 0.3543
Null Model      | 419            | NA       | 1.22 x 10^{-200}  | 1               | 0.333

Discussion

What do the results show us? First of all, a word about the home court and offense estimates: readers may have noticed that some of the better offensive teams (like, say, the Denver Nuggets) do not have the best offensive estimates, though they do have good home court advantage estimates. The problem arises from the fact that, as defined, the model is only partially identifiable. In layman’s terms, that means the estimation process, because both offense and home court advantage positively impact the number of points a team scores, cannot really tell whether a team is good at scoring points thanks to its offense or thanks to its home court advantage.

Therefore, teams that score many more points at home than away (even if they still score a lot of points as the away team) will see much higher estimates for home court advantage than for offense; the opposite is true for teams with more balanced scoring. This lack of identifiability is something that can be improved on in the future, though this is not the place for a full treatment; the schematic below just illustrates the mechanism.
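To make the confounding concrete, consider a schematic version of the scoring part of the model (a simplification for illustration, not necessarily the exact parameterization from the first post): if the expected points a home team $i$ scores against a visiting team $j$ look like

$$\mathbb{E}[\text{home points}] = f(\text{off}_i + \text{hca}_i - \text{def}_j),$$

then $\text{off}_i$ and $\text{hca}_i$ enter home games only through their sum, so home results alone cannot separate them; only the away games, where $\text{hca}_i$ drops out, pull the two apart. Teams with a large home/away scoring split therefore load onto $\text{hca}_i$, while balanced teams load onto $\text{off}_i$.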

This is the place, however, to critique the performance of the model as shown in the last two tables. I chose to once again compare its predictions to FiveThirtyEight’s not only because it tied in well with my original project, but because, at the time I was choosing a model to compare mine with, FiveThirtyEight’s predictions were among the best available. Readers should feel free to suggest other prediction models for comparison as well.

In general, FiveThirtyEight had a better hit rate on both the basketball and the soccer data. The number I focus on, though, is the standardized predictive likelihood, where my model barely edged out FiveThirtyEight’s on the soccer data but narrowly lost on the NBA data.

It is important, though, to point out that whether you look at the hit rate or at the standardized predictive likelihood, my model captured roughly 80% of FiveThirtyEight’s improvement over the simpler baselines. That means a model with very simple inputs, restricted to only one season of data, with very weak priors, was worth roughly 80% of what the vastly more complex CARMELO was worth.

The comparison shows the importance of mathematical robustness when defining models, as well as the value of building on probability theory: a very simple (and not perfectly identifiable) model can compete with a much more complex one while also being much better than both the null and simple models.

Before ending a post that is already quite long, I’d like to remind readers that this is the second in a series of posts. The next article will deal with relating the parameters of the model to statistics commonly used by the basketball analytics community. Until then, leave your thoughts in the comments and check back soon!
