A Prediction Model for Sports Match Results – Part 4


The first three parts of this series of articles focused on a static model for predicting match results. In that model, our measures of team quality (offensive quality, defensive quality, and home field advantage) were static, meaning that they were not allowed to change as time progressed. In other words, a team was considered to have the same quality whether you were trying to predict its results in week 1 or in the last week of the season.

That doesn’t seem like a reasonable assumption, though. Players get injured or traded, or their performances fall off during the season. A new coach may be brought in, or the team’s system may start working differently. Whatever the reason, assuming that a team stays the same throughout the season is simply inadvisable.

In this post, then, I will introduce a dynamic adaptation of the original model. I will refer to this new model as the “Dynamic Model”, while the old one will be termed the “Static Model”. In the Methodology section, I will show the equations that define this adaptation. In the Results section, I will compare the two models’ results. Finally, the Discussion will dive into what we can learn from this study.

Methodology

In the dynamic adaptation of the model, the main equations change a bit, with an article by Glickman in mind. In the original model,

while in the dynamic model,

In other words, these equations show what I had already written: team qualities are allowed to change with time. This is reflected by the superscript t added to every parameter.

Now, as to how time is able to change team strengths, the relevant equations are shown below:

An intuitive explanation of what these equations mean requires a word on the role of uncertainty. In statistics, we make frequent use of random variables, which are variables whose possible values belong to a certain set. That set depends on the type of random variable (the number of points a team scores, for example, can only be a non-negative integer), but the defining feature of a random variable is its randomness. Randomness, in this context, means that we are uncertain about the value the random variable will assume until we observe its outcome.

A different kind of uncertainty arises when we talk about the variance of a random variable. Variance is a dispersion measure – high variance means that the possible outcomes for a random variable are highly spread out (as such, outcomes that seem “crazier” are a bit more probable), while low variance means a lesser spread (the “crazier” results are a little less probable).
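To make the dispersion idea concrete, here is a small sketch. It assumes, purely for illustration, two normally distributed variables with the same mean of 0 and different standard deviations, and computes how likely each is to land more than 2 units away from its mean (a “crazier” outcome).

```python
import math


def prob_beyond(threshold, sd):
    """P(|X| > threshold) for X ~ Normal(0, sd^2)."""
    # For a normal variable, P(|X| > c) = 1 - erf(c / (sd * sqrt(2))).
    return 1.0 - math.erf(threshold / (sd * math.sqrt(2.0)))


for sd in (1.0, 2.0):
    print(f"sd={sd}: P(|X| > 2) = {prob_beyond(2.0, sd):.4f}")
```

With a standard deviation of 1 the probability is about 0.046; doubling the standard deviation (quadrupling the variance) raises it to about 0.317, even though the mean is unchanged.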

In this dynamic model, where time is measured by the week of the season in which a game took place, the offensive quality of a team in week t is similar to its offensive quality in the following week. Variance is the reason they’re similar instead of equal (as they would be in the static model): every week, we take the previous week’s offensive quality and add a new term (the omegas, ω, shown in the equations).

That added term is a random variable with zero mean and non-zero variance. Since it is independent of the previous week’s offensive quality, all the term does is add variance; it adds uncertainty. Then, when the new week’s results come in, that week’s parameters are estimated, and they will differ from the previous week’s by an amount that depends on how far the week’s results depart from the teams’ previous norms.

In other words, what happens in the dynamic model is that every week we look at a team’s previous performance, but we don’t take it as gospel. We consider the team’s previous performance to be a suggestion of how good that team is, but how good it truly is will be decided after the week’s results. This process repeats itself every week – meaning that previous performance matters, but the most recent performance matters more.
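The week-to-week step can be sketched as a random walk. This is a minimal simulation, not the fitted model: it assumes the ω terms are normal with mean 0 and a fixed, hypothetical standard deviation `sigma`, and it leaves out the data entirely. It therefore only shows how uncertainty about a team’s offensive quality accumulates from week to week before any new results are observed; in the actual model, each week’s results then pull the estimate toward what the data suggest.

```python
import random
import statistics


def simulate_offense(start, weeks, sigma, rng):
    """One path of offense[t] = offense[t-1] + omega_t, omega_t ~ Normal(0, sigma^2)."""
    path = [start]
    for _ in range(weeks):
        omega = rng.gauss(0.0, sigma)  # drawn independently of the previous week's value
        path.append(path[-1] + omega)
    return path


rng = random.Random(2019)
sigma = 0.05  # hypothetical weekly standard deviation of omega (illustration only)
weeks = 25
paths = [simulate_offense(0.0, weeks, sigma, rng) for _ in range(20_000)]

for t in (1, 5, 25):
    spread = statistics.pvariance([p[t] for p in paths])
    # Each independent omega adds sigma^2, so after t weeks the variance is about t * sigma^2.
    print(f"week {t}: simulated variance {spread:.5f}, theory {t * sigma**2:.5f}")
```

The larger `sigma` is, the faster this uncertainty grows, and the more a single week’s results can move a team’s estimate.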

Now, since this is a Bayesian model, we need to talk about its priors. The chosen priors are uninformative and shown below:

Three things that are true of the static model remain true of the dynamic model. The first is that, since the priors are uninformative, the posterior estimates will depend mostly on the likelihood function, which still has a known form because $X_{ij}^t|\theta$ has a known distribution. The second is that the model is unidentifiable unless restrictions are added. The restrictions employed are that the offensive factors of a given week must sum to 0, and the same goes for the defensive factors and the home court factors. Then, if a league has N teams,

Lastly, we are still using MCMC methods to sample from the parameters’ posterior distributions.
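Why a sum-to-zero restriction is needed can be sketched with a toy example. The names, the values for N = 4 teams, and the additive log-scale predictor (a team’s offensive factor plus its opponent’s defensive factor) are all hypothetical and chosen only to illustrate the idea. Under that assumed form, shifting every offensive factor up by a constant and every defensive factor down by the same constant leaves all predictions unchanged, so the data cannot choose between the two. Subtracting the mean, one common way to impose the restriction, maps both versions to the same set of factors summing to zero.

```python
import math


def center(factors):
    """Shift factors so they sum to zero (a sum-to-zero identifiability constraint)."""
    mean = sum(factors) / len(factors)
    return [f - mean for f in factors]


def log_rate(attack, defense, i, j):
    """Hypothetical log-scale predictor: team i's offense against team j's defense."""
    return attack[i] + defense[j]


attack = [0.30, 0.10, -0.05, 0.15]   # hypothetical week-t offensive factors, N = 4
defense = [-0.10, 0.05, 0.20, 0.00]  # hypothetical week-t defensive factors

# Add c to every offensive factor and subtract c from every defensive factor.
c = 0.5
attack_shifted = [a + c for a in attack]
defense_shifted = [d - c for d in defense]
same = all(
    math.isclose(log_rate(attack, defense, i, j),
                 log_rate(attack_shifted, defense_shifted, i, j), abs_tol=1e-12)
    for i in range(4) for j in range(4) if i != j
)
print("predictions identical after shift:", same)

# After centering, both versions collapse to the same factors, which sum to zero.
print([round(a, 6) for a in center(attack)])
print([round(a, 6) for a in center(attack_shifted)])
```

Whether the original model enforces the restriction by centering or, for example, by defining one team’s factor as minus the sum of the others is not stated here; both approaches remove the same ambiguity.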

Results

First, the part of the experiment that didn’t go as planned: the original plan for this model was to have an extra prior, one for the variance of the added term. This means that I wanted the list of priors for the model to look like this

which is equal to what I showed before, except for the last line.

The importance of that last line can’t be overstated. Letting the variance of the added term be a random variable lets the data tell us how much weight we should give to previous performance. When it is handled as a known constant, what I’m actually saying is that I already know how much the parameters should vary, which is something I don’t feel comfortable with. Also, having a prior for the variance is what I did for the soccer data, and that proved helpful: the dynamic model slightly outperformed the static model, as can be seen in the following table.

| Model | Games analyzed | Hit rate | PL | NPL | SPL |
|---|---|---|---|---|---|
| Dynamic Model | 419 | 0.544 | 9.586×10^{-181} | 7.51×10^{19} | 0.3718 |
| Static Model | 419 | 0.511 | 6.268×10^{-181} | 5.14×10^{19} | 0.3715 |
| Simple Model | 419 | 0.501 | 1.585×10^{-189} | 1.3×10^{11} | 0.3543 |
| Null Model | 419 | NA | 1.22×10^{-200} | 1 | 0.333 |

Model comparison – Soccer data.

A different version of the above table was shown in the second post of this series, when I was introducing methods of model comparison. The difference between the tables is the first row of the one presented in this post, which shows the performance of the dynamic model when applied to all the soccer data. In fact, the dynamic model had the best predictive accuracy of all the models.
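The table’s columns were defined in the second post of the series. Judging from the figures themselves, PL appears to be the product, over all games, of the probability a model assigned to the outcome that actually happened; NPL appears to be PL divided by the null model’s PL (for the static model, 6.268×10^-181 / 1.22×10^-200 ≈ 5.14×10^19); and SPL appears to be the geometric mean, the n-th root of PL. The null model’s row matches a model that gives each of the three soccer outcomes probability 1/3. The sketch below computes the columns under that reading, which is an inference from the numbers rather than a restatement of the original definitions; hit rate is assumed to be the share of games where the most probable outcome occurred. It works in log space because products of hundreds of probabilities quickly approach the limits of floating-point numbers.

```python
import math


def score_predictions(probs, outcomes, n_outcomes):
    """
    probs: per-game lists of predicted probabilities, one entry per possible outcome.
    outcomes: index of the outcome that actually happened in each game.
    Returns (hit_rate, PL, NPL, SPL) under the definitions assumed above.
    """
    n = len(outcomes)
    # Log of the product of the probabilities assigned to what actually happened.
    log_pl = sum(math.log(p[o]) for p, o in zip(probs, outcomes))
    # Null model: every outcome equally likely, so its PL is (1 / n_outcomes) ** n.
    log_pl_null = n * math.log(1.0 / n_outcomes)
    # A "hit" is a game whose most probable outcome (first one on ties) occurred.
    hits = sum(max(range(len(p)), key=p.__getitem__) == o for p, o in zip(probs, outcomes))
    return hits / n, math.exp(log_pl), math.exp(log_pl - log_pl_null), math.exp(log_pl / n)


# Usage: the null model on 419 soccer games (three equally likely outcomes).
hit, pl, npl, spl = score_predictions([[1 / 3] * 3] * 419, [0] * 419, 3)
print(f"PL = {pl:.3e}, NPL = {npl:.3f}, SPL = {spl:.4f}")
```

For the null model this gives a PL of about 1.22×10^-200, an NPL of 1, and an SPL of 1/3, matching the last row above. Its hit rate is meaningless because all three outcomes always tie, which is consistent with the NA in the table.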

This performance history is why it is unfortunate that I couldn’t run the dynamic model for the NBA data with a prior for the variance of the added term. I can’t be completely sure why the desired model failed, but based on what my advisor told me a few years ago (that I should be careful with basketball data, since the high number of points scored in games could trouble the estimation process of MCMC methods) and on a look at the model’s estimates, I’m convinced that the estimation process got stuck on a local minimum before finding the global minimum (in other words, it got stuck on what it deemed a good solution and stopped looking for the best solution).

I will move on from this discussion shortly, but first I would like to show a smidgen of the evidence that convinced me. The following image shows the estimates for the offensive qualities of four teams over the weeks, with the black lines showing the quality of the worst and best teams according to the static model.

What that told me was that the estimate for the variance of the added term ended up too high, allowing the model’s estimates for the parameters to fluctuate too much. That is why Golden State’s estimates sometimes go below the lower black line, meaning that the Warriors were sometimes evaluated as the worst offensive team in the league. They also reach much higher than the upper black line, which means that they were measured as the best offensive team by miles. An estimation process that says a team can have the worst offense in the league or the best offense in NBA history, or anything in between, is not very useful. This effect also showed up in the dynamic model’s predictions, which fared much worse than its competitors’.

Since the model I wanted to use was not usable (there are ways to try to make it work, but they would take months, and I am not particularly confident in some of the simpler measures), I had to compromise. I had to treat the variance as a known constant, even though that amounts to doing something I am not comfortable with: telling the model how much the parameters should be allowed to vary. Because of that discomfort, I tried to choose a value that would let the estimates fluctuate enough for a team to go from the best to the worst, if the data showed that to be reasonable.
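The value itself isn’t given in this post. As a hypothetical illustration of how the requirement “a team should be able to go from the best to the worst” could be turned into a number: if the weekly added terms are normal with standard deviation σ, the total change over T weeks has standard deviation σ√T, so setting σ√T equal to the gap D between the best and worst static estimates gives σ = D/√T. The gap and season length below are made-up numbers, and this is not necessarily how the value used here was picked.

```python
import math
import random


def sigma_for_range(gap, weeks):
    """Weekly sd such that the total random-walk change over `weeks` has sd equal to `gap`."""
    return gap / math.sqrt(weeks)


gap = 0.40   # hypothetical gap between the best and worst static offensive estimates
weeks = 25   # hypothetical number of weeks in the season
sigma = sigma_for_range(gap, weeks)
print(f"sigma per week = {sigma:.3f}")

# Check by simulation: how often does a walk drift by at least the full gap in one season?
rng = random.Random(7)
trials = 10_000
drifted = sum(
    abs(sum(rng.gauss(0.0, sigma) for _ in range(weeks))) >= gap
    for _ in range(trials)
)
print(f"share of seasons with a change of at least the gap: {drifted / trials:.3f}")
```

Under these assumptions, roughly a third of simulated seasons move a team by the full best-to-worst gap, so such a swing is allowed without being the typical case.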

Without further ado, the following pictures will show the offensive factor estimates for four teams, using the model with known variance (which will still be called the dynamic model, just like the failed model was).

This image shows estimates that are much better behaved (fluctuating less wildly), which in turn leads to less wild predictions. Sadly, I can’t show the graphs for every team, since that would make this post unbearably long. Instead, I will include graphs that describe how much each parameter fluctuated for each team over the 2018-19 season.

Despite the large number of lines, my purpose in displaying this data is to make clear that the dynamic model lets team quality vary over time, sometimes by a lot. The prediction results should therefore be read keeping in mind that the model arrived at them in a very different way than the static model did.

The first graph shows how offensive qualities varied over the weeks; as expected, it is a mess of colors. However, if you follow a specific line, like Golden State’s (the yellow line at the top, showing that the Warriors were considered to have the best offense in the league by miles for a few weeks in the middle of the season), you can see that the team’s offense was always near or at the top.

The second graph shows the same movement for the teams’ defenses, with the Atlanta Hawks’ line (the red one at the bottom that doesn’t change much) showing that their defense was considered quite consistent in its ineptitude.

The last graph shows the same movement for the teams’ home court advantages. The one line that sticks out to me is the Chicago Bulls – the red line that is the lowest from about Week 9 through the end of the season. If you track the line’s progress, you’ll see that the Bulls started the season with an average home court advantage and then bottomed out hard, especially in the last few weeks.

Having shown the estimates, all that is left to do is show the results for the dynamic model’s predictions, comparing them to the results for the static model as well as those of the competing models (first shown in an earlier post of this series).

| Model | Games analyzed | Hit rate | PL | NPL | SPL |
|---|---|---|---|---|---|
| Dynamic Model | 997 | 0.663 | 2.87×10^{-269} | 3.838×10^{31} | 0.538 |
| Static Model | 997 | 0.655 | 3.08×10^{-269} | 4.125×10^{31} | 0.538 |
| Simple Model | 997 | 0.587 | 8.07×10^{-295} | 1.081×10^{6} | 0.507 |
| Null Model | 997 | NA | 7.466×10^{-301} | 1 | 0.5 |

Model comparison – NBA data.

Discussion

What these results show is that the dynamic model fared a bit better than the static one on some measures (hit rate) and a bit worse on others (PL and NPL). The result is not inspiring, since the dynamic model is so much more complex; the more complicated model did not clearly beat the less complicated one. Viewed through the lens of the Parsimony Principle, the dynamic model may not be worth the effort, even though we may have many reasons to think that a model that lets team qualities vary through time should be much better than one that doesn’t.

In my experience with soccer data, however, that argument would be wrong. When writing my final project, I had problems running the dynamic model where the variance of the added term is a random variable. My advisor and I tried multiple values for that variance, treating it as a known constant. The result was what you should expect when estimating by trial and error: there were big misses, small misses, and no hits. There were situations where the dynamic model fared similarly to the static model, just as in the last table, and others where the static model was clearly better. It was only when the variance was allowed to be a random variable that the dynamic model decidedly won the competition.

As such, I expect that when the dynamic model is fixed, it should clearly outperform its direct competition. I would still not expect it to beat CARMELO, but as I’ve previously stated: a model that gets 80% or more of CARMELO’s performance using data that is vastly less complex than that which is used by FiveThirtyEight is probably a good model.

Finally, I have to admit that I myself was disappointed with the results shown in this post. It took me a whole month to get reasonable results (because I spent almost three weeks trying to get the dynamic model with a prior for the variance to work). In the end, they were not so much better than the static model’s that I could honestly claim it was worth the trouble, even understanding that the purpose of this series of posts was not to show that my model is the best at predicting sports results. The initial goal was simply to demonstrate that the model is very good at predicting match results while taking simple inputs and being mathematically robust.

In discussing these difficulties with Greg, we agreed on the importance of reporting my results even if they were not what I wanted. Greg also pointed out something quite important: the small difference in performance between the models may show that we should rethink how much weight we give to variations in team quality. That seemed like a very interesting point, because my work with dynamic models of sports data also gives me that impression.

To be clear, I do not think that accounting for these variations is meaningless. I just think that their impact may be overstated. Both the soccer data and the basketball data can be used to make my point. Even though the dynamic model lets team quality fluctuate by quite a bit, its predictions are not dramatically more accurate than the ones given by the static model. One possible explanation is that, though there are gains from accounting for things other than how many points a team scores and how many its opponent scores, most of the heavy lifting is done by simply modelling the points scored by both teams. Therefore, even when adding factors that prove significant in improving model quality, we are still talking about marginal gains.

I have now finished the part of this series where I focus on defining the model. The next post will be about how to improve this model, and, with that in mind, I will revisit some of the problems that arose during the last four years of my life, as well as ideas that interested me during that time frame.
