Last year, I faced impossible odds. I had a 1 in 9.2 quintillion shot of scoring myself a perfect bracket. And I fell miserably short. All four models I developed picked the wrong team in the first game that was played. It was fun while it lasted. One model was good enough to predict Virginia as the winner (and all models predicted Virginia in the final), but picking the winner isn't good enough, especially when that winner was a 1-seed.
This year the challenge remains the same: develop models that stay as close to perfect for as long as possible. Now that we have a baseline from 2019, how much better can we be in a year where chaos is to be expected and every team is capable of losing? Hopefully at least one model can get past the first round. I can only hope.
First off, based on last year's models, I'm retiring the Estimated Difference Likelihood Model (EDLM). It performed the worst, picking Nevada to win (Nevada was upset in the first round), and due to its complexity it took longer to organize and make predictions with. It is also more vulnerable to small sample sizes than the other models are.
| Model | Champion | Runner-Up | Points |
|---|---|---|---|
| SPM | Virginia | Duke | 115 |
| MASPM | Duke | Virginia | 88 |
| EDM | Gonzaga | Virginia | 86 |
| EDLM | Nevada | Virginia | 83 |
But given that I'm only retiring one model, we start this year with three models to work with and compare against. In addition to those three, I plan on developing two new models that use probability in different ways to achieve different goals. One model will use some form of the Expectation-Maximization (EM) algorithm, while the other will look to maximize utility by making contrarian picks.
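To make that contrarian idea a bit more concrete, here's a toy Python sketch of the trade-off that model will weigh. The framing and every number in it are made up for illustration; the real model comes later.

```python
# Toy illustration of the contrarian idea, using my own framing and made-up
# numbers: a pick's value in a pool is roughly its win probability weighed
# against how crowded that pick is. Not the final model, just the intuition.

def pick_value(win_probability, public_pick_rate):
    """Higher when a team is live to win but under-picked by the public."""
    return win_probability / max(public_pick_rate, 1e-6)

candidates = {
    "Heavy favorite": pick_value(0.25, 0.40),  # likely winner, but everyone has them
    "Live sleeper":   pick_value(0.10, 0.05),  # less likely, but barely anyone has them
}
print(max(candidates, key=candidates.get))  # the sleeper wins on this measure
```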
If the five models disagree with each other enough (which they didn't last year), I may also put together a composite model for fun. Unlike last year, at the end of this chaotic month I plan on cleaning up my dirty code and making it available on my GitHub for anyone to borrow.
Like last year, I'll cover some of the conference tournaments as trial runs for the different models and to judge their effectiveness against each other. Throughout conference tournament season, I'll be sure to highlight the differences between the models and how those differences show up in the predictions the models make.
To conclude this post, I'll briefly cover last year's models and link to last year's posts that cover each in more detail. The first round of conference tournaments starts Tuesday, with four leagues beginning tournament play. I'll try to cover each of those conferences the day before play; making predictions beforehand is more fun, a lot more impressive, and gives more credence to the idea that the math magic I'm wielding is in fact doing something.
Returning Models
Schedule Plus / Minus (SPM) - 1st Place, 2019
This model measures the difference between two teams by comparing the difficulty of each team's schedule with the results that team is able to obtain against it. It mostly works by multiplying win-loss ratios against the strength of the corresponding schedules, adding that score in the case of a win, subtracting it in the case of a loss, and then averaging the score over the course of the season. Teams that rank the highest in SPM typically only lose to other teams that win a lot, and have lots of wins over other winning (strong) teams.
To read more about SPM, click here.
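For the curious, here's a rough Python sketch of how I read that computation. The exact weighting lives in last year's post, so treat the data structures and the per-game "score" here as placeholders rather than the real implementation.

```python
# Rough sketch of the SPM idea: each game credits or debits a score built
# from a win-loss ratio multiplied by schedule strength, and the season
# rating is the average of those credits and debits.

def opponent_score(record, schedule_strength):
    """Win-loss ratio multiplied by schedule strength."""
    wins, losses = record
    return (wins / max(wins + losses, 1)) * schedule_strength

def spm(games):
    """games: list of (opponent_record, opponent_schedule_strength, won_flag)."""
    contributions = []
    for record, strength, won in games:
        score = opponent_score(record, strength)
        # Add the score for a win, subtract it for a loss...
        contributions.append(score if won else -score)
    # ...then average over the course of the season.
    return sum(contributions) / len(contributions)

# Hypothetical two-game season: a win over a strong team, a loss to a weak one.
print(spm([((20, 5), 1.3, True), ((8, 17), 0.7, False)]))
```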
Margin-Adjusted Schedule Plus / Minus (MASPM) - 2nd Place, 2019
This model is a derivative of the SPM model covered above. It also measures schedule difficulty and how a team gets results against that schedule. The difference is that this model takes margin of victory into account as well. Rather than viewing wins and losses as binary, a sigmoid function is applied to partially assign wins or losses based on whether the result was a close game or a sizable blowout. Teams that rank the highest in MASPM typically win big and win regularly against other competition.
To read more about MASPM, click here.
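Again, just to illustrate the idea: below is a small Python sketch of the margin adjustment on its own. The sigmoid scale of 10 points is a made-up constant for the example, and the schedule component works the same way as in the SPM sketch above.

```python
import math

# Sketch of the margin adjustment only. Instead of win = 1, loss = 0, each
# game result is squashed into a partial win/loss based on point margin.

def partial_result(point_margin, scale=10.0):
    """Squash the margin into (0, 1): ~0.5 for a one-point game,
    near 1.0 for a blowout win, near 0.0 for a blowout loss."""
    return 1.0 / (1.0 + math.exp(-point_margin / scale))

def maspm(games):
    """games: list of (opponent_score, point_margin); margin > 0 means a win."""
    contributions = []
    for opponent_score, margin in games:
        # Map the partial result from (0, 1) to (-1, 1) so a close game
        # contributes little and a blowout contributes nearly the full
        # +/- opponent score, mirroring SPM's add/subtract.
        contributions.append(opponent_score * (2.0 * partial_result(margin) - 1.0))
    return sum(contributions) / len(contributions)

# Hypothetical season: a 20-point win over a good team, a 2-point loss to another.
print(maspm([(1.1, 20), (1.0, -2)]))
```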
Expected Difference Metric (EDM) - 3rd Place, 2019
This model measures the difference between two teams by comparing their results against other teams in terms of scoring margin. The EDM rating is based on the idea that the margin of victory grows as the difference between the two teams grows. So a team only increases its rating after a game if it outperforms the expected margin. This means losing teams can increase their rating in close losses and winning teams can decrease their rating in tight wins. Teams that rank the highest in EDM win big regularly and dominate weaker competition in blowouts.
To read more about EDM, click here.
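And one more sketch to show the kind of update rule I'm describing, not the exact code I run; both constants below are placeholders.

```python
# Sketch of the EDM update: the expected margin scales with the rating gap,
# and ratings move only by how much the actual margin beats or misses that
# expectation. points_per_rating and learning_rate are made-up constants.

def edm_update(rating_a, rating_b, actual_margin,
               points_per_rating=1.0, learning_rate=0.1):
    """actual_margin is team A's score minus team B's score (negative if A lost)."""
    expected_margin = (rating_a - rating_b) * points_per_rating
    surprise = actual_margin - expected_margin
    # A winner that wins by less than expected (surprise < 0) loses rating;
    # a loser that keeps it closer than expected (surprise > 0) gains rating.
    return rating_a + learning_rate * surprise, rating_b - learning_rate * surprise

# Hypothetical: a team "expected" to win by 10 only wins by 3, so its rating drops.
print(edm_update(15.0, 5.0, actual_margin=3.0))
```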