Kimball Basketball Model (KBM)
2012 is the second year that KBM has been in operation. Several modifications made since 2011 are incorporated into the following text.
The KBM is an unbiased calibrated iterative model. It is unbiased in that it begins with the assumption that all teams and players are equal. It then computes predicted scores (they will all be the same the first time this is done), compares them against observed scores, alters the players and teams accordingly, and computes the values again, thus repeating the cycle. In this manner the numbers representing each player and team are calibrated against observed target values repeatedly until the modeled and observed scores are acceptably close (20 iterations are currently being used).
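The calibration cycle can be sketched in a few lines. This is a deliberately reduced illustration, not KBM's actual code: it tracks a single points-per-minute skill per player, ignores defense and game location, and uses a hypothetical `step` of 0.5 that moves each skill halfway toward the value the observations imply. The function name, arguments and sample data are all assumptions made for the example.

```python
def calibrate(observed_points, minutes_played, iterations=20, step=0.5):
    """Calibrate one points-per-minute skill per player (simplified sketch)."""
    # Unbiased start: every player begins with the same skill.
    skills = {player: 1.0 for player in observed_points}
    for _ in range(iterations):
        for player, skill in skills.items():
            predicted = skill * sum(minutes_played[player])
            observed = sum(observed_points[player])
            if predicted > 0:
                # Move the skill part of the way toward the observed/predicted ratio.
                skills[player] = skill * (1 + step * (observed / predicted - 1))
    return skills


# Hypothetical data: points and minutes for two players over two games.
skills = calibrate({"X": [10, 20], "Y": [6, 0]}, {"X": [20, 40], "Y": [30, 10]})
```

With this update rule the gap between modeled and observed values halves on every pass, so 20 iterations leave the skills very close to the observed points per minute (0.5 for X and 0.15 for Y here).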
Each player has certain skill numbers used to predict their score for any given game. The player's skill numbers are as follows:
1) Free throw shooting skill
2) 2-point shooting skill
3) 3-point shooting skill
4) Free throw defending skill
5) 2-point defending skill
6) 3-point defending skill
To estimate how many of each type of basket a player will score in a game, divide his shooting skill by his opponents' defending skill and multiply the result by the number of minutes played. For instance, if Player X has a 2-point shooting skill of 1.0, his opponents have a defense-against-2-pointers skill of 10, and he is expected to play 30 minutes, then KBM estimates that he will score 1.0 x 30 / 10 = 3 two-pointers.
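The same arithmetic, written as a small sketch. The function and dictionary names are made up for illustration, and turning baskets into points by multiplying each basket type by its value (1, 2 or 3) is an assumption about how the totals are combined.

```python
# Point value of each basket type.
BASKET_VALUES = {"free_throw": 1, "two_point": 2, "three_point": 3}


def expected_baskets(shooting_skill, defending_skill, minutes):
    """Shooting skill divided by the opponents' defending skill, times minutes played."""
    return shooting_skill * minutes / defending_skill


def expected_points(shooting, defending, minutes):
    """Sum the expected baskets of each type, weighted by basket value (assumed)."""
    return sum(
        value * expected_baskets(shooting[kind], defending[kind], minutes)
        for kind, value in BASKET_VALUES.items()
    )


# Player X from the text: skill 1.0 against defense 10 over 30 minutes.
two_pointers = expected_baskets(1.0, 10.0, 30)  # 3.0
```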
Player skills are listed on the player rankings page, with the shooting skill multiplied by a factor of 100, and the defense skill multiplied by a factor of 10 (for visual purposes). Values on that page are also rounded to the nearest whole number. The 'Offense Skill' is a weighted average of the individual shooting scores (weighted according to the value of the basket), as is the 'Defense Skill.'
The location of each game is taken into consideration to compensate for home-court advantage. Each team has a "home multiplier" and an "away multiplier" that are multiplied by the predicted score. Neutral games use the average of the home and away multipliers. Both home and away multipliers are further divided into offense and defense. Each team starts with multipliers of 1, and these numbers are adjusted according to observed values in the calibration process.
KBM also factors in time. When performing calculations to compare against observed values, the KBM weights recent games more heavily than older games. The weighting curve is inverse-exponential, with a half-life of 30 days and a convergence value of 0.1. Thus, teams that have performed better in recent games are rated higher than teams that performed better at the beginning of the season.
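One plausible reading of that weighting curve is sketched below. The text does not give the exact formula, so this assumes that a game played today has weight 1, that the part of the weight above 0.1 halves every 30 days, and that very old games settle at the convergence value of 0.1. The function names are hypothetical.

```python
def game_weight(days_ago, half_life_days=30.0, floor=0.1):
    """Weight of a game played `days_ago` days before the date being modeled."""
    # Exponential decay toward `floor` (assumed interpretation of "convergence value").
    return floor + (1.0 - floor) * 0.5 ** (days_ago / half_life_days)


def weighted_mean(values, days_ago):
    """Average of per-game values, with recent games counting more."""
    weights = [game_weight(d) for d in days_ago]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

Under these assumptions a game from 30 days ago counts a little over half as much as one played today, and a team that scored 80 today and 60 two months ago averages closer to 80 than to 60.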
As of 2012, KBM factors in injuries. When estimating scores of games it is necessary to estimate the number of minutes each player will play. KBM uses a time-weighted formula for this. However, when a player is on the injured list, that player is now given an estimated time of zero. The minutes that he would have played are distributed among the other players, depending on how many extra minutes they have available, their similarity to the injured player in position, and their rank on the team's depth chart.
As of 2012, KBM also uses performance under pressure. Because this requires a large sample for an accurate determination, it is only used at the end of the season, during the tournament. The general concept is that some teams play harder against better teams, and some teams play harder against worse teams. For the tournament it is assumed that all teams play as if they were facing the toughest opponent. This factor is derived through linear regression and multiplied by the team's overall score. As a result, many more upsets have been forecast than in 2011.
The KBM is replete with weighted averages. For instance, the observed score against an individual player is impossible to assess absolutely. It is obvious who scored, but it is impossible, looking at numbers alone, to determine which defender was at fault. Thus, the points scored against a particular player during a game are estimated using a weighted average, weighted according to difference in position and minutes played by the scorer.
The KBM assumes that each player is most likely to interact with players of the same position, and half as likely to interact with players of the next-nearest position. For instance, a Guard-Guard interaction is twice as likely as a Guard-Guard/Forward interaction, and four times as likely as a Guard-Forward interaction. A Guard-Center interaction is the least likely of all, being 16 times less likely than a regular Guard-Guard interaction.
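The halving rule can be expressed as a small lookup, sketched here with hypothetical names. It assumes the five positions sit on a line from Guard to Center and scales the weights so that the least likely pairing (Guard-Center) has weight 1, which gives a same-position weight of 16.

```python
# Positions ordered from Guard to Center; adjacent entries are "next-nearest".
POSITIONS = ["Guard", "Guard/Forward", "Forward", "Forward/Center", "Center"]


def interaction_weight(position_a, position_b):
    """Relative likelihood that players at these two positions interact."""
    distance = abs(POSITIONS.index(position_a) - POSITIONS.index(position_b))
    # Same position gets 16; each step apart halves the weight, down to 1.
    return 2 ** (len(POSITIONS) - 1 - distance)
```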
Take the following test case. We will estimate the score against a hypothetical player named Player A, who is a Forward. The observed points for the other team are as follows:
Opponent A, Forward, 10 minutes, 2 pts
Opponent B, Guard, 20 minutes, 8 pts
Opponent C, Guard, 35 minutes, 14 pts
Opponent D, Guard/Forward, 30 minutes, 18 pts
Opponent E, Forward/Center, 25 minutes, 6 pts
Opponent F, Center, 15 minutes, 4 pts
Opponent G, Forward, 25 minutes, 14 pts
Their values are weighted and summed, as follows:
Opponent A, 10 x 2 x 16 = 320
Opponent B, 20 x 8 x 4 = 640
Opponent C, 35 x 14 x 4 = 1960
Opponent D, 30 x 18 x 8 = 4320
Opponent E, 25 x 6 x 8 = 1200
Opponent F, 15 x 4 x 4 = 240
Opponent G, 25 x 14 x 16 = 5600
The sum, 14280, is divided by the sum of the weights (minutes x position factor for each opponent), 1280, and the result is about 11.16 points scored against Player A. This number is then run through another weighted average along with the values of all of Player A's teammates, according to how much time they played. All of their numbers are finally normalized so that the sum of all points scored against Player A and his teammates matches the points scored by the opponents. Assuming Player A played only half the game, the points scored against him would be roughly halved, from 11.16 to 5.58.
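The arithmetic of this test case can be reproduced with a short sketch. The tuple layout and variable names are made up for the example; the position factors are the ones used in the list above for a Forward defender.

```python
# (name, position factor relative to a Forward, minutes, points)
opponents = [
    ("Opponent A", 16, 10, 2),   # Forward
    ("Opponent B", 4, 20, 8),    # Guard
    ("Opponent C", 4, 35, 14),   # Guard
    ("Opponent D", 8, 30, 18),   # Guard/Forward
    ("Opponent E", 8, 25, 6),    # Forward/Center
    ("Opponent F", 4, 15, 4),    # Center
    ("Opponent G", 16, 25, 14),  # Forward
]

# Numerator: minutes x points x position factor, summed over all opponents.
weighted_points = sum(factor * minutes * points
                      for _name, factor, minutes, points in opponents)
# Denominator: minutes x position factor, summed over all opponents.
total_weight = sum(factor * minutes
                   for _name, factor, minutes, _points in opponents)

points_against = weighted_points / total_weight  # 14280 / 1280 = 11.15625
```

This covers only the first weighted average; the second pass over Player A's teammates and the final normalization are not shown.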
A similar weighted average is performed to estimate the skill score against a player in any particular game.
Beginning at the 11th iteration the home multiplier takes effect. All teams begin with a multiplier of 1, or no multiplier, and the predicted scores for home, away and neutral games are totaled. Each total is compared against the respective observed total, and the multipliers are adjusted using the same 50% rule mentioned above. The home multiplier applies to the entire team, not to individual players.
There are, in actuality, four multipliers per team: an offense multiplier and a defense multiplier for home games, and the same pair for away games. The multipliers for neutral games are taken to be the raw average, or midpoint, of the home and away values.
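A minimal sketch of how the four multipliers might be stored and looked up follows. The class and function names are hypothetical, and the way the two teams' multipliers combine (the scoring team's offense multiplier times the defending team's defense multiplier) is an assumption, since the text only says the multipliers are applied to the predicted score.

```python
from dataclasses import dataclass


@dataclass
class LocationMultipliers:
    """The four per-team multipliers; every team starts at 1.0."""
    home_offense: float = 1.0
    home_defense: float = 1.0
    away_offense: float = 1.0
    away_defense: float = 1.0

    def for_location(self, location):
        """Return (offense, defense) multipliers for 'home', 'away' or 'neutral'."""
        if location == "home":
            return self.home_offense, self.home_defense
        if location == "away":
            return self.away_offense, self.away_defense
        if location == "neutral":
            # Neutral games use the midpoint of the home and away values.
            return ((self.home_offense + self.away_offense) / 2,
                    (self.home_defense + self.away_defense) / 2)
        raise ValueError(f"unknown location: {location!r}")


def adjusted_score(raw_score, scorer, scorer_location, defender, defender_location):
    """Scale a predicted score by both teams' location multipliers (assumed rule)."""
    offense, _ = scorer.for_location(scorer_location)
    _, defense = defender.for_location(defender_location)
    return raw_score * offense * defense
```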
The model is then ready to be applied to future games. Predicting the score is easy, as it was already an important part of the calibration process. For the tournament, these scores are then multiplied by each team's "performance under pressure" multiplier.
There is always error inherent in the model, and the model can see how much error has been observed in past games. Normal distribution curves are computed from this observed error, from which a percent likelihood of winning can be computed for each team.
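As a rough illustration of that last step, the sketch below turns two predicted scores into a win probability. It assumes the error in the predicted margin is normally distributed around zero and measures its spread as the root-mean-square of past margin errors; the text does not spell out the exact procedure, and the function names are hypothetical.

```python
import math


def margin_sigma(past_margin_errors):
    """Root-mean-square of past (observed margin - predicted margin) errors."""
    return math.sqrt(sum(e * e for e in past_margin_errors) / len(past_margin_errors))


def win_probability(predicted_a, predicted_b, sigma):
    """Probability that team A outscores team B under a normal error model."""
    z = (predicted_a - predicted_b) / sigma
    # Standard normal cumulative distribution, written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

With equal predicted scores the function returns 0.5, and the probability moves toward 1 as the predicted margin grows relative to the historical error.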
To compute the percent likelihood that a team will advance to a given round, that team is pitted against every possible opponent it could face in the preceding round. The percent likelihood of team A reaching the preceding round is multiplied by the percent likelihood that team B will reach it, then by the percent likelihood of team A winning that game. These products are added together for each team, and the result is the percent likelihood that the team will advance to that round.
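The round-by-round calculation can be sketched for a single-elimination bracket as follows. The function name, the bracket-order convention (teams listed so that neighbors meet first) and the `win_prob` callback are assumptions for the example; the bracket size is assumed to be a power of two.

```python
def advancement_probabilities(teams, win_prob):
    """Return one dict per round mapping team -> probability of winning that round.

    teams: list in bracket order; win_prob(a, b): probability that a beats b.
    """
    reach = {team: 1.0 for team in teams}  # every team is present in round one
    rounds = []
    block = 1  # number of teams on each side of a game in the current round
    while block < len(teams):
        advance = {}
        for i, team in enumerate(teams):
            # Possible opponents are the teams in the neighboring block.
            sibling = (i // block) ^ 1
            opponents = teams[sibling * block:(sibling + 1) * block]
            # P(team is here) x sum of P(opponent is here) x P(team beats opponent).
            advance[team] = reach[team] * sum(
                reach[opp] * win_prob(team, opp) for opp in opponents
            )
        rounds.append(advance)
        reach = advance
        block *= 2
    return rounds


# Hypothetical four-team bracket with made-up strengths.
strength = {"A": 4.0, "B": 1.0, "C": 2.0, "D": 2.0}
rounds = advancement_probabilities(
    ["A", "B", "C", "D"],
    lambda a, b: strength[a] / (strength[a] + strength[b]),
)
```

A useful sanity check is that the probabilities in the final round add up to 1, since exactly one team wins the bracket.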
See the results for this year's percentage analysis here.
-Clint Kimball