Wednesday, Aug 9: 10:30 AM - 12:20 PM
Invited Paper Session
Metro Toronto Convention Centre
Section on Statistics in Sports
The Elo rating system, originally designed for rating chess players, has since become a popular way to estimate competitors' time-varying skills in many sports. The self-correcting Elo algorithm is simple, intuitive, and often makes surprisingly accurate predictions. However, because it lacks a probabilistic justification, it is unclear how to extend it to include information beyond wins and losses. I will present a close connection between steady-state Kalman filtering and the Elo update. This connection provides both a probabilistic interpretation of Elo, as an approximate Bayesian update, and a straightforward procedure for modifying it. I use this connection to derive versions of Elo that incorporate margins of victory and correlated skills across different playing surfaces in tennis, and show that these versions improve predictive accuracy compared to both Elo and Glicko.
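As a rough illustration of why a steady-state Kalman filter resembles a constant-step update such as Elo's, the sketch below iterates the scalar Kalman recursion for a random-walk state until the gain settles. It is a linear-Gaussian toy, not the construction used in the talk, which the abstract does not spell out. The function names and the variances `q` (skill drift) and `r` (observation noise) are assumptions made for this example.

```python
import math


def steady_state_gain(q, r, tol=1e-12, max_iter=10_000):
    """Iterate the scalar Kalman recursion until the gain stops changing.

    q: variance of the random-walk step in the latent skill (assumed)
    r: variance of the observation noise (assumed)
    """
    p = 1.0      # arbitrary starting posterior variance
    gain = 0.0
    for _ in range(max_iter):
        m = p + q                  # predicted variance after one drift step
        new_gain = m / (m + r)     # Kalman gain for this step
        p = (1.0 - new_gain) * m   # posterior variance after the update
        if abs(new_gain - gain) < tol:
            return new_gain
        gain = new_gain
    return gain


def closed_form_gain(q, r):
    """Gain at the fixed point, where predicted variance m solves m^2 - q*m - q*r = 0."""
    m = (q + math.sqrt(q * q + 4.0 * q * r)) / 2.0
    return m / (m + r)


def filtered_update(estimate, observation, gain):
    """Constant-gain update: move the estimate a fixed fraction toward the observation."""
    return estimate + gain * (observation - estimate)
```

Once the gain is constant, `filtered_update` has the same shape as an Elo step, in which a rating moves by a fixed multiple of the gap between the observed and the expected result.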
The Elo rating system contains a coefficient called the K-factor, which governs how much a rating changes in each update. Currently, theoretical studies of the K-factor are sparse, and little is known about the factors that determine its appropriate values in applications. In this talk we will present a K-factor that is optimal with respect to the mean-squared-error (MSE) criterion, as well as the results of a study of the optimal K-factor's sensitivity to relevant variables. We focus on the case in which the rating update is made after the completion of a round-robin tournament. A simple additive model for the true ratings is adopted to identify the factors that may affect the value of the optimal K-factor. We discuss results showing that the size of the tournament and the variability of the deviation between the true rating and the pre-tournament rating exert a substantial influence on the optimal K-factor. We will also compare the optimal K-factor with the actual K-factor value used by the International Chess Federation in its Elo rating scheme.
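To make the role of the K-factor concrete, here is a minimal sketch of a conventional Elo update applied once after a round-robin tournament, using the standard logistic expected score with base 10 and scale 400. It does not implement the MSE-optimal K-factor from the talk. The function names, the data layout, and the value `k=20` in the usage note are assumptions chosen for illustration.

```python
def expected_score(r_a, r_b, scale=400.0):
    """Standard Elo expected score of a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))


def round_robin_update(ratings, scores, k):
    """Update every rating once, after all games of the tournament.

    ratings: dict mapping player name to pre-tournament rating
    scores:  dict mapping (a, b) to a's score against b (1, 0.5 or 0),
             with one entry per pair of players
    k:       the K-factor, i.e. rating points gained per unit of
             (actual score - expected score)
    """
    actual = {p: 0.0 for p in ratings}
    expected = {p: 0.0 for p in ratings}
    for (a, b), s in scores.items():
        e = expected_score(ratings[a], ratings[b])
        actual[a] += s
        actual[b] += 1.0 - s
        expected[a] += e
        expected[b] += 1.0 - e
    # All expectations use pre-tournament ratings; K scales the total surprise.
    return {p: ratings[p] + k * (actual[p] - expected[p]) for p in ratings}
```

For example, with three players all rated 1500, where A beats both B and C and B draws with C, `round_robin_update(ratings, scores, k=20)` moves A up by 20 points and B and C down by 10 points each. Doubling `k` doubles every change, which is why its value matters.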
Competitor rating systems for head-to-head games are typically used to measure playing strength from game outcomes. Most implemented rating systems assume only win/loss outcomes, and treat a tie as equivalent to half a win and half a loss. However, in games such as chess, the probability of a tie (draw) is demonstrably higher for stronger players than for weaker players, so rating systems that ignore this aspect of game results may produce inaccurate strength estimates. We develop a new rating system for head-to-head games that explicitly acknowledges a tie as a third outcome, with the probability of a tie depending on the strengths of the competitors. Our approach treats time-varying game outcomes within a Bayesian dynamic modeling framework and approximates posterior updates within a time period by one iteration of Newton-Raphson evaluated at the prior mean. The approach is demonstrated on a large dataset of chess games played in International Correspondence Chess Federation tournaments.
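The idea of approximating a posterior update by one Newton-Raphson step from the prior mean can be sketched in a much simpler setting than the one in the talk. The example below assumes a single player with a Gaussian prior on strength, an opponent of known strength, and a two-outcome logistic likelihood. The three-outcome tie model itself is not reproduced because the abstract does not specify it. The function name and all parameters are hypothetical.

```python
import math


def newton_step_update(mu, sigma2, opp, score):
    """One Newton-Raphson step on the log posterior, started at the prior mean.

    Assumed prior:      theta ~ Normal(mu, sigma2)
    Assumed likelihood: P(win) = 1 / (1 + exp(-(theta - opp))), score in {0, 1}
    Returns an approximate posterior mean and variance.
    """
    p = 1.0 / (1.0 + math.exp(-(mu - opp)))  # win probability at the prior mean
    # At theta = mu the prior contributes no gradient, only curvature 1/sigma2.
    gradient = score - p
    curvature = 1.0 / sigma2 + p * (1.0 - p)  # negative Hessian of the log posterior
    new_mu = mu + gradient / curvature
    new_sigma2 = 1.0 / curvature
    return new_mu, new_sigma2
```

With `mu=0`, `sigma2=1`, `opp=0` and a win (`score=1`), the step gives a mean of 0.4 and a variance of 0.8: the rating rises, and uncertainty shrinks because a game was observed.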