Innovative Developments in Competitor Rating Systems

Chair: Scott Evans, George Washington University
Discussant: Andrew Swift, University of Nebraska at Omaha
Organizer: Mark Glickman, Harvard University
Wednesday, Aug 9: 10:30 AM - 12:20 PM
Invited Paper Session 
Metro Toronto Convention Centre 
Room: CC-718B 



Main Sponsor

Section on Statistics in Sports


How to extend Elo: a Bayesian perspective

The Elo rating system, originally designed for rating chess players, has since become a popular way to estimate competitors' time-varying skills in many sports. The self-correcting Elo algorithm is simple, intuitive, and often makes surprisingly accurate predictions. But because it lacks a probabilistic justification, it is unclear how to extend it to incorporate information beyond wins and losses. I will present a close connection between steady-state Kalman filtering and the Elo update. This connection provides both a probabilistic interpretation of Elo as an approximate Bayesian update and a straightforward procedure for modifying it. I use this connection to derive versions of Elo that incorporate margins of victory and correlated skills across different playing surfaces in tennis, and show that these extensions improve predictive accuracy relative to both Elo and Glicko.
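For reference, the standard win/loss Elo update that the talk builds on can be sketched as follows. This is a minimal illustration using the conventional logistic scale of 400 and an assumed K-factor of 32, not the speaker's implementation:

```python
def elo_expected(r_a, r_b, scale=400.0):
    """Expected score of player A against player B under the Elo logistic model."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))

def elo_update(r_a, r_b, score_a, k=32.0):
    """One Elo update; score_a is 1 for a win, 0.5 for a draw, 0 for a loss.

    The update is self-correcting: each player moves in proportion to the gap
    between the observed score and the score Elo expected, scaled by K.
    """
    e_a = elo_expected(r_a, r_b)
    new_a = r_a + k * (score_a - e_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b
```

The Kalman-filter view discussed in the talk reinterprets the `k * (score_a - e_a)` correction as an approximate Bayesian posterior update with a steady-state gain, which is what opens the door to incorporating richer observations.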


Martin Ingram

MSE-Optimal K-factor of the Elo Rating System

The Elo rating system contains a coefficient called the K-factor, which governs how much a rating changes after each result. Theoretical studies of the K-factor are currently sparse, and little is known about the factors that determine its appropriate value in applications. In this talk we will present a K-factor that is optimal with respect to the mean-squared-error (MSE) criterion, along with a study of the optimal K-factor's sensitivity to relevant variables. We focus on the case in which the rating update is made after the completion of a round-robin tournament. A simple additive model for the true ratings is adopted to identify the factors that may affect the value of the optimal K-factor. We discuss results showing that the size of the tournament and the variability of the deviation between the true rating and the pre-tournament rating exert a substantial influence on the optimal K-factor. We also compare the optimal K-factor with the actual K-factor values used by the International Chess Federation in its Elo rating scheme.
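As a rough illustration of the MSE criterion in this setting, one can estimate the post-tournament MSE as a function of K by simulation. The additive model below (Gaussian true ratings plus Gaussian pre-tournament error, with hypothetical parameter values) is a stand-in for the paper's setup, not its exact specification:

```python
import random

def expected(r_a, r_b, scale=400.0):
    """Elo logistic expected score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))

def simulate_mse(k, n_players=10, rating_sd=200.0, error_sd=100.0,
                 n_reps=200, seed=0):
    """Monte Carlo estimate of mean-squared error of post-tournament ratings.

    True ratings are drawn from N(1500, rating_sd**2); pre-tournament ratings
    deviate from them by N(0, error_sd**2) noise. One full round robin is
    played, score residuals are accumulated, and a single K-scaled update is
    applied, as in a post-tournament rating revision.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        true = [rng.gauss(1500.0, rating_sd) for _ in range(n_players)]
        pre = [t + rng.gauss(0.0, error_sd) for t in true]
        delta = [0.0] * n_players
        for i in range(n_players):
            for j in range(i + 1, n_players):
                # Outcome generated from true strengths, update from pre-ratings.
                s = 1.0 if rng.random() < expected(true[i], true[j]) else 0.0
                e = expected(pre[i], pre[j])
                delta[i] += s - e
                delta[j] += (1.0 - s) - (1.0 - e)
        post = [pre[i] + k * delta[i] for i in range(n_players)]
        total += sum((post[i] - true[i]) ** 2 for i in range(n_players)) / n_players
    return total / n_reps
```

Scanning `simulate_mse` over a grid of K values yields an empirical minimizer whose location shifts with the tournament size (`n_players`) and the pre-tournament error variability (`error_sd`), mirroring the sensitivities the talk examines analytically.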


Victor Chan, Western Washington University

Rating competitors in games with strength-dependent tie probabilities

Competitor rating systems for head-to-head games are typically used to measure playing strength from game outcomes. Most implemented rating systems assume only win/loss outcomes and treat a tie as equivalent to half a win and half a loss. However, in games such as chess, the probability of a tie (draw) is demonstrably higher for stronger players than for weaker players, so rating systems that ignore this aspect of game results may produce inaccurate strength estimates. We develop a new rating system for head-to-head games that explicitly acknowledges a tie as a third outcome, with the probability of a tie depending on the strengths of the competitors. Our approach models game outcomes through time-varying strengths in a Bayesian dynamic modeling framework, approximating the posterior update within each time period by a single Newton-Raphson iteration evaluated at the prior mean. The approach is demonstrated on a large dataset of chess games played in International Correspondence Chess Federation tournaments.
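One way to see how a tie can enter as a third, strength-dependent outcome is a Davidson-style extension of the Bradley-Terry model in which the tie parameter grows with the players' combined strength. The parameterization and values below are purely illustrative, not the model developed in the talk:

```python
import math

def outcome_probs(theta_i, theta_j, nu0=0.5, gamma=0.3):
    """Win/tie/loss probabilities for player i against player j.

    theta_i, theta_j are log-strengths. The tie propensity nu increases with
    the players' average strength, so stronger pairs draw more often.
    nu0 and gamma are hypothetical parameters chosen for illustration.
    """
    pi_i, pi_j = math.exp(theta_i), math.exp(theta_j)
    nu = nu0 * math.exp(gamma * (theta_i + theta_j) / 2.0)
    denom = pi_i + pi_j + nu * math.sqrt(pi_i * pi_j)
    p_win = pi_i / denom
    p_tie = nu * math.sqrt(pi_i * pi_j) / denom
    p_loss = pi_j / denom
    return p_win, p_tie, p_loss
```

With `gamma > 0`, two equally matched strong players have a higher tie probability than two equally matched weak players, which is the qualitative feature of chess draws that motivates the talk; a system that scored all ties as half-wins would miss this information.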


Mark Glickman, Harvard University