Comments on the Lehman rating system

Prof. Mark E. Glickman

Overall, the Lehman system, despite its mathematical simplicity, is designed with most of the properties one might want in a rating system. Past performances are downweighted relative to new results, a formula allows players to compute the expected outcome of a game or set of games, better than expected performances will result in rating gains, and worse than expected performances will result in rating losses. Several undesirable features of the Lehman system, however, are worthy of comment. I discuss these below.

My first observation is that if four players are accurately rated, the mean updated rating for a player given by the Lehman formula is what it should be, but the variance of the updated rating is larger when the player is highly rated (and, conversely, smaller when the player is lower rated).
Suppose North, South, East and West have ratings $r_N$ , $r_S$ , $r_E$ and $r_W$ , respectively. Also, assume that these players are accurately rated. If $X_n$ denotes the observed fraction of matchpoints won by NS in $n$ games, then the Lehman system says that the rating for North based on the results of the $n$ games (not accounting for past performance) is given by

$\begin{displaymath} r_N' = X_n(r_N+r_S+r_E+r_W)\left( \frac{r_N}{r_N+r_S} \right). \end{displaymath}$

The randomness in the above expression comes from the game outcome $X_n$ . The rest of the expression consists of constants. The probabilistic behavior of $X_n$ approximately follows a ``Beta'' distribution. The expected value of $X_n$ , $\E(X_n)$ , is assumed to be

$\begin{displaymath} \E(X_n) = \frac{r_N + r_S}{r_N+r_S+r_E+r_W} \end{displaymath}$

Also, the variance of $X_n$ (a description of the variability of the fraction of matchpoint scores) is approximately proportional to $\E(X_n)(1-\E(X_n))/n$ , so that

$\begin{displaymath} \Var(X_n) = k \frac{\E(X_n)(1-\E(X_n))}{n} \end{displaymath}$

for some fixed value $k$ , which should be fairly close to 1. Under these assumptions, the expected rating for North in the Lehman system is what it should be:

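$\begin{eqnarray*} \E(r_N') &=& \E(X_n(r_N+r_S+r_E+r_W)\left( \frac{r_N}{r_N+r_S} \right)) \\ &=& \E(X_n)(r_N+r_S+r_E+r_W)\left( \frac{r_N}{r_N+r_S} \right) \\ &=& \frac{r_N+r_S}{r_N+r_S+r_E+r_W} (r_N+r_S+r_E+r_W)\left( \frac{r_N}{r_N+r_S} \right) \\ &=& r_N \end{eqnarray*}$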

Thus the expected value of $r_N'$ is $r_N$ , which is what one would hope.
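A minimal Python sketch of the single-session calculation above may make the check concrete. The function names `expected_ns_fraction` and `lehman_session_rating` are hypothetical, chosen only for illustration; the sketch covers just the formula for $r_N'$ before any averaging with past performance, and the example ratings are made up.

```python
def expected_ns_fraction(r_n, r_s, r_e, r_w):
    """Expected fraction of matchpoints won by NS, E(X_n), under the stated assumption."""
    return (r_n + r_s) / (r_n + r_s + r_e + r_w)


def lehman_session_rating(x_n, r_n, r_s, r_e, r_w):
    """Rating for North implied by one set of games alone (the r_N' formula).

    x_n is the observed fraction of matchpoints won by NS.
    Averaging with past performance is deliberately left out.
    """
    total = r_n + r_s + r_e + r_w
    return x_n * total * r_n / (r_n + r_s)


# Illustrative ratings only (assumed values, not taken from OKbridge data).
ratings = dict(r_n=55.0, r_s=45.0, r_e=50.0, r_w=50.0)
par = expected_ns_fraction(**ratings)
at_par = lehman_session_rating(par, **ratings)            # NS score exactly as expected
above_par = lehman_session_rating(par + 0.05, **ratings)  # NS score 5 points above expectation
```

When NS score exactly their expected fraction, the session rating reproduces North's current rating; a better than expected fraction yields a higher one.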
Using properties of variances, the variance of $r_N'$ is computed as

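$\begin{eqnarray*} \Var(r_N') &=& \Var(X_n(r_N+r_S+r_E+r_W)\left( \frac{r_N}{r_N+r_S} \right)) \\ &=& \Var(X_n)(r_N+r_S+r_E+r_W)^2 \frac{r_N^2}{(r_N+r_S)^2}\\ &=& k \frac{\E(X_n)(1-\E(X_n))}{n} (r_N+r_S+r_E+r_W)^2 \frac{r_N^2}{(r_N+r_S)^2}\\ &=& \frac{k(r_N+r_S)(r_E+r_W)}{n} \cdot \frac{r_N^2}{(r_N+r_S)^2}\\ &=& \frac{k(r_E+r_W)/(r_N+r_S)}{n} \cdot r_N^2 \end{eqnarray*}$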

The resulting formula reveals that the variance of $r_N'$ is larger when $r_N$ is greater. This is likely an unintended feature of the Lehman system. One interpretation of this result is that higher-rated players have ratings that are less trustworthy than those of lower-rated players, assuming equal numbers of completed games (and similar frequency of past results). This also means that higher ratings are more erratic than lower ratings, even when underlying player strengths are stable.
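The dependence of the variance on North's rating can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, $k = 1$ and $n = 24$ are arbitrary choices, and the simulation takes the fraction of matchpoints to be exactly Beta-distributed with the mean and variance assumed earlier, whereas the text claims only an approximate Beta behavior.

```python
import random
import statistics


def update_variance(r_n, r_s, r_e, r_w, n, k=1.0):
    """Closed-form variance of r_N': k (r_E + r_W)/(r_N + r_S) / n * r_N^2."""
    return k * (r_e + r_w) / (r_n + r_s) / n * r_n ** 2


def simulated_update_variance(r_n, r_s, r_e, r_w, n, k=1.0, draws=20000, seed=0):
    """Monte Carlo estimate of Var(r_N'), assuming X_n is exactly Beta-distributed.

    The Beta parameters are chosen so that E(X_n) = (r_N + r_S)/total and
    Var(X_n) = k E(X_n)(1 - E(X_n))/n, which requires n/k > 1.
    """
    rng = random.Random(seed)
    total = r_n + r_s + r_e + r_w
    mean = (r_n + r_s) / total
    concentration = n / k - 1.0  # alpha + beta of the Beta distribution
    alpha = mean * concentration
    beta = (1.0 - mean) * concentration
    scale = total * r_n / (r_n + r_s)  # constant multiplying X_n in r_N'
    samples = [rng.betavariate(alpha, beta) * scale for _ in range(draws)]
    return statistics.pvariance(samples)


# Same partner and opponents (assumed ratings of 50), different ratings for North.
low = update_variance(40.0, 50.0, 50.0, 50.0, n=24)
high = update_variance(60.0, 50.0, 50.0, 50.0, n=24)
high_simulated = simulated_update_variance(60.0, 50.0, 50.0, 50.0, n=24)
```

With partner and opponents held fixed, the closed-form variance for a North rated 60 exceeds that for a North rated 40, and the simulated value should land close to the closed-form one.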
While the Lehman system recognizes that current game outcomes should have more impact on ratings than past outcomes, the particular mechanism for ``averaging'' past and present may be too simplistic. The Lehman system as implemented is a naturally deflationary system. In other words, as time proceeds, a rating of, say, 55 comes to connote lower playing strength. The reason rests on an important assumption, namely that, on average, players joining OKbridge tend to improve over time, and that no mechanism in the Lehman system injects the extra rating points to account for the overall improvement. A feature of a rating system that seems desirable is to have a rating denote the same strength over time. Furthermore, a ``burn-in'' period may be necessary while a player who first joins OKbridge becomes used to playing online, as his/her early results may not reflect bridge-playing ability.
A feature of my Glicko rating system for chess that does not appear in the Lehman system is the quantification of rating uncertainty into the rating update calculations. When opponents have ratings that can be trusted (because, for example, they compete frequently so that their strengths are well-estimated from game outcomes), the rating calculations should reflect that the game results provide solid information about the players' abilities. Conversely, if the opponents have untrustworthy ratings (because, for example, they compete every once in a long while), then the game outcomes provide little information about the players' strengths. If two experienced players compete against new OKbridge members, then intuitively the results of the game should have a (potentially) dramatic impact on the new players, but very little impact on the ratings of the experienced players. The Lehman system does not distinguish between these types of players.
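For readers unfamiliar with how rating uncertainty enters a calculation, the sketch below shows the attenuation factor and expected-score formula from the Glicko system for chess. It is offered only as an illustration of the idea: it uses the chess rating scale and ratings deviations (RD), not Lehman ratings, and it is not a proposal for how a bridge system should be adapted. The function names are my own choices for the example.

```python
import math

Q = math.log(10) / 400.0  # Glicko scaling constant


def g(rd):
    """Attenuation factor: equals 1 when the opponent's RD is 0 and shrinks as RD grows."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q ** 2 * rd ** 2 / math.pi ** 2)


def expected_score(r, r_opp, rd_opp):
    """Expected score against an opponent rated r_opp with ratings deviation rd_opp."""
    return 1.0 / (1.0 + 10.0 ** (-g(rd_opp) * (r - r_opp) / 400.0))


# An opponent with an uncertain rating (large RD) pulls the expectation toward 1/2,
# so the game result carries less information about the player's strength.
certain = expected_score(1500.0, 1700.0, rd_opp=30.0)
uncertain = expected_score(1500.0, 1700.0, rd_opp=300.0)
```

The same 200-point rating gap produces an expectation closer to one half when the opponent's rating is uncertain, which is the sense in which results against rarely active players provide less information.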
A difficulty with a bridge rating system that rates individuals occurs when players' underlying strengths are different from their ratings. For the Lehman system, if players' ratings do not correspond exactly with their underlying strengths (which is almost always the case), then the mean updated rating for a misrated player does not equal the rating the player should expect to receive based on the true strength. This sounds like a fundamental problem with the Lehman system, but I believe this issue is likely to be a problem with any bridge rating system that attempts to rate individuals (and my personal bias is to develop a system that rates individuals). Regardless, here is how this issue arises within the Lehman system.
This idea can be made clearer with specific formulas. Suppose South, East and West are accurately rated with ratings $r_S$ , $r_E$ and $r_W$ . Also, suppose North has a rating $r_N$ but has an actual underlying strength of $r_N^*$ . This means that, if we knew $r_N^*$ , the expected proportion of matchpoints for NS is given by

$\begin{displaymath} \E(X_n) = \frac{r_N^* + r_S}{r_N^*+r_S+r_E+r_W} \end{displaymath}$

If the Lehman system were self-consistent, the mean updated rating for North should be

$\begin{displaymath} E_1 = \frac{r_N^*}{r_N^*+r_S+r_E+r_W}(r_N+r_S+r_E+r_W) \end{displaymath}$

The reason is that, in the above expression, the fraction represents the proportion of matchpoints North should be earning if $r_N^*$ was known to be the correct rating, and the second term is the total sum of ratings to be divided among the four players. In actuality, the mean updated rating for North given by the Lehman system is

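$\begin{eqnarray*} E_2 &=& \E(r_N') = \E(X_n(r_N+r_S+r_E+r_W)\left(\frac{r_N}{r_N+r_S}\right)) \\ &=& \frac{r_N^*+r_S}{r_N^*+r_S+r_E+r_W} (r_N+r_S+r_E+r_W)\left(\frac{r_N}{r_N+r_S}\right) \\ &=& \frac{(r_N^*+r_S)/(r_N^*+r_S+r_E+r_W)} {(r_N+r_S)/(r_N+r_S+r_E+r_W)} \cdot r_N \end{eqnarray*}$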

The ratio of the mean rating under the Lehman system to the mean rating one should have can be simplified to

$\begin{displaymath} \frac{E_2}{E_1} = \frac{r_N/(r_N+r_S)}{r_N^*/(r_N^*+r_S)} . \end{displaymath}$

This implies that if a player is stronger than his/her rating ( $r_N^* > r_N$ ), then the updated rating (before averaging with past performances) will be, on average, lower than it should be, assuming the partner is correctly rated. In fact, the only way for the Lehman system to produce an intuitive result when North is misrated is if South is misrated by the same multiplicative amount (that is, as long as $r_S^*/r_S = r_N^*/r_N$ , where $r_S^*$ denotes South's underlying strength).
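The ratio can also be evaluated numerically. In the hypothetical helper below, `r_n_true` plays the role of $r_N^*$; the optional `r_s_true` argument for South's underlying strength is an extension I am assuming in order to illustrate the multiplicative case, and it defaults to South being accurately rated. All ratings are made-up values.

```python
def mean_rating_ratio(r_n, r_s, r_e, r_w, r_n_true, r_s_true=None):
    """Ratio of the Lehman mean updated rating for North to the mean it should be.

    East and West are assumed accurately rated. If r_s_true is None,
    South is taken to be accurately rated as well.
    """
    if r_s_true is None:
        r_s_true = r_s
    rated_total = r_n + r_s + r_e + r_w
    true_total = r_n_true + r_s_true + r_e + r_w
    # Expected NS fraction is driven by true strengths, not by ratings.
    expected_fraction = (r_n_true + r_s_true) / true_total
    lehman_mean = expected_fraction * rated_total * r_n / (r_n + r_s)
    should_be = r_n_true / true_total * rated_total
    return lehman_mean / should_be


underrated = mean_rating_ratio(50.0, 50.0, 50.0, 50.0, r_n_true=60.0)
overrated = mean_rating_ratio(50.0, 50.0, 50.0, 50.0, r_n_true=40.0)
# North and South both 20% stronger than rated (same multiplicative misrating).
both_scaled = mean_rating_ratio(50.0, 40.0, 50.0, 50.0, r_n_true=60.0, r_s_true=48.0)
```

For a North rated 50 with true strength 60 and an accurately rated partner at 50, the ratio is about 0.917, so the mean updated rating falls short of what it should be; when both partners are misrated by the same factor the ratio returns to 1.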
As mentioned above, this criticism of the Lehman system is minor because it is difficult to know a priori how to make inferences about individual playing strength from the result of a partnership. One way of accounting for this problem is to incorporate the quantified uncertainty of a player's rating into the rating update calculations. If rating uncertainty is quantified, then the effect of misratings will be mitigated. In effect, rating updates will recognize that players may be somewhat misrated, and the calculations could proceed under this assumption.

Mark Glickman
2000-11-05