My first observation is that if four players are
accurately rated, the mean updated rating for a player
given by the Lehman
formula has no problems, but the variance of the updated
rating is larger when the player is highly rated (and,
conversely, low when the player is lower rated).
Suppose North, South, East and West have ratings
$R_N$, $R_S$, $R_E$ and $R_W$, respectively.
Also, assume
that these players are accurately rated.
If $P$ denotes the observed fraction of matchpoints won by
NS in $n$ games, then the Lehman system says that the rating
for North based on the results of the games (not accounting
for past performance) is given by
\[
R_N' = \frac{R_N}{R_N + R_S} \, P \, (R_N + R_S + R_E + R_W) .
\]
The randomness in the above expression comes from the game
outcome $P$.
The rest of the expression consists of constants.
The probabilistic behavior of $P$ approximately follows
a ``Beta'' distribution.
The expected value of $P$, $E(P)$, is
assumed to be
\[
E(P) = \frac{R_N + R_S}{R_N + R_S + R_E + R_W} .
\]
Also, the variance of $P$ (a description of the variability
of the fraction of matchpoint scores) is approximately proportional to
$E(P)(1 - E(P))/n$, so that
\[
\mathrm{Var}(P) = c \, \frac{E(P)\,(1 - E(P))}{n}
\]
for some fixed value $c$, which should be fairly close to 1.
Under these assumptions, the expected rating for North in
the Lehman system is what it should be.
Thus the expected value of $R_N'$ is $R_N$, which is what
one would hope.
Using properties of variances,
the variance of $R_N'$ is computed as
\[
\mathrm{Var}(R_N')
= \left( \frac{R_N}{R_N + R_S} \right)^2 (R_N + R_S + R_E + R_W)^2 \, \mathrm{Var}(P)
= \frac{c}{n} \cdot \frac{R_N^2 \, (R_E + R_W)}{R_N + R_S} .
\]
The resulting formula reveals that the variance of $R_N'$ is
larger when $R_N$ is greater.
This is likely an unintended feature of the Lehman system.
One interpretation of this result is that the ratings of higher-rated
players are less trustworthy than those of
lower-rated players, assuming equal numbers of completed games
(and similar frequency of past results).
This also means that higher ratings are more
erratic than lower ratings, even when underlying player
strengths are stable.
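The dependence of the variance on the rating can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the update to be North's share of the partnership rating, times the observed fraction, times the sum of the four ratings; it takes the variance of the observed fraction to be $c$ times mean times (1 - mean) divided by the number of games; and it draws the fraction from a Beta distribution whose parameters are chosen so that $c = 1$. The function names and the rating values are hypothetical and are not part of the Lehman system.

```python
import random


def lehman_update(r_n, r_s, r_e, r_w, p):
    """North's updated rating from an observed NS matchpoint fraction p.

    Assumed form: North's share of the NS rating, times p, times the
    sum of all four ratings.
    """
    total = r_n + r_s + r_e + r_w
    return r_n / (r_n + r_s) * p * total


def update_variance(r_n, r_s, r_e, r_w, n, c=1.0):
    """Closed-form variance of the update, assuming Var(P) = c*mu*(1-mu)/n."""
    total = r_n + r_s + r_e + r_w
    mu = (r_n + r_s) / total          # assumed expected NS fraction
    var_p = c * mu * (1.0 - mu) / n   # assumed variance of the fraction
    scale = r_n / (r_n + r_s) * total # constant multiplying P in the update
    return scale ** 2 * var_p


def simulate(r_n, r_s, r_e, r_w, n, trials=20000, seed=0):
    """Monte Carlo mean and variance of the update for accurately rated players."""
    rng = random.Random(seed)
    total = r_n + r_s + r_e + r_w
    mu = (r_n + r_s) / total
    # A Beta(a, b) with a + b = n - 1 has mean mu and variance mu*(1-mu)/n,
    # which corresponds to c = 1 (an assumption made for this illustration).
    a, b = mu * (n - 1), (1.0 - mu) * (n - 1)
    draws = [lehman_update(r_n, r_s, r_e, r_w, rng.betavariate(a, b))
             for _ in range(trials)]
    mean = sum(draws) / trials
    var = sum((d - mean) ** 2 for d in draws) / (trials - 1)
    return mean, var


# Hypothetical ratings: the same partner and opponents, two different Norths.
for north in (40, 60):
    mean, var = simulate(north, 50, 50, 50, n=24)
    print(north, round(mean, 2), round(var, 2),
          round(update_variance(north, 50, 50, 50, n=24), 2))
```

With partner and opponents held fixed, the simulated mean should stay close to North's rating while the variance grows with that rating, which is the behavior described above.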
A difficulty with a bridge rating system that rates
individuals occurs when players' underlying strengths are
different from their ratings.
For the Lehman system, if
players' ratings do not correspond exactly with their underlying
strength (which is almost always true), then the mean
updated rating for a misrated player does not equal
the rating the player should expect to receive
based on the true strength.
This sounds like a fundamental problem with the Lehman
system, but I believe this issue is likely to be a problem with any
bridge rating system that attempts to rate individuals
(and my personal bias is to develop a system that rates
individuals).
Regardless, here is how this issue arises within the Lehman
system.
This idea can be made clearer with specific formulas.
Suppose South, East and West are accurately rated with
ratings $R_S$, $R_E$ and $R_W$.
Also, suppose North has a rating $R_N$ but has an actual
underlying strength of $R_N^*$.
This means that, if we knew $R_N^*$, the expected proportion
of matchpoints for NS is given by
\[
\frac{R_N^* + R_S}{R_N^* + R_S + R_E + R_W} .
\]
If the Lehman system were self-consistent, the mean updated
rating for North should be
\[
\frac{R_N^*}{R_N^* + R_S + R_E + R_W} \, (R_N + R_S + R_E + R_W) .
\]
The reason is that, in the above expression,
the fraction represents the proportion of matchpoints
North should be earning if $R_N^*$ was known to be the
correct rating, and the second term is the total sum of
ratings to be divided among the four players.
In actuality, the mean updated rating for North given by the
Lehman system is
\[
\frac{R_N}{R_N + R_S} \cdot \frac{R_N^* + R_S}{R_N^* + R_S + R_E + R_W} \, (R_N + R_S + R_E + R_W) .
\]
The ratio of the mean rating under the Lehman system to the
mean rating one should have can be simplified to
\[
\frac{R_N \, (R_N^* + R_S)}{R_N^* \, (R_N + R_S)}
= \frac{1 + R_S / R_N^*}{1 + R_S / R_N} .
\]
This implies that if a player is stronger than his/her
rating ($R_N^* > R_N$),
then the updated rating (before averaging with past
performances) will be, on average, lower than it should be,
assuming the partner is correctly rated.
In fact, the only way for the Lehman system to produce an
intuitive result when North is misrated is if
South is misrated by the same multiplicative
amount (that is, as long as $R_S^* / R_S = R_N^* / R_N$,
where $R_S^*$ denotes South's actual underlying strength).
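A small numerical check may make the direction of this bias concrete. The sketch below rests on assumptions: the expected NS fraction is taken to be the sum of the NS true strengths divided by the sum of all four strengths, and the update is taken to be North's share of the rated partnership, times that fraction, times the sum of the four current ratings. The function names, the optional `true_s` argument, and the rating values are hypothetical.

```python
def lehman_mean_update(r_n, r_s, r_e, r_w, true_n, true_s=None):
    """Mean updated rating for North under the assumed Lehman update."""
    true_s = r_s if true_s is None else true_s
    total = r_n + r_s + r_e + r_w                            # sum of current ratings
    exp_p = (true_n + true_s) / (true_n + true_s + r_e + r_w)  # true expected NS fraction
    return r_n / (r_n + r_s) * exp_p * total


def consistent_mean_update(r_n, r_s, r_e, r_w, true_n, true_s=None):
    """Mean update North should receive if true strengths were known."""
    true_s = r_s if true_s is None else true_s
    total = r_n + r_s + r_e + r_w
    return true_n / (true_n + true_s + r_e + r_w) * total


def lehman_to_consistent_ratio(r_n, r_s, r_e, r_w, true_n, true_s=None):
    """Ratio of the Lehman mean update to the self-consistent mean update."""
    return (lehman_mean_update(r_n, r_s, r_e, r_w, true_n, true_s)
            / consistent_mean_update(r_n, r_s, r_e, r_w, true_n, true_s))


# Hypothetical case: North is rated 40 but plays at strength 60.
print(lehman_to_consistent_ratio(40, 50, 50, 50, true_n=60))

# Same North, with South underrated by the same factor of 1.5 (50 -> 75).
print(lehman_to_consistent_ratio(40, 50, 50, 50, true_n=60, true_s=75))
```

Under these assumptions the first ratio falls below 1, so the underrated North is credited with less than the true strength warrants, and the second ratio equals 1 up to rounding, because both partners are misrated by the same multiplicative amount.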
As mentioned above, this criticism of the Lehman system
is minor because it is difficult
to know a priori how to make inferences about
individual playing strength from the result of a partnership.
One way of accounting for this problem is to incorporate the
quantified uncertainty of a player's rating into the rating
update calculations.
If rating uncertainty is quantified, then the effect of misratings
will be mitigated.
In effect, rating updates will recognize that players may be
somewhat misrated, and the calculations could proceed under
this assumption.