Statistic Notes: Regression towards the mean
BMJ 1994; 308 doi: https://doi.org/10.1136/bmj.308.6942.1499 (Published 04 June 1994). Cite this as: BMJ 1994;308:1499. All rapid responses.
After reading "Regression towards the mean" (BMJ 1994;308:1499), I
felt compelled to share some helpful information related to regression
that is absent from most published literature on the topic.
Ordinary least squares regression, the method behind regression toward
the mean, simply minimizes the vertical distances of the points from the
fitted Y vs X line, and the equations are well known. The slope (Mv) equals
SOSxy/SOSx, where SOS refers to sums of squares (SOSxy being the sum of
cross-products) and the subscript 'v' denotes vertical distance
minimization. This method is ideal under the homoskedastic assumption,
where X is a measurement variable known without error and Y has constant
variance across all X.
A second practice is to plot X vs Y, perform the same operation, and
invert the answer so that it applies to the Y vs X condition. Effectively,
this Y vs X slope (Mh) equals SOSy/SOSxy, or Mv/R^2, where 'h' refers to
horizontal distance minimization and R^2 is the coefficient of
determination. This method is similar to the one above, but X has the
constant variation and Y is assumed to have no error.
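The first two slopes can be sketched directly from the sums of squares; the data here are invented for illustration:

```python
# Illustrative sketch: the vertical-distance slope Mv and the
# horizontal-distance slope Mh from the sums of squares above.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

sos_x = sum((xi - mx) ** 2 for xi in x)                       # SOSx
sos_y = sum((yi - my) ** 2 for yi in y)                       # SOSy
sos_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))   # SOSxy

m_v = sos_xy / sos_x   # minimizes vertical distances (Y on X)
m_h = sos_y / sos_xy   # minimizes horizontal distances (X on Y, inverted)
r2 = sos_xy ** 2 / (sos_x * sos_y)  # coefficient of determination

# The two slopes are linked exactly by Mh = Mv / R^2.
print(m_v, m_h, r2)
```

For this toy data set Mv = 0.6 and Mh = 1.0, so the choice of which distances to minimize changes the fitted slope substantially.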
A third method is to minimize orthogonal, or perpendicular, distances.
Here the slope (Mp) equals q + SQRT(q^2 + 1), where q equals
(SOSy - SOSx)/(2*SOSxy). This method assumes both X and Y contain error
relative to their true values. It also yields the principal axis of the
elliptical scatter cloud for bivariate normal data (i.e. Galton's plot).
Lastly, a fourth method is to minimize the triangular areas created
by each point and the fitted line. Here the slope (Mt) equals
SQRT(Mv*Mh), the geometric mean of the first two slopes.
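A sketch of the remaining two slopes on the same invented data shows that both Mp and Mt land between Mv and Mh, which always holds for positively correlated data:

```python
import math

# Illustrative data only; compute all four slopes side by side.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
mx, my = sum(x) / len(x), sum(y) / len(y)
sos_x = sum((xi - mx) ** 2 for xi in x)
sos_y = sum((yi - my) ** 2 for yi in y)
sos_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

m_v = sos_xy / sos_x                 # vertical distances
m_h = sos_y / sos_xy                 # horizontal distances
q = (sos_y - sos_x) / (2 * sos_xy)
m_p = q + math.sqrt(q ** 2 + 1)      # orthogonal (perpendicular) distances
m_t = math.sqrt(m_v * m_h)           # triangular areas: geometric mean slope

# Both "both-variables-have-error" slopes fall between the OLS extremes.
assert m_v <= m_p <= m_h and m_v <= m_t <= m_h
print(m_v, m_p, m_t, m_h)
```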
Statistical publications should do a better job of explaining these four
methods, especially since most biostatistical and clinical data do not
meet the homoskedastic assumption. Always using regression toward the
mean to minimize vertical distances, especially for elliptical-looking
data with variability in both X and Y, is settling for mediocrity.
Competing interests:
None declared
A caution when using regression
The article "Regression towards the mean" outlined some important
aspects of regression that are helpful to many. This response is offered
in the same spirit.
Often medical instruments that measure across a range must be compared
when a master, high-accuracy standard does not exist or cannot be used
because of cost or risk. Examples include tonometers, which measure
intraocular pressure through the cornea, and blood pressure monitors: it
is not easy to place high-accuracy sensors inside the body
non-invasively.
For such comparisons, instruments are often tested using regression
as discussed in the main article and considered to be dissimilar (loosely
defined here) because the slope is not as close to unity as desired.
The often overlooked fact is that the ordinary least squares (OLS)
method should rarely be used in such cases. It underestimates the slope,
since it minimizes only the sum of squared VERTICAL distances and ignores
the measurement error in X.
When measurement error exists in both variables, it is better to
minimize the perpendicular distances of the points from the fitted line;
the slope is then not underestimated as discussed above. When making such
a device comparison, a simple shortcut exists: determine the slope of
Y vs X as well as the slope of X vs Y. The true slope lies somewhere
between. Orthogonal regression is just one appropriate method; others may
apply. Some titles include Deming, Type II, Geometric, or Principal Axis
regression.
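A small simulation (the noise levels and sample are invented for illustration) shows the attenuation: when both instruments carry error around a true unit slope, the Y-on-X OLS slope typically understates 1, the inverted X-on-Y slope overstates it, and the orthogonal slope lies between the two:

```python
import random

random.seed(42)

# Hypothetical device comparison: both instruments read a common true
# value with independent measurement error, so the true slope is 1.
truth = [random.uniform(5, 25) for _ in range(200)]
x = [t + random.gauss(0, 2) for t in truth]   # instrument A readings
y = [t + random.gauss(0, 2) for t in truth]   # instrument B readings

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

slope_yx = sxy / sxx                    # OLS of Y on X: attenuated
slope_inv = syy / sxy                   # inverted X-on-Y OLS: inflated
q = (syy - sxx) / (2 * sxy)
slope_orth = q + (q ** 2 + 1) ** 0.5    # orthogonal regression

# The true slope lies somewhere between the two OLS estimates,
# with the orthogonal slope bracketed by them.
assert slope_yx <= slope_orth <= slope_inv
print(slope_yx, slope_orth, slope_inv)
```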
Though many references exist, such as Bartlett (1949), Berkson (1950),
Deming (1943), and Madansky (1959), many analysts continue to apply the
incorrect regression type for their problem.
As an aside, often when comparing two medical instruments, other
comparisons should be made as well for full understanding:
1.) Bland-Altman plot (1983): Subjective check for bias trends
2.) Maloney-Rastogi method (1970): Quantitative check for a linear
bias trend
3.) A simple paired t test: To check for a constant (DC) bias
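Checks 1 and 3 reduce to a few summary quantities; a numeric sketch with invented paired readings (no plotting, and the Maloney-Rastogi computation is omitted):

```python
import math

# Hypothetical paired readings from two instruments on ten subjects.
a = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
offsets = [0.5, -0.2, 0.3, 0.1, -0.4, 0.6, 0.2, -0.1, 0.3, 0.0]
b = [ai + oi for ai, oi in zip(a, offsets)]

# Bland-Altman quantities: mean difference (bias) and limits of agreement.
d = [bi - ai for ai, bi in zip(a, b)]
n = len(d)
bias = sum(d) / n
sd_d = math.sqrt(sum((di - bias) ** 2 for di in d) / (n - 1))
loa = (bias - 1.96 * sd_d, bias + 1.96 * sd_d)

# Paired t test for a constant (DC) bias:
# compare |t| with the critical value t(0.975, n-1), about 2.262 here.
t = bias / (sd_d / math.sqrt(n))
print(round(bias, 3), round(sd_d, 3), round(t, 3))
```

Here the bias is about 0.13 with |t| well under 2.262, so this invented pair of instruments shows no evidence of a constant offset.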
If repeat runs (replicates) exist for each treatment (person), then a
few more tools can be explored:
4.) A paired t test of "within-treatment" variation for the two
instruments. A transform can be applied as discussed by Bland and Altman
in other articles.
5.) The pooled 'per instrument' treatment variances may be compared
using the F test. This compares the repeatability or precision in addition
to bias.
6.) The standard deviation of differences, sdiff, helps identify a
measure of difference across a range; but it does not indicate how much
variation comes from each instrument or the measured item.
7.) Maximum likelihood estimators (MLEs) can be used to test more
complex models and assumptions.
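With replicates available, the pooled per-instrument variances of item 5 can be compared directly; a sketch with invented triplicate readings:

```python
from statistics import variance

# Hypothetical triplicate readings: 3 subjects x 3 replicates
# for each of two instruments.
inst_a = [[10.0, 10.2, 9.8], [15.1, 14.9, 15.0], [20.2, 19.8, 20.0]]
inst_b = [[10.3, 9.7, 10.0], [15.3, 14.7, 15.0], [19.6, 20.4, 20.0]]

def pooled_within_variance(runs):
    # With equal replicate counts, pooling is just the mean of the
    # per-subject sample variances.
    return sum(variance(r) for r in runs) / len(runs)

var_a = pooled_within_variance(inst_a)
var_b = pooled_within_variance(inst_b)

# F ratio of pooled variances; compare against the F critical value
# for the pooled degrees of freedom (6 and 6 in this sketch).
f_ratio = max(var_a, var_b) / min(var_a, var_b)
print(round(var_a, 4), round(var_b, 4), round(f_ratio, 3))
```

For these invented readings the F ratio is about 3.78, below the roughly 5.82 critical value of F(0.975; 6, 6), so the precisions would not be declared different at that level.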
It should be noted that the R-squared and slope results from OLS
regression and the first three tests described above are important, but
the last four tests are also needed to compare precision (if repeat runs
exist).
Both bias and precision must be checked to test two instruments for
interchangeability, practical agreement or "sameness".
Comparing precision has been discussed by Grubbs, Jaech, Lin,
Hanumara, Thompson and many others but for cases without repeat runs per
treatment.
Hopefully, this response can serve as a helpful guideline for
comparisons that need more than just OLS regression, even though the
detailed references and proofs were omitted.
OLS regression is a good tool, but it should not be alone in the
‘statistical comparison’ toolbox.
Competing interests:
None declared