What's wrong with arguments against multiplicity adjustments (Letter to the editor concerning BMJ 1998;316:1236-1238)
Ralf Bender (1,2), Stefan Lange (2)
(1) Department of Metabolic Diseases and Nutrition
(WHO-Collaborating Centre for Diabetes)
Heinrich-Heine-University of Düsseldorf
D-40001 Düsseldorf, Germany
(2) Dept. of Medical Informatics, Biometry and Epidemiology
Ruhr-University of Bochum
D-44780 Bochum, Germany
Address for Correspondence:
Ralf Bender, Ph.D.
Dept. of Metabolic Diseases and Nutrition
Heinrich-Heine-University of Düsseldorf
P.O. Box 101007
D-40001 Düsseldorf, Germany
Tel.: +49 (0)211 81-13981
Fax: +49 (0)211 81-14294
E-Mail: bender@uni-duesseldorf.de
Number of words: 440 (excluding references)
Dear Sir - Recently, Perneger tried to establish that adjustments for multiple testing are unnecessary [1]. However, his main arguments against multiplicity adjustments rest on fundamental misunderstandings and a lack of knowledge about simultaneous statistical inference.
Firstly, Perneger equated multiple test adjustments with Bonferroni corrections [1]. The Bonferroni procedure [2] ignores dependencies among the tests and is therefore much too conservative when the number of tests is large. Hence, we agree with Perneger that the Bonferroni method should not be used routinely. This is, however, not an argument against multiplicity adjustments in general, as there are a number of alternative multiple test procedures [3], which Perneger ignores entirely [1].
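To make the contrast concrete, the following Python sketch compares single-step Bonferroni adjusted p-values with those of Holm's step-down procedure. Holm's procedure is one well-known alternative, picked here purely for illustration; it is not named in this letter. It still makes no use of the dependence between tests, but it rejects at least every hypothesis that Bonferroni rejects while still controlling the experimentwise error rate in the strong sense. The function names and the p-values are hypothetical.

```python
from typing import List


def bonferroni_adjust(pvals: List[float]) -> List[float]:
    """Single-step Bonferroni: multiply every p-value by the number of tests m (capped at 1)."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]


def holm_adjust(pvals: List[float]) -> List[float]:
    """Holm's step-down adjustment.

    The k-th smallest p-value (k = 0, ..., m-1) is multiplied by (m - k); a running
    maximum keeps the adjusted values monotone in the order of the raw p-values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for k, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - k) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


if __name__ == "__main__":
    p = [0.005, 0.011, 0.015, 0.040, 0.300]  # hypothetical p-values from m = 5 tests
    alpha = 0.05
    for name, adj in (("Bonferroni", bonferroni_adjust(p)), ("Holm", holm_adjust(p))):
        rejected = [i + 1 for i, a in enumerate(adj) if a <= alpha]
        print(f"{name:10s} adjusted = {[round(a, 3) for a in adj]}, rejected H_{rejected}")
```

With these five hypothetical p-values at α = 0.05, Bonferroni rejects only the first hypothesis, whereas Holm rejects the first three.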
Secondly, Perneger argued that multiple test adjustments are concerned only with the global null hypothesis that all individual null hypotheses are true simultaneously [1]. This is not true. The best multiple test procedures control the multiple level, also called the experimentwise error rate in the strong sense: the probability of falsely rejecting at least one true individual null hypothesis, irrespective of which and how many of the other individual null hypotheses are true [3]. Controlling the multiple level offers the best protection against wrong conclusions and leads to the strongest statistical inference [3].
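The difference between protecting only the global null hypothesis and controlling the multiple level can be written out formally. In the sketch below the notation is chosen here for illustration and is not taken from [3]: H_1, ..., H_m are the individual null hypotheses, θ is the unknown true state of nature, and I_0(θ) is the set of indices of the null hypotheses that are true under θ. Weak control constrains the error probability only when all null hypotheses are true; strong control, the property described above, constrains it under every configuration. Strong control implies weak control, because the global null hypothesis is one of the configurations covered.

```latex
\[
\text{weak control:}\quad
P_\theta\bigl(\text{at least one } H_i \text{ rejected}\bigr) \le \alpha
\quad \text{for every } \theta \text{ with } I_0(\theta) = \{1,\dots,m\},
\]
\[
\text{strong control:}\quad
P_\theta\bigl(\text{at least one } H_i \text{ with } i \in I_0(\theta) \text{ rejected}\bigr) \le \alpha
\quad \text{for every } \theta .
\]
```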
Thirdly, Perneger claimed that a multiple test procedure can lead only to rejection of the global null hypothesis, without the possibility of concluding which tests are significant and which are not [1]. The opposite is true: multiple test procedures were developed precisely to determine which tests are significant and which are not, while controlling the appropriate error rate.
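One general construction that yields a decision for each individual hypothesis is the closed testing principle, sketched below in Python as a toy illustration rather than as any specific method recommended in [3]. An individual hypothesis H_i is rejected only if every intersection hypothesis that contains it is rejected by a local level-α test. Using a Bonferroni test as the local test is an assumption made here; with that choice the decisions coincide with those of Holm's step-down procedure. The function name and p-values are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def closed_test_bonferroni(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """Closed testing with a Bonferroni local test for every intersection hypothesis.

    The intersection hypothesis H_J (all H_j with j in J true) is rejected locally
    when min_{j in J} p_j <= alpha / |J|. H_i is rejected overall only if every H_J
    with i in J is rejected locally. All 2^m - 1 subsets are enumerated, so this
    toy version is practical only for small m.
    """
    m = len(pvals)
    local_reject: Dict[Tuple[int, ...], bool] = {}
    for size in range(1, m + 1):
        for subset in combinations(range(m), size):
            local_reject[subset] = min(pvals[j] for j in subset) <= alpha / size
    # Individual decision for each H_i: all intersections containing i must be rejected.
    return [all(rejected for subset, rejected in local_reject.items() if i in subset)
            for i in range(m)]


if __name__ == "__main__":
    p = [0.005, 0.011, 0.015, 0.040, 0.300]  # hypothetical p-values
    for i, rejected in enumerate(closed_test_bonferroni(p)):
        print(f"H_{i + 1}: p = {p[i]:.3f} -> {'rejected' if rejected else 'not rejected'}")
```

For these hypothetical p-values at α = 0.05 the first three hypotheses are rejected individually and the last two are not, and the probability of any false rejection stays at most α whichever hypotheses happen to be true.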
Fourthly, Perneger suggested that Bonferroni adjustments should be made in studies without pre-specified hypotheses [1]. As the number of tests in such studies is frequently large and the Bonferroni procedure has low power, consistently following this rule would mean that a large number of true effects, if not all, would be overlooked. Moreover, in exploratory studies without pre-specified hypotheses there is typically no clear structure in the multiple tests, so an appropriate multiple test adjustment is difficult or even impossible. Hence, we prefer that data from exploratory studies be analysed without multiplicity adjustment. However, "significant" results from exploratory analyses should be clearly labelled as exploratory. To confirm these results, the corresponding hypotheses have to be tested in future confirmatory studies.
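The loss of power can be quantified with a simple normal approximation. The Python sketch below assumes, purely for illustration, two-sided z-tests that all share one hypothetical standardized effect δ, chosen so that a single unadjusted test at α = 0.05 has 80% power; it then evaluates the power of each test at the Bonferroni level α/m. The probability of rejecting in the wrong tail is ignored, and the function name is hypothetical.

```python
from statistics import NormalDist


def power_two_sided_z(delta: float, alpha: float) -> float:
    """Approximate power of a two-sided z-test with standardized effect delta > 0.

    Only the tail in the direction of the effect is counted; the opposite tail
    contributes a negligible amount for realistic effects.
    """
    z = NormalDist()  # standard normal
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(delta - z_crit)


if __name__ == "__main__":
    alpha = 0.05
    # Hypothetical effect size giving 80% power for one unadjusted test at alpha.
    delta = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(0.80)
    for m in (1, 10, 100, 1000):
        power = power_two_sided_z(delta, alpha / m)
        print(f"m = {m:4d}: per-test level = {alpha / m:.5f}, power = {power:.3f}")
```

Under these assumptions the per-test power falls from 80% to roughly 50% with m = 10 tests and to about 10% with m = 1000.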
In confirmatory studies with a pre-specified goal represented by multiple hypotheses, in which significance tests are used as statistical evaluation tools for final decision making, the use of multiple test procedures is mandatory [4]. For this purpose, a number of multiple test procedures beyond the Bonferroni method have been developed [3-5] that deserve wider use in biomedical research.
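Resampling-based procedures [5] are one family of such methods that exploit the dependence between tests which the Bonferroni method ignores. The Python sketch below is a minimal, illustrative version of the single-step maxT permutation adjustment for comparing two groups on m endpoints. The Welch-type statistic, the number of permutations, the seed, the simulated data and all function names are assumptions made here; strong control of the experimentwise error rate by this procedure relies on conditions (such as subset pivotality) discussed in [5].

```python
import math
import random
from typing import List, Sequence


def welch_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Welch-type two-sample t statistic for the difference in means."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny)


def maxt_adjusted_pvalues(group_a: List[List[float]], group_b: List[List[float]],
                          n_perm: int = 999, seed: int = 1) -> List[float]:
    """Single-step maxT permutation adjustment in the spirit of Westfall and Young.

    group_a[s] and group_b[s] hold the m endpoint values of subject s. Group labels
    are permuted for whole subject vectors, so the correlation between endpoints is
    carried into the permutation distribution of max_j |T_j|.
    """
    m = len(group_a[0])
    n_a = len(group_a)
    subjects = group_a + group_b

    def abs_stats(a: List[List[float]], b: List[List[float]]) -> List[float]:
        return [abs(welch_t([s[j] for s in a], [s[j] for s in b])) for j in range(m)]

    observed = abs_stats(group_a, group_b)
    rng = random.Random(seed)
    exceed = [1] * m  # the observed labelling counts as one permutation
    for _ in range(n_perm):
        shuffled = subjects[:]
        rng.shuffle(shuffled)
        max_t = max(abs_stats(shuffled[:n_a], shuffled[n_a:]))
        for j in range(m):
            if max_t >= observed[j]:
                exceed[j] += 1
    # Adjusted p-value: share of labellings whose max |T| reaches the observed |T_j|.
    return [c / (n_perm + 1) for c in exceed]


if __name__ == "__main__":
    gen = random.Random(2024)

    def simulate_subject(shift: float) -> List[float]:
        # Hypothetical data: a shared subject effect makes the 3 endpoints correlated;
        # only endpoint 0 differs between the groups.
        common = gen.gauss(0.0, 1.0)
        return [common + gen.gauss(0.0, 1.0) + (shift if j == 0 else 0.0) for j in range(3)]

    a = [simulate_subject(1.5) for _ in range(20)]
    b = [simulate_subject(0.0) for _ in range(20)]
    print([round(p, 3) for p in maxt_adjusted_pvalues(a, b)])
```

Because whole subject vectors are permuted, correlated endpoints yield correlated permuted statistics, so the critical value for max |T| adapts to the actual dependence instead of relying on a worst-case bound.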
References
1. Perneger TV. What's wrong with Bonferroni adjustments. BMJ 1998;316:1236-1238.
2. Bland JM, Altman DG. Multiple significance tests: The Bonferroni method. BMJ 1995;310:170.
3. Bauer P. Multiple testing in clinical trials. Stat Med 1991;10:871-890.
4. Sankoh AJ, Huque MF, Dubin N. Some comments on frequently used multiple endpoint adjustment methods in clinical trials. Stat Med 1997;16:2529-2542.
5. Westfall PH, Young SS. Resampling-Based Multiple Testing. New York: Wiley, 1993.
Competing interests: No competing interests