Turning a blind eye: the success of blinding reported in a random sample of randomised, placebo controlled trials
BMJ 2004; 328 doi: https://doi.org/10.1136/bmj.37952.631667.EE (Published 19 February 2004) Cite this as: BMJ 2004;328:432
All rapid responses
Rapid responses are electronic comments to the editor. They enable our users to debate issues raised in articles published on bmj.com. A rapid response is first posted online. A proportion of responses will, after editing, be published online and in the print journal as letters, which are indexed in PubMed. Rapid responses are not indexed in PubMed and they are not journal articles.
While the proposal by Fergusson et al (1), regarding post hoc
assessment of the success of blinding in RCTs, is problematic, it does
suggest an intriguing dimension of the so-called placebo effect. If the
intervention being studied in the RCT is truly more efficacious than
placebo, then, as noted in several of the response letters (2,3,4),
blinding can be invalidated through patients correctly guessing their
allocation by gauging their individual responses to treatment (leaving
aside the issue of side effects). In such a scenario, improvement due to
the treatment intervention is perhaps compounded by further improvement
brought on through the expectation of improvement, once blinding has been
effectively broken.
This effect might be most pertinent in trials of antidepressant drugs,
where continuous, subjectively assessed outcomes are typically used and
where placebo effects may therefore occur (5). A patient randomised to
receive the drug begins to feel better, correctly guesses his or her
allocation, then possibly benefits further from the very expectation of
improvement. In short, the positive effects of expectation might be more
pronounced in the treatment group and less pronounced in the placebo
group.
The individual assessing outcomes might be similarly affected, since
he or she might also correctly guess the patient's allocation. In studies
with serial assessments, future assessments could then be biased.
For treatments that show dramatic results, RCTs may overestimate the
actual size of the treatment effect relative to placebo. It may then be
more appropriate to speak of testing the whole treatment experience,
rather than purporting to test the specific effectiveness of the
intervention alone.
References
1. Fergusson D, Glass KC, Waring D, Shapiro S. Turning a blind eye:
the success of blinding reported in a random sample of randomised, placebo
controlled trials. BMJ 2004; 328:432.
2. Altman DG, Schulz KF, Moher D. Turning a blind eye: testing the
success of blinding and the CONSORT statement. BMJ. 2004;328:1135.
3. Senn SJ. Turning a blind eye: authors have blinkered view of
blinding. BMJ. 2004; 328:1135-6.
4. Sackett DL.Turning a blind eye: why we don't test for blindness at
the end of our trials. BMJ. 2004; 328:1136.
5. Hrobjartsson A, Gotzsche PC. Is the placebo powerless? An analysis
of clinical trials comparing placebo with no treatment. N Engl J Med 2001;
344:1594-1602.
Competing interests: None declared
Fergusson et al bring up an excellent point about the need to
formally assess blinding in clinical trials. Blinding is a key component
of a clinical trial and, along with randomization, is responsible for the
lower susceptibility of a clinical trial to bias when compared to an
observational study. When blinding is not properly accounted for, there
is the risk of methodological problems compromising the results. This
issue was recently discussed by Garbe and Suissa who noted the possibility
of detection bias in the Women’s Health Initiative (WHI) clinical trial
(1).
WHI is an excellent case where concerns with the effectiveness of the
blinding can lead to confusion about the results. The study only partially
reported its high rate of unblinding (25% of subjects were unblinded to a
clinic gynaecologist) (2). Of special concern was that the
unblinding was differential (40% of women in the treatment group vs. 7% in
the placebo group) (2). This differential unblinding experienced by
gynaecologists creates concerns that study subjects or investigators could
have also experienced differential unblinding (1). However, without a
formal assessment of the blinding status of participants and investigators
it is impossible to assess the magnitude of this bias.
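The reported imbalance can be put on a rough statistical footing. The per-arm denominators are not given here, so the sketch below simply assumes 1,000 women per arm; with the reported 40% versus 7% unblinding rates, a pooled two-sample test of proportions shows the differential is far beyond chance:

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """Pooled two-sample z statistic for a difference in proportions."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                       # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical denominators (1,000 per arm) with the reported 40% vs 7% rates
z = two_prop_z(400, 1000, 70, 1000)
print(f"z = {z:.1f}")  # far beyond any conventional significance threshold
```

Whatever the true denominators, a gap of that size cannot plausibly be a chance imbalance, which is why the differential unblinding, rather than the 25% overall rate, is the real concern.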
Properly assessing the effectiveness of blinding is an important step
forward in being able to better interpret the results of clinical trials.
Adopting the recommendations of Fergusson et al would be a solid step
forward in improving research using clinical trials.
Andrei SP Brennan, MA,
Medical Research Assistant
JAC Delaney, MSc, MA,
Statistician
Caroline Hebert, MSc,
PhD Candidate (Epidemiology)
1. Garbe E and Suissa S. Issues to debate on the Women’s Health
Initiative (WHI) study. Human Reproduction 2004; 19(1): 8-13.
2. Writing Group for the Women's Health Initiative Investigators.
Risks and benefits of estrogen plus progestin in healthy postmenopausal
women: principal results from the Women's Health Initiative randomized
controlled trial. JAMA 2002;288: 321-33.
Competing interests: None declared
Dear Editor,
I read with some amusement the article (1) on blinding and the
assessment of the success of blinding. The authors question the current
lack of reporting on the success of blinding in randomised, placebo
controlled trials. I am sure the authors expect the assessment of success
of blinding to be done in a blind unbiased method, and the evaluation of
the blind unbiased method for testing the success of blinding by blind
procedures, and so on…
1. Fergusson D, Glass KC, Waring D, Shapiro S. Turning a blind eye: the success of blinding reported in a random sample of randomised, placebo controlled trials. BMJ 2004;328:432-4.
Professor SK Chaturvedi, MD
Consultant Psychiatrist
North Staffordshire Combined Healthcare,
Greenfield Centre,
Stoke on Trent ST6 5UD
Competing interests: None declared
Other rapid responses to Fergusson et al apart from my own (and Stan Shapiro's rejoinder) seem to have put a positive gloss on the problem of unblinding in clinical trials. The general thought seems to be that measuring unblinding is difficult, so we may as well give up and carry on with our pretence. This may be to continue "turning a blind eye", to use the phrase from the title of the original paper.
I may be too sceptical, but I continue to wonder whether the small effect size in many clinical trials could be entirely explained by bias introduced through unblinding. The measured degree of unblinding from guesses at the end of the trial may be greater than would be expected from correct hunches about efficacy. Like David Sackett, I do not think I am clever enough without help to distinguish true unblinding from correct hunches about efficacy, but the advantage of rapid responses is that I can share my thoughts without having worked them through fully. It may be possible to estimate, from the effect size, what degree of unblinding correct hunches about efficacy should produce; if the actual degree of unblinding with correct guesses is significantly greater than this, it would surely imply that bias had been introduced. I am not sure whether this makes sense, but I am reluctant to leave the issue and be as negative about the implications as some of my fellow rapid responders.
For example, the small effect size in meta-analyses of antidepressant trials should be more widely known.1 Psychological variables, of course, may be particularly susceptible to bias. However, even trials containing hard end-points like mortality often have very small differences between active treatment and controls, and their statistical significance is heightened by the large scale of the trials.2 In psychiatric trials the outcome is commonly determined by raters rather than patients themselves. Patients may not necessarily be very good at determining the presence of placebo or active treatment. To give another example, the evidence is that patients may not be aware that they are taking lithium, but observers seem to be able to detect it in one way or another.3
If raters can be cued in to whether patients are receiving active or placebo treatment, their wish-fulfilling expectancies could affect outcome ratings. How do we know that small effect sizes in particular are not due to this amplified placebo effect? I think we should stop turning a blind eye to this legitimate question. It needs to be answered to give confidence about the use of many medications that are endorsed in clinical practice.
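The comparison Double has in mind can be sketched numerically. Assuming, purely for illustration, that every participant guesses "active" exactly when they improve (the response rates and guess counts below are invented, not taken from any real trial), the correct-guess rate expected from efficacy hunches alone follows from the arm-specific response rates, and any excess over it can be tested with an exact binomial tail:

```python
import math

def expected_correct_from_efficacy(p_resp_active, p_resp_placebo):
    """Expected proportion of correct guesses in a 1:1 trial if every
    participant guesses 'active' exactly when they improved (a pure
    efficacy hunch, with no actual unblinding)."""
    return 0.5 * p_resp_active + 0.5 * (1 - p_resp_placebo)

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Invented numbers: 55% respond on drug, 40% on placebo (a modest effect),
# and 140 of 200 participants guessed their allocation correctly (70%).
p_hunch = expected_correct_from_efficacy(0.55, 0.40)   # 0.575 under hunches alone
excess_p = binom_sf(140, 200, p_hunch)
print(f"expected correct-guess rate from hunches alone: {p_hunch:.3f}")
print(f"P(>= 140/200 correct | hunches alone) = {excess_p:.4f}")
```

A correct-guess rate significantly above the hunch-only expectation is the signature of genuine unblinding in this sketch, which is the separation Double is asking for.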
1. Moncrieff J, Double DB. Double blind random bluff. Mental Health Today 2003;Nov:24-26.
2. Double DB. Large scale trials exacerbate risk of spurious conclusion if bias is not eliminated. bmj.com/cgi/eletters/317/7167/1170#1150, 4 Nov 1998.
3. Double DB. Lithium revisited [letter]. Br J Psychiatry 1996;168:381-2.
Competing interests: None declared
Senn's and Sackett's responses seem to reflect a rather narrow point
of view. Both correctly note that there is likely to be a strong
relationship between an individual's improvement, real or perceived, and a
subsequent guess that they are receiving active treatment. That is why
assessing unblinding by simply examining the proportion of correct
'guesses' is not a particularly good choice. However, to argue that we
should therefore not attempt to assess whether blinding has been
maintained is an even poorer choice.
Our paper examines how trialists report on blinding, not only with
regard to outcome, but also with regard to process. To be charitable, one
might categorize the result as 'not very well'. Since the claim of assay
sensitivity for trials with a placebo arm rests on the assumption of
appropriate blinding, we do not have the luxury of continuing to avoid the
challenging measurement issues involved. It seems contrary to an evidence
based approach to avoid obtaining data because we have to struggle with
its interpretation. Asking individuals to provide not only their
'guesses', but the reasons for them may help the process. A variety of
additional approaches need to be explored.
We applaud the aim of CONSORT and understand Altman and colleagues'
reticence to be prescriptive with regard to trial conduct. However, the
CONSORT statement does ask trialists to report not only the method used to
generate a random allocation sequence, but also the details of its
implementation and concealment. Similarly, we feel that there is room and
need for better attention and guidance with regard to the reporting of
blinding.
Caveats regarding the evaluation of new therapies via non-inferiority
trials (1,2) are certainly appropriate. However, methodological caveats
with regard to placebo controls need similar attention, and it appears
that this is being ignored (2). As we note, and Double echoes, the more
subjective the outcome the greater the concern. We believe that licensure
of an ineffective agent (a new anti-depressant for example) based on
studies with poorly designed and executed placebo controls should be
considered as serious an error as licensure on the basis of poorly
designed and executed non-inferiority trials. Since details about
maintenance of blinding are not being routinely sought, that symmetry
appears to be lacking. We think this needs to change.
References
(1) D'Agostino RB Sr. Non-inferiority trials: advances in concepts
and methodology. Stat Med 2003;22:166-167.
(2) International Conference on Harmonization of Technical
Requirements for the Registration of Pharmaceuticals for Human Uses (ICH).
Choice of control group and related issues in clinical trials (E-10).
Competing interests: None declared
The principle of blinding has become entrenched as a manoeuvre to
minimise bias in comparative trials (1, 2). An inherent part of that bias
lies in the a priori expectations of the investigators. To deny such
expectations is unrealistic, since they underpin a reasonable expectation
of equipoise that makes asking the question both necessary and possible.
Blinding is but one way of minimising bias and we would be less than
honest if we did not admit that all trials contain bias. This in turn
necessitates a determination of the direction and magnitude of the biases
and their possible impact on the interpretation of the outcome.
Fergusson and colleagues (3) ask an important question, namely how effective is
blinding? However they may be overstating the position when they say that
if blinding is ineffective then the protections are lost, in that the
efficacy of blinding is unlikely to be a binary process, and the impact of
blinding only partly offsets the actual effect size. Thus in a large
randomised trial that demonstrates clinically significant and biologically
plausible differences in outcomes, some imperfections in blinding are
unlikely to fully explain the variance in outcome. The degree of blinding
does however add to the credibility of the reported outcome by reducing
opportunities for bias to become manifest.
Nevertheless if we advocate blinding it is important that we also ask
whether the effectiveness can be measured and if so whether differences in
that effectiveness can explain variance in the observed outcomes.
Since no validated method for measuring the effectiveness of blinding
has yet been developed, it is perhaps not surprising that Fergusson et al.
found the literature wanting in this regard.
Sackett (4) raises the interesting and valid question as to whether
testing for effectiveness is merely a surrogate for testing for a priori
expectations. Therefore if the effectiveness of blinding is to be measured
it may be important to measure such expectations as explanatory variables.
Whether the effectiveness can be reliably measured or not, the steps
taken to ensure blinding should be reported as a minimum. Bias can be
applied at several levels in a trial. These include participant and
investigator expectation, outcome evaluation, analysis and interpretation.
Each of these steps is potentially subject to estimation of the impact of
potential bias. The lexicon of blinding has been well described (5,6).
All commentators to date (4, 7, 8, 9) have pointed out that
attempting to measure the effectiveness of blinding is essentially
confounded by outcome, whether this be the planned outcome or unintended
adverse events.
Any attempt to develop a methodology for measuring the success or
effectiveness of blinding would need to include not only the a priori
expectations but also the effects of chance. Only two trials in Fergusson's
survey used kappa to take this into account. Other considerations are
whose estimate of the allocation is being sought and the timing of the
assessment, such as whether this was performed prior to the onset of the
intended outcome or not.
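The kappa approach mentioned above can be sketched in a few lines; the 2×2 counts below are invented for illustration only:

```python
def cohens_kappa(table):
    """Cohen's kappa for a 2x2 table where table[i][j] counts participants
    with true allocation i whose guess was j (0 = placebo, 1 = active)."""
    n = sum(sum(row) for row in table)
    p_obs = (table[0][0] + table[1][1]) / n                  # observed agreement
    row = [sum(table[i]) for i in range(2)]                  # allocation marginals
    col = [table[0][j] + table[1][j] for j in range(2)]      # guess marginals
    p_exp = sum(row[i] * col[i] for i in range(2)) / n**2    # agreement by chance
    return (p_obs - p_exp) / (1 - p_exp)

# Invented counts: 30/50 placebo and 40/50 active participants guessed correctly,
# giving kappa = 0.4 (guesses agree with allocation well beyond chance)
print(cohens_kappa([[30, 20], [10, 40]]))
```

Unlike the raw proportion of correct guesses, kappa is zero when guessing is at chance given the marginals, which is precisely why it is the more defensible summary.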
In many cases completely effective blinding is very difficult due to
known differences in adverse event profiles. Furthermore participant
expectation can be unconsciously reinforced by differing descriptions in
the consent form, which itself could be subject to blinding. Thus it is
always encouraging to note that in reported results some adverse events or
dropouts can be higher amongst placebo patients than amongst those on
active medication.
While it is difficult to remove all effects of both participant and
investigator expectation, it is possible to isolate these factors from
evaluation. Independent outcome evaluation can usually be blinded more
successfully than assessments and interactions between participants and
investigators in the clinic.
Outcome evaluation can be blinded not only to the actual assignments
but also the nature of the investigation itself. For instance radiologists
reading films can be potentially blinded to the whole trial and be asked
merely to report on the presence or absence of a feature on an individual
assessment or to compare pairs of films.
A special case is participant reported outcome such as Fisher’s
classic taste test, as mentioned by Senn (7).
Whether blinding can be effectively measured is perhaps not the whole
question. We should be cautious about implying that failure to
unequivocally establish perfect blinding invalidates the interpretation of
the data.
In the land of the unblind, even the partially blinded may be king.
References
1. Sackett D, Haynes R, Guyatt G, Tugwell P: Clinical epidemiology. A
basic science for clinical medicine. Little Brown, Boston. 2nd Ed. 1991.
2. Altman DG, Schulz KF, Moher D, Egger M, Davidoff F, Elbourne D, et al. The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med 2001;134:663-94.
3. Fergusson D, Glass KC, Waring D, Shapiro S. Turning a blind eye: the success of blinding reported in a random sample of randomised, placebo controlled trials. BMJ 2004;328:432-4.
4. Sackett DL: Why we don't test for blindness at the end of our
trials. BMJ Rapid Response February 20 2004.
5. Schulz KF, Chalmers I, Altman DG. The landscape and lexicon of blinding in randomized trials. Ann Intern Med 2002;136:254-259.
6. Schulz KF, Grimes DA. Blinding in randomised trials: hiding who
got what. Lancet 2002;359:696-700.
7. Senn SJ: A blinkered view of blinding. BMJ Rapid Response February
20 2004.
8. Double DB: Changing the mindset about unblinding in clinical
trials. BMJ Rapid Response February 21 2004.
9. Altman DG et al.: Testing the success of blinding and the CONSORT
statement. BMJ Rapid Response February 21 2004.
Competing interests: Until today I believed that the robustness of the effect size could be further validated by establishing the effectiveness of the blind.
Reports of randomised trials should state clearly whether or not
blinding was attempted, and if so who was blinded and how this was
done.[1] Fergusson and colleagues present interesting findings regarding
the important question of whether blinding was effective.[2] As they say,
blinding may be ineffective in some trials, making them less sound
methodologically than they appear to be.
However, asking trial participants (or caregivers) to try to identify
the treatment received runs the risk that the ability to guess the
treatment received might well be influenced by outcome. For example, the
presence of adverse symptoms will surely influence participants asked
which treatment they think they were on, especially as the possibility of
such symptoms would surely have been mentioned as part of the informed
consent procedure.[3] In this case some apparent loss of blinding is
likely and should not be considered necessarily to be a weakness of the
trial. We might expect to see an apparent breaking of the blind more often
in trials where there was a marked treatment effect, for either an
intended outcome or adverse effect. Indeed, end-of-trial tests of
blindness might actually be tests of hunches for adverse effects or
efficacy.[4] Assessments of blinding would clearly be much more reliable
in trials where they can be carried out before the clinical outcome has
been determined.
Furthermore, individuals may camouflage unblinding efforts. If
someone deciphers assignments, they may provide responses contrary to
their deciphering findings to disguise their unblinding actions.[4] That
difficulty and the aforementioned interpretational difficulties lead us
to question the usefulness of blinding tests in many circumstances.
The CONSORT Statement does not say that “the success of blinding is
to be reported in the publication.” Rather, it recommends reporting the
findings of an assessment of blinding if it was done[5]. Fergusson and
colleagues suggest that the CONSORT Statement should be amended to suggest
that assessment of blinding should be done routinely. We are not convinced
that all trialists should carry out such an exercise. Further, we do not
agree that the CONSORT Statement should be modified as suggested. CONSORT
is a set of reporting recommendations – it does not make statements on how
trials should be done, but asks that what was done should be fully and
accurately reported.
1 Moher D, Schulz KF, Altman DG for the CONSORT Group. The CONSORT
statement: revised recommendations for improving the quality of reports of
parallel-group randomised trials. Lancet
2001;357:1191–4.
2 Fergusson D, Glass KC, Waring D, Shapiro S. Turning a blind eye:
the success of blinding reported in a random sample of randomised, placebo
controlled trials. BMJ 2004; 328: 432-4.
3 Schulz KF, Chalmers I, Altman DG. The landscape and lexicon of
blinding in randomized trials. Ann Intern Med 2002;136:254-259.
4 Schulz KF, Grimes DA. Blinding in randomised trials: hiding who got
what. Lancet 2002;359:696-700.
5 Altman DG, Schulz KF, Moher D, Egger M, Davidoff F, Elbourne D, et
al. The revised CONSORT statement for reporting randomized trials:
explanation and elaboration. Ann Intern Med 2001;134:663-94.
Douglas G Altman
Cancer Research UK Medical Statistics Group, Centre for Statistics in
Medicine, Old Road Campus, Oxford
Kenneth F Schulz
Family Health International, Research Triangle Park, NC, USA
David Moher
University of Ottawa, Chalmers Research Group, Ottawa, Ontario, Canada
Competing interests: The authors are the organisers of the CONSORT Group
Fergusson et al have usefully highlighted the problem of unblinding in clinical trials.1 They propose that item 11b of the CONSORT statement2 should be revised to make assessment of blinding a requirement.
Their data suggest that more often than not blinding in clinical trials is compromised. In the circumstances, merely reporting whether a trial is blinded may be insufficient. In fact, there is bias in the wording of item 11b of the CONSORT statement. The item in the checklist is "If done, how the success of blinding was evaluated". In other words, it seems to imply that assessment of blinding will confirm its validity. We need to change our mindset to whether it matters if and when the blind is broken.
Reporting of the assessment of blinding is uncommon. Correlation of outcome measures with degree of unblinding is even less common and was only done in a few of the 15 trials reported by Fergusson et al. For example, Sackeim et al found a robust association between relapse status and patients’ guesses.3 As pointed out by Fergusson et al, the direction of causality of this association may be difficult to ascertain. After all, if treatment is obviously effective, it is technically impossible to perform a trial double-blind. This may be the explanation of the association found by Sackeim et al. On the other hand, it could be a reflection of bias introduced through unblinding. How we evaluate the efficacy of active treatment is unclear. Breaking of the double-blind has been interpreted as the explanation for a positive trial result.4 Why should this not be the case in more trials which conclude that active treatment is effective?
Fergusson et al’s suggested minimum set of information for the assessment of blinding includes counts of the correctness of patients’ guesses. As they note, this is particularly beneficial in trials with subjective outcomes or outcomes reported by patients. However, raters’ guesses may be more important in trials where the outcome measures depend on their scores. It is possible for patients to remain blind but for raters still to be unblinded, leading to significant correlation with outcome measures.5
1. Fergusson D, Glass KC, Waring D, Shapiro S. Turning a blind eye: the success of blinding reported in a random sample of randomised, placebo controlled trials. BMJ 2004;328:432-4.
2. CONSORT statement website: http://www.consort-statement.org/
3. Sackeim HA, Haskett RF, Mulsant BH, Thase ME, Mann JJ, Pettinati HM, et al. Continuation pharmacotherapy in the prevention of relapse following electroconvulsive therapy: a randomized controlled trial. JAMA 2001;285:1299-1307.
4. Karlowski TR, Chalmers TC, Frenkel LD, et al. Ascorbic acid for the common cold. JAMA 1975;231:1038-42.
5. Double DB. Unblinding in trials of the withdrawal of anticholinergic agents in patients maintained on neuroleptics. J Nerv Ment Dis 1995;183:599-602.
Competing interests: None declared
Blindness is important, for all the reasons Dean Fergusson and his
colleagues present in their paper.
However, asking patients or their clinicians at the end of a trial
which drug they think they were taking confounds the success of blinding
with hunches about efficacy. When patients or their study clinicians have
a hunch about which treatment is superior, patients who have done well
will tend to think they were on that treatment, and so will their
clinicians.
My colleagues and I discovered this phenomenon (to our chagrin) when
we were the first group to test aspirin and sulfinpyrazone in the hope
that one or both of these drugs might prevent major and fatal strokes in
patients with transient ischemic attacks (1). In those early days, we
shared the hunch that sulfinpyrazone was probably efficacious, but that
aspirin probably wasn't (we did the trial because we were uncertain about
these hunches, not indifferent about them). As it happened, our pre-trial
hunches were wrong: aspirin turned out to be highly efficacious in our
trial, and sulfinpyrazone worthless.
At the end of our trial we asked study clinicians to predict which
drug each of their patients had been assigned, thinking that we were
measuring whether blindness had been successful during the trial. To our
confusion, their predictions were statistically significantly WRONG. With
4 regimens in this "double-dummy" trial, we'd expect correct predictions
for 25% of patients; our clinicians' predictions were correct for only 18%
of them.
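Sackett's "significantly WRONG" result can be illustrated with an exact binomial tail. The trial's denominator is not given in this letter, so the sketch below simply assumes 400 predictions:

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# With 4 regimens, chance predicts 25% correct; suppose 72 of 400 hypothetical
# predictions (18%) were correct.
p_low = binom_cdf(72, 400, 0.25)
print(f"P(<= 72/400 correct | guessing at chance) = {p_low:.5f}")
```

A lower tail this small is what licenses the word "significantly": the clinicians were doing systematically worse than chance among the four regimens, exactly the signature of a shared wrong hunch rather than broken blinding.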
Our confusion lifted when we thought through the effect of our prior
hunches about efficacy. When our patients had done well, their clinicians
tended to predict that they had received sulfinpyrazone; when patients had
suffered strokes, these same clinicians tended to predict that they had
received aspirin or the double-placebo.
But what if our pre-trial hunches about efficacy had been correct?
If patients who had done well were predicted to have received aspirin, and
those who had done poorly were predicted to have received sulfinpyrazone
or the double-placebo, our end-of-study test for blindness would have led
to the incorrect conclusion that blinding was unsuccessful.
I'm not smart enough to be able to look at an end-of-study test for
blindness and distinguish unsuccessful blinding from correct hunches about
efficacy. I hope somebody is. In the meanwhile, both here and in prior
personal correspondence, I've encouraged Dean Fergusson and his colleagues
to reconsider their study's interpretations and recommendations. To the
extent that patients and clinicians were correct in their hunches about
the comparative efficacy of the treatment arms in the trials they
examined, they would draw the incorrect conclusion that blinding had been
unsuccessful, even when it was completely successful.
My colleagues and I vigorously test for blindness before our trials,
but not during them and never at their conclusion.
1. The Canadian Cooperative Study Group. A randomized trial of aspirin and sulfinpyrazone in threatened stroke. N Engl J Med 1978;299:53-9.
Competing interests: please see bmj.com/cgi/content/full/324/7336/539/DC1
Placebo-controlled: is it really a MeSH term?
We read this article with great interest. In the methods section the
authors state that "Our Medline search used publication type 'randomised
controlled trial' and the MeSH term 'placebo-controlled' to identify
placebo controlled randomised trials".
Yet a simple search of the MeSH database shows that "placebo-controlled" is
not a MeSH term.
Competing interests: None declared