BMJ 2001;323:829-832 (13 October)

Papers

Quality of Cochrane reviews: assessment of sample from 1998

Ole Olsen, senior researcher (a); Philippa Middleton, assistant director (b); Jeanette Ezzo, systematic reviews coordinator (c); Peter C Gøtzsche, director (a); Victoria Hadhazy, research associate (c); Andrew Herxheimer, emeritus fellow (d); Jos Kleijnen, director (e); Heather McIntosh, lecturer (b)

(a) Nordic Cochrane Centre, Rigshospitalet, Dept 7112, Blegdamsvej 9, DK-2100 Copenhagen Ø, Denmark; (b) Australasian Cochrane Centre, Department of General Practice, Flinders Medical Centre, Adelaide, South Australia, Australia 5042; (c) Cochrane Complementary Medicine Field, Complementary and Alternative Program, University of Maryland, School of Medicine, Baltimore, MD, USA; (d) UK Cochrane Centre, NHS R&D Programme, Oxford OX2 7LG; (e) NHS Centre for Reviews and Dissemination, University of York, York YO1 5DD

See also editorial by Clarke and Langhorne

Correspondence to: O Olsen o.olsen@cochrane.dk


    Abstract

Objective: To assess the quality of Cochrane reviews.
Design: Ten methodologists affiliated with the Cochrane Collaboration independently examined, in a semistructured way, the quality of reviews first published in 1998. Each review was assessed by two people; if one of them noted any major problems, they agreed on a common assessment. Predominant types of problem were categorised.
Setting: Cyberspace collaboration coordinated from the Nordic Cochrane Centre.
Studies: All 53 reviews first published in issue 4 of the Cochrane Library in 1998.
Main outcome measure: Proportion of reviews with various types of major problem.
Results: No problems or only minor ones were found in most reviews. Major problems were identified in 15 reviews (29%). The evidence did not fully support the conclusion in nine reviews (17%), the conduct or reporting was unsatisfactory in 12 reviews (23%), and stylistic problems were identified in 12 reviews (23%). The problematic conclusions all gave too favourable a picture of the experimental intervention.
Conclusions: Cochrane reviews have previously been shown to be of higher quality and less biased on average than other systematic reviews, but improvement is always possible. The Cochrane Collaboration has taken steps to improve editorial processes and the quality of its reviews. Meanwhile, the Cochrane Library remains a key source of evidence about the effects of healthcare interventions. Its users should interpret reviews cautiously, particularly those with conclusions favouring experimental interventions and those with many typographical errors.


What is already known on this topic
Cochrane reviews are, on average, more systematic and less biased than systematic reviews published in paper journals

Errors and biases also occur in Cochrane reviews

What this study adds
Too often, reviewers' conclusions over-rated the benefits of new interventions

Readers of Cochrane reviews should remain cautious, especially regarding conclusions that favour new interventions

The Cochrane Collaboration has taken steps to improve the quality of reviews



    Introduction

In the late 1980s clinicians drew attention to the poor scientific quality of healthcare review articles.1-3 Subsequently, the need for systematic reviews of the effects of healthcare interventions has been widely recognised and checklists and guidelines have been developed.4-6 The Cochrane Collaboration has led the way in setting new standards for preparing systematic reviews,4 which are published in electronic format (CD Rom issued quarterly) as part of the Cochrane Library. In contrast, few conventional medical journals provide specific guidelines for authors of systematic reviews.7 Cochrane reviews should therefore be expected to use higher quality methods and should be less prone to bias than systematic reviews published in traditional medical journals. These expectations have been confirmed by four comparative studies of quality and one of bias.8-12

However, there is always room for improvement. All scientific reports, including Cochrane reviews, should be read critically. Errors occur, and potential biases may emerge. The comments and criticisms (electronic "letters to the editor" linked to the relevant review) published in the Cochrane Library and the ensuing changes show that some Cochrane reviews have needed correction and improvement.

A group of Cochrane methodologists collaborated to critically read a sample of Cochrane reviews in order to identify and characterise the most common methodological problems. We expected that Cochrane reviews would fulfil most of the criteria that were listed in the current version of the Cochrane handbook4 and in relevant checklists, so we used a semistructured approach that allowed the assessors to note all kinds of problems they encountered. Our aim was to identify the aspects of Cochrane reviews that are most in need of improvement.


    Methods

During the 1998 Cochrane Colloquium the lead researcher (OO) contacted 11 methodologists with various Cochrane affiliations, who subsequently volunteered to assess the methodological quality of the 53 reviews first published in issue 4 of the Cochrane Library in 1998.w1-w53 The project was carried out in 1999 and was coordinated by the Nordic Cochrane Centre. Each review was independently examined by two assessors. We allocated the reviews to assessors by assigning a random number to each review, sorting the numbers in ascending order, and linking the sorted list to a prespecified list of the 55 possible pairs of assessors. Each assessor was assigned nine or 10 reviews, and this assignment was not subsequently changed.
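The allocation procedure described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' actual procedure: the assessor labels, the seed, and the ordering of the prespecified pair list are assumptions. So is the choice of which two of the 55 pairs go unused; the sketch drops two pairs that have no assessor in common, so that every assessor receives nine or 10 reviews.

```python
import random
from collections import Counter
from itertools import combinations

N_REVIEWS = 53
N_ASSESSORS = 11


def allocate_reviews(seed=1998):
    """Link randomly ordered reviews to a fixed list of assessor pairs."""
    rng = random.Random(seed)  # the seed is an assumption, used for repeatability
    assessors = [f"assessor_{i + 1}" for i in range(N_ASSESSORS)]

    # All 55 possible pairs of 11 assessors.
    pairs = list(combinations(assessors, 2))

    # Assumption: the two unused pairs are disjoint (first and last pair here).
    unused = {pairs[0], pairs[-1]}
    pair_list = [p for p in pairs if p not in unused]  # 53 pairs remain

    # Assign a random number to each review and sort in ascending order.
    reviews = [f"w{i + 1}" for i in range(N_REVIEWS)]
    keyed = sorted((rng.random(), review) for review in reviews)

    # Link the sorted reviews to the prespecified list of pairs.
    return {review: pair for (_, review), pair in zip(keyed, pair_list)}


def workload(allocation):
    """Count how many reviews each assessor was given."""
    return Counter(a for pair in allocation.values() for a in pair)


if __name__ == "__main__":
    allocation = allocate_reviews()
    for assessor, n in sorted(workload(allocation).items()):
        print(assessor, n)
```

With 53 reviews and 55 possible pairs, two pairs stay unused. If those two pairs shared an assessor, that person would receive only eight reviews, so the ordering of the prespecified list matters for the stated workload of nine or 10 reviews each.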

We gave a letter A-E to the overall assessment for each review: A indicated no problems, B minor problems, C major problems, D lack of clarity, and E other types of comments (box 1). We collected and tabulated the individual scores at the Nordic Cochrane Centre. If one of the assessors had noted a major problem in a review, the two assessors decided whether to give feedback to the reviewers and editors by using the comments and criticisms system in the Cochrane Library. The lead researcher also used submitted comments and criticisms to identify common types of problems. This paper focuses on reviews that had major problems.


Box 1: Assessment format. The assessors were asked to choose one option for each review and give details as appropriate

A I encountered no problems in the review. I would be proud to show this review as an example of a Cochrane review
B I encountered minor problems in the review, namely the following: . . .
C I encountered major problems in the review, and I think it deserves a comment and criticism along the lines of . . .
D The review might be OK, but I need clarification on . . .
E Any other unstructured comments




    Results

One of the 11 methodologists withdrew from the project, and a few assessors did not manage to assess all of their allotted reviews. We initially received 91 (86%) sets of comments out of the expected 106; these related to 52 of the 53 reviews. The review that had not been assessed was removed from the analysis. Three assessors subsequently volunteered to read additional reviews, so that two people assessed any review in which a major problem had been identified.

The scores given in the 91 independent assessments were A, 24; B, 31; C, 19; D, 10; AB, 3; BD, 3; and DE, 1. The number of A scores (indicating no problems) given by individual assessors ranged from 0 to 6, with a median of 2, out of a possible maximum of 10. The number of C scores (major problems) ranged from 0 to 4, also with a median of 2 (table 1).

Of the 52 reviews, 39 (75%) were assessed independently by two assessors (table 2). Pairs of assessors agreed completely for 13 (33%) reviews; they gave assessments in adjacent categories (A and B or B and C) for 14 (36%). Two (5%) reviews had contradictory assessments (A and C); for each of the remaining 10 (26%) reviews, one of the assessors felt it lacked clarity (D). The 13 reviews assessed by only one assessor obtained the following scores: A, 1; AB, 2; B, 7; and C, 3. Nineteen (37%) of the 52 reviews had at least one A score, and 17 (33%) had at least one C score.
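The shares for the double-assessed reviews can be rechecked directly from the counts. The short sketch below does only that arithmetic; the category labels and the function name are illustrative and do not come from the paper.

```python
# Counts reported for the reviews assessed independently by two assessors.
pair_outcomes = {
    "complete agreement": 13,
    "adjacent categories (A and B, or B and C)": 14,
    "contradictory (A and C)": 2,
    "one assessor noted lack of clarity (D)": 10,
}


def shares(counts):
    """Return each count as a whole-number percentage of the total."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


if __name__ == "__main__":
    total = sum(pair_outcomes.values())
    for label, pct in shares(pair_outcomes).items():
        print(f"{label}: {pair_outcomes[label]} of {total} ({pct}%)")
```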


                              

Table 1. Scoring of reviews: quality of 52 reviews as assessed by 10 assessors


                              

Table 2. Combinations of scores within pairs of assessors

Pairs of assessors reached agreement on 13 comments and criticisms, which both assessors wrote jointly; for various reasons an additional four were contributed by only one assessor (one additional assessor withdrew from the project). The full texts of the submitted comments and criticisms are on the BMJ's website.

While classifying the comments and criticisms, we discovered that for two reviews the assessors seemed to have agreed on a C by mistake, leaving 15 reviews (29%, 95% confidence interval 17% to 43%) with major problems. There were three areas of concern. Firstly, the evidence did not support the conclusions in nine (17%) reviews (table 3). Secondly, the conduct or reporting of the reviews was unsatisfactory in 12 (23%) reviews (box 2). Thirdly, there were stylistic concerns with 12 (23%) reviews.
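The paper does not say how the 95% confidence interval was calculated. A minimal sketch, assuming an exact (Clopper-Pearson) binomial interval, is given below; the function names are illustrative and only the Python standard library is used. Under that assumption the limits for 15 of 52 should come out close to the reported 17% to 43%.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target, tol=1e-9):
    """Bisection on [0, 1] for g(p) = target, where g increases with p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x events in n trials."""
    # Lower limit: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper limit: the p at which P(X <= x) equals alpha / 2,
    # i.e. P(X > x) equals 1 - alpha / 2.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


if __name__ == "__main__":
    low, high = exact_ci(15, 52)
    print(f"15/52 = {15 / 52:.0%}, 95% CI {low:.0%} to {high:.0%}")
```

An exact interval is a conservative choice for a sample of 52; an approximate method such as the Wilson score interval would give slightly narrower limits.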

The problematic conclusions all described the effect of the experimental treatment in terms that we judged to have been too optimistic (table 3). None of these nine reviews indicated a bias towards the control treatment.

The most common problems with methods (box 2) concerned inclusion and exclusion of trials (six reviews), concealment of allocation (five), loss to follow up (four), choice of outcome measures (four), and statistics (three). Other problems were found only once; for example, a conclusion based largely on a single trial that seemed to have major weaknesses.

Stylistic problems were indicated by statements such as "many spelling and grammatical errors," "a few typographical errors," "seems to be an unfinished draft," and "needs to be edited to be more readable and comprehensible." Four reviews had many spelling and typographical errors, and these four also had problems relating to their methods and conclusions.


                              

Table 3. Conclusions not supported by the evidence


Box 2: Some examples of problems with methods

Inclusion and exclusion criteria were not well defined,w31,w51 inconsistently transferred from title to methods section (another category of patients or another type of intervention),w9,w31 or applied inconsistently,w31 subjectively,w29 or with retrospective rationalisation.w27 The problems partly related to choice of terminology, for example "experimental or quasi-experimental designs" (meaning randomised or quasi-randomised or non-randomised studies?), resulting in different interpretations or enumerations in the "selection criteria," "description of studies," "main results," meta-analytic graphs, and "table of included studies" and in conflicting numbers.w31,w51
Concealment of allocation was mixed up with double blinding,w4,w20,w29,w31 or discrepancies existed between the quality of the trials as described in the table of included trials, the graphs, and the main text.w31,w52 These problems may occur much more often than we found, as most occurrences were noted by a single assessor.
Assessors were particularly concerned about loss to follow up when rates of dropout were high (29%, 43%, 50%),w29,w51 dropouts were treatment failures,w29 or the reviewers claimed to have performed an intention to treat analysis but the numbers on the graphs did not fit with the number of included patients;w52 the assessors asked for subgroup analyses including only trials with full follow up, more thorough documentation and discussion of loss to follow up,w4 or more cautious conclusions.w29,w51
Outcome measures were inappropriately combined (death plus surrogate outcome),w53 inappropriately split (different lengths of follow up),w20 too numerous without precautions against multiple testing,w31,w52 unblinded subjective assessments,w53 or derived from only a single small trial.w31
Statistical problems related to extremely different standard deviations between experimental and control groupw18 or between trialsw31 or to a confidence interval with zero length.w51




    Discussion

Fifteen (29%) of 52 Cochrane reviews first published in 1998 were judged to have major problems, among which biased conclusions, problems with methods, and insufficient typographical or stylistic editing were the most common. Thus, even though Cochrane reviews are based on specific guidelines4 and have higher quality methods on average than systematic reviews published in conventional journals,8-11 problems were still common. The problems we identified in these reviews were brought to the attention of their authors through the electronic comments and criticism system; revised versions of some reviews have subsequently appeared in the Cochrane Library (in some instances with a changed title), and other revisions are being prepared.

Strengths and weaknesses of the study
We studied Cochrane reviews that were first published nearly three years ago. A range of quality initiatives has been implemented by the Cochrane Collaboration since then; we hope that a study of current Cochrane reviews would reveal a smaller proportion of reviews with major problems.

Assessing only reviews first published in issue 4 of the Cochrane Library in 1998 was a decision of convenience and may have led to a sample that was not representative of all the Cochrane reviews available at that time. For example, if these new reviews had been contributed mainly by relatively inexperienced reviewers and editorial groups, our overall findings may have overestimated the proportion of reviews with major problems. On the other hand, even in 1998, three years after the first Cochrane reviews had been published, new reviews might be expected to have been of higher quality than older reviews, thus leading to a bias in the opposite direction.

Our individual assessments were done in a semistructured way, without a checklist, by experienced, selected volunteer methodologists from the Cochrane Collaboration, who were advised to spend not more than 10 hours in total on the exercise. The time constraint and lack of a checklist may have led to some errors going undetected. Conversely, the use of experienced methodologists may have led to detection of unexpected errors. The way assessors were recruited may have led us to be particularly mild or particularly hard in our assessments. Because agreement was reached on most assessments, disagreements in the individual assessments probably reflected oversights by one of the assessors rather than true disagreement. On the other hand, four of the 17 comments and criticisms that were submitted had been written by only one assessor. Thus the number of identified problems might be an underestimate. Alternatively, the high number of problems described as major may partly reflect a few very demanding assessors. The fact that agreements were reached despite the variability in individual assessments indicates that selection of another set of methodologically trained assessors would probably not have greatly altered the final assessments.

One person (OO), while reading the 17 comments, created the three categories by grouping the problems identified. No hypotheses regarding these categories were stated at the outset. Prespecified categories and two data extractors would have strengthened the validity of the findings.

Relation to other studies of bias
Empirical studies have identified several important biases in individual trials, such as publication bias and bias related to poor randomisation, all of which tend to exaggerate the estimated beneficial effects of new treatments.13-15 Important bias also arises in the step from results to conclusions. In a study of 196 drug trials, bias in the conclusions or abstracts favoured the new drug in 81 of 82 reports.16 In accordance with these findings, the bias we found favoured the experimental treatment in all of the nine reviews with problematic conclusions. Thus the same type of bias seems to occur in the conclusions of systematic reviews as in reports of trials. This bias in systematic reviews, which is unrelated to bias in the individual trials, seems not to have been reported before. However, although dubious or invalid statements were found in 76% of the conclusions or abstracts of drug trial reports,16 they occurred in only 17% of the Cochrane reviews.

Steps to improve quality
Solutions to most of the methodological problems we met were described in the Cochrane handbook and other checklists for systematic reviews.4,17 This indicates the need for better use of guidelines in scientific editing and peer review. The conclusion bias we observed has led to improved advice on how to write conclusions in Cochrane reviews in the most recent version of the Cochrane reviewers' handbook.18

Since 1998 the Cochrane Collaboration has taken several additional steps to improve the quality of its reviews. A quality advisory group has been established and the post of quality improvement manager has been created. The Cochrane reviewers' handbook is improved regularly, and tools for assessing the quality of reviews are being developed. Several new courses for reviewers, statisticians, and editors have been developed, and two centralised editing projects are running. It is also important that users of the Cochrane Library participate in the process by submitting any relevant comments and criticisms they might have.

Implications for clinicians and policymakers
Seekers of the best available evidence on treatment and prevention should continue to look to the Cochrane Library as a key source of information, despite the deficiencies that we found in a minority of Cochrane reviews in the 1998 sample. Reliance on unsystematic reviews, textbooks, and anecdotal evidence is likely to be far more problematic.1,19 No matter which sources of evidence are being used, users of the evidence need to learn the skills of critical appraisal. Guides and courses on critical appraisal are now widely accessible.20 As with any scientific report, readers should themselves assess the reliability of individual Cochrane reviews. They should be particularly cautious of reviews with conclusions that favour experimental interventions when relatively little evidence is available for the review and of reviews with many typographical errors.

    Acknowledgments

We thank Phil Alderson and Matthias Egger for their contribution to the study and Iain Chalmers and Mike Clarke for useful comments on the manuscript.

Contributors: OO planned, coordinated, and reported the work and is the guarantor of the paper. All authors, Phil Alderson, and Matthias Egger participated in the assessments. All authors gave input to, and some commented on, the draft versions of the paper. All authors approved the manuscript.

    Footnotes

Funding: OO was funded by the Danish Institute for Health Technology Assessment.

Competing interests: All assessors are associated with the Cochrane Collaboration.

Additional references plus full text of the submitted comments and criticisms are available on the BMJ's website


    References

1. Mulrow CD. The medical review article: state of the science. Ann Intern Med 1987; 106: 485-488.
2. Oxman AD, Guyatt GH. Guidelines for reading literature reviews. Can Med Assoc J 1988; 138: 697-703.
3. Ellenberg S, ed. Proceedings of "Methodologic issues in overviews of randomized clinical trials." Stat Med 1987; 6: 217-409.
4. Mulrow CD, Oxman AD, eds. Cochrane Collaboration handbook. In: Cochrane Library. Issue 4. Oxford: Update Software, 1997.
5. Mulrow CD. Rationale for systematic reviews. BMJ 1994; 309: 597-599.
6. Oxman AD. Checklists for review articles. BMJ 1994; 309: 648-651.
7. Moher D, Jadad AR. How to peer review a manuscript. In: Godlee F, Jefferson T, eds. Peer review in health sciences. London: BMJ Books, 1999:146-156.
8. Jadad AR, Cook DJ, Jones A, Klassen TP, Tugwell P, Moher M. Methodology and reports of systematic reviews and meta-analyses. JAMA 1998; 280: 278-280.
9. Shea B, Moher D, Pham B, Tugwell P. Assessing the quality of reporting meta-analyses of randomized controlled trials [abstract]. VII Cochrane Colloquium, Rome, 5-9 October 1999:A39.
10. Shea B, Dubé C, Moher D. Assessing the quality of reports of systematic reviews and meta-analyses: A systematic review of checklists and scales [abstract]. VII Cochrane Colloquium, Rome, 5-9 October 1999:A40.
11. Jadad AR, Moher M, Browman GP, Booker L, Sigouin C, Fuentes M, et al. Systematic reviews and meta-analyses on treatment of asthma: critical evaluation. BMJ 2000; 320: 537-540.
12. Egger M, Smith GD, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. BMJ 1997; 315: 629-634.
13. Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 1995; 273: 408-412.
14. Stern JM, Simes RJ. Publication bias: evidence of delayed publication in a cohort study of clinical research projects. BMJ 1997; 315: 640-645.
15. Tramér MR, Reynolds DJM, Moore RA, McQuay HJ. Impact of covert duplicate publication on meta-analysis: a case study. BMJ 1997; 315: 635-640.
16. Gøtzsche PC. Methodology and overt and hidden bias in reports of 196 double-blind trials of nonsteroidal anti-inflammatory drugs in rheumatoid arthritis. Control Clin Trials 1989; 10: 31-56.
17. Oxman AD, Cook DJ, Guyatt GH, for the Evidence-Based Medicine Working Group. Users' guides to the medical literature. VI: How to use an overview. JAMA 1994; 272: 1367-1371.
18. Clarke M, Oxman AD, eds. Common errors in reaching conclusions. Cochrane reviewers' handbook 4.1.3, section 9.7. In: Cochrane Library. Issue 3. Oxford: Update Software, 2001.
19. Antman EM, Lau J, Kupelnick B, Mosteller F, Chalmers TC. A comparison of results of meta-analyses of randomized control trials and recommendations of clinical experts. Treatments for myocardial infarction. JAMA 1992; 268: 240-248.
20. EBH calendar of events. http://cebm.jr2.ox.ac.uk/docs/calendar.html (accessed 17 July 2001).

(Accepted 29 August 2001)


© BMJ 2001

