BMJ 2001;323:157-162 (21 July)

Education and debate

Systematic reviews in health care

Systematic reviews of evaluations of diagnostic and screening tests

This is the third in a series of four articles

Jonathan J Deeks, senior medical statistician

Imperial Cancer Research Fund/NHS Centre for Statistics in Medicine, Institute of Health Sciences, Oxford OX3 7LF

Correspondence to: J J Deeks J.Deeks@icrf.icnet.uk

Tests are routinely used in medicine to screen for, diagnose, grade, and monitor the progression of disease. Diagnostic information is obtained from a multitude of sources, including imaging and biochemical technologies, pathological and psychological investigations, and signs and symptoms elicited during history taking and clinical examinations.1 Each of these items of information can be regarded as a result of a separate diagnostic or screening "test." Systematic reviews of evaluations of tests are undertaken for the same reasons as systematic reviews of treatment interventions: to produce estimates of test performance and impact based on all available evidence, to evaluate the quality of published studies, and to account for variation in findings between studies.2-5 Reviews of studies of diagnostic accuracy involve the same key stages of defining questions, searching the literature, evaluating studies for eligibility and quality, and extracting and synthesising data. However, studies that evaluate the accuracy of tests have a unique design requiring different criteria to appropriately assess the quality of studies and the potential for bias. Additionally, each study reports a pair of related summary statistics (for example, sensitivity and specificity) rather than a single statistic (such as a risk ratio) and hence requires different statistical methods to pool the results of the studies. This article concentrates on the dimensions of study quality and the advantages and disadvantages of different summary statistics for combining studies in meta-analysis. Other aspects, including searching the literature and further technical details, are discussed elsewhere.6


Summary points


Systematic reviews of studies of diagnostic accuracy differ from other systematic reviews in the assessment of study quality and the statistical methods used to combine results

Important aspects of study quality include the selection of a clinically relevant cohort, the consistent use of a single good reference standard, and the blinding of results of experimental and reference tests

The choice of statistical method for pooling results depends on the summary statistic and sources of heterogeneity, notably variation in diagnostic thresholds

Sensitivities, specificities, and likelihood ratios may be combined directly if study results are reasonably homogeneous

When a threshold effect exists, study results may be best summarised as a summary receiver operating characteristic curve, which is difficult to interpret and apply to practice




    Studies of diagnostic accuracy

Studies of test performance (or accuracy) compare test results between groups of patients with and without the target disease, each of whom undergoes the experimental test as well as a "gold standard" diagnostic investigation to ascertain disease status. The relation between the test results and disease status is described using probabilistic measures, such as sensitivity, specificity, likelihood ratios, diagnostic odds ratios (box), and receiver operating characteristic curves (box).


Summary statistics of diagnostic accuracy

Sensitivities and specificities

The rates of correct identification of patients with and without the disease are known as test sensitivity and test specificity, respectively.7 For a test to be useful at ruling out a disease it must have high sensitivity, and for it to be useful at confirming a disease it must have high specificity

Likelihood ratios

Positive and negative likelihood ratios describe the discriminatory properties of positive and negative test results, respectively.8 Likelihood ratios state how many times more likely particular test results are in patients with disease than in those without disease. Positive likelihood ratios above 10 and negative likelihood ratios below 0.1 have been noted as providing convincing diagnostic evidence, whereas those above 5 and below 0.2 give strong diagnostic evidence.9 Likelihood ratios can be directly applied to give probabilistic statements concerning the likelihood of disease in an individual (box)

Diagnostic odds ratios

The diagnostic odds ratio is a convenient measure when combining studies in a systematic review (it is often reasonably constant regardless of the diagnostic threshold) but is difficult to apply directly to clinical practice. The diagnostic odds ratio describes the odds of positive test results in participants with disease compared with the odds of positive test results in those without disease. A single diagnostic odds ratio corresponds to a set of sensitivities and specificities depicted by a receiver operating characteristic curve
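As a minimal sketch of how the summary statistics in this box relate to one another, the Python below derives each of them from the four cells of a 2×2 table of test result against reference standard. The counts and the names (`TwoByTwo`, `accuracy_summaries`) are hypothetical illustrations, not data or code from any study discussed in this article.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cross-classification of test results against the reference standard."""
    tp: int  # disease present, test positive
    fn: int  # disease present, test negative
    fp: int  # disease absent, test positive
    tn: int  # disease absent, test negative


def accuracy_summaries(t: TwoByTwo) -> dict:
    """Return the summary statistics described in the box.

    Assumes that no cell appearing in a denominator is zero.
    """
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        "diagnostic_odds_ratio": (t.tp * t.tn) / (t.fp * t.fn),
    }


# Hypothetical study: 100 participants with disease and 100 without.
example = accuracy_summaries(TwoByTwo(tp=95, fn=5, fp=40, tn=60))
```

With these invented counts the sensitivity is 0.95, the specificity is 0.60, and the diagnostic odds ratio is 28.5, which is also the positive likelihood ratio divided by the negative likelihood ratio.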


Receiver operating characteristic curves

Receiver operating characteristic curves are used in studies of diagnostic accuracy to depict the pattern of sensitivities and specificities observed when the performance of the test is evaluated at several different diagnostic thresholds. Figure 1 is a receiver operating characteristic curve from a study of the detection of endometrial cancer by endovaginal ultrasonography.8 Women with endometrial cancer are likely to have increased endometrial thicknesses: few women who do not have cancer will have thicknesses above a high threshold whereas few women with endometrial cancer will have thicknesses below a low threshold. This pattern of results is seen in figure 1, with the 5 mm threshold showing high sensitivity (0.98) but poor specificity (0.59) and the 25 mm threshold showing poor sensitivity (0.24) but high specificity (0.98).

The overall diagnostic performance of a test can be judged by the position of the receiver operating characteristic line. Poor tests have lines close to the rising diagonal, whereas the lines for perfect tests would rise steeply and pass close to the top left hand corner, where both the sensitivity and specificity are 1. Receiver operating characteristic plots are used in systematic reviews to display the results of a set of studies, the sensitivity and specificity from each study being plotted as a separate point in the receiver operating characteristic space



Fig 1.   Receiver operating characteristic plot of endovaginal ultrasonography for detecting endometrial cancer



    Dimensions of study quality

The quality of a study relates to aspects of the study's design, methods of sample recruitment, the execution of the tests, and the completeness of the study report, as summarised in table 1. 4-6 10-12


                              

Table 1. Framework for considering study quality and likelihood of bias

To be reliable a systematic review should aim to include only studies of the highest quality. Systematic reviews may either exclude studies that do not meet these criteria and are susceptible to bias or include studies with a mixture of quality characteristics and explore the differences. 3 5 Whichever approach is adopted, it is essential that the quality of the studies included in the review is assessed and reported, so that appropriately cautious inferences can be drawn.


    Empirical evidence

A recent empirical study evaluated which aspects of design and execution listed in table 1 are of most importance.13 The most notable finding related to the design of the study. Studies that recruited participants with disease separately from those without disease (for example, by comparing a group known to have the disease with a group of healthy controls) overestimated diagnostic accuracy when compared with studies that recruited a cohort of patients unselected by disease status and representative of the clinical population in which the test was used. Studies that used different reference tests according to the results of the experimental test also overestimated diagnostic performance, as did unblinded studies. Omission of specific details from the report of the study was also associated with systematic differences in results.


    Meta-analysis of studies of diagnostic accuracy

Meta-analysis is a two stage process involving derivation of summary statistics for each study and computation of a weighted average of the summary statistics across the studies.14 I illustrate the application of three commonly used methods for pooling different summaries of diagnostic accuracy with a case study.

As with systematic reviews of randomised controlled trials, meta-analysis should be considered only when the studies have recruited from similar patient populations (it is problematic to combine studies from general practice with studies from tertiary care), have used comparable experimental and reference tests, and are unlikely to be biased. Even when these criteria are met there may still be such gross heterogeneity between the results of the studies that it is inappropriate to summarise the performance of a test as a single number.
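The two stage process described above can be sketched as follows. This is only an illustration under stated assumptions: it uses inverse variance (fixed effect) weights, one common choice that the article does not itself specify, it pools on the log scale, and the three studies' values are invented. The function name `pool_inverse_variance` is hypothetical.

```python
import math


def pool_inverse_variance(estimates, variances):
    """Fixed effect weighted average using weights of 1/variance.

    Returns the pooled estimate and its standard error.
    """
    if len(estimates) != len(variances) or not estimates:
        raise ValueError("need one variance per estimate")
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


# Stage 1 (assumed already done): a summary statistic and its variance
# for each study. These log diagnostic odds ratios are hypothetical.
log_dors = [3.0, 3.6, 3.3]
variances = [0.20, 0.40, 0.10]

# Stage 2: weighted average across studies, then back to the odds ratio scale.
pooled_log_dor, se = pool_inverse_variance(log_dors, variances)
pooled_dor = math.exp(pooled_log_dor)
lower = math.exp(pooled_log_dor - 1.96 * se)
upper = math.exp(pooled_log_dor + 1.96 * se)
```

Studies with smaller variances (usually the larger studies) receive more weight, so the pooled value is pulled towards their results.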

Case study

Detection of endometrial cancer with endovaginal ultrasonography
Smith-Bindman et al published a systematic review of 35 studies evaluating the diagnostic accuracy of endovaginal ultrasonography for detecting endometrial cancer and other endometrial disorders.15 All studies included in the review were of prospective cohort designs and used the results of endometrial biopsy, dilation and curettage, or hysterectomy as a reference standard. Most of the studies presented sensitivities and specificities at several endometrial thicknesses detected by endovaginal ultrasonography (the receiver operating characteristic curve in figure 1 is from one of these studies). The case study is based on the subset of 20 studies from this review that considered the diagnostic accuracy of endovaginal ultrasonography in ruling out endometrial cancer with endometrial thicknesses of 5 mm or less. Figure 2 shows the sensitivities and specificities for the 20 studies.



Fig 2.   Estimates from 20 studies of sensitivity and specificity of measurement of endometrial thicknesses of more than 5 mm using endovaginal ultrasonography for detecting endometrial cancer.15 Points indicate estimates of sensitivity and specificity. Horizontal lines are 95% confidence intervals for estimates. Size of points reflects total sample size



    Sources of heterogeneity

The choice of meta-analytical method depends in part on the pattern of variability (heterogeneity) observed in the results. Heterogeneity can be considered graphically by plotting sensitivities and specificities from the studies as points on a receiver operating characteristic plot (fig 3). Some divergence of the results around a central point is to be expected by chance, but variation in other factors, such as patient selection and features of the study's design, may increase the observed variability.16



Fig 3.   Receiver operating characteristic plots showing three approaches to meta-analysis of 20 studies of diagnostic accuracy of endovaginal ultrasonography for detecting endometrial cancer. Results of studies are indicated by squares. Area of squares is proportional to study sample size. Fitted lines indicate (left) average sensitivity and specificity, (centre) average positive and negative likelihood ratios, and (right) average diagnostic odds ratios. Figures in brackets are 95% confidence intervals for summary estimates

One important extra source of heterogeneity is variation introduced by changes in diagnostic threshold. Studies may use different thresholds to define positive and negative test results. Some may have done this explicitly (for example, by varying numerical cut-off points used to classify a biochemical measurement as positive or negative), whereas for others there may be naturally occurring variations in diagnostic thresholds between observers, laboratories, or machines. The choice of a threshold may also vary according to the prevalence of the disease: when the disease is rare a more extreme threshold may have been used to avoid large numbers of false positive diagnoses. Unlike other sources of variability, variation of the diagnostic threshold introduces a particular pattern into the receiver operating characteristic plot of study results, such that the points show curvature (fig 1).

If there is no heterogeneity between the studies, the best summary estimate of test performance should be a single point on the receiver operating characteristic graph. The first two methods estimate such a summary, first by pooling sensitivities and specificities then by pooling positive and negative likelihood ratios. The third method is more complex and pools diagnostic odds ratios to take account of possible heterogeneity in diagnostic threshold.


    Pooling sensitivities and specificities

The pooled estimate of sensitivity is 0.96 (95% confidence interval 0.93 to 0.99) and is depicted by the horizontal line on the receiver operating characteristic plot in figure 3 (left). The overall estimate of mean specificity is lower: 0.61 (0.55 to 0.66).

Heterogeneity is, however, clearly evident in figure 3 (left): although the study points lie reasonably close to the summary sensitivity (test for heterogeneity, P=0.04), the results of many studies lie some distance from the summary specificity (test for heterogeneity, P<0.001).

Regardless of the causes of the heterogeneity, the overall high estimate and relative consistency of the sensitivity results does suggest that a negative test result could be of potential clinical use in ruling out endometrial cancer. As there is heterogeneity between specificities, however, it is more appropriate to note the range of specificities (0.27 to 0.88) rather than to quote the average value of 0.61. It is difficult to draw a conclusion about test specificity: the observed values vary considerably and there is no understanding from this analysis as to the reasons for the variation.
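The article does not say which test for heterogeneity produced these P values. One widely used option, shown here purely as an assumed illustration, is Cochran's Q, which under homogeneity is referred to a chi-squared distribution with one fewer degrees of freedom than the number of studies. The function name `cochran_q` and the four studies' values are hypothetical.

```python
def cochran_q(estimates, variances):
    """Cochran's Q statistic and its degrees of freedom.

    Q is the weighted sum of squared deviations of the study estimates
    from their inverse variance weighted mean.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    return q, len(estimates) - 1


# Hypothetical logit specificities and their variances from four studies.
logit_specificities = [-1.0, 0.2, 0.5, 2.0]
variances = [0.05, 0.04, 0.06, 0.08]
q, df = cochran_q(logit_specificities, variances)
```

A value of Q much larger than its degrees of freedom, as in this invented example, points to more variation between studies than chance alone would explain.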


Application of a likelihood ratio

The probability of endometrial cancer in a woman with an endometrial thickness of 5 mm or less measured by endovaginal ultrasonography can be computed with Bayes' theorem 1 17 :

Post-test odds = pretest odds × likelihood ratio

Assuming that the study samples are representative, an estimate of the pretest odds can be calculated from the prevalence of endometrial cancer across the studies (13%)

$$\text{Pretest odds} = \frac{\text{prevalence}}{1 - \text{prevalence}} = \frac{0.13}{0.87} = 0.15$$

Applying Bayes' theorem to the summary negative likelihood ratio:

Post-test odds = pretest odds × negative likelihood ratio = 0.15 × 0.09 = 0.014

and converting the post-test odds to a probability:

$$\text{Post-test probability} = \frac{\text{post-test odds}}{1 + \text{post-test odds}} = \frac{0.014}{1 + 0.014} = 0.013$$

we estimate that only 1.3% of women with an endometrial thickness of 5 mm or less measured by endovaginal ultrasonography will have endometrial cancer. Knowledge of other characteristics of a particular patient that either increase or decrease their prior probability of endometrial cancer can be incorporated into the calculation by adjusting the pretest probability accordingly1
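The calculation in this box can be reproduced with a few lines of Python. This is a sketch rather than part of the original analysis; the function name `post_test_probability` is hypothetical, and the inputs are the prevalence (13%) and summary negative likelihood ratio (0.09) quoted in the box.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Apply Bayes' theorem in odds form and convert back to a probability."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Prevalence of 13% and summary negative likelihood ratio of 0.09, as in the box.
probability = post_test_probability(0.13, 0.09)
print(f"{probability:.1%}")  # prints 1.3%
```

Passing a different pretest probability for a particular patient, in place of the overall prevalence, gives the corresponding individual estimate.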




    Pooling likelihood ratios

For the case study the pooled estimate of the positive likelihood ratio was not particularly high (2.54, 2.16 to 2.98), and the values varied significantly between the studies (test for heterogeneity, P<0.001). In figure 3 (centre) it is clear that the summary positive likelihood ratio lies some distance from many of the values. Again it is debatable whether reporting the average value of such heterogeneous results is sensible, but it is unlikely that a positive test result could provide convincing evidence of the presence of endometrial cancer as the positive likelihood ratios are all below 10 (data not shown).

The negative likelihood ratios show no evidence of significant heterogeneity (test for heterogeneity, P=0.09), the pooled estimate being 0.09 (0.06 to 0.13), with the summary line on the receiver operating characteristic plot in figure 3 (centre) lying close to the results of most of the studies. This finding again shows that a measurement of an endometrial thickness of 5 mm or less made by endovaginal ultrasonography can provide reasonably convincing evidence to rule out endometrial cancer.

Although these conclusions concerning potential diagnostic use are similar to those obtained by pooling sensitivities and specificities, the summaries obtained by pooling likelihood ratios can be more easily interpreted and applied to clinical practice. The box describes how the summary negative likelihood ratio can be applied to estimate the probability of endometrial cancer in a woman with a negative test result.


    Diagnostic odds ratios and summary receiver operating characteristic curves

If the observed heterogeneity between the studies arises due to variation in the diagnostic threshold, estimates of summary sensitivity and specificity or summary positive and negative likelihood ratios will underestimate diagnostic performance.18 In this situation the appropriate meta-analytical summary is not a single point in the receiver operating characteristic space but the receiver operating characteristic curve itself. Methods of deriving the best fitting summary receiver operating characteristic curve are necessarily more complex. 2-5 18-21

How is a summary receiver operating characteristic curve estimated? The simplest approach involves calculating a single summary statistic for each study: the diagnostic odds ratio (box). Each diagnostic odds ratio corresponds to a particular receiver operating characteristic curve. If the studies in a review all relate to the same curve they may have consistent diagnostic odds ratios even if they have variable sensitivities and specificities. Table 2 gives examples of diagnostic odds ratios corresponding to particular sensitivities, specificities, and positive and negative likelihood ratios.


                              

Table 2. Examples of diagnostic odds ratios corresponding to particular pairings of sensitivity and specificity and positive and negative likelihood ratios

In the case study it is possible that some of the observed heterogeneity could be explained by a threshold effect, perhaps due to differences in calibration of the ultrasound machines. The estimate of the summary diagnostic odds ratio is 28.0 (18.2 to 43.2) and is reasonably consistent across the studies (test for heterogeneity, P=0.3), suggesting that the points indeed could have originated from the same receiver operating characteristic curve. The summary diagnostic odds ratio can be interpreted in terms of sensitivities and specificities by consulting table 2 (for example, a diagnostic odds ratio of 29 corresponds to a sensitivity of 0.95 and a specificity of 0.60 and to a sensitivity of 0.60 and specificity of 0.95) or by plotting the corresponding summary receiver operating characteristic curve (fig 3 (right)). This method does not yield a unique joint summary estimate of sensitivity and specificity: it is only possible to obtain a summary estimate of one value by specifying the value of the other. This greatly limits its clinical application.
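The correspondence between a single diagnostic odds ratio and a curve, and the reason no unique pair of sensitivity and specificity emerges, can be sketched as below. The sketch assumes the odds ratio is constant along the curve (a symmetric curve); the function name `sensitivity_on_curve` is hypothetical, and the chosen specificities are arbitrary illustration points.

```python
def sensitivity_on_curve(diagnostic_odds_ratio, specificity):
    """Sensitivity implied by a given specificity on a constant odds ratio curve."""
    # Odds of a positive result in participants without disease.
    false_positive_odds = (1 - specificity) / specificity
    # The diagnostic odds ratio scales these to the odds in those with disease.
    true_positive_odds = diagnostic_odds_ratio * false_positive_odds
    return true_positive_odds / (1 + true_positive_odds)


# Trace the curve for the summary diagnostic odds ratio of 28.0 reported above.
curve = [(spec, sensitivity_on_curve(28.0, spec))
         for spec in (0.50, 0.60, 0.70, 0.80, 0.90, 0.95)]
```

Each specificity supplied returns a different sensitivity, so a value for one must be fixed before the other can be read off the curve.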


    Discussion

Systematic reviews of diagnostic accuracy have not, as yet, made the same impression on the practice of evidence based health care as have systematic reviews of randomised controlled trials. Reasons relate to reliability, heterogeneity, and clinical relevance.
 

Are systematic reviews of diagnostic studies reliable?
Many meta-analyses of the accuracy of diagnostic tests are hindered by the poor quality of the primary studies: most published evaluations of the accuracy of diagnostic tests have at least one flaw.12 Headway has been made in understanding the importance of particular features of a study's design and in improving quality, but for many diagnostic tests few high quality studies have been undertaken and published.13

The reliability of a review also depends crucially on whether the included studies are an unbiased selection. As with all reviews, systematic reviews of diagnostic tests are susceptible to publication bias, and this may be a greater problem than for randomised controlled trials. 2 3 No investigations, however, have been conducted to estimate rates of publication bias for studies of diagnostic accuracy.

How useful are systematic reviews to a practising clinician?
Heterogeneity of the results of studies of diagnostic accuracy is common but in itself does not prevent conclusions of clinical value from being drawn.22 Despite heterogeneity being observed in the case study, it was still possible to draw a conclusion of clinical value: that an endometrial thickness of 5 mm or less can rule out endometrial cancer.

Diagnostic odds ratios and summary receiver operating characteristic curves are, however, often promoted as the most statistically valid method for combining test results when there is heterogeneity between studies, and they are commonly used in systematic reviews of diagnostic accuracy.2-4 Unfortunately summary curves are of little use to practising healthcare professionals: they can identify whether a test has potential clinical value, but they cannot be used to compute the probability of disease associated with specific test outcomes. Their use is also based on a potentially inappropriate and untested assumption that observed heterogeneity has arisen through variation in diagnostic threshold. In the case study, whereas the diagnostic odds ratio was a reasonably consistent summary statistic across the studies, there was no evidence to suggest that the observed heterogeneity arose through variations in diagnostic threshold (all included studies had a 5 mm threshold for endometrial thickness). Variation in referral patterns, sample selection, and study methods may be more likely explanations for the heterogeneity. There is no clear statistical advantage in using a summary receiver operating characteristic approach to synthesise the results over pooling sensitivity and specificity or likelihood ratios unless there is a threshold effect. Empirical research is urgently required to find out whether the simpler methods for pooling sensitivities, specificities, and likelihood ratios are likely to be seriously misleading in practice and whether apparent threshold effects are really due to variations in diagnostic threshold rather than alternative sources of heterogeneity.

Are studies of diagnostic accuracy clinically relevant?
Systematic reviews of the accuracy of tests do not always answer the most clinically relevant question. New tests are often evaluated for their ability to replace or be used alongside existing tests. The important issues are comparisons of tests or comparisons of testing algorithms: these would be best addressed in properly designed comparative studies, rather than by synthesising studies of diagnostic accuracy separately for each test.

The evaluation of the diagnostic accuracy of a test is also only one component of assessing whether it is of clinical value. 23 24 Treatment interventions are recommended for use in health care only if they are shown on average to be of benefit to patients: the same criterion should also be applied for the use of a diagnostic test, and even the most accurate of tests can be clinically useless or do more harm than good. It should always be considered whether undertaking a systematic review of studies of diagnostic accuracy is likely to provide the most useful evidence of the value of a diagnostic intervention.

    Acknowledgments

I thank Rebecca Smith-Bindman for providing the data for the case study.

    Footnotes

Series editor: Matthias Egger

Competing interests: None declared.

Systematic Reviews in Health Care: Meta-analysis in Context can be purchased through the BMJ Bookshop (www.bmjbookshop.com); further information and updates for the book are available on www.systematicreviews.com.


    References

1. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical epidemiology: a basic science for clinical medicine 2nd ed. Boston: Little, Brown, 1991.
2. Irwig L, Tosteson AN, Gatsonis CA, Lau J, Colditz G, Chalmers TC, et al. Guidelines for meta-analyses evaluating diagnostic tests. Ann Intern Med 1994; 120: 667-676.
3. Irwig L, Macaskill P, Glasziou P, Fahey M. Meta-analytical methods for diagnostic test accuracy. J Clin Epidemiol 1995; 48: 119-130.
4. Cochrane Methods Group on Systematic Review of Screening and Diagnostic Tests. Recommended methods [updated 6 Jun 1996]. www.cochrane.org/cochrane/sadtdoc1.htm (accessed 27 March 2001).
5. Vamvakas EC. Meta-analyses of studies of diagnostic accuracy of laboratory tests: a review of concepts and methods. Arch Pathol Lab Med 1998; 122: 675-686.
6. Deeks JJ. Systematic reviews of evaluations of diagnostic and screening tests. In: Egger M, Davey Smith G, Altman DG, eds. Systematic reviews in health care: meta-analysis in context 2nd ed. London: BMJ Books, 2001.
7. Bland JM, Altman DG. Diagnostic tests. 1: Sensitivity and specificity. BMJ 1994; 308: 1499.
8. Deeks JJ, Morris JM. Evaluating diagnostic tests. Baillière's Clinical Obstetrics and Gynaecology 1996; 10: 613-630.
9. Jaeschke R, Guyatt GH, Sackett DL, for the Evidence-Based Medicine Working Group. Users' guides to the medical literature. VI. How to use an article about a diagnostic test. B: What are the results and will they help me in caring for my patients? JAMA 1994; 271: 703-707.
10. Jaeschke R, Guyatt GH, Sackett DL, for the Evidence-Based Medicine Working Group. Users' guides to the medical literature. VI. How to use an article about a diagnostic test. A: Are the results of the study valid? JAMA 1994; 271: 289-291.
11. Mulrow CD, Linn WD, Gaul MK, Pugh JA. Assessing the quality of a diagnostic test evaluation. J Gen Intern Med 1989; 4: 288-295.
12. Reid MC, Lachs MS, Feinstein AR. Use of methodological standards in diagnostic test research. Getting better but still not good. JAMA 1995; 274: 645-651.
13. Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JHP, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999; 282: 1061-1066.
14. Deeks JJ, Altman DG, Bradburn MJ. Statistical methods for examining heterogeneity and combining results from several studies in meta-analysis. In: Egger M, Davey Smith G, Altman DG, eds. Systematic reviews in health care: meta-analysis in context 2nd ed. London: BMJ Books, 2001.
15. Smith-Bindman R, Kerlikowske K, Feldstein VA, Subak L, Scheidler J, Segal M, et al. Endovaginal ultrasound to exclude endometrial cancer and other endometrial abnormalities. JAMA 1998; 280: 1510-1517.
16. Devillé W, Yzermans N, Bouter LM, Bezemer PD, van der Windt DAWM. Heterogeneity in systematic reviews of diagnostic studies. In: Proceedings of the 2nd symposium on systematic reviews: beyond the basics. Oxford, 1999. Abstract available at www.ihs.ox.ac.uk/csm/talks.html#p21 (accessed 27 March 2001).
17. Ingelfinger JA, Mosteller F, Thibodeau LA, Ware JH. Biostatistics in clinical medicine 3rd ed. New York: McGraw-Hill, 1994:26-50.
18. Shapiro DE. Issues in combining independent estimates of the sensitivity and specificity of a diagnostic test. Acad Radiol 1995; 2: 37-47S.
19. Moses LE, Littenberg B, Shapiro D. Combining independent studies of a diagnostic test into a summary ROC curve: data-analytical approaches and some additional considerations. Stat Med 1993; 12: 1293-1316.
20. Kardaun JWPF, Kardaun OJWF. Comparative diagnostic performance of three radiological procedures for the detection of lumbar disk herniation. Meth Info Med 1990; 29: 12-22.
21. Littenberg B, Moses LE. Estimating diagnostic accuracy from multiple conflicting reports: a new meta-analytical method. Med Decis Making 1993; 13: 313-321.
22. Oosterhuis WP, Niessen RW, Bossuyt PM. The science of systematically reviewing studies of diagnostic tests. Clin Chem Lab Med 2000; 38: 577-588.
23. Deeks JJ. Using evaluations of diagnostic tests: understanding their limitations and making the most of available evidence. Ann Oncol 1999; 10: 761-768.
24. Guyatt GH, Tugwell P, Feeny DH, Haynes RB, Drummond M. A framework for clinical evaluation of diagnostic technologies. Can Med Assoc J 1986; 134: 587-594.


© BMJ 2001

Add to CiteULike CiteULike   Add to Complore Complore   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us   Add to Digg Digg   Add to Reddit Reddit   Add to Technorati Technorati    What's this?

Relevant Articles

Grading quality of evidence and strength of recommendations for diagnostic tests and strategies
Holger J Schünemann, Andrew D Oxman, Jan Brozek, Paul Glasziou, Roman Jaeschke, Gunn E Vist, John W Williams, Jr, Regina Kunz, Jonathan Craig, Victor M Montori, Patrick Bossuyt, Gordon H Guyatt for the GRADE Working Group
BMJ 2008 336: 1106-1110. [Extract] [Full Text] [PDF]

A meta-analysis of the diagnostic performance of the direct agglutination test and rK39 dipstick for visceral leishmaniasis
François Chappuis, Suman Rijal, Alonso Soto, Joris Menten, and Marleen Boelaert
BMJ 2006 333: 723. [Abstract] [Full Text] [PDF]

Systematic reviews of diagnostic tests in cancer: review of methods and reporting
Susan Mallett, Jonathan J Deeks, Steve Halligan, Sally Hopewell, Victoria Cornelius, and Douglas G Altman
BMJ 2006 333: 413. [Abstract] [Full Text] [PDF]

Systematic review and meta-analysis of strategies for the diagnosis of suspected pulmonary embolism
Pierre-Marie Roy, Isabelle Colombet, Pierre Durieux, Gilles Chatellier, Hervé Sors, and Guy Meyer
BMJ 2005 331: 259. [Abstract] [Full Text] [PDF]

Systematic comparison of four sources of drug information regarding adjustment of dose for renal function
Liat Vidal, Maya Shavit, Abigail Fraser, Mical Paul, and Leonard Leibovici
BMJ 2005 331: 263. [Abstract] [Full Text] [PDF]

Systematic reviews of evaluations of diagnostic and screening tests
Nicole Jill-Marie Blackman, Gerben ter Riet, Alphons G H Kessels, and Lucas M Bachmann
BMJ 2001 323: 1188. [Extract] [Full Text]

Rapid Responses:

Recalculating data doesn't add information
G H Hall
bmj.com, 21 Jul 2001

Errors in Figure 2
Frank Shann
bmj.com, 27 Jul 2001

Unconditional likelihood ratios are not constants
Gerben ter Riet, et al.
bmj.com, 26 Jul 2001

Corrections to Figure 2
Jonathan J Deeks
bmj.com, 27 Jul 2001

Odds ratio not prevalence independent
Nicole Jill-Marie Blackman
bmj.com, 9 Aug 2001

Effect of intra and inter-observer variation on systematic reviews of evaluations of clinical tests
J Bernades, et al.
bmj.com, 5 Sep 2001


