BMJ 1999;318:1322-1323 (15 May)

Papers

Reporting of precision of estimates for diagnostic accuracy: a review

Robert Harper, principal optometrist (a); Barnaby Reeves, senior lecturer (b)

(a) Department of Ophthalmology, Manchester Royal Eye Hospital, Manchester M13 9WH; (b) Health Services Research Unit, London School of Hygiene and Tropical Medicine, London WC1E 7HT

Correspondence to: Dr Harper robert.harper@man.ac.uk

Diagnostic accuracy is usually characterised by the sensitivity and specificity of a test, and these are the indices most commonly presented in reports of diagnostic test evaluations. It is important to emphasise that, as in other empirical studies, specific values of diagnostic accuracy are merely estimates. Therefore, when evaluations of diagnostic accuracy are reported, the precision of the sensitivity and specificity estimates, or of likelihood ratios, should be stated.1-3 If sensitivity and specificity estimates are reported without a measure of precision, clinicians cannot know the range within which the true values of the indices are likely to lie.
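To make the indices concrete, the minimal Python sketch below computes sensitivity, specificity and the positive and negative likelihood ratios from a 2×2 table. The counts and the names `TwoByTwo` and `accuracy_indices` are hypothetical illustrations, not taken from any study discussed here. Note that the output consists of point estimates only, with no measure of precision, which is exactly the gap this paper addresses.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from a 2x2 diagnostic accuracy table (hypothetical structure)."""
    tp: int  # diseased, test positive
    fn: int  # diseased, test negative
    fp: int  # non-diseased, test positive
    tn: int  # non-diseased, test negative


def accuracy_indices(t: TwoByTwo) -> dict[str, float]:
    """Point estimates of sensitivity, specificity and likelihood ratios.

    Assumes 0 < specificity < 1; otherwise a likelihood ratio is undefined
    and a ZeroDivisionError is raised.
    """
    sensitivity = t.tp / (t.tp + t.fn)  # proportion of diseased who test positive
    specificity = t.tn / (t.tn + t.fp)  # proportion of non-diseased who test negative
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical counts: 50 subjects with the disease and 50 without.
table = TwoByTwo(tp=40, fn=10, fp=5, tn=45)
for name, value in accuracy_indices(table).items():
    print(f"{name}: {value:.3f}")
```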

Confidence intervals are widely used in the medical literature, and journals usually require them for other descriptive estimates and for epidemiological or experimental analytical comparisons. Journals seem less vigilant, however, about evaluations of diagnostic accuracy. For example, a recent review of compliance with methodological standards in diagnostic test research found that, for the period 1978-93, only 12 of 112 studies published in the New England Journal of Medicine, JAMA, the BMJ, and the Lancet reported the precision of their estimates of diagnostic accuracy.3 We found that the reporting of 95% confidence intervals for such estimates was somewhat better in studies published in the BMJ during a more recent two year period, but still far from ideal.

    Methods and results

We searched the Medline database for reports of diagnostic evaluations published in the BMJ in 1996 and 1997. After excluding letters, case reports, and review or education articles, we identified 16 studies (references supplied on request). Only eight of these papers (50%; 95% confidence interval 25% to 75%) reported the precision of their estimates of diagnostic accuracy, and two of the eight gave confidence intervals only for predictive values or likelihood ratios, not for the sensitivity or specificity estimates they also reported.
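The interval of 25% to 75% around 8 of 16 can be checked with an exact binomial confidence interval. The paper does not name the exact method used, so this is a sketch assuming the Clopper-Pearson interval; the helper names are hypothetical. To stay within the Python standard library, it finds each limit by bisection on the binomial cumulative distribution instead of calling a beta-distribution quantile function.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect_increasing(f, lo: float = 0.0, hi: float = 1.0, tol: float = 1e-10) -> float:
    """Root of an increasing function f on [lo, hi], found by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact (Clopper-Pearson) two-sided 100(1 - alpha)% CI for the proportion x/n."""
    # Lower limit: the p at which P(X >= x) = alpha/2 (this probability rises with p).
    lower = 0.0 if x == 0 else bisect_increasing(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper limit: the p at which P(X <= x) = alpha/2 (this probability falls with p).
    upper = 1.0 if x == n else bisect_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


low, high = clopper_pearson(8, 16)
print(f"8 of 16: {low:.0%} to {high:.0%}")
```

For 8 of 16 this gives limits of roughly 0.25 and 0.75, in line with the interval quoted above. In practice a statistical package would normally be used; the bisection only makes the definition of the exact interval explicit.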

    Comment

Evaluations of diagnostic accuracy should be reported with confidence intervals. We have also recently reviewed compliance with the reporting of confidence intervals in the ophthalmic literature and concluded that evaluations of diagnostic tests in this specialty are similarly flawed.4 Omitting the precision of estimates of diagnostic accuracy can make a considerable difference to a clinician's interpretation of a study's findings. For example, an evaluation of an optic nerve head imaging system for the detection of glaucoma reported sensitivity and specificity estimates of 89% and 78%, respectively5; the 95% confidence intervals for these estimates (not reported in the paper) ranged from 80% to 98% for sensitivity and from 66% to 90% for specificity. For a test with poorer diagnostic accuracy, these 95% confidence intervals would have been even wider for the same sample size, because the standard error of a proportion depends on the proportion itself and is largest when the proportion is 0.5 (figure). The figure shows how the precision of a sensitivity or specificity estimate varies with both the point estimate and the sample size.
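The dependence of precision on the estimate itself follows from the approximate standard error of a proportion, $\sqrt{pq/n}$ with $q = 1 - p$. The minimal Python sketch below holds a hypothetical sample size of 200 fixed and lowers the estimate from 0.9 towards 0.5; the helper `se_proportion` is an illustrative name, not from the paper. The standard error, and hence the interval width, grows as the estimate approaches 0.5.

```python
from math import sqrt


def se_proportion(p: float, n: int) -> float:
    """Approximate standard error of a proportion: sqrt(p * q / n), with q = 1 - p."""
    return sqrt(p * (1 - p) / n)


n = 200  # hypothetical number of diseased subjects (for a sensitivity estimate)
for p in (0.9, 0.8, 0.7, 0.6, 0.5):
    se = se_proportion(p, n)
    # 1.96 * SE is the half-width of an approximate 95% confidence interval.
    print(f"p = {p:.1f}: SE = {se:.4f}, "
          f"approx. 95% CI = {p - 1.96 * se:.3f} to {p + 1.96 * se:.3f}")
```

The same formula also shows the effect of sample size: quadrupling n halves the standard error.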



Breadth of exact binomial 95% confidence intervals as a function of the sample estimate of the proportion of interest and the sample size; from outside to centre, pairs of lines represent sample sizes of 20, 40, 60, 100, 200, and 500. Note that the 95% confidence interval is widest for a proportion of 0.5 and narrows as the proportion tends to 0 or 1. To use the figure, read off the distances to the upper and lower 95% confidence limits and add them to, or subtract them from, the sample estimate; for example, a sample estimate of 0.5 based on a sample size of 100 has a 95% confidence interval from 0.5-0.1 to 0.5+0.1 (0.4 to 0.6).

Most statistical packages will generate exact binomial confidence intervals. Approximate 95% confidence intervals for sensitivity and specificity can easily be calculated from the standard error of a proportion, $\sqrt{pq/n}$, which relies on a normal approximation to the binomial distribution: $p \pm 1.96\sqrt{pq/n}$, where p is the sensitivity or specificity, q = 1 - p, and n is the number of subjects with the disease (for sensitivity) or without it (for specificity). The approximation is reasonable when both n×p and n×q exceed 10.
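The approximate formula is simple to apply in code. The sketch below uses a hypothetical helper, `wald_ci`, and applies it to the figure's worked example (an estimate of 0.5 from a sample of 100). As an assumption beyond the n×p rule of thumb, it also checks n×q, because the approximation breaks down as the proportion approaches either 0 or 1.

```python
from math import sqrt


def wald_ci(p: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval p ± z*sqrt(pq/n) for sensitivity or specificity.

    n is the number of diseased subjects (for sensitivity) or of
    non-diseased subjects (for specificity).
    """
    q = 1 - p
    # Rule of thumb: the normal approximation needs n*p and n*q both above 10.
    if n * p <= 10 or n * q <= 10:
        raise ValueError("normal approximation unreliable; use an exact binomial interval")
    half_width = z * sqrt(p * q / n)
    return p - half_width, p + half_width


low, high = wald_ci(0.5, 100)
print(f"{low:.3f} to {high:.3f}")
```

This prints 0.402 to 0.598, close to the exact interval of about 0.4 to 0.6 read from the figure. Raising an error, rather than returning an interval, is a deliberate choice for small samples with estimates near 0 or 1, where an exact interval should be used instead.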

To enhance the quality of information on diagnostic tests made available to clinicians we recommend that 95% confidence intervals are supplied with estimates of diagnostic accuracy. Referees and journal editors should enforce this requirement in the same way as they routinely do for other descriptive or comparative estimates.

    Acknowledgments

   Contributors: RH and BR both contributed to the idea and the methods. RH carried out the search and reviewed the papers, and BR performed the calculations to develop the figure. RH and BR jointly drafted and revised the paper and are both guarantors.

    Footnotes

Funding: No external funding.

Competing interests: None declared.

    References

1. Jaeschke R, Guyatt GH, Sackett DL, for the Evidence-based Medicine Working Group. Users' guides to the medical literature. III. How to use an article about a diagnostic test. A. Are the results of the study valid? JAMA 1994; 271: 389-391.
2. Jaeschke R, Guyatt GH, Sackett DL, for the Evidence-based Medicine Working Group. Users' guides to the medical literature. III. How to use an article about a diagnostic test. B. What are the results and will they help me in caring for patients? JAMA 1994; 271: 703-707.
3. Reid MC, Lachs MS, Feinstein AR. Use of methodological standards in diagnostic test research: getting better but still not good. JAMA 1995; 274: 645-651.
4. Harper R, Reeves B. Compliance with methodological standards when evaluating ophthalmic diagnostic tests. Optom Vis Sci 1998; 75: 78.
5. Mikelberg FS, Parfitt CM, Swindale SL, Graham SL, Drance SM, Gosine R. Ability of the Heidelberg retina tomograph to detect early glaucomatous visual field loss. J Glaucoma 1995; 4: 242-247.

(Accepted 15 December 1998)


© BMJ 1999



This article has been cited by other articles:

  • Bochmann, F., Johnson, Z., Azuara-Blanco, A. (2007). Sample size in studies on diagnostic accuracy in ophthalmology: a literature survey. Br. J. Ophthalmol. 91: 898-900
  • Lykins, E. L.B., Pavlik, E. L., Andrykowski, M. A. (2007). Validity of Self-Reports of Return for Routine Repeat Screening in an Ovarian Cancer Screening Program. Cancer Epidemiol. Biomarkers Prev. 16: 490-493
  • Shunmugam, M., Azuara-Blanco, A. (2006). The quality of reporting of diagnostic accuracy studies in glaucoma using the Heidelberg retina tomograph. IOVS 47: 2317-2323
  • Vander Cruyssen, B, Hoffman, I E A, Zmierczak, H, Van den Berghe, M, Kruithof, E, De Rycke, L, Mielants, H, Veys, E M, Baeten, D, De Keyser, F (2005). Anti-citrullinated peptide antibodies may occur in patients with psoriatic arthritis. Ann Rheum Dis 64: 1145-1149
  • Reeves, B C (2005). Evidence about evidence. Br. J. Ophthalmol. 89: 253-254
  • Siddiqui, M A R, Azuara-Blanco, A, Burr, J (2005). The quality of reporting of diagnostic accuracy studies published in ophthalmic journals. Br. J. Ophthalmol. 89: 261-265
  • Hoffman, I. E.A., Peene, I., Pottel, H., Union, A., Hulstaert, F., Meheus, L., Lebeer, K., De Clercq, L., Schatteman, L., Poriau, S., Mielants, H., Veys, E. M., De Keyser, F. (2005). Diagnostic Performance and Predictive Value of Rheumatoid Factor, Anti-citrullinated Peptide Antibodies, and the HLA Shared Epitope for Diagnosis of Rheumatoid Arthritis. Clin. Chem. 51: 261-263
  • Valentini, A. L., De Gaetano, A. M., Minordi, L. M., Nanni, G., Citterio, F., Viggiano, A. M., Tancioni, V., Destito, C. (2004). Contrast-enhanced Voiding US for Grading of Reflux in Adult Patients Prior to Antireflux Ureteral Implantation. Radiology 233: 35-39
  • Hugues, G., Olivier, L., Benoit, G., Allaouchiche, B., Boselli, E., Gibot, S., Levy, B., Bene, M.-C. (2004). Soluble TREM-1 and the Diagnosis of Pneumonia. NEJM 350: 1904-1905
  • Medina, L. S., Zurakowski, D. (2003). Measurement Variability and Confidence Intervals in Medicine: Why Should Radiologists Care? Radiology 226: 297-301
  • Bossuyt, P. M., Reitsma, J. B., Bruns, D. E., Gatsonis, C. A., Glasziou, P. P., Irwig, L. M., Moher, D., Rennie, D., de Vet, H. C.W., Lijmer, J. G. (2003). The STARD Statement for Reporting Studies of Diagnostic Accuracy: Explanation and Elaboration. ANN INTERN MED 138: W1-W12
  • Bossuyt, P. M., Reitsma, J. B., Bruns, D. E., Gatsonis, C. A., Glasziou, P. P., Irwig, L. M., Moher, D., Rennie, D., de Vet, H. C.W., Lijmer, J. G. (2003). The STARD Statement for Reporting Studies of Diagnostic Accuracy: Explanation and Elaboration. Clin. Chem. 49: 7-18
  • Karlik, S. J. (2003). Exploring and Summarizing Radiologic Data. Am. J. Roentgenol. 180: 47-54
  • Hoffman, I. E.A., Peene, I., Veys, E. M., De Keyser, F. (2002). Detection of Specific Antinuclear Reactivities in Patients with Negative Anti-nuclear Antibody Immunofluorescence Screening Tests. Clin. Chem. 48: 2171-2176
  • Gazzolo, D., Bruschettini, M., Lituania, M., Serra, G., Bonacci, W., Michetti, F. (2001). Increased Urinary S100B Protein as an Early Indicator of Intraventricular Hemorrhage in Preterm Infants: Correlation with the Grade of Hemorrhage. Clin. Chem. 47: 1836-1838
  • Fritz, J. M, Wainner, R. S (2001). Examining Diagnostic Tests: An Evidence-Based Perspective. ptjournal 81: 1546-1564
  • Harper, R., Reeves, B. (1999). Compliance with Methodological Standards When Evaluating Ophthalmic Diagnostic Tests. IOVS 40: 1650-1657

Rapid Responses:


Do most statistical packages generate exact confidence intervals?
Lars Bõcklund
bmj.com, 18 May 1999
Reporting on diagnostic accuracy
David Bruns
bmj.com, 19 May 1999
A program for exact confidence intervals for proportions.
Martin Bland
bmj.com, 21 May 1999
Confidence intervals not for everything
Ildefonso Hernández-Aguado
bmj.com, 20 May 1999
Diagnostic Performance Statistics
Roderick Mackenzie
bmj.com, 30 May 1999
Comment on precision of diagnosis
P B Pynsent
bmj.com, 17 Jun 1999


