BMJ  2005;330:724-726 (26 March), doi:10.1136/bmj.330.7493.724

Education and debate

Evidence based diagnostics

Christian Gluud, head of department1, Lise Lotte Gluud, specialist registrar1

1 Cochrane Hepato-Biliary Group, Copenhagen Trial Unit, Centre for Clinical Intervention Research, H:S Rigshospitalet, Copenhagen University Hospital, DK-2100 Copenhagen, Denmark

Correspondence to: C Gluud cgluud{at}ctu.rh.dk

Diagnostic tests are often much less rigorously evaluated than new drugs. It is time to ensure that the harms and benefits of new tests are fully understood

Introduction

No international consensus exists on the methods for assessing diagnostic tests. Previous recommendations stress that studies of diagnostic tests should match the type of diagnostic question.1 2 Once the specificity and sensitivity of a test have been established, the final question is whether tested patients fare better than similar untested patients. This usually requires a randomised trial. Few tests are currently evaluated in this way. In this paper, we propose an architecture for research into diagnostic tests that parallels the established phases in drug research.

Stages of research

We have divided studies of diagnostic tests into four phases (box). We use research on brain natriuretic peptide for diagnosing heart failure as an illustrative example.2 However, the architecture is applicable to a wide range of tests including laboratory techniques, diagnostic imaging, pathology, evaluation of disability, electrodiagnostic tests, and endoscopy.

Establishing the normal range

In drug research, phase I studies deal with pharmacokinetics, pharmacodynamics, and safe doses.3 Phase I diagnostic studies are done to determine the range of results obtained with a newly developed test in healthy people. For example, after development of a test to measure brain natriuretic peptide in human plasma, phase I studies were done to establish the normal range of values in healthy participants.4 5



Figure: The harms and benefits of diagnostic tests need evaluating, just as drugs do (credit: GUSTO/SPL)

Diagnostic phase I studies must be large enough to examine the potential influence of characteristics such as sex, age, time of day, physical activity, and exposure to drugs. The studies are relatively quick, cheap, and easy to conduct, but they may occasionally raise ethical problems—for example, finding abnormal results in an apparently healthy person.6
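As a rough illustration of what a phase I analysis might compute, the sketch below derives a nonparametric central 95% reference interval (2.5th to 97.5th percentiles) from measurements in healthy participants, separately by sex. The percentile convention, the function name `reference_interval`, and the simulated concentrations are assumptions made for illustration; they are not taken from the studies cited above.

```python
import random
import statistics


def reference_interval(values):
    """Return the (2.5th, 97.5th) percentiles of a list of measurements."""
    # statistics.quantiles with n=40 returns 39 cut points in steps of 2.5%.
    cuts = statistics.quantiles(values, n=40)
    return cuts[0], cuts[-1]


# Simulated, purely hypothetical concentrations (pg/ml) in healthy people.
rng = random.Random(2005)
healthy = {
    "women": [rng.lognormvariate(3.0, 0.5) for _ in range(400)],
    "men": [rng.lognormvariate(2.8, 0.5) for _ in range(400)],
}

for group, values in healthy.items():
    low, high = reference_interval(values)
    print(f"{group}: {low:.1f} to {high:.1f} pg/ml (n={len(values)})")
```

Reporting the interval per subgroup is one simple way to see whether a characteristic such as sex shifts the normal range enough to need separate limits.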

Diagnostic accuracy

In phase II, studies explore the diagnostic accuracy of a test in participants with both known and suspected relevant disease. Phase IIa studies compare test results in participants with disease diagnosed by a standard method with those in healthy participants (from diagnosis to test result). For example, a phase IIa study found significantly raised concentrations of brain natriuretic peptide in participants with left ventricular dysfunction diagnosed by echocardiography (median 493.5 (range 248.9-909.0) pg/ml) compared with healthy participants (129.4 (53.6-159.7) pg/ml).7 Subsequently, brain natriuretic peptide was recommended as a useful diagnostic aid for left ventricular dysfunction.7

After an association has been found between test results and a certain disease, phase IIb studies may be done to examine whether test results are related to the severity of a disease. For example, in a phase IIb study, brain natriuretic peptide concentrations were measured in healthy participants and participants with congestive heart failure.8 The study found a linear relation between test values and the degree of ventricular dysfunction. The authors concluded that the concentration of brain natriuretic peptide is a good indicator of the severity of chronic heart failure.8 However, the design only allows inferences about how a test works under ideal conditions.

Phase IIc studies examine the predictive value of a test among people with suspected disease (from test results to diagnosis). For example, a phase IIc study measured brain natriuretic peptide concentrations in participants with suspected heart disease.9 All participants had transthoracic echocardiography. The results showed raised concentrations of brain natriuretic peptide in participants with left ventricular systolic dysfunction (median 79.4 (interquartile range 35.9-151.0) pg/ml) compared with those with normal ventricular systolic function (26.7 (12.2-54.3) pg/ml).9 A concentration > 17.9 pg/ml had a sensitivity of 88% and specificity of 34%. Choosing different cut-off points did not improve the predictive characteristics.


Four phases in architecture of diagnostic research

Phase I: Determining the normal range of values for a diagnostic test through observational studies in healthy people

Phase II: Determining the diagnostic accuracy through case-control studies, including healthy people and (a) people with known disease assessed by diagnostic standard and (b) people with suspected disease

Phase III: Determining the clinical consequences of introducing a diagnostic test through randomised trials

Phase IV: Determining the effects of introducing a new diagnostic test into clinical practice by surveillance in large cohort studies


The authors concluded that measuring brain natriuretic peptide in addition to routine investigations provides a small diagnostic advantage.9 However, the characteristics of the test may be different in other settings. A narrative review summarised several phase II studies on brain natriuretic peptides for diagnosing left ventricular systolic dysfunction.10 The studies found that sensitivity ranges from 26% to 92% and specificity from 34% to 89%. The predictive ability seemed to depend on sex, and the test performed less well in community based studies than in referral series.
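One reason a test can behave differently across settings is that its predictive values depend on how common the disease is among those tested. The sketch below applies Bayes' theorem to the sensitivity (88%) and specificity (34%) reported in the general practice study. The prevalence values and the function name `predictive_values` are illustrative assumptions, and the calculation holds sensitivity and specificity fixed, which may not be true across settings.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity as reported for a cut-off of 17.9 pg/ml.
SENSITIVITY, SPECIFICITY = 0.88, 0.34

# Hypothetical prevalences, e.g. a community sample versus a referral series.
for prevalence in (0.02, 0.10, 0.30):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With a low specificity, the positive predictive value stays modest unless the disease is common among those tested, which is consistent with a test looking less useful in community samples.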

Several concerns surround the validity and applicability of phase II studies. Two of the most important are whether test results were evaluated blind and how cut-off values or limits for normal values were selected.2 To improve the quality of reporting of studies of diagnostic tests, the Standards for Reporting of Diagnostic Accuracy (STARD) Initiative was launched.11 Checklists and flowcharts were developed to aid authors of phase II studies. Future studies are planned to evaluate the effect of the initiative.

Clinical effects

In some cases, the value of a diagnostic test is self evident—for example, in genetic testing. However, for most diagnostic tests, phase III studies are necessary to evaluate the beneficial and harmful effects of implementing a new test. The potential effects depend on how the information is used in subsequent clinical decisions. In phase III diagnostic studies, randomisation determines whether participants have the test or not. In some randomised trials, the result of the test may be used to determine a specific clinical course, including treatment. Alternatively, knowledge of a test result may be incorporated into standard clinical practice and treatment strategies remain unchanged.

A phase III study compared the effect of using brain natriuretic peptide concentrations or clinical assessment to guide treatment.12 The study included 69 participants with impaired systolic function and symptomatic heart failure. Participants were randomised to receive treatment guided by brain natriuretic peptide concentrations or by a clinical score of symptoms and signs of heart failure. Fewer deaths, hospital admissions, and cases of decompensation of heart failure occurred among participants whose treatment was guided by brain natriuretic peptide values than among those whose treatment was guided by clinical score.

The study shows the way for diagnostic research. However, the interpretation of the results is not simple. Larger trials with the most recently developed drugs are necessary before the test is implemented in clinical practice. The benefits and harms of the test in other settings—for example, in screening for asymptomatic left ventricular dysfunction—also seem relevant.

Methodological issues also arise. Estimation of required sample size is difficult in diagnostic trials.13 In randomised trials comparing two binary diagnostic tests, patients in the two arms with concordant results will not contribute to the final difference. Sample size estimations in such trials therefore include discordance rates. Other methodological aspects are similar to those in randomised drug trials. In both trial types, methods for adequate generation of the allocation sequence, allocation concealment, and blinding deserve attention.14 When several randomised trials on diagnostic tests are completed, systematic reviews and possibly meta-analyses are warranted.15
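The role of discordance in sample size can be sketched as follows. If two test strategies lead to different management only in patients with discordant results, the difference in event rates between trial arms is diluted by the discordance rate. The code below feeds that diluted difference into the usual normal-approximation formula for comparing two proportions. All inputs (event rates, discordance rates, 5% two-sided alpha, 80% power) and the function names are hypothetical, and this is a simplified sketch rather than the method of the cited chapter.

```python
import math
from statistics import NormalDist


def arm_event_rates(discordance, rate_concordant, rate_disc_a, rate_disc_b):
    """Overall event rate in each arm when only discordant patients differ."""
    shared = (1 - discordance) * rate_concordant
    return shared + discordance * rate_disc_a, shared + discordance * rate_disc_b


def sample_size_per_arm(p_a, p_b, alpha=0.05, power=0.80):
    """Normal-approximation sample size for comparing two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_a - p_b) ** 2)


# Hypothetical inputs: 20% events in concordant patients; among discordant
# patients, 30% events under strategy A versus 15% under strategy B.
for discordance in (1.0, 0.5, 0.2):
    p_a, p_b = arm_event_rates(discordance, 0.20, 0.30, 0.15)
    n = sample_size_per_arm(p_a, p_b)
    print(f"discordance {discordance:.0%}: about {n} participants per arm")
```

Under these assumptions the required number of participants rises steeply as the discordance rate falls, because concordant patients add variance without adding to the difference between arms.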

Long term consequences

Logistical problems such as storage, freezing, and thawing of samples or poor calibration of equipment may affect the accuracy of a diagnostic test after it is introduced into routine clinical practice. Several factors, such as a change in diagnostic indications, may influence the circumstances under which a test is used. Phase IV studies are therefore needed to determine whether the diagnostic accuracy of a test in practice corresponds to predictions from systematic reviews of phase III trials.

Phase IV studies include large cohorts of consecutive participants. Regular reports on regional, national, and international quality and benchmarking may also help improve the quality of testing in clinical practice. Phase IV diagnostic studies are an important aid in quality assurance and quality development and are necessary to identify rare adverse events.16

Conclusion

Few would dispute that valid evidence is necessary before we introduce new drugs in clinical practice. The randomised trial is the best method for comparing interventions. Randomised trials are also necessary to evaluate the potential effects of introducing a diagnostic test. Unfortunately, few randomised trials deal with diagnostic tests. We searched the Cochrane Central Register of Controlled Trials (Issue 1, 2005) and found that only 4.2% (18 366 of 435 786 records) dealt with diagnostic tests or screening. Awareness of the need for evidence based diagnostic testing must be increased. Organisations such as the Cochrane Collaboration can help by improving facilities for and methodological quality of systematic reviews of diagnostic tests.

The demand for diagnostic phase III and phase IV studies is increasing with the continuous development of new diagnostic methods. Although defensive use of diagnostic tests improves clinical outcomes for some patients, it worsens clinical outcomes for others.17 The four temporal phases of research provide a logical, stepwise procedure for development of diagnostic tests. However, the four phases do not apply to all diagnostic tests or provide an adequate basis for all types of diagnostic studies. Furthermore, one type of study may occur in several phases. The phase concept is meant as a guide that may be adjusted according to individual circumstances.


Summary points

The harms and benefits of diagnostic tests should be fully evaluated before they are used in clinical practice

A four phase process of assessment is suggested, mirroring that used for new drugs

The first phase focuses on establishing the normal range

The second phase focuses on establishing sensitivity and specificity and other measures of diagnostic accuracy

Randomised trials are then needed to determine whether patients benefit from the testing

The final phase is large continuous surveillance studies to identify consequences of testing in clinical practice



Contributors and sources: CG directs The Copenhagen Trial Unit, a non-specialty oriented centre for clinical intervention research, and studies random and systematic errors in clinical research. LLG studies random and systematic errors in clinical research. CG and LLG are physicians and editors of the Cochrane Hepato-Biliary Group. The literature came from unsystematic and systematic searches of PubMed, The Cochrane Library, and personal files. CG drafted and LLG revised the paper. CG is the guarantor.

Competing interests: None declared.

References

  1. Feinstein AR. Clinical epidemiology. The architecture of clinical research. Philadelphia: WB Saunders, 1985.
  2. Sackett D, Haynes RB. The architecture of diagnostic research. BMJ 2002;324:539-41.
  3. International Conference on Harmonisation Steering Committee. ICH harmonised tripartite guideline. General considerations for clinical trials. http://www.ich.org/MediaServer.jser?@_ID=484&@_MODE=GLB (accessed 29 Jan 2005).
  4. Ationu A, Carter ND. Brain and atrial natriuretic peptide plasma concentrations in normal healthy children. Br J Biomed Sci 1993;50:92-5.
  5. Jensen KT, Carstens J, Ivarsen P, Pedersen EB. A new, fast and reliable radioimmunoassay of brain natriuretic peptide in human plasma. Reference values in healthy subjects and in patients with different diseases. Scand J Clin Lab Invest 1997;57:529-40.
  6. Illes J, Desmond JE, Huang LF, Raffin TA, Atlas SW. Ethical and practical considerations in managing incidental findings in functional magnetic resonance imaging. Brain Cognition 2002;50:358-65.
  7. Talwar S, Siebenhofer A, Williams B, Ng L. Influence of hypertension, left ventricular hypertrophy, and left ventricular systolic dysfunction on plasma N terminal pre-BNP. Heart 2000;83:278-82.
  8. Selvais PL, Donckier JE, Robert A, Laloux O, van Linden F, Ahn S, et al. Cardiac natriuretic peptides for diagnosis and risk stratification in heart failure. Eur J Clin Invest 1998;28:636-42.
  9. Landray MJ, Lehman R, Arnold I. Measuring brain natriuretic peptide in suspected left ventricular systolic dysfunction in general practice: cross-sectional study. BMJ 2000;320:985-6.
  10. Wang TJ, Levy D, Benjamin EJ, Vasan RS. The epidemiology of "asymptomatic" left ventricular systolic dysfunction: implications for screening. Ann Intern Med 2003;138:907-16.
  11. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. www.consort-statement.org/stardstatement.htm (accessed 7 Jun 2004).
  12. Troughton RW, Frampton CM, Yandle TG, Espiner EA, Nicholls MG, Richards AM. Treatment of heart failure guided by plasma aminoterminal brain natriuretic peptide (N-BNP) concentrations. Lancet 2000;355:1126-30.
  13. Lijmer JG, Bossuyt PM. Diagnostic testing and prognosis: the randomised controlled trial in diagnostic research. In: Knottnerus JA, ed. The evidence base of clinical diagnosis. How to do diagnostic research. London: BMJ Books, 2002:61-80.
  14. Kjaergard LL, Villumsen J, Gluud C. Reported methodological quality and discrepancies between large and small randomized trials in meta-analyses. Ann Intern Med 2001;135:982-9.
  15. Cochrane Collaboration. Cochrane Screening and Diagnostic Tests Methods Group. Cochrane Library. Issue 2. Oxford: Update Software, 2003.
  16. Knottnerus JA. Epilogue: overview of evaluation strategy and challenges. In: Knottnerus JA, ed. The evidence base of clinical diagnosis. How to do diagnostic research. London: BMJ Books, 2002:209-15.
  17. DeKay ML, Asch DA. Is the defensive use of diagnostic tests good for patients, or bad? Med Decis Making 1998;18:19-28.
(Accepted 24 January 2005)

