Jump to: Page Content, Site Navigation, Site Search,
You are seeing this message because your web browser does not support basic web standards. Find out more about why this message is appearing and what you can do to make your experience on this site better.
|
4. Measurement error and bias
Selection bias Information bias Another study looked at risk of hip osteoarthritis according to physical activity at work, cases being identified from records of admission to hospital for hip replacement. Here there was a possibility of bias because subjects with physically demanding jobs might be more handicapped by a given level of arthritis and therefore seek treatment more readily. Bias cannot usually be totally eliminated from epidemiological studies. The aim, therefore, must be to keep it to a minimum, to identify those biases that cannot be avoided, to assess their potential impact, and to take this into account when interpreting results. The motto of the epidemiologist could well be "dirty hands but a clean mind" (manus sordidae, mens pura).
Measurement error In practice, therefore, validity may have to be assessed indirectly. Two approaches are used commonly. A technique that has been simplified and standardised to make it suitable for use in surveys may be compared with the best conventional clinical assessment. A self administered psychiatric questionnaire, for instance, may be compared with the majority opinion of a psychiatric panel. Alternatively, a measurement may be validated by its ability to predict future illness. Validation by predictive ability may, however, require the study of many subjects.
Analysing validity
From this table four important statistics can be derived: Sensitivity - A sensitive test detects a high proportion of the true cases, and this quality is measured here by a/a + c. Specificity- A specific test has few false positives, and this quality is measured by d/b + d. Systematic error - For epidemiological rates it is particularly important for the test to give the right total count of cases. This is measured by the ratio of the total numbers positive to the survey and the reference tests, or (a + b)/(a + c). Predictive value-This is the proportion of positive test results that are truly positive. It is important in screening, and will be discussed further in Chapter 10. It should be noted that both systematic error and predictive value depend on the relative frequency of true positives and true negatives in the study sample (that is, on the prevalence of the disease or exposure that is being measured).
Sensitive or specific? A matter of choice By choosing the right test and cut off points it may be possible to get the balance of sensitivity and specificity that is best for a particular study. In a survey to establish prevalence this might be when false positives balance false negatives. In a study to compare rates in different populations the absolute rates are less important, the primary concern being to avoid systematic bias in the comparisons: a specific test may well be preferred, even at the price of some loss of sensitivity.
Repeatability Repeatability can be tested within observers (that is, the same observer performing the measurement on two separate occasions) and also between observers (comparing measurements made by different observers on the same subject or specimen). Assessment of repeatability may be built into a study - a sample of people undergoing a second examination or a sample of radiographs, blood samples, and so on being tested in duplicate. Even a small sample is valuable, provided that (1) it is representative and (2) the duplicate tests are genuinely independent. If testing is done "off line" (perhaps as part of a pilot study) then particular care is needed to ensure that subjects, observers, and operating conditions are all adequately representative of the main study. It is much easier to test repeatability when material can be transported and stored - for example, deep frozen plasma samples, histological sections, and all kinds of tracings and photographs. However, such tests may exclude an important source of observer variation - namely the techniques of obtaining samples and records
Reasons for variation in replicate measurements Within observer variation - Discovering one's own inconsistency can be traumatic; it highlights a lack of clear criteria of measurement and interpretation, particularly in dealing with the grey area between "normal" and "abnormal". It is largely random-that is, unpredictable in direction. Between observer variation - This includes the first component (the instability of individual observers), but adds to it an extra and systematiccomponent due to individual differences in techniques and criteria. Unfortunately, this may be large in relation to the real difference between groups that it is hoped to identify. It may be possible to avoid this problem, either by using a single observer or, if material is transportable, by forwarding it all for central examination. Alternatively, the bias within a survey may be neutralised by random allocation of subjects to observers. Each observer should be identified by a code number on the survey record; analysis of results by observer will then indicate any major problems, and perhaps permit some statistical correction for the bias. Random subject variation -When measured repeatedly in the same person, physiological variables like blood pressure tend to show a roughly normal distribution around the subject's mean. Nevertheless, surveys usually have to make do with a single measurement, and the imprecision will not be noticed unless the extent of subject variation has been studied. Random subject variation has some important implications for screening and also in clinical practice, when people with extreme initial values are recalled. Thanks to a statistical quirk this group then seems to improve because its members include some whose mean value is normal but who by chance had higher values at first examination: on average, their follow up values necessarily tend to fall ( regression to the mean). The size of this effect depends on the amount of random subject variation. Misinterpretation can be avoided by repeat examinations to establish an adequate baseline, or (in an intervention study) by including a control group. Biased (systematic) subject variation -Blood pressure is much influenced by the temperature of the examination room, as well as by less readily standardised emotional factors. Surveys to detect diabetes find a much higher prevalence in the afternoon than in the morning; and the standard bronchitis questionnaire possibly elicits more positive responses in winter than in summer. Thus conditions and timing of an investigation may have a major effect on an individual's true state and on his or her responses. As far as possible, studies should be designed to control for this - for example, by testing for diabetes at one time of day. Alternatively, a variable such as room temperature can be measured and allowed for in the analysis.
Analysing repeatability For qualitative attributes, such as clinical symptoms and signs, the results are first set out as a contingency table:
The overall level of agreement could be represented by the proportion
of the total in cells a and d. This measure unfortunately turns
out to depend more on the prevalence of the condition than on
the repeatability of the method. This is because in practice it
is easy to agree on a straightforward negative; disagreements
depend on the prevalence of the difficult borderline cases. Instead,
therefore, repeatability is usually summarised by the
Return to main page Go to previous page Go to next page | |||||||||||||||||||||||||||||||||||||||
What can you learn from this BMJ paper? Read Leanne Tite's Paper+