Evidence-based medicine: how to critically appraise the literature (Proceedings)

Critical appraisal of reports entails 3 fundamental steps: 1) determining if study results are valid; 2) assessing the clinical importance of study findings; and, 3) assessing if the results of valid, clinically important studies are relevant to our patients. A foundation for applying these 3 steps is a hierarchy of study types for EBM, which places a premium on those that are patient-based. Clinical importance has many interpretations, but in terms of quantification it is best assessed in reference to the magnitude of the observed association(s) in a study.

A preceding presentation served to introduce the topic of evidence-based medicine (EBM). For practicing EBM, critically appraising the evidence retrieved by searching the literature is arguably the most important step. This presentation will introduce the principles for critically appraising patient-based reports.

Assessing validity

In studies of patients, one generally estimates a measure of association (such as the odds ratio [OR] or relative risk [RR]) or another parameter (such as the cumulative incidence of disease). Results of a study are valid when the observed or estimated parameter is the same as the true/actual value. The term bias refers to a systematic error in the study relating to its design, data collection methods, or data analysis.1 Such systematic error is distinct from the random error that results from the imprecision of the device(s) used for collecting data. Biases in epidemiological/patient-based studies fall into 3 categories: selection bias, information bias, and confounding bias.1
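As a concrete illustration of the measures of association mentioned above, the following sketch computes an odds ratio and a relative risk from a 2x2 table. The counts are invented for illustration only and do not come from any study cited here.

```python
# Odds ratio (OR) and relative risk (RR) from a 2x2 table:
#            diseased  healthy
# exposed       a         b
# unexposed     c         d

def odds_ratio(a, b, c, d):
    """OR = (a/b) / (c/d) = (a*d) / (b*c)."""
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    """RR = risk of disease in exposed / risk in unexposed."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

# Hypothetical counts: 20 of 100 exposed horses diseased vs. 10 of 100 unexposed.
print(odds_ratio(20, 80, 10, 90))     # (20*90)/(80*10) = 2.25
print(relative_risk(20, 80, 10, 90))  # 0.20/0.10 = 2.0
```

Note that the OR and RR are similar only when the disease is uncommon; for common outcomes the OR overstates the RR.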

Criteria that are helpful for evaluating data include the study design and the type of question being addressed (diagnosis, treatment, prognosis, or harm). Because the type of question being asked is most relevant to clinicians, we will focus on appraisal of the literature based on the primary clinical activities in which we engage in clinical practice: 1) choosing and interpreting diagnostic tests; 2) selecting treatments/interventions; and, 3) making prognoses. The types of evidence we use vary somewhat with each of these clinical activities.

Diagnostic tests:

When we appraise an article that relates to a diagnostic test, there are 3 critical aspects to evaluate: 1) the spectrum of disease represented by the patients studied; 2) if the "gold standard" test was applied irrespective of the results of the diagnostic test being evaluated; and, 3) whether the "gold standard" was measured independently of the other test.2,3

It is common for the performance of diagnostic tests to be assessed using patients with severe forms of disease (e.g., necropsy-confirmed cases of sepsis) and horses free of signs of disease. Although such case-control studies are useful for initial evaluation of tests, this design is of limited value with respect to clinical application. Evaluation of diagnostic tests must encompass the full spectrum of disease to which the test will be applied; thus, the patients included must have milder as well as florid forms of the disease, be in early as well as late stages of disease, and include both treated and untreated patients. Case-control studies are generally weak sources of evidence for evaluating diagnostic tests. The best sources of evidence are prospectively designed studies of consecutively enrolled patients who meet pre-specified criteria for diagnostic testing and who all undergo a consistently applied reference standard. Studies of non-consecutive patients provide weaker evidence because there is potential for bias in the selection of cases that are included.

When a patient has a negative test, investigators may be tempted to forego testing with the reference standard, especially when the latter is more invasive. For example, consider a study to evaluate the diagnostic sensitivity and specificity of thoracic ultrasound for detecting subclinical Rhodococcus equi pneumonia using foals at a farm with endemic R. equi pneumonia. One might not want to perform tracheobronchial aspiration to obtain a sample for microbiologic culture and cytologic evaluation in foals from the farm that appear healthy and whose thoracic ultrasound findings are normal. Failure to perform such testing, however, introduces a bias (so-called verification or work-up bias) that is an important limitation.

The diagnostic test and the reference standard should be assessed independently, and it is best that the results of the diagnostic test be unknown to those conducting testing with the reference standard (and vice versa). Readers should be wary of studies in which the same individuals perform both/all tests, particularly when any of the tests has categorical outcomes that are graded subjectively (e.g., absent, mild, moderate, or severe; or negative, weak positive, moderate positive, strong positive). Caution is also urged in evaluating results of studies in which the reference standard relies on expert opinion, such as the interpretation of biopsy results.

A good understanding of the principles of sensitivity, specificity, predictive values (positive and negative), and likelihood ratios (positive and negative) is essential for interpreting test results: even when a study's design is valid, we need to know how accurately the test distinguishes patients with the disease from those without it. Although review of these topics is beyond the scope of this report, they are considered in many other places.1-3
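These test statistics all derive from the same 2x2 table of test results against the reference standard. The sketch below (with invented counts, not data from any study cited here) shows how each is calculated:

```python
# Core diagnostic-test statistics from a 2x2 table of test result vs.
# reference ("gold") standard. Counts are invented for illustration only.

def diagnostic_stats(tp, fp, fn, tn):
    sens = tp / (tp + fn)       # sensitivity: P(test+ | diseased)
    spec = tn / (tn + fp)       # specificity: P(test- | healthy)
    ppv = tp / (tp + fp)        # positive predictive value
    npv = tn / (tn + fn)        # negative predictive value
    lr_pos = sens / (1 - spec)  # likelihood ratio of a positive test
    lr_neg = (1 - sens) / spec  # likelihood ratio of a negative test
    return sens, spec, ppv, npv, lr_pos, lr_neg

# Hypothetical counts: 90 true pos., 20 false pos., 10 false neg., 80 true neg.
sens, spec, ppv, npv, lr_pos, lr_neg = diagnostic_stats(tp=90, fp=20, fn=10, tn=80)
print(f"Se={sens:.2f} Sp={spec:.2f} PPV={ppv:.2f} NPV={npv:.2f}")
print(f"LR+={lr_pos:.1f} LR-={lr_neg:.3f}")
# Note: PPV and NPV depend on disease prevalence in the study sample,
# whereas sensitivity, specificity, and likelihood ratios do not.
```

The prevalence-dependence noted in the final comment is one reason spectrum bias matters: predictive values measured in a case-control sample of florid cases and healthy controls will not transfer to clinical practice.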

Treatment/interventions:

The reference standard for an individual study evaluating treatment is the randomized clinical trial (RCT).2-4 Just because a study is an RCT, however, does not mean that its results should be reflexively accepted as valid. As with other study designs, bias can occur in RCTs. There are published criteria for evaluating RCTs, and a full review of this topic is beyond the scope of this presentation. Briefly, individual RCTs should be evaluated for the following: 1) description of the randomization process; 2) whether allocation of treatment was concealed from those managing and evaluating patients; 3) the extent to which the study groups were similar at the time the study was initiated; 4) the completeness with which patients were followed; 5) the duration of follow-up; 6) whether patients were followed up and monitored similarly irrespective of treatment group assignment; 7) whether data were analyzed according to the group to which patients were originally assigned (even if we know that a patient was inadvertently given another treatment or failed to take the assigned treatment); and, 8) whether those administering treatment and monitoring patients were blind to which treatment was being given to each patient. There are numerous resources available that can help with assessing RCTs, including the Consolidated Standards of Reporting Trials (CONSORT; www.consort-statement.org).

Unfortunately, valid RCTs are rarely available for most of the treatments we use in equine medicine. Consequently, we rely heavily on weaker forms of evidence. In the absence of RCTs, we should try whenever possible to use evidence from well-designed cohort studies. As with RCTs, there are checklists available to aid in assessing cohort studies (e.g., www.strobe-statement.org). Except for the absence of randomization, the principles of evaluating cohort studies of treatment should be the same as those for RCTs, including assessment of validity, clinical importance of effects, and the extent to which the results apply to our patients. Case-control studies should be viewed as weaker sources of evidence for evaluating therapies because they are more subject to biases. We should view case series and individual case reports as no more than preliminary and particularly subject to being misleading with respect to treatment effects.

Prognosis:

Cohort studies are considered the best design for assessing prognosis, although the case-control design may be useful for rare diseases or for disorders for which follow-up must be very long. It is important that patients in cohort studies of prognosis be included relatively early in the disease process, to avoid missing more severely affected patients that might die before being included. Evaluating the methods by which individuals were selected for inclusion in a study is critical for assessing the potential for selection bias. As with clinical trials, it is important for cohort studies that the follow-up procedures are consistent among groups, that losses to follow-up are not excessive, that events of interest are not missed, and that the length of the study period is appropriate for the disease of interest. Evidence that those lost to follow-up were compared with the baseline population and that the groups appeared similar enhances the validity of a cohort study. As a guideline, the validity of study results should be interpreted with considerable caution when more than 20% of a cohort is lost to follow-up. Although assessing death is relatively objective, determining the cause of death or other outcomes for prognosis (e.g., failure to return to racing, infertility) may be more subjective and less accurately determined. Prognostic studies that specify clear definitions for objective assessment of outcomes represent a better form of evidence than those that lack such definitions.

Assessing clinical importance

If a study is determined to be valid, one must then assess the clinical importance of the study. Clinical importance is generally assessed with respect to the magnitude of the clinical effect/association that has been quantified. A number of measures of the strength of effect/association exist, such as relative risks, odds ratios, relative risk reductions, and risk differences.

Another clinically useful measure of effect is the number needed to treat (NNT): the number of patients that need to be treated during the study period in order to prevent 1 additional case of the disease.2 The NNT is the inverse of the absolute value of the absolute risk difference. For example, in an RCT of azithromycin for preventing R. equi pneumonia, the absolute risk difference through weaning between foals receiving azithromycin and those that did not was 16%, such that the estimated NNT was 6.25 (1/0.16).6 Thus, it was estimated that 7 foals (rounding up) needed to be treated with azithromycin to prevent 1 new case of R. equi pneumonia developing from birth to weaning.

To reflect the effects of random error inherent in measuring outcomes, confidence intervals should be calculated for all measures of clinical importance. For the R. equi prevention example, the 95% confidence interval for the estimated NNT was 4 to 12: thus, we are 95% confident that the true NNT for the period from birth to weaning for chemoprophylactic administration of azithromycin was between 4 and 12 foals.

It is not possible to specify a level for a measure of association (such as an OR, NNT, etc.) that is clinically important for all circumstances: one can have a large OR for a rare disease, such that the clinical impact may not be that important, or one can have a relatively modest OR that is clinically important because the disease or exposure of interest is relatively common.

All treatments have the potential for adverse effects. The number needed to cause harm (NNH) can also be estimated as an indicator of the magnitude of the potential for adverse effects in RCTs.2 The NNH is calculated as the inverse of the absolute value of the absolute difference in risk of harm. For example, if the absolute risk difference for diarrhea in the aforementioned azithromycin trial was 1%, we would estimate that the NNH was 100 (i.e., 100 foals would have to be treated for 1 foal to develop diarrhea from birth through weaning as a result of treatment).
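The NNT and NNH calculations described above can be sketched as follows, using the 16% risk difference from the azithromycin example in the text and the hypothetical 1% risk difference for diarrhea:

```python
import math

def needed_number(abs_risk_difference):
    """1 / |absolute risk difference|, rounded up to a whole patient.
    Interpreted as NNT when the difference is a benefit, NNH when a harm."""
    return math.ceil(1 / abs(abs_risk_difference))

# NNT: 16% absolute reduction in R. equi pneumonia risk with azithromycin.
print(needed_number(0.16))  # 1/0.16 = 6.25, rounded up -> 7 foals
# NNH: hypothetical 1% absolute increase in diarrhea risk.
print(needed_number(0.01))  # 1/0.01 -> 100 foals per 1 adverse event
```

A confidence interval for the NNT can be obtained by inverting the confidence limits of the risk difference itself, which is how an interval such as the 4-to-12 range quoted above arises.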

Assessing relevance of results to one's patient(s)

If the results of a study are valid and clinically important, we should then assess the extent to which the patients studied are similar to the horse(s) to which we wish to apply the results of our EBM efforts. Generally, there will be considerable heterogeneity among study populations, making it very difficult to have great confidence in results from most studies that we read. Moreover, there are no formal guidelines or measures for implementing this aspect of critical appraisal.

Summary

The most important step of applying EBM to practice is likely to be critical appraisal of the evidence we have gathered. Understanding principles of assessing validity, quantifying clinical results, and determining relevance to the patient(s) of interest represent the 3 steps for critical appraisal in the EBM context.

References

Saville WJ, Wittum TE. Veterinary epidemiology. In Reed SM, Bayly WM, Sellon DC, editors: Equine internal medicine, 2nd ed, St. Louis, 2004, Saunders.

Straus SE, Richardson WS, Glasziou P, Haynes RB: Evidence-based medicine, 3rd ed, Edinburgh, 2005, Elsevier Churchill Livingstone.

Haynes RB, Sackett DL, Guyatt GH, et al. Clinical epidemiology: how to do clinical practice research. 3rd ed, Philadelphia, 2006, Lippincott, Williams & Wilkins.

Guyatt G, Rennie D: Users' guide to the medical literature. Essentials of evidence-based clinical practice, Chicago, 2002, AMA Press.

Holmes MA. How to start practicing evidence-based veterinary medicine: a practical guide for over-worked practitioners, in Proceedings. Am Assoc Equine Pract 2008;54:327-335.

Chaffin MK, Cohen ND, Martens RJ. Chemoprophylactic effects of azithromycin against Rhodococcus equi-induced pneumonia among foals at equine breeding farms with endemic infections. J Am Vet Med Assoc 2008;232:1035-1047.
