Selecting and Appraising Studies for a Systematic Review

  1. Maureen O. Meade, MD, FRCPC, MSc; and
  2. W. Scott Richardson, MD
  1. From the University of Toronto, Toronto, Ontario, Canada; and the University of Rochester School of Medicine and Dentistry, Rochester, New York. Acknowledgment: The authors thank the clinical reviewer, Norman J. Wilder. Requests for Reprints: Deborah Cook, MD, MSc, Department of Medicine, Division of Critical Care, St. Joseph's Hospital, 50 Charlton Avenue East, Hamilton, Ontario L8N 4A6, Canada. Current Author Addresses: Dr. Meade: The Wellesley-Central Hospital, Room 244, Jones Building, 150 Wellesley Street East, Toronto, Ontario M4Y 1J3, Canada.

    Abstract

    After thoroughly searching the potentially relevant literature for a systematic review, reviewers face the sequential tasks of selecting studies for inclusion and appraising these studies. Methodical, impartial, and reliable strategies are necessary for these two tasks because systematic reviews are retrospective exercises and are therefore prone to both bias and random error. To plan for study selection, reviewers begin with a focused clinical question and choose selection criteria that reflect this question. A detailed selection protocol that specifies the study designs and publication status of articles to be included is often helpful. Selection criteria are itemized on customized forms and are used to examine each potentially relevant primary study, usually by two different reviewers. In planning the critical appraisal of included studies, reviewers decide which clinical and methodologic study features require documentation. After choosing methods for evaluating study quality, reviewers construct customized appraisal forms and an explicit protocol for the actual evaluation. Some of the techniques commonly used to minimize the potential for error in study appraisal include duplicate, independent examination; blinding to study results and other identifying features of each article; and correspondence with study authors to clarify issues.

    Ultimately, primary studies should be selected, appraised, and reported in sufficient detail to allow readers to judge the applicability of the review to clinical practice and to clarify the strength of the inferences that can be drawn from the review.

    Systematic Review Series

    Series Editors:

    Cynthia Mulrow, MD, MSc

    Deborah Cook, MD, MSc

    The last article in this series outlined methods with which to search the literature for studies on the clinical question that generates a systematic review [1]. Herein, we discuss the subsequent steps of selecting and appraising studies for a review. Both of these steps involve important judgments that can influence the results of a review. In selecting studies, reviewers judge the relevance of the studies to the review question. In appraising studies, reviewers judge numerous features of design and analysis. Some of these judgments are easy to make; others are more difficult and prone to error.

    To be confident in their decisions, reviewers should use methods that are reliable (the results do not change if the procedure is repeated), impartial (not influenced by the study results), and explicit (unambiguous) [2]. These strategies for selection and appraisal are sensible, and they distinguish most systematic reviews from most narrative reviews. However, evidence to support the importance of some of the methods we suggest is either scant or conflicting; readers are referred to the original research on these approaches for more details.

    Selecting Studies for Systematic Reviews

    If reviewers perform a comprehensive search of the literature using the methods described previously in this series [1], they will probably have assembled a large sample of articles. This sample will include most (ideally, all) studies that are relevant to the review question (that is, the sensitivity of the search will be high). Inevitably, because such a wide net is cast, articles not pertinent to the clinical question will be retrieved (that is, the specificity of the search will be modest). Thus, the reviewers' next task is to sort through all of the potentially relevant articles and select those that will be included in the review. To do so, reviewers adopt several of the tactics listed in Table 1 and Table 2 for planning and executing the selection process (in effect, improving the specificity of the search); these tactics are described below.

    Table 1. Planning Study Selection
    Table 2. Strategies for Selecting and Appraising Studies

    Begin with a Well-Built Clinical Question

    Reviewers should ensure that the question for review includes the four elements of a well-built clinical question [3, 4]: the patients of interest, the main interventions under investigation, the comparison interventions, and the clinical outcomes of interest. By including these four elements, reviewers can better focus the selection process.

    Choose Selection Criteria That Fit the Clinical Question

    Consider a systematic review of the effectiveness of a drug treatment (for example, a proton-pump inhibitor) for patients with a particular disorder (such as esophageal reflux). Reviewers need to decide whether to include studies of patients with any symptoms of reflux, only those with “classic” symptoms, or only those in whom definitive diagnostic tests have confirmed the presence of reflux. In addition, reviewers might choose to include studies of patients with different comorbid conditions; patients from different demographic or geographic or cultural backgrounds; or patients from different health systems, such as inpatient or community populations.

    Similarly, reviewers should use selection criteria that reflect the main and comparison interventions of interest. In our esophageal reflux example, reviewers would need to decide whether to include studies of a particular drug or studies of all agents in that drug's class and whether to include studies of any dose and regimen or only studies with a specific regimen. For the comparison interventions, the reviewers would decide whether to include studies that compare the experimental drug with alternate treatments (such as antacids or histamine-2-receptor antagonists), with placebo, or with both.

    For the clinical outcomes, reviewers have analogous tasks of defining the outcomes and translating them into criteria. In our example, the reviewers would start by listing each clinical outcome (for example, whether the outcome was endoscopic or clinical and whether it focused on cure or persistence) and then decide whether to include studies that reported any outcome or only those with certain clinically important outcomes (such as improvement in symptoms at 1 year).

    After thoroughly considering each element of the review question, reviewers compile a set of explicit selection criteria. When these criteria are not explicit, the results of the review are more prone to error [5, 6]. Reporting the selection criteria used in a review is extremely important to readers because the criteria indicate the relevance of the review to the readers' clinical practice.

    Specify the Types of Study Design To Be Included

    After creating selection criteria that appropriately reflect the review question, reviewers should consider which study designs to include. Ideally, reviewers choose study designs that are most likely to produce valid results. For example, to answer questions about therapy or harm, reviewers may want to include randomized trials [7] because they provide more accurate estimates of benefit or harm than do cohort studies, case–control studies, and case series [8]. In reality, however, randomized trials may not be conducted to address questions of harm [9]. Therefore, reviewers need to consider which study designs are likely to be available to answer their question; this information may necessitate modification of originally conceptualized selection criteria to incorporate observational (nonexperimental) studies.

    Specify Criteria Related to Type and Form of Publication

    Reviewers also need to consider issues related to type and form of publication. Ideally, all of the relevant studies would be published as peer-reviewed journal articles. However, some completed studies may be published only as abstracts, in non-peer-reviewed form, or not at all. Reviewers decide whether to include these incompletely reported studies when planning their literature search. By including all articles in various stages of publication and subjecting them to rigorous critical appraisal, reviewers minimize the threat of publication bias (the preferential reporting of studies with positive results) [10-12], which could generate misleading reviews. Other studies may be reported more than once. To avoid over-representing duplicate studies in the review, investigators should plan to look for and exclude duplicate publications [13]. Finally, because studies may be published in different languages and because excluding studies published in different languages may bias the results of reviews [14, 15], articles should be included, as appropriate, regardless of the language of publication (translating as necessary). Limited time and resources, however, may preclude such an approach.

    Construct and Pretest Selection Forms

    After deciding on selection criteria, reviewers can prepare customized forms that contain checklists of the selection criteria (Figure 1). Using these forms can simplify the selection process, increase reliability, and provide a record of the judgments made about each study. After drafting form prototypes, reviewers “pretest” these forms for clarity, ease of application, and reliability. To pretest the forms, two or more independent reviewers typically apply them to a random sample of studies identified by the literature search. Reviewers compare their results to identify sources of ambiguity and then revise the forms accordingly. If the revisions are substantial, this process may need to be repeated before the forms can be used.
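
As a rough illustration of the checklist and pretest described above, the following Python sketch represents a selection form as a set of yes/no criteria and lists the items on which two reviewers disagree during pretesting. The criterion names, the `SelectionForm` class, and the `pretest_disagreements` helper are hypothetical; an actual form would carry the criteria of the review at hand.

```python
from dataclasses import dataclass

# Hypothetical checklist items; a real form lists the review's own criteria.
CRITERIA = (
    "population_matches",
    "intervention_matches",
    "comparison_matches",
    "outcome_reported",
    "design_eligible",
)


@dataclass
class SelectionForm:
    study_id: str
    reviewer: str
    answers: dict  # criterion name -> True (met) or False (not met)

    def unmet(self):
        """Criteria not met; an unanswered item counts as not met."""
        return [c for c in CRITERIA if not self.answers.get(c, False)]

    def include(self):
        """A study is included only if every criterion is met."""
        return not self.unmet()


def pretest_disagreements(forms_a, forms_b):
    """Map study_id -> criteria on which the two reviewers answered differently."""
    forms_b_by_id = {form.study_id: form for form in forms_b}
    disagreements = {}
    for form_a in forms_a:
        form_b = forms_b_by_id[form_a.study_id]
        differing = [c for c in CRITERIA
                     if form_a.answers.get(c, False) != form_b.answers.get(c, False)]
        if differing:
            disagreements[form_a.study_id] = differing
    return disagreements


# Two reviewers independently complete forms for the same two studies.
all_met = {c: True for c in CRITERIA}
reviewer_1 = [SelectionForm("trial-01", "R1", dict(all_met)),
              SelectionForm("trial-02", "R1", {**all_met, "design_eligible": False})]
reviewer_2 = [SelectionForm("trial-01", "R2", dict(all_met)),
              SelectionForm("trial-02", "R2", {**all_met, "outcome_reported": False})]

print(pretest_disagreements(reviewer_1, reviewer_2))
```

Items that recur in the disagreement list across the pretest sample are natural candidates for rewording before the form is used in earnest.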

      Figure 1. Example of a form that might be developed for the selection of studies for a systematic review evaluating the efficacy of β-blockers for secondary prevention of variceal bleeding.

      Write a Detailed Protocol

      Having a selection protocol as part of a larger protocol for the entire review helps reviewers in two ways. First, it provides a document that explicitly states the review question and the selection criteria, making the process accountable. Reviewers can later return to the protocol for guidance in resolving disagreements about article selection. Second, the selection protocol identifies what work will be done, by whom, in what manner, when, and for what reason; thus, it provides a mode of communication within the review team.

      When reviewers have a very large sample of studies from which to select, they can simplify this task by reviewing all of the titles, then the abstracts, and then the full articles, excluding studies that do not meet one or more selection criteria at each step. In doing so, reviewers should record (on the selection forms) the reasons for exclusion. After reviewers have selected studies for the systematic review, they will move to the next task of critical appraisal. This procedure also requires careful planning.
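
The title, abstract, and full-article sequence described above can be sketched in Python as a series of stages, each of which records the reason for every exclusion. This is only a sketch: the `screen` function, the record fields, and the three check functions are hypothetical, and the simple string tests stand in for judgments that human reviewers would make against the selection form.

```python
def screen(records, stages):
    """Apply screening stages in order, logging the reason for each exclusion.

    records: list of dicts describing candidate studies.
    stages:  list of (stage_name, check) pairs; check(record) returns None to
             keep the record or a string giving the reason for exclusion.
    """
    remaining = list(records)
    exclusion_log = []
    for stage_name, check in stages:
        kept = []
        for rec in remaining:
            reason = check(rec)
            if reason is None:
                kept.append(rec)
            else:
                exclusion_log.append(
                    {"id": rec["id"], "stage": stage_name, "reason": reason})
        remaining = kept
    return remaining, exclusion_log


# Stand-ins for reviewer judgments at each stage (assumptions for illustration).
def title_check(rec):
    return None if "reflux" in rec["title"].lower() else "title not about reflux"


def abstract_check(rec):
    return None if "randomized" in rec["abstract"].lower() else "not a randomized trial"


def full_text_check(rec):
    return None if rec["reports_symptoms_at_1_year"] else "outcome of interest not reported"


records = [
    {"id": "A", "title": "Proton-pump inhibitor for reflux",
     "abstract": "A randomized trial", "reports_symptoms_at_1_year": True},
    {"id": "B", "title": "Inhaled steroids in asthma",
     "abstract": "A randomized trial", "reports_symptoms_at_1_year": False},
    {"id": "C", "title": "Reflux treatment: a case series",
     "abstract": "A case series", "reports_symptoms_at_1_year": False},
    {"id": "D", "title": "Antacids versus placebo in reflux",
     "abstract": "Randomized comparison", "reports_symptoms_at_1_year": False},
]

included, exclusion_log = screen(records, [
    ("title", title_check),
    ("abstract", abstract_check),
    ("full text", full_text_check),
])

for entry in exclusion_log:
    print(entry)
```

The exclusion log kept here is the kind of record from which a list of excluded studies, with reasons, could later be prepared.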

      Appraising Studies for Systematic Reviews

      Reviewers appraise the studies selected for review with three objectives in mind: 1) to understand the validity of the studies, 2) to uncover reasons for differences among study results other than chance, and 3) to provide readers with sufficient information with which to judge for themselves the applicability of the systematic review to their clinical practice. To achieve these goals, reviewers use the strategies outlined in Table 2 and Table 3 to carefully reexamine many important features of the primary studies.

      Table 3. Planning Study Appraisal

      Examine Important Clinical Features

      Although the selection criteria for a systematic review define the population, interventions, and outcomes of interest, the appraisal process involves a detailed assessment of the patients (for example, high, medium, or low risk), the study interventions (for example, frequency, degree, and duration), and the outcome measurements (for example, definitions and degree of surveillance) in each study. Variations among any of these design features may be an important source of variation among study results. Appraisals of primary studies that reveal differences among study protocols may direct subsequent subgroup analyses if the results are statistically combined.

      Consider, for example, a systematic review of the accuracy of persantine thallium scanning in predicting postoperative myocardial infarction in patients who undergo noncardiac vascular repair. Because the accuracy of the test may vary among patients with different degrees of risk [16], reviewers should document risk factors for myocardial infarction (for example, age; diabetes; hypertension; and history of congestive heart failure, unstable angina, or myocardial infarction) for the patients in each study. Similarly, reviewers should record the methods used to administer the test, including intravenous compared with oral administration of persantine, planar compared with single-photon emission computed tomography, and the time interval for delayed images. Finally, reviewers should record details related to outcome measurement, including precise definitions of myocardial infarction and the completeness and duration of follow-up.

      Examine the Quality of Study Methods

      The research methods used in the primary studies reflect the “quality” of the studies. In this sense, quality refers to the extent to which the study design, conduct, and analysis minimize the potential for bias. Biased primary studies are obviously more likely to provide misleading results and, by extension, will generate misleading systematic reviews. Therefore, reviewers should critically appraise the methods of all primary studies. High-quality studies use methods that are most likely to provide a true estimate of the benefit or harm of a treatment or exposure, the diagnostic accuracy of a test, or a particular prognosis. Current standards for evaluating the quality of studies on therapy, prevention, diagnosis, prognosis, and harm [4] are the basis of quality assessments in systematic reviews (Table 4).

      Table 4. Abridged Checklist for Evaluating the Quality of Study Methods for Various Study Designs*

      Three methodologic features have been empirically shown to influence the results of studies about therapy: randomization, concealment of randomization, and blinding [7, 17-19]. Random allocation to treatment means that investigators ensure that each patient entering the trial has an equal chance of getting into each treatment group. Concealment means that investigators are unaware of the treatment group to which a patient will be randomly assigned before the patient enters the trial. Conversely, blinding refers to the masking of caregivers, patients, and research personnel to treatment allocation after a patient has been entered into a trial. Nonrandomized studies [17, 18], unconcealed randomization [18, 19], and unblinded studies [7] all tend to overestimate the effectiveness of therapeutic interventions. A summary of anonymous reports from investigators describes the extreme measures that have been used to unconceal treatment group allocation [20]. Therefore, in randomized trials, the methods of randomization should be clearly documented so that readers and reviewers can evaluate this important methodologic feature. Unfortunately, the randomization process is inadequately described in most published reports [19, 21, 22].

      Reviewers may choose from among many techniques for assessing the methodologic quality of included studies. First, the simplest approach is to use a few components that itemize the key design features (Table 4). Second, reviewers may develop more comprehensive checklists. Although they are generally more complex, customized checklists are particularly helpful for documenting the quality of studies of diagnostic tests or prognosis. Third, reviewers may use quantitative scales that provide a summary score for the overall quality of individual studies. A recent review identified nine checklists and 25 different scales for assessing the quality of primary studies for systematic reviews, although only one checklist was rigorously developed [23]. Most quality assessment scales that have been published for general use do not include items that are known to influence the ability of the study to provide a true (unbiased) estimate of treatment effect. For example, one published scale includes an item related to the reporting of sample size calculation [24]. Neglecting to report the sample size calculation may reflect the omission of an a priori sample size determination or could reflect poor reporting or editing; however, these oversights do not result in a biased estimate of efficacy. When systematic reviews use scales for measuring methodologic quality, points are allotted for methodologic features that minimize bias (such as randomization and blinding) (Table 4). These points are summed to provide a numerical descriptor of overall quality: the methodologic quality score.
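
A minimal Python sketch of the summing step just described follows. The `POINTS` table and the `quality_score` function are hypothetical, and the equal point values are an assumption made purely for illustration; the three features are the ones named earlier in this section as empirically linked to bias.

```python
# Illustrative point values only; the relative weights are an assumption.
POINTS = {
    "randomized": 1,
    "allocation_concealed": 1,
    "blinded": 1,
}


def quality_score(features):
    """Sum the points for each bias-reducing feature the study reports."""
    return sum(points for name, points in POINTS.items()
               if features.get(name, False))


# A study that was randomized and blinded but did not conceal allocation.
study = {"randomized": True, "allocation_concealed": False, "blinded": True}
print(quality_score(study), "of", sum(POINTS.values()))
```

Items such as the reporting of a sample size calculation are deliberately left out of this table, in keeping with the point made above that they do not bias the estimate of efficacy.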

      Methodologic quality scores can be applied in many ways in systematic reviews [25]. For example, they may be used to evaluate the influence of quality on study results. One method is to graphically plot study results against methodologic quality scores to evaluate any association. This practice may be useful if the methods of individual studies vary widely. In addition, methodologic quality scores can guide qualitative and quantitative analyses. For example, reviewers may choose a cutoff score to select studies for pooling in secondary analyses. Ideally, the cutoff score should be chosen a priori on the basis of biological rationale rather than on the basis of study results. Similarly, quality scores may be used to perform “sensitivity” analyses, in which sub-samples of selected studies featuring specific design characteristics or threshold quality scores are statistically combined. Finally, reviewers may use these scores to perform weighted analyses, in which the relative weight of an individual study in a meta-analysis is determined by the magnitude of the methodologic quality score. The application of methodologic quality scores in systematic reviews must be considered in light of the scores' limitations. For instance, assigning relative values to specific study methods to generate a quality score is largely an arbitrary and unscientific process. Furthermore, the assumption of a direct and universal relation between study quality and results may be unfounded. When one group of investigators applied a particular quality scale to studies reviewed in seven meta-analyses published in the 1980s, no association between quality scores and study results was shown [26]. These techniques of data analysis will be discussed in detail in the next article in this series [27].
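
Two of the uses described above, selecting studies by a cutoff score and weighting studies by quality score, can be sketched in Python as follows. The data are invented for illustration, the function names are hypothetical, and weighting by quality score alone ignores study precision; this is not a complete meta-analytic method, only a sketch of the arithmetic under those assumptions.

```python
import math

# Invented illustrative data: log odds ratios and quality scores for three studies.
studies = [
    {"id": "S1", "log_odds_ratio": math.log(0.50), "quality": 5},
    {"id": "S2", "log_odds_ratio": math.log(0.80), "quality": 3},
    {"id": "S3", "log_odds_ratio": math.log(0.40), "quality": 1},
]

QUALITY_CUTOFF = 3  # assumed to be fixed a priori, before study results are seen


def meeting_cutoff(studies, cutoff):
    """Studies whose quality score is at or above the cutoff."""
    return [s for s in studies if s["quality"] >= cutoff]


def quality_weighted_mean(studies):
    """Mean log odds ratio with each study weighted by its quality score."""
    total_weight = sum(s["quality"] for s in studies)
    if total_weight == 0:
        raise ValueError("no study carries any weight")
    weighted_sum = sum(s["quality"] * s["log_odds_ratio"] for s in studies)
    return weighted_sum / total_weight


all_studies = math.exp(quality_weighted_mean(studies))
cutoff_subset = math.exp(
    quality_weighted_mean(meeting_cutoff(studies, QUALITY_CUTOFF)))
print(f"all studies: OR {all_studies:.2f}; cutoff subset: OR {cutoff_subset:.2f}")
```

Comparing the two summary values is one simple form of the sensitivity analysis mentioned above; the limitations of quality scores noted in the text apply equally to any result obtained this way.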

      Construct and Pretest Appraisal Forms

      Just as reviewers are well advised to construct standardized forms for the selection process, they should develop and test forms for appraising the studies selected for review. Consultation with clinical experts or methodologists helps verify that the forms include a comprehensive list of important study features particular to the study design and clinical topic.

      Write a Detailed Protocol

      Just as reviewers find it useful to create a protocol for the selection process, they should develop an explicit protocol outlining the appraisal procedures.

      Strategies for Executing Study Selection and Appraisal

      Follow the Protocol and Record Progress

      After writing protocols for study selection and appraisal that are based on sound criteria and use well-built forms, reviewers are well prepared to follow the protocol and record their progress. As each article is reviewed, selection and appraisal forms will provide a record of the judgments made about each study. This information can be invaluable when reviewers prepare a log of excluded studies for publication or a table that summarizes the methodologic quality of included studies.

      Review Each Study Independently and in Duplicate

      Even when following explicit criteria for selecting and appraising studies, reviewers can face difficult decisions. To minimize the potential for error in these judgments, we recommend that two or more investigators review each study independently. Reviewers often measure their inter-rater agreement across studies for each item on the selection and appraisal forms using a κ statistic (which measures agreement beyond the play of chance) [28, 29]. Although agreement between two reviewers does not guarantee accurate decisions, the higher the agreement among reviewers, the more confidence readers can have in the results of the review [30]. When reviewers disagree, the reasons for these disagreements should be explored. Often, reviewers can quickly clarify the source of the disagreement if they both refer to the article and the review protocol. An alternative is to enlist other collaborators in resolving disagreements, as specified in the protocol. Resolving disagreements by enlightened discussion is often preferred to voting by majority because the majority might be incorrect. For future reference, reviewers should record these disagreements and how they are resolved.
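
For readers who want to see how agreement beyond chance is computed, the following Python sketch calculates a κ statistic for two reviewers' decisions on the same studies. It assumes the unweighted Cohen form of κ, which the text does not specify; the function name and the example decisions are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters judging the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must judge the same non-empty set of items")
    n = len(ratings_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category only")
    return (observed - expected) / (1 - expected)


# Invented example: include/exclude decisions by two reviewers on ten studies.
reviewer_1 = ["inc", "inc", "exc", "exc", "inc", "exc", "inc", "exc", "exc", "exc"]
reviewer_2 = ["inc", "inc", "exc", "exc", "exc", "exc", "inc", "inc", "exc", "exc"]

print(round(cohen_kappa(reviewer_1, reviewer_2), 2))
```

In this example the reviewers agree on 8 of 10 studies (observed agreement 0.80), chance alone would produce agreement of 0.52, and κ is therefore (0.80 − 0.52) / (1 − 0.52), or about 0.58.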

      Consider “Blinding” to Study Results

      Investigators can consider blinding reviewers to study results in order to make their judgments as impartial as possible. This task may be as simple as providing reviewers with just the Methods sections from the articles being considered. Unfortunately, relevant information is often dispersed throughout the Results and Discussion sections; therefore, a great deal of cutting and pasting may be required. Furthermore, if a reviewer recognizes the authors, their institutions, or the date or journal of publication, he or she may be influenced by prior knowledge of the study or its results. Electronically scanning journal pages and printing them after relevant identifiers have been eliminated is a modern approach to blinding in this context; however, this process can be extremely labor-intensive and requires a high level of judgment.

      One study showed that blinded quality assessment of primary studies produces significantly lower and more consistent scores than do open assessments [31]. However, a recent randomized trial compared study selection and data abstraction performed with assessment blinded to author, institution, and journal against open assessment across five meta-analyses. Blinding had neither a clinically nor a statistically significant effect on the summary odds ratios of these meta-analyses [32]. Although blinding reviewers to study results and all identifying characteristics continues to be advocated in some circles, the theoretical benefits must be weighed against the effort involved in blinding.

      Correspond with Authors To Confirm Study Characteristics

      Some articles may contain incomplete or confusing descriptions of the study methods, leading to incorrect selection decisions or flawed critical appraisal. To overcome this problem, reviewers can ask the authors for clarification. Such correspondence and the subsequent decisions of the review team should also be recorded for future reference.

      Conclusions

      The selection and appraisal of studies for a systematic review should use methodical, reliable, and impartial methods. Familiarity with the principles of critical appraisal is fundamental to these steps. Highlighting the differences in study methods through critical appraisal facilitates an evaluation of the inconsistencies among study results. Furthermore, the appraisal exercises help to establish the strength of inferences that can be drawn from the review. This exercise may also guide further research on the topic.

      After applying the strategies presented in this article to the fruits of a literature search, systematic reviewers are prepared to move on to the qualitative and quantitative synthesis of study results. The strategies for combining studies for systematic reviews, including statistical analyses, will be presented in the next article in this series [27].

      Key Points To Remember

      In selecting studies for review, investigators must judge the degree of each study's relevance to the clinical question and critically assess study design features

      To be confident in their decisions, reviewers need to use methods of study selection and appraisal that are reliable, impartial, and explicit

      When selecting from a large sample of studies, reviewers can simplify the task by first reviewing all of the titles, then the abstracts, and then the full articles, excluding studies at each step that do not meet one or more selection criteria

      By including all types of documents (for example, peer-reviewed publications, abstracts, unpublished reports) and subjecting them equally to rigorous critical appraisal, reviewers minimize the possibility of publication bias

      Reviewers must appraise the studies selected for a systematic review with three objectives in mind: to understand the rigor of the studies to be included, to uncover reasons for differences among study results, and to provide readers with sufficient information with which to judge the applicability of the review to their clinical practice

      Three methodologic features have been empirically shown to influence the results of studies about therapy: randomization, concealment of randomization, and blinding

      Dr. Richardson: Department of Medicine, University of Rochester School of Medicine and Dentistry, 1425 Portland Avenue, Rochester, NY 14621.
