Locating and Appraising Systematic Reviews

Abstract

In this article, we describe the strengths and weaknesses of several methods of locating systematic reviews, including electronic databases such as MEDLINE, Best Evidence (the electronic version of ACP Journal Club and Evidence-Based Medicine), and the Cochrane Library (a regularly updated source of reviews and controlled trials produced by the Cochrane Collaboration). We also present steps that can be used to critically appraise review articles; as an example, we use a systematic review that evaluates the gastrointestinal toxicity of various nonsteroidal anti-inflammatory drugs in the context of a clinical scenario.

Systematic Review Series

Series Editors: Cynthia Mulrow, MD, MSc, Deborah Cook, MD, MSc.

From McMaster University, Hamilton, Ontario, Canada

The second article in this series on systematic reviews has two purposes: to describe tools and techniques that can help locate systematic reviews effectively and efficiently and to suggest a method of critically appraising the methodologic quality of these reviews. The latter is a necessary step in determining whether the results of a systematic review should be used in practice and, if so, how they should be used.

Clinical Scenario

Your patient is a 65-year-old man who has painful osteoarthritis in both knees and no other major medical conditions. Although he can still carry out his activities of daily living, he has limited mobility and reports pain at rest. You are now reviewing his history and current care with him. You had previously prescribed acetaminophen, 4 g/d, which provided minimal pain relief. The patient is eager to try a different medication. You mention that nonsteroidal anti-inflammatory drugs (NSAIDs) are generally not associated with improved analgesia compared with acetaminophen [1], but the patient still wants to try an alternate medication. You agree to offer him short-term NSAID therapy but are not sure which agent has the lowest rate of serious gastrointestinal complications, such as hemorrhage. You suspect that many original studies have been published that discuss the risks of different NSAIDs, but you would like to have a succinct and accurate summary of the study results rather than having to do all of the searching, selecting, and synthesizing yourself. Because this question is important to your patient and common in your practice, you proceed to look for a systematic review.

Locating Systematic Reviews

Internists have several valuable sources of systematic reviews: MEDLINE and other electronic databases, journals, Best Evidence (the electronic version of ACP Journal Club and Evidence-Based Medicine), and the Cochrane Library. Each resource has advantages and disadvantages.

Electronic Databases

The largest and most readily available tool for locating systematic reviews is MEDLINE, a multipurpose database produced by the U.S. National Library of Medicine. In MEDLINE and related databases, the National Library of Medicine indexes important biomedical literature from more than 4000 journals. The MEDLINE database has more than 7 000 000 citations that date back to 1966; 5 000 000 of these citations deal with humans. One tenth of the citations are indexed as review articles, but only a small fraction of these review articles are systematic reviews.

Because of the size and complexity of MEDLINE, searching this database for systematic reviews requires careful planning and an understanding of the terms and phrases used to describe systematic reviews (which form the basis of your search strategy). They include the adjectives “quantitative,” “methodological,” and “systematic” to describe either “reviews” or “overviews.” Another phrase, less commonly used, is “review articles with a methods section.” “Meta-analysis” has been spelled in various ways (meta-analysis, metaanalysis, metaanalyses, meta-analyses, meta analysis, meta analyses).

To facilitate searching, you need to be aware of how indexers classify and index systematic reviews and meta-analyses. The indexers at the National Library of Medicine recognize meta-analyses and index them using the Medical Subject Heading (MeSH) “meta-analysis (MeSH)” and the publication type (pt) “meta-analysis (pt).” They do not, however, recognize systematic reviews as different from traditional review articles. All review articles (systematic or otherwise) are indexed with the publication type “review (pt).” One way to identify the systematic reviews is to limit review articles to those that include the term “MEDLINE” in their abstract. To do so, the search terms “review (pt) AND MEDLINE (textword)” are used. “MEDLINE” is included here because most clinical systematic reviews include a description of how the component original studies were identified and because the term “MEDLINE” is often included in the abstract.

By using the preceding list of terms and phrases, we can create a search strategy to identify systematic reviews that are indexed in MEDLINE. Most MEDLINE access systems allow search strategies to be stored for easier searching in the future. Research efforts by members of the Cochrane Collaboration are currently under way to establish the most sensitive and specific search strategies for locating systematic reviews for questions about therapy. These strategies will complement those that have been developed to locate primary studies on therapy, diagnosis, cause, and prognosis [2]. Until this work has been completed, the following two search strategies (one simple approach and one more complex approach) are useful. The second strategy identifies many of the systematic reviews that are indexed in MEDLINE.

The simple search consists of the following steps:

1. meta-analysis (pt)

2. meta-anal: (textword) [see the appendix for explanation of the symbol “:” and other MEDLINE searching functions]

3. review (pt) AND medline (textword)

4. 1 OR 2 OR 3

The comprehensive search consists of the following steps:

1. meta-analysis (pt)

2. meta-anal: (textword)

3. metaanal: (textword)

4. quantitative: review: OR quantitative: overview: (textword)

5. systematic: review: OR systematic: overview: (textword)

6. methodologic: review: OR methodologic: overview: (textword)

7. review (pt) AND medline (textword)

8. 1 OR 2 OR 3 OR 4 OR 5 OR 6 OR 7

Next, content terms are added to narrow the search to our clinical topic. For this search, we need to include terms for NSAIDs, adverse effects, and gastrointestinal complications. Articles on NSAIDs are indexed under the MeSH term “anti-inflammatory agents, non-steroidal”; this term is used for the family of drugs and for individual drugs. The National Library of Medicine recognizes 38 NSAIDs, from aminopyrine to tolmetin. We want to search on any NSAID, so we ask MEDLINE to “explode” the phrase. We then specify that these drugs must be the main topic of the article (this is done by “starring” or “majoring,” depending on the search system you use). We only want articles that look at side effects (adverse effects) of the NSAIDs, and thus we stipulate this criterion. We then use the “AND” command to cross this search for articles on NSAIDs with the search for systematic reviews. This combined search strategy yields seven citations, published in English from 1992 to the present; two seem to be exactly on the topic of interest [3, 4]. The other five address mucosal protective agents, economics, effects of NSAIDs on blood pressure, methodologic issues, and a case report that includes a review of the literature.
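The Boolean logic of the combined strategy can be sketched with set operations. The Python fragment below is only an illustration under stated assumptions: the citation identifiers and the contents of each set are invented, and a real MEDLINE system performs these steps internally rather than exposing them in this form.

```python
# Hypothetical sets of citation identifiers returned by each search step.
# The numbers are invented for illustration; they are not real MEDLINE records.
steps = {
    "meta-analysis (pt)": {101, 102, 103},
    "meta-anal: (textword)": {102, 104},
    "review (pt) AND medline (textword)": {103, 105, 106},
}

# "1 OR 2 OR 3": a citation qualifies if any step retrieved it (set union).
systematic_reviews = set().union(*steps.values())

# Content terms: the exploded, majored NSAID heading limited to adverse effects
# (again an invented result set).
nsaid_adverse_effects = {104, 106, 201, 202}

# "AND": keep only citations that appear in both sets (set intersection).
combined = systematic_reviews & nsaid_adverse_effects

print(sorted(combined))
```

In this toy example, "OR" widens the pool of candidate systematic reviews, and "AND" narrows it to the citations that also match the clinical topic.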

After retrieving the two potentially relevant articles, we find that the paper by Carson and Willett [4] examines the toxic effects of NSAIDs as a group, whereas the paper by Henry and colleagues [3] addresses our clinical question of which NSAID is associated with the fewest gastrointestinal side effects.

The European “MEDLINE” is EMBASE, the electronic version of Excerpta Medica. This database has a strong European content and little overlap with MEDLINE in terms of the journals covered. New publications are included more quickly in EMBASE than in MEDLINE. The EMBASE database places special emphasis on physical and occupational therapy, biology, drug research, psychiatry, health policy, and alternative medicine. The database is produced in the Netherlands by Elsevier, a commercial company. User costs are higher than those for MEDLINE, and few clinicians outside Europe have ready access to it. Librarians, however, can often provide EMBASE searches.

The EMBASE search for our scenario (done using a strategy and content terms similar to those used in the comprehensive MEDLINE search) retrieved 30 citations and cost $60. Several citations were unique and interesting, but none appeared to address our question any better than those that we had already identified through MEDLINE.

Journals

Most major medical journals publish systematic reviews. Using the comprehensive MEDLINE search strategy described earlier, we identified 117 citations in Annals of Internal Medicine from 1992 to June 1996 that may represent systematic reviews. In the same period, JAMA published 106 reviews, BMJ published 97, Archives of Internal Medicine published 40, and The New England Journal of Medicine published 21. Because so few systematic reviews are published in each issue, reading journals is not necessarily a high-yield source of systematic reviews for clinical problem solving. However, finding systematic reviews while browsing through journals can obviously help keep clinicians up to date.

Best Evidence

A new resource called Best Evidence, produced by the American College of Physicians, can be used to efficiently identify systematic reviews on clinical topics of interest to internists. Best Evidence is the electronic version of both ACP Journal Club and Evidence-Based Medicine. These publications contain structured abstracts of and expert commentary on high-quality, clinically important studies from more than 75 medical journals [5]. Each article must meet certain minimum methodologic quality standards. For example, studies on therapy must have used random allocation to the comparison groups, have had at least 80% follow-up, and have measured a clinically important outcome. This means that articles on therapy abstracted in Best Evidence are likely to be valid and relevant to patient care [6, 7]. To be included in Best Evidence, review articles must address a specific clinical question and describe how potentially relevant primary studies were identified and either included or excluded. All review articles in Best Evidence (approximately 10% to 20% of the current total of more than 1000 articles) are systematic reviews rather than narrative reviews. Most of them contain the term “meta-analysis” or “review” in their short title.

Returning to our initial scenario, we search Best Evidence using the terms “NSAID” and “gastrointestinal” and retrieve nine citations. Two are systematic reviews that look potentially useful; one of the two is the review by Henry and colleagues [3]. Best Evidence is easy to use, but it may not include a systematic review if it was recently published or was not published in the journals that are scanned for ACP Journal Club and Evidence-Based Medicine.

Cochrane Library

A quick way to identify systematic reviews for therapeutic issues is to use the Cochrane Library, produced by the Cochrane Collaboration [8, 9]. The diskette and CD-ROM versions of the Cochrane Library are updated quarterly, and an Internet version is currently being developed. The Library has four sections: the Cochrane Database of Systematic Reviews, the Database of Abstracts of Reviews of Effectiveness (DARE), the Cochrane Controlled Trials Register, and the Cochrane Review Methodology Database.

The Cochrane Database of Systematic Reviews and DARE are the sections of the Library that are most useful to clinicians interested in identifying systematic reviews. Version 3 of the Cochrane Database of Systematic Reviews (updated in November 1996) contains 141 systematic reviews that were done under the auspices of the Collaboration. These reviews cover many areas of health care (including consumer concerns) and are often more thorough reports of systematic reviews that have been published elsewhere in limited form. The authors of the reviews are committed to updating them as new information becomes available. The reviews also include listings of excluded trials and the reasons for exclusion, information that most traditional systematic reviews do not report. Produced by the National Health Service Centre for Reviews and Dissemination (located at the University of York, United Kingdom), DARE contains citations to 1422 non-Cochrane systematic reviews along with structured abstracts of many of the reviews.

To address the question raised by the patient in our scenario, we search the November 1996 Cochrane Library by using the term “nonsteroidal.” This identifies one protocol in the Cochrane Database of Systematic Reviews and six systematic reviews in DARE. One systematic review seems potentially relevant to our question but is different from the two identified by the MEDLINE search.

The Cochrane Library is a quick and valuable resource for locating systematic reviews, but it has some limitations. The first is its modest size; however, the number of reviews is increasing as more systematic reviews are published. The second limitation is that searching can be difficult, especially when complex search strategies are used. This area, however, will be improved in future releases. The third limitation is that few clinicians have access to the Cochrane Library. Increasing subscriptions and the Internet version (http://www.medlib.com) will help to rectify this situation.

Assessing the Quality of a Systematic Review

The article by Henry and colleagues [3] may answer our question about which NSAID is associated with the fewest serious gastrointestinal complications. However, the strength of inference we can draw from the review depends on the review methods used. Assessment of the validity of a review article requires evaluation of each step in the review process before consideration of the results and how they might apply to our patient.

Oxman and colleagues [10] have proposed one set of simple criteria for evaluating systematic reviews that builds on criteria published in a validated index for the assessment of the quality of review articles [11, 12]. This index includes questions on the reporting of the adequacy of search methods, comprehensiveness of the search, inclusion criteria, assessment of selection bias, documentation and appropriateness of the validity criteria used, reporting of methods used to combine study results, appropriateness of pooling of studies, the extent to which the report's conclusions were supported by the data, and a global assessment of scientific quality. In the following analysis, we consider eight major determinants of the quality of the review and examine how they apply to the systematic review by Henry and colleagues [3].

1. Did the review article address a focused question? Henry and colleagues did not examine all complications associated with NSAID use in any setting. They did, however, define their research question: to evaluate different NSAIDs and focus on the association between these drugs and peptic ulcer complications that required hospitalization.

2. Is it likely that important, relevant studies were missed? Our confidence in the results of a review is greater when we are certain that no relevant and high-quality studies, either published or unpublished, were missed. A comprehensive search for unpublished work may be important in some situations (for example, evaluation of new technologies, an area in which much of the data may not be published) if the data are amenable to the same careful assessment of quality as the published work. Resource constraints may also limit search strategies. Assessing the comprehensiveness of the search obviously requires that the authors of reviews explicitly report their methods.

Henry and colleagues used a CD-ROM system to search MEDLINE for articles published between 1985 and 1994, but they did not describe their exact search strategy. They also examined the bibliographies of two published reviews and contacted authors of relevant studies, asking them to identify additional research. Although Henry and colleagues could have searched additional databases such as EMBASE or hand-searched selected journals, their approach was reasonable.

3. Were the inclusion criteria used to select articles appropriate? These criteria may vary according to the population studied, interventions or exposures, outcomes, and methods of each study. Henry and colleagues clearly stated that cohort and case–control studies were selected if the patients had been living in the community, had been taking NSAIDs, and had been hospitalized for gastrointestinal hemorrhage or perforation. They stated which studies were included and why; they also presented their rationale and made their list of excluded trials available upon request.

4. Was the validity of the included studies assessed? Although the conclusions we derive from a systematic review depend in large part on the rigor of the review methods, they obviously also depend on the quality of the included studies. The appropriate criteria for this assessment of quality depend on the type of studies included in the review [10]. For example, if the systematic review deals with treatment, it is important to ascertain whether the trials were randomized; whether the randomization process was concealed from patients or investigators; whether patients, caregivers, or persons assessing outcome were blinded to the treatment allocation; and the extent to which follow-up was complete. For systematic reviews that address questions of harm, the most important considerations include documentation of the similarity of the comparison groups and the methods used to establish that patients had the exposure and outcome of interest [13]. Duration of follow-up is also important if a cohort design was used.

In their article on the relative risk for gastrointestinal complications with different NSAIDs, Henry and colleagues state that they evaluated each of the factors mentioned in the preceding paragraph but did not report them in the article. The summary tables, however, are available on request from the authors.

5. Were the assessments of studies reproducible? Even when explicit criteria are used to include studies in a review and evaluate their methodologic quality, the judgment of the review's authors is still required. If the authors did each of the review steps independently and in duplicate and then reported their level of agreement, we can assess how open to judgment each of these steps was. Agreement beyond that expected by chance is often reported using the κ statistic [14]. A value of 0 indicates agreement no better than chance, and a value of 1 indicates perfect agreement; negative values, which are uncommon, indicate agreement worse than chance. The closer the value is to 1, the greater the level of agreement. Henry and colleagues reported that they extracted data in duplicate and resolved differences by consensus. They did not assess the eligibility or quality of the articles in duplicate.
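To make the idea of chance-corrected agreement concrete, the following minimal Python sketch computes Cohen's κ for two reviewers who judged the same articles. The function name and the eligibility decisions are invented for illustration; they are not data from Henry and colleagues.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who judged the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of items given the same rating.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions,
    # summed over the rating categories.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)

# Invented eligibility decisions by two hypothetical reviewers for 10 articles.
reviewer_1 = ["include"] * 4 + ["exclude"] * 6
reviewer_2 = ["include"] * 3 + ["exclude"] * 7
kappa = cohen_kappa(reviewer_1, reviewer_2)
print(round(kappa, 2))
```

Here the reviewers agree on 9 of 10 articles (observed agreement 0.90), but 0.54 agreement would be expected by chance alone, so κ is (0.90 − 0.54)/(1 − 0.54), or about 0.78.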

6. Were the results similar from study to study? Synthesizing the results of studies (whether qualitative or quantitative) requires assessing the similarity of the studies to each other. This means that the patients, exposures or interventions, outcomes, and other features of study design must be considered. Pooling the results of several studies is not appropriate if the studies differ in a clinically important fashion with regard to any of these design elements. If, on the other hand, all the studies appear similar after this initial assessment, it is then important to evaluate whether the results of the studies were similar. If studies have different findings, pooling results may lead to meaningless or even misleading results. Such variability in results often suggests that the trials may have differed in some important way, more than initially seemed to be the case; the sources of the differences then become the appropriate focus of interest.

How can we determine whether the results of trials included in a meta-analysis are similar? The size of the treatment effect (and its CI) from each trial can be graphed. If the magnitude or direction of the effect sizes differs greatly among studies and if the CIs do not substantially overlap, one could question whether it is appropriate to pool the results.

Another common approach is to use a statistical test to ascertain whether the study results differ more than would be expected by chance. If the studies measure approximately the same effect and any differences occur because of chance (that is, if the results are consistent with a common effect size), the test for homogeneity (sometimes, unfortunately, called the test of heterogeneity) is not significant (usually reported as P > 0.05). A significant test result means that the difference in results among the individual studies is not likely to have been caused by chance. This calls into question whether it is appropriate to pool the results; it may also suggest that a priori subgroup analyses may be appropriate. However, when the results of large trials are pooled, the test for homogeneity may indicate that statistically significant (but perhaps clinically unimportant) differences exist in the results. In this situation, it may still be reasonable to pool the results statistically.
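One widely used statistic for such a test is Cochran's Q, which sums the weighted squared deviations of each study's result from the pooled result. The sketch below is an illustration only: the text does not say which test Henry and colleagues used, the study results are invented, and the standard errors are assumed to be recoverable from 95% CIs on the log scale.

```python
import math

def cochran_q(estimates):
    """Return (pooled_rr, q, df) for (rr, ci_lower, ci_upper) tuples with 95% CIs."""
    if len(estimates) < 2:
        raise ValueError("need at least two studies")
    log_rrs, weights = [], []
    for rr, lower, upper in estimates:
        # Standard error on the log scale, recovered from the width of the 95% CI.
        se = (math.log(upper) - math.log(lower)) / (2 * 1.96)
        log_rrs.append(math.log(rr))
        weights.append(1.0 / se ** 2)  # inverse-variance weight
    total = sum(weights)
    # Fixed-effect pooled estimate: weighted mean of the log relative risks.
    pooled_log = sum(w * y for w, y in zip(weights, log_rrs)) / total
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled_log) ** 2 for w, y in zip(weights, log_rrs))
    return math.exp(pooled_log), q, len(estimates) - 1

# Invented study results (relative risk, lower and upper 95% limits).
studies = [(2.0, 1.2, 3.3), (2.4, 1.5, 3.8), (1.8, 0.9, 3.6)]
pooled_rr, q, df = cochran_q(studies)
print(round(pooled_rr, 2), round(q, 2), df)
```

If the studies share a common effect, Q approximately follows a chi-square distribution with k − 1 degrees of freedom, where k is the number of studies. With three studies (2 degrees of freedom), the 5% critical value is about 5.99; the Q for these invented data falls well below that value, so the test would not be significant.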

Henry and colleagues established that the results of their included studies were consistent. They calculated the risk for gastrointestinal complications associated with each NSAID relative to the risk associated with ibuprofen and then tested whether the relative risk for each drug was consistent across the studies. Table 1, originally published in the systematic review by Henry and colleagues, shows the relative risk, CIs, and P values for each of these tests for consistency (homogeneity). Each P value is greater than 0.05.

Table 1. Risk for Gastrointestinal Complications in Patients Receiving Nonsteroidal Anti-Inflammatory Drugs*

7. What are the overall results and how precise are they? We have considered the key methodologic questions to be asked when appraising a review article and believe that the methods used by Henry and colleagues are satisfactory. Because a future article in the systematic review series will focus on measures of effect, we only briefly address this issue here.

Henry and colleagues identified 12 studies that were relevant and met their inclusion criteria. They then abstracted the data in duplicate, calculated the relative risks associated with each NSAID, and pooled the relative risk estimates. They found that each NSAID was associated with a higher risk for gastrointestinal complications than was ibuprofen and ranked the drugs in order of increasing size of risk (ranging from 1.6 for fenoprofen to 9.2 for azapropazone). The authors also calculated CIs around the pooled estimates. All NSAIDs except fenoprofen were associated with an increased risk for serious gastrointestinal hemorrhage compared with ibuprofen.

8. Will the results help in caring for patients? Determining this involves asking several questions: Can I apply the results to my patients? Did the studies consider all the clinically important outcomes? Are the benefits worth any associated risks or costs?

It is important to consider the patients in the individual studies and to ascertain whether your patient is similar with regard to age, comorbid conditions, or other risk factors (such as smoking and family history). Does he or she have a comparable baseline risk for the outcome of interest, or is the risk higher or lower in a clinically meaningful way? A systematic review that finds that a new treatment delays death but that does not address any of the potential adverse events associated with use of the treatment may prompt us to seek additional information from other sources or to refer back to some of the more detailed original articles. We would want to discuss these issues with our patient (or we may choose not to offer the intervention in the first place).

We decide that the review by Henry and colleagues is rigorous, the results are convincing, and the patient in our clinical scenario is similar to the study patients (all of whom were living in a community setting before hospitalization). Because this patient with severe osteoarthritis insists on trying a medication other than acetaminophen, you prescribe ibuprofen on the basis of the systematic review and then follow him to assess his response.

Summary

Several methods can be used to identify systematic reviews. These include bibliographic databases such as MEDLINE, as well as Best Evidence and the Cochrane Library. In the future, the Cochrane Library could become the source of choice for systematic reviews because it provides the full text of Cochrane reviews and citations to many other systematic reviews. Moreover, the Library is growing rapidly and becoming more readily available, and its searching capabilities are being improved with each update. Although Best Evidence contains fewer systematic reviews than the Cochrane Library, it is specifically designed for practicing internists and primary care physicians and includes systematic reviews on diagnosis, cause, prognosis, and quality improvement. At present, however, MEDLINE and other bibliographic databases are probably the most up-to-date and readily available sources of systematic reviews.

Systematic reviews are a powerful and useful way to assemble evidence; however, just because a review has been done using systematic review methods does not guarantee that its results are credible. Regardless of the source, all systematic reviews (like all types of research evidence) require critical appraisal to determine their validity and to establish whether and how they will be useful in practice (Table 2).

Table 2. Key Points To Remember

Appendix: Terminology and Strategies for MEDLINE Searches

MEDLINE searching uses many concepts and terms. To facilitate optimal use of MEDLINE, several of the most important are described below. The examples are drawn from the search for systematic reviews on NSAIDs and gastrointestinal complications.

Indexing

All citations in MEDLINE are indexed for content and methods using the MeSH vocabulary. This vocabulary consists of 14 000 specific terms and 18 000 synonyms. Two aspects of MEDLINE indexing are particularly worth keeping in mind. First, terminology is not always intuitive, so the vocabulary should be checked in a printed or electronic compendium of MeSH terms. Second, all articles are indexed according to the most specific MeSH heading or headings available. In other words, if you wanted to find an article specifically about ibuprofen, you would not find it by simply looking under the parent MeSH heading “anti-inflammatory agents, non-steroidal.”

Major Subject Headings (Starring or Majoring)

Many articles deal predominantly with one or two topics and briefly mention several other subjects (usually 5 to 15). When articles are indexed, they are assigned MeSH terms for each topic referred to in the paper. To make searching more powerful and selective, MeSH terms that indicate the major focus or emphasis of the paper are specially coded. Some MEDLINE searching systems do this by placing an asterisk (*) before the MeSH heading, hence the term “starring.” Other systems simply refer to these as major aspects of the article (“majoring”).

Exploding

“Exploding” refers to a MEDLINE search technique that enables users to circumvent the fact that all articles are indexed using the most specific MeSH heading available. It also allows the user to gather similar MeSH terms together. Using the term “exploding” instructs MEDLINE to identify all articles that have been indexed using a broad “family” MeSH term itself (for example, gastrointestinal diseases), as well as all articles indexed by more specific MeSH terms that are listed in the MeSH hierarchy under the broad term. To use the NSAID example again, a MEDLINE search that uses the MeSH term “anti-inflammatory agents, non-steroidal” would identify only articles that deal with NSAIDs in general. If, on the other hand, “explode” was used along with “anti-inflammatory agents, non-steroidal,” all articles on any of the 38 specific NSAIDs and those on NSAIDs in general would be identified.
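The effect of exploding can be sketched in a few lines of Python. Everything in this fragment is a simplifying assumption made for illustration: the hierarchy lists only three of the specific NSAIDs, the citation identifiers are invented, and the function names are hypothetical rather than part of any MEDLINE system.

```python
# A tiny, hypothetical fragment of a MeSH-like hierarchy (parent -> children).
# Only three specific NSAIDs are listed; the real tree is much larger.
MESH_TREE = {
    "anti-inflammatory agents, non-steroidal": ["aminopyrine", "ibuprofen", "tolmetin"],
}

# Hypothetical index: MeSH heading -> identifiers of citations indexed under it.
INDEX = {
    "anti-inflammatory agents, non-steroidal": {1, 2},
    "aminopyrine": {3},
    "ibuprofen": {4, 5},
    "tolmetin": {6},
}

def explode(heading):
    """Return the heading plus every heading below it in the hierarchy."""
    headings = [heading]
    for child in MESH_TREE.get(heading, []):
        headings.extend(explode(child))
    return headings

def search(heading, exploded=False):
    """Return citations indexed under the heading, optionally exploded."""
    headings = explode(heading) if exploded else [heading]
    return set().union(*(INDEX.get(h, set()) for h in headings))

general_only = search("anti-inflammatory agents, non-steroidal")
whole_family = search("anti-inflammatory agents, non-steroidal", exploded=True)
print(sorted(general_only), sorted(whole_family))
```

Without exploding, only the two citations indexed under the family heading are found; with exploding, the citations indexed under each specific drug are retrieved as well.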

Textwords

If you are searching for an article on a subject that has not been well indexed using MeSH terms, it is often helpful to have MEDLINE search the text of the titles and abstracts in the database for certain “free text” words or phrases. Our search strategy for systematic reviews illustrates this point. Systematic reviews are not indexed as a distinct category in MEDLINE. Thus, to identify them, our strategy largely relies on textword searching using the various free terms for systematic reviews that authors have used in their titles and abstracts. The search strategy for systematic reviews also illustrates another feature of textword searching. If you are unsure of the final letters that an author may have used at the end of a word, you can insert a symbol such as “:” (the symbol varies from system to system). For example, the instruction “random:” tells MEDLINE to search for the words “random,” “randomized,” “randomization,” “randomised,” “randomisation,” and “randomly.”
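Truncation behaves much like a wildcard in a regular expression. The short Python sketch below mimics the “:” symbol on a single invented sentence; the helper name is hypothetical, and the whole-word matching rule is an assumption about how a search system might treat truncated terms, not a description of any particular MEDLINE interface.

```python
import re

def truncation_to_regex(term):
    """Translate a term such as "random:" into a whole-word regular expression."""
    stem = term.rstrip(":")
    # A trailing ":" means "any word ending"; without it, match the exact word.
    suffix = r"\w*" if term.endswith(":") else ""
    return re.compile(r"\b" + re.escape(stem) + suffix + r"\b", re.IGNORECASE)

pattern = truncation_to_regex("random:")
text = "Patients were randomly assigned in a randomised, double-blind trial."
matches = pattern.findall(text)
print(matches)
```

The same helper applied to “meta-anal:” would match “meta-analysis” and “meta-analyses” but not the unhyphenated “metaanalysis,” which is why the search strategy needs a separate step for each spelling.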

References

  1. 1.
  2. 2.
  3. 3.
  4. 4.
  5. 5.
  6. 6.
  7. 7.
  8. 8.
  9. 9.
  10. 10.
  11. 11.
  12. 12.
  13. 13.
  14. 14.