Annals
Established in 1927 by the American College of Physicians

CHALLENGES OF SUMMARIZING BETTER INFORMATION FOR BETTER HEALTH: THE EVIDENCE-BASED PRACTICE CENTER EXPERIENCE

Mark Helfand, MD, MPH; Sally Morton, PhD; Eliseo Guallar, MD, PhD; and Cynthia Mulrow, MD, MSc, Editors

Challenges in Systematic Reviews of Therapeutic Devices and Procedures

Lisa Hartling, MSc; Finlay A. McAlister, MD, MSc; Brian H. Rowe, MD, MSc; Justin Ezekowitz, MB, BCh, MSc; Carol Friesen, MA, MLIS; and Terry P. Klassen, MD, MSc

21 June 2005 | Volume 142 Issue 12 Part 2 | Pages 1100-1111

The authors discuss 3 challenges in conducting and interpreting any systematic review that are particularly relevant for systematic reviews of therapeutic devices or surgical procedures: 1) inclusion or exclusion of grey literature, 2) the role of nonrandomized studies, and 3) issues in applying the results to clinical care that are unique to the surgical and therapeutic device literature. The authors also discuss empirical evidence related to these topics and illustrate how reviewers in the Agency for Healthcare Research and Quality's Evidence-based Practice Center program have dealt with these challenges in developing evidence reports for decision makers and clinicians about therapeutic devices or surgical procedures.


Therapeutic devices and surgical procedures are often evaluated in nonrandomized studies or small single-center trials. While some may question whether systematic reviews in this area should be performed given the limitations of such studies, we believe that these reviews should play a key role in helping to inform decisions about the implementation of new technologies or procedures. By summarizing the available evidence, systematic reviews can highlight gaps in the evidence base on which clinicians and policymakers rely to make informed decisions.

We highlight 3 challenges that may arise in conducting and interpreting any systematic review that evaluates the efficacy or effectiveness of therapeutic devices or surgery. We also review the empirical evidence relevant to these methodologic issues in general and present the evidence specific to devices or procedures where it is available. We do not focus on the importance of assessing study quality because another article in this supplement addresses this issue (1).

In this article, we use the U.S. Food and Drug Administration (FDA) definition of a therapeutic device (2) as "an instrument, apparatus, implement, machine, contrivance, implant, in vitro reagent, or other similar or related article, including a component part, or accessory which is . . . intended to affect the structure or any function of the body . . . and which does not achieve any of its primary intended purposes through chemical action within or on the body . . . and which is not dependent upon being metabolized for the achievement of any of its primary intended purposes."

Therapeutic devices and surgical procedures obviously vary greatly in complexity and costs; however, the issues we discuss here are universal irrespective of the type of device or procedure. They include the following: 1) Should grey literature be included in systematic reviews of devices or procedures? 2) Should nonrandomized studies be included in systematic reviews of devices or procedures? and 3) What applicability issues are unique to studies of devices or procedures?

In discussing these issues, we use examples to illustrate how reviewers in the Evidence-based Practice Center (EPC) program and other authors dealt with them.


Challenge: Including Grey Literature

What Is Grey Literature?

Grey literature generally refers to reports that are difficult to locate or retrieve by using the electronic databases commonly employed to identify studies for inclusion in systematic reviews (for example, MEDLINE, EMBASE, and CINAHL). A common misconception is that the grey literature is a homogeneous collection of works. Rather, it includes many different types of documents that can vary substantially in design, quality, and extent of peer review (including internal company reports, documents submitted to the FDA, theses and other dissertations, conference abstracts, book chapters, personal correspondence, and even personal Web pages or blogs) (3, 4). Table 1 highlights several online resources for grey literature databases, such as SIGLE (System for Information on Grey Literature in Europe) and Web sites from North America and Europe. While abstracts are the most common type of grey literature included in systematic reviews (5-7), the type most relevant to systematic reviews of devices is reports from manufacturers or the FDA. The FDA enforces regulations to ensure the effectiveness and safety of a medical device before granting marketing clearance (2). The FDA reviews of the safety and efficacy data collected through this process for approved devices are publicly available (except for data considered proprietary or confidential) and generally contain far more detail than is typically presented in journal publications.


Table 1. Online Resources for Grey Literature Databases and Web Sites (North America and Europe)

 

A second common misconception is that grey literature is static. Given the substantial time lags between completion of studies and their publication in medical journals (4.2 to 4.8 years for studies with significant results and 6.4 to 8.0 years for those with nonsignificant results), it is not surprising that some studies initially identified as "grey literature" have been published by the time a systematic review is published or read (8, 9).

Should Grey Literature Be Sought in All Systematic Reviews?

While the goal of a systematic review should be to compile the evidence in an unbiased manner, opinions on the appropriateness of including grey literature in a review differ. For example, while 78% of meta-analysts stated that unpublished data should definitely or probably be included in systematic reviews, only 47% of journal editors agreed; 30% of editors reported that they would not publish a review that included unpublished data (10). Of the first 988 systematic reviews published in the Cochrane Library, 56% included grey literature; in most cases, however, the unpublished information merely provided data that supplemented published studies (11). Among the 27 evidence reports on devices and surgery produced through the EPC program in 2004, only 9 included any grey literature (Appendix Table).


Appendix Table. How Methodologic Challenges Were Addressed in Evidence-based Practice Center Evidence Reports of Devices or Surgery

 
Proponents of searching for and including grey literature in all systematic reviews argue that excluding grey literature may result in biased estimates of the effectiveness of an intervention (7, 12, 13). While the existence of publication bias (that is, significant results are more likely to be published, and more likely to be published in English, than nonsignificant findings) has been well documented (5, 13), the question should really be framed in terms of whether including grey literature removes the potential for bias related to sample size (which includes publication bias as well as bias arising from systematic differences in methodologic quality). Indeed, it could be argued that including grey literature in a systematic review may introduce bias if the search for grey literature is not systematic, if only some of the grey literature is uncovered, or if only low-quality trials are uncovered (5, 10). Thus, it is not surprising that even systematic reviews that include unpublished trials can still demonstrate substantially asymmetric funnel plots (a graphical indication that sample size–related bias may exist) (5).
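The funnel plot asymmetry described above can be quantified in several ways. The following is a minimal sketch of one of them, a regression of the standardized effect on precision in the style of Egger's test. The choice of method, the function name `egger_intercept`, and the data are assumptions made for illustration and are not taken from the studies cited. An intercept far from zero suggests that smaller studies report systematically different effects than larger ones.

```python
def egger_intercept(effects, std_errors):
    """Intercept of an ordinary least-squares fit of (effect / SE) on (1 / SE).

    effects: per-study effect estimates (for example, log odds ratios).
    std_errors: per-study standard errors of those estimates.
    """
    y = [e / s for e, s in zip(effects, std_errors)]   # standardized effects
    x = [1.0 / s for s in std_errors]                  # precisions
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x


# Hypothetical data: four studies, ordered from largest to smallest.
std_errors = [0.1, 0.2, 0.3, 0.4]

# Same effect in every study: no relation between study size and effect.
symmetric_effects = [-0.2, -0.2, -0.2, -0.2]

# Smaller studies show larger effects: the pattern expected under
# sample size-related bias.
asymmetric_effects = [-0.2, -0.3, -0.5, -0.8]

symmetric_intercept = egger_intercept(symmetric_effects, std_errors)
asymmetric_intercept = egger_intercept(asymmetric_effects, std_errors)
print(symmetric_intercept, asymmetric_intercept)
```

A formal test would also estimate the standard error of the intercept; this sketch reports only the point estimate.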

If we accept that including the grey literature does not remove the potential for sample size–related bias in a systematic review, a second (and perhaps more relevant) question is whether inclusion of grey literature substantially affects the results of systematic reviews. For example, of 159 systematic reviews reporting comprehensive searches for grey literature, only 38% found any unpublished trials and only 9% of the 1635 trials eventually included in these reviews were unpublished (indeed, despite the comprehensive search strategies, only 10% "were published in a journal not indexed in MEDLINE") (5). Although the amount and importance of grey literature will vary by topic area (Table 2), 3 empirical studies comparing systematic reviews across a wide variety of topic areas that did include grey literature versus those that did not found little difference between the effect estimates derived from published trials and those derived from published and unpublished trials (5, 14, 15). Furthermore, in stratified analyses of 159 systematic review comparisons, nondrug interventions showed less difference between published and unpublished trials than drug interventions (5). Of course, these are only 3 analyses. Given the heterogeneity between the systematic reviews within each of these studies, it is appropriate to acknowledge that in some areas of health care, particularly those in which there is little published evidence or the intervention is new or changing (as is frequently the case for devices or surgical procedures), discrepancies between published and "grey" trials may be sufficient to justify devoting resources to systematically searching for grey literature.


Table 2. Publication Status of Trials by Medical Specialty

 

In addition to weighing the often substantial opportunity costs of devoting time and resources to searching for grey literature, 3 other concerns drive us not to recommend routinely including unpublished data in systematic reviews. First, results presented in abstract form may be inaccurate or unhelpful. For example, only 33% of abstracts presented at a pediatric surgical meeting contained the same data as the subsequent publication, and the conclusions were similar in only 70% of the abstract–manuscript pairs (the conclusions were consistently weaker in the full manuscript than in the abstract) (16). Moreover, there may be a "window of opportunity" during which abstracts are potentially useful: Early abstracts may assist in identifying randomized, controlled trials (RCTs) but may have only preliminary, incomplete, or inaccurate data, while late abstracts may not provide any data in addition to those already published (17).

Second, it is sometimes difficult to judge the quality of reports in the grey literature, an essential part of any systematic review since low-quality trials are associated with overestimates of treatment effects (18). Indeed, assessment of trial quality is especially pertinent for surgical trials, which are often small and difficult to blind (19, 20). For example, only 2% of abstracts of randomized trials presented at the American Society of Clinical Oncology conferences reported the method of allocation concealment, and only 14% reported the method of blinding (21). Indeed, reflecting our skepticism about data presented in abstract form, only 3 of the 27 EPC reports on devices and surgical procedures accepted abstracts for inclusion. While some may argue that this problem is unique to abstracts, it has been shown that even FDA reports are less likely to appropriately describe methods of randomization, blinding, and allocation concealment than published journal articles (15).

Third, we believe that readers should be skeptical about unpublished data if the data are provided directly by the manufacturer of the device without the opportunity for peer review (22). It has already been well documented that industry-funded research is less often published or presented (23), takes longer to be published when it is accepted for publication (23), and is almost 4 times more likely to report outcomes favoring the sponsor than nonindustry funded studies (24).

In sum, while in an ideal world reviewers would attempt to identify all relevant unpublished literature (and would be successful in doing so), time and resource constraints often compromise the identification and inclusion of grey literature. The search for unpublished trials should become easier as prospective trial registries (for example, Current Controlled Trials [http://www.controlled-trials.com]) become fully functional; however, until these registries attain their goal of capturing 100% of trials, reviewers must continue to carefully consider whether grey literature is likely to be common and influential for their topic of interest. A particularly useful source of information related to devices is available through unpublished FDA reports. We believe that grey literature, including FDA data, should be sought when little evidence for a topic has been published (15) and when the intervention is new or changing, but that exhaustive searches of the grey literature are less necessary when large trials have already been published (15, 25). Regardless of the approach taken, researchers undertaking a systematic review should explicitly state whether they sought or included grey literature (26), and they should conduct sensitivity analyses to assess the impact of grey literature on treatment effect when they include unpublished studies (10). If reviewers choose to search for grey literature, they should use a systematic approach that targets known sources of grey literature rather than relying on select references to which the reviewer has been alerted in an ad hoc manner. Reviewers should also evaluate the findings with respect to the quality of reports, be they published or unpublished (10), and to the sponsoring agency (24). Regardless of whether a review contains grey literature, reviewers should evaluate and discuss the possibility of sample size–related bias and present the results and recommendations in light of the potential for bias (26).
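The sensitivity analysis recommended above can be sketched as follows: pool the effect estimates once with all studies and once with published studies only, then compare the two results. This is a minimal sketch that assumes a fixed-effect, inverse-variance model on the log odds ratio scale. The model choice, the function names, and the study data are hypothetical and serve only to illustrate the comparison.

```python
import math


def pooled_log_or(studies):
    """Fixed-effect inverse-variance pooled log odds ratio and its standard error."""
    weights = [1.0 / s["se"] ** 2 for s in studies]
    total = sum(weights)
    estimate = sum(w * s["log_or"] for w, s in zip(weights, studies)) / total
    return estimate, math.sqrt(1.0 / total)


def grey_literature_sensitivity(studies):
    """Compare the pooled odds ratio with and without unpublished studies."""
    published = [s for s in studies if s["published"]]
    all_estimate, _ = pooled_log_or(studies)
    published_estimate, _ = pooled_log_or(published)
    return {
        "or_all": math.exp(all_estimate),
        "or_published_only": math.exp(published_estimate),
        # Values below 1 mean the published-only analysis shows a larger benefit.
        "ratio_of_ors": math.exp(published_estimate - all_estimate),
    }


# Hypothetical studies: two published trials and one unpublished report.
studies = [
    {"log_or": math.log(0.70), "se": 0.20, "published": True},
    {"log_or": math.log(0.80), "se": 0.25, "published": True},
    {"log_or": math.log(1.05), "se": 0.30, "published": False},
]

result = grey_literature_sensitivity(studies)
print(result)
```

In this made-up example the published-only estimate is further from the null than the estimate that includes the unpublished report. A reviewer would report both estimates alongside each other rather than only one of them.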


Challenge: Including Nonrandomized Studies

How Common Are Nonrandomized Studies?

Nonrandomized studies include experimental studies (such as quasi-randomized trials) and observational studies with controls (such as controlled before–after studies, concurrent cohort studies, and case–control studies) or without concurrent controls (such as before–after studies, cross-sectional studies, and case series) (27). The fact that most published articles (68% to 87% of feature articles and brief communications in Annals of Internal Medicine, BMJ, and The New England Journal of Medicine) are nonrandomized studies underscores the value placed on this type of research by clinicians (28). Historically, there has been little randomized-trial evidence in the areas of devices and procedures, and it is estimated that RCTs account for less than 10% of the evidence base for surgical interventions (19). In fact, of 7295 trials indexed as RCTs in MEDLINE from 1990 to 1996, only 1% had device or devices as key words (29); of 9373 references in MEDLINE for pediatric surgery, only 0.3% were RCTs (30).

Why Not Include Nonrandomized Studies?

The inclusion of study designs other than RCTs in systematic reviews of therapy has been discouraged for decades given concerns about the various and well-described biases inherent in studies with nonrandomized designs (27, 31, 32). The extent of bias associated with different nonrandomized study designs, however, can vary tremendously and is at times unpredictable in direction and magnitude (33). A comprehensive review compared results obtained from randomized and nonrandomized studies for 82 clinical topics (27). Among the 8 meta-epidemiologic studies reviewed, 2 studies found close agreement in their estimates of treatment effect (34, 35), while the other 6 found that randomized and nonrandomized studies produced different results; for 5 of these studies, however, the differences were not consistent in direction (33, 36-40). The results are likewise inconsistent when restricted to surgical interventions; the results of randomized and nonrandomized studies do not consistently differ in their estimates, and the design that produces the more extreme result varies. Little evidence is available to compare results from randomized and nonrandomized studies for devices; therefore, no specific conclusions can be drawn. On the basis of the existing evidence, Deeks and colleagues (27) could not make firm conclusions regarding the value of randomization because of the conflicting results and limitations in the studies reviewed. In their analyses, Deeks and colleagues (27) used a resampling procedure to assess the effect of comparing patients within 2 multicenter trials (1 of which was a large trial of endarterectomy) with nonrandom concurrent or historical controls. They found that the biases associated with these 2 designs can significantly affect the results of a systematic review and that the effects are sensitive to differences in case mix. In the surgical example, historical controlled studies overestimated the benefits of endarterectomy, while the concurrent controlled studies produced results similar to those of the RCTs (Figures 1 and 2). Although multivariate models can be used to adjust for differences in covariates and case mix between comparison groups in nonrandomized studies, even advanced statistical techniques, such as instrumental variable analyses, can never completely remove concerns about confounding by indication (41).



Figure 1. Comparison of the results of 8000 randomized, controlled trials (RCTs) and 8000 historical controlled studies (HCTs) obtained from resampling within the European Carotid Surgery Trial. The distribution of results indicates systematic bias (average odds ratios in the randomized, controlled trials and historical controlled studies were 1.23 and 1.06, respectively). Adapted with permission from reference 27: Deeks JJ, Dinnes J, D'Amico R, Sowden AJ, Sakarovitch C, Song F, et al. Evaluating non-randomised intervention studies. Health Technol Assess. 2003;7:iii-x, 1-173.

 


Figure 2. Comparison of the results of 8000 randomized, controlled trials (RCTs) and 8000 concurrently controlled trials (CCTs) obtained from resampling within the European Carotid Surgery Trial. The distribution of results revealed that 9% of studies within each design had statistically significant findings. Adapted with permission from reference 27: Deeks JJ, Dinnes J, D'Amico R, Sowden AJ, Sakarovitch C, Song F, et al. Evaluating non-randomised intervention studies. Health Technol Assess. 2003;7:iii-x, 1-173.

 

An Argument for Inclusion: Evaluations of Safety

While RCTs are usually cited as the highest level of evidence for judging the efficacy of therapeutic interventions (42), "randomization should not be seen as a reliable proxy for overall quality" (43). Indeed, well-conducted nonrandomized studies may be more valid than poorly conducted RCTs (36, 44). In some situations, moreover, RCTs are unethical or impractical, and clinicians and policymakers must rely on lower levels of evidence (45). Certainly, nonrandomized studies may provide evidence that complements RCTs (particularly concerning issues of effectiveness in clinical practice versus efficacy in the trial setting) (33, 44, 46). Furthermore, for questions on patient safety of a new intervention, observational studies may in fact be a better source of evidence than RCTs (which are almost never sufficiently powered to detect rare adverse events because of inadequate sample size or duration of follow-up) (47, 48). These RCTs also tend to enroll younger and healthier patients who have conditions other than those usually encountered in clinical practice. This may lead to underestimates of adverse event rates when the intervention is applied in real-world settings. Indeed, 14% of trials published in 7 general medical journals in 1997 did not refer to adverse effects at all, 38% did not provide information on adverse effects by treatment group, and 46% provided no details on the severity of adverse effects (49). Thus, it is not surprising that only 27% of 2467 published systematic reviews evaluated safety as a secondary outcome (and only 4% included it as a primary outcome) (47). Nonrandomized studies were included for safety outcomes in all 18 EPC reports on devices and surgery that evaluated safety (Appendix Table).

Randomized, controlled trials are also often insufficient to assess safety outcomes because of inadequate or differing definitions of adverse events and severity (45, 49-54); variable or inadequate methods of monitoring or detection within trials (51, 52, 55, 56); or poor reporting of the numerators and denominators in safety data (45, 49-51, 53, 54, 57). For example, an analysis of 82 studies reported that these studies used 41 different definitions and 13 different grading scales for surgical wound infection (58). Indeed, definitions of "surgical death" often vary among studies, and deaths after discharge from the hospital are not often reported (59). Recognizing these flaws, trial reporting guidelines now contain specific recommendations for documenting information on adverse events and effects (60).

Many possible sources other than RCTs provide evidence on safety. Although large, observational studies are necessary to evaluate rare, serious adverse effects (61, 62), we believe that studies with noncomparative designs should not be included in systematic reviews because of their inability to establish causality (45). Thus, while postmarketing surveillance studies can be useful (47, 50), information from this source should be used cautiously because reporting may be nonsystematic (50), rare adverse effects may be underreported (61), and the lack of a control group may overestimate the incidence of common, less serious adverse effects that are not directly related to the specific therapy (63).

An Argument for Inclusion: Little or No RCT Evidence

Evidence from nonrandomized studies may be particularly relevant in areas lacking RCT evidence: Seven percent of systematic reviews in the Cochrane Database of Systematic Reviews (those published in The Cochrane Library, Issue 2, 2004) found no randomized trials and concluded that there was no evidence upon which to make informed decisions. Randomized evidence is especially lacking for devices (29) and surgical procedures (19, 30, 64, 65)—for example, nearly one quarter of systematic reviews of orthopedic surgery reported no RCTs in their specific topic area (66).

The lack of RCT evidence for medical devices or surgical procedures arises from several factors (29). First, the FDA does not require RCTs for new devices (67), in contrast to the general requirement of 2 RCTs for new-drug applications (29). Second, unlike researchers conducting drug trials, clinicians require specific training or skills for device or procedural trials (29). Once a surgical procedure has been developed, refined, and standardized to the point at which it would be appropriate to be evaluated in a randomized trial, collective clinical equipoise may no longer be present (68, 69). Furthermore, despite examples of the usefulness of sham surgeries in demonstrating the lack of efficacy of certain surgical procedures (such as arthroscopic knee surgery [70], internal mammary artery ligation [71, 72], and intracerebral fetal tissue grafting in older patients with Parkinson disease [73]), they are controversial because they expose "control" participants to surgical risks without any potential for benefit (74, 75). Finally, 2 facets of the device industry limit rigorous evaluation. In general, the device industry consists of small enterprises that may lack economic resources to do research of any kind (29, 67). Furthermore, the constant and rapid evolution of devices makes it difficult to determine the optimal time for evaluation (67). In the early stages of development, there may still be uncertainties over application of the technology (for example, which patients are most likely to benefit and how the device will interact with other evolving technologies) (69) and less experience and familiarity with the technical skills necessary for optimal outcomes (76). If the evaluation comes too late, the device may have already been adopted into clinical practice and consumer expectations (69).

In 21 EPC reports, reviewers included nonrandomized evidence for evaluating efficacy: 20 of these included studies without concurrent controls. Moreover, 7 EPC reports relied solely on nonrandomized evidence for evaluation of some surgical interventions: liver transplantation (77); total knee replacements (78); and procedures for managing uterine fibroids (79), chronic central neuropathic pain after spinal cord injury (80), breast abnormalities (81), incidental adrenal masses (82), and cataracts or glaucoma (83).

Other Issues in Weighing Whether To Include Nonrandomized Studies

Reviewers should consider several practical issues when deciding whether to include nonrandomized studies in systematic reviews. First, because nonrandomized studies make up most of the medical literature and the indexing of designs other than RCTs is less precise and reliable (84), the number of studies identified by an open literature search is dramatically greater than the number identified by a search restricted to RCTs. Second, controversy exists over the appropriateness of performing a meta-analysis of nonrandomized data and the concern that such analyses may produce spurious results (85-87). Third, the assessment of methodologic quality, an essential part of any systematic review, is more problematic for nonrandomized studies than for RCTs. For example, 1 report identified 194 quality assessment tools for nonrandomized studies but found that almost all were flawed (27). Indeed, only 2 of the 25 EPC evidence reports evaluating devices or surgery that included nonrandomized studies used validated tools to assess methodologic quality. In most cases, the EPC reviewers generated their own list of quality criteria. Finally, inclusion of nonrandomized studies has implications for data extraction (for example, different forms may be required for different designs) (88). Despite these challenges, the development of reporting guidelines for meta-analyses of observational studies is an important evolution in the area of systematic reviews, and increasing numbers of meta-analyses of nonrandomized studies are being produced (87).

In sum, when enough RCTs are available to examine the efficacy of a given intervention, these should form the evidence base for decision making (27). Nonrandomized studies should be used to complement RCT evidence when information on long-term effects and safety outcomes is required. When RCTs are unavailable, reviewers should identify reasons for lack of RCT evidence and review the evidence from nonrandomized studies. Reviewers should define the designs to be included a priori and document them in the review protocol. Reviewers should also present nonrandomized evidence in the context of potential biases and discuss the likely influence of these biases on treatment effect estimates. When nonrandomized studies are included, methodologically stronger studies (89) should be considered first—in particular, we believe that inclusion of a control group (preferably concurrent) is essential to allow valid conclusions to be drawn from nonrandomized studies. All evidence, be it from randomized trials or not, should be graded for methodologic quality by using components that are informed by empirical evidence and, where possible, by using validated methods. Results from systematic reviews based on nonrandomized studies need to be interpreted on a case-by-case basis and should consider the magnitude and consistency of the observed effects, as well as the biases and limitations inherent in different study designs. Existing guidelines should be followed in reporting the results from systematic reviews of nonrandomized studies (87).


Challenge: Assessing Applicability Issues Unique to Studies of Devices or Procedures

After deciding that a research study (whether a nonrandomized study, an RCT, or a systematic review) is internally valid, the clinician or policymaker then must decide whether this evidence applies to their patients or situation, respectively. This is no easy task, even when the evidence is not hampered by the all-too-common problems of imprecision due to small numbers, brief follow-up periods, surrogate outcomes of limited clinical relevance, and restricted enrollment of highly select subsets of patients. These threats to applicability are common to studies of drugs and medical interventions, as well as those of devices and surgical interventions; other articles discuss these problems fully (90, 91). In this section, we focus on threats to applicability that are unique to studies evaluating devices or procedures.

Threat 1: Eligibility Criteria Include Providers and Institutions as well as Patients

First and foremost, patient eligibility criteria are of particular importance in interpreting studies (and systematic reviews) of devices and surgical procedures. This encompasses 3 key points. First, devices or procedures should be considered only for patients similar to those in whom they have been tested. For example, an EPC report demonstrated that cardiac resynchronization therapy was efficacious in patients with heart failure who have underlying bundle-branch block and low ejection fraction. Whether this intervention benefits patients with heart failure who do not exhibit these features is unknown (indeed, benefit is doubtful since the device addresses the hemodynamic problems resulting from bundle-branch block).

Second, and perhaps less apparent, the assumption that "overall trial results apply to most patients with that condition" (91) does not hold for surgical or device intervention studies. Thus, while the relative benefits of drugs generally do not differ across patient subgroups, at least across the usual spectrum of underlying risks (92-94), the mortality relative risk reductions associated with surgical procedures (such as carotid endarterectomy and coronary artery bypass grafting) and devices (such as those providing cardiac resynchronization therapy) are all greater in patients at higher baseline risk (95-97). This paradox arises because the potential long-term positive effects on the outcome of interest are balanced against short-term negative effects on that same outcome for surgical or device interventions: Since periprocedural risks are absolute and similar irrespective of the patient's long-term baseline risk for the outcome, the long-term relative benefits are greater in patients who are more likely to develop the outcome without intervention. For example, the risk for death with implantation of a cardiac resynchronization device is the same, 0.4% (95% CI, 0.2% to 0.7%), for patients who have a 1% 1-year mortality risk without cardiac resynchronization therapy and for patients with a 10% 1-year mortality risk (97).
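The arithmetic behind this paradox can be made concrete with a small calculation. The sketch below assumes a simple model in which a patient first faces a fixed periprocedural mortality risk and, if they survive the procedure, a baseline risk reduced by a constant relative amount. The 0.4% periprocedural risk and the 1% and 10% baseline risks come from the example above; the 30% long-term relative reduction, the model itself, and the function name are hypothetical and chosen only for illustration.

```python
def net_relative_risk_reduction(baseline_risk, periprocedural_risk, long_term_rrr):
    """Net relative risk reduction once a fixed periprocedural risk is included.

    baseline_risk: 1-year mortality risk without the intervention.
    periprocedural_risk: absolute risk of death from the procedure itself.
    long_term_rrr: hypothetical relative reduction in baseline risk among
        patients who survive the procedure.
    """
    risk_after_procedure = baseline_risk * (1.0 - long_term_rrr)
    total_risk = periprocedural_risk + (1.0 - periprocedural_risk) * risk_after_procedure
    return 1.0 - total_risk / baseline_risk


PERIPROCEDURAL_RISK = 0.004   # 0.4%, from the example in the text
ASSUMED_LONG_TERM_RRR = 0.30  # hypothetical value, not from the source

low_risk_net = net_relative_risk_reduction(0.01, PERIPROCEDURAL_RISK, ASSUMED_LONG_TERM_RRR)
high_risk_net = net_relative_risk_reduction(0.10, PERIPROCEDURAL_RISK, ASSUMED_LONG_TERM_RRR)

print(f"Net relative risk reduction at 1% baseline risk:  {low_risk_net:.3f}")
print(f"Net relative risk reduction at 10% baseline risk: {high_risk_net:.3f}")
```

Under these assumptions the same procedure yields a net relative benefit for the patient at 10% baseline risk and a small net relative harm for the patient at 1% baseline risk, because the fixed periprocedural risk consumes a larger share of the possible benefit when the baseline risk is low.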

Third, while we can often safely assume that drugs beneficial in young patients (such as antihypertensive agents or statins) are also beneficial in older patients with the same target conditions, this assumption does not hold for surgical procedures or devices. For example, while the perioperative mortality rate in 2 trials that proved that carotid endarterectomy prevents stroke in patients with high-grade carotid stenosis was only 0.1% to 0.6% (98, 99), both trials restricted enrollment to younger patients, and population-based studies have shown that average perioperative mortality rates in older patients are substantially higher than in younger patients (1.9% to 3.6%) (100, 101). However, a recent analysis of data from acute care hospitals in 7 states showed that carotid endarterectomy procedure rates increased more in older patients than in younger patients after publication of these trials (100).

In addition to considering the patient eligibility criteria in device or surgical trials, the reviewer should also focus on the eligibility criteria for providers and institutions—abundant literature has shown clear relationships between hospital and physician volume and outcomes (102-104). For example, both of the carotid endarterectomy trials cited earlier were conducted in large-volume hospitals by surgical teams with low perioperative complication rates (98, 99). Indeed, the benefits in both trials were highly sensitive to perioperative complication rates: It is estimated that the relative risk reduction for disabling stroke with carotid endarterectomy decreases by 20% for every 2–percentage point increase in the absolute rate of perioperative stroke (105). Although both groups of trialists explicitly cautioned that "readers not apply our conclusions too broadly . . . the study surgeons were selected only after audits . . . confirmed a high level of expertise" (98, 99), subsequent analyses of carotid endarterectomy procedures in the United States have shown that surgical teams whose complication rates and operative volumes would have rendered them ineligible for the trials now perform most endarterectomies (100, 106). Not surprisingly, the in-hospital mortality rates after carotid endarterectomy are almost 10-fold higher in the "real-world" setting than in the trials included in the systematic review.

Threat 2: Randomization after the Procedure

Some trials of devices randomly assign patients after the device is implanted. For example, 8 of the 9 trials in an EPC report on cardiac resynchronization therapy used this approach, and only patients who had the device successfully implanted (approximately 90% of those who underwent the procedure) were randomly assigned to have the device turned on or off (97). This design, similar to the run-in period used in some pharmaceutical trials, does not affect the internal validity of the trials since the randomly assigned groups should still be balanced for unmeasured confounders. However, it does affect the tests of statistical significance (leading to narrower CIs and greater chance of type I errors) and may lead to overestimates of treatment benefits and underestimates of adverse effects (since these studies do not include patients who could not tolerate the procedure or those in whom implantation was unsuccessful) (107). Unfortunately, there are no accepted methods for adjusting results for the effects of the "run-in period" before randomization. While some authors advocate recalculating effect estimates as if the run-in had not been used (thus including prerandomization events in the relevant treatment group) (107), we suggest that the effect estimates be derived from the postrandomization data but that the conclusion prominently state that the reported effect estimates are a "best-case" scenario and probably represent the ceiling of what may be expected from a device or surgical procedure when used in clinical practice.

Threat 3: Rapid Evolutions in Technology

The effects of most technologies, such as devices and surgical procedures, tend to change over time. The benefits of technological innovations should theoretically improve over time (since, as providers become more experienced with the techniques, procedural complications should decline and selection of patients likely to benefit most should improve); however, as outlined above, this trend is often countered by the trend for innovations to diffuse nonselectively beyond those settings in which they were shown to be beneficial (thus increasing complication rates and reducing, if not negating, potential benefits). The uncertain effects of a device or procedure over time are compounded when the design of the device or the features of the procedure have rapidly evolved. Thus, earlier studies may show different outcomes than later studies. Furthermore, our ability to extrapolate from published studies to clinical practice for devices or procedures may be limited by any imprecision in the description of the device or procedure in the literature.

Threat 4: Effects Depend on Length of Follow-up

Given periprocedural complication rates, almost all interventions that involve a surgical procedure (including those to implant a device) have survival curves that cross at some point; that is, patients in the procedure group will have worse short-term outcomes but better long-term outcomes (if the procedure is beneficial). Thus, the benefits of the procedure appear smaller the closer to the procedural date one looks. This has implications for the pooling of data in a systematic review (implying that effect estimates from different time periods should not be pooled indiscriminately) as well as for applying the results of the systematic review to make projections on long-term patient outcomes.
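The dependence of the apparent effect on follow-up time can be illustrated with a minimal Python sketch. It assumes a simple model that is not drawn from any of the cited trials: the control group has a constant mortality hazard, while the procedure group pays a one-time periprocedural mortality risk and then has a lower constant hazard. All numerical values (a 2% periprocedural risk and hazards of 0.10 and 0.07 per year) and all function names are hypothetical.

```python
import math

# Illustrative model only: constant hazards and a one-time periprocedural
# risk are simplifying assumptions, and every number below is hypothetical.

def survival_control(t, hazard_control):
    """Proportion of control patients alive at time t (years)."""
    return math.exp(-hazard_control * t)


def survival_procedure(t, periprocedural_risk, hazard_treated):
    """Proportion of procedure patients alive at time t (years)."""
    return (1.0 - periprocedural_risk) * math.exp(-hazard_treated * t)


def crossover_time(periprocedural_risk, hazard_control, hazard_treated):
    """Time at which the two survival curves cross under this model."""
    if hazard_treated >= hazard_control:
        raise ValueError("curves never cross unless the treated hazard is lower")
    return -math.log(1.0 - periprocedural_risk) / (hazard_control - hazard_treated)


def risk_ratio(t, periprocedural_risk, hazard_control, hazard_treated):
    """Cumulative mortality risk ratio (procedure vs. control) at time t."""
    died_procedure = 1.0 - survival_procedure(t, periprocedural_risk, hazard_treated)
    died_control = 1.0 - survival_control(t, hazard_control)
    return died_procedure / died_control


P, HC, HT = 0.02, 0.10, 0.07  # hypothetical inputs
print(f"curves cross at about {crossover_time(P, HC, HT):.2f} years")
for years in (0.25, 1.0, 5.0):
    print(f"follow-up {years} y: risk ratio = {risk_ratio(years, P, HC, HT):.2f}")
```

With these assumed inputs, the risk ratio is above 1 (apparent harm) at 3 months and below 1 (apparent benefit) at 5 years, so a pooled estimate that mixes trials with short and long follow-up would not describe the effect at any single time point.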

In sum, in conducting or drawing conclusions from systematic reviews of devices or surgical procedures, it should not be assumed that the efficacy and safety seen in clinical trials conducted in highly select subsets of patients cared for by highly select providers from highly select institutions will translate into similar safety and effectiveness rates when applied in usual practice, particularly over time as devices and surgical techniques evolve. Thus, the systematic reviewer and the reader must take particular care to highlight the patient, provider, and institutional eligibility criteria; the type of device or procedure; and the length of follow-up. Furthermore, the reviewer should highlight in the report that the reported effect estimates for any device trials that randomly assigned patients after the procedure had been performed probably represent the best-case estimates for efficacy of the device when used in clinical practice.


Conclusion and Recommendations

While systematic reviews of RCTs are considered the gold standard for evidence, they are not infallible. Indeed, in many instances, large RCTs have disproved the results of previous systematic reviews (108). While advances in the methods of systematic reviews in the past decade should result in more valid conclusions, many of these advances are most relevant to reviews of pharmacologic interventions. Many issues specific to devices and surgery require further evaluation. The merits and drawbacks of including grey literature and nonrandomized studies need to be carefully considered on a case-by-case basis for each clinical topic, and reviewers need to carefully consider the external validity of their findings and comment on issues of applicability for decision makers (Table 3). Unless the issues we raised in this manuscript are explicitly addressed in a systematic review of a therapeutic device or procedure, we believe that clinicians faced with extrapolating from the evidence to clinical practice and policymakers faced with deciding whether to support implementation of that device or procedure should be cautious.


Table 3. Recommendations for Improving Systematic Reviews of Therapeutic Devices and Procedures

 


Author and Article Information

From The University of Alberta/Capital Health Evidence-based Practice Center and the University of Alberta, Edmonton, Canada.

Acknowledgments: The authors thank John Russell and Michelle Tubman for administrative and technical support.

Grant Support: This paper was produced by the University of Alberta Evidence-based Practice Center under contract to the Agency for Healthcare Research and Quality, Rockville, Maryland. Dr. McAlister is supported by the Alberta Heritage Foundation for Medical Research and is the University of Alberta/Merck Frosst/Aventis Chair in Patient Health Management; Drs. Ezekowitz and McAlister are supported by the Canadian Institutes of Health Research (CIHR); and Dr. Rowe is supported by a Canada Research Chair from the CIHR.

Potential Financial Conflicts of Interest: Authors of this paper have received funding for Evidence-based Practice Center reports.

Requests for Single Reprints: Finlay A. McAlister, MD, MSc, 2E3.24 Walter Mackenzie Health Sciences Centre, University of Alberta, 8440 112 Street, Edmonton, Alberta T6G 2R7, Canada; e-mail, Finlay.McAlister{at}ualberta.ca.

Current Author Addresses: Ms. Hartling: Aberhart Centre One, Room 9424, 11402 University Avenue, Edmonton, Alberta T6G 2J3, Canada.

Dr. McAlister: 2E3.24 Walter Mackenzie Health Sciences Centre, University of Alberta, 8440 112 Street, Edmonton, Alberta T6G 2R7, Canada.

Dr. Rowe: 1G1.43 Walter Mackenzie Health Sciences Centre, University of Alberta Hospital, 8440 112th Street, Edmonton, Alberta T6G 2B7, Canada.

Dr. Ezekowitz: 2-51 Medical Sciences Building, University of Alberta, Edmonton, Alberta T6G 2H7, Canada.

Ms. Friesen: Aberhart Centre One, Room 9420, 11402 University Avenue, Edmonton, Alberta T6G 2J3, Canada.

Dr. Klassen: 2C3.00 Walter Mackenzie Health Sciences Centre, University of Alberta, 8440 112 Street, Edmonton, Alberta T6G 2B7, Canada.


References

1. Santaguida PL, Helfand M, Raina P. Challenges in systematic reviews that evaluate drug efficacy or effectiveness. Ann Intern Med. 2005;142:1066-72.[Abstract/Free Full Text]

2. Device advice. U.S. Food and Drug Administration. Accessed at http://www.fda.gov/cdrh/devadvice/312.html on 23 February 2005.

3. Haig A, Dozier M. BEME Guide no 3: systematic searching for evidence in medical education—Part 1: Sources of information. Med Teach. 2003;25:352-63. [PMID: 12893544].[Medline]

4. Alberani V, De Castro Pietrangeli P, Mazza AM. The use of grey literature in health sciences: a preliminary survey. Bull Med Libr Assoc. 1990;78:358-63. [PMID: 2224298].[Medline]

5. Egger M, Juni P, Bartlett C, Holenstein F, Sterne J. How important are comprehensive literature searches and the assessment of trial quality in systematic reviews? Empirical study. Health Technol Assess. 2003;7:1-76. [PMID: 12583822].[Medline]

6. Hopewell S, McDonald S, Clarke M, Egger M. Grey literature in meta-analyses of randomized trials of health care interventions. The Cochrane Library. Chichester, United Kingdom: J Wiley; 2004 (Issue 2).

7. McAuley L, Pham B, Tugwell P, Moher D. Does the inclusion of grey literature influence estimates of intervention effectiveness reported in meta-analyses? Lancet. 2000;356:1228-31. [PMID: 11072941].[Medline]

8. Stern JM, Simes RJ. Publication bias: evidence of delayed publication in a cohort study of clinical research projects. BMJ. 1997;315:640-5. [PMID: 9310565].[Abstract/Free Full Text]

9. Ioannidis JP. Effect of the statistical significance of results on the time to completion and publication of randomized efficacy trials. JAMA. 1998;279:281-6. [PMID: 9450711].[Abstract/Free Full Text]

10. Cook DJ, Guyatt GH, Ryan G, Clifton J, Buckingham L, Willan A, et al. Should unpublished data be included in meta-analyses? Current convictions and controversies. JAMA. 1993;269:2749-53. [PMID: 8492400].[Abstract]

11. Mallet S, Hopewell S, Clarke M. The use of grey literature in the first 1000 Cochrane reviews. 4th Symposium on Systematic Reviews: Pushing the Boundaries. Oxford, United Kingdom; 2002. Accessed at http://www.ihs.ox.ac.uk/csm/pushingtheboundaries/poster/5_cla.html on 22 April 2005.

12. Simes RJ. Publication bias: the case for an international registry of clinical trials. J Clin Oncol. 1986;4:1529-41. [PMID: 3760920].[Abstract/Free Full Text]

13. Scherer RW, Langenberg P. Full publication of results initially presented in abstracts. The Cochrane Database of Methodology Reviews. 2000;4: Article no. MR000005. DOI: 10.1002/14651858.MR000005.

14. Burdett S, Stewart LA, Tierney JF. Publication bias and meta-analyses: a practical example. Int J Technol Assess Health Care. 2003;19:129-34. [PMID: 12701945].[Medline]

15. MacLean CH, Morton SC, Ofman JJ, Roth EA, Shekelle PG. How useful are unpublished data from the Food and Drug Administration in meta-analysis? J Clin Epidemiol. 2003;56:44-51. [PMID: 12589869].[Medline]

16. Weintraub WH. Are published manuscripts representative of the surgical meeting abstracts? An objective appraisal. J Pediatr Surg. 1987;22:11-3. [PMID: 3819986].[Medline]

17. Marinovich L, Lord S, Ghersi D. Data maturity and systematic reviews of new health technologies. 12th Cochrane Colloquium, 2-6 October 2004, Ottawa, Ontario, Canada. The Cochrane Collaboration; 2004.

18. Moher D, Cook DJ, Jadad AR, Tugwell P, Moher M, Jones A, et al. Assessing the quality of reports of randomised trials: implications for the conduct of meta-analyses. Health Technol Assess. 1999;3:i-iv. [PMID: 10374081].

19. McCulloch P, Taylor I, Sasako M, Lovett B, Griffin D. Randomised trials in surgery: problems and possible solutions. BMJ. 2002;324:1448-51. [PMID: 12065273].[Free Full Text]

20. Dimick JB, Diener-West M, Lipsett PA. Negative results of randomized clinical trials published in the surgical literature: equivalency or error? Arch Surg. 2001;136:796-800. [PMID: 11448393].[Abstract/Free Full Text]

21. Hopewell S, Clarke M. Quality of trials reported as conference abstracts: how well are they reported? [Abstract]. XI Cochrane Colloquium: Evidence, Health Care and Culture; 26-31 October 2003; Barcelona, Spain. The Cochrane Collaboration; 2003.

22. Bodenheimer T. Uneasy alliance—clinical investigators and the pharmaceutical industry. N Engl J Med. 2000;342:1539-44. [PMID: 10816196].[Free Full Text]

23. Lexchin J, Bero LA, Djulbegovic B, Clark O. Pharmaceutical industry sponsorship and research outcome and quality: systematic review. BMJ. 2003;326:1167-70. [PMID: 12775614].[Abstract/Free Full Text]

24. Bekelman JE, Li Y, Gross CP. Scope and impact of financial conflicts of interest in biomedical research: a systematic review. JAMA. 2003;289:454-65. [PMID: 12533125].[Abstract/Free Full Text]

25. Deeks JJ. Systematic reviews of published evidence: miracles or minefields? Ann Oncol. 1998;9:703-9. [PMID: 9739434].[Free Full Text]

26. Stewart L, Tierney J. Publication bias is a serious threat to the validity of Cochrane Reviews [Abstract]. 8th Annual Cochrane Colloquium, October 2000, Cape Town, South Africa. The Cochrane Collaboration; 2000.

27. Deeks JJ, Dinnes J, D'Amico R, Sowden AJ, Sakarovitch C, Song F, et al. Evaluating non-randomised intervention studies. Health Technol Assess. 2003;7:iii-x. [PMID: 14499048].

28. Ray JG. Evidence in upheaval: incorporating observational data into clinical practice. Arch Intern Med. 2002;162:249-54. [PMID: 11822916].[Free Full Text]

29. Ramsey SD, Luce BR, Deyo R, Franklin G. The limited state of technology assessment for medical devices: facing the issues. Am J Manag Care. 1998;4:SP188-99. [PMID: 10185994].

30. Hardin WD Jr, Stylianos S, Lally KP. Evidence-based practice in pediatric surgery. J Pediatr Surg. 1999;34:908-12. [PMID: 10359204].[Medline]

31. Sackett DL. Bias in analytic research. J Chronic Dis. 1979;32:51-63. [PMID: 447779].[Medline]

32. Alderson P, Green S, Higgins JP. Cochrane Reviewers' Handbook 4.2.2 [updated December 2003]. The Cochrane Library. Chichester, United Kingdom: J Wiley; 2004 (Issue 1).

33. Kunz R, Oxman AD. The unpredictability paradox: review of empirical comparisons of randomised and non-randomised clinical trials. BMJ. 1998;317:1185-90. [PMID: 9794851].[Abstract/Free Full Text]

34. Benson K, Hartz AJ. A comparison of observational studies and randomized, controlled trials. N Engl J Med. 2000;342:1878-86. [PMID: 10861324].[Abstract/Free Full Text]

35. Concato J, Shah N, Horwitz RI. Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med. 2000;342:1887-92. [PMID: 10861325].[Abstract/Free Full Text]

36. Britton A, McKee M, Black N, McPherson K, Sanderson C, Bain C. Choosing between randomised and non-randomised studies: a systematic review. Health Technol Assess. 1998;2:i-iv. [PMID: 9793791].

37. MacLehose RR, Reeves BC, Harvey IM, Sheldon TA, Russell IT, Black AM. A systematic review of comparisons of effect sizes derived from randomised and non-randomised studies. Health Technol Assess. 2000;4:1-154. [PMID: 11134917].[Medline]

38. Ioannidis JP, Haidich AB, Pappa M, Pantazis N, Kokori SI, Tektonidou MG, et al. Comparison of evidence of treatment effects in randomized and nonrandomized studies. JAMA. 2001;286:821-30. [PMID: 11497536].[Abstract/Free Full Text]

39. Sacks H, Chalmers TC, Smith H Jr. Randomized versus historical controls for clinical trials. Am J Med. 1982;72:233-40. [PMID: 7058834].[Medline]

40. Lipsey MW, Wilson DB. The efficacy of psychological, educational, and behavioral treatment. Confirmation from meta-analysis. Am Psychol. 1993;48:1181-209. [PMID: 8297057].[Medline]

41. Newhouse JP, McClellan M. Econometrics in outcomes research: the use of instrumental variables. Annu Rev Public Health. 1998;19:17-34. [PMID: 9611610].[Medline]

42. Sackett DL. Evidence-based medicine [Editorial]. Spine. 1998;23:1085-6. [PMID: 9615357].[Medline]

43. Ferriter N, Huband N. Does the non-randomised controlled study have a place in the systematic review? A pilot study. 10th Cochrane Colloquium, 31 July–3 August 2002, Stavanger, Norway. The Cochrane Collaboration; 2002.

44. Black N. Why we need observational studies to evaluate the effectiveness of health care. BMJ. 1996;312:1215-8. [PMID: 8634569].[Free Full Text]

45. Price D, Jefferson T, Demicheli V. Methodological issues arising from systematic reviews of the evidence of safety of vaccines. Vaccine. 2004;22:2080-4. [PMID: 15121328].[Medline]

46. Radford MJ, Foody JM. How do observational studies expand the evidence base for therapy? [Editorial]. JAMA. 2001;286:1228-30. [PMID: 11559269].[Free Full Text]

47. Ernst E, Pittler MH. Assessment of therapeutic safety in systematic reviews: literature review. BMJ. 2001;323:546 [PMID: 11546700].[Free Full Text]

48. Li Wan Po A, Herxheimer A, Poolsup N, Aziz Z. How do Cochrane reviewers address adverse effects of drug therapy? 8th Annual Cochrane Colloquium, October 2000, Cape Town, South Africa; 2000.

49. Loke YK, Derry S. Reporting of adverse drug reactions in randomised controlled trials—a systematic survey. BMC Clin Pharmacol. 2001;1:3 [PMID: 11591227].[Medline]

50. Ioannidis JP, Lau J. Completeness of safety reporting in randomized trials: an evaluation of 7 medical areas. JAMA. 2001;285:437-43. [PMID: 11242428].[Abstract/Free Full Text]

51. Price D, Jefferson T. Methodological problems in the interpretation of adverse event data included in a systematic review of adverse events following measles-mumps-rubella (MMR) immunization. 4th Symposium on Systematic Reviews: Pushing the Boundaries, July 2002, Oxford, United Kingdom; 2002. Accessed at http://www.ihs.ox.ac.uk/csm/pushingtheboundaries/poster/12_pri.html on 22 April 2005.

52. Latham NK, Bennett D, Stretton C, Anderson CS. Adverse event reporting in a systematic review of resistance training exercise. 4th Symposium of Systematic Reviews: Pushing the Boundaries, July 2002, Oxford, United Kingdom; 2002.

53. Ioannidis JP, Contopoulos-Ioannidis DG. Reporting of safety data from randomised trials [Letter]. Lancet. 1998;352:1752-3. [PMID: 9848355].[Medline]

54. Edwards JE, McQuay HJ, Moore RA, Collins SL. Reporting of adverse effects in clinical trials should be improved: lessons from acute postoperative pain. J Pain Symptom Manage. 1999;18:427-37. [PMID: 10641469].[Medline]

55. MacLehose HG, Klaes D, Garner P. What methods do trials use to collect adverse data? [Abstract] XI Cochrane Colloquium: Evidence, Health Care and Culture, 26-31 October, Barcelona, Spain; 2003.

56. Hayashi K, Walker AM. Japanese and American reports of randomized trials: differences in the reporting of adverse effects. Control Clin Trials. 1996;17:99-110. [PMID: 8860062].[Medline]

57. Ioannidis JP, Lau J. Improving safety reporting from randomised trials. Drug Saf. 2002;25:77-84. [PMID: 11888350].[Medline]

58. Bruce J, Russell EM, Mollison J, Krukowski ZH. The measurement and monitoring of surgical adverse events. Health Technol Assess. 2001;5:1-194. [PMID: 11532239].[Medline]

59. Russell EM, Bruce J, Krukowski ZH. Systematic review of the quality of surgical mortality monitoring. Br J Surg. 2003;90:527-32. [PMID: 12734856].[Medline]

60. Altman DG, Schulz KF, Moher D, Egger M, Davidoff F, Elbourne D, et al. The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med. 2001;134:663-94. [PMID: 11304107].[Abstract/Free Full Text]

61. Cosmi B, Castelvetri C, Milandri M, Rubboli A, Conforti A. The evaluation of rare adverse drug events in Cochrane reviews: the incidence of thrombotic thrombocytopenic purpura after ticlopidine plus aspirin for coronary stenting [Abstract]. 8th Annual Cochrane Colloquium, October 2000; Cape Town, South Africa. The Cochrane Collaboration; 2000.

62. Greenland S, Satterfield MH, Lanes SF. A meta-analysis to assess the incidence of adverse effects associated with the transdermal nicotine patch. Drug Saf. 1998;18:297-308. [PMID: 9565740].[Medline]

63. Rosenzweig P, Brohier S, Zipfel A. The placebo effect in healthy volunteers: influence of experimental conditions on the adverse events profile during phase I studies. Clin Pharmacol Ther. 1993;54:578-83. [PMID: 8222500].[Medline]

64. Baraldini V, Spitz L, Pierro A. Evidence-based operations in paediatric surgery. Pediatr Surg Int. 1998;13:331-5. [PMID: 9639610].[Medline]

65. Solomon MJ, McLeod RS. Clinical studies in surgical journals—have we improved? Dis Colon Rectum. 1993;36:43-8. [PMID: 8416778].[Medline]

66. Audige L, Bhandari M, Griffin D, Middleton P. Systematic review of methodological issues related to summarizing the non-randomized evidence in orthopaedic surgery. 4th Symposium on Systematic Reviews: Pushing the Boundaries, July 2002, Oxford, United Kingdom. The Cochrane Collaboration; 2002.

67. Gelijns AC. Technological Innovation: Comparing Development of Drugs, Devices, and Procedures in Medicine, Background Paper. Committee on Technological Innovation in Medicine, Institute of Medicine, National Academy of Sciences. Washington, DC: National Academy Pr; 1989.

68. Berger RL, Celli BR, Meneghetti AL, Bagley PH, Wright CD, Ingenito EP, et al. Limitations of randomized clinical trials for evaluating emerging operations: the case of lung volume reduction surgery. Ann Thorac Surg. 2001;72:649-57. [PMID: 11515928].[Abstract/Free Full Text]

69. Robert G, Stevens A, Gabbay J. ‘Early warning systems’ for identifying new healthcare technologies. Health Technol Assess. 1999;3:1-108. [PMID: 10632623].[Medline]

70. Moseley JB, O'Malley K, Petersen NJ, Menke TJ, Brody BA, Kuykendall DH, et al. A controlled trial of arthroscopic surgery for osteoarthritis of the knee. N Engl J Med. 2002;347:81-8. [PMID: 12110735].[Abstract/Free Full Text]

71. Cobb LA, Thomas GI, Dillard DH, Merendino KA, Bruce RA. An evaluation of internal-mammary-artery ligation by a double-blind technic. N Engl J Med. 1959;260:1115-8. [PMID: 13657350].

72. Dimond EG, Kittle CF, Crockett JE. Comparison of internal mammary artery ligation and sham operation for angina pectoris. Am J Cardiol. 1960;5:483-6. [PMID: 13816818].[Medline]

73. Freed CR, Greene PE, Breeze RE, Tsai WY, DuMouchel W, Kao R, et al. Transplantation of embryonic dopamine neurons for severe Parkinson's disease. N Engl J Med. 2001;344:710-9. [PMID: 11236774].[Abstract/Free Full Text]

74. Albin RL. Sham surgery controls: intracerebral grafting of fetal tissue for Parkinson's disease and proposed criteria for use of sham surgery controls. J Med Ethics. 2002;28:322-5. [PMID: 12356962].[Abstract/Free Full Text]

75. Horng S, Miller FG. Ethical framework for the use of sham procedures in clinical trials. Crit Care Med. 2003;31:S126-30. [PMID: 12626957].[Medline]

76. Ramsay CR, Grant AM, Wallace SA, Garthwaite PH, Monk AF, Russell IT. Statistical assessment of the learning curves of health technologies. Health Technol Assess. 2001;5:1-79. [PMID: 11319991].[Medline]

77. Beavers KL, Bonis PA, Lau J. Liver transplantation for patients with hepatobiliary malignancies other than hepatocellular carcinoma. Rockville, MD: Agency for Healthcare Research and Quality; January 2001.

78. Kane RL, Saleh KJ, Wilt TJ, Bershadsky B, Cross WW III, MacDonald RM, et al. Total knee replacement. Evidence Report/Technology Assessment No. 86 (Prepared by the Minnesota Evidence-based Practice Center under contract 290-02-0009). Rockville, MD: Agency for Healthcare Research and Quality; December 2003. AHRQ publication no. 04-E006-2.

79. Myers ER, Barber MW, Couchman GM, Datta S, Grey RN, Gustilo-Ashby T, et al. Management of uterine fibroids. Evidence Report/Technology Assessment No. 34 (Prepared by the Duke Evidence-based Practice Center under contract 290-97-0014). Rockville, MD: Agency for Healthcare Research and Quality; July 2001. AHRQ publication no. 01-E052.

80. Jadad A, O'Brien MA, Wingerchuk D, Angle P, Biagi H, Denkers M, et al. Management of chronic central neuropathic pain following traumatic spinal cord injury. Evidence Report/Technology Assessment No. 45 (Prepared by McMaster University Evidence-based Practice Center under contract 290-97-0017). Rockville, MD: Agency for Healthcare Research and Quality; September 2001. AHRQ publication no. 01-E063.

81. Levine C, Armstrong K, Chopra S, Estok R, Zhang S, Ross S. Diagnosis and management of specific breast abnormalities. Evidence Report/Technology Assessment No. 33 (Prepared by MetaWorks, Inc., under contract 290-97-0016). Rockville, MD: Agency for Healthcare Research and Quality; September 2001. AHRQ publication no. 01-E046.

82. Lau J, Balk E, Rothberg M, Ioannidis JPA, DeVine D, Chew P, et al. Management of clinically inapparent adrenal mass. Evidence Report/Technology Assessment No. 56 (Prepared by New England Medical Center Evidence-based Practice Center under contract 290-97-0019). Rockville, MD: Agency for Healthcare Research and Quality; May 2002. AHRQ publication no. 02-E014.

83. Jampel H, Lubomski L, Friedman D, Robinson KA, Congdon N, Quigley HA. Treatment of coexisting cataract and glaucoma. Evidence Report/Technology Assessment No. 38 (Prepared by Johns Hopkins University Evidence-based Practice Center under contract 290-97-0006). Rockville, MD: Agency for Healthcare Research and Quality; June 2003. AHRQ publication no. 03-E041.

84. Gøtzsche PC, Harden H. Searching for non-randomised studies. Accessed at http://www.cochrane.dk/nrsmg/docs/chap3.pdf on 23 February 2005.

85. Egger M, Ebrahim S, Smith GD. Where now for meta-analysis? [Editorial] Int J Epidemiol. 2002;31:1-5. [PMID: 11914281].[Free Full Text]

86. Egger M, Schneider M, Davey Smith G. Spurious precision? Meta-analysis of observational studies. BMJ. 1998;316:140-4. [PMID: 9462324].[Free Full Text]

87. Stroup DF, Berlin JA, Morton SC, Olkin I, Williamson GD, Rennie D, et al. Meta-analysis of observational studies in epidemiology: a proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group. JAMA. 2000;283:2008-12. [PMID: 10789670].[Abstract/Free Full Text]

88. Olson O. Collecting data. Accessed at http://www.cochrane.dk/nrsmg/docs/chap5.pdf on 23 February 2005.

89. The Evidence-Based Medicine Working Group. Users' Guides to the Medical Literature. A Manual for Evidence-Based Clinical Practice. Chicago: American Medical Assoc; 2002.

90. McAlister FA, Straus SE, Guyatt GH, Haynes RB. Users' guides to the medical literature: XX. Integrating research evidence with the care of the individual patient. Evidence-Based Medicine Working Group. JAMA. 2000;283:2829-36. [PMID: 10838653].[Abstract/Free Full Text]

91. McAlister FA. Applying evidence to patient care: from black and white to shades of grey [Editorial]. Ann Intern Med. 2003;138:938-9. [PMID: 12779305].[Free Full Text]

92. McAlister FA. Commentary: relative treatment effects are consistent across the spectrum of underlying risks . . . usually. Int J Epidemiol. 2002;31:76-7. [PMID: 11914298].[Free Full Text]

93. Furukawa TA, Guyatt GH, Griffith LE. Can we individualize the ‘number needed to treat’? An empirical study of summary effect measures in meta-analyses. Int J Epidemiol. 2002;31:72-6. [PMID: 11914297].[Abstract/Free Full Text]

94. Schmid CH, Lau J, McIntosh MW, Cappelleri JC. An empirical study of the effect of the control rate as a predictor of treatment efficacy in meta-analysis of clinical trials. Stat Med. 1998;17:1923-42. [PMID: 9777687].[Medline]

95. Rothwell PM. Can overall results of clinical trials be applied to all patients? Lancet. 1995;345:1616-9. [PMID: 7783541].[Medline]

96. Yusuf S, Zucker D, Peduzzi P, Fisher LD, Takaro T, Kennedy JW, et al. Effect of coronary artery bypass graft surgery on survival: overview of 10-year results from randomised trials by the Coronary Artery Bypass Graft Surgery Trialists Collaboration. Lancet. 1994;344:563-70. [PMID: 7914958].[Medline]

97. McAlister FA, Ezekowitz JA, Wiebe N, Rowe B, Spooner C, Crumley E, et al. Systematic review: cardiac resynchronization in patients with symptomatic heart failure. Ann Intern Med. 2004;141:381-90. [PMID: 15353430].[Abstract/Free Full Text]

98. Beneficial effect of carotid endarterectomy in symptomatic patients with high-grade carotid stenosis. North American Symptomatic Carotid Endarterectomy Trial Collaborators. N Engl J Med. 1991;325:445-53. [PMID: 1852179].[Abstract]

99. Endarterectomy for asymptomatic carotid artery stenosis. Executive Committee for the Asymptomatic Carotid Atherosclerosis Study. JAMA. 1995;273:1421-8. [PMID: 7723155].[Abstract]

100. Gross CP, Steiner CA, Bass EB, Powe NR. Relation between prepublication release of clinical trial results and the practice of carotid endarterectomy. JAMA. 2000;284:2886-93. [PMID: 11147985].[Abstract/Free Full Text]

101. Wennberg DE, Lucas FL, Birkmeyer JD, Bredenberg CE, Fisher ES. Variation in carotid endarterectomy mortality in the Medicare population: trial hospitals, volume, and patient characteristics. JAMA. 1998;279:1278-81. [PMID: 9565008].[Abstract/Free Full Text]

102. Luft HS, Bunker JP, Enthoven AC. Should operations be regionalized? The empirical relation between surgical volume and mortality. N Engl J Med. 1979;301:1364-9. [PMID: 503167].[Abstract]

103. Birkmeyer JD, Siewers AE, Finlayson EV, Stukel TA, Lucas FL, Batista I, et al. Hospital volume and surgical mortality in the United States. N Engl J Med. 2002;346:1128-37. [PMID: 11948273].[Abstract/Free Full Text]

104. Halm EA, Lee C, Chassin MR. Is volume related to outcome in health care? A systematic review and methodologic critique of the literature. Ann Intern Med. 2002;137:511-20. [PMID: 12230353].[Abstract/Free Full Text]

105. Chassin MR. Appropriate use of carotid endarterectomy [Editorial]. N Engl J Med. 1998;339:1468-71. [PMID: 9811924].[Free Full Text]

106. Tu JV, Hannan EL, Anderson GM, Iron K, Wu K, Vranizan K, et al. The fall and rise of carotid endarterectomy in the United States and Canada. N Engl J Med. 1998;339:1441-7. [PMID: 9811920].[Abstract/Free Full Text]

107. Pablos-Méndez A, Barr RG, Shea S. Run-in periods in randomized trials: implications for the application of results in clinical practice. JAMA. 1998;279:222-5. [PMID: 9438743].

108. LeLorier J, Grégoire G, Benhaddad A, Lapierre J, Derderian F. Discrepancies between meta-analyses and subsequent large randomized, controlled trials. N Engl J Med. 1997;337:536-42. [PMID: 9262498].

109. McAlister FA, Ezekowitz J, Wiebe N, Rowe B, Spooner C, Crumley E, et al. Cardiac resynchronization therapy for congestive heart failure. Evidence Report/Technology Assessment No. 106 (Prepared by the University of Alberta Evidence-based Practice Center under contract 290-02-0023). Rockville, MD: Agency for Healthcare Research and Quality; November 2004. AHRQ publication no. 05-E001-02.

110. Velmahos GC, Kern J, Chan L, Oder D, Murray JA, Shekelle P. Prevention of venous thromboembolism after injury. Evidence Report/Technology Assessment No. 22 (Prepared by Southern California Evidence-based Practice Center/RAND under contract 290-97-0001). Rockville, MD: Agency for Healthcare Research and Quality; November 2000. AHRQ publication no. 01-E004.

111. McDonagh M, Carson S, Ash J, Russman BS, Stavri PZ, Krages KP, et al. Hyperbaric oxygen therapy for brain injury, cerebral palsy, and stroke. Evidence Report/Technology Assessment No. 85 (Prepared by the Oregon Health & Science University Evidence-based Practice Center under contract 290-97-0018). Rockville, MD: Agency for Healthcare Research and Quality; September 2003. AHRQ publication no. 04-E003.

112. McCrory DC, Samsa GP, Hamilton BB, Govert JA, Matchar DB, Goslin RE. Treatment of pulmonary disease following cervical spinal cord injury. Evidence Report/Technology Assessment No. 27 (Prepared by the Duke Evidence-based Practice Center under contract 290-97-0014). Rockville, MD: Agency for Healthcare Research and Quality; September 2001. AHRQ publication no. 01-E014.

113. Flamm CR, Aronson N, Mark D, Lefevre F, Bohn RL, Finkelstein B, et al. Endoscopic retrograde cholangiopancreatography. Evidence Report/Technology Assessment No. 50 (Prepared by Blue Cross and Blue Shield Association under contract 290-97-001-5). Rockville, MD: Agency for Healthcare Research and Quality; June 2002. AHRQ publication no. 02-E017.

114. Ip S, Glicken S, Kulig J, O'Brien R, Sege R. Management of neonatal hyperbiliru