15 April 1994 | Volume 120 Issue 8 | Pages 667-676
Objectives: To introduce guidelines for the conduct, reporting, and critical appraisal of meta-analyses evaluating diagnostic tests and to apply these guidelines to recently published meta-analyses of diagnostic tests.
Data Sources: The guidelines are based on current concepts of how to assess diagnostic tests and conduct meta-analyses. They were applied to all meta-analyses evaluating diagnostic tests that were published in English-language journals from January 1990 through December 1991 and identified through MEDLINE searching and by experts in the field.
Study Selection: Meta-analyses were included if at least two of three independent readers regarded their main purpose as the evaluation of diagnostic tests against a concurrent reference standard.
Data Extraction: Three independent readers assessed the extent to which each meta-analysis fulfilled each guideline; consensus was defined as agreement by at least two readers.
Data Synthesis: The guidelines are concerned with determining the objective of the meta-analysis, identifying the relevant literature and extracting the data, estimating diagnostic accuracy, and identifying the extent to which variability is explained by study design characteristics and characteristics of the patients and diagnostic test. In general, the guidelines were only partially fulfilled.
Conclusion: Meta-analysis is potentially important in the assessment of diagnostic tests. Those reading meta-analyses evaluating diagnostic tests should critically appraise them; those doing meta-analyses should apply recently developed methods. The conduct and reporting of primary studies on which meta-analyses are based require improvement.
Guidelines for Meta-analyses Evaluating Diagnostic Tests
Clinicians must decide whether to use a diagnostic test in a patient and how to interpret the result [1]. Policy-makers must assess the overall value of a test, compare it with alternatives, and decide whether the test should be made available. Both clinical and policy decisions should be based on a thorough evaluation of the test [2]. A crucial step in evaluation is the assessment of diagnostic accuracy: the ability of the test to determine correctly the presence or absence of the disease of interest. This is done by comparing test results with those obtained using a reference ("gold," criterion, or comparison) standard. Estimates of the diagnostic accuracy of a test may differ among studies. Individual studies may have included too few patients to give precise estimates or too selected a population to allow general applicability. Therefore, meta-analysis, the critical review and statistical combination of results of previous research [3-6], is potentially useful for assessing diagnostic accuracy. Using meta-analysis, we can 1) provide an overall summary of diagnostic accuracy; 2) determine whether estimates of diagnostic accuracy depend on the study design characteristics (study validity) of the primary studies; 3) determine whether diagnostic accuracy differs in subgroups defined by the characteristics of the patients and test; and 4) identify areas for further research. New hypotheses may be generated, or the attempt to meta-analyze data may highlight deficits that need to be addressed in future primary studies before a useful meta-analysis can be done.
To help researchers do and readers assess meta-analyses of diagnostic accuracy, we suggest guidelines on how they should be conducted, reported, and critically appraised. Our guidelines are based on current concepts of how to assess diagnostic tests and conduct meta-analyses (Table 1). Because other guidelines exist for meta-analysis in general [3-8], we emphasize those issues that are particular to assessing diagnostic accuracy. For each step, the guidelines are used to review 11 journal articles published from January 1990 through December 1991 whose primary purpose was to assess the accuracy of a diagnostic test against a concurrent reference standard using meta-analysis [9-19]. The 11 articles are all the meta-analyses published in this 2-year period that we could identify through various search procedures. The guidelines were applied to each article independently by three of the authors, and the majority view was accepted. (See Appendix 1 for details of search and review procedures.)
Determine the Objective and Scope of the Meta-analysis
Not all primary studies identified will have used the meta-analyst's ideal reference standard or will cover exactly the tests or clinical context the meta-analyst wants. This can be dealt with either by using only the subset of papers that do so or by examining the extent to which deviation from the ideal changes the findings. In general, we suggest the latter approach and describe it further when we discuss how to assess the effect of variation in study validity and in the characteristics of patients and test. Often, the comparative value of several tests is being assessed. As when a single test is evaluated, the comparison should be viewed against the background of previous information, which should be equivalent for the tests being compared.
Review. The reference standard was stated in 10 of the 11 meta-analyses and the test of interest in all 11. Two meta-analyses gave a clear statement about the clinical background against which the tests' incremental value was being evaluated. Some authors, for example Pinson and colleagues [18], stated that this was a problem in the primary studies, pointing to the need to improve their quality. Seven meta-analyses compared tests, whereas 4 evaluated a single test.
Retrieve the Relevant Literature
Apart from papers clearly outside the scope of the meta-analysis, the relevance of each retrieved paper should be judged by applying inclusion and exclusion criteria. This can be done using the same methods outlined in the next section for the extraction of data. Reporting the reason for excluding each potentially eligible paper helps readers understand how the criteria were applied.
Publication bias may arise when papers with more extreme results are preferentially selected for publication because they are "more interesting" or "statistically significant" [22, 23]. Methods for dealing with publication bias have been developed [24, 25], but their applicability to the assessment of diagnostic tests has not been explored.
Review. Literature retrieval methods were described in 7 of the 11 meta-analyses, all of which used MEDLINE searching. Three articles gave their search terms, and 2 of these explained how these were linked [10, 14]. Criteria for including or excluding papers were given in 8 articles. Four articles gave information about excluded studies. Publication bias was discussed in 3 of the 11 articles, although no estimates could be made of its likely effect.
Extract and Display the Data
Publishing a full list of diagnostic accuracy and study characteristics (for example, design features, patient characteristics) for each primary study allows other researchers to decide if they agree with the judgments made and enables re-analysis applying different analytic techniques, using a subset of studies, or adding studies published after the meta-analysis was done [26].
Review. Two meta-analyses stated that each of the primary studies was assessed by two or more readers [13, 19]. One of the articles [13] also mentioned that disagreements were settled by consensus or a third party. Gianrossi and colleagues [11] offer an example of extensive display of data about diagnostic accuracy and study characteristics.
Estimate Diagnostic Accuracy
Sensitivity and specificity rely on a single threshold (cut-point or positivity criterion) for classifying a test result as positive. Changing the threshold to increase sensitivity decreases specificity, and vice versa. This trade-off makes it imperative that sensitivity and specificity be considered jointly. When studies use different criteria to define positive and negative test results, as in our example, they differ in their explicit threshold. Even when studies use the same explicit threshold, their implicit thresholds may differ, especially if interpretation of the test requires judgment. For example, radiologists may agree to use the same words to describe imaging test results but still differ in what they regard as the boundary between "abnormal" and "probably abnormal."
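The trade-off can be made concrete with a short Python sketch. The test values are invented for illustration, and the rule that a result at or above the threshold counts as positive is an assumption of the sketch, not a convention taken from the studies reviewed.

```python
# Hypothetical continuous test results (higher values suggest disease); invented data.
DISEASED = [4.1, 5.3, 6.0, 6.8, 7.2, 7.9, 8.5, 9.1]
NONDISEASED = [2.0, 2.9, 3.4, 4.0, 4.6, 5.5, 6.1, 7.0]


def sensitivity_specificity(threshold, diseased, nondiseased):
    """Call a result positive when it is >= threshold; return (sensitivity, specificity)."""
    true_positives = sum(1 for x in diseased if x >= threshold)
    true_negatives = sum(1 for x in nondiseased if x < threshold)
    return true_positives / len(diseased), true_negatives / len(nondiseased)


# Raising the threshold lowers sensitivity and raises specificity.
for threshold in (4.0, 5.0, 6.0, 7.0):
    sens, spec = sensitivity_specificity(threshold, DISEASED, NONDISEASED)
    print(f"threshold {threshold}: sensitivity {sens:.3f}, specificity {spec:.3f}")
```

With these invented values, moving the threshold from 4.0 to 7.0 lowers sensitivity from 1.000 to 0.500 while specificity rises from 0.375 to 0.875, so neither figure means much without the other.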
An alternative to reporting a single pair of sensitivity and specificity estimates is to report the range of pairs obtained as the threshold is varied. Such a range is often reported as a receiver operating characteristic (ROC) curve, a graph of sensitivity (true-positive rate) on the vertical axis against the false-positive rate (1 - specificity) on the horizontal axis [29]. One overall measure of the test's accuracy is the area under the ROC curve, which is 0.5 if the test does no better than chance and 1 if the test is perfect [29, 30].
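For a single study that reports results at many thresholds, the empirical ROC curve and its area can be sketched as below. The data are invented, the "positive when >= threshold" rule is an assumption, and the trapezoidal rule is one common way (not necessarily the one used in the cited references) to compute the area.

```python
# Hypothetical results from one primary study; invented data.
DISEASED = [4.1, 5.3, 6.0, 6.8, 7.2, 7.9, 8.5, 9.1]
NONDISEASED = [2.0, 2.9, 3.4, 4.0, 4.6, 5.5, 6.1, 7.0]


def roc_points(diseased, nondiseased):
    """Return (false-positive rate, true-positive rate) pairs as the threshold is lowered.

    A result is called positive when it is >= the threshold (assumption of this sketch).
    """
    thresholds = sorted(set(diseased) | set(nondiseased), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every result: nothing is called positive
    for t in thresholds:
        tpr = sum(1 for x in diseased if x >= t) / len(diseased)
        fpr = sum(1 for x in nondiseased if x >= t) / len(nondiseased)
        points.append((fpr, tpr))
    return points


def auc_trapezoid(points):
    """Area under the empirical ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_pairwise(diseased, nondiseased):
    """Probability that a random diseased result exceeds a random nondiseased one (ties count 1/2)."""
    score = 0.0
    for d in diseased:
        for n in nondiseased:
            if d > n:
                score += 1.0
            elif d == n:
                score += 0.5
    return score / (len(diseased) * len(nondiseased))


points = roc_points(DISEASED, NONDISEASED)
print(auc_trapezoid(points), auc_pairwise(DISEASED, NONDISEASED))
```

Both functions give 0.84375 for these invented values, reflecting the standard equivalence between the empirical ROC area and the probability that a diseased patient's result exceeds a nondiseased patient's result.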
Another measure of test performance is the likelihood ratio, defined as the ratio of the probability of a particular test result in people with disease to the probability of the same test result in people without disease. For a multi-category or continuous test, a result-specific likelihood ratio can be estimated at each possible test result, avoiding the need to choose a single threshold for dichotomizing the test as positive or negative. More importantly, it avoids the loss of information caused by dichotomizing, whereby a test result just above the threshold is not distinguished from one well above it [20]. Likelihood ratios can be used to compute the post-test odds of disease for a patient with known or estimated pretest odds by using a version of Bayes' theorem:
Post-test odds of disease = (pretest odds of disease) × (likelihood ratio)
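The definitions above can be sketched in a few lines of Python. The four-category counts and the pretest probability of 0.2 are invented; converting a probability to odds and back is a standard step we add so the result reads as a probability.

```python
def likelihood_ratios_2x2(sensitivity, specificity):
    """Likelihood ratios for a positive and a negative result of a dichotomous test."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def result_specific_lrs(diseased_counts, nondiseased_counts):
    """LR for each result category: P(category | disease) / P(category | no disease)."""
    n_diseased = sum(diseased_counts)
    n_nondiseased = sum(nondiseased_counts)
    return [
        (d * n_nondiseased) / (nd * n_diseased)
        for d, nd in zip(diseased_counts, nondiseased_counts)
    ]


def post_test_probability(pretest_probability, likelihood_ratio):
    """Bayes' theorem in odds form: post-test odds = pretest odds x likelihood ratio."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Hypothetical counts for a four-category reading, from "abnormal" to "normal".
diseased = [40, 30, 20, 10]
nondiseased = [5, 15, 30, 50]
labels = ["abnormal", "probably abnormal", "probably normal", "normal"]
for label, lr in zip(labels, result_specific_lrs(diseased, nondiseased)):
    print(f"{label}: LR {lr:.2f}, post-test probability {post_test_probability(0.2, lr):.2f}")
```

Keeping all four categories preserves the distinction between "abnormal" (LR 8 here) and "probably abnormal" (LR 2 here), which a single positive/negative split would merge.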
Methods for obtaining a summary estimate of diagnostic accuracy in a meta-analysis are now described, first for when test results are available only as a dichotomy and then for when test results are available in more than two categories.
Test Results Are Available Only as a Dichotomy
Test Results Are Available in More than Two Categories
If scale or threshold differences between primary studies are likely, as is probable for tests involving judgment, then likelihood ratios cannot be obtained on pooled data. One approach is to dichotomize the test result for each primary study and to use SROC methods as described above. Better use is made of the data if an ROC curve is constructed for each primary study using several thresholds, and an overall ROC curve is derived using ordinal regression techniques that have been applied previously to the diagnostic setting as a means of controlling for covariates [34]. Summary measures such as the area under the ROC curve can be obtained for the entire curve [35] or a clinically relevant range [36-38]. Difficulties with this approach may arise when studies have different numbers of cut-off values.
Review. All 11 meta-analyses were based on estimates of sensitivity and specificity from the primary studies. Seven articles gave mean estimates separately for sensitivity and specificity and did not examine their interdependence. Two studies estimated SROC curves [14, 15], one of which also estimated an odds ratio as a measure of diagnostic accuracy [15]. Two papers did not give a summary measure of diagnostic accuracy. Hoffman and colleagues [13] plotted sensitivity against the false-positive rate and decided not to pool results because of heterogeneity among the estimates of diagnostic accuracy from the primary studies. One meta-analysis gave a statistical test of association without any summary measure of diagnostic accuracy [19], whereas another based part of its analysis on a prevalence-dependent measure, the proportion of individuals overall correctly classified by the test [10].
Ordinal regression techniques have not yet been used to pool ROC data from primary studies. Data from primary studies that are amenable to the ordinal regression approach are uncommon. One meta-analysis did, however, note that some primary studies had test data at more than one threshold [15].
An example of the use of continuous data is the meta-analysis by Guyatt and colleagues [39], who obtained individual data points from 55 publications on the value of serum ferritin levels in the diagnosis of iron-deficiency anemia. Logistic regression was used on the pooled data to obtain continuous graphs of the likelihood ratio by serum ferritin result.
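The link between a logistic model and likelihood ratios can be sketched as follows. If a logistic model gives the odds of disease at result x in the pooled sample, then by the odds form of Bayes' theorem, dividing those odds by the pretest odds in the sample (number diseased / number nondiseased) gives the likelihood ratio at x. The data, the log transformation, and the single-predictor linear model are assumptions for illustration; they are not the model or data of Guyatt and colleagues.

```python
import math

# Hypothetical pooled individual results: x = natural log of serum ferritin,
# y = 1 if the reference standard showed iron deficiency, else 0. Invented values.
X = [1.5, 2.0, 2.3, 2.8, 3.0, 3.4, 4.0, 2.5, 3.1, 3.6, 3.9, 4.2, 4.5, 5.0, 5.5]
Y = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]


def fit_logistic(xs, ys, iterations=50):
    """Maximum-likelihood fit of logit P(disease | x) = b0 + b1 * x by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # score (gradient of the log-likelihood)
            g1 += (y - p) * x
            h00 += w             # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: (b0, b1) += inverse(information) @ score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def likelihood_ratio(x, b0, b1, n_diseased, n_nondiseased):
    """Result-specific LR: model odds of disease at x divided by the sample's pretest odds."""
    return math.exp(b0 + b1 * x) / (n_diseased / n_nondiseased)


b0, b1 = fit_logistic(X, Y)
n_d = sum(Y)
n_nd = len(Y) - n_d
for x in (2.0, 3.0, 4.0, 5.0):
    print(f"log ferritin {x}: LR {likelihood_ratio(x, b0, b1, n_d, n_nd):.2f}")
```

Because the model is smooth in x, it yields a likelihood ratio for every result rather than for a few categories, which is what allows a continuous graph of likelihood ratio against test value.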
Assess the Effect of Variation in Study Validity on Estimates of Diagnostic Accuracy
Appropriate Reference Standard
Independence of Observations
Verification Bias
The investigators can adjust for the bias if the sampling was done randomly and with known proportions within strata defined by test results. For example, Table 2 can be reformulated from Table 3 if it is known that Table 3 includes only a 10% random sample of test-negatives. Methods also exist for the estimation of confidence intervals for the corrected sensitivity and specificity [41]. Usually the situation is more complex, with other clinical information, such as age or other symptoms, influencing the selection of patients who are assessed by the reference standard. Again, the bias can be adjusted for as long as sampling has been random within the categories defined by test result and known clinical information. However, the choice of patients for verification by the reference standard is commonly not random. In this case, estimates of diagnostic accuracy are biased in unpredictable ways and no adjustment is possible.
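The adjustment for random, known-fraction sampling can be sketched by weighting each verified patient by the inverse of the sampling fraction in their stratum, which reconstructs the full table from the partially verified one. The counts below are invented and are not those of Tables 2 and 3; strata may be defined by test result alone or by test result plus clinical information. Confidence intervals [41] are not computed here.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """Verified patients in one stratum defined by test result (and optionally clinical information)."""
    test_positive: bool
    diseased: int             # verified patients with disease
    nondiseased: int          # verified patients without disease
    fraction_verified: float  # known random sampling fraction for verification in this stratum


def corrected_sensitivity_specificity(strata):
    """Weight each verified patient by 1 / fraction_verified, then compute sensitivity and specificity."""
    tp = fn = fp = tn = 0.0
    for s in strata:
        weight = 1.0 / s.fraction_verified
        if s.test_positive:
            tp += weight * s.diseased
            fp += weight * s.nondiseased
        else:
            fn += weight * s.diseased
            tn += weight * s.nondiseased
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical study: all test-positives verified, a 10% random sample of test-negatives verified.
strata = [
    Stratum(test_positive=True, diseased=90, nondiseased=60, fraction_verified=1.0),
    Stratum(test_positive=False, diseased=1, nondiseased=19, fraction_verified=0.1),
]
# Ignoring the sampling scheme is equivalent to treating every fraction as 1.
naive = corrected_sensitivity_specificity(
    [Stratum(s.test_positive, s.diseased, s.nondiseased, 1.0) for s in strata]
)
corrected = corrected_sensitivity_specificity(strata)
print("naive:", naive)
print("corrected:", corrected)
```

In this invented example, ignoring the sampling gives a sensitivity of about 0.99 and a specificity of about 0.24, whereas weighting gives 0.90 and 0.76. The weighting is valid only when the verification fractions are known and sampling within each stratum is random, exactly the condition the text sets out.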
Issues in Comparative Meta-analyses
We next address how to incorporate variation in study validity into the meta-analysis. One approach is to exclude studies that do not meet standards for scientific validity [20, 43]. This method avoids bias at the expense of decreasing precision, that is, widening the confidence intervals. For meta-analyses of clinical trials, empiric evidence supports the theory that inclusion of nonrandomized trials can bias the results [44, 45], although little empiric evidence supports the importance of other deviations from the theoretically ideal design [46]. For meta-analyses evaluating diagnostic tests, little empiric evidence has been accumulated to determine the practical importance of the potential sources of bias discussed above. Moreover, primary studies often do not report sufficient data for judging the potential for bias. We suggest that the initial analysis should assess the effect of reported study design flaws on estimates of diagnostic accuracy. This can be done by performing the meta-analysis separately for studies with and without a particular design flaw or by including the presence or absence of the flaw in regression models, as outlined in Appendix 3. Primary studies with a particular design flaw may yield an SROC curve different from that of primary studies without the flaw; in that case, reliance should be placed only on the studies that do not have the flaw. Alternatively, the design flaw may not change the SROC curve, in which case all studies can be used in the meta-analysis. Because different design flaws are likely to cause different biases, we suggest assessing the effect of each separately rather than summarizing them in an overall "validity score."
Review. Six of the 11 meta-analyses discussed variability in the choice of reference standard among the primary studies. Three articles examined how this variability affected diagnostic accuracy. Seven of the 11 meta-analyses mentioned other study design characteristics; Pinson and colleagues [18] provide a good example of their assessment. One meta-analysis excluded studies because they were prone to verification bias [15]. Five meta-analyses showed data on the variability of study design characteristics among studies, and 5 did analyses to determine how these characteristics predicted diagnostic accuracy. All but 1 of these analyses examined the relation of study design characteristics to sensitivity and specificity separately. Therefore, they could not assess whether a study design characteristic altered diagnostic accuracy rather than merely reflecting differences in the threshold for test positivity among the primary studies.
Of the seven comparative diagnostic test evaluations, two were based on the tests both being applied to the same patients within each primary study [9, 14]. The remaining five meta-analyses used the much weaker design of obtaining estimates of diagnostic accuracy for the different tests at least in part from different studies.
Assess the Effects of Variation in the Characteristics of the Patients and Test on Estimates of Diagnostic Accuracy (Generalizability)
Valid estimates of diagnostic accuracy may not be generalizable (applicable) to the setting in which the reader works. Readers of a meta-analysis will want to know if they can apply the meta-analyzed estimate of diagnostic accuracy to the clinical or policy decision they confront. Although evidence exists for at least one condition that the combination of multiple tests is generalizable between settings [47], there is still reason for concern about the applicability of diagnostic accuracy assessments from a meta-analysis to other medical settings [21, 48]. Readers of meta-analyses may decide that the summary estimate of diagnostic accuracy is applicable to their decision making for any of the following reasons: 1) Characteristics of the patients and test are similar in the meta-analysis and in their target population; 2) characteristics are not associated with diagnostic accuracy; or 3) a particular characteristic (for example, sex) affects diagnostic accuracy and estimates are provided separately for groups defined by this characteristic. The reader can then apply them separately to each group.
Relevant characteristics depend on the topic and will be limited by reporting in the primary studies. The major patient characteristics are concerned with the clinical spectrum under consideration. For example, a test may be very accurate at differentiating patients with advanced cancer from persons in perfect health but much less accurate at differentiating patients with early cancers from those whose symptoms are caused by a range of other diseases [49, 50]. This example illustrates two factors that can influence the estimate of test accuracy: the extent of cancer in the "diseased" group and the occurrence of other medical conditions in the "nondiseased" group. The implication of this phenomenon is that measures of diagnostic accuracy are generalizable only to settings that have a similar spectrum of patients, defined by the type and extent of disease in patients with the disease of interest and the type and extent of differential diagnoses in the controls. This spectrum is likely to vary in different practice settings [20, 47]. Other commonly included patient characteristics are age, sex, presenting complaints, comorbid conditions, and the findings of other diagnostic tests that have been done.
The technical details of tests may also vary from one setting to another and limit generalizability. Variation among studies may be due to different diagnostic accuracy of different test methods used in the primary studies. On the other hand, different test methods may simply vary in their threshold. For example, the diagnostic accuracy of computerized techniques of reading thallium scintigrams for the diagnosis of coronary artery disease can be shown to be no better than visual reading. However, the threshold for computerized reading is at a level that results in a higher sensitivity at a lower specificity (see Figure 1 and Appendix 3).
Review. Nine of the 11 meta-analyses mentioned variability in at least one patient characteristic among the primary studies. Five articles gave information about the distribution of characteristics, and 7 included them in analysis as predictors of variability in diagnostic accuracy. The most commonly considered patient characteristic was the type and extent of disease in patients with the disease of interest. Seven meta-analyses discussed variability in the test, of which 4 examined how this variability affected diagnostic accuracy. Four meta-analyses examined how publication year affected diagnostic accuracy. As discussed in the review of how meta-analyses dealt with study validity, the meta-analyses generally examined the effect of characteristics on sensitivity and specificity separately and therefore did not assess whether the characteristics affected diagnostic accuracy rather than merely shifting the threshold.
Conclusion
Appendix 1. Procedure for Identifying and Evaluating Meta-analyses
Meta-analyses published between January 1990 and December 1991 were identified by searching MEDLINE, consulting experts in the field, and examining the bibliographies of papers already retrieved. MEDLINE searching was done independently by two groups of authors. We then examined the index terms used in MEDLINE for all relevant papers obtained by any means and devised a final search strategy, which was as follows:
(explode diagnosis OR any of the following subheadings: diagnosis, radionuclide imaging, ultrasonography OR explode "sensitivity and specificity") AND (Meta-analysis OR the text words "meta" and any word starting with "analy").
Additional searches linking terms for diagnosis to "overview" and words starting with "pool" as text words had a negligible yield and were not pursued.
Review Procedure
All eligible papers were then reviewed independently by three of the authors to assess whether a meta-analysis evaluating a diagnostic test was the main purpose of the paper or a secondary one (for example, secondary to the reporting of new study results) and whether the meta-analysis addressed each of the issues outlined in our guidelines.
For each item assessed, the majority view was taken as the final response. Responses about whether our guidelines were addressed are given only for those 11 papers in which meta-analysis was the main purpose. Those papers in which meta-analysis was a secondary purpose (for example, done in the discussion section of the report of new research findings, such as [52, 53]) provided negligible information about how or whether the issues in our guidelines were addressed.
Appendix 2. Methods for Estimating Summary Receiver Operating Characteristic Curves
For statistical reasons, it is advisable to model logit TPR - logit FPR as a linear function of logit TPR + logit FPR. Thus, to estimate an SROC curve, we use the following model:
D = a + bS
where D = logit TPR - logit FPR, S = logit TPR + logit FPR, a = intercept term, and b = regression coefficient for S.
This model can be fit by conventional least squares, either unweighted or weighted by the variance of (logit TPR - logit FPR), using available statistical packages. Robust techniques can also be used [31]. Regression lines should be drawn only over the range of the data. The final model can be converted back to the conventional ROC axes of TPR against FPR.
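The unweighted fit and the conversion back to ROC axes can be sketched as below. The back-transformation follows algebraically from D = a + bS: logit TPR - logit FPR = a + b(logit TPR + logit FPR), so logit TPR = [a + (1 + b) logit FPR] / (1 - b). The study counts are invented; the continuity correction (0.5 added to each numerator and 1 to each denominator) is the one described in the worked example of this appendix. Weighted and robust fits are not shown.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def sroc_fit(studies):
    """Unweighted least-squares fit of D = a + b*S.

    studies: list of (tp, fn, fp, tn) counts, one tuple per primary study.
    0.5 is added to each numerator and 1 to each denominator so zero cells stay finite.
    """
    d_values, s_values = [], []
    for tp, fn, fp, tn in studies:
        logit_tpr = logit((tp + 0.5) / (tp + fn + 1))
        logit_fpr = logit((fp + 0.5) / (fp + tn + 1))
        d_values.append(logit_tpr - logit_fpr)  # log diagnostic odds ratio
        s_values.append(logit_tpr + logit_fpr)  # threshold measure
    n = len(studies)
    mean_s = sum(s_values) / n
    mean_d = sum(d_values) / n
    sxx = sum((s - mean_s) ** 2 for s in s_values)
    sxd = sum((s - mean_s) * (d - mean_d) for s, d in zip(s_values, d_values))
    b = sxd / sxx
    a = mean_d - b * mean_s
    return a, b


def sroc_tpr(fpr, a, b):
    """Back-transform D = a + b*S to ROC axes: expected TPR at a given FPR (requires b != 1)."""
    logit_tpr = (a + (1 + b) * logit(fpr)) / (1 - b)
    return 1 / (1 + math.exp(-logit_tpr))


# Hypothetical 2x2 counts (tp, fn, fp, tn) from four primary studies.
studies = [(45, 5, 10, 40), (30, 10, 6, 54), (60, 4, 25, 35), (20, 12, 2, 48)]
a, b = sroc_fit(studies)
print(f"a = {a:.3f}, b = {b:.3f}")
for fpr in (0.1, 0.2, 0.3):  # stays within the range of the studies' FPRs
    print(f"FPR {fpr}: SROC TPR {sroc_tpr(fpr, a, b):.3f}")
```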
The SROC quantities D and S have convenient interpretations. D is easily shown to be the log odds ratio, a common measure of association in epidemiologic studies. Here, the odds ratio represents the odds of a positive test result among diseased persons relative to the odds of a positive test result among nondiseased persons. S is a measure of the threshold for classifying a test as positive: it is 0 when sensitivity equals specificity, becomes positive when a threshold is used that increases sensitivity (and decreases specificity), and becomes negative when a threshold is used that decreases sensitivity (and increases specificity). The intercept of the model (a) is therefore a log odds ratio (the value of D when S = 0), and the regression coefficient (b) indicates the extent to which the odds ratio depends on the threshold used. If the regression coefficient is near zero and not statistically significant, test accuracy for each primary study can be summarized as the odds ratio, and these odds ratios can be combined using various techniques [54, 55].
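The interpretations of D and S, and one way of combining odds ratios, can be checked numerically. The text does not name the pooling techniques of [54, 55]; the Mantel-Haenszel estimator used here is our choice of one common method, appropriate only when b is near zero as described above. All inputs are invented.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def d_and_s(sensitivity, specificity):
    """SROC quantities for one study: D (log diagnostic odds ratio) and S (threshold measure)."""
    tpr, fpr = sensitivity, 1 - specificity
    return logit(tpr) - logit(fpr), logit(tpr) + logit(fpr)


def mantel_haenszel_odds_ratio(studies):
    """Pooled odds ratio across (tp, fn, fp, tn) tables by the Mantel-Haenszel method."""
    numerator = sum(tp * tn / (tp + fn + fp + tn) for tp, fn, fp, tn in studies)
    denominator = sum(fn * fp / (tp + fn + fp + tn) for tp, fn, fp, tn in studies)
    return numerator / denominator


d, s = d_and_s(0.9, 0.8)
print(math.exp(d), 0.9 * 0.8 / (0.1 * 0.2))  # exp(D) equals the diagnostic odds ratio
print(d_and_s(0.8, 0.8)[1])                  # S is zero (up to rounding) when sensitivity equals specificity
print(mantel_haenszel_odds_ratio([(45, 5, 10, 40), (30, 10, 6, 54)]))
```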
The SROC method deals with the problem of different thresholds among studies and is useful for comparing the overall diagnostic accuracy of different tests or the extent to which accuracy depends on study characteristics. However, it does not directly provide summary estimates of sensitivity and specificity. To obtain them, one must fix a value for either sensitivity or specificity and read the corresponding value of the other off the SROC curve. The fixed value could be the median or mean of the values found in the meta-analysis, or it could be based on local experience.
To illustrate the SROC approach, we use data from a meta-analysis of over 50 primary studies on the exercise thallium scintigram as a test for angiographic coronary artery disease [56]. Although this paper was published before the period of our formal review, it is used because it tabulates the data from each primary study extensively. This meta-analysis examined sensitivity and specificity separately. Sensitivity among the primary studies ranged from 0.63 to 0.98 (mean, 0.840) and specificity from 0.43 to 1.00 (mean, 0.844). The regression equation for the plot of D on S was obtained after adding 0.5 to the numerator and 1 to the denominator of both the TPR and FPR for each study, so that zero cells did not result in undefined transformations [31]. The intercept of the unweighted model is 3.631 (95% CI, 3.354 to 3.907), and the regression coefficient for S is -0.294 (CI, -0.503 to -0.085). The regression coefficient differs significantly from zero, showing that the odds ratio for the association between test and reference standard depends on the threshold used. The plot of TPR on FPR is shown in Figure 1. At the mean specificity of 0.844, the sensitivity is estimated at 0.868, which is only slightly higher than the value reported when sensitivity and specificity were examined separately. The difference is small because the correlation between logit TPR and logit FPR is modest in this example (Pearson r = 0.19, P = 0.16).
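The estimated sensitivity of 0.868 at a specificity of 0.844 can be checked by hand. Solving D = a + bS for logit TPR gives logit TPR = [a + (1 + b) logit FPR] / (1 - b). Substituting an intercept of 3.631, a regression coefficient for S of -0.294, and FPR = 1 - 0.844 = 0.156 gives the following (our arithmetic, rounded to three decimals); the last line converts the intercept into the odds ratio at S = 0.

```latex
\begin{aligned}
\operatorname{logit}(\mathrm{FPR}) &= \ln\frac{0.156}{0.844} \approx -1.688 \\
\operatorname{logit}(\mathrm{TPR}) &= \frac{a + (1+b)\,\operatorname{logit}(\mathrm{FPR})}{1-b}
  = \frac{3.631 + 0.706 \times (-1.688)}{1.294} \approx 1.885 \\
\mathrm{TPR} &= \frac{1}{1 + e^{-1.885}} \approx 0.868 \\
\text{odds ratio at } S = 0 &= e^{a} = e^{3.631} \approx 37.8
\end{aligned}
```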
Appendix 3. Methods for Assessing the Effect on Diagnostic Accuracy of Variation in Study Validity and Characteristics of the Patients and Test
Estimates of diagnostic accuracy may vary by study design characteristics (study validity) or by characteristics of the patients or test (generalizability). Differences in diagnostic accuracy by characteristic are more likely to be real, rather than caused by the play of chance, if the analyses fulfill criteria such as being based on a hypothesis stated in advance and showing large, statistically significant differences [57]. Most variability in diagnostic accuracy will probably not be explained by reported characteristics. Formal methods (random-effects models) exist for taking account of heterogeneity between primary studies and estimating summary measures with appropriate confidence intervals in meta-analyses of randomized trials [54, 55] but have not as yet been published for most measures of diagnostic accuracy.
Experience with modeling the effect of characteristics on diagnostic accuracy is limited at present. Because most primary studies only provide test data around a single threshold and methods for exploring the effect of other variables are not well developed, we restrict our comments to examining whether a single variable predicts diagnostic accuracy using SROC. Assessing whether characteristics affect test accuracy requires a method that identifies whether accuracy is better in certain groups or if there is only a shift in threshold along the same ROC. To assess accuracy, the two groups being compared can be shown graphically using different symbols. The magnitude of the difference between groups and its statistical significance can be obtained by including the group variable in the SROC model. Alternatively, one can model the SROC curve based on all the data and compare the residuals around this model for the two groups using an unpaired t-test [14]. We now explore the first method using the example from Appendix 2.
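The group-variable approach can be sketched as an ordinary least-squares fit of D = a + bS + cG, where G is a 0/1 indicator for the two groups compared (for example, visual versus computerized reading). This formulation, the helper names, and the study values are our illustrative assumptions; standard errors and P values, which a statistical package would supply, are omitted. Here c estimates the difference in log odds ratio between groups at the same S, so c near zero alongside different group means of S points to a threshold shift rather than a difference in accuracy.

```python
def solve_linear(a, b):
    """Solve a small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_sroc_with_group(d, s, group):
    """Least-squares fit of D = a + b*S + c*G via the normal equations; returns [a, b, c]."""
    rows = [[1.0, si, float(gi)] for si, gi in zip(s, group)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xtd = [sum(r[i] * di for r, di in zip(rows, d)) for i in range(3)]
    return solve_linear(xtx, xtd)


def group_mean_s(s, group, value):
    """Mean threshold measure S within one group; a difference between groups suggests a threshold shift."""
    members = [si for si, gi in zip(s, group) if gi == value]
    return sum(members) / len(members)


# Hypothetical per-study D and S values; G = 0 (e.g., visual) or 1 (e.g., computerized).
S = [-1.4, -0.9, -0.6, -0.2, 0.1, 0.5, 0.9, 1.4]
D = [3.9, 3.8, 3.6, 3.7, 3.5, 3.6, 3.3, 3.4]
G = [0, 0, 0, 0, 1, 1, 1, 1]
a, b, c = fit_sroc_with_group(D, S, G)
print(f"a = {a:.3f}, b = {b:.3f}, c (group effect on log odds ratio) = {c:.3f}")
print("mean S:", group_mean_s(S, G, 0), group_mean_s(S, G, 1))
```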
We may wish to know whether diagnostic accuracy is improved by having thallium scintigrams read using computerized techniques rather than by visual examination. The data are derived from Tables 2 and 3 of the meta-analysis by Detrano and colleagues [56]. The analysis compares studies that used visual techniques to read the thallium scintigrams with those that used computerized or semi-quantitative techniques. Sensitivity appears to improve with computerized reading (Table 4), but specificity deteriorates. This result could be explained by a difference in threshold between the two reading techniques rather than a difference in their accuracy. In Figure 1, estimates of diagnostic accuracy for visual reading are more common at lower true- and false-positive rates, and those for computerized reading at higher true- and false-positive rates. This pattern could be caused by a shift in threshold, which can be confirmed by comparing the means of S for the two techniques. The means are -0.45 (CI, -0.79 to -0.12) for visual reading and 0.80 (CI, 0.08 to 1.53) for computerized reading, suggesting a significant (P = 0.005) difference in threshold. Reading technique does not have a statistically significant coefficient when included in the SROC model (Table 5). Setting specificity at 0.880 in the model gives a sensitivity of 0.846 for visual reading and 0.858 for computerized reading. In addition to not being statistically significant, this difference is clinically unimportant. In summary, reading by computer rather than by visual methods does not change accuracy; it only shifts the threshold, increasing sensitivity by the amount one would expect from the reduction in specificity.
References
1. Panzer RJ, Black ER, Griner PF, eds. Diagnostic Strategies for Common Medical Problems. Philadelphia: American College of Physicians; 1991.
2. Guyatt GH, Tugwell PX, Feeny DH, Haynes RB, Drummond M. A framework for clinical evaluation of diagnostic technologies. Can Med Assoc J. 1986; 134:587-94.
3. L'Abbe KA, Detsky AS, O'Rourke K. Meta-analysis in clinical research. Ann Intern Med. 1987; 107:224-33.
4. Jenicek M. Meta-analysis in medicine. Where we are and where we want to go. J Clin Epidemiol. 1989; 42:35-44.
5. Fleiss JL, Gross AJ. Meta-analysis in epidemiology, with special reference to studies of the association between exposure to environmental tobacco smoke and lung cancer: a critique. J Clin Epidemiol. 1991; 44:127-39.
6. Greenland S. Quantitative methods in the review of epidemiologic literature. Epidemiol Rev. 1987; 9:1-30.
7. Oxman AD, Guyatt GH. Validation of an index of the quality of review articles. J Clin Epidemiol. 1991; 44:1271-8.
8. Sacks HS, Berrier J, Reitman D, Ancona-Berk VA, Chalmers TC. Meta-analyses of randomized controlled trials. N Engl J Med. 1987; 316:450-5.
9. Berman DS, Kiat H, Van Train KF, Friedman J, Garcia EV, Maddahi J. Comparison of SPECT using technetium-99m agents and thallium-201 and PET for the assessment of myocardial perfusion and viability. Am J Cardiol. 1990; 66:72E-79E.
10. Dales RE, Stark RM, Raman S. Computed tomography to stage lung cancer. Approaching a controversy using meta-analysis. Am Rev Respir Dis. 1990; 141:1096-101.
11. Gianrossi R, Detrano R, Columbo A, Froehlicher V. Cardiac fluoroscopy for the diagnosis of coronary artery disease: a meta analytic review. Am Heart J. 1990; 120:1179-88.
12. Goris ML, Basso LV, Keeling C. Parathyroid imaging. J Nucl Med. 1991; 32:887-9.
13. Hoffman RM, Kent DL, Deyo RA. Diagnostic accuracy and clinical utility of thermography for lumbar radiculopathy. A meta-analysis. Spine. 1991; 16:623-8.
14. Hurlbut TA 3d, Littenberg B. The diagnostic accuracy of rapid dipstick tests to predict urinary tract infection. Am J Clin Pathol. 1991; 96:582-8.
15. Kardaun JW, Kardaun OJ. Comparative diagnostic performance of three radiological procedures for the detection of lumbar disk herniation. Methods Inf Med. 1990; 29:12-22.
16. Mezger J, Lamerz R, Permanetter W. Diagnostic significance of carcinoembryonic antigen in the differential diagnosis of malignant mesothelioma. J Thorac Cardiovasc Surg. 1990; 100:860-6.
17. Phillips KA. The use of meta-analysis in technology assessment: a meta-analysis of the enzyme immunosorbent assay human immunodeficiency virus antibody test. J Clin Epidemiol. 1991; 44:925-31.
18. Pinson AG, Becker DM, Philbrick JT, Parekh JS. Technetium-99m-RBC venography in the diagnosis of deep venous thrombosis of the lower extremity: a systematic review of the literature. J Nucl Med. 1991; 32:2324-8.
19. Reed JF 3d. Meta-analysis of the reliability of noninvasive carotid studies. Biomed Instrum Technol. 1991; 25:465-71.
20. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical Epidemiology: A Basic Science for Clinical Medicine. 2d ed. Boston: Little, Brown; 1991.
21. Knottnerus JA, Leffers P. The influence of referral patterns on the characteristics of diagnostic tests. J Clin Epidemiol. 1992; 45:1143-54.
22. Easterbrook PJ, Berlin JA, Gopalan R, Matthews DR. Publication bias in clinical research. Lancet. 1991; 337:867-72.
23. Clermont RJ, Chalmers TC. The transaminase tests in liver disease. Medicine (Baltimore). 1967; 46:197-207.
24. Chalmers TC, Frank CS, Reitman D. Minimizing the three stages of publication bias. JAMA. 1990; 263:1392-5.
25. Begg CB, Berlin JA. Publication bias and dissemination of clinical research. J Natl Cancer Inst. 1989; 81:107-15.
26. Lau J, Antman EM, Jimenez-Silva J, Kupelnick B, Mosteller F, Chalmers TC. Cumulative meta-analysis of therapeutic trials for myocardial infarction. N Engl J Med. 1992; 327:248-54.
27. Sox HC Jr, Blatt MA, Higgins MC, Marton KI. Medical Decision Making. Boston: Butterworths; 1988.
28. Fletcher RH, Fletcher SW, Wagner EH. Clinical Epidemiology: The Essentials. 2d ed. Baltimore: Williams & Wilkins; 1988.
29. Hanley JA. Receiver operating characteristic (ROC) methodology: the state of the art. Crit Rev Diagn Imaging. 1989; 29:307-35.
30. Centor RM, Schwartz JS. An evaluation of methods for estimating the area under the receiver operating characteristic (ROC) curve. Med Decis Making. 1985; 5:149-56.
31. Moses LE, Shapiro D, Littenberg B. Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat Med. 1993; 12:1293-316.
32. Albert A. On the use and computation of likelihood ratios in clinical chemistry. Clin Chem. 1982; 28:1113-9.
33. Irwig L. Modelling result-specific likelihood ratios (Letter). J Clin Epidemiol. 1992; 45:1335-8.
34. Tosteson AN, Begg CB. A general regression methodology for ROC curve estimation. Med Decis Making. 1988; 8:204-15.
35. Hunink MG, Richardson DK, Doubilet PM, Begg CB. Testing for pulmonary maturity: ROC analysis involving covariates, verification bias, and combination testing. Med Decis Making. 1990; 10:201-11.
36. McClish DK. Analyzing a portion of the ROC curve. Med Decis Making. 1989; 9:190-5.
37. McClish DK. Determining a range of false-positive rates for which ROC curves differ. Med Decis Making. 1990; 10:283-7.
38. Wieand S, Gail MH, James BR, James KL. A family of nonparametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika. 1989; 76:585-92.
39. Guyatt GH, Oxman AD, Ali M, Willan A, McIlroy W, Patterson C. Laboratory diagnosis of iron-deficiency anemia: an overview. J Gen Intern Med. 1992; 7:145-53.
40. Begg CB. Biases in the assessment of diagnostic tests. Stat Med. 1987; 6:411-23.
41. Begg CB, Greenes RA. Assessment of diagnostic tests when disease verification is subject to selection bias. Biometrics. 1983; 39:207-15.
42. Gray R, Begg CB, Greenes RA. Construction of receiver operating characteristic curves when disease verification is subject to selection bias. Med Decis Making. 1984; 4:151-64.
43. Mulrow CD, Linn WD, Gaul MK, Pugh JA. Assessing quality of a diagnostic test evaluation. J Gen Intern Med. 1989; 4:288-95.
44. Colditz GA, Miller JN, Mosteller F. How study design affects outcomes in comparisons of therapy. I: Medical. Stat Med. 1989; 8:441-54.
45. Miller JN, Colditz GA, Mosteller F. How study design affects outcomes in comparisons of therapy. II: Surgical. Stat Med. 1989; 8:455-66.
46. Emerson JD, Burdick E, Hoaglin DC, Mosteller F, Chalmers TC. An empirical study of the possible relation of treatment differences to quality scores in controlled randomized clinical trials. Control Clin Trials. 1990; 11:339-52.
47. Bernelot Moens HJ, Hirshberg AJ, Claessens AA. Data-source effects on the sensitivities and specificities of clinical features in the diagnosis of rheumatoid arthritis: the relevance of multiple sources of knowledge for a decision-support system. Med Decis Making. 1992; 12:250-8.
48. Lachs MS, Nachamkin I, Edelstein PH, Goldman J, Feinstein AR, Schwartz JS. Spectrum bias in the evaluation of diagnostic tests: lessons from the rapid dipstick test for urinary tract infection. Ann Intern Med. 1992; 117:135-40.
49. Ransohoff DF, Feinstein AR. Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. N Engl J Med. 1978; 299:926-30.
50. Hlatky MA, Pryor DB, Harrell FE Jr, Califf RM, Mark DB, Rosati RA. Factors affecting sensitivity and specificity of exercise electrocardiography. Am J Med. 1984; 77:64-71.
51. Ng PC, Dear PR. The predictive value of a normal ultrasound scan in the preterm baby: a meta-analysis. Acta Paediatr Scand. 1990; 79:286-91.
52. Banales JL, Pineda PR, Fitzgerald JM, Rubio H, Selman M, Salazar-Lezama M. Adenosine deaminase in the diagnosis of tuberculous pleural effusions. A report of 218 patients and review of the literature. Chest. 1991; 99:355-7.
53. Rosen Y, Rosenblatt P, Saltzman E. Intraoperative pathologic diagnosis of thyroid neoplasms. Report on experience with 504 specimens. Cancer. 1990; 66:2001-6.
54. Laird NM, Mosteller F. Some statistical methods for combining experimental results. Int J Technol Assess Health Care. 1990; 6:5-30.
55. Berlin JA, Laird NM, Sacks HS, Chalmers TC. A comparison of statistical methods for combining event rates from clinical trials. Stat Med. 1989; 8:141-51.
56. Detrano R, Janosi A, Lyons KP, Marcondes G, Abbassi N, Froelicher VF. Factors affecting sensitivity and specificity of a diagnostic test: the exercise thallium scintigram. Am J Med. 1988; 84:699-710.
57. Oxman AD, Guyatt GH. A consumer's guide to subgroup analyses. Ann Intern Med. 1992; 116:78-84.