Annals
Established in 1927 by the American College of Physicians

ACADEMIA AND CLINIC

COMMON DIAGNOSTIC TESTS

Series Editors: Alan Garber, MD, PhD, and Harold Sox, MD

Cost-Effectiveness of Alternative Management Strategies for Patients with Solitary Pulmonary Nodules

Michael K. Gould, MD, MS; Gillian D. Sanders, PhD; Paul G. Barnett, PhD; Chara E. Rydzak, BA; Courtney C. Maclean, BA; Mark B. McClellan, MD, PhD; and Douglas K. Owens, MD, MS

6 May 2003 | Volume 138 Issue 9 | Pages 724-735

Background: Positron emission tomography (PET) with 18-fluorodeoxyglucose (FDG) is a potentially useful but expensive test to diagnose solitary pulmonary nodules.

Objective: To evaluate the cost-effectiveness of strategies for pulmonary nodule diagnosis and to specifically compare strategies that did and did not include FDG-PET.

Design: Decision model.

Data Sources: Accuracy and complications of diagnostic tests were estimated by using meta-analysis and literature review. Modeled survival was based on data from a large tumor registry. Cost estimates were derived from Medicare reimbursement and other sources.

Target Population: All adult patients with a new, noncalcified pulmonary nodule seen on chest radiograph.

Time Horizon: Patient lifetime.

Perspective: Societal.

Intervention: 40 clinically plausible combinations of 5 diagnostic interventions, including computed tomography, FDG-PET, transthoracic needle biopsy, surgery, and watchful waiting.

Outcome Measures: Costs, quality-adjusted life-years (QALYs), and incremental cost-effectiveness ratios.

Results of Base-Case Analysis: The cost-effectiveness of strategies depended critically on the pretest probability of malignancy. For patients with low pretest probability (26%), strategies that used FDG-PET selectively when computed tomography results were possibly malignant cost as little as $20 000 per QALY gained. For patients with high pretest probability (79%), strategies that used FDG-PET selectively when computed tomography results were benign cost as little as $16 000 per QALY gained. For patients with intermediate pretest probability (55%), FDG-PET strategies cost more than $220 000 per QALY gained because they were more costly but only marginally more effective than computed tomography-based strategies.

Results of Sensitivity Analysis: The choice of strategy also depended on the risk for surgical complications, the probability of nondiagnostic needle biopsy, the sensitivity of computed tomography, and patient preferences for time spent in watchful waiting. In probabilistic sensitivity analysis, FDG-PET strategies were cost saving or cost less than $100 000 per QALY gained in 76.7%, 24.4%, and 99.9% of computer simulations for patients with low, intermediate, and high pretest probability, respectively.

Conclusions: FDG-PET should be used selectively when pretest probability and computed tomography findings are discordant or in patients with intermediate pretest probability who are at high risk for surgical complications. In most other circumstances, computed tomography-based strategies result in similar quality-adjusted life-years and lower costs.


The solitary pulmonary nodule is a single, well-circumscribed, spherical radiographic opacity that measures less than 3 to 4 cm in diameter and is surrounded completely by aerated lung (1). There is no associated atelectasis, hilar enlargement, or pleural effusion. Most pulmonary nodules are discovered incidentally on chest radiographs, and 15% to 75% of such nodules are malignant, depending on the population studied (2, 3). Patients with pulmonary nodules and their physicians confront difficult decisions about the risks and rewards of different management strategies. When present, malignancy must be promptly identified to permit timely resection.

Pulmonary nodule evaluation typically begins with imaging studies. Computed tomography (CT) localizes the nodule within the lung parenchyma, and CT density characteristics sometimes indicate occult calcification that suggests a benign cause (4). Other CT findings, such as spiculation, are strongly associated with malignancy (5). Positron emission tomography (PET) with the glucose analogue 18-fluorodeoxyglucose (FDG) identifies malignant tumors on the basis of their increased metabolic rate. The use of FDG-PET is rapidly gaining acceptance in clinical oncology to diagnose tumors, stage disease, and evaluate treatment response (6, 7). Because FDG-PET is believed to be highly sensitive for identifying malignant nodules, proponents argue that observation with serial chest radiographs is safe when PET results are negative (8).

Management alternatives for patients with pulmonary nodules include surgical resection, transthoracic needle biopsy, and watchful waiting (9). Surgery is the diagnostic gold standard and the definitive treatment for malignant nodules that are resectable, but surgery should be avoided in patients with benign nodules. Needle biopsy often establishes a specific malignant or benign diagnosis, but biopsy is invasive, potentially risky, and frequently nondiagnostic. Watchful waiting avoids unnecessary surgery for benign nodules but may delay diagnosis and treatment of malignant nodules.

We developed a decision analytic model to identify the most effective approaches to diagnose and manage solitary pulmonary nodules. We performed a cost-effectiveness analysis to quantify the health effects and economic costs associated with various management strategies. Because of the considerable recent interest in the use of FDG-PET, we specifically compared diagnostic strategies that used FDG-PET with strategies that did not include this potentially useful but expensive test.


Methods

We performed a cost-effectiveness analysis by following the recommendations of the Panel on Cost-Effectiveness in Health and Medicine for conducting and reporting a reference-case analysis (10). The analysis adopted a societal perspective that would permit comparisons across different health care interventions. We expressed our results in terms of costs, quality-adjusted life-years (QALYs), and incremental cost-effectiveness ratios. All costs and health effects were discounted at an annual rate of 3%. Additional details on our methods, data sources, and results can be found in the Appendix. An electronic decision aid that is based on our results will be available at http://www.annals.org in July 2003.
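As a minimal sketch of what discounting at 3% per year means in practice, the following Python function computes the present value of a stream of annual amounts (costs or QALYs). The function name and the convention that the first year is undiscounted are illustrative assumptions, not details taken from the analysis.

```python
def present_value(annual_amounts, rate=0.03):
    """Discount a stream of yearly amounts back to time zero.

    Assumption (not from the article): the amount at index t accrues
    t years from now, so the amount at index 0 is left undiscounted.
    """
    return sum(x / (1.0 + rate) ** t for t, x in enumerate(annual_amounts))


# Ten years in full health (1 QALY per year), discounted at 3% per year
discounted_qalys = present_value([1.0] * 10)
```

Under these assumptions, ten undiscounted QALYs shrink to somewhat fewer than nine discounted QALYs, which is why later health effects and costs weigh less in the results.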

Clinical Problem

The target population for this analysis was all adult patients with a new, noncalcified solitary pulmonary nodule on chest radiograph and no known extra-thoracic malignancy. Our base-case analysis considered a hypothetical cohort of 62-year-old men and women.

Decision Model Structure and Assumptions

We considered 40 clinically plausible sequences of five diagnostic interventions: CT, FDG-PET, transthoracic needle biopsy, surgery, and watchful waiting (Appendix Figures 1 and 2). We assumed that CT and FDG-PET were never performed after needle biopsy or surgery because performing an imaging test after an invasive diagnostic procedure is unusual. Similarly, needle biopsy and observation were never performed after surgery.



Appendix (Figure 1). Decision model. The square decision node (A) indicates that computed tomography (CT), observation (watchful waiting), surgery, positron emission tomography with 18-fluorodeoxyglucose (FDG-PET), or transthoracic needle biopsy may be selected as the initial diagnostic test. If CT is selected first, the result may be possibly malignant or benign. Observation, surgery, biopsy, or FDG-PET may be the next diagnostic test, depending on the results of CT (B). If FDG-PET is selected as the first test, results may be positive or negative; CT, observation, surgery, or biopsy may be selected as the next test, depending on results (C). Observation, surgery, or biopsy may be selected after both CT and FDG-PET have been performed (D).

 


Appendix (Figure 2). Decision model subtrees. Needle biopsy may result in fatal or nonfatal complications or no complications (biopsy subtree). If no fatal complications occur, the biopsy may be diagnostic or nondiagnostic, depending on whether it yields a specific malignant or benign diagnosis. If the biopsy reveals malignancy, we assumed that surgery would be performed. If the biopsy reveals a specific benign diagnosis, we assumed that the patient would be treated accordingly and monitored with serial chest radiographs. After a nondiagnostic biopsy, surgery or observation may be selected as the next diagnostic option. Surgery may result in fatal or nonfatal complications, or no complication (surgery subtree). At surgery, most malignant nodules will be local-stage lung cancer, but metastases to regional lymph nodes may be detected in some cases. Some nodules will be benign, depending on the prevalence of benign disease in the target population.

 
Needle biopsy was considered to be nondiagnostic unless a specific benign or malignant diagnosis was obtained. We assumed that surgery would be performed if the biopsy revealed malignancy. If the biopsy revealed a specific benign diagnosis, the patient would be managed accordingly. After a nondiagnostic needle biopsy, either surgery or watchful waiting could be selected as the next diagnostic intervention.

A final diagnosis was established at the time of surgery or, alternatively, after 24 months of observation. In the watchful waiting strategy, serial chest radiographs were obtained at 1, 2, 4, and 6 months and every 3 months thereafter. We assumed that surgery would be performed if nodule growth was observed at any time. If no growth was observed after 24 months, we assumed that the nodule was benign.
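The follow-up schedule described above can be written out explicitly. This is only a sketch: the function name is hypothetical, and it assumes that "every 3 months thereafter" runs from month 9 through month 24 inclusive.

```python
def radiograph_schedule(horizon_months=24):
    """Months at which serial chest radiographs are obtained during watchful waiting."""
    early = [1, 2, 4, 6]
    # Assumption: the 3-monthly phase starts at month 9 and includes the final month
    later = list(range(9, horizon_months + 1, 3))
    return [m for m in early if m <= horizon_months] + later


schedule = radiograph_schedule()
```

With a 24-month horizon this gives ten radiographs, the last of which coincides with the point at which an unchanged nodule is assumed to be benign.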

Modeling Long-Term Costs and Clinical Outcomes

We developed a Markov model to estimate long-term outcomes and costs for patients with malignant and benign pulmonary nodules (Appendix Figure 3). The model followed patients in the hypothetical study cohort over their remaining life span. We estimated the monthly probability of cancer recurrence after surgical treatment for patients with malignant nodules by using survival data from the Surveillance, Epidemiology and End Results (SEER) tumor registry (11). We used SEER data and a model of the natural history of untreated lung cancer to estimate the probability of disease progression in patients with malignant nodules who were managed by watchful waiting.
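To illustrate how a monthly transition probability can be derived from registry survival data, the sketch below converts a survival fraction at a given time into a constant monthly event probability and then checks it by running a cohort forward in monthly cycles. The constant-hazard assumption, the function names, and the 60% figure are all illustrative; they are not the estimates or the method used in the model.

```python
def monthly_probability(survival_fraction, years):
    """Constant monthly event probability that reproduces a given survival fraction."""
    months = 12 * years
    return 1.0 - survival_fraction ** (1.0 / months)


def cohort_event_free(monthly_p, months):
    """Fraction of a cohort still event-free after a number of monthly cycles."""
    event_free = 1.0
    for _ in range(months):
        event_free *= 1.0 - monthly_p
    return event_free


# Hypothetical input: 60% event-free at 5 years (not a SEER estimate)
p = monthly_probability(0.60, 5)
check = cohort_event_free(p, 60)  # should recover the 60% input
```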



Appendix (Figure 3). Markov model.

 
Data and Assumptions

We derived estimates for patient and nodule characteristics, diagnostic testing variables, costs, and utilities from clinical and administrative sources (Appendix Table 1).

Patient and Nodule Characteristics

In the base-case analysis, we assumed that the pulmonary nodule measured 2 cm in diameter. We assumed that 12.5% of patients with malignant nodules would have regional lymph node involvement, the median prevalence of mediastinal metastases in eight studies of CT for staging in patients with T1 tumors (12-19). We derived the distribution of tumor growth rates for patients with malignant nodules from the Veterans Administration–Armed Forces Cooperative Study on Asymptomatic Pulmonary Nodules (20).

Pretest Probability

We performed separate analyses for representative patients with low (26%), intermediate (55%), and high (79%) pretest probabilities of malignancy. Although most clinicians assess this intuitively, investigators have developed quantitative models to estimate the probability of cancer. One model that has undergone preliminary validation used logistic regression to identify six independent predictors of malignancy: age, smoking status, history of cancer, nodule diameter, spiculation, and upper lobe location (21). Additional information about this model, including the prediction equation, can be found in the Appendix.

Diagnostic Test Performance

We performed a meta-analysis to estimate the diagnostic accuracy of FDG-PET; our methods and results have been published elsewhere (22). We identified 13 studies of FDG-PET that enrolled 450 patients with pulmonary nodules (23-35). We used the method of Moses and colleagues (36, 37) to construct a summary receiver-operating characteristic (ROC) curve for FDG-PET. For our base-case estimates, we selected an operating point on the ROC curve that corresponded to the median specificity of FDG-PET in the 13 studies. At this point on the curve, sensitivity and specificity for identifying malignancy were 94.2% and 83.3%, respectively.

To estimate diagnostic performance for CT and transthoracic needle biopsy, we searched MEDLINE and applied Moses and colleagues' method to construct summary ROC curves for these tests. For CT, base-case estimates of sensitivity and specificity for identifying malignancy were 96.5% and 55.8%, respectively (2-4, 38-47). In this report, the terms "possibly malignant" and "benign" describe CT results that are positive and negative for malignancy, respectively.
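The way a CT result shifts the probability of malignancy can be sketched with Bayes' theorem, using the base-case CT sensitivity and specificity quoted above and the low and high pretest probabilities from the base-case analysis. The function name is hypothetical, and the decision model itself may handle these calculations differently.

```python
def post_test_probability(pretest, sensitivity, specificity, positive):
    """Probability of malignancy after a single test result, by Bayes' theorem."""
    if positive:
        true_pos = pretest * sensitivity
        false_pos = (1.0 - pretest) * (1.0 - specificity)
        return true_pos / (true_pos + false_pos)
    false_neg = pretest * (1.0 - sensitivity)
    true_neg = (1.0 - pretest) * specificity
    return false_neg / (false_neg + true_neg)


CT_SENS, CT_SPEC = 0.965, 0.558  # base-case CT estimates from the text

# Low pretest probability (26%) with a possibly malignant CT result
low_after_positive_ct = post_test_probability(0.26, CT_SENS, CT_SPEC, positive=True)
# High pretest probability (79%) with a benign CT result
high_after_benign_ct = post_test_probability(0.79, CT_SENS, CT_SPEC, positive=False)
```

Under these inputs the two discordant scenarios land at roughly 43% and 19%, respectively, which shows how a discordant CT result moves both low and high pretest probabilities toward the middle of the range.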

We estimated that CT-guided needle biopsy would not reveal a specific diagnosis in 8% of patients with malignant nodules and 44% of patients with benign nodules (27, 48-55). When fluoroscopic guidance was used, we assumed that the frequency of nondiagnostic biopsy results would be 10% higher (56-58). When needle biopsy revealed a specific benign or malignant diagnosis, we estimated that the false-negative and false-positive rates were 3.7% and 2.0%, respectively (27, 48-55). Base-case estimates of the probability of minor pneumothorax and major pneumothorax requiring chest tube drainage were 24% and 5%, respectively (27, 48-55).

We assumed that video-assisted thoracoscopy would be used to perform surgical biopsy and that the procedure would be converted to a thoracotomy with lobectomy if the frozen section revealed malignancy. We derived estimates for probabilities of fatal and non-fatal surgical complications from sources in the clinical literature (59-67).

Costs

We converted all costs to 2001 U.S. dollars by using the gross domestic product deflator (68, 69). To derive costs for imaging tests and needle biopsy, we added procedure costs and professional fees that were based on Medicare reimbursement rates (70, 71). To estimate costs for surgical procedures and complications, we added professional fees (70) and median cost-adjusted charges from the 1996 Health Care Utilization Project database (72). To estimate long-term costs for patients with local, regional, and distant-stage lung cancer, we analyzed Medicare claims files linked with data from the SEER tumor registry for the years 1990 to 1993 (73). To estimate health care costs for patients with benign nodules and for patients with malignant nodules who survived more than 5 years after diagnosis, we used age-specific, annual health care expenditures from the Consumer Expenditures Study (74).

Utilities

We adjusted life expectancy for quality of life by using age- and sex-specific utilities (preference-based weights for health states) from the Beaver Dam Health Outcomes study (75) and available data to estimate reductions in utility associated with regional and distant-stage lung cancer (76). We also adjusted life expectancy for time spent in the hospital and time spent having diagnostic procedures. When possible, we used data on average length of hospital stay to make these adjustments (71). Because we could not identify studies that measured utilities in patients with undiagnosed pulmonary nodules, we assumed that the relative utility for time spent during observation was normal and we used age- and sex-specific values. To account for the possibility that some patients might be uncomfortable not knowing whether a nodule was benign or malignant, we tested lower values in a sensitivity analysis.

Sensitivity Analysis

One-way, multiway, and probabilistic sensitivity analyses were performed to identify important model uncertainties. When possible, ranges for variables were based on reported or calculated 95% CIs for means and interquartile ranges for medians. For diagnostic accuracy, several points on summary ROC curves and their 95% confidence intervals were evaluated. Clinical judgment was used to determine ranges for utilities. For costs, ranges were determined by adding or subtracting 25% from the base-case estimate. To determine ranges for transition probabilities in the Markov model, 50% was added or subtracted from the base-case value because these estimates were highly uncertain.
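The range rules for costs and transition probabilities can be sketched as follows. The function names are hypothetical, and two details are assumptions: the 50% adjustment is read as relative to the base-case value, and the upper bound of a probability is capped at 1.

```python
def cost_range(base_cost):
    """Sensitivity range for a cost: base-case estimate minus or plus 25%."""
    return (0.75 * base_cost, 1.25 * base_cost)


def transition_probability_range(base_p):
    """Sensitivity range for a Markov transition probability: base case -/+ 50%."""
    # Assumption: relative adjustment, with the upper bound capped at 1.0
    return (0.5 * base_p, min(1.0, 1.5 * base_p))


# Hypothetical inputs, for illustration only
low_cost, high_cost = cost_range(1000.0)
low_p, high_p = transition_probability_range(0.02)
```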

We performed probabilistic sensitivity analysis by stratifying patients according to pretest probability and risk for surgical complications. We assigned logit-normal distributions to all costs and probabilities for all diagnostic test variables by using the method of Doubilet and colleagues (77) and performed 10 000 simulations by randomly sampling values from these distributions. We then recorded the number of simulations in which the strategy under consideration was cost saving (more effective and less expensive than the alternative) or economically attractive (more effective and with an incremental cost < $100 000 per QALY gained). For description of the software used in our analysis, see the Appendix.
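A minimal sketch of the sampling and classification steps follows. It draws a probability from a logit-normal distribution and labels a simulated comparison as cost saving or economically attractive at a $100 000 per QALY threshold. The parameterization is a simplification and is assumed here: the base-case value is treated as the median, and the standard deviation on the logit scale is taken from a 95% range. The bounds of 0.89 and 0.97 are hypothetical; only the 0.942 base-case sensitivity of FDG-PET comes from the text.

```python
import math
import random


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_logit_normal(base, lower, upper, n, rng):
    """Draw n probabilities whose logits are normally distributed.

    Assumption: 'base' is the median and (lower, upper) is a 95% range.
    """
    mu = logit(base)
    sigma = (logit(upper) - logit(lower)) / (2 * 1.96)
    return [inv_logit(rng.gauss(mu, sigma)) for _ in range(n)]


def classify(delta_cost, delta_qalys, threshold=100_000):
    """Label one simulated comparison of a strategy against its alternative."""
    if delta_qalys <= 0:
        return "not more effective"
    if delta_cost < 0:
        return "cost saving"
    if delta_cost / delta_qalys < threshold:
        return "economically attractive"
    return "not attractive"


rng = random.Random(0)  # fixed seed so the sketch is reproducible
draws = sample_logit_normal(0.942, 0.89, 0.97, 10_000, rng)
```

In a full analysis, each draw would be fed through the decision model together with draws for every other variable, and the share of simulations labeled cost saving or economically attractive would be reported.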

Role of the Funding Source

The funding source had no role in the design, conduct, or reporting of the study or in the decision to publish the manuscript.


Results

Because of the complexity of the analysis, we begin by summarizing our major findings. First, we found that the effectiveness and cost-effectiveness of management strategies depended critically on the pretest probability of malignancy and, to a lesser extent, the risk for surgical complications. Second, we found that CT was recommended as the initial test in nearly all circumstances, except when pretest probability was extremely high. Third, while nonselective use of FDG-PET was highly effective for pulmonary nodule diagnosis, we found that it was most cost-effective to use FDG-PET selectively, typically when pretest probability and CT results were discordant. Finally, we found that it was both highly effective and highly cost-effective to use surgery and needle biopsy aggressively once the results of imaging tests were known.

In patients with low pretest probability (26%), watchful waiting was the least effective and least expensive strategy (Table 1). A strategy that used CT but not FDG-PET was much more effective and cost less than $11 000 per QALY gained relative to watchful waiting. However, two strategies that used FDG-PET selectively were even more effective and cost less than $50 000 per QALY gained. In both of these strategies, CT was performed as the initial test and FDG-PET was used when CT results were possibly malignant. Surgery was recommended when the results of FDG-PET were positive, and needle biopsy was recommended when FDG-PET results were negative. Another strategy that used CT and FDG-PET nonselectively in all patients was most effective, but it cost almost $300 000 per QALY gained.


Table 1. Expected Costs, Quality-Adjusted Life-Years, and Incremental Cost-Effectiveness Ratios for Nondominated Strategies in Patients with Low, Intermediate, and High Pretest Probability of Malignancy

 

In patients with intermediate pretest probability (55%), watchful waiting was the least effective and least expensive approach (Table 1). Three strategies that used CT without FDG-PET cost less than $20 000 per QALY gained. In the most effective of these strategies, surgery was performed when CT results were possibly malignant and needle biopsy was performed when CT results were benign. Two strategies that included FDG-PET were more expensive and only marginally more effective than CT-based approaches and therefore cost more than $220 000 per QALY gained.

In patients with high pretest probability (79%), watchful waiting was again the least effective and least expensive approach (Table 1). A strategy that used CT but not FDG-PET was much more effective and cost less than $7000 per QALY gained relative to watchful waiting. Three strategies that used FDG-PET selectively were even more effective and cost less than $70 000 per QALY gained. In all three strategies, surgery was recommended when CT results were possibly malignant and FDG-PET was recommended when CT results were benign. The most effective of these strategies used surgery when FDG-PET results were positive and needle biopsy when FDG-PET results were negative.
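The incremental cost-effectiveness ratios reported for nondominated strategies can be illustrated with a short sketch. It orders strategies by cost, drops dominated ones, and computes each ratio against the next less costly strategy that remains. The strategy labels and all cost and QALY values below are hypothetical and are not taken from Table 1.

```python
def incremental_analysis(strategies):
    """strategies: list of (name, cost, qalys).

    Returns [(name, icer)] for nondominated strategies in order of cost;
    the least costly strategy has an ICER of None.
    """
    ordered = sorted(strategies, key=lambda s: (s[1], -s[2]))
    # Strong dominance: drop strategies no more effective than a cheaper one
    frontier = []
    for s in ordered:
        if not frontier or s[2] > frontier[-1][2]:
            frontier.append(s)
    # Extended dominance: drop a strategy whose ICER exceeds that of the next one
    changed = True
    while changed:
        changed = False
        for i in range(1, len(frontier) - 1):
            prev, here, nxt = frontier[i - 1], frontier[i], frontier[i + 1]
            icer_here = (here[1] - prev[1]) / (here[2] - prev[2])
            icer_next = (nxt[1] - here[1]) / (nxt[2] - here[2])
            if icer_here > icer_next:
                del frontier[i]
                changed = True
                break
    results = [(frontier[0][0], None)]
    for prev, cur in zip(frontier, frontier[1:]):
        results.append((cur[0], (cur[1] - prev[1]) / (cur[2] - prev[2])))
    return results


# Hypothetical strategies: (label, lifetime cost in dollars, QALYs)
example = [("A", 10_000, 10.00), ("B", 12_000, 10.20),
           ("C", 13_000, 10.10), ("D", 20_000, 10.25)]
table = incremental_analysis(example)
```

In this made-up example, strategy C is dominated (more costly and less effective than B), and strategy D buys a small gain in QALYs at a much higher incremental cost, which mirrors the pattern described for nonselective FDG-PET use.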

Sensitivity Analysis

Figure 1 shows the importance of pretest probability in greater detail. Use of FDG-PET cost less than $100 000 per QALY gained when pretest probability was low (10% to 50%) and CT results were possibly malignant or when pretest probability was high (77% to 89%) and CT results were benign. Both of these situations resulted in intermediate post-test probabilities (20% to 69%). Surgery was favored at higher post-test probabilities (≥ 70%), biopsy was preferred at lower post-test probabilities (2% to 20%), and watchful waiting was preferred only at very low post-test probabilities (<2%). Surgery without any preliminary diagnostic testing was preferred when pretest probability was at least 90%. In patients at high risk for surgical complications, FDG-PET strategies cost less than $100 000 per QALY gained when post-test probability ranged between 35% and 84%.
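The post-test probability thresholds above can be summarized as a simple lookup for patients at average risk for surgical complications, assuming a willingness to pay of $100 000 per QALY gained. The function name and the handling of values that fall exactly on a boundary (or between 69% and 70%) are assumptions made for this sketch.

```python
def preferred_next_step(post_test_probability):
    """Preferred step once CT results are known, by post-test probability."""
    # Assumption: each band includes its lower bound and excludes its upper bound
    if post_test_probability < 0.02:
        return "watchful waiting"
    if post_test_probability < 0.20:
        return "needle biopsy"
    if post_test_probability < 0.70:
        return "FDG-PET"
    return "surgery"
```

For example, a post-test probability of about 43% falls in the band where FDG-PET is favored, whereas a value of 80% falls in the surgery band.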



Figure 1. Recommended sequence of diagnostic testing in patients who are at average risk for surgical complications, according to pretest probability and the results of computed tomography (CT). The recommended sequence of tests when CT results are possibly malignant (top) and when CT results are benign (bottom) is shown. Subsequent test selection is shown to be a function of pretest probability and the corresponding post-test probability once the results of CT are known. Note that surgery is preferred when positron emission tomography (PET) results are positive, biopsy is preferred when PET results are negative, and watchful waiting is preferred when biopsy results are nondiagnostic. Recommendations are based on the assumption that society is willing to pay $100 000 per quality-adjusted life-year gained. Results were very similar when willingness to pay was assumed to be $25 000 or $50 000 per quality-adjusted life-year gained. FDG-PET = positron emission tomography with 18-fluorodeoxyglucose.

 

Several other variables affected the choice of strategy for patients with intermediate pretest probability, including the sensitivity (but not the specificity) of CT, the probability of nondiagnostic needle biopsy in patients with malignant nodules, and patient preferences for time spent under observation. Uncertainty exists about the sensitivity of CT, in part because there are no widely accepted CT criteria for determining whether a nodule is possibly malignant or benign. When we assumed that the true sensitivity of CT was less than 92.5% (compared with the base-case value of 96.5%), a strategy that used FDG-PET selectively cost less than $70 000 per QALY gained. In this strategy, FDG-PET was used when CT results were benign and surgery was recommended when CT results were possibly malignant.

There is also uncertainty about the diagnostic yield of needle biopsy (because this depends on operator experience) and patient preferences for time spent in observation. Selective use of FDG-PET (when CT results were benign) cost less than $25 000 per QALY gained when we assumed that the probability of nondiagnostic needle biopsy in patients with malignant nodules was 19% (base-case value of 8%), or when the relative utility of the time spent in observation was assumed to be 0.97 or less (base-case value of 1.00). A relative utility of 0.97 implies that an individual would accept a 3% risk for instant death in order to know whether the nodule was malignant or benign.

The choice of strategy was not affected by varying other model parameters within the ranges tested, including the discount rate, the diagnostic accuracy of FDG-PET, or the cost of diagnostic tests (including FDG-PET). Similarly, the choice of strategy was not affected when we assumed that FDG-PET was 25% less sensitive and 16% more specific when CT results were benign, as has been observed in studies of FDG-PET and CT for mediastinal staging in patients with non–small-cell lung cancer (78).

Probabilistic sensitivity analysis showed that in patients with low and high pretest probability, FDG-PET strategies were cost saving or cost less than $100 000 per QALY gained in 76.7% and 99.9% of all simulations, respectively. For patients with intermediate pretest probability, FDG-PET strategies were cost saving or economically attractive in fewer than 25% of all simulations.

Clinical Recommendations

Figure 2 outlines a clinical algorithm for managing patients with new, noncalcified pulmonary nodules that is based on our results. In patients with low pretest probability (10% to 50%), FDG-PET should be used selectively when CT results are possibly malignant. When FDG-PET results are positive, surgery is both highly cost-effective and slightly more effective than needle biopsy. When FDG-PET results are negative, needle biopsy is more effective than observation. This is because, although uncommon, false-negative results of FDG-PET have potentially serious consequences (such as delayed diagnosis and missed opportunities for curative surgery). When CT results are benign, observation or needle biopsy should be used. Our analysis suggests that the latter approach is slightly more effective.



Figure 2. Suggested algorithm for clinical management of patients with solitary pulmonary nodules who are at average risk for surgical complications. The algorithm pertains to patients with low (10% to 50%), intermediate (51% to 76%), and high (77% to 90%) pretest probability of malignancy. Note that in patients with very low pretest probability (<10%), biopsy is preferred when computed tomography (CT) results are possibly malignant and watchful waiting is preferred when CT results suggest a benign diagnosis. In patients with very high pretest probability (>90%), surgery without diagnostic testing is the preferred strategy. FDG-PET = positron emission tomography with 18-fluorodeoxyglucose.

 

For patients with intermediate pretest probability (51% to 76%), we recommend surgery or needle biopsy when CT results are possibly malignant and needle biopsy or observation when CT results are benign (Figure 2). More aggressive use of surgery and needle biopsy results in slightly better health outcomes and slightly higher costs. The choice between more or less aggressive approaches should depend on factors such as the risk for surgical complications, the expected yield of needle biopsy, and patient preferences. For example, in patients who have severe comorbid conditions that increase the risk of surgery, it may be preferable to establish a malignant diagnosis with needle biopsy before sending the patient to surgery.

For patients with high pretest probability (77% to 90%), we recommend surgery when CT results are possibly malignant, unless the patient is at very high risk for operative complications (Figure 2). Patients should undergo FDG-PET when CT results are benign. When FDG-PET results are positive, surgery should be performed. When FDG-PET results are negative, needle biopsy is marginally more effective than watchful waiting, although some clinicians might prefer observation in this situation.
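The recommendations above, together with the caption of Figure 2, can be sketched as a small function for patients at average risk for surgical complications. This is an illustration rather than a decision aid: the function names are hypothetical, and pretest probabilities that fall between the stated bands (for example, 50.5%) are assigned to the higher band by assumption.

```python
def recommend_after_ct(pretest, ct_possibly_malignant):
    """Suggested next step, given pretest probability and the CT result."""
    if pretest > 0.90:
        # Very high pretest probability: surgery without diagnostic testing
        return "surgery without diagnostic testing"
    if pretest < 0.10:
        return "needle biopsy" if ct_possibly_malignant else "watchful waiting"
    if pretest <= 0.50:  # low pretest probability
        return "FDG-PET" if ct_possibly_malignant else "needle biopsy or observation"
    if pretest < 0.77:  # intermediate pretest probability
        if ct_possibly_malignant:
            return "surgery or needle biopsy"
        return "needle biopsy or observation"
    # high pretest probability (77% to 90%)
    return "surgery" if ct_possibly_malignant else "FDG-PET"


def recommend_after_pet(pet_positive):
    """Suggested step once FDG-PET has been performed."""
    return "surgery" if pet_positive else "needle biopsy"
```

For the three representative patients in the base-case analysis, `recommend_after_ct(0.26, True)` and `recommend_after_ct(0.79, False)` both return FDG-PET, which are the two discordant situations in which selective use is recommended.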


Discussion

Pulmonary nodule diagnosis is challenging because the clinician and patient must consider many factors when discussing management options, including the risks and benefits of several possible diagnostic tests, patient preferences for invasive and noninvasive procedures, and the uncertain consequences of delayed diagnosis when watchful waiting is used as a management strategy. In this analysis, we used quantitative methods to synthesize these and other factors.

Pulmonary nodule diagnosis should always begin with a careful review of the chest radiograph and comparison with previous radiographs. Most experts agree that if a central, laminated, diffuse, or popcorn pattern of calcification is seen, a benign diagnosis is likely and observation with serial radiographs is appropriate (1, 79). Likewise, because doubling times for malignant nodules rarely exceed 700 days, 2-year radiographic stability strongly implies a benign cause. In the absence of benign calcification or documented radiographic stability, the clinician should estimate the pretest probability of malignancy. Once these steps have been taken, our findings can be used to guide subsequent management.
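The link between doubling time and 2-year radiographic stability can be made concrete with a short calculation. The sketch assumes a spherical nodule whose volume grows exponentially, so that diameter scales with the cube root of volume; the function name and the 2-cm starting diameter (the base-case nodule size) are used only for illustration.

```python
def diameter_after(initial_diameter_cm, doubling_time_days, elapsed_days):
    """Diameter of a spherical nodule after exponential volume growth."""
    volume_factor = 2.0 ** (elapsed_days / doubling_time_days)
    # Diameter scales with the cube root of volume for a sphere
    return initial_diameter_cm * volume_factor ** (1.0 / 3.0)


# A 2-cm nodule with a 700-day doubling time, observed for 2 years (730 days)
slow_growth = diameter_after(2.0, 700, 730)
```

Under these assumptions, even a slowly growing malignant nodule would enlarge from 2 cm to roughly 2.5 cm over 2 years, which is why an unchanged nodule over that interval points toward a benign cause.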

We confirmed that CT should be the initial test in the management of nearly all patients with pulmonary nodules. Computed tomography is inexpensive and noninvasive and may be highly specific for identifying some benign nodules. In contrast, FDG-PET as the initial test is never economically attractive. We also found that the choice of subsequent tests depends most critically on the pretest probability of malignancy and the risk for surgical complications. Other potentially important factors include the sensitivity of CT, the probability of nondiagnostic needle biopsy, and patient preferences for time spent in watchful waiting.

Although FDG-PET strategies were highly effective over a wide range of pretest probabilities, these strategies were not necessarily cost-effective. Our results indicate that FDG-PET should be used selectively when pretest probability and CT results are discordant; in these cases, post-test probability will be intermediate. Selective use of FDG-PET limits costs by reducing the total number of FDG-PET studies performed and ensures that FDG-PET is used when the diagnosis is most in doubt. Table 2 summarizes specific recommendations on when to use CT, FDG-PET, watchful waiting, needle biopsy, and surgery.


Table 2. Recommendations on the Use of Computed Tomography, Positron Emission Tomography with 18-Fluorodeoxyglucose, Watchful Waiting, Transthoracic Needle Biopsy, and Surgery

 

Our results support and extend the findings of other studies. In a decision analysis that did not consider costs, Cummings and colleagues (9) found that the choice of strategy depended on the pretest probability of malignancy. Watchful waiting was preferred over biopsy when the probability of cancer was less than 3%, and surgery was preferred over biopsy when the probability of cancer was greater than 68%. Our threshold values for watchful waiting and surgery were very similar to theirs, despite the fact that they made less pessimistic assumptions about the consequences of delayed diagnosis in patients with malignant nodules who were managed by observation. We extend their work by demonstrating that FDG-PET replaces needle biopsy when the post-test probability of malignancy ranges from 20% to 69%. In a recent cost-effectiveness analysis based on reimbursement rates in Germany, Dietlein and colleagues (80) reported that their threshold for performing surgery instead of FDG-PET occurred at a similar probability of 75%. In another cost-effectiveness analysis, results for patients with low pretest probability were similar to ours, although we recommend strategies in patients with intermediate and high pretest probability that these authors did not consider (81).

Our analysis has several limitations. First, the natural history of untreated malignant pulmonary nodules is not known. We modeled the consequences of delayed diagnosis when watchful waiting was used in patients with malignant nodules. Few empirical data exist to validate our model. However, our results were very similar when we adopted Cummings and colleagues' less pessimistic assumptions about the consequences of delayed diagnosis.

Second, our base-case analysis assumed that test performance was conditionally independent, or that the sensitivity and specificity of FDG-PET did not differ depending on the results of CT. Although FDG-PET and CT identify malignant nodules by different mechanisms, their results might be correlated. For example, our group and others have observed that, when used for mediastinal staging in patients with non–small-cell lung cancer, FDG-PET is less sensitive and more specific when CT findings are negative (78, 82). We are not aware of any data on the conditional performance of these tests for pulmonary nodule diagnosis. In fact, most studies of FDG-PET limited enrollment to participants with possibly malignant findings on CT. Thus, our base-case estimates of sensitivity and specificity for FDG-PET best reflect its performance when CT results are possibly malignant, and concerns about conditional test performance do not compromise our recommendation to use FDG-PET selectively when pretest probability is low and CT results are possibly malignant. Still, if FDG-PET is less sensitive and more specific than our base-case estimates when CT results are benign, this could raise doubts about our recommendation to use FDG-PET when pretest probability is high. However, sensitivity analysis showed that our results did not change when we assumed that the sensitivity of FDG-PET was as low as 50% in patients with benign results on CT. Improving the specificity of FDG-PET would only strengthen the recommendation to use it in this setting.

Our analysis did not consider several other potential benefits of FDG-PET imaging. FDG-PET is more accurate than CT for detecting regional lymph node metastases (82, 83), which occur in approximately 12% of patients with T1 lung cancer (12-19), and may also detect occult distant metastases (82). Finally, future advances in FDG-PET technology may improve accuracy or reduce costs. Several groups have recently described an FDG imaging technique that does not require a dedicated PET scanner but rather performs coincidence imaging by using a modified dual-detector γ camera (84-87). While this technique is less expensive than FDG imaging with a dedicated PET scanner, its diagnostic accuracy has not been evaluated in large, well-designed studies.

We conclude that patients with new, noncalcified pulmonary nodules should first be classified according to pretest probability and the risk for surgical complications, after which CT should be performed. For patients who are at average risk for surgical complications, FDG-PET should be used selectively when pretest probability and CT results are discordant. For patients at high risk for surgical complications with low or intermediate pretest probability, FDG-PET should be used when CT results are possibly malignant. In most other circumstances, CT-based strategies result in similar quality-adjusted life expectancy and lower costs.


Appendix

In this appendix, we describe in greater detail the methods and results of our analysis of alternative management strategies for patients with solitary pulmonary nodules. Readers should consult the print version of the manuscript for background information, results of the base-case analysis, results of one-way sensitivity analysis, and a discussion of the results. We focus on describing the assumptions of the Markov model that we used to estimate long-term costs and outcomes for patients with malignant pulmonary nodules. In addition, we provide detailed information regarding studies of diagnostic test performance and a critique of their methods, as well as more information regarding our sources of data for cost and utility estimates. Finally, we present selected results that did not appear in the print version of the manuscript because of space limitations, including complete results of probabilistic sensitivity analysis.

Methods

The target population for this analysis was all adult patients found to have a new, noncalcified solitary pulmonary nodule on chest radiograph and no known extra-thoracic malignancy. We assumed that there was no absolute contraindication to invasive biopsy or surgery, because patients with such contraindications would not be likely to undergo aggressive diagnostic evaluation.

Decision Model Structure and Assumptions

Appendix Figures 1 and 2 illustrate the structure of the decision model. The model compared 40 clinically plausible sequences of five diagnostic interventions: CT, FDG-PET, transthoracic needle biopsy, surgery, and watchful waiting. Appendix Table 2 is a complete list of strategies. We evaluated all plausible sequences of diagnostic tests to compare strategies with the next most effective alternative when making cost-effectiveness comparisons. Comparing an intervention with a suboptimal alternative may result in overestimating the intervention's true cost-effectiveness (10). In addition, strategies that might seem to be counterintuitive often prove to be highly cost-effective under certain conditions.


Appendix Table 2. Alternative Strategies for Management of Patients with Solitary Pulmonary Nodules

 

The order of possible test sequences was unconstrained, with two exceptions. First, CT and FDG-PET were never performed after needle biopsy or surgery, because performing an imaging test after an invasive diagnostic procedure would be unusual. Similarly, needle biopsy and observation were never performed after surgery. Computed tomography, FDG-PET, needle biopsy, surgery, or watchful waiting could be selected as the initial diagnostic intervention (Appendix Figure 1).

We assumed that most biopsies were performed under CT guidance, but fluoroscopic guidance was used in test sequences that did not include CT. We considered needle biopsy to be nondiagnostic unless a specific benign or malignant diagnosis was obtained. We assumed that surgery would be performed if the biopsy revealed malignancy. If the biopsy revealed a specific benign diagnosis, we assumed that the patient would be managed accordingly. After a nondiagnostic needle biopsy, either surgery or watchful waiting could be selected as the next diagnostic intervention (Appendix Figure 2).

A final diagnosis was established at the time of surgery or, alternatively, after 24 months of observation. In the observation (watchful waiting) strategy, serial chest radiographs were obtained at 1, 2, 4, and 6 months, and every 3 months thereafter. We assumed that surgery would be performed if nodule growth was detected at any time. If no growth was observed after 24 months, we assumed that the nodule was benign. It is important to note that the optimal timing of serial radiographs has not been determined. However, in our protocol, imaging was used more frequently than the protocol recommended by the Early Lung Cancer Action Project (ELCAP) investigators, who recommended that CT be performed at 3, 6, 12, and 24 months after identification of nodules that measured less than 1 cm in diameter (88). Furthermore, we assumed that chest radiography had a sensitivity of 100% for detecting growth, defined as one doubling in tumor volume or a change in nodule size from 2 cm to 2.5 cm in diameter. We believe that by making this assumption, the modeled performance of chest radiography compares favorably with the actual performance of CT in everyday practice. In addition, we suspect that chest radiography is more widely used than CT in practice settings for watchful waiting in patients with nodules that measure 2 cm in diameter. However, because of the better spatial resolution of CT and the difficulty in detecting growth in small pulmonary nodules, we believe that CT should be used for watchful waiting in patients with nodules that measure less than 1 cm to 1.5 cm in diameter.
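As an illustration only, the surveillance schedule described above can be enumerated in a few lines of Python. The function name and the choice to include the 24-month visit are assumptions of this sketch, not part of the published model.

```python
def radiograph_schedule(end_month=24):
    """Months at which serial chest radiographs are obtained during watchful waiting."""
    early_visits = [1, 2, 4, 6]
    # After month 6, imaging is repeated every 3 months through the end of observation.
    later_visits = list(range(9, end_month + 1, 3))
    return early_visits + later_visits


# ELCAP follow-up times cited in the text for nodules smaller than 1 cm (CT, not radiography).
ELCAP_SCHEDULE = [3, 6, 12, 24]

if __name__ == "__main__":
    print(radiograph_schedule())
    print(len(radiograph_schedule()), "visits versus", len(ELCAP_SCHEDULE), "in the ELCAP protocol")
```

Counting the visits in each list makes the comparison with the ELCAP protocol concrete.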

Modeling Long-Term Costs and Clinical Outcomes

We developed a state-transition (Markov) model to estimate long-term outcomes and costs for patients with malignant and benign pulmonary nodules (Appendix Figure 3). The model followed individual patients in the hypothetical study cohort over their remaining life span. Individuals were assumed to make transitions from one health state to another over time. Before the time of diagnosis, all patients were considered to be in the "unknown" health state, reflecting the unknown nature of the diagnosis. Within this state, patients with malignant nodules who were managed by watchful waiting were at risk for disease progression from local to regional disease and from regional to distant disease during the observation period. At the time of diagnosis, patients were assumed to transition from the "unknown" state to one of three other health states ("benign," "local," or "regional"), depending on the diagnosis and stage of disease. We assumed that all patients with malignant nodules eventually underwent surgery, although surgery was inevitably delayed in patients whose management strategy included watchful waiting. After surgery, all patients with local-stage malignant disease were at risk for recurrence. Similarly, patients with regional-stage disease were at risk for disease progression. Patients who remained in the "local" and "regional" health states for 5 years after the time of diagnosis were considered to be free of cancer (89). After this time, we assumed that life expectancy was normal for age. We assumed that all patients with distant-stage lung cancer eventually died of their cancer, if they did not die first of some other cause.

Determining Markov Model Transition Probabilities

To estimate the monthly probability of cancer recurrence for patients with malignant pulmonary nodules, we constructed survival curves for 1207 Medicare beneficiaries with surgically treated, local-stage, malignant pulmonary nodules (T1N0M0) from the SEER tumor registry for 1990 to 1993 (11). We assumed that some patients would have recurrent disease and then die of cancer and that the remainder would eventually die of other causes. We estimated the monthly probability of death from recurrent lung cancer by using SEER tumor registry data for 10 835 Medicare beneficiaries with distant-stage disease. We determined the monthly probability of death from other causes by using age-specific values from 1996 U.S. life tables (90). We assumed that the probability of recurrence decreased gradually over time. We then identified monthly probabilities of recurrence that produced a modeled survival curve that most closely approximated observed survival in the SEER cohort. We fit this curve by minimizing the sum of the squared differences between points representing the probability of survival at years 1 through 5. Appendix Figure 4 shows the survival curve for the observed cohort and modeled survival when probabilities of recurrence were set at our base-case values (Appendix Table 1).



Appendix Figure 4. Observed and modeled survival for patients with local, regional, and distant lung cancer. Survival curves for patients with pathologically staged local lung cancer (T1N0M0), pathologically staged regional lung cancer (any T N1–3 M0), and distant lung cancer (any T any N M1) are from the linked Medicare claims–Surveillance, Epidemiology and End Results (SEER) tumor registry. Modeled survival was based on Markov transition probabilities. For patients with local and regional lung cancer, modeled survival closely approximated observed survival.

 

We used an identical procedure to estimate the monthly probability of progression from regional to distant-stage lung cancer. We approximated a survival curve for a cohort that included 1954 Medicare enrollees with pathologically confirmed, regional-stage lung cancer (any T N1–3 M0) from the SEER registry. Appendix Figure 4 shows observed and modeled survival when probabilities of progression from regional to distant disease were set at our base-case values (Appendix Table 1).

Determining the Probability of Disease Progression during Watchful Waiting

We used similar methods to model the monthly probability of disease progression during watchful waiting. We assumed that monthly probabilities for disease progression depended on the doubling time of the nodule, a measure of the tumor growth rate. We used data from the Veterans Administration–Armed Forces Cooperative Study on Asymptomatic Pulmonary Nodules to estimate the distribution of doubling times for malignant nodules (20). The mean doubling time was 5.24 months (median, 4 months).

To predict life expectancy for patients with malignant nodules with different doubling times, we adopted a simple model of the natural history of lung cancer (91-93). The model assumes that a tumor starts as a single cell that measures 10 microns in diameter and doubles in volume at a constant rate. Under these assumptions, a nodule that measures 2 cm in diameter has doubled in volume 33 times. It is further assumed that death occurs, on average, after 40 tumor doublings when the diameter of the tumor measures 10 cm. The model predicts that in the absence of treatment, life expectancy is 36.7 months for a patient with a 2-cm nodule that doubles in volume every 5.24 months. Because empirical data on the natural history of untreated malignant nodules are lacking, we validated this prediction by surveying a group of academic clinicians for their expert opinion. We asked a convenience sample of 21 internists, pulmonary specialists, and thoracic surgeons to estimate the life expectancy of an otherwise healthy, 62-year-old man with a 2-cm malignant pulmonary nodule who declined treatment, assuming that the growth rate of the nodule was average. Estimated mean life expectancy (±SD) was 35.12 ± 16.33 months (median, 32 months), which closely agreed with what the model predicted (Gould MK. Unpublished data).
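The arithmetic behind this natural history model can be checked with a short Python sketch. The function names are hypothetical, and rounding the doubling count to a whole number is an assumption made here so that the result matches the whole-number counts (33 and 40) quoted in the text.

```python
import math


def volume_doublings(diameter_cm, cell_diameter_cm=0.001):
    """Volume doublings needed for a single 10-micron cell to reach the given diameter."""
    # Volume scales with the cube of diameter, so the volume ratio is (d / d0) ** 3
    # and the number of doublings is log2 of that ratio.
    return 3 * math.log2(diameter_cm / cell_diameter_cm)


def untreated_life_expectancy_months(diameter_cm, doubling_time_months,
                                     doublings_at_death=40):
    """Months until the assumed lethal tumor burden is reached, at a constant doubling time."""
    # Assumption of this sketch: round to whole doublings, as the text does.
    current = round(volume_doublings(diameter_cm))
    return (doublings_at_death - current) * doubling_time_months


if __name__ == "__main__":
    print(round(volume_doublings(2.0)))    # doublings for a 2-cm nodule
    print(round(volume_doublings(10.0)))   # doublings for a 10-cm tumor
    print(untreated_life_expectancy_months(2.0, 5.24))
```

With 33 doublings completed and 40 assumed at death, 7 doublings remain, and 7 × 5.24 months gives the 36.7 months reported above.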

We used the declining exponential approximation of life expectancy (DEALE) to construct survival curves for patients who had nodules with different doubling times (94, 95). The DEALE assumes that survival is approximated by a simple declining exponential function. Under the assumptions of the DEALE, life expectancy is the reciprocal of the average compound mortality rate. A life expectancy of 36.7 months corresponds to a constant mortality rate of 0.327 per year. Predicted survival (S) at time (t) is given by the formula $S = e^{-rt}$, where r is the mortality rate and t is measured in years. For example, when r = 0.327, the 1-year survival rate is 72.1%, and the 5-year survival rate is 19.5%.
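The DEALE figures quoted above can be reproduced with a minimal Python sketch. The function names are hypothetical. The only inputs are the 36.7-month life expectancy and the declining exponential survival function described in the text.

```python
import math


def deale_rate_per_year(life_expectancy_months):
    """Constant annual mortality rate: the reciprocal of life expectancy in years."""
    return 12.0 / life_expectancy_months


def deale_survival(rate_per_year, years):
    """Declining exponential survival, S = exp(-r * t), with t in years."""
    return math.exp(-rate_per_year * years)


if __name__ == "__main__":
    r = deale_rate_per_year(36.7)
    print(f"rate per year: {r:.3f}")
    for t in (1, 5):
        print(f"{t}-year survival: {deale_survival(r, t):.1%}")
```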

To determine the monthly probability of disease progression, we assumed that untreated lung cancer progresses sequentially from local to regional to distant disease and then to death. We assumed that the transition probabilities for local to regional disease and regional to distant disease were equal. We determined the monthly probability of death from distant lung cancer and the probability of death from other causes by using data from the SEER tumor registry and U.S. life tables, respectively. We then identified transition probabilities that produced survival curves that most closely approximated the curves that we obtained by using the natural history model and the assumptions of the DEALE. To fit these curves, we minimized the sum of the squared differences between points representing the probability of survival at years 1 through 5. In the case of a malignant nodule with a doubling time of 5.24 months, we calculated that the monthly probability of progression from local to regional disease during the observation period was 8.4%. Appendix Figure 5 shows the distribution of doubling times for malignant pulmonary nodules and the corresponding transition probabilities that we derived.
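The fitting step can be sketched as a small cohort simulation with a one-parameter grid search. This is not the authors' implementation: the grid search, the function names, and the values used for the monthly probability of death from distant disease (0.10) and from other causes (0.001) are placeholders chosen for illustration, so the fitted value will not reproduce the 8.4% reported above.

```python
import math


def modeled_survival(p_progress, p_die_distant, p_die_other, months=60):
    """Fraction of the cohort alive each month for local -> regional -> distant -> dead."""
    local, regional, distant = 1.0, 0.0, 0.0
    survival = [1.0]
    for _ in range(months):
        alive = 1.0 - p_die_other  # other-cause mortality applies in every state
        new_local = local * (1 - p_progress) * alive
        new_regional = (local * p_progress + regional * (1 - p_progress)) * alive
        new_distant = (regional * p_progress + distant * (1 - p_die_distant)) * alive
        local, regional, distant = new_local, new_regional, new_distant
        survival.append(local + regional + distant)
    return survival


def sum_squared_error(p_progress, target_rate, p_die_distant, p_die_other):
    """Squared differences from the DEALE target curve at years 1 through 5."""
    surv = modeled_survival(p_progress, p_die_distant, p_die_other)
    return sum((surv[12 * y] - math.exp(-target_rate * y)) ** 2 for y in range(1, 6))


def fit_progression(target_rate, p_die_distant, p_die_other):
    """Grid search for the monthly progression probability with the smallest error."""
    grid = [i / 1000 for i in range(1, 500)]
    return min(grid, key=lambda p: sum_squared_error(p, target_rate, p_die_distant, p_die_other))


if __name__ == "__main__":
    # Placeholder inputs: 0.327 per year is the DEALE rate from the text; the rest are hypothetical.
    print(fit_progression(0.327, p_die_distant=0.10, p_die_other=0.001))
```

The same progression probability is used for both the local-to-regional and regional-to-distant transitions, mirroring the equality assumption stated above.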



Appendix Figure 5. Distribution of tumor doubling times and corresponding probabilities of disease progression during the observation period. The frequency plot demonstrates the distribution of observed doubling times for 67 pulmonary nodules and mass lesions from the Veterans Administration–Armed Forces Cooperative Study on Asymptomatic Pulmonary Nodules (20). The monthly probability of disease progression (black circles) was assumed to be a function of the tumor doubling time.

 

Other investigators have used a different approach to estimate the negative consequences of delayed diagnosis in patients with malignant nodules who were managed by observation. Cummings and colleagues (9) observed that there was a linear relationship between tumor size and survival after resection of malignant lung tumors. The relationship that they observed was described by the equation $r = 0.039d - 0.0145$, where r is the disease-specific mortality rate (which is assumed to be constant over time) and d is the diameter of the nodule in centimeters. Like us, they assumed that growth would be detected when the nodule doubled once in volume (for example, in a patient with a nodule that measured 2 cm in diameter, growth would be detected when the nodule measured 2.5 cm in diameter). Under these assumptions, they estimated that 5-year survival would be reduced by 5% in patients who were managed by observation at some point in their evaluation. Using the same set of assumptions, Gambhir and colleagues (81) calculated that life expectancy for a 64-year-old man with a 2.5-cm nodule would be reduced by 14%, from 6.62 years to 5.67 years, if diagnosis and treatment were delayed by one doubling time. The main limitation of these previous analyses is that the linear relationship between tumor size and survival was derived from studies in which the definition of pulmonary nodules included lesions that measured up to 6 cm in diameter (96, 97). In one of the studies, almost 40% of patients had pulmonary masses 3.5 cm to 6 cm in diameter (98). Recent evidence suggests that nodule diameter may not predict survival within the subgroup of patients with malignant nodules that measure no more than 3 cm in diameter, who are the focus of our analysis (99).
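A brief Python sketch shows how one volume doubling translates into diameter and how the linear relationship attributed to Cummings and colleagues maps diameter to a mortality rate. The function names are hypothetical, and the sketch does not attempt to reproduce the 5% and 14% reductions cited above, which rest on further assumptions not restated here.

```python
def diameter_after_doublings(diameter_cm, n_doublings=1):
    """Diameter after n volume doublings; diameter grows with the cube root of volume."""
    return diameter_cm * 2 ** (n_doublings / 3)


def cummings_mortality_rate(diameter_cm):
    """Disease-specific mortality rate from the linear relationship r = 0.039 * d - 0.0145."""
    return 0.039 * diameter_cm - 0.0145


if __name__ == "__main__":
    d0 = 2.0
    d1 = diameter_after_doublings(d0)
    print(f"diameter after one doubling: {d1:.2f} cm")
    print(f"rate at {d0:.1f} cm: {cummings_mortality_rate(d0):.4f}")
    print(f"rate at {d1:.2f} cm: {cummings_mortality_rate(d1):.4f}")
```

One doubling in volume multiplies the diameter by the cube root of 2 (about 1.26), which is why a 2-cm nodule is described as growing to roughly 2.5 cm.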

Data and Assumptions

We derived estimates for patient and nodule characteristics, diagnostic testing variables, costs, and utilities from clinical and administrative data sources (Appendix Table 1). Estimates of the diagnostic accuracy of FDG-PET imaging were obtained from a published meta-analysis, whose methods and results are summarized below (22). A single reviewer evaluated most studies of CT, needle biopsy, and surgery. For these tests, we used a descriptive approach to highlight aspects of study quality.

Patient and Nodule Characteristics. Our base-case analysis considered a hypothetical cohort of 62-year-old men and women because in eight recent studies of FDG-PET for pulmonary nodule diagnosis, roughly 60% of the participants were men and the mean age was 61.8 years (25-28, 30, 31, 33, 35). The base-case analysis assumed that the pulmonary nodule measured 2 cm in diameter. We assumed that 12.5% of patients with malignant nodules would have regional lymph node involvement, based on the median prevalence of mediastinal metastases in eight studies of CT for mediastinal staging in patients with stage T1 bronchogenic carcinoma (12-19). The probability that a benign nodule would grow in the first month was 28%, based on the median proportion of benign nodules that were caused by an acute or subacute inflammatory process in 10 recent studies of FDG-PET for pulmonary nodule diagnosis (23, 25-28, 30, 32-35). Because most benign nodules grow rapidly or not at all, the monthly probability that a benign nodule would grow in subsequent months was assumed to be 0.5%.

Diagnostic Test Performance: FDG-PET. To determine the diagnostic accuracy of FDG-PET, we performed a meta-analysis. Although our methods and results have been published elsewhere (22), we summarize them below. Our computerized search strategy is outlined in Appendix Table 3. We identified 13 studies of FDG-PET that enrolled 450 patients with pulmonary nodules (23-35). The number of participants ranged between 19 and 100, and the mean age of participants ranged between 58 and 71 years (Appendix Table 4). The median prevalence of malignancy was 65.8% (range, 46% to 79%). Six studies limited enrollment to participants with pulmonary nodules (25, 30-32, 34, 35). Seven other studies enrolled more heterogeneous groups of patients with pulmonary nodules and larger mass lesions but provided separate results for participants with pulmonary nodules (23, 24, 26-29, 33). We only used data from participants with pulmonary nodules to perform quantitative analyses.


Appendix Table 3. MEDLINE Search for Studies of Positron Emission Tomography with 18-Fluorodeoxyglucose

 

Appendix Table 4. Studies of Positron Emission Tomography with 18-Fluorodeoxyglucose for Pulmonary Nodule Diagnosis

 

To identify high-quality studies, we adapted criteria for methodologic quality proposed by Kent and colleagues (100), who evaluated imaging tests to diagnose lumbar spinal stenosis. These criteria have also been used to assess the quality of studies of polymerase chain reaction to diagnose HIV infection (37, 101). The revised criteria cover seven dimensions: technical quality of the index test, technical quality of the reference test, independence of test interpretation, description of the study sample, cohort assembly, sample size, and unit of data analysis. Eleven of 13 studies satisfied at least 70% of our study quality criteria (24-33, 35). The other two studies satisfied between 50% and 69% of our criteria (23, 34). Most studies met our criteria for the technical quality of FDG-PET, although three studies administered doses of FDG that were lower than recommended (34, 35, 102), and three studies did not specify whether participants were examined in the fasting state (23, 25, 27). All studies but one adequately described reference tests that were used to confirm the presence of malignancy or to establish a benign diagnosis (23). Three studies did not report whether FDG-PET readers were blinded to the results of the reference test (23, 30, 34), and several studies did not indicate whether FDG-PET readers were blinded to the clinical characteristics of the participants and other radiographic data. All studies prospectively enrolled a relevant cohort of participants, and all but two studies indicated that the individual patient was the unit of the analysis (27, 35).

To quantitatively summarize study results, we used a meta-analytic method to construct a summary ROC curve for FDG-PET (36, 37). The ROC curves illustrate the tradeoff between sensitivity and specificity as the threshold for defining a positive test result varies from most stringent to least stringent. Our meta-analytic method rests on the assumption that individual study estimates of sensitivity and specificity represent unique points on a common ROC curve.

For each study, we constructed 2 × 2 contingency tables in which all participants were classified as being FDG-PET–positive or –negative and as having a malignant or benign pulmonary nodule. We calculated the true-positive rate (TPR = sensitivity), the false-positive rate (FPR = 1 − specificity), and the log odds ratio (log odds TPR − log odds FPR). The log odds ratio is a measure of diagnostic test performance that accounts for the fact that the TPR and FPR are positively correlated. Next, we logistically transformed the TPR and FPR and fit a summary ROC curve with linear regression, by using the log odds ratio as the dependent variable and an implied function of the test threshold (log odds TPR + log odds FPR) as the independent variable (36).
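The regression described above can be sketched in Python as follows. This is an illustrative outline rather than the code used in the meta-analysis: the study counts are invented, the function names are hypothetical, and adding 0.5 to every cell of a table that contains a zero is a common convention assumed here, not a value stated in the text.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def log_odds_pair(tp, fn, fp, tn, correction=0.5):
    """Return (D, S): D = logit(TPR) - logit(FPR) and S = logit(TPR) + logit(FPR)."""
    if 0 in (tp, fn, fp, tn):
        # Assumed continuity correction so the logit stays finite.
        tp, fn, fp, tn = (x + correction for x in (tp, fn, fp, tn))
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return logit(tpr) - logit(fpr), logit(tpr) + logit(fpr)


def fit_sroc(points):
    """Ordinary least squares fit of D = intercept + slope * S."""
    n = len(points)
    mean_d = sum(d for d, _ in points) / n
    mean_s = sum(s for _, s in points) / n
    sxx = sum((s - mean_s) ** 2 for _, s in points)
    sxy = sum((s - mean_s) * (d - mean_d) for d, s in points)
    slope = sxy / sxx
    return mean_d - slope * mean_s, slope


def sroc_sensitivity(intercept, slope, specificity):
    """Sensitivity on the fitted summary ROC curve at a chosen specificity."""
    u = logit(1.0 - specificity)
    # Solving D = intercept + slope * S for logit(TPR), with D = v - u and S = v + u.
    v = (intercept + (1.0 + slope) * u) / (1.0 - slope)
    return 1.0 / (1.0 + math.exp(-v))


# Invented (tp, fn, fp, tn) counts, for illustration only.
HYPOTHETICAL_STUDIES = [(45, 5, 6, 24), (30, 0, 3, 17), (58, 4, 5, 20), (20, 2, 0, 12)]

if __name__ == "__main__":
    a, b = fit_sroc([log_odds_pair(*study) for study in HYPOTHETICAL_STUDIES])
    print(a, b, sroc_sensitivity(a, b, specificity=0.833))
```

When the fitted slope is close to zero, the curve is symmetric and the intercept alone serves as a common log odds ratio.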

A limitation of this method is that the transformation requires the use of a correction factor when the 2 × 2 table for a study contains one or more zero values (that is, when reported sensitivity or specificity is perfect). An advantage of the method is that it provides a statistical test of the hypothesis that the variance in the group with malignant disease and the variance in the group without malignant disease are equal. The variances are not equal when the slope of the regression line is significantly different from zero. When the slope is not significantly different from zero, the resulting ROC curve is symmetrical and can be described by a common or summary log odds ratio. When this condition was met, we used the Mantel–Haenszel method for pooling odds ratios because this method does not require a correction factor (103). The two methods produced nearly identical results. Estimates of uncertainty were derived by using the Mantel–Haenszel method and were expressed in terms of 95% CIs.
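For readers who want to see why the Mantel–Haenszel estimator tolerates zero cells, a minimal Python sketch follows. It implements the standard Mantel–Haenszel pooled odds ratio. The tuple ordering (tp, fn, fp, tn), the function name, and the example counts are assumptions of this sketch.

```python
def mantel_haenszel_odds_ratio(tables):
    """Pooled diagnostic odds ratio across 2 x 2 tables given as (tp, fn, fp, tn)."""
    numerator = 0.0
    denominator = 0.0
    for tp, fn, fp, tn in tables:
        n = tp + fn + fp + tn
        # Each study contributes weighted cross-products, so a zero cell in one
        # study simply contributes zero to one of the sums.
        numerator += tp * tn / n
        denominator += fp * fn / n
    return numerator / denominator


if __name__ == "__main__":
    # Invented counts; the second table has perfect sensitivity (fn = 0).
    print(mantel_haenszel_odds_ratio([(90, 10, 20, 80), (30, 0, 5, 15)]))
```

The estimator is undefined only if every study has a zero in the false-positive or false-negative cell, because the denominator is then zero.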

In the 13 studies, the mean sensitivity and specificity were 93.9% and 85.8%, and the median sensitivity and specificity were 98.0% (interquartile range, 90% to 100%) and 83.3% (interquartile range, 80% to 100%). Appendix Figure 6 displays the summary ROC curve for FDG-PET. For our base-case estimates, we selected an operating point on the ROC curve that corresponded to the median specificity of FDG-PET in the 13 studies. We used the median specificity because we wanted to estimate where FDG-PET operates in current practice. Other approaches are possible, but not necessarily better. At this point on the ROC curve, sensitivity and specificity were 94.2% (95% CI, 89.1% to 97.0%) and 83.3%, respectively (22).
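On a symmetric summary ROC curve, a single odds ratio links sensitivity to specificity, so fixing specificity determines the operating point. The Python sketch below illustrates that relationship. The odds ratio is not reported in the text; here it is back-calculated from the published operating point (94.2% and 83.3%) purely for illustration, and the function names are hypothetical.

```python
def odds(p):
    return p / (1.0 - p)


def implied_odds_ratio(sensitivity, specificity):
    """Diagnostic odds ratio implied by one point on a symmetric ROC curve."""
    return odds(sensitivity) / odds(1.0 - specificity)


def sensitivity_at(odds_ratio, specificity):
    """Sensitivity on a symmetric ROC curve with the given common odds ratio."""
    tpr_odds = odds_ratio * odds(1.0 - specificity)
    return tpr_odds / (1.0 + tpr_odds)


if __name__ == "__main__":
    common_or = implied_odds_ratio(0.942, 0.833)
    print(f"implied odds ratio: {common_or:.1f}")
    for spec in (0.80, 0.833, 0.90):
        print(f"specificity {spec:.3f} -> sensitivity {sensitivity_at(common_or, spec):.3f}")
```

Moving along the curve toward higher specificity lowers sensitivity, which is why the choice of operating point matters for the base-case estimates.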



Appendix Figure 6. Summary receiver-operating characteristic (ROC) curve for positron emission tomography with 18-fluorodeoxyglucose (FDG-PET). The ROC curves illustrate the tradeoff between sensitivity and specificity as the threshold that defines a positive test result varies from most stringent to least stringent. The ROC curve for FDG-PET is shown with 95% CIs (dotted lines). Black diamonds represent individual study estimates of sensitivity and specificity. Four studies reported perfect sensitivity and specificity (black square). The point on the summary ROC curve that corresponds to the median specificity reported in 13 studies of FDG-PET for pulmonary nodule diagnosis is shown (black circle). At this point, sensitivity and specificity were 94.2% and 83.3%, respectively.

 

Diagnostic Test Performance: CT. To determine the diagnostic accuracy of CT, we searched MEDLINE for English-language studies published before January 2000 by combining the MeSH term tomography, X-ray computed with a list of MeSH terms and keywords for lung cancer and pulmonary nodules. In addition, we scanned the reference lists of retrieved studies and review articles. We updated this literature search in November 2001. We identified 18 studies of CT for the diagnosis of focal pulmonary lesions (Appendix Tables 5, 6, and 7) (2-4, 38-47, 104-108). Interpreting the literature on CT is challenging because several different techniques have been described, including noncontrast CT, CT densitometry, high-resolution CT, and CT with nodule enhancement. In addition, many of the studies were performed more than 10 to 20 years ago.


Appendix Table 5. Studies of Computed Tomography Densitometry for Diagnosis of Pulmonary Nodules and Mass Lesions

 

Appendix Table 6. Studies of High-Resolution Computed Tomography for Pulmonary Nodule Diagnosis

 

Appendix Table 7. Studies of Dynamic Computed Tomography with Nodule Enhancement

 

We identified nine studies that evaluated noncontrast CT or CT densitometry (4, 38-45), two studies that evaluated high-resolution CT (46, 47), and one study that evaluated both CT densitometry and high-resolution CT (2) (Appendix Tables 5 and 6). Computed tomography densitometry attempts to identify benign nodules on the basis of increased density characteristics that suggest occult calcification. In past years, a reference "phantom" was used to account for inter- and intrascanner differences in measuring nodule density, but this is no longer used in clinical practice. More recent studies of thin-section (high-resolution) CT have used different criteria to distinguish benign from malignant nodules (Appendix Table 6). One study used a set of criteria that proved to be sensitive but not specific for identifying malignancy (47), while another study used criteria that were more specific and less sensitive (46).

Studies of noncontrast CT and high-resolution CT enrolled between 35 and 720 participants, and most pulmonary lesions measured 3 cm or less in diameter (Appendix Tables 5 and 6). The prevalence of malignancy varied greatly, ranging between 15% and 78%. All but two studies reported the technical characteristics of the CT examinations in detail (43, 45). However, only three studies reported prospective enrollment of participants (2, 45, 47), and only one study reported that CT readers were blinded to the final diagnosis (47). In the other studies, these aspects of study design were not mentioned. All studies used acceptable reference tests to confirm a diagnosis of malignancy. Most studies required at least 18 months of clinical and radiographic follow-up for confirmation of benign disease without histologic proof, but four studies permitted shorter follow-up periods (38-40, 45).

Several studies have examined functional or dynamic imaging with CT (3, 104-108). In these studies, dynamic enhancement of lung nodules with iodinated contrast material is thought to identify increased vascularity that is strongly associated with malignancy (Appendix Table 7). In a recent multicenter study involving 356 participants, Swensen and colleagues (108) found that absence of enhancement strongly predicted a benign cause. The sensitivity and specificity of dynamic CT for identifying malignancy were 98% and 58%, respectively. Although this test is extremely promising, it is not widely used outside of research settings. In addition, it is used only when the nodule is radiographically indeterminate (for example, when thin-section CT shows no evidence of calcification). We chose to derive estimates for the sensitivity and specificity of CT from studies of CT densitometry and high-resolution CT, despite their limitations, because we aimed to examine the incremental benefits and costs of adding FDG-PET imaging to diagnostic strategies in current use. However, we also examined the potential role of dynamic CT with nodule enhancement in a sensitivity analysis.

In 12 studies of noncontrast CT and high-resolution CT, the mean sensitivity and specificity were 92.5% and 53.6% and the median sensitivity and specificity were 99.2% (interquartile range, 91% to 100%) and 55.8% (interquartile range, 45% to 60%). To derive base-case estimates of sensitivity and specificity, we used the same meta-analytic method that we used for studies of FDG-PET to construct a summary ROC curve for CT (Appendix Figure 7). Because the variance in the group with malignant disease and the variance in the group with benign disease were not equal, we used the method of Moses and colleagues (36) to obtain our base-case values and estimates of uncertainty. For our base-case estimates, we selected an operating point on the ROC curve that corresponded to the median specificity in the studies. At this point on the ROC curve, the sensitivity and specificity for identifying malignant nodules were 96.5% (CI, 80.9% to 99.5%) and 55.8%, respectively. We obtained similar results when we restricted our analysis to more recent studies of high-resolution CT. In this analysis, sensitivity and specificity were 96.5% (CI, 82.9% to 99.4%) and 58.5%, respectively.



Appendix Figure 7. Summary receiver-operating characteristic (ROC) curve for computed tomography (CT). The ROC curve for CT is shown with 95% CIs (dotted lines). Black diamonds represent individual study estimates of sensitivity and specificity. The black circle represents the point on the summary ROC curve that corresponds to the median specificity reported in 12 studies of noncontrast CT and high-resolution CT for pulmonary nodule diagnosis. At this point, sensitivity and specificity were 96.5% and 55.8%, respectively. Note that the summary ROC curve for CT is not symmetrical.

 

In the radiology literature, the term "indeterminate" designates a positive CT result and the term "benign" designates a negative CT result. Because "indeterminate" is a potentially confusing term, we use the phrase "possibly malignant" to describe indeterminate, or positive, results. Benign nodules typically have smooth borders and diffusely increased density, suggesting the presence of calcification. It is important to note that this definition implies that most pulmonary nodules will be indeterminate (or possibly malignant) by CT criteria.

Conditional Performance of FDG-PET and CT. In the base-case analysis, we assumed that the results of FDG-PET and CT were conditionally independent; that is, that the sensitivity and specificity of FDG-PET were the same regardless of the results of CT. This implies that, among patients with the same disease status, the test results are not correlated. This is plausible because CT and FDG-PET characterize nodules by distinct mechanisms: FDG-PET identifies malignant nodules on the basis of increased glucose uptake and metabolism, while CT sometimes identifies benign lesions on the basis of density characteristics that suggest calcification. However, in previous work that examined imaging tests for mediastinal lymph node staging in patients with non–small-cell lung cancer, we found that the sensitivity and specificity of FDG-PET for identifying mediastinal metastases depended on the results of CT (78). More specifically, FDG-PET was 25% less sensitive and 16% more specific when CT revealed no lymph node enlargement relative to when CT detected enlarged nodes. Nevertheless, this relationship may not hold true for pulmonary nodule diagnosis because CT identifies lymph node metastases based on size criteria, rather than by density characteristics, as is true for pulmonary nodules.
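Under the independence assumption, the probability of any pair of CT and FDG-PET results within a disease group is the product of the individual test probabilities, so the post-test probability of malignancy can be computed directly. The sketch below illustrates this calculation. The CT values are the base-case estimates given in this appendix; the FDG-PET values and the pretest probability are hypothetical placeholders, because the FDG-PET estimates are reported elsewhere in the article.

```python
CT_SENS, CT_SPEC = 0.965, 0.558   # base-case CT estimates from the text
PET_SENS, PET_SPEC = 0.90, 0.80   # hypothetical placeholders, not the article's values


def result_likelihoods(sens, spec, positive):
    """Return (P(result | malignant), P(result | benign)) for one test result."""
    if positive:
        return sens, 1.0 - spec
    return 1.0 - sens, spec


def posterior_two_tests(pretest, ct_positive, pet_positive,
                        ct=(CT_SENS, CT_SPEC), pet=(PET_SENS, PET_SPEC)):
    """Probability of malignancy after CT and FDG-PET, assuming the two tests
    are independent within the malignant group and within the benign group."""
    ct_m, ct_b = result_likelihoods(ct[0], ct[1], ct_positive)
    pet_m, pet_b = result_likelihoods(pet[0], pet[1], pet_positive)
    joint_m = ct_m * pet_m   # P(both results | malignant)
    joint_b = ct_b * pet_b   # P(both results | benign)
    return pretest * joint_m / (pretest * joint_m + (1.0 - pretest) * joint_b)


pretest = 0.50  # hypothetical pretest probability of malignancy
for ct_pos in (True, False):
    for pet_pos in (True, False):
        p = posterior_two_tests(pretest, ct_pos, pet_pos)
        print(f"CT {'+' if ct_pos else '-'}, PET {'+' if pet_pos else '-'}: "
              f"P(malignant) = {p:.3f}")
```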

Although data on the conditional performance of FDG-PET and CT for mediastinal staging have been published, we could not identify any studies that examined the conditional performance of FDG-PET and CT to diagnose pulmonary nodules. In fact, most studies of FDG-PET for pulmonary nodule diagnosis excluded participants with nodules that appeared benign on CT and limited enrollment to participants with CT findings that were possibly malignant. Thus, our base-case estimates of sensitivity and specificity apply directly when CT results are indeterminate and less well when CT results are benign. However, to explore the potential importance of the independent test assumption, we performed a sensitivity analysis in which we assumed that the conditional performance of FDG-PET and CT for pulmonary nodule diagnosis was similar to their conditional performance for mediastinal staging. We tested even more extreme values of the sensitivity and specificity of FDG-PET when CT results were benign.
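A sensitivity analysis of this kind can be sketched by letting the sensitivity and specificity of FDG-PET depend on the CT result. In the sketch below, the 25% and 16% figures from the mediastinal staging work are applied as relative changes when CT results are benign; the text does not state whether the differences were relative or absolute, so this interpretation is an assumption. The FDG-PET values and the pretest probability are hypothetical placeholders.

```python
CT_SENS, CT_SPEC = 0.965, 0.558   # base-case CT estimates from the text
PET_SENS, PET_SPEC = 0.90, 0.80   # hypothetical placeholders, not the article's values


def pet_when_ct_benign(sens, spec, sens_drop=0.25, spec_gain=0.16):
    """Adjust FDG-PET accuracy for the case in which CT results are benign.

    Assumption: the changes are relative (multiplicative), capped at 100%.
    """
    return sens * (1.0 - sens_drop), min(1.0, spec * (1.0 + spec_gain))


def posterior_ct_benign_pet_negative(pretest, pet_sens, pet_spec):
    """P(malignant | CT benign, FDG-PET negative), given the FDG-PET accuracy
    that applies when CT results are benign."""
    joint_m = (1.0 - CT_SENS) * (1.0 - pet_sens)  # both tests miss a cancer
    joint_b = CT_SPEC * pet_spec                  # both tests correct on a benign nodule
    return pretest * joint_m / (pretest * joint_m + (1.0 - pretest) * joint_b)


pretest = 0.50  # hypothetical pretest probability of malignancy

independent = posterior_ct_benign_pet_negative(pretest, PET_SENS, PET_SPEC)
adj_sens, adj_spec = pet_when_ct_benign(PET_SENS, PET_SPEC)
dependent = posterior_ct_benign_pet_negative(pretest, adj_sens, adj_spec)

print(f"Independent tests: P(malignant | CT-, PET-) = {independent:.4f}")
print(f"Dependent tests:   P(malignant | CT-, PET-) = {dependent:.4f}")
```

With these placeholder values, lower FDG-PET sensitivity after a benign CT result leaves a higher residual probability of malignancy when both tests are negative than the independence assumption would suggest.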

Diagnostic Test Performance: Needle Biopsy. To determine the diagnostic accuracy of CT-guided needle biopsy and the risk for biopsy-related complications, we searched MEDLINE for English-language studies published before January 2000 by combining the MeSH term biopsy, needle with a list of MeSH terms and keywords for lung cancer and pulmonary nodules. We also scanned the reference lists of retrieved studies and review articles. We updated this literature search in November 2001. We included studies that limited enrollment to participants with pulmonary nodules that measured no more than 4 cm in diameter, as well as studies that reported results separately for participants with pulmonary nodules. We excluded studies that used means other than CT guidance for localizing nodules in more than 10% of participants. We identified nine studies that met these criteria (27, 48-55) (Appendix Table 8). Another study was excluded because we strongly suspected that it presented previously reported data (109).


Appendix Table 8. Studies of Computed Tomography–Guided Needle Biopsy for Pulmonary Nodule Diagnosis

 

The studies enrolled between 22 and 220 participants with pulmonary nodules. Two studies included patients with pulmonary nodules that measured up to 4 cm in diameter (48, 49), while the remaining studies reported results for patients with nodules that measured less than 1.5 to 3 cm in diameter. The prevalence of malignancy was very high, ranging from 62% to 85%. All studies except one described in detail the technical aspects of how needle biopsies were performed (27). There was little heterogeneit