We thank Drs. Steyerberg and Pencina for bringing up an important point about evaluating reclassification measures in the presence of survival data. When the outcome is time to an event, such as a cardiovascular event, care must be taken to accommodate censoring. The reclassification calibration statistic can easily be calculated using survival data, as indicated in our paper. The Kaplan-Meier estimate of the event rate as of 10 years, for example, can be used to obtain the expected number of events within each cell of the reclassification table. D’Agostino and Nam (1) suggest that with survival data, the degrees of freedom should be k-1 rather than k-2, where k, in the setting of reclassification, is the number of cells containing at least 20 individuals.
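As a rough illustration of the calculation described above, the following Python sketch computes a Kaplan-Meier event probability within each cell and combines the cells into a Hosmer-Lemeshow-type chi-square. The function names, the cell layout (a tuple of follow-up times, event indicators, and predicted risks), and the exact form of the statistic are assumptions made for this example, not a reproduction of the published analysis.

```python
def km_event_prob(times, events, horizon):
    """Kaplan-Meier estimate of the probability of an event by `horizon`.

    times  -- follow-up time for each individual
    events -- 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group everyone who leaves the risk set at the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            surv *= 1.0 - n_events / at_risk
        at_risk -= n_leaving
    return 1.0 - surv


def reclassification_calibration(cells, horizon, min_n=20):
    """Chi-square comparing Kaplan-Meier and predicted risk across cells.

    cells -- list of (times, events, predicted_risks), one per table cell
    Cells with fewer than `min_n` individuals are skipped.
    Returns (chi2, df) with df = k - 1, following the suggestion in the text.
    """
    chi2 = 0.0
    k = 0
    for times, events, risks in cells:
        n = len(times)
        if n < min_n:
            continue
        p_bar = sum(risks) / n                      # mean predicted risk
        km = km_event_prob(times, events, horizon)  # censoring-adjusted rate
        chi2 += n * (km - p_bar) ** 2 / (p_bar * (1.0 - p_bar))
        k += 1
    return chi2, k - 1
```

The returned statistic would be referred to a chi-square distribution with the returned degrees of freedom; that step is omitted here because it requires a library outside the Python standard library.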
The use of survival data is more problematic for the NRI and IDI, both of which condition on case-control status. A similar problem occurs for the c-statistic, but methods to accommodate survival data have been established (2). For the NRI, Steyerberg and Pencina propose using the expected number of cases based on the Kaplan-Meier estimate within each cell, the same calculation needed for the reclassification calibration statistic. Although an estimated standard error is not currently available for this measure, both the standard error and a confidence interval can be determined using bootstrap samples.
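The proposal above can be sketched in Python as follows. This is one possible formulation, assumed for illustration: expected cases among individuals moving up or down are taken as the group size times the Kaplan-Meier event probability in that group, and a percentile bootstrap supplies the standard error and interval. All function and argument names are hypothetical.

```python
import random
from math import sqrt


def km_event_prob(times, events, horizon):
    """Kaplan-Meier estimate of the probability of an event by `horizon`."""
    data = sorted(zip(times, events))
    at_risk, surv, i = len(data), 1.0, 0
    while i < len(data) and data[i][0] <= horizon:
        t, n_events, n_leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            surv *= 1.0 - n_events / at_risk
        at_risk -= n_leaving
    return 1.0 - surv


def survival_nri(times, events, old_cat, new_cat, horizon):
    """NRI using Kaplan-Meier expected cases in place of observed cases."""
    n = len(times)

    def expected_cases(idx):
        if not idx:
            return 0.0
        sub_t = [times[i] for i in idx]
        sub_e = [events[i] for i in idx]
        return len(idx) * km_event_prob(sub_t, sub_e, horizon)

    up = [i for i in range(n) if new_cat[i] > old_cat[i]]
    down = [i for i in range(n) if new_cat[i] < old_cat[i]]
    cases_all = expected_cases(list(range(n)))
    cases_up = expected_cases(up)
    cases_down = expected_cases(down)
    # Cases should move up; non-cases should move down.
    nri_cases = (cases_up - cases_down) / cases_all
    nri_noncases = ((len(down) - cases_down) - (len(up) - cases_up)) / (n - cases_all)
    return nri_cases + nri_noncases


def bootstrap_nri(times, events, old_cat, new_cat, horizon, n_boot=200, seed=0):
    """Bootstrap standard error and 95% percentile interval for the NRI."""
    rng = random.Random(seed)
    n = len(times)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(survival_nri([times[i] for i in idx],
                                      [events[i] for i in idx],
                                      [old_cat[i] for i in idx],
                                      [new_cat[i] for i in idx], horizon))
        except ZeroDivisionError:
            continue  # resample had no expected cases or no non-cases
    stats.sort()
    mean = sum(stats) / len(stats)
    se = sqrt(sum((s - mean) ** 2 for s in stats) / (len(stats) - 1))
    lower = stats[int(0.025 * len(stats))]
    upper = stats[min(len(stats) - 1, int(0.975 * len(stats)))]
    return se, (lower, upper)
```

When there is no censoring before the time horizon, the expected cases equal the observed cases and this sketch reduces to the usual NRI.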
We suggest that both the reclassification calibration statistic and the NRI be computed for reclassification tables, even in the presence of survival data.
References
1. D’Agostino RB, Nam B-H. Evaluation of the performance of survival analysis models: Discrimination and calibration measures. Handbook of Statistics. Vol. 23, pp. 1-25.
2. Harrell FE, Jr. Regression Modeling Strategies. New York: Springer, 2001.
None declared
To the Editor:
Readers of Cook and Ridker’s paper should be given an opportunity to look at the unprocessed Women’s Health Study data, not just the statistical analysis. Since their data appear to be well described by a lognormal distribution, which is expected when risk factors interact multiplicatively (1), a model of their data can be created. Continuous risk distribution curves for the Reynolds Risk Score and the Reynolds Risk Score without systolic blood pressure, fitted to their categorical data, are shown in the Figure (available at http://s668.photobucket.com/albums/vv49/sngoonew/). There appears to be little difference between the two distributions, consistent with the minimal difference in ROC curve AUCs (2). I would ask Drs. Cook and Ridker to provide a similar graph based on the actual data so that readers can judge whether the differences in risk stratification are likely to be clinically significant.
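One way a lognormal curve might be fitted to categorical risk data is sketched below in Python. It assumes that only the category cutoffs and the cumulative proportion of individuals below each cutoff are available, and it fits ln(cutoff) = mu + sigma * z by least squares, where z is the standard normal quantile of the cumulative proportion. The function name and this fitting approach are assumptions for illustration; the letter does not specify the method actually used.

```python
from math import log
from statistics import NormalDist


def fit_lognormal(cutoffs, cum_props):
    """Fit lognormal parameters (mu, sigma) on the log-risk scale.

    cutoffs   -- risk category boundaries, e.g. 0.05 for 5%
    cum_props -- proportion of individuals with risk below each cutoff
    At least two cutoffs are needed.
    """
    std_normal = NormalDist()
    zs = [std_normal.inv_cdf(p) for p in cum_props]
    ys = [log(c) for c in cutoffs]
    n = len(zs)
    z_bar = sum(zs) / n
    y_bar = sum(ys) / n
    # Ordinary least-squares slope and intercept of ln(cutoff) on z.
    sxx = sum((z - z_bar) ** 2 for z in zs)
    sxy = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
    sigma = sxy / sxx
    mu = y_bar - sigma * z_bar
    return mu, sigma
```

With exactly two cutoffs the fit is exact; with more, the spread of the points around the fitted line gives an informal check on whether a lognormal shape is plausible.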
It is difficult to understand how reclassification analysis could tell us anything about accuracy that couldn’t be learned from studying each model alone. The only additional information included is the correlation between individual risk estimates, which is best evaluated with a scatterplot (avoiding the categorization of continuous data). When multivariate models differ by a single risk factor, as in their example, discordance between individual risk estimates may be modest. However, when multivariate models share few risk factors, the discordance can be substantial (3). It is known that the greater the discordance, the higher the reclassification rate (4). Discordance reflects the fact that different models assign different risk estimates to the same individual, and it accounts for almost all of the reclassification. Differences in accuracy are not required to generate reclassification and, when they exist, are best evaluated by specific measures of calibration.
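The point that discordance alone can generate reclassification can be illustrated with a small simulation, sketched here in Python. Two hypothetical models each estimate the same underlying log-risk with independent error of equal size, so neither is more accurate than the other by construction. The risk distribution, the 5%, 10%, and 20% cutoffs, and all names are assumptions chosen for the example, not values taken from the Women’s Health Study.

```python
import random
from math import exp


def categorize(risk, cutoffs=(0.05, 0.10, 0.20)):
    """Return the index of the risk category that `risk` falls into."""
    return sum(risk >= c for c in cutoffs)


def reclassification_rate(noise_sd, n=5000, seed=1):
    """Fraction of individuals placed in different categories by two
    equally accurate models whose errors are independent.

    noise_sd -- standard deviation of each model's error on the log scale;
                larger values mean greater discordance between the models
    """
    rng = random.Random(seed)
    moved = 0
    for _ in range(n):
        true_log_risk = rng.gauss(-3.0, 1.0)  # hypothetical lognormal risk
        risk_a = min(1.0, exp(true_log_risk + rng.gauss(0.0, noise_sd)))
        risk_b = min(1.0, exp(true_log_risk + rng.gauss(0.0, noise_sd)))
        if categorize(risk_a) != categorize(risk_b):
            moved += 1
    return moved / n
```

Calling `reclassification_rate` with increasing values of `noise_sd` shows the reclassification rate rising with discordance even though the two models remain equally accurate throughout.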
References
1. Limpert E, Stahel WA, Abbt M. Log-normal Distributions across the Sciences: Keys and Clues. BioScience. 2001;51:341-352.
2. Stern RH. Evaluating New Cardiovascular Risk Factors for Risk Stratification. J Clin Hypertens 2008;10:485-488.
3. Lemeshow S, Klar J, Teres D. Outcome prediction for individual intensive care patients: useful, misused, or abused? Intensive Care Med 1995;21:770-776.
4. Janes H, Pepe MS, Gu W. Assessing the Value of Risk Predictions by Using Risk Stratification Tables. Ann Intern Med 2008;149:751-760.
None declared