1. Author reply

    We thank Drs. Steyerberg and Pencina for bringing up an important point with regard to evaluating reclassification measures in the presence of survival data. When the outcome is time to an event, such as a cardiovascular event, care needs to be taken to accommodate censoring. The reclassification calibration statistic can easily be calculated using survival data, as indicated in our paper. The Kaplan-Meier estimate of the event rate as of 10 years, for example, can be used to obtain the expected number of events within each cell of the reclassification table. D’Agostino and Nam (1) suggest that with survival data, the degrees of freedom should be k-1 rather than k-2, where k in the setting of reclassification is the number of cells containing at least 20 individuals.
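
    As a rough illustration of the calculation described above, the following Python sketch computes a Kaplan-Meier event rate within each cell and combines the cells into a chi-square-type statistic with k-1 degrees of freedom. The exact form of the statistic (cell size times the squared difference between the Kaplan-Meier rate and the mean predicted risk, divided by the binomial variance of the predicted risk) is an assumption in the style of D’Agostino and Nam (1), not a formula given in this reply; the function names and the layout of `cells` are hypothetical.

```python
def km_risk(times, events, horizon):
    """Kaplan-Meier estimate of the cumulative event rate by `horizon`.

    times  : follow-up time for each individual
    events : 1 if the follow-up ended in an event, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        d = 0  # events at time t
        c = 0  # everyone leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            c += 1
            i += 1
        if d > 0:
            surv *= 1.0 - d / n_at_risk
        n_at_risk -= c
    return 1.0 - surv


def reclassification_calibration(cells, horizon, min_size=20):
    """Chi-square-type statistic over the cells of a reclassification table.

    Each cell is a dict with keys "times", "events" and "pred" (predicted
    risks from the model being checked). Cells with fewer than `min_size`
    individuals are skipped. Assumed form of the statistic:
        sum_k n_k * (KM_k - pbar_k)^2 / (pbar_k * (1 - pbar_k))
    Returns (statistic, degrees of freedom = cells used - 1).
    """
    stat = 0.0
    used = 0
    for cell in cells:
        n = len(cell["times"])
        if n < min_size:
            continue
        observed = km_risk(cell["times"], cell["events"], horizon)
        predicted = sum(cell["pred"]) / n
        stat += n * (observed - predicted) ** 2 / (predicted * (1.0 - predicted))
        used += 1
    return stat, max(used - 1, 0)
```

    The statistic would then be referred to a chi-square distribution with the returned degrees of freedom; that step is omitted here because it needs a statistics library.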

    The use of survival data is more problematic for the NRI and IDI, which both condition on case-control status. A similar problem occurs for the c-statistic, but methods to accommodate survival data have been established (2). For the NRI, Steyerberg and Pencina propose using the expected number of cases based on the Kaplan-Meier estimate within each cell, the same calculation needed for the reclassification calibration statistic. While an estimated standard error is not currently available for this measure, a confidence interval as well as the standard error can be determined using bootstrap samples.
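
    A minimal sketch of the bootstrap step mentioned above is given below. It is a generic percentile bootstrap written in Python; `bootstrap_ci` and its parameters are hypothetical names, and `statistic` stands for any function that recomputes the measure of interest (for the NRI, one that rebuilds the reclassification table and its Kaplan-Meier estimates from the resampled individuals).

```python
import random
import statistics


def bootstrap_ci(records, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval and standard error.

    records   : list with one entry per individual
    statistic : function mapping a list of records to a number
    Returns (lower, upper, standard_error).
    """
    rng = random.Random(seed)
    n = len(records)
    estimates = []
    for _ in range(n_boot):
        # Resample individuals with replacement, keeping the sample size.
        sample = [records[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(sample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper, statistics.stdev(estimates)
```

    Resampling whole individuals, rather than cells of the table, keeps each person's follow-up time, event indicator, and both risk estimates together, so that the censoring pattern is carried into every bootstrap sample.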

    We suggest that both the reclassification calibration statistic and the NRI be computed for reclassification tables, even in the presence of survival data.

    References

    1. D’Agostino RB, Nam B-H. Evaluation of the performance of survival analysis models: Discrimination and calibration measures. Handbook of Statistics. Vol. 23, pp1-25.

    2. Harrell FE, Jr. Regression Modeling Strategies. New York: Springer, 2001.

    Conflict of Interest:

    None declared

  2. Reclassification calculations with incomplete follow-up

    Cook and Ridker are to be applauded for their clear discussion of reclassification measures and of modern developments in judging the incremental value of a biomarker for predicting outcome (1). A key area of application is cardiovascular disease, where the time horizon is typically 10 years. One important problem recognized by Cook and Ridker is that not all subjects will have complete follow-up to 10 years. Kaplan-Meier curves and Cox regression analysis were introduced long ago to deal with such censored observations. Reclassification measures, such as the net reclassification index (NRI) (2), were proposed for binary data and currently have no way of incorporating incomplete follow-up. As with other model performance measures in survival analysis, reclassification statistics can be estimated at different time points within the follow-up window.

    To address the issue of censored data, Cook and Ridker propose to select only subjects with follow-up complete at a certain time point, 8 years in their example. They were able to include the majority of control participants, since 23,611 of 23,792 women had follow-up of at least 8 years, excluding only 181, or 1%. But only 560 of 766 cases had a cardiovascular event before 8 years of follow-up, leading to exclusion of 206, or 27%. We suggest a simple alternative based on the expected number of cases and non-cases calculated using the Kaplan-Meier estimator. This approach was recently found optimal in assessing calibration of survival models (3). It handles censored data appropriately and does not throw away useful information. We provide a revised Figure 1 created with our proposal, with cell entries for cases and non-cases obtained by multiplying the 10-year Kaplan-Meier rates by the total numbers of people in each cell at 10 years given in the original table. We then expect 697 cases at 10 years of follow-up, and 23,861 control participants.

    The reclassification numbers change to some extent. Although the conclusions remain largely the same in this example (NRI 9.9% vs 9.8% originally), we would like to recommend our simple estimation procedure of the NRI for future application with censored observations. Our approach is especially attractive when more censoring occurs early during follow-up. In that case, choosing one time point for analysis can lead to exclusion of many control participants, or relatively many cases, making the NRI estimate quite unstable. Some specific issues, such as bias and precision, require further research. We note that the asymptotic confidence interval for the NRI calculated using the approach outlined in (2) is no longer valid for the current extension. A practical solution would use bootstrap estimation (4), in addition to its use for bias correction as already correctly suggested by Cook and Ridker.

    Revised Fig 1. Reclassification table comparing 10-year risk strata for models that include risk factors for cardiovascular disease in the Women’s Health Study with and without SBP, using the 10-year Kaplan-Meier estimates to estimate the number of case patients and control participants.
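
    The proposed calculation can be sketched in a few lines of Python. Each cell of the reclassification table contributes an expected number of cases (cell size times the 10-year Kaplan-Meier rate in that cell) and an expected number of non-cases (the remainder); the NRI is then formed from these expected counts in the usual way. The function name `km_nri`, the tuple layout, and the numbers in the usage example are hypothetical and are not the Women’s Health Study data.

```python
def km_nri(cells):
    """NRI from expected case and non-case counts in each table cell.

    cells : iterable of (old_category, new_category, n, km_rate), where
            categories are ordered integers (higher = higher risk stratum),
            n is the number of people in the cell, and km_rate is the
            Kaplan-Meier event rate at the chosen horizon in that cell.
    """
    cases_up = cases_down = cases_total = 0.0
    non_up = non_down = non_total = 0.0
    for old_cat, new_cat, n, km_rate in cells:
        cases = n * km_rate            # expected cases in this cell
        noncases = n * (1.0 - km_rate)  # expected non-cases in this cell
        cases_total += cases
        non_total += noncases
        if new_cat > old_cat:          # moved to a higher risk stratum
            cases_up += cases
            non_up += noncases
        elif new_cat < old_cat:        # moved to a lower risk stratum
            cases_down += cases
            non_down += noncases
    # Net upward movement among cases plus net downward movement among non-cases.
    return (cases_up - cases_down) / cases_total + (non_down - non_up) / non_total


# Invented two-category table: (old stratum, new stratum, n, 10-year KM rate).
example = [
    (0, 0, 100, 0.02),
    (0, 1, 20, 0.10),
    (1, 0, 20, 0.05),
    (1, 1, 60, 0.20),
]
print(round(km_nri(example), 4))
```

    Because every person in a cell contributes to both the expected cases and the expected non-cases, no one is excluded for incomplete follow-up; censoring enters only through the Kaplan-Meier rate of the cell.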

    References

    1. Cook NR, Ridker PM. Advances in measuring the effect of individual predictors of cardiovascular risk: the role of reclassification measures. Ann Intern Med. 2009;150(11):795-802.

    2. Pencina MJ, D'Agostino RB, Sr., D'Agostino RB, Jr., Vasan RS. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med. 2008;27(2):157-72; discussion 207-12.

    3. Viallon V, Ragusa S, Clavel-Chapelon F, Benichou J. How to evaluate the calibration of a disease risk prediction tool. Stat Med. 2009;28(6):901-16.

    4. Pepe MS, Feng Z, Gu JW. Comments on ‘Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond’. Stat Med. 2008;27(2):173–181.

    Conflict of Interest:

    None declared

  3. To the Editor:

    Readers of Cook and Ridker’s paper should be given an opportunity to look at the unprocessed Women’s Health Study data, not just the statistical analysis. Since their data appear to be well described by a lognormal distribution, which is expected when risk factors interact multiplicatively (1), a model of their data can be created. Continuous risk distribution curves for the Reynolds Risk Score and the Reynolds Risk Score without systolic blood pressure, fitted to their categorical data, are shown in the Figure (available at http://s668.photobucket.com/albums/vv49/sngoonew/). There appears to be little difference between the two distributions, consistent with the minimal difference in ROC curve AUCs (2). I would ask Drs. Cook and Ridker to provide a similar graph based on the actual data so readers could judge whether the differences in risk stratification are likely to be clinically significant.

    It is difficult to understand how reclassification analysis could tell us anything about accuracy that couldn’t be learned from studying each model alone. The only additional information included is the correlation between individual risk estimates, which is best evaluated with a scatterplot (avoiding the categorization of continuous data). When multivariate models differ by a single risk factor, as in their example, discordance between individual risk estimates may be modest. However, when multivariate models share few risk factors, the discordance can be substantial (3). It is known that the greater the discordance, the higher the reclassification rate (4). Discordance reflects the fact that different models assign different risk estimates to the same individual, and it accounts for almost all of the reclassification. Differences in accuracy are not required to generate reclassification and, when they exist, are best evaluated by specific measures of calibration.

    References

    1. Limpert E, Stahel WA, Abbt M. Log-normal Distributions across the Sciences: Keys and Clues. BioScience. 2001;51:341-352.

    2. Stern RH. Evaluating New Cardiovascular Risk Factors for Risk Stratification. J Clin Hyper 2008;10:485-488.

    3. Lemeshow S, Klar J, Teres D. Outcome prediction for individual intensive care patients: useful, misused, or abused? Intensive Care Med 1995;21:770-776.

    4. Janes H, Pepe MS, Gu W. Assessing the Value of Risk Predictions by Using Risk Stratification Tables. Ann Intern Med 2008;149:751-760.

    Conflict of Interest:

    None declared
