Are Risk Stratification Tables the Best Way to Evaluate Model Performance?

  Holly Janes, PhD; Margaret S. Pepe, PhD; and Wen Gu, MS
  From Fred Hutchinson Cancer Research Center, Seattle, WA 98109, and University of Washington, Seattle, WA 98195.

    IN RESPONSE:

    We thank Drs. Stern and Smith for their thoughts on improving methods for the evaluation of risk prediction models. We agree wholeheartedly that the distribution of risks predicted by the risk prediction model is key for evaluating model performance. In the statistical literature, this has been called the predictiveness curve, and we have advocated strongly for its use (1, 2). In fact, the margins of a risk stratification table display exactly this: the population distribution of risk according to the 2 models, albeit by using discrete categories. Because the main goal of our article is to emphasize that one should focus on the margins of the risk stratification table rather than the interior cells, our article in fact concurs with the point of view of Drs. Stern and Smith.
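
To make the role of the margins concrete, the following is a minimal sketch of a risk stratification table and its row and column totals. The risks, the cutoffs (5%, 10%, 20%), and all function names are hypothetical and chosen only for illustration; they are not taken from our article.

```python
from bisect import bisect_right


def risk_category(risk, cutoffs):
    """Index of the risk category containing `risk` (0 = lowest category)."""
    # A risk exactly equal to a cutoff is placed in the higher category.
    return bisect_right(cutoffs, risk)


def stratification_table(risks_a, risks_b, cutoffs):
    """Cross-tabulate categories under model A (rows) and model B (columns)."""
    k = len(cutoffs) + 1
    table = [[0] * k for _ in range(k)]
    for ra, rb in zip(risks_a, risks_b):
        table[risk_category(ra, cutoffs)][risk_category(rb, cutoffs)] += 1
    return table


def margins(table):
    """Row totals (model A distribution) and column totals (model B distribution)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_totals, col_totals


# Hypothetical predicted risks for eight subjects under two models.
risks_a = [0.03, 0.08, 0.12, 0.18, 0.22, 0.04, 0.15, 0.30]
risks_b = [0.02, 0.11, 0.09, 0.25, 0.21, 0.06, 0.07, 0.40]
cutoffs = [0.05, 0.10, 0.20]  # categories: <5%, 5% to <10%, 10% to <20%, >=20%

table = stratification_table(risks_a, risks_b, cutoffs)
row_totals, col_totals = margins(table)
print("Model A margin:", row_totals)
print("Model B margin:", col_totals)
```

The interior cells of `table` record how subjects move between categories, whereas `row_totals` and `col_totals` are the categorized risk distributions under each model, which is the information we argue deserves attention.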

    The area under the ROC curve or c-statistic can indeed be viewed as a measure of the dispersion of the risk distribution. However, it seems to be a measure that lacks clinical relevance (3, 4). In addition, dissatisfaction with the ROC curve stems in part from the fact that it does not display risk thresholds. We advocate instead displaying risk distributions for events and nonevents separately as a way to directly view the true-positive rates (for events) and false-positive rates (for nonevents) associated with specific risk thresholds (1). Although mathematically equivalent to reporting the ROC curve and the overall event rate (5), the risk distributions are much easier to interpret. Again, the margins of the risk stratification table show these distributions in categories.
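
As an illustration of reading true-positive and false-positive rates from the separate risk distributions, here is a small sketch. The data and the 25% threshold are hypothetical. The final check uses only the law of total probability: the proportion of the whole population above a threshold equals the event rate times the true-positive rate plus one minus the event rate times the false-positive rate.

```python
def fraction_above(risks, threshold):
    """Proportion of subjects whose predicted risk exceeds the threshold."""
    return sum(r > threshold for r in risks) / len(risks)


# Hypothetical predicted risks and observed outcomes (1 = event, 0 = nonevent).
risks = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.60, 0.80]
events = [0, 0, 0, 1, 0, 1, 1, 1]

# Split the risk distribution into events and nonevents.
event_risks = [r for r, d in zip(risks, events) if d == 1]
nonevent_risks = [r for r, d in zip(risks, events) if d == 0]
event_rate = len(event_risks) / len(risks)

threshold = 0.25  # hypothetical risk threshold
tpr = fraction_above(event_risks, threshold)     # true-positive rate
fpr = fraction_above(nonevent_risks, threshold)  # false-positive rate
overall = fraction_above(risks, threshold)       # population proportion above threshold

# Law of total probability linking the three quantities.
recombined = event_rate * tpr + (1 - event_rate) * fpr
print(tpr, fpr, overall, recombined)
```

Repeating the calculation over a grid of thresholds traces out the two risk distributions, with each rate tied explicitly to the threshold that produced it.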

    We demonstrate that the amount of reclassification shown in a risk stratification table is simply a consequence of the extent of correlation between the risks calculated from the 2 models. Knowing the correlation in risks between 2 models is of little use; rather, the calibration, capacity for risk stratification, and classification accuracy should be used as metrics for model comparison, all of which can be viewed from the margins of the risk stratification table. When risk categories are not defined in advance, we agree with Drs. Stern and Smith that plots can be used to display this information (6).
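
The link between reclassification and correlation can be illustrated with a small simulation. This is a sketch under stated assumptions, not the analysis in our article: the two models' linear predictors are assumed bivariate normal with correlation `rho`, the intercept of -2 and the cutoffs of 10% and 20% are arbitrary, and `simulate_reclassification` is a hypothetical helper name.

```python
import math
import random


def simulate_reclassification(rho, n=20000, cutoffs=(0.10, 0.20), seed=0):
    """Fraction of simulated subjects placed in different risk categories by 2 models.

    `rho` is the correlation between the models' linear predictors
    (an assumption of this sketch), not between the risks themselves.
    """
    rng = random.Random(seed)
    moved = 0
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        # Second linear predictor correlated with the first at level rho.
        z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        # Logistic transform to the risk scale; the intercept -2 is arbitrary.
        r1 = 1 / (1 + math.exp(-(-2 + z1)))
        r2 = 1 / (1 + math.exp(-(-2 + z2)))
        # Category index = number of cutoffs at or below the risk.
        c1 = sum(r1 >= c for c in cutoffs)
        c2 = sum(r2 >= c for c in cutoffs)
        moved += c1 != c2
    return moved / n


high_corr = simulate_reclassification(0.95)
low_corr = simulate_reclassification(0.50)
print("Reclassified at rho=0.95:", high_corr)
print("Reclassified at rho=0.50:", low_corr)
```

In this setup, both models have the same marginal risk distribution at every value of `rho`, so the margins of the table are unchanged while the amount of reclassification varies with the correlation alone.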

    Holly Janes, PhD

    Margaret S. Pepe, PhD

    Fred Hutchinson Cancer Research Center

    Seattle, WA 98109

    Wen Gu, MS

    University of Washington

    Seattle, WA 98195

    Article and Author Information

    • Potential Financial Conflicts of Interest: None disclosed.

    References

    1. 1.
    2. 2.
    3. 3.
    4. 4.
    5. 5.
    6. 6.