15 October 1997 | Volume 127 Issue 8 Part 2 | Pages 764-768
This paper reviews and compares existing statistical methods for profiling health care providers. It recommends improvements that are based on the use of better statistical models and the adoption of more realistic, medically based criteria for judging the performance of health care providers. Unlike most profiling methods, the proposed hierarchical models allow the probability of acceptable provider performance to be calculated; thus, they can answer such questions as, "What is the probability that a given hospital's true mortality rate for cardiac surgery patients exceeded 3.33% last year?" The commonly encountered problems of regression-to-the-mean bias and small caseloads can be handled by using hierarchical models to extract more information from profiling data.
The 1993 report on coronary artery bypass graft surgery from the New York State Department of Health [1] listed mortality data (deaths before leaving the hospital) and profile statistics for 31 hospitals. These profiles aimed to identify hospitals that had excessively high or low mortality rates associated with coronary artery bypass graft surgery. In this article, we review 3 of the 31 hospitals to see how profile results improve with the use of more information. Two of the hospitals were chosen for their high observed mortality rates (1 had a mortality rate that was substantially higher than the 1992 statewide rate of 2.78%), and 1 was chosen for its low mortality rate. (It is important to keep in mind that the performance of the hospitals may have changed since 1992.)
Hierarchical models use information from the available data obtained from all health care providers being examined. These models are so named because they apply to situations with two or more levels of random variation. In the mortality rate example to follow, the first level of the hierarchy specifies a distribution for the random number of observed deaths at a given hospital and the second level specifies possible distributions for the true mortality rates in several hospitals. To use terminology from analysis of variance, level 1 in the hierarchical model concerns the variation of rates within providers and level 2 concerns the variation between true rates of the hospitals.
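The two levels described above can be written compactly. This is only a notational sketch: the symbols $y_i$ (observed deaths at hospital $i$), $e_i$ (its case-mix-adjusted caseload), $\lambda_i$ (its true mortality rate), and $G$ (a population distribution with mean $\mu$ and variance $\sigma^2$) are introduced here for illustration, and the Poisson form at level 1 anticipates the distribution chosen later in the article.

```latex
\begin{align*}
\text{Level 1 (within hospitals):}\quad & y_i \mid \lambda_i \sim \mathrm{Poisson}(e_i \lambda_i), \\
\text{Level 2 (between hospitals):}\quad & \lambda_i \sim G(\mu, \sigma^2), \qquad i = 1, \dots, k.
\end{align*}
```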
We urge the use of medical standards that specify the largest or smallest medically acceptable true mortality rate in the setting being profiled. How standards are set depends on their purpose; standards meant to encourage quality improvement, for example, may differ from standards meant to distribute pay incentives. Major improvements to profiles will result from the use of medically appropriate performance standards.
Case-mix adjustments are made in almost all profile analyses to account for the differences in provider performances attributable solely to differences in the populations served. Hierarchical models accommodate these crucial adjustments. In addition, standard profiling procedures typically ignore units with small caseloads, such as those with fewer than 50 patients. This practice gives little information on the performance of low-volume providers and can provide unfair gaming opportunities (for example, a hospital with a very high mortality rate for 49 patients might refuse to admit further patients in the given category). Hierarchical models require no minimum sample size for a particular health care provider, provided that the ensemble of all providers analyzed has adequate data. Ensemble data are used to correct for regression-to-the-mean bias.
In this paper, we review standard statistical profiling methods, showing how successively better results are obtained as more information is included. We recommend the use of hierarchical models to extract ensemble information and advocate the use of more directly relevant standards.
The advantages of a hierarchical approach are as follows: 1) the probability that a provider meets a performance standard is calculated, 2) comparisons of units are based on medically relevant standards, 3) regression-to-the-mean bias is removed, and 4) providers with small sample sizes remain in the analysis. The benefits of hierarchical modeling apply not only to the mortality data considered here but also to profiles of patient satisfaction, referral rates, and other outcomes.
The simplest profile analysis would compare the number of deaths in each hospital. Because 22 deaths occurred at H2, it seems to be a much worse hospital than H1, at which 3 deaths occurred. However, the number of patients served should also be considered: that is, mortality rates, not raw counts, should be analyzed. The disparity in the number of deaths can be completely explained by the caseload of 484 coronary artery bypass graft procedures at H2 compared with 67 at H1. The mortality rates of 4.48% (3 of 67 patients) at H1 and 4.55% (22 of 484 patients) at H2 are almost indistinguishable. This improvement makes a fairer comparison by adjusting for caseload while retaining the basic concept of comparing the number of deaths. The improvements stem from obtaining and using appropriate additional information.
An even better approach accounts for case-mix differences. Data from the New York state study [1] show that the patients who had coronary artery bypass graft surgery at H2 were less healthy than those at H1. On average, patients at H1 had 51.1% of the risk for death of all patients who had coronary artery bypass graft surgery (expected mortality rate, 1.42% for the case mix of patients at H1 compared with 2.78% statewide; 1.42 ÷ 2.78 = 0.511). When we adjusted for this, the 3 deaths at H1 equivalently resulted from 34.2 (0.511 × 67) procedures done on patients with an average case mix. The risk-adjusted mortality rate is therefore 8.77% (3 ÷ 34.2) at H1. The risk-adjusted mortality rate at H2 was 5.77%. (With many deaths and an exceptionally healthy patient case mix, it is possible for the relative risk case-mix adjustment method to produce adjusted rates exceeding 1.00.)
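The arithmetic of this relative-risk adjustment can be sketched in a few lines of Python. The function name and argument names are illustrative, not taken from the New York State report; the inputs are the H1 figures quoted above.

```python
def risk_adjusted_rate(deaths, cases, expected_rate, statewide_rate):
    """Relative-risk case-mix adjustment as described in the text.

    Returns (relative_risk, equivalent_cases, adjusted_rate).
    """
    # Risk of this hospital's case mix relative to the statewide average.
    relative_risk = expected_rate / statewide_rate
    # Number of average-case-mix procedures that carry the same total risk.
    equivalent_cases = relative_risk * cases
    # Deaths per equivalent average-case-mix procedure.
    adjusted_rate = deaths / equivalent_cases
    return relative_risk, equivalent_cases, adjusted_rate


# H1: 3 deaths in 67 procedures, expected rate 1.42%, statewide rate 2.78%.
rr, eq_cases, adj = risk_adjusted_rate(3, 67, 0.0142, 0.0278)
print(f"relative risk {rr:.3f}, equivalent cases {eq_cases:.1f}, "
      f"adjusted rate {adj:.2%}")
```

The same adjusted rate results from multiplying the ratio of observed to expected rates by the statewide rate, which is why a hospital with many deaths and a very healthy case mix can end up with an adjusted rate above 1.00.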
Risk adjustments contribute vitally to reducing unfair profile evaluations. The need for risk adjustment has led to vigorous research on ways to account for case-mix differences [2-8]. For our purposes, we used the case-mix data and the risk adjustment methods as they were used in the report from New York State [1].
Probability Distributions
Once the hospital rates have been adjusted for case mix, a probability distribution is needed to perform a statistical test. The observed count is governed by the hospital's true mortality rate; here, it is the rate that could be observed only if the hospital had treated a very large number of patients with the average case mix. The true mortality rates are unknown and must be estimated from the data.
The Poisson distribution is appropriate for these data because the probability of death after coronary artery bypass graft surgery was small (2.78% statewide). An alternative, the binomial distribution, is well approximated by the Poisson distribution in this case. If the expected number of deaths is very small or if some individual probabilities are large, then normal, Poisson, and binomial assumptions may not be valid and an alternative calculation may be necessary [9].
Commonly Used Decision Criteria
Profile analyses often require tests of the null hypothesis that a provider's true mortality rate equals the average rate for all providers. The hypothesis is tested at a specified significance level. Following this convention, the New York State report [1] used the statewide mortality rate of 2.78% as the standard and set the significance level at 0.025. This hypothesis is not very useful: Taken literally, it means that if the true hospital mortality rates differ even by tiny amounts (which one would expect), many of the hospitals would have true rates that exceed the population mean.
P Values
When a distribution and a standard are specified, the P value can be calculated [10, 11]. A normal approximation to the Poisson distribution will work poorly for the profiles of New York State hospitals in which coronary artery bypass graft surgery was performed, because fewer than 10 deaths were expected at many of the hospitals. The P value for H1 is the probability of observing 3 or more deaths (because 3 deaths occurred at H1), assuming that H1 performed with a true mortality rate of 2.78% for patients with an average case mix. Had H1 performed at this average rate, 0.95 deaths would have been expected. A normal approximation produces a P value of 0.018. Small P values, such as this, identify high true mortality rates; H1 would therefore be identified by this approximate calculation. However, the exact P value based on the Poisson distribution is 0.072 and is too large to identify H1 as a poor performer. Profile procedures that use normal approximations result in incorrect profile estimates if the approximation is inaccurate. Errors such as this are unnecessary; many statistical computer packages make it easy to calculate exact P values for the Poisson and other common distributions.
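The contrast between the exact and approximate P values for H1 can be checked with the Python standard library. This is a sketch under two assumptions: the expected count is taken as 2.78% of the 0.511 × 67 equivalent procedures, and the normal approximation is applied without a continuity correction, which is the form that reproduces the value quoted above. The function names are illustrative.

```python
import math


def poisson_upper_tail(k, mean):
    """Exact P(X >= k) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean ** j / math.factorial(j)
                for j in range(k))
    return 1.0 - below


def normal_upper_tail(k, mean):
    """Normal approximation to P(X >= k); no continuity correction (assumed)."""
    z = (k - mean) / math.sqrt(mean)  # Poisson variance equals its mean
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Expected deaths at H1 under the statewide rate: about 0.95.
expected_deaths = 0.0278 * 0.511 * 67
exact_p = poisson_upper_tail(3, expected_deaths)
approx_p = normal_upper_tail(3, expected_deaths)
print(f"expected {expected_deaths:.2f}, exact P {exact_p:.3f}, "
      f"normal approximation {approx_p:.3f}")
```

With so few expected deaths the Poisson distribution is strongly skewed, so the symmetric normal curve understates the upper tail and makes H1 look more extreme than it is.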
The probability of acceptable performance cannot be calculated unless a prior probability (that is, the probability distribution that describes the variation of a hospital's true mortality rate before the data for that hospital were observed) is first specified. If expert opinion is unavailable to specify this prior distribution (or if this opinion is not trusted), the distribution is usually chosen to reflect extreme uncertainty about the true rate. In these cases, the probability of adequate performance will often be close to the exact P value [12]. One can estimate this distribution by using a hierarchical model [13] and the data for all hospitals. By using the data from all the hospitals, we include more information in the analysis and can expect better estimates.
Much recent research has centered on hierarchical statistical methods for normal, Poisson, binomial, and other distributions that could help in medical profiling [14-16]. In a hierarchical model, the first-level distribution for the observed data and the second-level prior distribution (also called the population distribution) for the true mortality rates are used to compute the probability that a hospital has performed acceptably. We recommend that P values be replaced by this probability.
The hierarchical model that we fitted to the Poisson data (by using case-mix information and Poisson Regression Interactive Multilevel Modeling software [17]) has been discussed elsewhere [18]. The fitted prior distribution has a mean mortality rate (±SD) of 2.78% ± 0.61%. Almost all of the hospitals had true mortality rates between 1.56% and 4.00%, within 2 SDs of the mean.
We then used each hospital's observed mortality rate to construct an estimate of a hospital's true mortality rate. Figure 1 shows the observed mortality rates for the 31 hospitals and the estimated true mortality rates. The estimates of the 31 mortality rates done using a hierarchical model are less variable than the observed rates. Estimates of the true mortality rates are based on shrinkage coefficients, which are values between 0 and 1. The shrinkages determine how far the observed rate at the hospital moves (that is, shrinks) toward the population rate of 2.78% (Figure 1) to produce the best estimate for that hospital. Shrinkages close to 1 correspond to small caseloads, like that of H1, for which the best shrunken estimate is close to the overall mortality rate of 2.78%. Larger caseloads, such as that of H3 (which also had a low mortality rate), have smaller shrinkages; therefore, the estimate is relatively close to the hospital's observed rate. The average shrinkage for the 31 hospitals is 65%, indicating that the true (unobserved) hospital rates vary less around the mean than the observed rate estimates vary around their true hospital means.
STATISTICAL METHODS
Improving the Statistical Approach to Health Care Provider Profiling
For reports on the performance of health care providers to be effective, profiling must be done using the best statistical methods. Commonly used profiling methods often contain some of the following deficiencies. They ignore important relevant information. They use statistical standards where medical standards would serve better. To identify providers with extreme rates, they use the probability that the observed outcome is extreme, assuming that a hospital's true performance is acceptable; this is not the probability that the medical unit's true mortality rate exceeds a given standard. (The true mortality rate is the rate that would have occurred if the hospital had served a very large number of patients.)
Profiling Data
We estimated the true mortality rates associated with coronary artery bypass graft surgery for 31 hospitals in New York State [1] by using data on the number of patients, the number of deaths, and the case-mix difficulty of the patients treated at each hospital. For this analysis, we focused on data for two hospitals (H1 and H2) that had high observed mortality rates.
Models and Tests of Statistical Significance
Developing good profiling procedures requires specifying probability distributions for the observed outcomes and choosing the standards of acceptable care that define the hypotheses to be tested. Standards that are based on input from medical professionals and from users of the profiles will be the most useful and the most meaningful. When statistical convention alone determines these choices, they are likely to produce inaccurate conclusions and lead to poor decisions.
Hierarchical Bayesian Models for Profile Analyses
A P value computed to test the hypothesis that a hospital has performed adequately differs from the probability that the hospital has performed adequately. The P value for H1 is 0.072, but this is not the probability that H1 performed acceptably. Hierarchical models and Bayesian reasoning allow the probability of acceptable performance to be calculated.
Each shrinkage coefficient estimates how much regression-to-the-mean is expected at a hospital. The mortality rate for future patients at a hospital would be expected to regress away from the observed rate and toward the population rate in proportion to the estimated shrinkage. Mortality rate estimates that are unadjusted for regression to the mean are biased in the sense that the observed mortality rate estimate will usually be farther than the true rate from the population mean. Hierarchical models provide more accurate estimates by making the appropriate adjustment. Our example treats all hospitals as having the same mean because the rates were standardized to have this population case mix, but hierarchical models could be fitted by shrinking each observed estimate toward a different fitted value.
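A shrinkage estimate of this kind can be sketched as a weighted average of the observed rate and the population rate. The linear form below is a common simplification and may not match the fitted estimates from the authors' software exactly; the function name and the two shrinkage values are hypothetical, chosen only to show the effect of a small versus a large caseload.

```python
def shrunken_rate(observed_rate, population_rate, shrinkage):
    """Move an observed rate toward the population rate.

    shrinkage = 1 returns the population rate; shrinkage = 0 returns the
    observed rate unchanged.
    """
    if not 0.0 <= shrinkage <= 1.0:
        raise ValueError("shrinkage must lie between 0 and 1")
    return shrinkage * population_rate + (1.0 - shrinkage) * observed_rate


population_rate = 0.0278   # statewide rate from the text
observed_rate = 0.0877     # risk-adjusted rate reported for H1

# Hypothetical shrinkages: near 1 for a small caseload, smaller for a large one.
for b in (0.9, 0.3):
    estimate = shrunken_rate(observed_rate, population_rate, b)
    print(f"shrinkage {b:.1f} -> estimate {estimate:.2%}")
```

The larger the shrinkage, the more of the gap between the observed and population rates is attributed to chance rather than to the hospital's true performance.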
Better Criteria and New Standards
If the maximum acceptable mortality rate for coronary artery bypass graft surgery were set at 3.33% for an average case mix (about 20% greater than the mean of 2.78%), hospitals whose true mortality rates are greater than the average are acceptable, provided that these rates do not exceed 3.33%. With this less stringent standard, a hospital's performance might now be judged as substandard if the probability that it has exceeded the standard is greater than 0.50 (that is, it is more likely than not that the true rate is >3.33%). These two values, the standard for acceptable care and the minimum probability for compliance, should ideally be determined with the collaboration of all knowledgeable and interested parties.
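One way to see how such a probability could be computed is a conjugate gamma-Poisson sketch. Everything here is an assumption made for illustration: the gamma prior is matched by moments to the reported population mean and SD (2.78% ± 0.61%), the H1 exposure is taken as 0.511 × 67 equivalent procedures, and the tail probability is obtained by simple numerical integration. It is not the authors' fitted model and is not expected to reproduce their published probabilities.

```python
import math


def gamma_pdf(x, shape, rate):
    """Density of a Gamma(shape, rate) distribution; assumes shape > 1."""
    if x <= 0.0:
        return 0.0
    log_pdf = (shape * math.log(rate) + (shape - 1.0) * math.log(x)
               - rate * x - math.lgamma(shape))
    return math.exp(log_pdf)


def prob_rate_exceeds(threshold, shape, rate, steps=2000):
    """P(true rate > threshold) via Simpson's rule on [0, threshold]."""
    h = threshold / steps  # steps must be even for Simpson's rule
    total = gamma_pdf(0.0, shape, rate) + gamma_pdf(threshold, shape, rate)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * gamma_pdf(i * h, shape, rate)
    return 1.0 - total * h / 3.0


# Prior matched by moments to mean 2.78% and SD 0.61% (assumption).
prior_mean, prior_sd = 0.0278, 0.0061
prior_shape = (prior_mean / prior_sd) ** 2
prior_rate = prior_mean / prior_sd ** 2

# Conjugate update for H1: add deaths to the shape, exposure to the rate.
deaths, exposure = 3, 0.511 * 67
post_shape, post_rate = prior_shape + deaths, prior_rate + exposure

standard = 0.0333
p_prior = prob_rate_exceeds(standard, prior_shape, prior_rate)
p_h1 = prob_rate_exceeds(standard, post_shape, post_rate)
print(f"P(rate > 3.33%) before data: {p_prior:.2f}, for H1 after data: {p_h1:.2f}")
```

Under the decision rule described above, a hospital would be flagged only if the second probability exceeded 0.50.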
Calculating the Probability of Poor Performance
The second highest probability of exceeding 3.33% among the 31 hospitals was found in a fourth hospital with a probability of 0.48. Although that hospital barely meets the specified standard, the 48% probability of exceeding 3.33% provides directly interpretable and relevant information to the profile user. P values are not directly interpretable this way.
Limitations of Hierarchical Models
Although hierarchical models are only slightly more advanced than ordinary regression models, they are relatively unknown to most profile users and are not always well understood by them. Interpreting probability models requires careful thought and knowledge of their underlying assumptions and limitations [20]. Until hierarchical models are commonly found in medical and health service curricula, we recommend that they be used only with the help of expert statisticians.
Conclusions
Hierarchical models are expected to increasingly appear in medical research and journal articles [26]. These models are particularly appropriate for profiling applications because several providers are usually profiled simultaneously. The use of ensemble information in hierarchical models allows the probability of each provider's performance to be calculated, yielding directly interpretable results and relevant answers to the questions being asked about health care quality, access, and satisfaction. Careful use of such models, together with improved standards, will make medical profiles more accurate and more meaningful for health care policy and decision making.
References
1. Coronary artery bypass surgery in New York State 1990-1992. New York: New York State Department of Health; 1993.
2. Salem-Schatz S, Moore G, Rucker M, Pearson SD. The case for case-mix adjustment in practice profiling. When good apples look bad. JAMA. 1994; 272:871-4.
3. Iezzoni LI, Daley J, Heeren T, Foley SM, Hughes JS, Fisher ES, et al. Using administrative data to screen hospitals for high complication rates. Inquiry. 1994; 31:40-55.
4. Iezzoni LI, Ash AS, Shwartz M, Daley J, Hughes JS, Mackiernan YD. Judging hospitals by severity-adjusted mortality rates: the influence of the severity-adjustment method. Am J Public Health. 1996; 86:1379-87.
5. Hannan EL, Siu AL, Kumar D, Racz M, Pryor DB, Chassin MR. Assessment of coronary artery bypass graft surgery performance in New York. Is there a bias against taking high-risk patients? Med Care. 1997; 35:49-56.
6. Hannan EL, Racz MJ, Jollis JG, Peterson ED. Using Medicare claims data to assess provider quality for CABG surgery: does it work well enough? Health Serv Res. 1997; 31:659-78.
7. Rosen AK, Ash AS, McNiff KJ, Moskowitz MA. The importance of severity of illness adjustment in predicting adverse outcomes in the Medicare population. J Clin Epidemiol. 1995; 48:631-43.
8. Thomas JW, Holloway JJ, Guire KE. Validating risk-adjusted mortality as an indicator for quality of care. Inquiry. 1993; 30:6-22.
9. Luft HS, Brown BW Jr. Calculating the probability of rare events: why settle for an approximation? Health Serv Res. 1993; 28:419-39.
10. Casella G, Berger RL. Statistical Inference. Pacific Grove, CA: Wadsworth & Brooks; 1990:101-2.
11. Ulm K. A simple method to calculate the confidence interval of a standardized mortality ratio (SMR). Am J Epidemiol. 1990; 131:373-5.
12. Morris CN. Reconciling Bayesian and frequentist evidence by Casella G, Berger RL. Testing a point null hypothesis by Berger JO, Selleck T. Journal of the American Statistical Association. 1987; 82:131-3.
13. Morris CN. Parametric empirical Bayes inference: theory and applications. Journal of the American Statistical Association. 1983; 78:47-54.
14. Bryk AS, Raudenbush SW. Hierarchical Linear Models. California: Sage; 1992.
15. Goldstein H. Multilevel Models in Education and Social Research. London: Griffin; 1986.
16. Carlin BP, Louis TA. Bayes and Empirical Bayes Methods for Data Analysis. London: Chapman and Hall; 1996.
17. Christiansen CL, Morris CN. Poisson Regression Interactive Multilevel Modeling (PRIMM) Online. StatLib-Software and extensions for the S (Splus) language. Available at http://www.stat.cmu.edu/S/primm. 9 October 1996.
18. Christiansen CL, Morris CN. Hierarchical Poisson regression modeling. Journal of the American Statistical Association. 1997; 92:618-32.
19. Christiansen CL, Morris CN. Fitting and checking a two-level Poisson model: modeling patient mortality rates in heart transplant patients. In: Berry DA, Stangl DK, eds. Bayesian Biostatistics. New York: Marcel Dekker; 1996:467-501.
20. Braitman LE, Davidoff F. Predicting clinical states in individual patients. Ann Intern Med. 1996; 125:406-12.
21. Goldstein H, Spiegelhalter DJ. League tables and their limitations: statistical issues in comparisons of institutional performance. Journal of the Royal Statistical Society A. 1996; 159:385-443.
22. Normand SL, Glickman ME, Ryan TJ. Modeling mortality rates for elderly heart attack patients: profiling hospitals in the cooperative cardiovascular project. In: Gatsonis C, Hodges J, Kass R, Singpurwalla N, eds. Case Studies in Bayesian Statistics. New York: Springer-Verlag; 1997 [In press].
23. Thomas N, Longford NT, Rolph JE. Empirical Bayes methods for estimating hospital-specific mortality rates. Stat Med. 1994; 13:889-903.
24. Kreft IG, De Leeuw J, van der Leeden R. Review of five multilevel analysis programs: BMDP-5V, GENMOD, HLM, ML3, VARCL. American Statistician. 1994; 48:324-35.
25. Spiegelhalter D, Thomas A, Best N, Gilks W. BUGS. Bayesian Inference Using Gibbs Sampling. Version 0.50, MRC Biostatistics Unit, Cambridge, UK. Available at http://www.biostat.umn.edu/mirror/methodology/bugs. 17 May 1996.
26. Altman DG, Goodman SN. Transfer of technology from statistical journals to the biomedical literature. JAMA. 1994; 272:129-32.