15 January 1995 | Volume 122 Issue 2 | Pages 125-132
Objective: To compare and contrast a managed care program's analysis of differences in hospital mortality with results obtained by accepted statistical methods.
Design: A re-analysis of computerized discharge data using the same method used by a managed care program, and using conventional methods of categorical data analysis. One thousand computer simulations of a method for comparing hospitals by severity-adjusted mortality were done to determine the probability of falsely identifying hospitals as high-mortality outliers.
Setting: 22 acute care hospitals in central Pennsylvania.
Patients: All adult patients with pneumonia (n = 4587; diagnosis-related groups 089-090) less than 65 years of age who were discharged from the 22 hospitals in 1989, 1990, and 1991, excluding patients with the acquired immunodeficiency syndrome and transplant recipients.
Measurements: In-hospital mortality adjusted for age and severity of illness using MedisGroups admission severity group score.
Results: The hospital that had the highest mortality for adult pneumonia according to the managed care program's analysis did not, according to an appropriate analysis, differ significantly from other area hospitals (likelihood ratio test, P = 0.23). Random variation in this sample of patients with a low average mortality rate (3.5%) showed a 60% chance that 1 or more of the 22 hospitals would be falsely identified as a "high-mortality outlier" when simplistic statistical methods were used.
Conclusion: Organizations seeking to compare the quality of hospitals and physicians through outcome data need to recognize that simplistic methods applicable to large samples fail when applied to the outcomes of typical patients, such as those admitted for pneumonia. Although these comparisons are much in demand, careful attention must be paid to their statistical methods to ensure validity and fairness.
A less frequently discussed but equally vital issue in the comparison of patient outcomes is the appropriateness of the statistical methods used for analyzing data and interpreting results. Raw data are clearly insufficient for comparing hospitals or physicians; some statistical methods must be invoked. Ideally, using a set of techniques called standardization [18], the comparison of different samples of patients in varied clinical settings should be feasible.
We examine one attempt to compare hospitals based on their observed mortality; this attempt was made in 1992 by consultants to a managed care program for a large corporation. The corporation sought to determine which of the hospitals serving the corporation's employees in central Pennsylvania delivered better quality of care as reflected in part by fewer in-hospital deaths in 1989 and 1990. Partly on the basis of the methods reported here, the corporation selected 10 of these hospitals to be eligible for its managed care network.
Focusing on adult pneumonia, which was one of the diagnostic groups used by the consultants, we describe the consultants' methods and re-create their results for 1989 and 1990. Then, we reassess those results using more appropriate methods, and finally, we validate results using newly acquired 1991 data. (A glossary of statistical terms is given in Appendix 1.)
The study sample included data reported to the Pennsylvania Health Care Cost Containment Council on all inpatient hospitalizations for adult pneumonia (diagnosis-related groups 089-090) in 1989 and 1990, and, later, in 1991, for 22 hospitals in central Pennsylvania. Because we were using the consultants' methods, we excluded from analysis all patients older than 65 years of age, for whom Medicare would likely be the primary insurer. In addition, we excluded four patients with the acquired immunodeficiency syndrome and seven awaiting organ transplantation because only one hospital provided these specialized kinds of care. Examination of the 1991 data did not begin until after completion of the re-analysis for the years 1989 and 1990.
The public-use data sets include coded discharge abstracts and the MedisGroups (Mediqual Systems, Westborough, Massachusetts) admission severity group, an indicator of severity of illness [19]. Admission severity group attempts to distill many key clinical findings at patient admission into a single, five-level severity score. In the years 1989-1991, admission severity group was not diagnosis specific; a patient received a given admission severity group score regardless of the cause of his or her illness. All hospitals in Pennsylvania must subscribe to the MedisGroups system, calculate admission severity groups retrospectively from medical records, and report these data and computerized discharge abstracts to the Cost Containment Council.
The Managed Care Consultants' Analysis
The consultants relied only on admission severity group to adjust for both severity of acute illness and comorbid conditions. Patient age, which is not an element in admission severity group, was not used. The consultants' statistical technique is best described as a stratified, standard, normalized analysis. First, they calculated death rates for each admission severity group within each institution. Next, they computed the unweighted mean and standard deviation of the admission severity group death rates across all hospitals. They then computed a standard, normalized [20] difference between the hospital and the average death rate for each level of admission severity group by dividing the difference by the standard deviation of death rates. Finally, they combined these normalized differences across admission severity group levels within each hospital by a weighted average. Weights depended on the fraction of the hospital's admissions in each admission severity group. The consultants did not calculate standard errors or P values for these weighted averages of normalized differences; they displayed their results along a horizontal line, indicating the location of the average normalized difference and the relative position of a given hospital along the line. Further details of our implementation of the consultants' methods are given in Appendix 2.
Methods of Re-analysis
Our re-analysis of the data on adult pneumonia from the same 22 hospitals sought to confirm or refute the consultants' results with conventional methods for data with binary outcomes, in this case, death or survival at discharge.
We implemented model-based standardization [21] to compare hospital mortality adjusted for multiple patient risk factors. First, we verified the association between death and the patient's admission severity group by logistic regression as implemented in SAS, Proc Logistic (SAS Institute, Cary, North Carolina). Next, the hospital with the worst outcome in 1989-1990 (hospital 1) as calculated by the managed care consultants' method (Table 1) was compared with the others by fitting two "nested" regression models: one with only admission severity group as a covariate, the other with admission severity group plus an indicator variable to detect any difference between hospital 1 and all other area facilities. The likelihood ratio statistic for the difference between the two models showed whether the hospital with the worst outcome, according to the consultants' analysis, was truly different from the other hospitals.
ACADEMIA AND CLINIC
Comparing Hospital Mortality in Adult Patients with Pneumonia: A Case Study of Statistical Methods in a Managed Care Program
Comparing hospitals by their outcomes has become a popular method of drawing inferences about their relative quality. Outcome studies are now done regularly by employer- and insurer-sponsored managed care programs [1] and by federal [2] and state agencies [3-5]. Mortality, a well-defined outcome, has received the most study, and the public availability of raw mortality data has led to their frequent use as a measure of quality of care. Differences in mortality are known to be attributable to differences in patients' severity of illness [6, 7] or comorbid conditions [8]; thus, researchers have tried to develop methods to adjust raw data for these factors [9]. A well-described danger of such methods lies in our inability to identify all clinical predictors of mortality from computerized data sets [10-13]. In addition, a fundamental dispute remains about whether there is any correspondence between mortality and quality of care [14, 15]. The desire of health care payers for simple measures of hospital performance conflicts with the skepticism of health care providers about the validity and fairness of the assessment process [16, 17].
Methods
Choice of Diagnosis-related Groups
Also relevant to the outcome of many acute diseases is patient age, a factor readily available in the public-use data set but not used by the consultants. Based on a preliminary calculation showing that mortality increased with patient age, patients were grouped into three age ranges: 18 to 39 years, 40 to 54 years, and 55 to 64 years. For age to affect interhospital comparisons of pneumonia mortality, two conditions must be met: The distribution of ages must differ across hospitals and age must be associated with death after controlling for admission severity group. We tested the first condition by cross-classifying the three age categories and the 22 hospitals and calculating the Mantel-Haenszel chi-square statistic for differences in mean age across hospitals. We tested the second condition by using the likelihood ratio statistic for the difference in two logistic regression models: one with both admission severity group and age as factors and the other with only admission severity group as a factor.
The last step in testing whether hospital 1 differed significantly from the others again involved the comparison of two nested models. For this comparison, the two models included both age and admission severity group as standardization factors, and one also included an indicator variable for hospital 1. The ratio of the odds of mortality and a P value were computed as previously described.
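The logic of a nested-model comparison can be illustrated with a minimal sketch. It is deliberately simplified: the smaller model has one common death probability and the larger model adds only a hospital indicator, so both have closed-form maximum likelihood estimates. The models in the paper also adjust for admission severity group and age, which requires iterative fitting such as that done by Proc Logistic. The function names and the comparison counts (90 deaths among 2900 patients) are hypothetical; only the 5 of 94 figure for hospital 1 comes from the text.

```python
import math

def binom_loglik(deaths, n):
    """Maximized log-likelihood when all n patients share one death probability."""
    p = deaths / n
    ll = 0.0
    if deaths > 0:
        ll += deaths * math.log(p)
    if n - deaths > 0:
        ll += (n - deaths) * math.log(1 - p)
    return ll

def lr_test(deaths_a, n_a, deaths_b, n_b):
    """Likelihood ratio test of hospital A against all other hospitals (B)."""
    # Smaller model: a single death probability for every patient.
    ll_small = binom_loglik(deaths_a + deaths_b, n_a + n_b)
    # Larger model: adds an indicator for hospital A (two probabilities).
    ll_large = binom_loglik(deaths_a, n_a) + binom_loglik(deaths_b, n_b)
    stat = max(2.0 * (ll_large - ll_small), 0.0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# 5 of 94 is hospital 1's raw rate from the text; 90 of 2900 is made up.
stat, p_value = lr_test(5, 94, 90, 2900)
print(f"LR statistic = {stat:.2f}, P = {p_value:.3f}")
```

The one extra parameter in the larger model is what gives the statistic its single degree of freedom; adding covariates to both models leaves that difference unchanged.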
After determining whether hospital 1 had an adjusted mortality higher than that of the other hospitals, we investigated whether another hospital among the 22 might have had significantly higher mortality than expected. Logistic regression also allows for the calculation of predicted probabilities of death for each patient with a given age and admission severity group level. By summing the number of deaths and this predicted probability of death over all patients with pneumonia within each hospital, we computed at the hospital level both an observed and a predicted (or expected) number of deaths.
Hospital 4 showed the largest positive difference between observed and predicted numbers of deaths and thus became the test hospital for further analysis of differences in mortality. As before, we fit two nested regression models, one with admission severity group and age as covariates and the other with an additional single indicator variable for hospital 4, to test whether patient mortality was significantly higher there than at other facilities.
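The observed and expected death counts described above amount to two sums per hospital. The following sketch assumes that each patient record already carries a predicted probability of death from a fitted model; the record layout, hospital labels, and probabilities are hypothetical.

```python
from collections import defaultdict

def observed_expected(patients):
    """patients: iterable of (hospital, died, predicted_prob), with died in {0, 1}.

    Returns per-hospital observed deaths, expected deaths, and the difference
    between observed and expected death rates in percentage points.
    """
    totals = defaultdict(lambda: [0, 0.0, 0])  # deaths, sum of probs, patients
    for hospital, died, prob in patients:
        t = totals[hospital]
        t[0] += died
        t[1] += prob
        t[2] += 1
    return {
        h: {
            "observed": obs,
            "expected": exp,
            "diff_pct_points": 100.0 * (obs - exp) / n,
        }
        for h, (obs, exp, n) in totals.items()
    }

# Hypothetical records for two small hospitals.
records = [
    ("A", 1, 0.10), ("A", 0, 0.02), ("A", 0, 0.03),
    ("B", 0, 0.05), ("B", 0, 0.01),
]
summary = observed_expected(records)
for hospital, row in sorted(summary.items()):
    print(hospital, row)
```

Ranking hospitals by `diff_pct_points` identifies the candidate for a follow-up test, which is exactly why that test is post hoc.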
Multiple Comparisons
Contrasts among hospital death rates in the consultants' study, as in typical outcomes analyses, were determined by the data rather than planned in advance. Each hospital was compared in turn to determine whether it differed from all others. In this situation, a critical P value of 0.05 used repeatedly for each hospital comparison did not function to limit type I error, which was the probability of falsely classifying a hospital as a mortality outlier. Diehr and colleagues [22, 23] have commented on the issue of multiple comparisons, or multiple statistical tests, in a related context. The Bonferroni method [24], one of several available techniques [25], was used to adjust the critical P value to control type I error for multiple comparisons of hospitals.
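The arithmetic behind the Bonferroni adjustment can be sketched briefly. The family-wise error formula below assumes independent tests that each hold their nominal level exactly, which is an idealization; error rates estimated by simulation from actual data need not match it. Function names are illustrative.

```python
def bonferroni_alpha(alpha, m):
    """Per-comparison critical P value that keeps overall type I error near alpha."""
    return alpha / m

def familywise_error(alpha, m):
    """Chance of at least one false positive among m independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m

m = 22  # hospitals compared simultaneously
adjusted = bonferroni_alpha(0.05, m)           # about 0.0023
unadjusted_fwer = familywise_error(0.05, m)    # roughly 0.68 under independence
adjusted_fwer = familywise_error(adjusted, m)  # just under 0.05

print(f"adjusted critical P = {adjusted:.4f}")
print(f"overall type I error at 0.05 per test: {unadjusted_fwer:.3f}")
print(f"overall type I error after adjustment: {adjusted_fwer:.3f}")
```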
Computer Simulations
Computer simulations were done to determine the overall type I error rates caused by using various critical P values for significance tests. Multiple comparison procedures can affect type II error rates while they control type I error. In this context, type II error is the probability of failing to identify a true mortality outlier [26]. Simulations that used the actual pneumonia data from the 22 hospitals and that applied the logistic regression model described previously produced estimates of both types of error under different conditions.
Simulation of type I error involved generating deaths randomly by computer for each hospital. Each patient with the same age and the same level of admission severity group was assumed to have the same probability of death, as predicted from the logistic regression model. The computer simulated the chance or random death of a patient with this predicted probability of death. At the hospital level, the sum of the individual random deaths became the simulated number of random hospital deaths; the sum of the patient-level predicted probabilities of death became the expected number of deaths. Each simulation thus produced a difference between a chance observed death rate and an expected death rate, and this difference was transformed into a standardized difference in the two rates for each hospital [27]. This was not, however, the same standardized difference as computed by the consultants: Our differences were computed at the hospital level rather than at the much smaller admission severity group levels within each hospital. The standardized difference, or z statistic, was then compared with two critical values for type I error: the typical critical P value of 0.05 (corresponding to a z statistic greater than 1.96) and a smaller critical P value of 0.002 (corresponding to a z statistic greater than 3.08) adjusted for multiple comparisons by the Bonferroni method. The latter critical P value is simply 0.05 divided by 22, the number of hospitals being compared simultaneously. A hospital with a z statistic larger than that corresponding to the chosen critical P value in any simulation was a false-positive high outlier. The number of false-positive outliers in 1000 simulations provided an estimate of overall type I error.
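A simulation of this kind can be sketched in a few lines. The sketch below is not the authors' program. It assumes that the standardized difference is (observed minus expected deaths) divided by the square root of the sum of p(1 - p) over patients, which is one common choice and may differ from the formula in reference 27. The 22 hypothetical hospitals each have 100 patients with made-up predicted probabilities averaging 3.6%.

```python
import math
import random

def simulate_type1(hospitals, n_sims=1000, z_crits=(1.96, 3.08), seed=0):
    """Estimate the chance that at least one hospital is falsely flagged as high.

    hospitals: list of lists of predicted death probabilities, one per patient.
    Returns {z_crit: fraction of simulations with any hospital z above z_crit}.
    """
    rng = random.Random(seed)
    expected = [sum(ps) for ps in hospitals]
    sds = [math.sqrt(sum(p * (1 - p) for p in ps)) for ps in hospitals]
    hits = {z: 0 for z in z_crits}
    for _ in range(n_sims):
        # Deaths occur by chance alone, each with its predicted probability.
        z_max = max(
            (sum(rng.random() < p for p in ps) - e) / sd
            for ps, e, sd in zip(hospitals, expected, sds)
        )
        for z in z_crits:
            if z_max > z:
                hits[z] += 1
    return {z: hits[z] / n_sims for z in z_crits}

# Hypothetical case mix: mostly low-risk patients, a few at high risk.
hospitals = [[0.01] * 60 + [0.05] * 30 + [0.15] * 10 for _ in range(22)]
rates = simulate_type1(hospitals, n_sims=200)
print(rates)
```

Because both thresholds are applied to the same simulated deaths, the false-positive rate at the stricter threshold can never exceed the rate at 1.96.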
The estimation of type II error was done in a similar way. Hospital 4 again became the test case. Each simulation generated a random number of deaths for each hospital. This time, however, we altered the true probability of death for each patient at hospital 4 to be greater than average. We estimated type II error as the number of simulations in which hospital 4 was missed as a high outlier. The 1000 simulations also tested whether a Bonferroni-adjusted critical P value of 0.002 would result in the loss of statistical power to detect a true mortality outlier (increased type II error).
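The type II error estimate can be sketched the same way for a single test hospital. The assumptions here are ours: the expected number of deaths stays fixed at the model prediction, the true risk is the predicted probability multiplied by a constant, and the z statistic uses the same binomial variance as in a simple observed-versus-expected comparison. The patient probabilities are hypothetical.

```python
import math
import random

def simulate_type2(probs, risk_multiplier, z_crit, n_sims=1000, seed=0):
    """Fraction of simulations in which a truly high-mortality hospital is missed.

    probs: model-predicted death probabilities for the hospital's patients.
    Deaths are generated with probability min(1, risk_multiplier * p).
    """
    rng = random.Random(seed)
    expected = sum(probs)
    sd = math.sqrt(sum(p * (1 - p) for p in probs))
    true_probs = [min(1.0, risk_multiplier * p) for p in probs]
    misses = 0
    for _ in range(n_sims):
        observed = sum(rng.random() < p for p in true_probs)
        if (observed - expected) / sd <= z_crit:
            misses += 1  # the hospital was not flagged as a high outlier
    return misses / n_sims

# Hypothetical hospital with 150 patients and varied predicted risk.
probs = [0.01] * 90 + [0.05] * 45 + [0.15] * 15
for multiplier in (2.0, 2.5):
    for z_crit in (1.96, 3.08):
        beta = simulate_type2(probs, multiplier, z_crit)
        print(f"multiplier {multiplier}, z > {z_crit}: type II error {beta:.3f}")
```

Power is one minus the returned value, so the cost of a stricter critical value appears directly as a larger miss rate.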
Finally, we simulated the effect of larger and smaller overall death rates for pneumonia, and of larger numbers of patients with pneumonia at all 22 hospitals, on the probability of type I and type II errors. Each simulation was done 1000 times.
Validation with 1991 Data
After completing all the analyses described above, we analyzed the 1991 pneumonia data for the 22 hospitals. We computed crude death rates for each hospital (Table 1) to compare them with the ranking of hospitals done by the managed care consultants for 1989-1990.
Results
Managed Care Analysis
The consultants' methods produced hospital-specific standard normalized death rates ranging from 0.69 to 2.35 (Table 1). The consultants considered all eight hospitals with standardized death rates greater than the average (greater than 0.0) to have poorer-than-average performance for this diagnosis. Hospital 1 fared the worst with a raw death rate of 5.32% (5 of 94); its normalized rate (2.35) was more than two standard deviations above the average and was substantially higher than that of the second worst institution (0.90). The hospital with the highest raw death rate did not rank as the worst because of adjustment for patients' severity of illness (admission severity group scores).
Re-analysis
When conventional statistical methods for assessing categorical data were applied, mortality at hospital 1 did not differ significantly from that of the 21 other area hospitals (likelihood ratio test, P = 0.13), even when severity adjustment was made only by admission severity group. Data available to the managed care consultants showed clearly that the 22 central Pennsylvania hospitals had significantly different distributions of patients with pneumonia by age (Mantel-Haenszel chi-square, P < 0.001); hospital 1 had the largest percentage (53.2%) of patients in the highest age group (55 to 64 years). Age was also associated with mortality even after adjustment was made for admission severity group; this omitted factor was therefore a confounding variable [28]. After adjusting for both age and admission severity group, the odds of death were still higher at hospital 1 than at other hospitals (odds ratio, 1.8), but the odds were lower than the comparable figure adjusted only for admission severity group (odds ratio, 2.3). The P value for the difference between hospital 1 and the other hospitals increased from 0.13 to 0.23 using the likelihood ratio test. The re-analysis, therefore, found no significant difference in mortality between hospital 1 and other area hospitals.
Our further analysis of hospital mortality, standardized by both admission severity group and age, produced a different ordering of adjusted hospital death rates (Figure 1). Hospital 1 now ranked fourth highest; hospital 4 ranked highest because it showed the largest positive difference between observed and expected death rate (3.7 percentage points). The odds ratio for mortality at hospital 4 compared with all other hospitals was 2.60 (likelihood ratio statistic, P = 0.012). Had we hypothesized before the analysis that this hospital alone would be compared with all others, this result could have been compared with conventional levels of statistical significance, such as the commonly cited critical P value of 0.05 [29]. However, the comparison was post hoc; it was suggested by the data after an analysis that involved multiple comparisons. Therefore, we adopted a smaller critical P value (P = 0.002).
Simulations
Computer simulations offered insight into the critical P value (or adjusted z statistic) that should apply to the classification of individual hospital outliers in order to maintain a type I error of approximately 5% for all hospital comparisons. With a critical P value of 0.05, 59.4% of the simulations erroneously classified at least 1 of 22 hospitals as a "high-mortality outlier" (type I error). A conventional critical P value thus produced many false findings when random variation was the only cause of differences in mortality.
Other simulations showed the interrelation of the overall level of mortality, the number of patients in the sample, the size of the true difference between observed and expected mortality, the critical P value, and type I and type II errors. A simulation with a critical P value of 0.05 but with a lower overall mortality for all patients with pneumonia (one fourth the predicted death rate) resulted in a higher chance of falsely classifying at least one hospital as a high-mortality outlier (67.9%); a higher death rate (double the predicted rate) reduced that chance (49.9%). Doubling the number of patients in each hospital while holding mortality constant lowered the chances of false outliers from 59.4% to 51.5%.
The Bonferroni adjustment, made to account for implicit, multiple comparisons among 22 hospitals, reduced the critical P value to 0.002. Computer simulations with this P value showed a decrease in the overall type I error (the chance of a false finding of at least one high outlier) to 7.9% (Table 3). As expected, this adjustment to the critical P value did increase the type II error (the probability of missing a true high-mortality outlier). For example, the type II error for hospital 4 was 0.626 (62.6%) when the true mortality for each patient in that hospital was simulated at twice the expected rate. The type II error decreased to 0.244 (24.4%) when true mortality was simulated at 2.5 times the expected rate, and it decreased to 0.004 (0.4%) with a simulated doubling of the number of patients in each institution.
The simulations thus showed the high probability of type I error with mortality data of this type. Statistical power improves (type II error decreases) when a hospital's true death rate is several times higher than expected or when the true death rate is elevated and the number of patients in the sample is larger. When a critical P value of 0.002 was used, hospital 4 did not attain the requisite level of statistical significance for an outlier designation for 1989-1990.
Confirmation with 1991 Data
The follow-up data from 1991 corroborated our assessment that the managed care consultants' methods falsely classified hospital 1 as having the worst mortality record (type I error). In 1991, this hospital had no deaths among 47 patients. The rate of admission was about the same as in the previous 2 years, and the 1991 admissions occurred before the conclusion of the managed care study: The consultants' report could not have affected clinical care for 1991. Two other hospitals that had higher-than-average mortality in 1989-1990 also had no deaths in 1991. Hospital 4, which had the highest (though not statistically significant) adjusted mortality in our re-analysis, had average mortality in 1991.
Discussion
The example of adult pneumonia is particularly instructive. Most hospitals admit patients for pneumonia; inpatient deaths for patients under the age of 65 are relatively rare. Often, there are no deaths in subgroups of patients defined by the cross-classification of diagnosis, age, and severity [32]. In these cases of rare outcomes and limited sample sizes, commonly used statistical methods that might be satisfactory for large samples can work poorly [33].
We used logistic regression to standardize interhospital differences in patients and to test for differences between one hospital and all others. Contingency table analysis, Cochran-Mantel-Haenszel methods [34], and exact methods [35], which are especially appropriate when deaths are few, confirmed our findings. Grouping patients might provide samples large enough to justify the use of simplistic statistical methods but at the cost of introducing bias and defeating the goal of standardization if dissimilar patients are grouped together. For example, the consultants grouped patients regardless of age, but we identified age as a textbook example of confounding; inclusion of age improved the relative performance of hospital 1, which cared for an older patient population. It is likely that additional factors might explain some of the remaining variation in mortality; for example, the case-fatality rates for pneumonia vary according to causative organisms but such differences [36] might not be reliably ascertained from an administrative database. The omission of relevant clinical factors can lead to biased results. It can also make results more uncertain by increasing variance [37].
Our computer simulations show the danger of implicit, multiple comparisons in these analyses. The interplay of type I error and the number of comparisons is a fundamental concept in statistical methods [38, 39]. Our simulations show that this danger of false-positive findings can be high and will increase with smaller numbers of patients and with lower average mortality rates. Yet, use of 0.05 as the critical P value is common among investigators comparing multiple hospitals simultaneously [4, 40].
Methods other than the one we cite are available to control for multiple comparisons [41]. Computer simulations can also serve this purpose by estimating both type I error (false-positive results) and statistical power (the power to uncover true differences) with actual data. These estimates can then suggest appropriate critical P values for testing hospital differences.
The possibility of finding a false hospital outlier could be confirmed by an examination of data from another year. If the outlier hospitals change from year to year, one should suspect the cause to be random variation rather than secular change; this is precisely what we found. Data from 1991 found the "worst" hospital (hospital 1) becoming one of the "best." In all likelihood, nothing actually changed; random variation explains the difference.
In focusing on statistical issues, we do not advocate using the most elaborate or computer-intensive models. Exploratory data analysis [42] might be a helpful initial step, provided that the investigator appreciates the tendency of the untrained eye to perceive regularity or clustering where outcomes are in fact random [43]. These and other methods can be especially appropriate for screening; for example, the Health Care Financing Administration uses complex survival models. Nevertheless, its publication of hospital-specific mortality contains this cautionary note: "The information in this release is not intended as a direct measure of quality of care" [44] (emphasis in original). More recently, the Administration has declined to publish any mortality results because of the inadequacies of the data [45].
The adverse consequences of the misapplication of statistical methods or the misinterpretation of sound ones, although perhaps only short term at the economic level [46], might be severe and lasting on the personal level. A managed care program might be able to re-evaluate data and reverse an earlier decision, but not necessarily before physician-patient relationships are disrupted when patients must change hospitals and clinicians to maintain insurance coverage. Competent and efficient providers of care might lose business or even disappear merely because they care for patients whose diseases are more complex than computerized data can describe or because random deaths happened to cluster in the 1 or 2 years for which data are available. Although new, high-quality institutions might eventually grow to replace the old ones, individual clinical careers and patients might suffer permanent harm. Patients' worries about limited choice under managed care will take on added fervor if ad hoc outcome analyses needlessly restrict their choices.
The statistical issues we address are not likely to be resolved soon. Consumer groups want "report cards" to assure professional accountability [47]; managed care programs and government agencies, under pressure to promote competition in price and quality, produce one-time or annual comparisons. Recent reports on the federal health reforms in the United States cite the importance of releasing hospital and physician performance data to the public and to purchasers of health services [48, 49], but relatively scant attention is paid to the statistical methods routinely applied to administrative data. The risk posed by improper analysis to the reputations of skilled providers and to the health of the patients they serve leads us, as it has others [50], to advocate stringent criteria for judging the appropriateness of analytic methods for comparing hospitals and physicians based on patient outcomes.
Appendix 1: Glossary of Statistical Terms
Model-based standardization: The use of a statistical model to control simultaneously for numerous risk factors that predict whether a patient has a given outcome such as in-hospital death, and thus to standardize patient populations for interhospital comparisons. Equivalent to Mantel-Haenszel methods but more difficult to interpret. Easily handles many risk factors and their interactions and is available in many commercial software packages. Encounters difficulty with small cell sizes, causing program to fail to converge to a solution. Tests of significance must be structured and interpreted with care.
Exact tests: Special statistical methods similar to Mantel-Haenszel procedures that do not rely on assumptions of large samples for the computation of P values or confidence intervals. Especially useful when cell counts are small or zero and when outcome rates are low. Increasingly available in specialized commercial software.
Indicator variable: A variable in a regression model that defines two groups (of hospitals, for example). The resulting estimate of this variable from the regression indicates the size and significance of the difference (for example, in mortality) between the two groups.
Nested models: A pair of statistical models, such as those used for standardization, in which the smaller model contains a subset of factors found in the larger model. Used for testing whether the additional variable or variables in the larger model are significant (see likelihood ratio test).
Likelihood ratio test: A statistical test of the difference between two nested statistical models that indicates whether the variables in the larger model and not in the smaller model are significant.
Type I error: The probability that a statistical test or series of tests will find a significant difference or a significant comparison when in truth no difference exists (see multiple comparison procedures).
Type II error: The probability that a statistical test will fail to detect a true difference. Related to sample size and statistical power.
Statistical power: The power of an analysis to detect a true difference with a given sample size and a given type I error.
Multiple comparison procedures: A set of techniques that allows the investigator to make repeated or multiple comparisons among subgroups (such as hospitals) while controlling the level of overall type I error in the analysis.
Simulation: The use of the computer to generate a large number of samples of patients with the characteristics of actual patients and to use these samples to test the performance of statistical tests, such as their power or type I or II errors.
z statistic: Also known as the standard normal deviate, the standard normal variate, the normal variate in standard form, and the standard score. A measurement or proportion that has been rescaled so that its distribution resembles a standard normal distribution with mean = 0 and standard deviation = 1.
Confounding variable: A second risk factor that influences and therefore "confounds" the measure of association between the risk factor of interest and the outcome.
Exploratory data analysis: The use of statistical methods and visual displays of data to identify patterns that warrant further investigation. Possible to do with many commercial software packages.
Appendix 2: Statistical Methods of the Managed Care Program
These ASG-specific standardized rates were combined within each hospital by a weighted average, where the weights for each level of ASG were $w_{ha} = n_{ha}/n_h$, and $n_h$ was the number of cases in hospital $h$. The weights summed to 1 within hospital. On the basis of this method, the hospital-specific standardized mortality became $z_h = \sum_a w_{ha} z_{ha}$.
A value of $z_h$ greater than zero would indicate a hospital with higher-than-average mortality, and a $z_h$ of 2 would represent a mortality that was two standard deviations above the average.
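The consultants' calculation, as described in the Methods and in this appendix, can be sketched as follows. This is our reading and not the consultants' code. The sketch assumes that the ASG-specific standardized rate is the hospital's death rate minus the unweighted mean across hospitals, divided by the sample standard deviation across hospitals; that strata with no variation are skipped; and that the input layout is as shown. The counts are hypothetical.

```python
import statistics

def consultant_scores(counts):
    """counts: {hospital: {asg_level: (deaths, cases)}} -> {hospital: z_h}."""
    # Death rate for each ASG level within each hospital.
    rates = {
        h: {a: d / n for a, (d, n) in levels.items() if n > 0}
        for h, levels in counts.items()
    }
    # Unweighted mean and standard deviation of rates across hospitals, per level.
    all_levels = {a for r in rates.values() for a in r}
    level_stats = {}
    for a in all_levels:
        vals = [r[a] for r in rates.values() if a in r]
        if len(vals) >= 2:
            level_stats[a] = (statistics.mean(vals), statistics.stdev(vals))
    scores = {}
    for h, levels in counts.items():
        n_h = sum(n for _, n in levels.values())
        z_h = 0.0
        for a, (_, n) in levels.items():
            if n == 0 or a not in level_stats:
                continue
            mean, sd = level_stats[a]
            if sd == 0:
                continue  # assumption: a stratum with no variation adds nothing
            w_ha = n / n_h                      # share of the hospital's cases
            z_ha = (rates[h][a] - mean) / sd    # normalized difference
            z_h += w_ha * z_ha
        scores[h] = z_h
    return scores

# Hypothetical counts: three hospitals, two severity levels, 50 cases per cell.
counts = {
    "H1": {1: (0, 50), 2: (5, 50)},
    "H2": {1: (1, 50), 2: (2, 50)},
    "H3": {1: (0, 50), 2: (2, 50)},
}
scores = consultant_scores(counts)
print(scores)
```

In this example a single death among 50 low-severity patients gives H2 the same score as H1, which had five deaths among its sicker patients. That sensitivity to small cell counts is the weakness the re-analysis addresses.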
References
1. Woolsey C. Start buying health care results, not just services, employers told. Business Insurance. 14 Sep 1992:3,11.
2. Krakauer H, Bailey RC. Epidemiologic oversight of the medical care provided to Medicare beneficiaries. Stat Med. 1991; 10:521-40.
3. Altman LK. Surgical scorecards: can doctors be rated just like ballplayers? The New York Times. 14 Jan 1992:63.
4. Hannan EL, Kilburn H Jr, O'Donnell JF, Lukacik G, Shields EP. Adult open heart surgery in New York State. An analysis of risk factors and hospital mortality rates. JAMA. 1990; 264:2768-74.
5. Pennsylvania Health Care Cost Containment Council. A Consumer Guide to Coronary Artery Bypass Graft Surgery. Harrisburg, Pennsylvania: Health Care Cost Containment Council; 1992.
6. Green J, Wintfeld N, Sharkey P, Passman LJ. The importance of severity of illness in assessing hospital mortality. JAMA. 1990; 263:241-6.
7. Horn SD, Sharkey PD, Buckle JM, Backofen JE, Averill RF, Horn RA. The relationship between severity of illness and hospital length of stay and mortality. Med Care. 1991; 29:305-17.
8. Greenfield S, Aronow HU, Elashoff RM, Watanabe D. Flaws in mortality data. The hazards of ignoring comorbid disease. JAMA. 1988; 260:2253-5.
9. Alemi F, Rice J, Hankins R. Predicting in-hospital survival of myocardial infarction. A comparative study of various severity measures. Med Care. 1990; 28:762-75.
10. Jencks SF, Daley J, Draper D, Thomas N, Lenhart G, Walker J. Interpreting hospital mortality data. The role of clinical risk adjustment. JAMA. 1988; 260:3611-6.
11. Blumberg MS. Biased estimates of expected acute myocardial infarction mortality using MedisGroups admission severity groups. JAMA. 1991; 265:2965-70.
12. Smith DW, Pine M, Bailey RC, Jones B, Brewster A, Krakauer H. Using clinical variables to estimate the risk of patient mortality. Med Care. 1991; 29:1108-29.
13. Burns R, Nichols LO, Graney MJ, Applegate WB. Mortality in a public and a private hospital compared: the severity of antecedent disorders in Medicare patients. Am J Public Health. 1993; 83:966-71.
14. DuBois RW, Rogers WH, Moxley JH 3d, Draper D, Brook RH. Hospital inpatient mortality. Is it a predictor of quality? N Engl J Med. 1987; 317:1674-80.
15. Iezzoni LI. Severity standardization and hospital quality assessment. In: Couch JB, ed. Health Care Quality Management for the 21st Century. Tampa, FL: American College of Physician Executives; 1991:177-234.
16. Iezzoni LI, Shwartz M, Restuccia J. The role of severity information in health policy debates: a survey of state and regional concerns. Inquiry. 1991; 28:117-28.
17. Topol EJ, Califf RM. Scorecard cardiovascular medicine. Its impact and future directions. Ann Intern Med. 1994; 120:65-70.
18. Mosteller F, Tukey JW. Data Analysis and Regression. Reading, Massachusetts: Addison-Wesley; 1977:221-57.
19. Iezzoni LI, Moskowitz MA. A clinical assessment of MedisGroups. JAMA. 1988; 260:3159-63.
20. Snedecor GW, Cochran WG. Statistical Methods. 7th ed. Ames, IA: Iowa State University Press; 1980:41-2.
21. Kahn HA, Sempos CT. Statistical Methods in Epidemiology. New York: Oxford University Press; 1989:144-59.
22. Diehr P, Cain KC, Kreuter W, Rosenkranz S. Can small-area analysis detect variation in surgery rates? The power of small-area variation analysis. Med Care. 1992; 30:484-502.
23. Diehr P, Grembowski D. A small area simulation approach to determining excess variation in dental procedure rates. Am J Public Health. 1990; 80:1343-8.
24. Ingelfinger JA, Mosteller F, Thibodeau LA, Ware JH. Biostatistics in Clinical Medicine. New York: Macmillan; 1983:168-70.
25. Miller R. Multiple comparisons. In: Kotz S, Johnson NL, eds. Encyclopedia of Statistical Sciences. v 5. New York: John Wiley & Sons; 1985:679-89.
26. Freiman JA, Chalmers TC, Smith H Jr, Kuebler RR. The importance of β, the type II error, and sample size in the design and interpretation of the randomized control trial. Survey of 71 negative trials. N Engl J Med. 1978; 299:690-4.
27. Armitage P. Statistical Methods in Medical Research. New York: John Wiley & Sons; 1971:111-4.
28. Moses LE. Statistical concepts fundamental to investigations. N Engl J Med. 1985; 312:890-7.
29. Thomas DC, Siemiatycki J, Dewar R, Robins J, Goldberg M, Armstrong BG. The problem of multiple inference in studies designed to generate hypotheses. Am J Epidemiol. 1985; 122:1080-95.
30. Iezzoni LI, Ash AS, Coffman GA, Moskowitz MA. Predicting in-hospital mortality. A comparison of severity measurement approaches. Med Care. 1992; 30:347-59.
31. Iezzoni LI. Black box medical information systems. A technology needing assessment (Editorial). JAMA. 1991; 265:3006-7.
32. Luft HS, Hunt SS. Evaluating individual hospital quality through outcome statistics. JAMA. 1986; 255:2780-4.
33. Armitage P. Statistical Methods in Medical Research. New York: John Wiley & Sons; 1971:76-7.
34. Agresti A. Categorical Data Analysis. New York: John Wiley & Sons; 1990:230-5.
35. Mehta CR, Patel NR, Gray R. On computing an exact confidence interval for the common odds ratio in several 2 x 2 contingency tables. Journal of the American Statistical Association. 1985; 80:969-73.
36. Cotton VR, Weitekamp MR. Community-acquired pneumonia. In: Bone RC, Dantzker DR, George RB, Matthay RA, Reynolds HY, eds. Pulmonary and Critical Care Medicine. v 1. St. Louis: Mosby-Year Book; 1993:Part J, 1-15.
37. Aitkin M, Anderson D, Francis B, Hinde J. Statistical Modelling in GLIM. New York: Oxford University Press; 1989:213-6.
38. Pocock SJ, Hughes MD, Lee RJ. Statistical problems in the reporting of clinical trials. A survey of three medical journals. N Engl J Med. 1987; 317:426-32.
39. Maxwell SE, Delaney HD. Designing Experiments and Analyzing Data. Belmont, California: Wadsworth; 1989:171-80.
40. Hartz AJ, Kuhn EM, Kayser KL, Pryor DP, Green R, Rimm AA. Assessing providers of coronary revascularization: a method for peer review organizations. Am J Public Health. 1992; 82:1631-40.
41. Greenland S, Robins JM. Empirical-Bayes adjustments for multiple comparisons are sometimes useful. Epidemiology. 1991; 2:244-51.
42. Hoaglin DC, Mosteller F, Tukey JW. Understanding Robust and Exploratory Data Analysis. New York: John Wiley & Sons; 1983.
43. Kahneman D, Tversky A. Subjective probability: a judgment of representativeness. In: Kahneman D, Slovic P, Tversky A, eds. Judgment under Uncertainty: Heuristics and Biases. Cambridge: Cambridge University Press; 1982:32-47.
44. Medicare Hospital Information. Washington, DC: U.S. Dept of Health and Human Services, Health Care Financing Administration; 1992:iii.
45. Vladeck calls temporary halt to annual Medicare death studies. Medical Utilization Review. 1993; 21 [11]:1.
46. Pauly MV. The public policy implications of using outcome statistics. Brooklyn Law Review. 1992; 58:35-53.
47. Meier B. Hurdles await efforts to rate doctors and medical centers. The New York Times. 31 Mar 1994; 143:A1,B8.
48. Anders G. Visits to doctor's office will be different. The Wall Street Journal. 13 Sep 1993; 122[51]:B1,B4.
49. Rosenthal E. Heart bypass death risk in the state less in '92. The New York Times. 27 Dec 1993; 143:B5.
50. Hannan EL, Kilburn H Jr, Lindsey ML, Lewis R. Clinical versus administrative data bases for CABG surgery. Does it matter? Med Care. 1992; 30:892-907.