15 October 1997 | Volume 127 Issue 8 Part 2 | Pages 752-756
Background: Researchers are increasingly interested in examining costs of care, and large administrative and clinical databases have made relevant data readily available. Because a few patients incur high costs relative to most patients, the distribution of cost data is often skewed. How robust are the usual methods of cost analysis against the skewed distribution of cost data?
Objective: To determine the methods commonly used for comparing cost data, describe their limitations, and provide an alternate method of analysis.
Design: Review of statistical methods used in studies of medical costs published in medical journals between January 1991 and January 1996; description of a Z-score method appropriate for testing the equality of mean costs between two log-normal samples; and reanalysis of published two-sample comparison results by using the Z-score method.
Results: For two-sample comparisons, three methods were commonly used: the Student t-test on untransformed costs, the Wilcoxon test on untransformed costs, and the Student t-test on log-transformed costs. The t-test on untransformed costs ignores the skewness in cost data, the Wilcoxon test ignores unequal variances, and the t-test on log-transformed costs tests the wrong null hypothesis unless variances in the log-scale are equal.
Eleven articles included two-sample tests and had enough information to allow reanalysis of the data using the Z-score method. These articles described a total of 23 Wilcoxon tests and 24 t-tests on untransformed costs. Most results did not change on reanalysis, but six results changed enough to alter conclusions. Specifically, reanalysis of data for which one Wilcoxon test had shown statistically significant results showed nonsignificant results; reanalysis of data for which two Wilcoxon tests had shown nonsignificant results showed statistically significant results. In articles that used t-tests on untransformed costs, two statistically significant results became nonsignificant on reanalysis and one nonsignificant result became statistically significant on reanalysis.
Conclusions: The methods commonly used to compare costs of two groups have limitations. Some limitations may change some conclusions, and the direction of the change cannot be predicted. The Z-score method is designed to adjust for skewness in cost data and is appropriate for comparing means of log-normally distributed cost data.
The distribution of cost data is often skewed because a small percentage of patients invariably incur extremely high costs relative to most patients. The skewed distribution of cost data is often log-normal; that is, the log-transformed cost data follow a normal distribution [1-3, 5-7]. For the comparison of mean costs between two groups, three commonly used methods are the parametric Student t-test on untransformed costs, the parametric Student t-test on log-transformed costs, and the nonparametric Wilcoxon test.
These methods have some limitations in the analysis of skewed cost data. The use of parametric Student t-tests on untransformed costs is based on the assumption that cost is approximately normally distributed; this is especially important in small to moderate samples. Nonparametric Wilcoxon tests are appropriate for testing the equality of two-sample means only when the shape (and thus the variance) of the distribution of cost is the same in both groups. The use of parametric Student t-tests on log-transformed costs assumes that cost has a log-normal distribution. However, testing the equality of means in the log-scale is equivalent to testing the equality of means in the original scale only if variances in the log-scale are equal. The impact of violating these assumptions when applying these common methods to cost data has not been assessed.
To overcome these limitations, Zhou and coworkers [8] proposed a Z-score method for comparing the means of log-normally distributed costs between two groups. The purpose of this paper is to examine the effect of this approach on the results and conclusions of previously published studies that compare costs. We identified articles recently published in the medical literature and describe how they analyzed their skewed cost data. Wherever enough information was given, we reanalyzed the data using the Z-score method and observed whether the conclusions were altered. Finally, we make some recommendations about the use of the various tests.
When comparing costs between two groups, health services and clinical researchers are often interested in whether mean costs are the same between the two groups. If we denote the mean costs of the two groups as $M_1$ and $M_2$, our null hypothesis of interest is $H_0: M_1 - M_2 = 0$.
Assume that log-transformed costs in group 1 and group 2 are normally distributed with means $\mu_1$ and $\mu_2$, respectively, and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Because $\log M_1 = \mu_1 + \sigma_1^2/2$ and $\log M_2 = \mu_2 + \sigma_2^2/2$ (Appendix 1), $H_0$ is equivalent to the following:
$H_0: (\mu_2 + \sigma_2^2/2) - (\mu_1 + \sigma_1^2/2) = 0.$
The new test [8] is a Z-score based on an estimate of $(\mu_2 + \sigma_2^2/2) - (\mu_1 + \sigma_1^2/2)$ divided by its SE; computation of the Z-score is described in Appendix 2.
The Z-score method is appropriate for testing $H_0$ if the cost data are log-normally distributed. It does not ignore the skewness of cost data (as the t-test on untransformed data does), nor does it require equal variances in the original scale (as the Wilcoxon test does) or in the log-scale (as the t-test in the log-scale does). Appendix 1 gives more detail about why the t-test on log-transformed costs tests a null hypothesis, $H_0^*: \mu_1 - \mu_2 = 0$, that is different from $H_0$ unless $\sigma_1 = \sigma_2$.
In a simulation study, assuming that the cost data are log-normal, Zhou and colleagues [8] showed that when variances in the log-scale are unequal, the Wilcoxon test and the t-tests on both the log-scale and the original scale are all subject to incorrect type I error rates, but the Z-score method has accurate type I error rates and adequate power. When variances in the log-scale are equal (that is, when $H_0$ and $H_0^*$ are equivalent), the performance of the Z-score method is similar to that of the Wilcoxon test and the t-tests on both the log-scale and the original scale. Thus, the Z-score method is always appropriate for testing the equality of mean costs of two log-normal samples, regardless of the equality or inequality of variances in the log-scale.
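The behavior described above can be illustrated with a small simulation. The sketch below is not the simulation of reference 8; the parameter values, sample size, function names, and the large-sample form of the Z statistic are all assumptions chosen for illustration. The two groups are given equal mean costs but unequal log-scale variances, and the t-test on log-transformed costs is approximated by a Welch-type statistic compared with a normal critical value.

```python
import math
import random
from statistics import NormalDist, mean, variance


def z_score_lognormal_means(x1, x2):
    """Z statistic for H0: equal mean costs of two log-normal samples (sketch)."""
    y1 = [math.log(v) for v in x1]
    y2 = [math.log(v) for v in x2]
    n1, n2 = len(y1), len(y2)
    m1, m2 = mean(y1), mean(y2)
    v1, v2 = variance(y1), variance(y2)  # sample variances (n - 1 denominator)
    # Estimate of (mu2 + sigma2^2/2) - (mu1 + sigma1^2/2).
    estimate = (m2 - m1) + (v2 - v1) / 2
    # Under normality, Var(mean) = sigma^2/n and Var(s^2/2) = sigma^4/(2(n - 1)).
    se = math.sqrt(v1 / n1 + v2 / n2 + (v1**2 / (n1 - 1) + v2**2 / (n2 - 1)) / 2)
    return estimate / se


def z_log_scale_means(x1, x2):
    """Stand-in for the t-test on log costs: it tests mu1 = mu2 only."""
    y1 = [math.log(v) for v in x1]
    y2 = [math.log(v) for v in x2]
    se = math.sqrt(variance(y1) / len(y1) + variance(y2) / len(y2))
    return (mean(y2) - mean(y1)) / se


def rejection_rates(n=50, reps=500, seed=1):
    """Share of simulated data sets in which each test rejects at the 5% level."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(0.975)
    # Hypothetical parameters with equal mean costs:
    # 1.0 + 0.5**2 / 2 == 0.0 + 1.5**2 / 2 == 1.125
    mu1, s1, mu2, s2 = 1.0, 0.5, 0.0, 1.5
    reject_z = reject_log = 0
    for _ in range(reps):
        x1 = [rng.lognormvariate(mu1, s1) for _ in range(n)]
        x2 = [rng.lognormvariate(mu2, s2) for _ in range(n)]
        reject_z += abs(z_score_lognormal_means(x1, x2)) > crit
        reject_log += abs(z_log_scale_means(x1, x2)) > crit
    return reject_z / reps, reject_log / reps


z_rate, log_rate = rejection_rates()
print(f"Z-score method: {z_rate:.3f}; test on log-scale means: {log_rate:.3f}")
```

Because the mean costs are equal in this setup, every rejection is a type I error with respect to $H_0$. The test on log-scale means rejects in most replications because it is detecting the difference between $\mu_1$ and $\mu_2$, not a difference in mean costs.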
Identification of Articles
To study the statistical methods used in the literature, we did a MEDLINE search to identify articles published between January 1991 and January 1996 that had the MeSH heading "costs and cost analysis" (n = 2333). Narrowing the focus to "hospital costs" (n = 414), we then identified articles that were either in the "statistical and numerical" subgroup or articles for which hospital costs were the major focus (n = 146). After we eliminated articles not written in English, commentaries, letters, or editorials, 118 articles remained and were reviewed.
For each article, we recorded sample sizes, descriptive statistics, and the types of analyses conducted. We also recorded whether the article was only descriptive or inferential and whether cost data were transformed (logarithmically or in another way). If they were, we recorded whether justification for the transformation was provided. For articles in which statistical analysis was done and sufficient detail was provided, we reanalyzed the data by using the method of Zhou and colleagues [8].
Methods for Comparison of Cost Data
In this era of increasing emphasis on containment of health care costs, the availability of large clinical and administrative databases has facilitated comparisons of the costs of new treatments or policies in health care delivery systems. Therefore, we are seeing more and more cost analyses in the literature. For example, Medicare claims data have been used to study the hospital charges of older adults [1] and variability in patient-level costs for coronary artery bypass graft surgery [2]. Florida Medicaid claims data have also been used to study the effect of a Medicaid home- and community-based waiver program on Medicaid expenditures for persons with AIDS [3]. The Regenstrief Medical Record System, a clinical database, has been used in several studies on the effectiveness of computer reminders about charges of outpatient diagnostic tests [4] and about inpatient charges in an urban hospital [5]. This database was also used to predict inpatient charges [6] and to examine diagnostic charges associated with symptoms of depression in the elderly [7].
Methods
Summary of the Z-Score Method
Results
As Figure 1 shows, 69 articles (58.5%) were descriptive only and did not include statistical inferences. Most of the other 49 articles described more than one statistical test, regression analysis, or both.
Logarithmic transformation of cost data was more likely to be done when regression analysis was used. Of the regression analyses done on costs in 21 articles, 11 used the natural logarithm of costs as the dependent variable; 1 used the Cox semiparametric model to account for censored data; and the other 9 used cost as the dependent variable. The most frequent justification for transforming data was skewness of cost data. When t-test results, Wilcoxon test results, or CIs were presented, the analyses had been done most frequently on untransformed data. Of the 36 articles that described two-sample tests, only 4 transformed the data to the natural logarithm of cost: 3 for two-sample t-tests and 1 for a paired t-test. Untransformed data were used in the other analyses. These analyses included Wilcoxon tests to compare two groups (12 cases), two-sample t-tests (22 cases), and a paired t-test (1 case). All four one-sample (t-distribution) CIs were calculated on untransformed cost data.
Eleven articles [9-19] included two-sample tests and contained enough information to allow us to reanalyze the data using the Z-score method. These 11 articles described a total of 23 Wilcoxon tests and 24 t-tests. Table 1 shows the P values reported in the articles and the P values derived by using the new Z-score method. In some cases, the changes are dramatic and could affect interpretation of the findings. For example, one test (test 16.9) had a P value of 0.16 reported in the article, but the Z-score method produced a P value of 0.001.
If a P value less than 0.05 indicates a statistically significant result, six results (three Wilcoxon test results and three t-test results) were changed enough on reanalysis to alter some conclusions. Specifically, one Wilcoxon test (test 17.1) showed statistically significant results in the article, but reanalysis showed nonsignificant results; two Wilcoxon tests (tests 16.6 and 16.9) showed nonsignificant results in the article, but reanalysis showed statistical significance. For those articles that used t-tests on untransformed cost data, two statistically significant results (on tests 18.1 and 19.1) became nonsignificant in reanalysis; one nonsignificant result (on test 12.2) became statistically significant.
Discussion
Neither the newly proposed Z-score method nor the Wilcoxon test provides an immediate CI for the difference in log-normal means. Appropriate CIs are available only for the one-sample, log-normal mean [21]. Using the Z-score method may lead to P values that differ from those produced with traditional tests. However, until methods are developed to determine whether there is also an appreciable shift in precision (CIs), the importance of the shift in P value cannot be fully interpreted.
For testing the equality of mean costs between two log-normally distributed samples, we make the following recommendations. For large samples (≥1000), the t-test on untransformed costs would be appropriate. For small to moderate samples, if the variances in the log-scale are equal, either the t-test on log-transformed costs or the Wilcoxon test could be used; otherwise, the Z-score method should be used.
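The recommendation above can be written as a small decision rule. This is only a sketch: the function name, the returned labels, and the reading of "large" as at least 1000 observations in each group are assumptions made for illustration, and the caller is assumed to have already judged whether the log-scale variances are equal.

```python
def recommended_test(n1, n2, log_variances_equal, large_n=1000):
    """Map the recommendation for two log-normal samples to a test name.

    Assumptions (not stated in the text): "large" means both groups have at
    least `large_n` observations, and equality of log-scale variances has
    been assessed separately by the caller.
    """
    if min(n1, n2) >= large_n:
        return "t-test on untransformed costs"
    if log_variances_equal:
        return "t-test on log-transformed costs or Wilcoxon test"
    return "Z-score method"


# Hypothetical study sizes, for illustration only.
print(recommended_test(1500, 2200, log_variances_equal=False))
print(recommended_test(40, 55, log_variances_equal=True))
print(recommended_test(40, 55, log_variances_equal=False))
```

The rule applies only when both samples are reasonably approximated by a log-normal distribution.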
If cost data cannot be approximated by a log-normal distribution, we recommend the use of the nonparametric bootstrap approach [8]. If it is necessary to adjust for covariates in assessing differences in the means of costs, a commonly used approach is a linear regression model on untransformed costs or log-transformed costs. Alternatively, the standard Cox semiparametric regression model has also been used to analyze cost data by treating costs as potentially right-censored failure times [22]. The validity of these regression methods for cost data needs to be studied. Finally, if cost data contain many zero costs, we cannot take logarithms of the cost data. We are working on extending our Z-score method to address this situation.
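For readers unfamiliar with the nonparametric bootstrap, the sketch below shows a generic percentile bootstrap interval for the difference in mean costs between two groups. It is not necessarily the procedure of reference 8; the function name, the number of resamples, and the cost values are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def bootstrap_mean_difference_ci(x1, x2, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(x1) - mean(x2) (generic sketch)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Resample each group with replacement, keeping its original size.
        r1 = rng.choices(x1, k=len(x1))
        r2 = rng.choices(x2, k=len(x2))
        diffs.append(mean(r1) - mean(r2))
    diffs.sort()
    lower = diffs[int((alpha / 2) * reps)]
    upper = diffs[int((1 - alpha / 2) * reps) - 1]
    return lower, upper


# Hypothetical skewed costs, for illustration only.
group_a = [120, 150, 900, 200, 180, 160, 140, 2500]
group_b = [10, 12, 15, 11, 60, 13, 14, 9]
lower, upper = bootstrap_mean_difference_ci(group_a, group_b)
print(lower, upper)
```

An interval that excludes zero suggests a difference in mean costs without relying on a log-normal assumption. With samples this small, the interval should be read with caution.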
In summary, the availability of large administrative and clinical databases has facilitated studies of health care costs in the literature. The methods commonly used for cost analysis have some limitations that could affect the conclusions of some studies. The use of appropriate statistical methods is important in cost analysis; informed decisions depend on them.
Appendix 1: Why $H_0$ and $H_0^*$ Might Not Be the Same
[Equation 1]
The t-test of log-transformed cost data is actually testing the null hypothesis $H_0^*$ that the means of log-transformed costs are equal in the two groups. This null hypothesis can be expressed as $H_0^*: \mu_1 = \mu_2$. The logarithm of the mean of cost data depends not only on the mean of log-transformed cost data but also on the variance of log-transformed cost data. This relation can be expressed by Equation 2:

$$\log M_k = \mu_k + \frac{\sigma_k^2}{2}, \qquad (2)$$

where $k = 1, 2$. These formulas were derived from the fact that the mean of a log-normal distribution is $\exp(\mu_k + \sigma_k^2/2)$. For a more detailed explanation, see Johnson and Kotz [23].

Thus, the original hypothesis can be written as Equation 3:

$$H_0: \mu_1 + \frac{\sigma_1^2}{2} = \mu_2 + \frac{\sigma_2^2}{2}. \qquad (3)$$
From this relation, we see that if the variances of log-transformed cost data are unequal ($\sigma_1 \neq \sigma_2$), the null hypothesis $H_0$ (that logarithms of mean costs in the two groups are the same) differs from the null hypothesis $H_0^*$ (that means of log-transformed cost in the two groups are the same). Specifically, if $\sigma_1 \neq \sigma_2$, then even after the t-test of the log-transformed costs accepts the hypothesis $H_0^*$ ($\mu_1 = \mu_2$), we could still reject the hypothesis $H_0$. If $\sigma_1 < \sigma_2$, even after the t-test of the log-transformed costs concludes that the mean of the log-transformed cost in group 1 is greater than that in group 2 ($\mu_1 > \mu_2$), we could still accept the hypothesis $H_0$ of equal mean costs; if $\sigma_1 > \sigma_2$, even after the t-test of the log-transformed costs concludes that $\mu_1 < \mu_2$, we could still accept the hypothesis $H_0$.
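A short numeric check may make the two cases concrete. The parameter values below are hypothetical; the only fact used is that the mean of a log-normal distribution is the exponential of the log-scale mean plus half the log-scale variance.

```python
import math


def lognormal_mean(mu, sigma):
    """Mean cost implied by log-scale mean `mu` and log-scale SD `sigma`."""
    return math.exp(mu + sigma**2 / 2)


# Case A: equal log-scale means, unequal log-scale variances.
# The hypothesis about log-scale means holds, but the mean costs differ.
mean_a1 = lognormal_mean(1.0, 0.5)
mean_a2 = lognormal_mean(1.0, 1.5)

# Case B: mu1 > mu2 with sigma1 < sigma2.
# The log-scale means differ, but the mean costs are equal:
# 2.0 + 1.0**2 / 2 == 0.5 + 2.0**2 / 2 == 2.5
mean_b1 = lognormal_mean(2.0, 1.0)
mean_b2 = lognormal_mean(0.5, 2.0)

print(mean_a1, mean_a2)
print(mean_b1, mean_b2)
```

In case A a test of equal log-scale means has nothing to detect although the mean costs differ; in case B it has a real difference to detect although the mean costs are the same.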
Appendix 2: Definition of the Z-Score Test Statistic
[Equation 4]

[Equation 5]
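The statistic summarized in the main text is an estimate of the difference in log mean costs divided by its SE. One way to assemble such a statistic from standard normal-theory results is sketched below. The notation is introduced here for illustration and is an assumption, not taken from this article: $\bar{y}_k$ and $s_k^2$ are the sample mean and sample variance of the log-transformed costs in group $k$, and $n_k$ is the group size. The SE uses $\mathrm{Var}(\bar{y}_k) = \sigma_k^2/n_k$ and $\mathrm{Var}(s_k^2/2) = \sigma_k^4/[2(n_k - 1)]$, with $\bar{y}_k$ and $s_k^2$ independent under normality. The exact form given in reference 8 may differ.

```latex
Z = \frac{(\bar{y}_2 - \bar{y}_1) + \tfrac{1}{2}\,(s_2^2 - s_1^2)}
         {\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}
          + \dfrac{1}{2}\left(\dfrac{s_1^4}{n_1 - 1} + \dfrac{s_2^4}{n_2 - 1}\right)}}
```

Under $H_0$ and for reasonably large samples, $Z$ would be compared with the standard normal distribution.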
References
1. Wolinsky FD, Culler SD, Callahan CM, Johnson RJ. Hospital resource consumption among older adults: a prospective analysis of episodes, length of stay, and charges over a seven-year period. J Gerontol. 1994; 49:S240-52.
2. Cowper PA, DeLong ER, Peterson ED, Lipscomb JL, Muhlbaier LH, Jollis JG, et al. Geographic variation in resource use for coronary artery bypass surgery. IHD Port Investigators. Med Care. 1997; 35:320-33.
3. Anderson KH, Mitchell JM. Expenditures on services for persons with acquired immunodeficiency syndrome under a Medicaid home and community-based waiver program. Are selection effects important? Med Care. 1997; 35:425-39.
4. Tierney WM, Miller ME, McDonald CJ. The effect on test ordering of informing physicians of the charges for outpatient diagnostic tests. N Engl J Med. 1990; 322:1499-504.
5. Overhage JM, Tierney WM, Zhou XH, McDonald CJ. A randomized trial of computer reminders about orders to monitor/prevent adverse effects of drugs orders. J Am Med Inform Assoc. 1997; [In press].
6. Tierney WM, Fitzgerald JF, Miller ME, James MK, McDonald CJ. Predicting inpatient costs with admitting clinical data. Med Care. 1995; 33:1-14.
7. Callahan CM, Kesterson JG, Tierney WM. Association of symptoms of depression with diagnostic test charges among older adults. Ann Intern Med. 1997; 126:426-32.
8. Zhou XH, Gao S, Hui SL. Methods for comparing the means of two independent log-normal samples. Biometrics. 1997; [In press].
9. Allison TG, Williams DE, Miller TD, Patten CA, Bailey KR, Squires RW, et al. Medical and economic costs of psychologic distress in patients with coronary artery disease. Mayo Clin Proc. 1995; 70:734-42.
10. Fisher KS, Reddick EJ, Olsen DO. Laparoscopic cholecystectomy: cost analysis. Surg Laparosc Endosc. 1991; 1:77-81.
11. Guzman LA, Simpfendorfer C, Fix J, Franco I, Whitlow PL. Comparison of costs of new atherectomy devices and balloon angioplasty for coronary artery disease. Am J Cardiol. 1994; 74:22-5.
12. Hamilton A, Norris C, Wensel R, Koshal A. Costs of reduction in cardiac surgery. Can J Cardiol. 1994; 10:721-7.
13. Hunink MG, Cullen KA, Donaldson MC. Hospital costs of revascularization procedures for femoropopliteal arterial disease. J Vasc Surg. 1994; 19:632-41.
14. Incarbone R, Peters JH, Heimbucher J, Dvorak D, Bremner CG, DeMeester TR. A contemporaneous comparison of hospital charges for laparoscopic and open Nissen fundoplication. Surg Endosc. 1995; 9:151-5.
15. Naughton BJ, Moran MB, Feinglass J, Falconer J, Williams ME. Reducing hospital costs for the geriatric patient admitted from the emergency department: a randomized trial. J Am Geriatr Soc. 1994; 42:1045-9.
16. Omoigui NA, Marcus FI, Mason JW, Hahn EA, Hartz VL, Hlatky MA. Cost of initial therapy in the Electrophysiological Study Versus ECG Monitoring trial (ESVEM). Circulation. 1995; 91:1070-6.
17. Paladino JA, Fell RE. Pharmacoeconomic analysis of cefmenoxime dual individualization in the treatment of nosocomial pneumonia. Ann Pharmacother. 1994; 28:384-9.
18. Senagore AJ, Kilbride MJ, Luchtefeld MA, MacKeigan JM, Davis AT, Moore JD. Superior nitrogen balance after laparoscopic-assisted colectomy. Ann Surg. 1995; 221:171-5.
19. Stein MD. Injected-drug use: complications and costs in the care of hospitalized HIV-infected patients. J Acquir Immune Defic Syndr. 1994; 7:469-73.
20. Wetherill GB. The Wilcoxon test and non-null hypotheses. Journal of the Royal Statistical Society. 1960; 27:402-18.
21. Zhou XH, Gao S. Confidence intervals for the log-normal mean. Stat Med. 1997; 16:783-90.
22. Dudley RA, Harrell FE Jr, Smith LR, Mark DB, Califf RM, Pryor DB, et al. Comparison of analytic models for estimating the effect of clinical factors on the cost of coronary artery bypass graft surgery. J Clin Epidemiol. 1993; 46:261-71.
23. Johnson NL, Kotz S. Continuous Univariate Distributions. New York: J Wiley; 1994.