The three papers included in this section are based on presentations from a session entitled "Biostatistical Issues in Database Research." This topic could have been extremely broad, because nearly every issue related to databases ultimately has biostatistical implications. These articles, however, address the issues that were the major foci of this conference: cost of care, outcomes, and measuring quality.
The cost of medical care is an important issue for society. Interventions, treatments, and even providers are now required to be cost-effective as well as effective. It may therefore be surprising that the statistical analysis of cost data in the medical literature is inconsistent and often incorrect. The literature review by Zhou, Melfi, and Hui in their paper, "Methods for Comparison of Cost Data," makes this abundantly clear. Cost data frequently have three features that make them difficult to analyze: they are highly skewed, their variances differ between groups, and they often include a large proportion of persons with no health care utilization (and therefore zero costs). These distributional properties violate the assumptions of both the usual parametric and nonparametric analyses. The paper presents a new Z-score method for comparing costs between two independent groups. The method addresses the first two problems and can be implemented with existing software. It is an important first step for analyzing cost data from large databases or prospective studies. Future work should include methods analogous to analysis of variance for comparing multiple groups and to multiple regression for modeling costs.
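This introduction does not give the formulas of the Z-score method, so the following is only a rough sketch. It rests on an assumption that is ours and not necessarily the paper's exact formulation: positive costs are lognormal within each group. Under that assumption, mean cost equals exp(mu + sigma^2/2), where mu and sigma^2 are the mean and variance of log cost. The sketch tests whether that quantity is equal in two groups and allows the variances to differ. It does not address zero costs, the third problem above. The function name `lognormal_mean_z` is hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def lognormal_mean_z(costs1, costs2):
    """Z-test of H0: equal mean cost in two independent groups.

    Hypothetical helper. Assumption (not taken from the paper): costs are
    strictly positive and lognormal within each group, so that
    mean cost = exp(mu + sigma^2 / 2), with mu and sigma^2 the mean and
    variance of log cost. Variances may differ between groups.
    Returns (z, two-sided p-value).
    """
    y1 = [math.log(c) for c in costs1]  # log costs, group 1
    y2 = [math.log(c) for c in costs2]  # log costs, group 2
    n1, n2 = len(y1), len(y2)
    s1, s2 = variance(y1), variance(y2)  # sample variances (divisor n - 1)

    # Estimated difference in log(mean cost): (mu1 + s1/2) - (mu2 + s2/2)
    diff = (mean(y1) - mean(y2)) + (s1 - s2) / 2

    # Var(ybar) = sigma^2 / n and Var(s^2 / 2) ~= sigma^4 / (2 (n - 1))
    # when log costs are normally distributed.
    se = math.sqrt(s1 / n1 + s2 / n2 + (s1**2 / (n1 - 1) + s2**2 / (n2 - 1)) / 2)

    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p
```

Because the test works on the log scale, patients with zero costs would have to be removed or modeled separately before this sketch could be applied. That choice is outside its scope.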
The use of nonrandomized studies to compare outcomes of competing treatments has also long been controversial in medical research. The late Dr. David Byar was a strong opponent of this practice. As he noted in his paper at the Third Regenstrief Conference, large databases magnify, rather than minimize, the inherent biases [1]. New statistical methods are needed if databases are to be used in this way, because traditional models that adjust for covariates do not eliminate these biases. In his paper, "Estimating Causal Effects from Large Data Sets Using Propensity Scores," Rubin presents one such method. Propensity scores were introduced in 1983 but have not been widely used in the medical literature. Rubin provides a clear explanation of their implementation and limitations. Even with new methods, databases clearly cannot replace randomized trials. The best statistical methods still cannot adjust for unobserved covariates, nor can they compensate for groups that have little or no overlap in their characteristics.
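Rubin's paper is summarized here rather than reproduced. To illustrate the general idea of subclassification on the propensity score, the sketch below does three things. It estimates each subject's probability of treatment with a one-covariate logistic regression fit by Newton's method. It then groups subjects into quintiles of that estimated score. Finally, it averages the treated-minus-control outcome differences within strata, weighting each stratum by its size. This is not Rubin's own implementation. The function names and the single-covariate setup are hypothetical simplifications. A real analysis would use many covariates and would check covariate balance within each stratum.

```python
import math


def fit_logistic_1d(x, t, iters=25):
    """Hypothetical helper: fit P(t=1 | x) = 1 / (1 + exp(-(b0 + b1*x)))
    by Newton-Raphson. Assumes the data are not completely separated."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ti in zip(x, t):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += ti - p          # score (gradient) for b0
            g1 += (ti - p) * xi   # score (gradient) for b1
            h00 += w              # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += I^{-1} g, with I the 2x2 information matrix
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1


def subclassification_effect(x, t, y, n_strata=5):
    """Hypothetical helper: treatment effect estimated by subclassifying on
    the estimated propensity score (quintiles by default)."""
    b0, b1 = fit_logistic_1d(x, t)
    scores = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
    order = sorted(range(len(x)), key=lambda i: scores[i])
    size = len(order)
    total, used = 0.0, 0
    for s in range(n_strata):
        idx = order[s * size // n_strata:(s + 1) * size // n_strata]
        treated = [y[i] for i in idx if t[i] == 1]
        control = [y[i] for i in idx if t[i] == 0]
        if not treated or not control:
            # No overlap in this stratum, so no within-stratum comparison is
            # possible; skipping it changes the population being described.
            continue
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        total += diff * len(idx)  # weight by stratum size
        used += len(idx)
    if used == 0:
        raise ValueError("no stratum contains both treated and control subjects")
    return total / used
```

Strata with no treated or no control subjects are skipped rather than extrapolated over. This reflects the limitation noted above: no method can compensate for groups that do not overlap.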
Measuring quality of care has been a primary point of contention at least since the Health Care Financing Administration began releasing hospital mortality statistics to the press. The analysis of these data and the reported figures have become more sophisticated over time. Nevertheless, the primary complaints from hospitals with higher-than-expected rates remain the same: case mix, sample size, and misinterpretation of the numbers by the public. Similar issues have arisen as quality report cards are issued for other health care providers, such as HMOs evaluated against HEDIS (Health Plan Employer Data and Information Set) criteria or physicians within health care organizations. The initial argument that databases cannot yield meaningful report cards of quality is now moot. The current issue is how to develop methods that produce the best possible performance measures. In their paper, "Improving the Statistical Approach to Health Care Provider Profiling," Christiansen and Morris first review the strengths and weaknesses of many of the provider profiling measures used previously. They then present a new method that better accounts for differences in sample size. It uses a measure, the probability of poor performance, that has a more intuitive interpretation than P values or confidence intervals.
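Christiansen and Morris's hierarchical model is not described in detail here, so the following is a simplified stand-in rather than their method. It shows how a "probability of poor performance" can be read from a posterior distribution. It assumes a beta prior for providers' true mortality rates, with prior parameters treated as known (for example, estimated from all providers). Under that assumption, each provider's posterior is Beta(a + deaths, b + survivors). The probability of poor performance is then the posterior probability that the true rate exceeds a chosen threshold. The prior values, the threshold, and the function names are hypothetical.

```python
import math


def beta_sf(threshold, a, b, steps=20000):
    """P(X > threshold) for X ~ Beta(a, b), by midpoint-rule integration
    of the beta density (hypothetical helper; adequate for a, b >= 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    width = (1.0 - threshold) / steps
    total = 0.0
    for k in range(steps):
        p = threshold + (k + 0.5) * width  # midpoints avoid the endpoints 0 and 1
        total += math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log1p(-p))
    return total * width


def prob_poor_performance(deaths, cases, threshold, prior_a, prior_b):
    """Hypothetical helper: posterior P(true rate > threshold), assuming a
    Beta(prior_a, prior_b) prior and binomial deaths among cases."""
    return beta_sf(threshold, prior_a + deaths, prior_b + cases - deaths)


# Illustrative (made-up) values: prior mean rate 5%, threshold 7%.
small = prob_poor_performance(deaths=3, cases=20, threshold=0.07, prior_a=5, prior_b=95)
large = prob_poor_performance(deaths=50, cases=500, threshold=0.07, prior_a=5, prior_b=95)
```

With these made-up numbers, the provider with 3 deaths in 20 cases (15%) receives a lower probability of poor performance than the provider with 50 deaths in 500 cases (10%). The smaller provider's rate is estimated much less precisely and is pulled more strongly toward the prior mean. This is the kind of sample size adjustment that a raw rate comparison lacks.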
On the surface, these three papers seem to be related only because each presents biostatistical methods that can be applied to large databases. However, if each method becomes part of every biostatistician's armamentarium, their combined use seems inevitable. For example, a propensity score analysis of costs would be more appropriate if within-stratum comparisons were made with the Z-score method instead of t-tests or Wilcoxon rank sum tests. Similarly, the Z-score, with its improved distributional properties, might be a better measure than mean cost for profiling with hierarchical models. Finally, the proposed profiling method adjusts for patient characteristics through modeling. Could comparisons be improved if a propensity score approach were incorporated into the process instead? Although not all of these combinations are straightforward, they point to potential research areas for further improving the statistical analysis of large databases.
Barry P. Katz, PhD
Indiana University School of Medicine; Indianapolis, IN 46202