Annals
Established in 1927 by the American College of Physicians

STATISTICAL METHODS

Biostatistics To Improve the Power of Large Databases

Barry P. Katz, PhD

15 October 1997 | Volume 127 Issue 8 Part 2 | Page 769


The three papers included in this section are based on presentations from a session entitled "Biostatistical Issues in Database Research." This area had the potential to be extremely broad because few issues related to databases do not ultimately have biostatistical implications. However, these articles address the issues that were the major foci of this conference: cost of care, outcomes, and measuring quality.

The cost of medical care is an important issue for society. Interventions, treatments, and even providers are now required to be both effective and cost-effective. However, it may be surprising that the statistical analysis of cost data in the medical literature is inconsistent and often incorrect. The literature review by Zhou, Melfi, and Hui in their paper, "Methods for Comparison of Cost Data," makes this abundantly clear. Cost data frequently have three problems that make them difficult to analyze: They are highly skewed, have unequal variances, and frequently include a large proportion of persons with no health care utilization. These distributional properties violate the assumptions of both the usual parametric and nonparametric analyses. The paper presents a new Z-score method for comparing costs between two independent groups that addresses the first two problems and can be implemented by using existing software. This method is an important first step for analyzing cost data from large databases or prospective studies. Future steps should include methods analogous to analysis of variance for comparing multiple groups and to multiple regression for modeling costs.
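To make the idea concrete, the sketch below shows one common form of a Z-score test for comparing mean costs when costs are log-normal with possibly unequal variances. It assumes every cost is strictly positive (so it does not address the third problem, zero utilization) and that log(cost) is normally distributed in each group. Under that assumption the log of the mean cost is mu + sigma^2/2, so the test compares estimates of that quantity. The function name `lognormal_mean_z_test` and the cost figures are hypothetical, and the exact statistic in Zhou, Melfi, and Hui's paper may differ in detail.

```python
import math
import statistics


def lognormal_mean_z_test(costs_a, costs_b):
    """Z-score comparison of two log-normal cost means (illustrative sketch).

    Assumes all costs are > 0 and log(cost) is normal in each group, with
    possibly unequal variances. Zero costs are NOT handled here.
    """
    logs_a = [math.log(c) for c in costs_a]
    logs_b = [math.log(c) for c in costs_b]
    n_a, n_b = len(logs_a), len(logs_b)
    mean_a, mean_b = statistics.mean(logs_a), statistics.mean(logs_b)
    var_a, var_b = statistics.variance(logs_a), statistics.variance(logs_b)

    # Under log-normality, log(E[cost]) = mu + sigma^2 / 2; compare these estimates.
    diff = (mean_a + var_a / 2) - (mean_b + var_b / 2)
    # Approximate variance of each estimate: s^2 / n + s^4 / (2 (n - 1)).
    se = math.sqrt(
        var_a / n_a + var_a ** 2 / (2 * (n_a - 1))
        + var_b / n_b + var_b ** 2 / (2 * (n_b - 1))
    )
    z = diff / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Hypothetical costs: group A is highly skewed, group B is tightly clustered.
group_a = [120.0, 340.0, 95.0, 1500.0, 410.0, 220.0, 80.0, 2600.0]
group_b = [150.0, 180.0, 210.0, 175.0, 300.0, 260.0, 190.0, 240.0]
z, p = lognormal_mean_z_test(group_a, group_b)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Working on the log scale tames the skewness, while adding half the log-scale variance back in keeps the comparison about mean cost (what a payer spends) rather than median cost. Unlike a pooled t-test, the groups' variances are never assumed equal.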

The use of nonrandomized studies to compare outcomes of competing treatments is also a longstanding controversy in medical research. The late Dr. David Byar was a strong opponent of this practice; as he noted in his paper at the Third Regenstrief Conference, the inherent biases are magnified, not minimized, by large databases [1]. New statistical methods are required if databases are to be used in this manner, because traditional models that adjust for covariates do not eliminate these biases. In his paper, "Estimating Causal Effects from Large Data Sets Using Propensity Scores," Rubin presents one such method. Propensity scores were first introduced in 1983 but have not been widely used in the medical literature; Rubin provides a clear explanation of their implementation and limitations. Even with new methods, it remains clear that databases cannot replace randomized trials: even the best statistical methods do not adjust for unobserved covariates and cannot compensate for groups that are completely different.
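A minimal sketch of propensity score subclassification follows, assuming a single observed confounder `x`, a logistic model for the probability of treatment, and five equal-sized strata cut at quantiles of the estimated score. These are common textbook choices, not necessarily Rubin's exact procedure. The data are simulated with a hypothetical true treatment effect of 2.0, and all function and variable names are illustrative.

```python
import math
import random


def fit_logistic(x, t, iterations=25):
    """Fit P(t=1 | x) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient (g) and information matrix (h) of the log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ti in zip(x, t):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += ti - p
            g1 += (ti - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^{-1} g, with the 2x2 inverse written out.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def subclassification_estimate(x, t, y, n_strata=5):
    """Size-weighted average of within-stratum treated-minus-control differences."""
    b0, b1 = fit_logistic(x, t)
    score = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
    order = sorted(range(len(x)), key=lambda i: score[i])
    n = len(order)
    total, used = 0.0, 0
    for s in range(n_strata):
        # Equal-sized strata cut at quantiles of the estimated propensity score.
        stratum = order[s * n // n_strata:(s + 1) * n // n_strata]
        treated = [y[i] for i in stratum if t[i] == 1]
        control = [y[i] for i in stratum if t[i] == 0]
        if not treated or not control:
            # No overlap in this stratum: the groups cannot be compared, so skip it.
            continue
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        total += diff * len(stratum)
        used += len(stratum)
    return total / used


# Simulated observational data (hypothetical): x drives both treatment and outcome.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
t = [1 if rng.random() < 1.0 / (1.0 + math.exp(-1.5 * xi)) else 0 for xi in x]
y = [2.0 * ti + 3.0 * xi + rng.gauss(0.0, 1.0) for ti, xi in zip(t, x)]  # true effect 2.0

naive = (sum(yi for yi, ti in zip(y, t) if ti) / sum(t)
         - sum(yi for yi, ti in zip(y, t) if not ti) / (len(t) - sum(t)))
estimate = subclassification_estimate(x, t, y)
print(f"naive difference: {naive:.2f}, propensity-subclassified: {estimate:.2f}")
```

With one covariate the score is just a monotone transform of `x`. The method earns its keep with many covariates, which collapse into a single score, but this sketch omits them for brevity. Skipping strata that lack either group reflects the limitation noted above. And because `x` is observed here, the sketch cannot show what happens with unobserved confounders, which no such adjustment can remove.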

Measuring quality of care has been a primary point of contention at least since the Health Care Financing Administration began releasing mortality statistics to the press. The analysis of this database and the reported figures have become more sophisticated and complex over time, but the primary complaints from hospitals with higher-than-expected rates remain the same: case mix, sample size, and misinterpretation of the numbers by the public. Similar issues have arisen as quality report cards are issued for other health care providers, such as HMOs based on the HEDIS (Health Plan Employer Data and Information Set) criteria or physicians within health care organizations. The initial argument that it is not possible to use databases to obtain meaningful report cards of quality is now moot. The current issue is developing methods to obtain the best possible performance measures. In their paper "Improving the Statistical Approach to Health Care Provider Profiling," Christiansen and Morris first review the strengths and weaknesses of many of the provider profiling measures that have been used previously. They then present a new method that better accounts for sample size differences and uses a measure, the probability of poor performance, which has a more intuitive interpretation than P values or confidence intervals.
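To illustrate why a probability of poor performance handles sample size better than a raw rate, the sketch below uses a deliberately simplified stand-in: a conjugate beta-binomial model with a fixed, hypothetical Beta(2, 38) prior (mean 5% mortality). Here "probability of poor performance" is taken to mean the posterior probability that a provider's true death rate exceeds a threshold, set to a hypothetical 7%. Christiansen and Morris's hierarchical model also adjusts for case mix and estimates the between-provider distribution from the data, and this sketch does neither. The function name is illustrative.

```python
import math


def prob_poor_performance(deaths, patients, threshold, prior_a=2, prior_b=38):
    """Posterior P(true death rate > threshold) in a beta-binomial model.

    Hypothetical prior Beta(prior_a, prior_b) with integer parameters. The
    posterior is Beta(a, b) with a = prior_a + deaths and
    b = prior_b + patients - deaths. For integer a, b the beta upper tail
    equals a binomial CDF: P(rate > t) = P(Binomial(a + b - 1, t) <= a - 1).
    """
    a = prior_a + deaths
    b = prior_b + patients - deaths
    n = a + b - 1
    return sum(
        math.comb(n, j) * threshold ** j * (1.0 - threshold) ** (n - j)
        for j in range(a)
    )


# Two hypothetical hospitals with the same observed mortality rate (10%).
small = prob_poor_performance(deaths=2, patients=20, threshold=0.07)
large = prob_poor_performance(deaths=40, patients=400, threshold=0.07)
print(f"small hospital: {small:.2f}, large hospital: {large:.2f}")
```

Both hospitals have the same crude rate. The small hospital's estimate is pulled toward the prior mean and its posterior stays wide, so it receives a lower probability of poor performance than the large hospital, whose data outweigh the prior. A statement such as "there is an X% chance this hospital's true rate exceeds 7%" is the kind of direct interpretation the text contrasts with P values.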

On the surface, these three papers seem to be related only because each presents biostatistical methods that can be used for large databases. However, if each one is included in every biostatistician's armamentarium, then their combined use seems inevitable. For example, a propensity score analysis of costs would be more appropriate if within-stratum comparisons were made using the Z-score method instead of t-tests or Wilcoxon rank sum tests. Similarly, the Z-score, with its improved distributional properties, might be a better measure than mean cost for profiling with hierarchical models. Finally, the proposed profiling method adjusts for patient characteristics, but could comparisons be improved if a propensity score approach, rather than modeling, were incorporated into the process? Although these proposed combinations are not all straightforward, they point to potential areas of research to further improve the statistical analysis of large databases.

Barry P. Katz, PhD

Indiana University School of Medicine; Indianapolis, IN 46202


Note: This article is one of a series of articles comprising an Annals of Internal Medicine supplement entitled "Measuring Quality, Outcomes, and Cost of Care Using Large Databases: The Sixth Regenstrief Conference." To see a complete list of the articles included in this supplement, please view its Table of Contents.


References

1. Byar DP. Problems with using observational databases to compare treatments. Stat Med. 1991;10:663-6.


