IN RESPONSE:
Mahadevia and associates underscore the importance of understanding the central point of our paper, that is, that variation between physicians (physician-level clustering) can limit the power of studies comparing any groups of physicians (for example, fee-for-service vs. managed care settings or high-quality vs. low-quality practice settings). All studies attempting such comparisons must be designed with not only sufficient numbers of patients per physician but also sufficient numbers of physicians per comparison group to ensure adequate power. The analytic methods we used were specifically chosen to investigate the separate effects of adjustment for patient characteristics and then physician clustering on observed differences, holding differences constant (1). The point estimates therefore did not change. It was the variability between physicians in each group rather than differences in case mix that accounted for the lack of statistical significance between groups. A recent study (2) showed similar findings when comparisons of groups of hospitals were adjusted for within-group variability.
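The power loss from physician-level clustering is commonly summarized by the design effect. The sketch below is illustrative only. It assumes equal numbers of patients per physician and a single intraclass correlation coefficient (ICC) shared by both comparison groups. The function names and inputs are hypothetical, not values from the PRP data. It shows why adjusting for clustering widens the standard error of a between-group difference while leaving the point estimate unchanged.

```python
import math


def design_effect(m: float, icc: float) -> float:
    """Variance inflation due to clustering, assuming m patients per
    physician (equal cluster sizes) and intraclass correlation icc."""
    return 1.0 + (m - 1.0) * icc


def effective_sample_size(n_patients: int, m: float, icc: float) -> float:
    """Approximate number of independent patients the clustered sample is worth."""
    return n_patients / design_effect(m, icc)


def se_of_difference(p1: float, n1: int, p2: float, n2: int,
                     m: float, icc: float) -> tuple[float, float]:
    """Standard error of p1 - p2 between two groups of physicians.
    Returns (naive SE, clustering-adjusted SE); the point estimate
    p1 - p2 is not changed by the adjustment, only its SE."""
    naive = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    adjusted = naive * math.sqrt(design_effect(m, icc))
    return naive, adjusted


if __name__ == "__main__":
    # Hypothetical inputs chosen only for illustration.
    naive, adjusted = se_of_difference(0.60, 1000, 0.50, 750, m=60, icc=0.05)
    print(f"design effect: {design_effect(60, 0.05):.2f}")
    print(f"naive SE {naive:.4f}, adjusted SE {adjusted:.4f}")
```

For example, with a hypothetical ICC of 0.05 and 60 patients per physician, the design effect is 1 + 59 × 0.05 = 3.95. At that design effect, 1000 clustered patients carry roughly the information of 253 independent ones. For a fixed total number of patients, spreading them across more physicians (a smaller m) shrinks the design effect. This is why the number of physicians per group matters for power.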
Cobin and colleagues and Weir and associates raise substantially overlapping issues. We believe the concern about specialty misclassification to be unfounded for the following reasons. First, as stated in our paper, each practice site self-reported its specialty, and practices were specifically instructed to choose patients for whom they had provided the principal diabetes care over the previous year. As a quality check, each site subsequently reaffirmed its specialty designation. Second, if there had been substantial misclassification, the unadjusted differences in our Table 1 would have been smaller. Third, to determine the exact impact of site misclassification on our results, we reassigned each site to the other specialty group, as if it had originally been misclassified, first individually and then two and three sites at a time. All 29 individual, 406 paired, and 3654 three-way analyses of possible misclassifications supported our original findings.
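The reclassification check can be expressed as an exhaustive enumeration. This is a minimal sketch, not the code used for the published analysis. It assumes one boolean specialty flag per site and a caller-supplied `analyze` function (hypothetical) that reruns the specialty comparison on a relabeled set of sites. With 29 sites, switching 1, 2, or 3 sites at a time gives 29, C(29, 2) = 406, and C(29, 3) = 3654 scenarios, which matches the counts reported above.

```python
from collections import Counter
from collections.abc import Callable, Iterator, Sequence
from itertools import combinations
from typing import Any


def misclassification_scenarios(n_sites: int, max_flipped: int = 3) -> Iterator[tuple[int, ...]]:
    """Yield every set of 1..max_flipped site indices to reassign."""
    for k in range(1, max_flipped + 1):
        yield from combinations(range(n_sites), k)


def flip_sites(is_endo: Sequence[bool], sites: tuple[int, ...]) -> list[bool]:
    """Copy of the site labels with the chosen sites moved to the other group."""
    relabeled = list(is_endo)
    for s in sites:
        relabeled[s] = not relabeled[s]
    return relabeled


def sensitivity_analysis(is_endo: Sequence[bool],
                         analyze: Callable[[list[bool]], Any],
                         max_flipped: int = 3) -> list[tuple[tuple[int, ...], Any]]:
    """Rerun `analyze` (hypothetical, caller-supplied) under every scenario."""
    return [(sites, analyze(flip_sites(is_endo, sites)))
            for sites in misclassification_scenarios(len(is_endo), max_flipped)]


if __name__ == "__main__":
    sizes = Counter(len(s) for s in misclassification_scenarios(29))
    print(dict(sizes))  # {1: 29, 2: 406, 3: 3654}
```

Enumerating every combination, rather than sampling a few, makes the check exhaustive for up to three misclassified sites.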
Cobin and Weir and their colleagues also confuse misclassification with co-management. As in other quality-of-care efforts (for example, the National Committee for Quality Assurance's Health Plan Employer Data and Information Set), co-management redounds to the credit of the principal caregiver, since coordination and referrals are the role of the responsible physician. We used no fructosamine values. For the two sites reporting total glycated hemoglobin, all values were converted to hemoglobin A1c levels according to the instructions of the manufacturer of the test kits used. As explained to PRP participants, a random audit of 5% of all practices was to be performed to evaluate data quality. The high interrater reliability observed in the 5% sample of practices audited in the pilot program has been replicated in the 4 subsequent years of the program, with more than 300 participating practices (κ = 0.87).
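This sketch assumes the reported interrater reliability is Cohen's kappa for two raters. Under that assumption, agreement between a practice's submitted data and an auditor's re-abstraction could be computed as below. The function and the example ratings are hypothetical. Kappa corrects the raw proportion of agreement for the agreement expected by chance from each rater's marginal frequencies.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who each classified the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which the two raters agree.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical audit: was a measure documented as met (1) or not met (0)?
submitted = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
audited = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]
print(round(cohens_kappa(submitted, audited), 2))  # 0.78
```

Here the raters agree on 9 of 10 items (0.90), but chance alone would produce 0.54 agreement. Kappa is therefore (0.90 - 0.54) / (1 - 0.54), or about 0.78.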
Secondary data (for example, medical records and administrative data) are commonly and appropriately used for a variety of research purposes. The characteristics of the PRP database (its reliability, design, and sampling strategy) are more than adequate for the methodologic purpose to which we applied it. The average of 67 patients per practice afforded considerable precision in estimating practice performance. However, not every site contributed 67 patients; the total of 1750 patients reflects the varying number of patients per practice. As noted in our paper, no clinical performance measures were imputed; only the patient satisfaction measure (which favored endocrinologists) was imputed. The PRP measures committee concluded that the performance measures should not differ between type 1 and type 2 diabetes mellitus. We did, however, adjust the specialty comparisons for insulin use.
Cobin and associates also confuse the methodologic conclusion of our study (that is, the need for more appropriately designed studies comparing groups of physicians) with a more generalized conclusion about specialty differences. The highly selected nature of the PRP, which made it a conservative and appropriate resource for investigating physician clustering and patient case mix, would compromise any generalized conclusion about the two specialty groups. We drew no such conclusions. We hope that the PRP and programs like it ultimately serve to improve diabetes care delivered by all provider groups.
1. Localio AR, Berlin JA, Ten Have TR, Kimmel SE. Adjustments for center in multicenter studies: an overview. Ann Intern Med. 2001;135:112-23. [PMID: 11453711]
2. Krumholz HM, Rathore SS, Chen J, Wang Y, Radford MJ. Evaluation of a consumer-oriented internet health care report card: the risk of quality ratings based on mortality data. JAMA. 2002;287:1277-87. [PMID: 11886319]