Annals
Established in 1927 by the American College of Physicians

IMPROVING PATIENT CARE

Improving Patient Care is a special section within Annals supported in part by the U.S. Department of Health and Human Services (HHS) Agency for Healthcare Research and Quality (AHRQ). The opinions expressed in this article are those of the authors and do not represent the position or endorsement of AHRQ or HHS.

Measuring the Quality of Physician Practice by Using Clinical Vignettes: A Prospective Validation Study

John W. Peabody, MD, PhD; Jeff Luck, MBA, PhD; Peter Glassman, MBBS, MSc; Sharad Jain, MD; Joyce Hansen, MD; Maureen Spell, MD; and Martin Lee, PhD

16 November 2004 | Volume 141 Issue 10 | Pages 771-780

Background: Worldwide efforts are under way to improve the quality of clinical practice. Most quality measurements, however, are poorly validated, expensive, and difficult to compare among sites.

Objective: To validate whether vignettes accurately measure the quality of clinical practice by using a comparison with standardized patients (the gold standard method), and to determine whether vignettes are a more or less accurate method than medical record abstraction.

Design: Prospective, multisite study.

Setting: Outpatient primary care clinics in 2 Veterans Affairs medical centers and 2 large, private medical centers.

Participants: 144 of 163 eligible physicians agreed to participate, and, of these, 116 were randomly selected to see standardized patients, to complete vignettes, or both.

Measurements: Scores, expressed as the percentage of explicit quality criteria correctly completed, were obtained by using 3 methods.

Results: Among all physicians, the quality of clinical practice as measured by the standardized patients was 73% correct (95% CI, 72.1% to 73.4%). By using exactly the same criteria, physicians scored 68% (CI, 67.9% to 68.9%) when measured by the vignettes but only 63% (CI, 62.7% to 64.0%) when assessed by medical record abstraction. These findings were consistent across all diseases and were independent of case complexity or physician training level. Vignettes also accurately measured unnecessary care. Finally, vignettes seem to capture the range in the quality of clinical practice among physicians within a site.

Limitations: Despite finding variation in the quality of clinical practice, we did not determine whether poorer quality translated into worse health status for patients. In addition, the quality scores are based on measurements from 1 patient–provider interaction. As with all other scoring criteria, vignette criteria must be regularly updated.

Conclusions: Vignettes are a valid tool for measuring the quality of clinical practice. They can be used for diverse clinical settings, diseases, physician types, and situations in which case-mix variation is a concern. They are inexpensive and easy to use. Vignettes are particularly useful for comparing quality among and within sites and may be useful for longitudinal evaluations of interventions intended to change clinical practice.


Accurate, affordable, and valid measurements of clinical practice are the basis for quality-of-care assessments (1). However, to date, most measurement tools have relied on incomplete data sources, such as medical records or administrative data; require highly trained and expensive personnel to implement; and are difficult to validate (2-5). Comparisons of clinical practice across different sites and health care systems are also difficult because they require relatively complex instrument designs or statistical techniques to adjust for variations in case mix among the underlying patient populations (6, 7). We have developed a measurement tool, computerized clinical vignettes, that overcomes these limitations and measures physicians' clinical practice against a predefined set of explicit quality criteria.

These vignettes simulate patient visits and can be given to physicians to measure their ability to evaluate, diagnose, and treat specific medical conditions. Each vignette-simulated case contains realistic clinical detail, allowing an identical clinical scenario to be presented to many physicians. Each physician can be asked to complete several vignettes to simulate diverse clinical conditions. This instrument design obviates the need to adjust quality scores for the variation in disease severity and comorbid conditions found in actual patient populations. Our vignettes are also distinct from other quality measurements of clinical practice because they do not focus on a single task, or even a limited set of tasks, but instead comprehensively evaluate the range of skills needed to care for a patient.

Vignettes are particularly well suited to quality assessments of clinical practice that involve large-scale (8, 9) or cross-system comparisons (10, 11), or to cases in which ethical issues preclude involving patients or their records (7, 12, 13). They are also ideal for evaluations that require holding patient variation constant (14, 15) or manipulating patient-level variables (15-17). The appeal of vignettes has led to their extensive use in medical school education (18, 19), as well as in various studies that explicitly evaluate the quality of clinical practice in real-life settings and in comparative analyses of national health care systems (10, 20-23).

Before vignette-measured quality can be used confidently in these settings, however, 2 important questions must be answered: How valid are vignettes as a measure of actual clinical practice? Can vignettes discriminate among variations in the quality of clinical practice? These questions have prompted a search for a gold standard for validation (24-26). We and others have used standardized patients as this standard. Standardized patients are trained actors who present unannounced to outpatient clinics as patients with a given clinical condition. Immediately after meeting with a physician, the standardized patient records on a checklist what the physician did during the visit (26-28). Rigorous methods, which we have described in detail elsewhere (29), ensure that standardized patients can be considered a gold standard. In addition, we have demonstrated the validity of standardized patients as a gold standard by concealing audio recorders on standardized patients during visits. The overall rate of agreement between the standardized patients' checklists and the independent assessment of the audio transcripts was 91% (26).

We previously used paper-and-pen vignettes in a study limited to only 1 health care system, the Veterans Administration, and found that they seemed to be a valid measure of the quality of clinical practice according to their rate of agreement with standardized patient checklists (26). For this study, we wanted to confirm the validity of vignettes by using a more complex study design that introduced many more randomly assigned physicians, a broader range of clinical cases, and several sites representing different health care systems. We also wanted to test a refined, computerized version of vignettes, which we believe are more realistic and streamline data collection and scoring. We were particularly interested in determining whether the vignettes accurately capture variation in the quality of clinical practice, which has become increasingly prominent in the national debate on quality of care (30, 31). We hoped that vignettes could contribute to this debate by providing a low-cost measure of variation across different health care systems.


Methods

Sites

The study was conducted in 4 general internal medicine clinics: 2 Veterans Affairs (VA) medical centers and 2 large, private medical centers. One private site is a closed group model, and the other, primarily staffed by employed physicians, contracts with managed care plans. All sites are located in California, and each has an internal medicine residency training program. The sites are in 2 cities, each with 1 VA medical center and 1 private site. The 2 VA medical centers are large, academically affiliated hospitals with large primary care general internal medicine practices. We chose 2 private sites that were generally similar to the VA medical centers and to each other; each had a large primary care practice and a capitated reimbursement system that gives primary care general internists a broad scope of clinical decision-making authority.

Study Design

At each site, all attending physicians and second- and third-year residents who were actively engaged in the care of general internal medicine outpatients were eligible to participate in the study. We excluded only interns. Of 163 eligible physicians, 144 agreed to participate. We informed consenting physicians that 6 to 10 standardized patients might be introduced unannounced into their clinics over the course of a year and that they might be asked to complete an equal number of vignettes.

Sixty physicians were randomly selected to see standardized patients: 5 physicians from each of the 3 training levels at each of the 4 sites (Figure 1). We assigned standardized patients to each selected physician for 8 clinical cases—simple and complex cases of chronic obstructive pulmonary disease, diabetes, vascular disease, and depression. We abstracted the medical records from the 480 standardized patient visits. Each selected physician also completed a computerized clinical vignette for each of the 8 cases. For standardized patient visits that a selected physician did not complete, a replacement physician, who was randomly selected from the same training level at the same site, completed the visit. Eleven physicians required replacements. The 11 replacement physicians completed 24 standardized patient visits. Each replacement physician completed vignettes for all 8 cases. Finally, we randomly selected 45 additional physicians to serve as controls and complete vignettes (only) for all 8 cases. A total of 116 physicians participated in the study by seeing standardized patients, completing vignettes, or both. Standardized patients presented to the clinics between March and July 2000, and physicians completed vignettes between May and August 2000.
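
As a check on the planned sample sizes described above, the counts follow from simple multiplication and addition:

```latex
\begin{aligned}
\text{physicians assigned standardized patients} &= 5\ \text{per level per site} \times 3\ \text{training levels} \times 4\ \text{sites} = 60 \\
\text{planned standardized patient visits} &= 60 \times 8\ \text{cases} = 480 \\
\text{total participating physicians} &= 60 + 11\ \text{replacements} + 45\ \text{controls} = 116
\end{aligned}
```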



Figure 1. Planned study design showing sites and physician sample by level of training and clinical case for the 3 quality measurement methods. 1 = simple case; 2 = complex case; COPD = chronic obstructive pulmonary disease; DM = diabetes mellitus; MCO = managed care organization; MD = medical doctor (physician); SP = standardized patient; VAMC = Veterans Affairs medical center.

 

Vignette Data Collection

We developed the vignettes by using a standardized protocol. We first selected relatively common medical conditions frequently seen by internists. All selected conditions had explicit, evidence-based quality criteria and accepted standards of practice that could be used to score the vignettes and that could also be measured by standardized patients and chart abstraction. We developed written scenarios, each describing a typical patient with 1 of the 4 diseases (chronic obstructive pulmonary disease, diabetes, vascular disease, or depression). For each disease, we developed a simple (uncomplicated) case and a more complex case with a comorbid condition of either hypertension or hypercholesterolemia. This yielded a total of 8 clinical cases. (A sample vignette and scoring sheet are available online.)

The physician completing the vignette "sees the patient" on a computer. Each vignette is organized into 5 sections, or domains, which, when completed in sequential order, recreate the normal sequence of events in an actual patient visit: taking the patient's history, performing the physical examination, ordering radiologic or laboratory tests, making a diagnosis, and administering a treatment plan.

For example, the computerized vignette first states the presenting problem to the physician and prompts the physician to "take the patient's history" (that is, ask questions that would determine the history of the present illness; past medical history, including prevention; and social history). Physicians can record components of the history in any order without penalty. The entire format is open-ended: The physician enters the history questions directly into the computer and, in the most recent computerized versions, receives realtime responses. When the history is completed, the computer confirms that the physician has finished and then provides key responses typical of a patient with the specific case. The same process is repeated for the 4 remaining domains.

In addition to the open-ended format, we have taken 3 steps to avoid potential inflation of vignette scores. First, physicians are not allowed to return to a previous domain and change their queries after they have seen the computerized response. Second, the number of queries is limited in the history and physical examination domains. For example, in the physical examination domain, physicians are asked to list only the 6 to 10 essential elements of the examination that they would perform. Third, they are given limited time to complete the vignette (just as time is limited during an actual patient visit).

Each physician completed 8 vignettes, 4 per sitting, given in random order during two 60- to 70-minute sittings conducted during daytime work hours at the hospital or clinic. Physicians who saw standardized patients were given the vignette for a particular case only after completing the standardized patient visit. A trained abstractor reviewed the physician's responses for each completed vignette and assessed whether each explicit scoring criterion was met.

Standardized Patient Data Collection

The standardized patients were experienced actors from standardized patient medical education programs. Because 2 of the sites were VA medical centers, we hired only adult male standardized patients. We followed established protocols for standardized patient training, including practice in presenting 1 of the 8 clinical cases and in scoring physician performance on a checklist. Training for each case followed a script that exactly corresponded to the clinical scenario presented in the vignette for that case. After training the standardized patients, we enrolled them unannounced into clinics. This enrollment process involved creating fictitious names; personal data; insurance status; and brief medical histories, including laboratory results when indicated. We entered these data into paper-based and computerized record systems in the specific formats required for each site, including the necessary entries into the electronic medical records, paper clinical records, and administrative systems.

The standardized patient completed a closed-ended checklist immediately after each visit, indicating whether the physician took specific actions corresponding to each scoring criterion. This checklist generated a standardized patient score that could be compared directly with the score of the corresponding vignette completed by the same physician for that clinical case.

Medical Record Data Collection

Each standardized patient visit also generated a medical record that we retrieved from the clinic. A trained abstractor examined the information entered into the medical record to determine what specific actions the physician took or did not take during the visit. These actions corresponded exactly with the quality criteria that we used to score the vignettes. The resulting chart abstraction score for each visit was therefore directly comparable to the standardized patient score for that visit and the physician's vignette score for that clinical case.

Scoring

We conceptualize "good-quality" clinical practice as the comprehensive provision of services for a particular clinical case in a manner that leads to better outcomes for patients. We therefore determined what set of actions a physician should take—or should not take—during a patient visit to treat a clinical case in a manner that has been shown to lead to better health outcomes. This yielded a comprehensive set of explicit quality criteria, rather than simply a few selected measures of care that might be gamed, such as determining whether a medication was appropriately prescribed or whether the patient was screened for a comorbid condition. Our criteria captured whether the physician 1) determined the entire relevant history; 2) performed the relevant physical examination items; 3) ordered the necessary laboratory or imaging tests; 4) made the correct diagnosis, including etiology; and 5) prescribed a complete treatment (management) plan.

We identified quality criteria from 3 sources: an evidence-based literature search on the clinical practices that lead to better health outcomes, U.S. and international clinical guidelines, and local expert panels of academic and community physicians consisting of both generalists and specialists. We used the recommendations by the expert panels to modify and finalize the master criteria list derived from the literature and guidelines. We used identical criteria to score the vignettes, standardized patients, and charts for a particular clinical case.

We assigned a weight of 1.0 to individual criteria that expert reviewers believed to be most critical for quality care. We grouped criteria that were deemed less important, such as several physical examination items subsidiary to 1 clinical construct, into categories and implicitly assigned them weights less than 1.0.

Most of the explicit criteria measured whether the physician took specific actions necessary to provide good-quality care for a particular case. We also measured whether the physician took any action that she or he should not have taken for that case, such as prescribing antibiotics for an infection highly likely to be of viral cause. We measured these items of unnecessary care, including inappropriate tests, procedures, medications, or referrals, only for the testing and treatment domains, because the marginal time cost and risk of unnecessary history or physical examination items are negligible.

Each physician who saw standardized patients had a total of 3 scores (1 for each method) for each case. We compiled a physician's scores into a set so that the scores could be compared across cases by method. For all 3 methods (vignette, standardized patient checklist, and chart abstraction), the score was the number of categories correct divided by the total number of possible categories (an average of 30 per case). We scored unnecessary care separately by counting the number of unnecessary items for each vignette or standardized patient visit.
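
The scoring rule above can be sketched in Python. This is a minimal illustration, not the study's scoring software: the `Category` class, the function names, and the example criteria are hypothetical, and the partial credit given to a grouped category (the fraction of its items met) is an assumption about how the implicit weights below 1.0 might work. Unnecessary care is kept as a separate count, as in the text.

```python
from dataclasses import dataclass


@dataclass
class Category:
    """One scoring category: a single critical criterion (weight 1.0) or a
    group of lower-importance items that together count as one category."""
    name: str
    items_met: list[bool]  # one flag per item in the category

    def credit(self) -> float:
        # ASSUMPTION: a grouped category earns the fraction of its items met,
        # so each grouped item carries an implicit weight of 1 / len(items).
        return sum(self.items_met) / len(self.items_met)


def quality_score(categories: list[Category]) -> float:
    """Percentage score: categories correct / total possible categories."""
    if not categories:
        raise ValueError("a case needs at least one scoring category")
    return 100.0 * sum(c.credit() for c in categories) / len(categories)


def unnecessary_count(ordered: set[str], flagged_unnecessary: set[str]) -> int:
    """Separate count of ordered items (tests, treatments, referrals) that the
    criteria flag as unnecessary for this case."""
    return len(ordered & flagged_unnecessary)


# Hypothetical example case with 3 categories; the last groups 4 exam items.
case = [
    Category("smoking history", [True]),
    Category("spirometry ordered", [False]),
    Category("lung examination items", [True, True, False, True]),
]
print(round(quality_score(case), 1))  # (1 + 0 + 0.75) / 3 -> 58.3
```

Under all-or-nothing credit for grouped categories, `credit` would instead return `float(all(self.items_met))`; the text does not say which convention was used.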

To ensure the accuracy of scoring, physician investigators audited randomly selected standardized patient checklists, chart abstraction forms, and vignette scoring forms from each site.

Statistical Analysis

We compared mean scores for each of the 3 methods to determine how well vignettes and chart abstraction measured actual quality compared with the standardized patient benchmark. We disaggregated these comparisons by disease, site, case complexity, and physician training level. We evaluated the statistical significance of the differences in mean scores among the 3 methods by using the F test from an analysis of variance (ANOVA) model that considered the matching of vignette, standardized patient, and chart abstraction scores for each physician for each case. Specifically, the 3-way crossed, 1-way nested model included factors for site, physician training level, quality measurement method, and physician (nested within site), plus a site-by-method interaction. Where differences among means for the 3 methods were statistically significant, we used the Tukey–Kramer multiple comparison procedure to evaluate the significance of comparisons between pairs of methods by using a global 5% significance level. We also considered other interaction terms (method by disease, method by case complexity, and method by physician training level) in the ANOVA model to assess the consistency of the results across these factors. We estimated the 95% CIs by using adjusted errors to account for the nested study design.
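
To show how an F test can compare methods while respecting the matching of scores, here is a deliberately simplified sketch: a randomized complete block ANOVA in which each physician-case pairing is a block and the 3 methods are the treatments. It is not the authors' 3-way crossed, 1-way nested Stata model (it omits site, training level, the nesting of physicians within site, the interaction terms, and the Tukey–Kramer comparisons), and the function name is hypothetical.

```python
def block_anova_f(scores: list[list[float]]) -> float:
    """F statistic for the method effect in a randomized complete block design.

    scores[i][j] is the score for block i (one physician-case pairing) under
    method j (for example, standardized patient, vignette, chart abstraction).
    """
    n = len(scores)      # number of blocks
    k = len(scores[0])   # number of methods
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete n x k table with n, k >= 2")

    grand = sum(sum(row) for row in scores) / (n * k)
    block_means = [sum(row) / k for row in scores]
    method_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((y - grand) ** 2 for row in scores for y in row)
    ss_method = n * sum((m - grand) ** 2 for m in method_means)
    ss_block = k * sum((b - grand) ** 2 for b in block_means)
    ss_error = ss_total - ss_method - ss_block  # zero only if perfectly additive

    df_method = k - 1
    df_error = (k - 1) * (n - 1)
    return (ss_method / df_method) / (ss_error / df_error)


# Two hypothetical blocks scored by 3 methods.
print(block_anova_f([[10, 8, 6], [12, 8, 7]]))
```

Removing the block (physician-case) variation from the error term is what lets matched scores be compared more precisely than independent groups. A P value would come from the F distribution with k − 1 and (k − 1)(n − 1) degrees of freedom, for example via `scipy.stats.f.sf`.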

To explore the ability of vignettes to detect variation in performance among sites, we created a box plot of the total scores for each physician for each method for each site. We constructed this plot to show the site median bar, the interquartile range, and the 5th to 95th percentile range. We evaluated variation among sites by comparing site median scores for each method. We measured variation within a site by the distance from the 5th to the 95th percentile. We compared within-site variation across sites by normalizing each site's 5th to 95th percentile range by its SD for each method.
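
The within-site spread and its SD normalization can be sketched as follows for one site and one method. The function name is hypothetical, and Python's `statistics.quantiles` (default "exclusive" method) may interpolate percentiles slightly differently from the Stata commands the authors used.

```python
import statistics


def normalized_spread(scores: list[float]) -> float:
    """(95th percentile - 5th percentile) / SD for one site and one method."""
    # n=20 yields 19 cut points: the 5th, 10th, ..., 95th percentiles.
    cuts = statistics.quantiles(scores, n=20)
    p5, p95 = cuts[0], cuts[-1]
    return (p95 - p5) / statistics.stdev(scores)  # sample SD


# Hypothetical physician scores (percentage correct) at one site.
site_scores = [58.0, 61.5, 64.0, 67.0, 70.5, 72.0, 75.5, 79.0, 83.0]
print(round(normalized_spread(site_scores), 2))
```

Dividing by the SD puts the spread of each method on a common scale, so that within-site variation can be compared across methods whose raw scores differ in level and dispersion.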

In addition, we wanted to comprehensively assess how well vignettes measure unnecessary care. Because standardized patients cannot be trained to accurately report unnecessary care, we developed a combined "standardized patient plus chart" measure that added some items from the medical record (including tests, referrals, and medications) to the standardized patient checklist items. This yielded a complete set of criteria for a visit that exactly matched the vignette scoring criteria for that case.

We also used the "standardized patient plus chart" to assess the ability of vignettes to detect the poorest performers. To do this, we aggregated the scores from all cases completed by a physician. We reported results as the percentage of times the vignettes and standardized patient scores identified the same physicians performing in the lowest quartile. Because results were the same whether we used lowest quintile, quartile, or tertile, we report the results for only the lowest quartile.
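
A sketch of the lowest-quartile agreement calculation, assuming each physician has one aggregated score per method. The function name is hypothetical, and the quartile size (the floor of n/4, with ties broken by sort order) is an assumption; the text does not say how quartile boundaries or ties were handled.

```python
def lowest_quartile_overlap(reference: dict[str, float],
                            vignette: dict[str, float]) -> tuple[int, int]:
    """Return (physicians in the reference lowest quartile also in the
    vignette lowest quartile, size of the reference lowest quartile).

    Both dicts map a physician ID to that physician's aggregated score.
    """
    if set(reference) != set(vignette):
        raise ValueError("both methods must score the same physicians")
    k = len(reference) // 4  # ASSUMPTION: quartile size is floor(n / 4)

    def bottom(scores: dict[str, float]) -> set[str]:
        # ASSUMPTION: ties are broken by Python's stable sort order.
        return set(sorted(scores, key=scores.get)[:k])

    return len(bottom(reference) & bottom(vignette)), k


# Hypothetical aggregated scores for 8 physicians.
sp_plus_chart = {"a": 50, "b": 60, "c": 70, "d": 80, "e": 55, "f": 65, "g": 75, "h": 85}
vignettes = {"a": 52, "b": 58, "c": 71, "d": 79, "e": 66, "f": 61, "g": 74, "h": 90}
print(lowest_quartile_overlap(sp_plus_chart, vignettes))
```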

To assess whether seeing a standardized patient before completing the vignette for that case "cued" the physician in a way that affected vignette performance, we compared mean scores for vignettes matched with a standardized patient visit to the mean scores for vignettes without an associated standardized patient visit (mostly those completed by physicians who had not seen standardized patients). We compared vignette scores in the 2 groups by using a linear regression model that accounted for the clustering of physicians within site and training level.
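
With a single 0/1 "cued" indicator and no other covariates (an assumption; the text does not list the model's covariates), the regression coefficient on the indicator equals the difference between the two group means. The sketch below computes only that point estimate by ordinary least squares. Under ordinary least squares with cluster-robust standard errors, the clustering adjustment changes only the standard error; that adjustment is not shown. The function name and data are hypothetical.

```python
def cueing_effect(cued: list[float], uncued: list[float]) -> float:
    """OLS slope of vignette score on a 0/1 'cued' indicator (with intercept)."""
    if not cued or not uncued:
        raise ValueError("both groups need at least one vignette score")
    x = [1.0] * len(cued) + [0.0] * len(uncued)
    y = list(cued) + list(uncued)
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx  # equals mean(cued) - mean(uncued)


print(round(cueing_effect([70, 72, 74], [68, 70]), 6))
```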

We conducted all statistical analyses by using Stata software, versions 6.0, 7.0, and 8.2 (StataCorp, College Station, Texas).

Role of the Funding Source

The funding source had no role in the design, conduct, or reporting of the study or in the decision to submit the manuscript for publication. We had open and full access to all the data files for the study and had full control over the data.


Results

Figure 2 shows that vignette scores reflected the quality of care measured by the gold standard (standardized patients) more closely than did the abstracted medical record: 73% (95% CI, 72.1% to 73.4%) of the criteria were correctly done as measured by standardized patients, compared with 68% (CI, 67.9% to 68.9%) for vignettes and 63% (CI, 62.7% to 64.0%) for chart abstraction (P < 0.001 per ANOVA model; all pairwise comparisons between methods significant at a global 5% level). Figure 2 also shows that vignette performance relative to standardized patients was consistent across all 4 diseases evaluated. This consistent ranking of vignettes relative to standardized patients was also observed across the 4 sites (data not shown). Figure 3 shows that vignettes performed similarly well for uncomplicated and complex cases and at all 3 levels of physician training, with scores deviating from the standardized patient scores by 1% to 6%. Chart abstraction scores reflected the standardized patient scores less closely across all diseases, case complexities, and training levels.



Figure 2. Direct comparison of scores, overall and by disease, using 3 measurement methods: standardized patients, vignettes, and chart abstraction. Scores are expressed as percentage correct; error bars represent upper bound of 95% CIs. Per analysis of variance model, P < 0.001 (overall); P > 0.2 for method by disease interaction. COPD = chronic obstructive pulmonary disease.

 


Figure 3. Comparison of vignette scores to standardized patient and chart scores, stratified by case complexity and training level. Scores are expressed as percentage correct; error bars represent upper bound of 95% CIs. Per analysis of variance models, P < 0.001 (overall); P > 0.2 for method by case complexity interaction; P > 0.2 for method by physician training level interaction.

 

Figure 4 indicates that vignettes also measured differences in quality among the 4 sites. Specifically, vignettes ranked the sites in the same order as did the standardized patients. Figure 4 also indicates that the variation within sites (that is, among providers at the same site) greatly exceeded the variation among sites. Within-site variation (that is, the 5th to the 95th percentile range) ranged from 46.2% for standardized patients to 36.0% for vignettes. Vignettes compared favorably with standardized patients in detecting this variation. When the 5th to 95th percentile ranges were normalized by SD, the variation among sites was 1.8% and 1.0% for standardized patients and vignettes, respectively, whereas the corresponding within-site variation was 3.3% and 3.2%, respectively.



Figure 4. Comparison of variations among and within the 4 sites by measurement method. Boxes represent interquartile range, and stems describe 5th to 95th percentile range.

 

We investigated 4 other measurement characteristics of vignettes. First, we measured the ability of vignettes to detect physicians who performed in the lowest quartile according to the comprehensive standardized patient plus chart results. Vignettes identified 8 of 15 (53% [CI, 27% to 79%]) physicians. Second, seeing a standardized patient before completing a vignette seemed to have a small "cueing" effect. Physicians who were cued by previously seeing a standardized patient for a particular case had an average vignette score that was 2.6 percentage points higher in absolute terms (P = 0.031) than that of physicians who completed the vignette without seeing a corresponding standardized patient. Third, vignettes and standardized patients were almost identical in their ability to detect unnecessary care. Figure 5 shows that the number of unnecessary items ranged from 0 to 7 per physician per case and that vignettes demonstrated the same distribution of unnecessary tests, treatments, and referrals as standardized patients. Fourth, we compared the time personnel needed to obtain and score a vignette with the time needed to locate, abstract, and score a medical record. With trained personnel, it typically takes 40 minutes in total for each medical record versus 25 minutes for each vignette.
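
The text does not state how the CI for 8 of 15 was computed. One common choice for a small binomial sample is the exact Clopper–Pearson interval, sketched below with only the standard library by bisecting on the binomial tail probabilities; the function names are hypothetical. For 8 of 15 it gives roughly 27% to 79%, in line with the reported interval, whereas the simple normal-approximation interval gives about 28% to 79%.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target: float, tol: float = 1e-10) -> float:
    """Bisection on [0, 1] for p with g(p) == target, where g increases in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) CI for a binomial proportion x / n."""
    # Lower bound: P(X >= x | p) = alpha / 2; this tail rises with p.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound: P(X <= x | p) = alpha / 2, i.e. P(X > x | p) = 1 - alpha / 2.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


lo, hi = clopper_pearson(8, 15)
print(f"{8 / 15:.0%} (CI, {lo:.0%} to {hi:.0%})")
```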



Figure 5. Distribution of unnecessary items ordered by participants while caring for cases depicted by vignettes compared with all tests and referrals entered directly into the medical record after standardized patient visits.

 


Discussion

Vignettes can be a valuable tool for measuring the quality of clinical practice (13, 32). Unaddressed concerns about their validity and ability to discern variations in quality, however, have limited their use. The hope has been that vignettes could be constructed in such a way that they would measure the process of care in various clinical settings and even allow for cross-system or cross-national comparisons (8, 10, 14, 23, 33-35). These expectations arise from the observation that vignettes are less expensive to administer than other measurements and eliminate the need for difficult case-mix adjustments. If validated, vignettes also offer the prospect of a more thorough measurement of clinical practice than typical instruments that measure few quality criteria per case. In particular, we have been interested in the possibility of using vignettes to measure the effectiveness of policy interventions that are designed to improve care for an array of clinical conditions and physician characteristics in diverse patient populations (9, 17, 36-45).

We report results from a large validation study of computerized vignettes. We compared the quality of clinical practice, as measured by vignettes, with standardized patient checklists for almost 500 outpatient visits, as well as the medical records of these visits. We made these comparisons across different health care systems, a range of clinical conditions, and different levels of training among randomly sampled general internists in an outpatient setting.

We found that vignettes provided consistently better measurements of the quality of clinical care than did medical record abstraction when we compared both with the standardized patient checklists (the gold standard). This measurement capability of vignettes was robust across 4 different clinical conditions, simple and complex cases, different levels of physician training, and 4 sites. Vignettes mirrored the differences in quality of clinical care that the standardized patients found, both among sites in different health care systems and within the same site. When we compared the differences in the scores among and within sites, the rank order of overall quality measured by vignettes was the same as that measured by standardized patients. Other studies have observed this ability to detect differences between sites (32, 46, 47).

We were particularly struck by the 3-fold greater magnitude of the variation within sites compared with the variation among sites. This variation has been described elsewhere but has always been confounded by questions of whether differences in measured quality or differences in the health of the underlying population cause this variation (23). This problem did not complicate our study since vignettes present the identical "case mix" at each site (48, 49). This implies that institutions can decrease quality variation at the provider level by targeting both low-end performers and specific clinical areas or cases in which providers perform poorly by using education and reward incentives.

In this study, vignettes accurately captured unnecessary care. This suggests that vignettes not only have a high correlation with actual practice but also have a capacity to measure the efficiency of clinical care. Thus, among physicians who score well, vignettes might be useful to distinguish between physicians who are "careful and smart" (ordering a parsimonious set of necessary tests and referrals) and those who are "just cautious" (ordering an array of unnecessary and unhelpful tests, referrals, or treatments in addition to the correct ones).

Computerization reduces the time and money required either to score handwritten vignette responses or to abstract charts. Administering computerized vignettes also seems more realistic than administering our earlier paper-and-pencil vignettes, because computerization allows real-time responses that more closely simulate the patient–physician interaction. In the future, we expect to improve the cost-effectiveness of vignettes further by adding electronic scoring, thereby reducing costs and making vignette technology more accessible to health plans and medical groups (50, 51).

Although previous research has documented that vignettes are, at the very least, a measure of knowledge (52, 53), our findings indicate that, when properly constructed, vignettes are a valid measure of what physicians do during actual clinical encounters with patients. We believe that, to maximize validity, vignettes must incorporate several features that ours included. Specifically, they should be open-ended (54); impose realistic time constraints; provide online, real-time responses where necessary; reflect clinical complexity (9, 12, 50); use evidence-based scoring criteria that are linked to improved health outcomes; and measure both necessary and unnecessary care.

Of interest, the abstracted medical record seems to be less representative of the actual visit than the vignette. We and others have noted that the chart is an imperfect measure of what occurs during a patient visit (2-5, 55). Physicians, for example, often perceive the chart more as a billing device, legal document, or communication tool than as a recording device (56).

As promising as these results seem, we must point out that vignettes identified only half of the worst performances across sites. Vignettes, by design, also do not capture the important elements of caring and collegial rapport that are critical to overall patient well-being. These shortcomings underscore our belief that vignettes, like other measures of clinical practice, should not be used as the sole measure of individual clinical competence (57). Clearly, different measurement methods capture different elements of practice. We believe, however, that some of the measured differences also reflect variation in an individual's day-to-day performance. The cueing effect found among physicians who had previously seen a standardized patient supports this notion. The increase in these physicians' scores was small, but it suggests that performance varies within a group and that vignette scores can improve when effort (or motivation) increases.

Our study has several limitations. Although vignettes may capture variation in clinical care, we have not demonstrated that this variation translates into correspondingly better (or worse) health outcomes. We confined our study to general internists in teaching programs and to conditions in male patients for which the physical findings can be simulated. Although gynecologic and pediatric conditions have been studied by using vignettes (12, 14, 21, 41, 54), these conditions cannot be simulated by standardized patients, which makes it difficult to compare methods. Vignettes in this study are also limited because they captured the quality of clinical care during only 1 visit, and we observed a cueing effect in physicians who saw standardized patients before completing the vignettes. In addition, the 4 vignettes require between 45 minutes and 1 hour to complete, which may be burdensome when physicians' time is already limited. To reduce this burden, we recommend administering the vignettes only every 6 to 12 months when they are used in longitudinal studies. Finally, vignette criteria, like all scoring criteria, must be regularly updated and linked to improvements in patients' health status if they are to be truly useful (58).

The discouraging level of, and wide variation in, the quality of clinical care in the United States and other countries (30, 59) can no longer be overlooked. Experience indicates, however, that physician practice can be improved, but only if it is measured (31, 60). This study describes an innovative method, clinical vignettes, that seems to measure the level of and variation in clinical practice for a defined set of conditions among different sites. Our validation of vignettes is a first step toward their wider application. However, more work is needed to link vignette scores to policy interventions or to improvements in patient outcomes. Although vignettes capture actual practice, control for case mix, measure both necessary and unnecessary care, and are easy and inexpensive to administer, they have limitations that must be considered, and they are best used in conjunction with other measures.


Author and Article Information

From San Francisco Veterans Affairs Medical Center, Institute for Global Health, University of California, San Francisco, and California Pacific Medical Center, San Francisco, California; University of California, Los Angeles, Veterans Affairs Greater Los Angeles Healthcare System, and Permanente Medical Group, Los Angeles, California; RAND, Santa Monica, California; and Veterans Affairs Center for the Study of Healthcare Provider Behavior, Sepulveda, California.

Acknowledgments: The authors thank Elizabeth O'Gara (University of California, Los Angeles), Julianne Arnall (Stanford University), and Molly Bates Efrusy and Ojig Yeretsian (University of California, San Francisco), who trained and coordinated the standardized patient visits; the staff of the hospitals that worked so diligently to simulate the visits, provide their laboratory data, and introduce the simulated medical records; Bret Lewis, Ed La Calle, and Dan Bertenthal for their programming assistance and keen insights; Anne Sunderland and Miriam Polon for their assistance with the manuscript; and the actors for their fine performances.

Grant Support: By grant 11R 98118-1 from the Veterans Affairs Health Service Research and Development Service, Washington DC. Dr. Peabody also received a Senior Research Associate Career Development Award from the Department of Veterans Affairs (1998 to 2001).

Potential Financial Conflicts of Interest: None disclosed.

Requests for Single Reprints: John W. Peabody, MD, PhD, Institute for Global Health, 74 New Montgomery, Suite 508, San Francisco, CA 94105; e-mail, peabody{at}psg.ucsf.edu.

Current Author Addresses: Dr. Peabody: Institute for Global Health, 74 New Montgomery Street, Suite 508, San Francisco, CA 94105.

Dr. Luck: Department of Health Services, University of California, Los Angeles, School of Public Health, PO Box 951772, Los Angeles, CA 90095-1772.

Mr. Glassman: Division of General Internal Medicine (111G), Veterans Affairs Greater Los Angeles, 11301 Wilshire Boulevard, Los Angeles, CA 90073.

Dr. Jain: Medical Service (111), Veterans Affairs Medical Center, 4150 Clement Street, San Francisco, CA 94121.

Dr. Hansen: 3801 Sacramento Street, Suite 309, San Francisco, CA 94118.

Dr. Spell: 4950 Sunset Boulevard, Los Angeles, CA 90027.

Dr. Lee: Center for the Study of Healthcare Provider Behavior, Veterans Affairs Medical Center (152), 16111 Plummer Street, Building 25, Sepulveda, CA 91343-2036.


References

1. Mackel JV, Farris H, Mittman BS, Wilkes M, Kanouse DE. A Windows-based tool for the study of clinical decision-making. Medinfo. 1995;8:1687. [PMID: 8591547].

2. Luck J, Peabody JW, Dresselhaus TR, Lee M, Glassman P. How well does chart abstraction measure quality? A prospective comparison of standardized patients with the medical record. Am J Med. 2000;108:642-9. [PMID: 10856412].

3. Rethans JJ, Martin E, Metsemakers J. To what extent do clinical notes by general practitioners reflect actual medical performance? A study using simulated patients. Br J Gen Pract. 1994;44:153-6. [PMID: 8185988].

4. Lloyd SS, Rissing JP. Physician and coding errors in patient records. JAMA. 1985;254:1330-6. [PMID: 3927014].

5. Green J, Wintfeld N. How accurate are hospital discharge data for evaluating effectiveness of care? Med Care. 1993;31:719-31. [PMID: 8336511].

6. Lawthers AG, Palmer RH, Banks N, Garnick DW, Fowles J, Weiner JP. Designing and using measures of quality based on physician office records. J Ambul Care Manage. 1995;18:56-72. [PMID: 10139347].

7. Rosen AK, Ash AS, McNiff KJ, Moskowitz MA. The importance of severity of illness adjustment in predicting adverse outcomes in the Medicare population. J Clin Epidemiol. 1995;48:631-43. [PMID: 7730920].

8. Morita T, Akechi T, Sugawara Y, Chihara S, Uchitomi Y. Practices and attitudes of Japanese oncologists and palliative care physicians concerning terminal sedation: a nationwide survey. J Clin Oncol. 2002;20:758-64. [PMID: 11821458].

9. Epstein SA, Gonzales JJ, Weinfurt K, Boekeloo B, Yuan N, Chase G. Are psychiatrists' characteristics related to how they care for depression in the medically ill? Results from a national case-vignette survey. Psychosomatics. 2001;42:482-9. [PMID: 11815683].

10. O'Connor DW, Blessed G, Cooper B, Jonker C, Morris JC, Presnell IB, et al. Cross-national interrater reliability of dementia diagnosis in the elderly and factors associated with disagreement. Neurology. 1996;47:1194-9. [PMID: 8909429].

11. Nordyke RJ. Determinants of PHC productivity and resource utilization: a comparison of public and private physicians in Macedonia. Health Policy. 2002;60:67-96. [PMID: 11879946].

12. Aitken ME, Herrerias CT, Davis R, Bell HS, Coombs JB, Kleinman LC, et al. Minor head injury in children: current management practices of pediatricians, emergency physicians, and family physicians. Arch Pediatr Adolesc Med. 1998;152:1176-80. [PMID: 9856425].

13. Gould D. Using vignettes to collect data for nursing research studies: how valid are the findings? J Clin Nurs. 1996;5:207-12. [PMID: 8718052].

14. Furth SL, Hwang W, Yang C, Neu AM, Fivush BA, Powe NR. Relation between pediatric experience and treatment recommendations for children and adolescents with kidney failure. JAMA. 2001;285:1027-33. [PMID: 11209173].

15. Englund L, Tibblin G, Svärdsudd K. Variations in sick-listing practice among male and female physicians of different specialities based on case vignettes. Scand J Prim Health Care. 2000;18:48-52. [PMID: 10811044].

16. Stillman AE, Braitman LE, Grant RJ. Are critically ill older patients treated differently than similarly ill younger patients? West J Med. 1998;169:162-5. [PMID: 9771155].

17. Fowers BJ, Applegate B, Tredinnick M, Slusher J. His and her individualisms? Sex bias and individualism in psychologists' responses to case vignettes. J Psychol. 1996;130:159-74. [PMID: 8636906].

18. Chappel JN. Educational approaches to prescribing practices and substance abuse. J Psychoactive Drugs. 1991;23:359-63. [PMID: 1813608].

19. Baskett SJ. Teaching psychiatry in a new medical school: a multimedia approach. South Med J. 1978;71:1507-10. [PMID: 83005].

20. Gorter K, de Poel S, de Melker R, Kuyvenhoven M. Variation in diagnosis and management of common foot problems by GPs. Fam Pract. 2001;18:569-73. [PMID: 11739338].

21. Dickson RA, Seeman MV, Corenblum B. Hormonal side effects in women: typical versus atypical antipsychotic treatment. J Clin Psychiatry. 2000;61(Suppl 3):10-5. [PMID: 10724128].

22. Martin C, Rohan BG. Chronic illness care as a balancing act. A qualitative study. Aust Fam Physician. 2002;31:55-9. [PMID: 11840890].

23. Peabody J, Tozija F, Muñoz JA, Nordyke RJ, Luck J. Using vignettes to compare the quality of clinical care variation in economically divergent countries. Health Serv Res. 2004;39:1937-56. [In press].

24. Williams RG, McLaughlin MA, Eulenberg B, Hurm M, Nendaz MR. The Patient Findings Questionnaire: one solution to an important standardized patient examination problem. Acad Med. 1999;74:1118-24. [PMID: 10536634].

25. Ram P, van der Vleuten C, Rethans JJ, Grol R, Aretz K. Assessment of practicing family physicians: comparison of observation in a multiple-station examination using standardized patients with observation of consultations in daily practice. Acad Med. 1999;74:62-9. [PMID: 9934298].

26. Luck J, Peabody JW. Using standardised patients to measure physicians' practice: validation study using audio recordings. BMJ. 2002;325:679. [PMID: 12351358].

27. Carney PA, Ward DH. Using unannounced standardized patients to assess the HIV preventive practices of family nurse practitioners and family physicians. Nurse Pract. 1998;23:56-8. [PMID: 9513219].

28. Badger LW, deGruy F, Hartman J, Plant MA, Leeper J, Ficken R, et al. Stability of standardized patients' performance in a study of clinical decision making. Fam Med. 1995;27:126-31. [PMID: 7737446].

29. Glassman PA, Luck J, O'Gara EM, Peabody JW. Using standardized patients to measure quality: evidence from the literature and a prospective study. Jt Comm J Qual Improv. 2000;26:644-53. [PMID: 11098427].

30. McGlynn EA, Asch SM, Adams J, Keesey J, Hicks J, DeCristofaro A, et al. The quality of health care delivered to adults in the United States. N Engl J Med. 2003;348:2635-45. [PMID: 12826639].

31. Institute of Medicine. Crossing the Quality Chasm: A New Health System for the 21st Century. Washington, DC: National Academy Pr; 2001.

32. Cornfeld M, Miller S, Ross E, Schneider D. Accuracy of cancer-risk assessment in primary care practice. J Cancer Educ. 2001;16:193-8. [PMID: 11848666].

33. Shields L, King SJ. Qualitative analysis of the care of children in hospital in four countries-Part 1. J Pediatr Nurs. 2001;16:137-45. [PMID: 11326401].

34. Poses RM, De Saintonge DM, McClish DK, Smith WR, Huber EC, Clemo FL, et al. An international comparison of physicians' judgments of outcome rates of cardiac procedures and attitudes toward risk, uncertainty, justifiability, and regret. Med Decis Making. 1998;18:131-40. [PMID: 9566446].

35. Quinn T, MacDermott A, Caunt J. Determining patients' suitability for thrombolysis: coronary care nurses' agreement with an expert cardiological ‘gold standard’ as assessed by clinical and electrocardiographic ‘vignettes’. Intensive Crit Care Nurs. 1998;14:219-24. [PMID: 9849234].

36. Shekelle PG, Kravitz RL, Beart J, Marger M, Wang M, Lee M. Are nonspecific practice guidelines potentially harmful? A randomized comparison of the effect of nonspecific versus specific guidelines on physician decision making. Health Serv Res. 2000;34:1429-48. [PMID: 10737446].

37. Aboff BM, Collier VU, Farber NJ, Ehrenthal DB. Residents' prescription writing for nonpatients. JAMA. 2002;288:381-5. [PMID: 12117406].

38. Dresselhaus TR, Peabody JW, Lee M, Wang MM, Luck J. Measuring compliance with preventive care guidelines: standardized patients, clinical vignettes, and the medical record. J Gen Intern Med. 2000;15:782-8. [PMID: 11119170].

39. Alemayehu E, Molloy DW, Guyatt GH, Singer J, Penington G, Basile J, et al. Variability in physicians' decisions on caring for chronically ill elderly patients: an international study. CMAJ. 1991;144:1133-8. [PMID: 2018965].

40. Rothera I, Jones R, Gordon C. An examination of the attitudes and practice of general practitioners in the diagnosis and treatment of depression in older people. Int J Geriatr Psychiatry. 2002;17:354-8. [PMID: 11994890].

41. Brann P, Coleman G, Luk E. Routine outcome measurement in a child and adolescent mental health service: an evaluation of HoNOSCA. The Health of the Nation Outcome Scales for Children and Adolescents. Aust N Z J Psychiatry. 2001;35:370-6. [PMID: 11437812].

42. Hugo M. Mental health professionals' attitudes towards people who have experienced a mental health disorder. J Psychiatr Ment Health Nurs. 2001;8:419-25. [PMID: 11882162].

43. Kelly WF, Eliasson AH, Stocker DJ, Hnatiuk OW. Do specialists differ on do-not-resuscitate decisions? Chest. 2002;121:957-63. [PMID: 11888982].

44. Hazelett S, Powell C, Androulakakis V. Patients' behavior at the time of injury: effect on nurses' perception of pain level and subsequent treatment. Pain Manag Nurs. 2002;3:28-35. [PMID: 11893999].

45. Hughes R, Huby M. The application of vignettes in social and nursing research. J Adv Nurs. 2002;37:382-6. [PMID: 11872108].

46. Gutkind D, Ventura J, Barr C, Shaner A, Green M, Mintz J. Factors affecting reliability and confidence of DSM-III-R psychosis-related diagnosis. Psychiatry Res. 2001;101:269-75. [PMID: 11311930].

47. Skånér Y, Strender LE, Bring J. How do GPs use clinical information in their judgements of heart failure? A clinical judgement analysis study. Scand J Prim Health Care. 1998;16:95-100. [PMID: 9689687].

48. Friedmann PD, Brett AS, Mayo-Smith MF. Differences in generalists' and cardiologists' perceptions of cardiovascular risk and the outcomes of preventive therapy in cardiovascular disease. Ann Intern Med. 1996;124:414-21. [PMID: 8554250].

49. Cacciola JS, Alterman AI, Fureman I, Parikh GA, Rutherford MJ. The use of case vignettes for Addiction Severity Index training. J Subst Abuse Treat. 1997;14:439-43. [PMID: 9437613].

50. Kalf AJ, Spruijt-Metz D. Variation in diagnoses: influence of specialists' training on selecting and ranking relevant information in geriatric case vignettes. Soc Sci Med. 1996;42:705-12. [PMID: 8685738].

51. Malcolm L, Wright L, Seers M, Davies L, Guthrie J. Laboratory expenditure in Pegasus Medical Group: a comparison of high and low users of laboratory tests with academics. N Z Med J. 2000;113:79-81. [PMID: 10855584].

52. Rethans JJ, Sturmans F, Drop R, van der Vleuten C, Hobus P. Does competence of general practitioners predict their performance? Comparison between examination setting and actual practice. BMJ. 1991;303:1377-80. [PMID: 1760606].

53. Everitt DE, Avorn J, Baker MW. Clinical decision-making in the evaluation and treatment of insomnia. Am J Med. 1990;89:357-62. [PMID: 2393038].

54. Sandvik H. Criterion validity of responses to patient vignettes: an analysis based on management of female urinary incontinence. Fam Med. 1995;27:388-92. [PMID: 7665027].

55. Peabody JW, Luck J, Jain S, Bertenthal D, Glassman P. Assessing the accuracy of administrative data in health information systems. Med Care. 2004;42. [In press].

56. Kleinke JD. Release 0.0: clinical information technology in the real world. Health Aff (Millwood). 1998;17:23-38. [PMID: 9916350].

57. Peabody JW, Luck J, Glassman P, Dresselhaus TR, Lee M. Comparison of vignettes, standardized patients, and chart abstraction: a prospective validation study of 3 methods for measuring quality. JAMA. 2000;283:1715-22. [PMID: 10755498].

58. Fihn SD. The quest to quantify quality [Editorial]. JAMA. 2000;283:1740-2. [PMID: 10755502].

59. Nolan T, Angos P, Cunha AJ, Muhe L, Qazi S, Simoes EA, et al. Quality of hospital care for seriously ill children in less-developed countries. Lancet. 2001;357:106-10. [PMID: 11197397].

60. McGlynn EA. There is no perfect health system. Health Aff (Millwood). 2004;23:100-2. [PMID: 15160807].

Related articles in Annals:

Editorials
Back to the Future: Clinical Vignettes and the Measurement of Physician Performance
John Norcini
Annals 2004 141: 813-814.

Summaries for Patients
The Use of Clinical Vignettes To Measure the Quality of Health Care
Annals 2004 141: I-67.



This article has been cited by other articles:

C. E. Guerra, P. A. Gimotty, J. A. Shea, J. A. Pagan, J. S. Schwartz, and K. Armstrong
Effect of Guidelines on Primary Care Physician Use of PSA Screening: Results from the Community Tracking Study Physician Survey
Med Decis Making, September 1, 2008; 28(5): 681 - 689.

O. Kostopoulou, J. Oudhoff, R. Nath, B. C. Delaney, C. W. Munro, C. Harries, and R. Holder
Predictors of Diagnostic Accuracy and Safe Management in Difficult Diagnostic Problems in Family Medicine
Med Decis Making, September 1, 2008; 28(5): 668 - 680.

S. O. Okelo, C. M. Patino, K. A. Riekert, B. Merriman, A. Bilderback, N. N. Hansel, K. Thompson, J. Thompson, R. Quartey, C. S. Rand, et al.
Patient Factors Used by Pediatricians to Assign Asthma Treatment
Pediatrics, July 1, 2008; 122(1): e195 - e201.

N. L. Keating, M. B. Landrum, C. N. Klabunde, R. H. Fletcher, S. O. Rogers, W. R. Doucette, D. Tisnado, S. Clauser, and K. L. Kahn
Adjuvant Chemotherapy for Stage III Colon Cancer: Do Physicians Agree About the Importance of Patient Age and Comorbidity?
J. Clin. Oncol., May 20, 2008; 26(15): 2532 - 2537.

B. Sirovich, P. M. Gallagher, D. E. Wennberg, and E. S. Fisher
Discretionary Decision Making By Primary Care Physicians And The Cost Of U.S. Health Care
Health Aff., May 1, 2008; 27(3): 813 - 823.

K. K. Mangione, R. B Lopopolo, N. P Neff, R. L Craik, and K. M Palombaro
Interventions Used by Physical Therapists in Home Care for People After Hip Fracture
Physical Therapy, February 1, 2008; 88(2): 199 - 210.

S. J. Weiner, A. Schwartz, R. Yudkowsky, G. D. Schiff, F. M. Weaver, J. Goldberg, and K. B. Weiss
Evaluating Physician Performance at Individualizing Care: A Pilot Study Tracking Contextual Errors in Medical Decision Making
Med Decis Making, December 1, 2007; 27(6): 726 - 734.

C. Dumoulin, N. Korner-Bitensky, and C. Tannenbaum
Urinary Incontinence After Stroke: Identification, Assessment, and Intervention by Rehabilitation Professionals in Canada
Stroke, October 1, 2007; 38(10): 2745 - 2751.

J. W Peabody and A. Liu
A cross-national comparison of the quality of clinical care using vignettes
Health Policy Plan., September 1, 2007; 22(5): 294 - 302.

K. L. Leonard and M. C. Masatu
Variations In The Quality Of Care Accessible To Rural Communities In Tanzania
Health Aff., May 1, 2007; 26(3): w380 - w392.

L. Li, Z. Wu, Y. Zhao, C. Lin, R. Detels, and S. Wu
Using case vignettes to measure HIV-related stigma among health professionals in China
Int. J. Epidemiol., February 1, 2007; 36(1): 178 - 184.