15 September 1996 | Volume 125 Issue 6 | Pages 457-464
Objective: To 1) assess the degree of agreement among physicians on the cause of previously flagged adverse outcomes and 2) relate the findings to systems of quality assurance and performance assessment and proposals for no-fault compensation for medical injuries.
Design: Observational study of 7533 pairs of "structured implicit" reviews (subjective opinions based on guidelines) of medical records done by 127 physicians working independently.
Setting: Random sample of 51 inpatient facilities in New York State.
Patients: Random sample of inpatient medical records from the selected facilities.
Measurements: 1) Number of agreed-upon adverse events compared with the number of cases of extreme disagreement and 2) internally and indirectly standardized rates at which physician reviewers found adverse events (injuries to patients caused at least in part by medical management).
Results: In 12.9% of cases (971 of 7533), the two physicians in a pair had extreme disagreement about the occurrence of an adverse event. These cases outnumbered those in which both reviewers found an adverse event (10%; n = 757). Agreement was highest for wound infections and lowest for adverse events attributed to failure to diagnose or lack of therapy. The amount of experience the physicians had in reviewing records tended to increase the level of agreement. Even after standardization to the results of the entire sample, individual physicians' rates of finding at least slight evidence of an adverse event varied widely (range, 9.9% to 43.7%) (P < 0.001).
Conclusions: Structured implicit reviews produced disagreement on the causes of adverse patient outcomes. If systems of quality assurance, performance audits, or no-fault patient compensation are to succeed, methods for overcoming the common tendency toward disagreement among experts must be developed.
Case review also underlies current and proposed systems of compensating patients for injuries caused by medical care. Under the current litigation system, the patient must prove, with the support of expert medical opinion, that medical care contributed to the injury (causation) and fell below the standards of practice in the community (negligence). Under proposed "no-fault" alternatives to litigation, entitlement to compensation and liability for payment might also depend on an expert's opinion as to whether the patient's outcome was caused by medical care rather than by a preexisting disease or condition [1-4].
Critics have identified several problems with case review. First, experts cannot form a consensus about which outcomes are adverse. Second, medical technology changes rapidly and creates uncertainty about the appropriateness and effectiveness of practices. Third, administrative or transaction costs in making individualized determinations of causation might be high [5-8]. The American College of Physicians [9] and others [10] have called for further demonstration projects.
Building on previous research on the reliability of clinical judgments, we used a large sample of physician reviews of medical records to estimate the degree of agreement on the cause of adverse patient outcomes. We also discuss the implications of the results for quality assurance, performance assessment, and proposals for no-fault patient compensation.
Other aspects of the Medical Practice Study and the general methods have been widely reported [11-15]. The following methods are relevant to our report.
Record Review
Records were reviewed in two stages. In stage 1 (which is not the subject of this report), nurses and medical records administrators used a single review per case to screen the entire sample of records for the presence of 1 or more of 18 explicit criteria (Figure 2). These criteria were based primarily on previous research [16] and were revised by the physician investigators of the Medical Practice Study. Although explicit, the criteria were broad and open to interpretation. The nurses and records administrators received an extensive manual, which contained detailed examples of the criteria, and 2 hours of focused classroom training from team leaders chosen for this project. To increase the efficiency and accuracy of screening, the nurses and records administrators used preprinted forms generated by the project management team. Nurses were instructed to refer any questionable cases for stage 2 review. Questions of a more general nature were referred to supervisors and then to the project office for consistent responses. The estimated negative predictive value of the screening was 99.5% [17].
Identifying Adverse Events Caused by Medical Care: Degree of Physician Agreement in a Retrospective Chart Review
Retrospective case review has long been a mainstay of peer review. It supports scientific studies and medical audits as well as assessments of the appropriateness, effectiveness, and quality of health care provided by physicians, hospitals, or regions. As part of quality assurance, hospitals and clinics regularly use formal and informal case review. Insurers and managed care organizations rely on case review when making decisions about coverage. All forms of case review depend heavily on expert opinion.
Methods
Cases were obtained from the Medical Practice Study, a project designed to estimate the rate of adverse events occurring among inpatients in a random sample of 31 429 medical records from 51 health care facilities in New York State. We defined an adverse event as an injury that 1) was caused at least in part by medical management and 2) required or prolonged hospitalization or led to disability after discharge. The injury could result from a provider's action or inaction in either inpatient or outpatient settings or from a drug or medical device. The medical management did not have to be substandard or inappropriate; the injury could follow an unexpected complication. Adverse outcomes caused solely by underlying disease or by the intended consequences of treatment were not considered to be adverse events. For example, an injury to the recurrent laryngeal nerve during partial thyroidectomy (an unplanned and unintended but recognized complication) would be considered an adverse event, but the intentional destruction of the same nerve in a radical thyroid resection for cancer would not. A broken experimental balloon that led to an embolus and stroke during cardiac catheterization would, as a complication of treatment, also be an adverse event, especially if the patient's risk was unknown. This result would apply even in a study approved by a Human Subjects Committee.
In stage 2, each record that had or may have had at least one criterion present was further analyzed by two physicians who worked independently. Physicians were recruited primarily from New York State through a network of personal contacts of the study investigators. The physicians could not review records at the hospitals in which they practiced. Most were board certified in surgery (23%) or internal medicine (68%); the remaining were certified in obstetrics and gynecology, family practice, pediatrics, urology, or emergency medicine. Eighty-five percent were male. Most physicians were in the early stage of their careers: Fifty-five percent had received board certification within the 10 years before the study began. All physicians had telephone access to a panel of experts.
A separate manual and a structured abstraction form guided the stage 2 review. As described previously [17], both were revised repeatedly after extensive pilot testing. This 65-page manual included explicit instructions on several types of adverse events. According to the manual, for example, all surgical wound infections were "almost invariably" adverse events, as were all falls and all drug reactions that prolonged hospitalization or caused disability. A 14-page abstraction form first asked the physician reviewer to assess whether an adverse event might have occurred. If the physician found "no possible adverse event," the review was stopped and the case received a score of 0. If an adverse event might have occurred, the reviewer considered a list of factors on the cause of the injury and rated his or her confidence about the occurrence of an adverse event on an interval scale of 1 to 6 (Figure 1). For a confidence score of 2 ("slight to modest evidence" of an adverse event) or greater, the reviewer indicated the type of event (fall, drug reaction, wound infection, error of omission, or failure to diagnose), the number of additional days of hospitalization (if applicable), and the degree of disability over and above the underlying disease. Finally, the reviewers considered whether the error amounted to negligence. Within this structure, however, the physician could exercise discretion in judging the cause of the injury, hospitalization, or disability (a "structured implicit" review). All physician reviewers identified themselves by number, with the understanding that their confidential opinions would not be used for quality assurance, peer review, or litigation. Copies of the abstraction booklet are available from the authors.
Our report focuses on the two independent expert opinions obtained during stage 2 review as to whether an adverse outcome identified during stage 1 had been caused at least in part by medical management. Results of each assessment of causation were linked to the patient's computerized discharge data summary to identify the patient's age, diagnosis, and discharge status.
Statistical Analysis
Agreement between Reviewers
We calculated a rate of agreement between the two physician reviewers in each pair on adverse events using a statistic described by Grant [18] for assessing agreement on abnormal tracings from electronic fetal monitoring. In our application, the numerator of this statistic was the number of cases in which both reviewers assessed their confidence in an adverse event as "more likely than not" or greater. This assessment corresponded to a score of 4, 5, or 6. The denominator was the sum of the numerator and the number of cases of extreme disagreement, for which one reviewer scored the case as 4, 5, or 6 and the other physician found "no possible adverse event" (a score of 0). The statistic therefore compared the number of cases with agreed-upon adverse events with the number of clear disagreements.
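The statistic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `agreement_rate`, the representation of each case as a pair of integer confidence scores (0 to 6), and the toy data are all hypothetical and are not taken from the study's own software.

```python
from typing import Iterable, Tuple


def agreement_rate(pairs: Iterable[Tuple[int, int]], threshold: int = 4) -> float:
    """Agreed-upon adverse events divided by agreed events plus extreme disagreements.

    Each pair holds the two reviewers' confidence scores (0 through 6).
    Pairs that are neither agreed events nor extreme disagreements
    (for example, both scores 0) do not enter the statistic.
    """
    both_positive = 0
    extreme_disagreement = 0
    for a, b in pairs:
        if a >= threshold and b >= threshold:
            both_positive += 1
        elif (a >= threshold and b == 0) or (b >= threshold and a == 0):
            extreme_disagreement += 1
    denominator = both_positive + extreme_disagreement
    if denominator == 0:
        raise ValueError("no agreed-upon events or extreme disagreements")
    return both_positive / denominator


# Hypothetical toy data: 3 agreed events, 2 extreme disagreements,
# and 2 pairs that the statistic ignores.
toy_pairs = [(4, 5), (6, 6), (5, 4), (4, 0), (0, 6), (0, 0), (2, 3)]
print(agreement_rate(toy_pairs))  # 3 / (3 + 2) = 0.6
```

Because pairs in which both reviewers found no adverse event never enter the numerator or denominator, adding more clearly normal cases to the sample leaves the result unchanged.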
This statistic does not include cases for which both physicians agreed that no adverse event had occurred. It recognizes that agreement about whether a patient's condition is normal (no adverse event) is usually greater than agreement about whether a patient has disease or an abnormal condition [19-22]. The statistic is also not affected by the number of clearly normal cases in the samples of cases for review. In our study, the number of cases clearly without adverse events at stage 2 was influenced by the coarseness of the previous screening process. The stage 1 reviewers were cautioned to avoid false-negative determinations if they were in doubt, so that adverse events would not be overlooked.
This statistic also facilitated comparisons of rates of agreement across subsets of such adverse events as drug reactions, which are defined only if one or both physicians found and described the event. Clearly normal cases (for which both physicians found no possible adverse event) could not be categorized in this manner. In addition, the statistic permitted comparisons between our data and those of similar studies that had different designs and prevalences of abnormal findings (adverse events, preventable deaths, drug reactions, or quality problems). For all rates, we calculated exact binomial CIs [23], which are slightly wider than those calculated using the normal approximation [24].
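For readers who want to see how an exact binomial interval differs from the normal approximation, the following is a minimal sketch of a Clopper-Pearson interval found by bisection. It is not the SAS macro cited in [23]; the function names, the bisection approach, and the example of 5 events in 10 trials are assumptions chosen for illustration.

```python
import math
from typing import Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log(1.0 - p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _solve_decreasing(f, target: float) -> float:
    """Bisection for p in [0, 1] with f(p) = target, where f decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_binomial_ci(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Two-sided exact (Clopper-Pearson) interval for x events in n trials."""
    lower = 0.0 if x == 0 else _solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1.0 - alpha / 2.0)
    upper = 1.0 if x == n else _solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2.0)
    return lower, upper


def normal_approx_ci(x: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wald interval, shown only for comparison with the exact interval."""
    p = x / n
    half = z * math.sqrt(p * (1.0 - p) / n)
    return p - half, p + half


print(exact_binomial_ci(5, 10))   # about (0.187, 0.813)
print(normal_approx_ci(5, 10))    # about (0.190, 0.810)
```

In this small example the exact interval is slightly wider than the normal approximation, consistent with the remark above.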
We also computed the commonly used weighted κ statistic [25]; we recognized, however, that this does not distinguish among different patterns of agreement [26] and might be unsuitable for comparisons across studies [27]. Weighted κ statistics vary with the prevalence of the result being scored [28], depend on the choice of weight, and are sensitive to the study design (that is, to the number of clearly normal persons in the sample [29]).
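The dependence on prevalence can be illustrated with a short sketch of a weighted kappa calculation. The text does not say which weights were used, so the linear disagreement weights here are an assumption, as are the function name `weighted_kappa` and the two hypothetical 2 x 2 tables.

```python
from typing import Sequence


def weighted_kappa(table: Sequence[Sequence[float]]) -> float:
    """Weighted kappa for a k x k table of paired ratings.

    Rows index the first reviewer's category and columns the second's.
    Linear disagreement weights |i - j| / (k - 1) are an assumption made
    for this sketch.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            observed += w * table[i][j] / n
            expected += w * row_tot[i] * col_tot[j] / (n * n)
    if expected == 0.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - observed / expected


# Two hypothetical tables with the same raw agreement (90 of 100 pairs)
# but different prevalence of the "abnormal" category.
balanced = [[45, 5], [5, 45]]
mostly_normal = [[85, 5], [5, 5]]
print(weighted_kappa(balanced))       # about 0.80
print(weighted_kappa(mostly_normal))  # about 0.44
```

Both tables show raw agreement in 90% of pairs, yet the statistic falls sharply when most cases are clearly normal.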
Reviewer Calibration
We define calibration as a reviewer's underlying propensity for finding an adverse event [30, 31]. When the cases being reviewed have varying degrees of evidence about the cause of injury, reviewers with different calibrations will necessarily disagree about some cases. Only reviewers with common calibrations have a chance of perfect agreement, and even then they might disagree on individual cases.
The unit of analysis was the review rather than the case. Each physician's rate of finding adverse events was standardized [32] by using a regression model to account for differences across cases. We selected physician confidence scores of 0 or 1 (no possible adverse event or little or no evidence of management-related causation of an adverse event) to reflect the absence of an adverse event and scores of 2 through 6 to indicate the occurrence of an adverse event. The use of a single cut-point allowed for a relatively simple binary-outcome regression model of adverse events. We used this particular cut-point because the reviewers had to describe only adverse events that had a score of 2 or more; some reviewers appear to have used scores of 0 and 1 as equivalents. Cases scored as 2 or 3, reflecting that the reviewer believed an adverse event had occurred but that he or she had a low level of confidence in this opinion, were lumped with cases for which opinions on causation were stronger (scores of 4, 5, and 6). Other cut-points were then used to assess the sensitivity to this choice.
Candidate regression covariates selected on the basis of our previous knowledge on predictors of adverse events [11, 14] included hospital (or, alternatively, hospital ownership, teaching status, and location), patient age, survival at hospital discharge, length of stay in the hospital, diagnosis group, race, reimbursement source, and the median income of the ZIP code of the patient's residence. Initial modeling with logistic regression resulted in an estimate for each reviewer of the expected number of adverse events, adjusted for the mix of cases. We then applied empirical Bayes methods [33] through mixed-effects logistic regression to compensate for two problems: 1) the natural tendency for physicians who had done fewer reviews to have more variable estimates than would physicians who had done many reviews and 2) the implicit multiple comparisons of repeated testing of each physician against the entire group [34, 35]. The mixed model included the predictors of an adverse event as fixed effects and the physician reviewers as random effects. All calculations were done using the SAS statistical package (SAS Institute, Cary, North Carolina), and empirical Bayes methods were done using a specialized program within the SAS package [36].
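The dichotomization and the indirect standardization step can be sketched as follows. This is a simplified illustration, not the mixed-effects model itself: the expected number of events, which in the study came from the regression model, is passed in as a given number, and the function name, the scores, and all numeric values are hypothetical.

```python
from typing import Sequence


def standardized_rate(scores: Sequence[int],
                      expected_events: float,
                      overall_rate: float,
                      cut_point: int = 2) -> float:
    """Indirectly standardized rate of finding adverse events for one reviewer.

    scores          : the reviewer's confidence scores (0 through 6), one per review
    expected_events : events expected for this reviewer's case mix
                      (assumed to come from a separately fitted regression model)
    overall_rate    : rate of finding adverse events in the entire sample
    cut_point       : scores at or above this value count as an adverse event
    """
    if expected_events <= 0:
        raise ValueError("expected_events must be positive")
    observed = sum(1 for s in scores if s >= cut_point)
    return observed / expected_events * overall_rate


# Hypothetical reviewer with 10 reviews; the case mix predicts 2.5 events,
# and the overall rate in the sample is taken to be 25%.
reviewer_scores = [0, 1, 2, 4, 0, 6, 0, 3, 0, 0]
print(standardized_rate(reviewer_scores, expected_events=2.5, overall_rate=0.25))

# Sensitivity to the cut-point: only scores of 4 or more count as events.
# The same expected count is reused here purely for simplicity; in practice
# it would be re-estimated for each cut-point.
print(standardized_rate(reviewer_scores, expected_events=2.5, overall_rate=0.25,
                        cut_point=4))
```

A reviewer whose observed count equals the expected count receives the overall rate; the empirical Bayes step described above would further shrink estimates based on few reviews toward that overall rate.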
Results
Working independently, 127 paired physicians assessed 7533 inpatient admissions for the occurrence of an adverse event (Table 1). A total of 15 066 reviews were done. On average, physicians found that an event was at least "more likely than not" to have been caused by medical management in 18% of reviews (2764 of 15 066). There were more cases of extreme disagreement on the occurrence of an adverse event than there were cases for which both reviewers found an adverse event. The paired physicians strongly disagreed in 12.9% of cases (971 of 7533): One physician found that the event was at least "more likely than not" to have been caused by medical management, and the other found no possible adverse event. In 10% of cases (n = 757), the two physicians agreed on the occurrence of an adverse event; their scores indicated a confidence level of "more likely than not" that management caused an adverse event (Figure 1). The rate of agreement on the presence of an adverse event for all cases was 0.44 (757/[757 + 971]). After adjustment for the sampling weights, this rate was 0.42 (95% CI, 0.39 to 0.44). For comparison purposes, this sample of reviews has a chance rate of agreement of 0.11, which represents the chance number of cases for which both reviewers found adverse events, divided by the sum of chance agreements on adverse events plus the chance cases of extreme disagreement.
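One way to read the chance rate of agreement is sketched below. It assumes that the two reviewers score independently and share the same marginal probabilities, which is an assumption of this sketch and not a method stated in the text. The function name and the input values are hypothetical.

```python
def chance_agreement_rate(p_positive: float, p_zero: float) -> float:
    """Chance-expected rate of agreement under independent reviewers.

    p_positive : marginal probability that a review scores 4, 5, or 6
    p_zero     : marginal probability that a review scores 0
    """
    if p_positive <= 0:
        raise ValueError("p_positive must be positive")
    chance_both = p_positive * p_positive         # both reviewers find an event
    chance_extreme = 2.0 * p_positive * p_zero    # one finds an event, the other scores 0
    return chance_both / (chance_both + chance_extreme)


# Hypothetical marginals: 20% of reviews score 4 or more, 60% score 0.
print(chance_agreement_rate(0.2, 0.6))  # 0.04 / (0.04 + 0.24), about 0.143
```

Under these assumptions the expression simplifies to p_positive / (p_positive + 2 * p_zero), so the chance rate falls as the share of reviews scored 0 rises.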
Type of Adverse Event
Overall, the rates of agreement differed significantly among the types of cases specified by the reviewers (P < 0.001) (Figure 2). Wound infections showed the highest rate (0.62 [CI, 0.56 to 0.69]). Drug reactions and falls had markedly lower rates of agreement (0.48 [CI, 0.42 to 0.54] and 0.37 [CI, 0.17 to 0.61], respectively). However, the sample size makes the rate for falls unstable. Rates of agreement were lowest when reviewers concluded that the event was caused by a failure to diagnose (0.32 [CI, 0.26 to 0.38]) or the omission of therapy (0.24 [CI, 0.18 to 0.31]).
Reviewer Experience
We found greater rates of agreement within pairs of physicians who had reviewed many cases on this project before the date of their review of the case of interest. For example, when both reviewers had previously reviewed 200 cases (n = 27), the rate of agreement was 0.62 (205 of 332) (CI, 0.56 to 0.67). This figure, however, was heavily influenced by 2 of these 27 physicians. Exclusion of these 2 physicians' 494 joint reviews reduced the rate of agreement among the most experienced physicians to 0.50 (113 of 227) (CI, 0.43 to 0.56). The κ statistic was 0.57 (CI, 0.50 to 0.63). This rate remained significantly higher than that for cases assessed by less experienced reviewers (0.37 [CI, 0.35 to 0.40]; P < 0.001).
These results, however, were not always consistent. Of the 7533 pairs of reviews, 237 were completed by pairs of the seven "senior" physicians who helped to design this study, the abstracting forms, and the guidelines for the review. All seven were thoroughly familiar with the processes and goals of the study. For this subsample of cases, the rate of agreement was similar to that for all cases: 0.41 (15 of 37) (CI, 0.25 to 0.66). The κ statistic was 0.50 (CI, 0.34 to 0.66).
Reviewer Calibration
The observed and expected numbers of adverse events across the 127 physician reviewers were markedly different when compared using a chi-square test (P < 0.001) or computer simulations (P < 0.001). This large variation in reviewer-specific adjusted rates persisted regardless of the cut-point used to delineate a finding of an adverse event along the ordered scale of confidence in the reviewer's finding. An empirical Bayes analysis (Table 2) identified 18 of the 127 reviewers as statistical "outliers"; the standardized rates at which these 18 physicians found adverse events were significantly higher or lower than average (on the basis of a critical P value of 0.05) and ranged from 9.9% to 43.7%. Several of these outlier physicians were experienced reviewers. Four of the 10 low outliers (who had low rates of finding adverse events) and 2 of the 8 high outliers each completed more than 200 reviews. A third high outlier was one of the senior physicians who helped to design and guide the study. Thus, experience did not eliminate the variation in the propensity to find adverse events.
Discussion
By applying a simple statistic to data shown in a common format (Figure 1), we could compare the number of cases for which both reviewers agreed on the occurrence of an adverse event with the number of extreme disagreements. Readers might define adverse events (in terms of confidence score) or extreme disagreement differently than we have. For that reason, we advise that raw data rather than arbitrary or common statistics of agreement be reported.
Rates of agreement varied across subsets of cases. Agreement was greatest for wound infections, which were covered by specific guidelines and should be clearly associated with the site and time of surgery. Falls should also be relatively easy to identify, except in cases in which patients have not complied with orders for bed rest or medications. Our finding of the lower rate of agreement with drug reactions is consistent with findings of earlier studies [48-50] that relied heavily on unguided expert opinion on a question of "inescapable difficulties and complexities" [51]. As a recent report suggests [52], adverse drug events might now be described with greater agreement than our study and others have shown. Reproducibility and validity of judgments on drug reactions can be improved with the use of diagnostic tests [53], focused algorithms [54, 55], or consensus conferences [56].
In contrast, adverse events caused by omitted therapy or failure to diagnose a treatable disease or condition are, by nature, more complex. In these cases, the underlying disease rather than the medical intervention caused the adverse outcome; the reviewer must conclude that appropriate therapy or timely diagnosis could have prevented or cured the disease or caused it to enter remission. Similar findings have been seen in studies on preventable deaths, in which trauma clearly causes the death and the aim of medical care is timely, appropriate intervention. Clinical experiments designed specifically to test whether physicians agree on which deaths are preventable have found case review to be unreliable [47]; independent reviewers often arrive at opposite conclusions [46, 57] in part because they differ in their prognoses of critically ill patients [58]. In our study, the statewide population estimate of cases of extreme disagreement on whether failure to diagnose or omission of therapy caused injuries was 28 800. This represents 32% of all cases of marked disagreement and is almost half the population estimate of the annual number of cases for which reviewers agreed that an adverse event had occurred (60 400) (Figure 2). The U.S. legal system has long recognized the difficulty of determining causation in these cases [59].
The large number of reviewers in our study allowed us to assess variation in physicians' propensity to find adverse events, that is, differences in calibrations [60]. A lack of reviewer experience or understanding cannot explain this degree of variation, nor can the deficiencies in the medical record (all reviewers worked with the same materials). The explanation must lie in differences in previous expectations, in the diagnostic criteria being applied, or in the inability of some reviewers to avoid the hindsight bias of knowing that an unfavorable outcome has already occurred [61, 62]. In a similar retrospective analysis of hospital records done in New York State 20 years ago, Richardson [63] reported that "it would appear that some judges were consistently lenient in their judgments of care quality, whereas others, with the same apparent consistency, appeared to be strict." Rates of agreement cannot be high unless the expert reviewers are uniformly calibrated.
The design of a study can limit its generalizability. In particular, the sample of records that the physician reviewers examined depended on the stage 1 screening done by nurses. Previous reports indicate that the number of adverse events missed at stage 1 was low, but our design could have generated a biased sample for review. In addition, although our study describes reviewer variation in a particular clinical exercise and reports possible reasons for differences in levels of agreement, it was not designed (as are some studies [64, 65]) to elicit the sources of variation. Although both physicians in a pair based their opinions on the same screening criterion flagged previously by a nurse (Figure 2), we could not determine whether the reviewers disagreed because one physician could not locate the relevant supporting information in a disorganized medical record or whether both physicians found the same facts but had different opinions about their implications [66, 67]. Adding more information (for example, from medical records established before hospitalization or autopsy records) might improve reliability [47]. One study suggests, however, that reviewers will remain confident about their disparate opinions despite limitations in the record [45].
Our study also could not measure the effect of discussion and building consensus among reviewers on levels of agreement. Two studies [47, 68] found only marginal improvement in agreement when the team approach was formally tested. Finally, our reviewers were primarily general internists or surgeons rather than specialists in the problems before them. Although the reviewers had telephone access to specialists, few made use of it. Further research should focus on whether specialists agree more often than do generalists.
Our analysis of physician calibration on adverse events has a shortcoming common to statistical models. These models often cannot completely adjust for differences in patients' illnesses. Because the odds of an adverse event vary with illness, an underspecified regression model might overstate the true degree of variation among reviewers in their propensity for finding adverse events. However, the empirical Bayes methods we used tended to compensate for incomplete adjustment.
Systems of performance review, quality assurance, or patient compensation based on case review continue to face the problem of "observer variability" among physicians in various tasks [69, 70]. Especially difficult are judgments on the cause of adverse outcomes, because the physician must often subjectively assess what has not occurred: the patient's outcome with different treatment [71]. For that reason, a previous study called for an end to subjective judgment on preventable trauma-related deaths [68]. With sufficiently large samples of adverse events, the level of agreement reported in our study and other studies [72, 73] might support comparisons of large groups of patients. However, that rate of agreement falls short of the rate considered necessary for making decisions about quality and accountability in individual cases [22, 74]. As some have proposed [75-77], the challenge is to design objective criteria, algorithms, or guidelines that will increase agreement on the cause of suboptimal outcomes. The manner in which disagreements are resolved can influence the number of adverse outcomes attributed to medical management. For example, a decision rule requiring a unanimous panel opinion on the presence of management-related causation might protect health care providers from unjustified censure or liability [46], but at the expense of a patient seeking compensation or coverage for injuries. Areas that must still be studied are questions on the optimal number, training, and qualifications of reviewers; the need for tiers of reviews; and the scope and quality of supporting medical records and documentation.
Drs. Lawthers and Brennan: Department of Health Policy and Management, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA 02115.
Dr. Hebert: Rush Institute on Aging and Rush Alzheimer's Disease Center, Rush University and Rush-Presbyterian-St. Luke's Medical Center, 1645 West Jackson Boulevard, Suite 675, Chicago, IL 60612.
Ms. Sharp: Department of Biostatistics, CB 7400, University of North Carolina, Chapel Hill, NC 27599-7400.
Author and Article Information
References
1. American Medical Association. A proposed alternative to the civil justice system for resolving medical liability disputes: a fault-based, administrative system. Chicago: American Med Assoc; 1988.
2. Lyall S. Cuomo proposes a fund for injured newborns. The New York Times. 21 April 1993; 142:B1, B8.
3. Havighurst CC. Reforming malpractice law through consumer choice. Health Aff (Millwood). 1984; 3:63-70.
4. Epstein RA. Medical malpractice, imperfect information and the contractual foundation for medical services. Law and Contemporary Problems. 1986; 49:201-12.
5. O'Connell J. An elective no-fault liability statute. Insurance Law Journal. 1975; 628:261-93.
6. Institute of Medicine, Division of Legal, Ethical, and Educational Aspects of Health. Beyond Malpractice: Compensation for Medical Injuries. Washington, DC: National Academy of Sciences; 1978.
7. Fleming J. Is there a future for tort? Louisiana Law Review. 1984; 44:1193-212.
8. Moore H, O'Connell J. Foreclosing medical malpractice claims by prompt tender of economic loss. Louisiana Law Review. 1984; 44:1495-506.
9. Beyond MICRA: new ideas for liability reform. American College of Physicians. Ann Intern Med. 1995; 122:466-73.
10. Wadlington W. Medical injury compensation. A time for testing new approaches [Editorial]. JAMA. 1991; 265:2861.
11. Brennan TA, Leape LL, Laird NM, Hebert LE, Localio AR, Lawthers AG, et al. Incidence of adverse events and negligence in hospitalized patients. Results of the Harvard Medical Practice Study I. N Engl J Med. 1991; 324:370-6.
12. Leape LL, Brennan TA, Laird N, Lawthers AG, Localio AR, Barnes BA, et al. The nature of adverse events in hospitalized patients. Results of the Harvard Medical Practice Study II. N Engl J Med. 1991; 324:377-84.
13. Localio AR, Lawthers AG, Brennan TA, Laird NM, Hebert LE, Peterson LM, et al. Relation between malpractice claims and adverse events due to negligence. Results of the Harvard Medical Practice Study III. N Engl J Med. 1991; 325:245-51.
14. Brennan TA, Hebert LE, Laird NM, Lawthers A, Thorpe KE, Leape LL, et al. Hospital characteristics associated with adverse events and substandard care. JAMA. 1991; 265:3265-9.
15. Johnson WG, Brennan TA, Newhouse JP, Leape LL, Lawthers AG, Hiatt HH. The economic consequences of medical injuries. Implications for a no-fault insurance plan. JAMA. 1992; 267:2487-92.
16. California Medical Association. Report on the Medical Insurance Feasibility Study. San Francisco: California Med Assoc; 1977.
17. Harvard Medical Practice Study. Patients, Doctors, and Lawyers: Medical Injury, Malpractice Litigation, and Patient Compensation in New York. Cambridge, MA: President and Fellows of Harvard College; 1990.
18. Grant JM. The fetal heart rate trace is normal, isn't it? Observer agreement of categorical assessments. Lancet. 1991; 337:215-8.
19. Koran LM. The reliability of clinical methods, data, and judgments (second of two parts). N Engl J Med. 1975; 293:695-701.
20. Bader JD, Shugars DA. Agreement among dentists' recommendations for restorative treatment. J Dent Res. 1993; 72:891-6.
21. Geffen N, Darnborough A, De Dombal FT, Watkinson G, Goligher JC. Radiological signs of ulcerative colitis: assessment of their reliability by means of observer variation studies. Gut. 1968; 9:150-6.
22. Rubin HR, Rogers WH, Kahn KL, Rubenstein LV, Brook RH. Watching the doctor-watchers. How well do peer review organization methods detect hospital care quality problems? JAMA. 1992; 267:2349-54.
23. Daly L. Simple SAS macros for the calculation of exact binomial and Poisson confidence limits. Comput Biol Med. 1992; 22:351-61.
24. Altman DG. Practical Statistics for Medical Research. London: Chapman and Hall; 1991:161.
25. Fleiss JL. Statistical Methods for Rates and Proportions. 2d ed. New York: Wiley; 1981:212-25.
26. Graham P, Jackson R. The analysis of ordinal agreement data: beyond weighted κ. J Clin Epidemiol. 1993; 46:1055-62.
27. Byrt T, Bishop J, Carlin JB. Bias, prevalence, and κ. J Clin Epidemiol. 1993; 46:423-9.
28. Cicchetti DV, Feinstein AR. High agreement but low κ: II. Resolving the paradoxes. J Clin Epidemiol. 1990; 43:551-8.
29. Localio AR, Weaver SL, Landis JR, Lawthers AG, Brennan TA, Hebert L, et al. Clinical Decisionmaking on Adverse Events in Medical Care: Measuring Agreement and Implications for Professional Accountability. Final Report. Washington, DC: Agency for Health Care Policy and Research; 30 June 1993.
30. Poses RM, Cebul RD, Centor RM. Evaluating physicians' probabilistic judgments. Med Decis Making. 1988; 8:233-40.
31. Lichtenstein S, Fischhoff B, Phillips LD. Calibration of probabilities: the state of the art to 1980. In: Kahneman D, Slovic P, Tversky A, eds. Judgment Under Uncertainty: Heuristics and Biases. Cambridge, England: Cambridge Univ Pr; 1982:306-34.
32. Wilcosky TC, Chambless LE. A comparison of direct adjustment and regression adjustment of epidemiologic measures. J Chronic Dis. 1985; 38:849-56.
33. Casella G. An introduction to empirical Bayes data analysis. American Statistician. 1985; 39:83-7.
34. Davis CE, Leffingwell DP. Empirical Bayes estimates of subgroup effects in clinical trials. Control Clin Trials. 1990; 11:37-42.
35. Harris EK, Shakarji G. Use of the population distribution to improve estimation of individual means in epidemiological studies. J Chronic Dis. 1979; 32:233-43.
36. Wolfinger R, O'Connell M. Generalized linear mixed models: a pseudo-likelihood approach. Journal of Statistical Computation and Simulation. 1993; 48:233-43.
37. The oversight of medical care: a proposal for reform. American College of Physicians. Ann Intern Med. 1994; 120:423-31.
38. Park RE, Fink A, Brook RH, Chassin MR, Kahn KL, Merrick NJ, et al. Physician ratings of appropriate indications for six medical and surgical procedures. Am J Public Health. 1986; 76:766-72.
39. Brook RH, Kosecoff JB, Park RE, Chassin MR, Winslow CM, Hampton JR. Diagnosis and treatment of coronary disease: comparison of doctors' attitudes in the USA and the UK. Lancet. 1988; 1:750-3.
40. McClellan M, Brook RH. Appropriateness of care. A comparison of global and outcome methods to set standards. Med Care. 1992; 30:565-86.
41. Park RE, Fink A, Brook RH, Chassin MR, Kahn KL, Merrick NJ, et al. Physician ratings of appropriate indications for three procedures: theoretical indications vs indications used in practice. Am J Public Health. 1989; 79:445-7.
42. Goldman RL. The reliability of peer assessments of quality of care. JAMA. 1992; 267:958-60.
43. Dippe SE, Bell MM, Wells MA, Lyons W, Clester S. A peer review of a peer review organization. West J Med. 1989; 151:93-6.
44. Macnee CL, Penchansky R. Targeting ambulatory care cases for risk management and quality management. Inquiry. 1994; 31:66-75.
45. Hayward RA, McMahon LF Jr, Bernard AM. Evaluating the care of general medicine inpatients: how good is implicit review? Ann Intern Med. 1993; 118:550-6.
46. Dubois RW, Brook RH. Preventable deaths: who, how often, and why? Ann Intern Med. 1988; 109:582-9.
47. MacKenzie EJ, Steinwachs DM, Bone LR, Floccare DJ, Ramzy AI. Interrater reliability of preventable death judgments. The Preventable Death Study Group. J Trauma. 1992; 33:292-303.
48. Danan G, Benichou C. Causality assessment of adverse reactions to drugs. I. A novel method based on the conclusions of international consensus meetings: application to drug-induced liver injuries. J Clin Epidemiol. 1993; 46:1323-30.
49. Karch FE, Smith CL, Kerzner B, Mazzullo JM, Weintraub M, Lasagna L. Adverse drug reactions: a matter of opinion. Clin Pharmacol Ther. 1976; 19:489-92.
50. Koch-Weser J, Sellers EM, Zacest R. The ambiguity of adverse drug reactions. Eur J Clin Pharmacol. 1977; 11:75-8.
51. Kramer MS. Difficulties in assessing the adverse effects of drugs. Br J Clin Pharmacol. 1981; 11(Suppl 1):105S-10S.
52. Bates DW, Cullen DJ, Laird N, Petersen LA, Small SD, Servi D, et al. Incidence of adverse drug events and potential adverse drug events. Implications for prevention. ADE Prevention Study Group. JAMA. 1995; 274:29-34.
53. Naranjo CA, Shear NH, Lanctot KL. Advances in the diagnosis of adverse drug reactions. J Clin Pharmacol. 1992; 32:897-904.
54. Kramer MS, Leventhal JM, Hutchinson TA, Feinstein AR. An algorithm for the operational assessment of adverse drug reactions. I. Background, description, and instructions for use. JAMA. 1979; 242:623-32.
55. Leventhal JM, Hutchinson TA, Kramer MS, Feinstein AR. An algorithm for the operational assessment of adverse drug reactions. III. Results of tests among clinicians. JAMA. 1979; 242:1991-4.
56. Benichou C, Danan G, Flahault A. Causality assessment of adverse reactions to drugs. II. An original model for validation of drug causality assessment methods: case reports with positive rechallenge. J Clin Epidemiol. 1993; 46:1331-6.
57. Best WR, Cowper DC. The ratio of observed-to-expected mortality as a quality of care indicator in non-surgical VA patients. Med Care. 1994; 32:390-400.
58. Poses RM, Bekes C, Copare FJ, Scott WE. The answer to "What are my chances, doctor?" depends on whom is asked: prognostic disagreement and inaccuracy for critically ill patients. Crit Care Med. 1989; 17:827-33.
59. Toth v. Community Hospital at Glen Cove. 22 N.Y. 2d 255, 261, 239 N.E. 2d 368 (1968).
60. Lichtenstein S, Fischhoff B, Phillips LD. Calibration of probabilities: the state of the art. In: Kahneman D, Slovic P, Tversky A, eds. Judgment Under Uncertainty: Heuristics and Biases. Cambridge, England: Cambridge Univ Pr; 1982:306-34.
61. Caplan RA, Posner KL, Cheney FW. Effect of outcome on physician judgments of appropriateness of care. JAMA. 1991; 265:1957-60.
62. Schroeder SA, Kabcenell AI. Do bad outcomes mean substandard care? [Editorial]. JAMA. 1991; 265:1995.
63. Richardson FM. Peer review of medical care. Med Care. 1972; 10:29-39.
64. de Vet HC, Knipschild PG, Schouten HJ, Koudstaal J, Kwee WS, Willebrand D, et al. Sources of interobserver variation in histopathological grading of cervical dysplasia. J Clin Epidemiol. 1992; 45:785-90.
65. Musch DC, Higgins IT, Landis JR. Some factors influencing interobserver variation in classifying simple pneumoconiosis. Br J Ind Med. 1985; 42:346-9.
66. Stratton KR, Howe CJ, Johnston RB, eds. Adverse Events Associated with Childhood Vaccines: Evidence Bearing on Causality. Washington, DC: National Academy Pr; 1994:19-33.
67. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical Epidemiology: A Basic Science for Clinical Medicine. 2d ed. Boston: Little, Brown; 1991:36.
68. Wilson DS, McElligott J, Fielding LP. Identification of preventable trauma deaths: confounded inquiries? J Trauma. 1992; 32:45-51.
69. Feinstein AR. A bibliography of publications on observer variability. J Chronic Dis. 1985; 38:619-38.
70. Elmore JG, Feinstein AR. A bibliography of publications on observer variability (final installment). J Clin Epidemiol. 1992; 45:567-80.
71. Kramer MS, Lane DA. Causal propositions in clinical research and practice. J Clin Epidemiol. 1992; 45:639-49.
72. Rubenstein LV, Kahn KL, Reinisch EJ, Sherwood MJ, Rogers WH, Kamberg C, et al. Changes in quality of care for five diseases measured by implicit review, 1981 to 1986. JAMA. 1990; 264:1974-9.
73. Richards T, Lurie N, Rogers WH, Brook RH, Meredith L, Monsivais GI, et al. Measuring Case Mix and Quality of Care. Rater Training and Reliability in the Graduate Medical Education Study. Santa Monica, CA: Rand; 1987.
74. Kahn KL, Draper D, Keeler EB, Rogers WH, Rubenstein LV, Kosecoff J, et al. The Effects of the DRG-based Prospective Payment System on Quality of Care for Hospitalized Medicare Patients. Final Report. Santa Monica, CA: Rand; 1992:83.
75. Havighurst C, Tancredi L. Medical Adversity Insurance: a no-fault approach to medical malpractice insurance and quality assurance. Milbank Mem Fund Q Health Soc. 1973; 51:125-68.
76. Commission on Medical Professional Liability, American Bar Association. Designated Compensable Event System: A Feasibility Study. New York: American Bar Assoc; 1979.
77. Bovbjerg RR, Tancredi LR, Gaylin DS. Obstetrics and malpractice. Evidence on the performance of a selective no-fault system. JAMA. 1991; 265:2836-43.