Effects of Computer-based Clinical Decision Support Systems on Clinician Performance and Patient Outcome: A Critical Appraisal of Research

  1. Mary E. Johnston;
  2. Karl B. Langton;
  3. R. Brian Haynes; and
  4. Alix Mathieu
  1. From McMaster University, Hamilton, Ontario, Canada; University of Cincinnati, Cincinnati, Ohio. Requests for Reprints: R. Brian Haynes, MD, PhD, McMaster University Health Sciences Centre, Room 3H7, 1200 Main Street West, Hamilton, Ontario L8N 3Z5, Canada. Acknowledgments: The authors thank C. Walker Dilks and T. Fleming for help with literature searching, J. Wyatt for his comments, and authors for responding to our requests for additional information. Grant Support: Supported by the Brian C. Decker Health Information Research Fund and the Medical Research Council of Canada. Dr. Haynes is supported by a National Health Scientist Award from the National Health Research and Development Programme of Health and Welfare Canada.

    Abstract

    Objective: To review the evidence from controlled trials of the effects of computer-based clinical decision support systems (CDSSs) on clinician performance and patient outcomes.

    Data Sources: The literature in the MEDLARS, EMBASE, SCISEARCH, and INSPEC databases was searched from 1974 to the present. Conference proceedings and reference lists of relevant articles were reviewed. Evaluators of CDSSs were asked to identify additional studies.

    Study Selection: 793 citations were examined, and 28 controlled trials that met predefined criteria were reviewed in detail.

    Data Extraction: Study quality was assessed, and data on setting, clinicians and patients, method of allocation, computer system, and outcomes were abstracted and verified using a structured form. Separate summaries were prepared for physician and patient outcomes. Within each of these categories, studies were classified further according to the primary purpose of the CDSS: drug dose determination, diagnosis, or quality assurance.

    Results: Three of 4 studies of computer-assisted dosing, 1 of 5 studies of computer-aided diagnosis, 4 of 6 studies of preventive care reminder systems, and 7 of 9 studies of computer-aided quality assurance for active medical care that assessed clinician performance showed improvements in clinician performance using a CDSS. Three of 10 studies that assessed patient outcomes reported significant improvements.

    Conclusions: Strong evidence suggests that some CDSSs can improve physician performance. Additional well-designed studies are needed to assess their effects and cost-effectiveness, especially on patient outcomes.

    The application of artificial intelligence and other computing and information science techniques to the field of health care has resulted in the development of computer-based clinical decision support systems (CDSSs), sometimes called, generically, “expert systems”. Although no consensus has been achieved on the definition of a CDSS, Wyatt and Spiegelhalter [1] have defined medical decision aids as “active knowledge systems which use two or more items of patient data to generate case-specific advice,” thus capturing the main attributes of these systems in a simple statement.

    Much has been written about the theoretical and technical aspects of CDSSs and their reliability, validity, and acceptability. Wyatt and Spiegelhalter [2] described a systematic approach to laboratory and field testing of CDSSs, suggesting that the final stages should include evaluation of effects on the health care process and on patient outcomes. This overview focuses on studies of the final and most clinically important stages of evaluation and examines controlled trials designed to measure the effects of CDSSs on clinician performance and patient outcomes.

    Methods

    Study Identification

    Previously published reviews of CDSSs were identified using a MEDLINE search from January 1983 through February 1992 and through a manual search of textbooks and conference proceedings in the areas of artificial intelligence and computer applications in medicine. Original studies were identified by an all-language MEDLINE search of articles published from 1974 (the year of publication of the first article to evaluate the effect of a CDSS on clinician performance [3]) to February 1992 (search terms available on request). Studies were also identified through an update of a previous review on computer-aided quality assurance [4]; through an EMBASE (Excerpta Medica) search for the same time period; through an INSPEC (International Information Service for the Physics and Engineering Communities) search; through review of citations in the articles from electronic searches and a search forward on three citations [5-7], one each from the areas of dose determination, diagnosis, and quality assurance, using SCISEARCH; through articles on related topics collected by the Health Information Research Unit of McMaster University, including a regularly updated bibliography of studies of continuing education [8]; and by scanning the Proceedings of the Symposium on Computer Applications in Medical Care, 1989 through 1991. After a set of relevant publications was selected for inclusion in the overview, a list of their titles was sent to corresponding authors and experts in medical informatics with a request for information about any additional published or unpublished studies.

    Study Selection

    We selected studies for review if 1) the population of interest was composed of clinicians (physicians, nurses, dentists, and so on) in practice or training; 2) the intervention was a computer-based CDSS evaluated in a clinical setting; 3) the outcomes assessed were clinician performance, a measure of the process of care, or patient outcomes, including any aspect of patient well-being; and 4) the type of evidence was limited to prospective studies with a contemporaneous control group where patient care with a CDSS was compared with patient care without one. Crossover studies were included.

    A CDSS was defined as follows: computer software using a knowledge base designed for use by a clinician involved in patient care as a direct aid to clinical decision making. Characteristics of an individual patient were matched to information in the knowledge base. Patient-specific information in the form of assessments (management options or probabilities) or recommendations was presented to the clinician.

    Lists of citations, with abstracts and indexing terms if available, were assessed by two of the authors, one with a background in expert system development and one with experience in critical appraisal of applied medical research. Each rater indicated whether the citation was potentially relevant (that is, appeared to meet the selection criteria, applied loosely at this stage), was clearly not relevant, or gave insufficient information to make a judgment. Citations in the last category usually lacked abstracts; the full texts of these articles were examined by one of the raters in the library and rated as potentially relevant or definitely irrelevant. For each citation judged to be potentially relevant by either rater, a copy of the full article was made and rated independently by both raters according to the four criteria listed above. To be included in the overview, a study was required to meet all the selection criteria. A third rater independently performed the same detailed relevance rating if the original raters disagreed or were uncertain about the study's relevance to the overview. In these cases, a study was included if two of the three raters deemed it relevant.
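
    The inclusion rule described above can be sketched in a few lines of Python. This is only an illustration: the names Rating and include_study are hypothetical, and the sketch assumes that each rater's detailed assessment is reduced to a single verdict of relevant, not relevant, or uncertain.

```python
from enum import Enum
from typing import Optional


class Rating(Enum):
    """Hypothetical summary of one rater's detailed relevance rating."""
    RELEVANT = "relevant"
    NOT_RELEVANT = "not relevant"
    UNCERTAIN = "uncertain"


def include_study(first: Rating, second: Rating,
                  third: Optional[Rating] = None) -> bool:
    """Return True if a study enters the overview under the rule in the text."""
    # Both original raters are certain and agree: their shared verdict stands.
    if first == second and first is not Rating.UNCERTAIN:
        return first is Rating.RELEVANT
    # Disagreement or uncertainty: a third independent rating is required.
    if third is None:
        raise ValueError("a third rating is required when raters disagree or are uncertain")
    # The study is included if two of the three raters deem it relevant.
    votes = sum(r is Rating.RELEVANT for r in (first, second, third))
    return votes >= 2
```

    In this sketch an uncertain rating never counts as a vote for relevance, which is one reading of the two-of-three rule.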

    Study Evaluation

    Two reviewers independently rated the studies selected for the overview on each of five potential sources of bias (see the Appendix). Disagreements were handled in the same way as those encountered during study selection. Additional information about study design was solicited from authors if necessary. Options under each item were assigned a score of 2, 1, or 0 points. Scores were summed for the five items to achieve an overall score; 10 was the highest possible score. The first rating option under each item indicated that appropriate steps were taken to minimize bias in the study results; the second option indicated that although efforts were made to reduce bias, they may not have been adequate; and the third option indicated a study in which a definite potential for bias existed or in which insufficient information was given to assure the reader that no bias was present. Item three (unit of allocation) was a measure of potential contamination specific to evaluations in which interventions are applied to clinicians and end points are measured for patients. When individual patients are allocated to intervention and control groups, clinicians may treat patients with or without the aid of the CDSS. Because only a portion of the patients in a given study may have been treated with the aid of a CDSS, knowledge gained from the CDSS may be applied to control patients, leading to an underestimate of the system's effect. In some cases, the CDSS may be accessible during assessment of control patients and could be used in their care. Similar problems can occur when individual clinicians are the unit of allocation. The presence of a CDSS or of colleagues who are using a CDSS in the clinic may influence the treatment given by control clinicians. 

    Item five, concerning follow-up rates, was not applicable for several studies of computer-aided quality assurance in which all patients in the practice were randomly assigned to be evaluated with the aid of a CDSS or to be control patients but in which data were generated only for those coming to the clinic for a visit during the study period. These studies were scored on the first four criteria and the total was prorated so that the maximum possible score was 10 points.
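
    The scoring and prorating rule can be illustrated with a short Python sketch. The function name quality_score is hypothetical, and the sketch assumes that "prorated" means a linear rescaling of the summed score to a 10-point maximum; the text does not say how fractional results were rounded, so none are rounded here.

```python
from typing import Optional, Sequence

MAX_ITEM_SCORE = 2   # each item is scored 2, 1, or 0
FULL_SCALE = 10      # highest possible overall score


def quality_score(item_scores: Sequence[Optional[int]]) -> float:
    """Sum the applicable items and rescale so the maximum is 10.

    Use None for an item that does not apply (for example, follow-up).
    Linear rescaling is an assumption about how prorating was done.
    """
    applicable = [s for s in item_scores if s is not None]
    if not applicable:
        raise ValueError("at least one item must be applicable")
    if any(s not in (0, 1, 2) for s in applicable):
        raise ValueError("each item is scored 0, 1, or 2")
    max_possible = MAX_ITEM_SCORE * len(applicable)
    return sum(applicable) * FULL_SCALE / max_possible
```

    With all five items applicable the rescaling factor is 1, so the result is simply the sum of the item scores.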

    Data Extraction

    Data concerning setting, clinicians and patients, interventions, and outcomes were extracted from each article by one investigator using a structured data collection form. Data extraction was verified by a second rater, and corrections were made where necessary. A letter was sent to the corresponding author for each study to request missing data and verification of the description of the study setting, subjects, CDSS, and the main outcome to be included in the overview.

    Analysis

    We began the overview expecting that studies would come from several areas of health care and would evaluate a number of CDSSs developed for different purposes. They would be assessed in several settings using many types of outcomes: continuous, discrete, long-term, and short-term, with some having more direct clinical relevance than others. A single measure of effect, in our judgment, would not be the best expression of our current knowledge of the effects of CDSSs on clinicians and patients, and a formal meta-analysis could be misleading for most applications. We decided, instead, to present summaries of the effect of each type of system for several clinician and patient outcomes separately so that system developers and consumers could focus on the type of system and outcome that would be relevant to their practice.

    An effect size was calculated for each study reporting a continuous outcome variable by dividing the difference of the means for the intervention and control groups by the pooled standard deviation. The effect size indicated the number of standard deviations by which the CDSS and control groups differed. For dichotomous outcome variables, odds ratios were calculated. An odds ratio whose 95% CI did not include 1 indicated a statistically significant effect of the CDSS. An odds ratio less than 1 favored the CDSS when the outcome variable of interest was the number of adverse effects, and an odds ratio greater than 1 favored the CDSS when the outcome variable measured health or clinician performance. Differences in the findings for studies of the same type of CDSS were assessed statistically for “heterogeneity” using the Homogeneity Q test [9] or the Breslow-Day test [10]. Two crossover studies were treated as parallel designs because the risk for carryover effects from one study period to the next was considered minimal.
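
    The two summary measures can be sketched in Python as follows. The text does not give the exact formulas, so this sketch makes two assumptions: the pooled standard deviation is the usual degrees-of-freedom weighted estimate, and the 95% CI for the odds ratio uses the log (Woolf) method. All function names are hypothetical.

```python
import math


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation of two groups (assumed standard formula)."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def effect_size(mean_cdss: float, sd_cdss: float, n_cdss: int,
                mean_ctrl: float, sd_ctrl: float, n_ctrl: int) -> float:
    """Difference of group means in units of the pooled standard deviation."""
    return (mean_cdss - mean_ctrl) / pooled_sd(sd_cdss, n_cdss, sd_ctrl, n_ctrl)


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and approximate 95% CI from a 2 x 2 table.

    a, b: events and non-events in the CDSS group
    c, d: events and non-events in the control group
    The log-based (Woolf) interval is an assumption; it is undefined
    when any cell is zero, in which case a ZeroDivisionError is raised.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def excludes_one(lower: float, upper: float) -> bool:
    """True if the interval does not include 1 (statistically significant)."""
    return not (lower <= 1.0 <= upper)
```

    Whether a significant odds ratio favors the CDSS still depends on the outcome: below 1 for adverse events, above 1 for measures of health or clinician performance.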

    Results

    Retrieval of Previous Reviews and Original Studies

    We identified seven previous reviews that examined experimental evidence of the effects of CDSSs on the process or outcome of care [4, 11-16]. Only one [4] was a systematic overview, and this was restricted to the evidence available by 1986 concerning computer-aided quality assurance, which is only one of several types of CDSSs available.

    To find reports of original research, two raters evaluated 786 unique citations from all sources, with MEDLINE contributing 424 citations. The raters agreed on a rating of “potentially relevant,” “not relevant,” or “can't tell” for 96% of citations (κ = 0.81). Ten citations that did not provide enough information for the relevance rating were published in foreign-language journals and were not investigated further. Fifty-four citations, all published in English, were judged to be potentially relevant by at least one rater, and the full reports of these citations were evaluated in detail. Twenty-four of these studies failed the detailed relevance evaluation: 16 did not satisfy our definition of a CDSS, 2 measured neither physician performance nor patient outcome, and 6 did not fulfill our design criteria. Of the remaining 30 studies, 3 [17-19] reported different outcomes for the same group of patients and were counted as a single study. Thus, 28 studies were included in the overview.
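
    The agreement statistic reported above corrects raw agreement for the agreement expected by chance. The sketch below assumes the unweighted Cohen kappa for two raters, which the text does not state explicitly; the function name cohen_kappa and the category labels in the usage note are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(ratings_a: Sequence[Hashable], ratings_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen kappa for two raters rating the same items."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: proportion of items given the same category.
    observed = sum(x == y for x, y in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal proportions,
    # summed over categories.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[cat] * freq_b[cat] for cat in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)
```

    For example, cohen_kappa(a, b) could be called with two equal-length lists of labels such as "relevant", "not relevant", and "can't tell", one list per rater.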

    General Description of Eligible Studies

    Quality scores are summarized in the Appendix. Seven studies (25%) had scores of 8 or more points of a possible 10. Confidence in the results of the other 21 studies was undermined, to some extent, by lack of randomization, possible contamination, baseline differences between groups, failure to guard against bias in outcome determination, or inadequate follow-up. Allocation by patient in situations in which clinicians treated patients from both intervention and control groups was the most common design flaw and occurred in 57% of studies.

    Four studies tested a CDSS for the determination of drug dose (Table 1) [7, 20-22], 5 tested aids to diagnosis (Table 2) [23-27], and 19 aimed at enhancing the quality of preventive (Table 3) or active (Table 4) medical care [6, 17-19, 28-44]. Ten of the 28 studies evaluated patient outcomes [6, 7, 17-21, 23, 28, 30, 31, 39].

    Table 1. Studies of Computerized Dosing for Toxic Drugs*
    Table 2. Studies of Computer-aided Diagnosis*
    Table 3. Studies of Computerized Reminders for Preventive Care*
    Table 4. Studies of Active Medical Care*

    Most studies (86%) were conducted at teaching hospitals and affiliated clinics. Nineteen studies took place in the United States, 4 in the United Kingdom, and 5 in Canada. The CDSSs were used for the care of inpatients in 47% of studies and outpatients in the other 53%. Information from the CDSSs was used by nurse/physician teams in 9 studies, by physicians alone in 16 studies, and by nurses alone in 3 studies. In most studies, clinicians did not use computers themselves but were given printed reports generated by a computer after data entry by clinic or ward staff.

    For studies of computer-assisted dosing for toxic drugs, patients in the control groups were managed directly by clinicians without the aid of computers. For studies of computer-assisted diagnosis, data were collected for computer analysis for patients in the control groups, but computer output was withheld from the clinicians. For 15 of 19 studies of computerized feedback and reminders for preventive and active medical care, both groups received a summary from a computer-based information system, with the intervention group receiving patient-specific recommendations in addition to the summary. In the other 4 studies, clinicians in the CDSS group interacted with a computer program, whereas the control group had no computer available.

    Four systems were based on formal clinical decision-making rules. For example, the urologic nursing information systems evaluated by Petrucci and colleagues [39] used a knowledge base comprising rules for managing urinary incontinence that reflected the knowledge of nurse experts. Three CDSSs used a Bayesian approach. For example, White and colleagues [7] used a computer program that predicted individual patient prothrombin time responses from population data. Twenty CDSSs were not used interactively by clinicians but applied criteria to patient-specific information in computerized health records to produce recommendations. For example, several studies used computerized medical records to produce reminders to prompt clinic staff to make follow-up appointments for patients for screening and immunizations using a schedule determined by matching individual patient characteristics with guidelines for preventive care [36-40].

    Effects on Clinician Performance and Clinical Outcomes

    Drug Dose Determination

    Three of four studies of computerized aids for determining the dose for toxic drugs reported statistically significant improvements in achieving therapeutic levels [7, 20, 21]. Using Bayesian estimates to determine dosage, Gonzalez and colleagues [20] found significantly more serum theophylline concentrations in the desired therapeutic range 4 hours after the initiation of therapy, and White and colleagues [7] found that a therapeutic prothrombin time ratio was reached more quickly. A second study by White and colleagues [22] failed to show a benefit for patients receiving long-term anticoagulant therapy. Authors of a fourth paper [21] reported statistically significant results favoring the CDSS, but the data did not appear to support this finding. Significant heterogeneity (P = 0.0002) was noted among the results from the three studies that reported means and standard deviations, which is perhaps not surprising given the different clinical disorders, settings, and performance measures.

    Three studies [7, 20, 21], all with small sample sizes, evaluated the effects on patient outcomes and found no significant benefits or adverse effects compared with usual clinical practice. Rodman and colleagues [21] found no toxic responses to lidocaine in either the computer-assisted intervention group or the control group. Data on adverse reactions to aminophylline from the study by Gonzalez and colleagues [20] did not show a significant difference between CDSS and control groups, and no difference was found in the proportion of patients sent home from the emergency department with resolution of wheezing. White and colleagues' [7] count of bleeding complications from warfarin therapy also failed to show a difference, with no bleeding complications in the CDSS group and only three (8.3%) in the control group. However, patients in the CDSS group who received warfarin stayed in the hospital for an average of 13 days, compared with 20 days for the control patients (P = 0.01; effect size = 0.59).

    Diagnosis

    Findings were mainly negative for the effects of computerized decision aids for diagnosis. Two studies [24, 27] of patients with symptoms of possible acute ischemic heart disease failed to show an effect of CDSSs on diagnostic accuracy. (The study reported by Pozen and colleagues [24] was conducted in two parts: An alternate-month design was used at three hospitals and a time-series design was used at three others. Only the alternate-month study met the inclusion criteria for this overview.) Wellwood and colleagues' [25] evaluation of computer-aided diagnosis for abdominal pain failed to show a reduction in the number of laparotomies associated with negative findings. Wexler and colleagues [26] reported that a CDSS had no effect on the number of inappropriate laboratory studies ordered to make a diagnosis in patients admitted to a pediatric service without a clear diagnosis. In the only positive study, Chase and colleagues [23] found a large difference favoring a CDSS designed to identify high-risk patients and to refer them for respiratory assessment.

    Only one study of computer-assisted diagnosis examined a patient outcome. Chase and colleagues [23] found a reduction, after adjustment for site of surgery, in the rate of respiratory complications among high-risk surgical patients screened with a CDSS. (As shown in Table 2, however, the 95% CI for the unadjusted odds ratio of 0.48 included 1.)

    Quality of Preventive Care

    In contrast to the findings for diagnosis, four of six studies of CDSSs that were designed to enhance the quality of preventive care showed statistically significant effects on clinician performance. One of the successful interventions consisted of reminders from computerized medical records to provide an influenza vaccination [36]; a second study had similar findings for tetanus vaccination [40]. Reminders from computerized medical records to measure blood pressure resulted in improvements in one study [38] but not in another [28]. Tierney and colleagues [41] also found increased physician compliance with some, but not all, preventive care protocols in a general medicine clinic; better results were achieved with immediate reminders than with delayed feedback. Other investigations also found differences in effects within the same clinical setting, with computerized reminders successfully increasing compliance with recommended vaccinations [40], immunizations [36], and blood pressure screening [38], but not cervical cancer screening [37], despite the fact that the same reminder system and approach were used. Reflecting these differences in findings, significant heterogeneity was observed among results from the five studies [6, 17-19, 28, 36, 37] that provided data for odds ratios (P < 0.001), perhaps due to differences in quality, with quality scores ranging from 4 to 9 of 10 possible points. Only one of the studies of computerized reminders for preventive care assessed the effects on patient outcomes: Barnett and colleagues [28] found no statistically significant effect of reminders for follow-up blood pressure measurements on the lowering of blood pressures.

    Quality of Active Medical Care

    Findings were also generally positive for the effects of CDSSs in active medical care. Seven of nine studies that assessed the effect of CDSSs on clinician performance in caring for active medical problems reported statistically significant effects on medical care processes. Studies by McDonald and colleagues [32-35] found that reminders significantly increased clinician response rates to clinical events for several common medical problems that were built into computerized medical records for patients treated at a general medical clinic. Tierney and colleagues [42] reported that per-visit charges for a set of diagnostic tests were reduced when the probability that the result would be abnormal (based on the patient's previous test findings) was displayed when a test was ordered again. Young [44] reported the average number of errors per patient made by a physician who ordered tests based on reminders from a computer-based clinical information system and by another physician who did not have access to the system. The investigators found statistically significant improvements for four of eight tests and indicated that the error rates for the other four tests may have been too low for improvement. Brownbridge and colleagues [29] studied a computerized protocol for management of hypertension using a complicated design that included several treatment periods at each clinic, but they did not report a statistical test for differences between the control and experimental sites during the treatment period, when a direct comparison between CDSS and control could be made. Adherence to the management protocol appeared to be higher when the computer was used. Two additional studies of computerized protocols for managing hypertension did not show a statistically significant effect [6, 30].

    Five studies of CDSSs in active medical care examined patient outcomes [6, 19, 30, 31, 39]. Two of these studies [30, 31] reported sufficient data for calculating effect sizes or odds ratios. Neither of these two studies showed a statistically significant effect when recommendations were added to an information system. One study examined the effect of a CDSS on blood pressure control [30]; the odds ratio for the proportion of patients controlled (defined as diastolic blood pressure less than 95 mm Hg) was 0.70 (95% CI, 0.31 to 1.55).

    In a study by Petrucci and colleagues [39], the number of occurrences of urinary incontinence among nursing home patients decreased significantly when a CDSS that mimicked the knowledge of a nurse expert was used to develop a nursing care plan. Data were obtained from the author of a third study [33] that showed no difference in satisfaction with care between patients treated with and without a computerized nurse care planning system. The duration of hospital stay was actually 3.6 days greater for the CDSS group (P = 0.005; effect size = 0.59).

    Two additional studies examined patient outcomes, but the data reported did not allow direct calculation of odds ratios or effect sizes. McAlister and colleagues [6] found no increase in the proportion of patients with hypertension with diastolic blood pressures less than 90 mm Hg when computer-generated feedback on management of individual patients was provided to physicians. Rogers and colleagues [19] reported that patients treated in a clinic whose physicians were aided by a medical information system that provided recommendations perceived that they had better health status than did those whose physicians used a manual record. Patient outcomes for three conditions were analyzed for subgroups of patients in this study. No difference was noted between CDSS and control groups for blood pressure among 359 patients with hypertension. After 2 years, 46 obese patients in the CDSS group were less overweight than were 44 patients in the control group. Among 87 patients with renal disease, more normal test results were found among the CDSS group.

    Discussion

    Controlled trials of the effects of computerized CDSSs on clinician performance and patient outcomes available to early 1992 have assessed several types of systems; the studies have varied greatly in their design, methodologic rigor, and measures of effect, making it difficult to come to general conclusions about the utility of CDSSs in clinical practice. A few small studies showed that computer-assisted dose determination can help physicians achieve therapeutic drug levels, at least in the short term, but larger, confirmatory studies with more important clinical outcomes are needed. A small number of studies on computer-aided diagnosis were found, but only one of these reported evidence on effectiveness. Several sound studies showed that recommendations built into computerized medical record systems can improve clinician compliance with practice guidelines for preventive and active care. Overall, only 10 studies measured patient outcomes and only 3 reported statistically significant benefits favoring the use of a CDSS. The lack of positive findings may have been due to the small sample sizes used in some of the studies.

    As CDSSs mature, they offer increasingly exciting prospects for improving the effectiveness and efficiency of patient care. As with all health care interventions, however, CDSSs have the potential not only for good but also for harm and waste. The literature on CDSSs is growing rapidly, but only a small proportion is devoted to evaluations of the effects of CDSSs used by clinicians in everyday practice. Assessment of most systems occurs primarily at earlier phases, such as measuring reliability, accuracy, and acceptability. It is appropriate for developers to evaluate their systems in a systematic way, progressing in steps from the laboratory to clinical application [2]. It could be wasteful to skip from these early steps to full clinical trials, which should be reserved for mature systems. Eventually, however, claims that CDSSs benefit patients should be judged by the same standards as any other health claim; the accepted standard would be randomized controlled trials showing unequivocal benefits for important clinical outcomes. Unfortunately, rigorous evaluations of CDSSs are usually more difficult to conduct than evaluations of pharmaceuticals, for example, because blinding of providers is impossible, and clinical settings often preclude complete separation of the intervention and control groups. Studies of patient outcomes may have the added burden of requiring large numbers of participants and substantial budgets. Nevertheless, the studies we have reviewed show that current scientific methods are being applied to the testing of CDSSs and that some have enough effect on the process of care to warrant trials with important clinical outcomes. We look forward to such trials in due course.

    In the meantime, the lack of effect of some CDSSs on patient outcomes in the studies reviewed here may also reflect inappropriate study design or failure to measure outcomes that are responsive to the use of CDSSs. Alternatively, some CDSSs may aim to modify clinician behavior without necessarily having an effect on patient outcome. Although it could be argued that effects on clinician performance alone may be worthwhile in situations leading to greater efficiency, the most convincing tests of most CDSSs will be their effect on patient well-being.

    Appendix

    Original studies meeting criteria for study selection (see Methods) were rated according to five criteria for methodologic adequacy: 1) formation of study groups (score of 2 for random allocation, 1 for quasi-random allocation [alternate allocation, by date, and so on], and 0 for selected concurrent controls); 2) baseline differences between the CDSS and control groups that were potentially linked to outcome (score of 2 for no baseline differences or if statistical adjustments were made for differences, 1 for baseline differences without statistical adjustments, and 0 for no statement about baseline features for the groups being compared); 3) unit of allocation (score of 2 for practice or clinic or hospital, 1 for randomization by physician, and 0 for randomization by patient); 4) outcome measure (score of 2 for objective outcome [that is, not open to interpretation] or subjective outcome with outcome assessors blinded to study group, 1 for subjective outcome with outcome assessors not blinded to study group but explicit criteria for defining outcome were used, and 0 for subjective outcome with outcome assessors not blinded to study group and no mention of explicit criteria); and 5) follow-up (score of 2 for outcome reported for at least 90% of patients starting the study, 1 for outcome reported for less than 90% but more than 80% of patients starting the study, and 0 for outcome reported for less than 80% of patients starting the study). Scores for each study are given in Appendix Table 1 as follows: citation, score on each criterion (with “-” indicating that the criterion did not apply), and total (of a possible score of 10, prorated if fewer than 5 criteria applied).

    Appendix Table 1. Methodologic Scores*

    Abbreviation

    CDSS: Clinical decision support system
