Polymerase Chain Reaction for the Diagnosis of HIV Infection in Adults
A Meta-Analysis with Recommendations for Clinical Practice and Study Design
- Douglas K. Owens, MD, MSc;
- Mark Holodniy, MD;
- Alan M. Garber, MD, PhD;
- John Scott, BA;
- Seema Sonnad, MS;
- Lincoln Moses, PhD;
- Bruce Kinosian, MD; and
- J. Sanford Schwartz, MD
- From Veterans Affairs Palo Alto Health Care System, Palo Alto, California; Stanford University, Stanford, California; and Department of Veterans Affairs Medical Center and University of Pennsylvania, Philadelphia, Pennsylvania. Acknowledgments: The authors thank Michael Newman for expert assistance with computer-based literature searches; Daniel Kent, MD, for assistance with the development of the quality scoring system; Andrea Sullivan for help with data analysis; and Lyn Dupre for helpful comments. Some of the methods used in this research are based on work sponsored by the John A. Hartford Foundation. Grant Support: In part by the Veterans Affairs Office of Research and Development, Health Services Research and Development Service (IIR #91-044.A); the Center for Health Care Evaluation (Health Services Research and Development Field Program, Veterans Affairs Health Care System, Palo Alto, California); and grant AI 27762-04 from the National Institutes of Health. Drs. Owens and Garber are supported by Veterans Affairs Health Services Research and Development Career Development Awards. Requests for Reprints: Douglas K. Owens, MD, MSc, Section of General Internal Medicine (111A), Veterans Affairs Palo Alto Health Care System, 3801 Miranda Avenue, Palo Alto, CA 94304. Current Author Addresses: Drs. Owens and Garber: Veterans Affairs Palo Alto Health Care System, 3801 Miranda Avenue (111A), Palo Alto, CA 94304.
Abstract
Purpose: To do a meta-analysis of studies that have evaluated the sensitivity and specificity of polymerase chain reaction (PCR) assay for the diagnosis of human immunodeficiency virus (HIV) infection in adults. Evaluating the performance of PCR is difficult because in certain clinical situations, the sensitivity or specificity of PCR may exceed those of the current reference standard tests [enzyme immunoassay followed by confirmatory Western blot analysis]. Therefore, an additional goal was to develop recommendations for 1) the design of future evaluative studies of PCR and 2) the use of PCR in persons with suspected HIV infection.
Data Sources: Studies published between 1988 and 1994 that were identified in a search of 17 computer databases, including MEDLINE, and abstracts identified from conference proceedings.
Study Selection: Studies were included if DNA amplification by PCR was done on peripheral blood mononuclear cells from adults. Ninety-six studies met the inclusion criteria.
Data Extraction: Data were extracted independently by two reviewers. Study design was assessed independently by two investigators blinded to study results.
Results: Reported sensitivities for PCR range from 10% to 100%, and specificities range from 40% to 100%. A summary receiver-operating characteristic curve based on all 96 studies has a maximum joint sensitivity and specificity [upper left point on the curve, where sensitivity equals specificity] of 97.0% to 98.1%. If the threshold value that defines a positive PCR result is chosen so that sensitivity is higher than 98.1%, specificity will decrease to less than 98.1%. Conversely, if the threshold value that defines a positive PCR result is chosen so that specificity is greater than 98.1%, sensitivity will decrease to less than 98.1%. If sensitivity and specificity are chosen to be equal, the corresponding false-positive rate is 1.9% to 3.0%. At the maximum joint sensitivity and specificity, the positive predictive value of PCR ranges from 34% to 85% as the prevalence of HIV increases from 1.0% to 10%. We identified seven areas in which study design could be modified to 1) reduce susceptibility to bias in estimates of the sensitivity and specificity of PCR and 2) increase the generalizability of the study results. These modifications will also help to overcome methodologic problems created by the lack of a reference standard test.
Conclusions: The PCR assay is not sufficiently accurate to be used for the diagnosis of HIV infection without confirmation. Use of PCR for the diagnosis of HIV in adults should be limited to situations in which antibody tests are known to be insufficient. Future studies of PCR performance should be sufficiently large and should use adequate reference standard tests and standardized methods for the performance of PCR. Specimens should be evaluated by persons blinded to clinical status and to the results of other diagnostic tests for HIV infection.
Polymerase chain reaction (PCR) is a gene amplification technique that has found widespread use in medicine and molecular biology. The PCR assay was developed in 1985 [1, 2], and one of its earliest and most important clinical applications has been the diagnosis of human immunodeficiency virus (HIV) infection [3-9]. The PCR assay received attention as a diagnostic test for HIV infection in part because numerous reports suggested that months to years might elapse between infection with HIV and the development of HIV antibodies that could be detected by enzyme immunoassay and Western blot analysis [10, 11]. Because PCR directly amplifies proviral HIV DNA and does not depend on HIV antibody formation, it is a potentially attractive alternative to conventional antibody tests. However, the clinical role of PCR in the diagnosis of HIV infection remains uncertain because subsequent studies [12, 13] have not confirmed the occurrence of long “window” periods between infection and the development of antibodies.
Considerable controversy remains about the diagnostic accuracy of PCR. Some studies report that the test has perfect sensitivity and specificity, but others report high false-positive and false-negative rates. An understanding of the diagnostic performance of PCR for HIV infection is essential in determining the appropriate role of PCR in the clinical diagnosis of such infection. However, evaluation of the performance of PCR poses difficult methodologic challenges. To evaluate the sensitivity and specificity of PCR, investigators must ascertain whether study participants are infected with HIV. Typically, a new test is compared with a superior reference (or gold standard) test, but PCR is an example of a class of diagnostic technologies (including, for example, genetic screening tests) that have the potential to outperform and displace existing tests. At least in certain clinical circumstances, PCR may be more sensitive or more specific than the current reference tests (enzyme immunoassay followed by confirmatory Western blot analysis). The lack of an appropriate reference test substantially complicates evaluation. A successful approach to the evaluation of such technologies would be broadly useful.
We sought to 1) assess the validity and reliability of the scientific evidence on the diagnostic accuracy of PCR; 2) characterize the sensitivity and specificity of PCR on the basis of a formal analysis of the available studies; 3) develop recommendations for the clinical use of PCR in persons with suspected HIV infection; and 4) develop recommendations for the design of future studies of the diagnostic accuracy of PCR. In pursuing our third objective, we paid particular attention to whether PCR technology has improved enough to play a broader clinical role in the diagnosis of HIV infection. We did not evaluate the use of PCR for the quantification of viral load [14] or for the prediction or assessment of response to antiviral therapy [15].
We postulated that more recent studies, because they would reflect advances in PCR technology, would report higher sensitivities and specificities. We also expected that the most methodologically rigorous studies would report lower sensitivities and specificities than other studies. Finally, we expected that studies published as full articles would report higher sensitivities and specificities than studies published only as abstracts because of publication bias (that is, studies reporting high sensitivity and specificity would be published more frequently than studies reporting poor test performance).
Methods
We did a meta-analysis of the published English-language literature to examine the relation between study population, study characteristics, technical aspects of the assay, and measured test performance. We used statistical techniques to fit a summary receiver-operating characteristic (ROC) curve that characterizes the results of multiple studies [16]. An ROC curve represents the tradeoff between sensitivity and specificity for a diagnostic test. It can be used to compare diagnostic tests by assessing the degree to which differences in test sensitivity and specificity result from the use of different cut-off points for abnormality rather than from actual differences in test performance [17]. Typically, an ROC curve is developed from a single study by varying the cut-off point for an abnormal test. In our study, we developed summary ROC curves on the basis of an analysis of multiple studies. Although the method for developing a summary ROC curve differs from the method for developing an ROC curve from a single study, the summary ROC curve also estimates the tradeoff between sensitivity and specificity for a diagnostic test.
Study Identification
An investigator and a professional librarian with extensive experience in medical literature searches independently developed search strategies to identify studies of PCR for the diagnosis of HIV infection that had been published through the middle of 1994 (Appendix). We also manually searched the bibliographies of retrieved articles and conference proceedings. We wrote to the authors of studies that were published only as abstracts and requested information about study design and updated data on PCR performance.
Study Selection
Two investigators independently examined all titles, abstracts, and full articles identified in the search. We included studies if 1) PCR was done on peripheral blood mononuclear cells; 2) DNA [as opposed to RNA] was amplified; 3) study participants were older than 16 years of age; 4) more than 10 participants were enrolled; and 5) primary data sufficient for the determination of both sensitivity and specificity were reported. We excluded studies with fewer than 10 participants because we believed such studies would provide unreliable estimates. We also excluded studies that determined only sensitivity or specificity, because calculation of each is needed to determine a point on the ROC curve. Disagreements were resolved by re-review and discussion.
Data Abstraction
Two investigators independently abstracted data from each study, including the characteristics and risk behaviors of the study sample; the technical details of the assay, including the use of heparin [18]; the reference test used (for example, Western blot analysis or viral culture); the criteria used to interpret results of both PCR and the reference test; and the data needed to calculate the sensitivity, specificity, false-positive rate, and false-negative rate of PCR. Disagreements were resolved by re-review and discussion.
Calculation of Sensitivity and Specificity for Polymerase Chain Reaction
We abstracted primary data on the performance of PCR into a 3 × 3 table in which all participants (or test results) were classified as PCR-positive, PCR-negative, or PCR-indeterminate and as reference test-positive, reference test-negative, or reference test-indeterminate. We used the authors' criteria for PCR-positive, -negative, and -indeterminate test results (for example, the number of primers that had to be detected for a PCR test result to be positive) when they were stated. In the few instances in which these criteria were not stated, we defined a positive test result (in terms of the number of primer pairs detected) to maximize both sensitivity and specificity, if possible, or to maximize sensitivity if doing so did not substantially decrease specificity. We examined whether differences in these criteria affected test performance. We calculated both upper and lower estimates of PCR sensitivity and specificity. We calculated the upper estimate by excluding results that were PCR-indeterminate (thereby overestimating sensitivity and specificity), and we calculated the lower estimate by considering reference test-positive, PCR-indeterminate results to be false-negative results and by considering reference test-negative, PCR-indeterminate results to be false-positive results (thereby underestimating sensitivity and specificity). For the lower estimate, we also considered PCR test results to be false-positive if, after repeated PCR and antibody tests, the results remained PCR-positive and antibody test-negative throughout the follow-up period [8, 19-23]. Excluding these few discordant samples did not produce a statistically significant change in our lower-bound estimate.
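The upper and lower estimates described above can be illustrated with a minimal Python sketch. The function name, the dictionary layout, and the counts are hypothetical and are not taken from any included study. The sketch also assumes that reference test-indeterminate results are simply left out, which the text does not specify.

```python
def pcr_estimates(table):
    """Return upper and lower (sensitivity, specificity) estimates.

    `table[ref][pcr]` holds counts, with ref in {'pos', 'neg'} and
    pcr in {'pos', 'neg', 'ind'}. Reference test-indeterminate results
    are assumed to be omitted (an assumption of this sketch).
    """
    tp, fn, ind_pos = table['pos']['pos'], table['pos']['neg'], table['pos']['ind']
    fp, tn, ind_neg = table['neg']['pos'], table['neg']['neg'], table['neg']['ind']

    # Upper estimate: PCR-indeterminate results are excluded entirely.
    upper = (tp / (tp + fn), tn / (tn + fp))

    # Lower estimate: PCR-indeterminate results count as errors
    # (false-negative if reference-positive, false-positive if reference-negative).
    lower = (tp / (tp + fn + ind_pos), tn / (tn + fp + ind_neg))
    return {'upper': upper, 'lower': lower}


# Hypothetical counts, for illustration only.
example = {
    'pos': {'pos': 95, 'neg': 2, 'ind': 3},
    'neg': {'pos': 1, 'neg': 196, 'ind': 3},
}
estimates = pcr_estimates(example)
print('upper (sens, spec):', estimates['upper'])
print('lower (sens, spec):', estimates['lower'])
```

With these made-up counts the lower estimate is 95% sensitivity and 98% specificity, and the upper estimate is higher on both measures because the six indeterminate results drop out of the denominators.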
When possible, PCR performance was evaluated on the basis of the number of study participants rather than the number of tests conducted (some participants were tested more than once). This was done because repeated samples in the same individual person are not independent, and the use of multiple test results from an individual person may therefore spuriously inflate or deflate estimated sensitivity and specificity. Approximately 2% of the samples included in our analysis were repeated samples from individual persons that we could not exclude. Because we calculated sensitivity and specificity by using prospectively defined criteria for the patient's true disease state, the sensitivity and specificity we report for a study sometimes differ from those reported by the original authors. We calculated 95% CIs for individual study estimates of sensitivity and specificity (Figure 1) by using normal or Poisson approximations to the binomial distribution [24], as appropriate [25].
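As a rough illustration of how sample size drives the width of these intervals, the following Python sketch computes a 95% CI with the normal approximation to the binomial distribution. It does not implement the Poisson approximation that is also mentioned above, and the counts are hypothetical.

```python
import math


def normal_ci(successes, n, z=1.96):
    """95% CI for a proportion using the normal approximation.

    The interval is clipped to [0, 1]. The approximation is poor when the
    proportion is close to 0 or 1 or when n is small.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical example: the same observed sensitivity (90%) in a small
# and in a large group of reference test-positive participants.
print(normal_ci(27, 30))
print(normal_ci(270, 300))
```

The interval for 27 of 30 is several times wider than the interval for 270 of 300, which is why studies with fewer than 30 participants in a reference test group yield imprecise estimates.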
Assessment of Study Design
To assess the reliability of the evidence for the diagnostic accuracy of PCR for HIV infection, two investigators independently assessed the design of the studies by using prospectively developed criteria (Table 1). To develop these criteria, we modified a previously developed assessment framework for diagnostic tests [26-28]. Investigators were blinded to the study title, study results, study authors, the name of the journal in which the study results were published, and the name of the institution where the study was done. We assessed the appropriateness of the study design for the evaluation of the diagnostic performance of PCR on a four-point scale (1, 2, 3, or 4). A rating of 1 indicated that the design made the study susceptible to significant bias; a rating of 4 indicated that the study design satisfied all criteria for the evaluation of diagnostic tests (Table 1). Some studies had primary research questions that were not about the accuracy of PCR, but our assessments of the potential sources of bias in the study apply only to the evaluation of the diagnostic performance of PCR. We also identified studies for which the evaluation of PCR performance was either the sole objective or a major goal. We analyzed these studies separately to evaluate whether their design and methods differed from those of studies in which the evaluation of PCR performance was not a primary objective. We accepted positive results on conventional antibody tests (if they included a confirmatory Western blot analysis or similar test) or viral cultures as high-quality evidence of infection. The absence of infection is more difficult to establish. Only studies that used serial testing or follow-up to establish the absence of HIV infection received the highest ratings for study design.
Development of Summary Receiver-Operating Characteristic Curves
The summary ROC curve characterizes the performance of a test as measured in multiple studies [16]. Our statistical approach [16, 29-31] for developing summary ROC curves is described in detail in the Appendix. We characterize the summary ROC curve by the point that we call the “maximum joint sensitivity and specificity.” This point is defined by the intersection of the ROC curve with a diagonal line that runs from the top left to the bottom right corner of the diagram, along which sensitivity and specificity are equal (specificity is equal to 1 minus the false-positive rate). This point is the maximum attainable common value for sensitivity and specificity for this test; a perfect test would have a joint sensitivity and specificity of 1.0. The maximum joint sensitivity and specificity provides a convenient point with which to compare two ROC curves (much like the area under the ROC curve). This point does not indicate the only, or even necessarily the best, combination of sensitivity and specificity for a particular clinical application. Rather, the ROC curve shows the tradeoff between sensitivity and specificity as the threshold for an abnormal PCR test result is changed. The developers of a test can choose a threshold for an abnormal result so that they balance test sensitivity and specificity appropriately for particular clinical applications. For example, if the developers deem a false-negative result to be more harmful than a false-positive result (as they might for blood-bank screening), they could increase test sensitivity and thereby decrease the number of false-negative results. For each analysis, we report a summary ROC curve based on our upper estimates of sensitivity and specificity (indeterminate PCR results excluded) and a summary ROC curve based on our lower-bound estimates of PCR sensitivity and specificity (indeterminate PCR results counted as false-positive or as false-negative results).
Results
Studies Identified
Our literature search identified 5698 titles of potentially relevant articles. After independent review by two readers, 1735 titles were judged to be potentially relevant. We reviewed the associated abstracts and then selected 379 studies published as full articles for further review. Of these 379 articles, 96 met the inclusion criteria and were analyzed (1 article reported two independent studies that were analyzed individually) [3-5, 7-9, 11, 12, 14, 19-22, 32-113]. These studies included 5739 HIV-infected persons and 8929 uninfected persons. We excluded 26 of the 379 studies because they supplied data on either sensitivity or specificity but not both (references available from the authors). Other reasons for exclusion are noted in Table 2. Forty-five studies published only as abstracts met the inclusion criteria and were analyzed separately (references available from the authors).
Assessment of Study Design
The degree to which the 96 included studies satisfied each criterion for the design of an evaluation of a diagnostic test is shown in Figure 2. The information provided in studies whose results were published only in abstract form was insufficient for an assessment of study design. Because our criteria were rigorous, few published studies satisfied all of them. Identifiable aspects of the study design left many studies susceptible to potential bias (for example, lack of blinding during test interpretation) or produced imprecise estimates of the sensitivity and specificity of PCR (for example, small sample size). The numbers of studies receiving a rating of 1, 2, 3, or 4 for overall study design (see Methods and Table 1) were 73, 12, 6, and 5, respectively. Studies that focused solely or largely on the evaluation of PCR performance did not receive more favorable ratings than other studies. The criteria that were satisfied least often were adequacy of blinding during the interpretation of test results, adequacy of the reference test in uninfected study participants, and adequacy of sample size. In 57% of studies, there were fewer than 30 reference test-positive or reference test-negative participants; this resulted in wide 95% CIs on the estimates of sensitivity and specificity (Figure 1).
Most of the studies (74%) used acceptable reference tests in the HIV-infected participants. The clinical population of greatest interest for PCR testing, however, is that of persons at high risk for infection who have negative results on conventional antibody tests. Twenty-two of the 96 studies (23%) fully met the criterion for an adequate reference test in these persons. Thirty-six studies (38%) partially fulfilled this criterion, and 38 studies (40%) did not satisfy this criterion. Twenty-two studies (23%) fully met the reference test criteria for participants with and those without disease.
Sensitivity and Specificity of Polymerase Chain Reaction
Measured performance was extremely variable. When indeterminate PCR results were excluded, sensitivity ranged from 10% to 100% and specificity ranged from 40% to 100% (data available from the authors). In studies in which the design was rated as either 3 or 4, sensitivity ranged from 83% to 100%, and specificity ranged from 95% to 100%.
Summary Receiver-Operating Characteristic Curves
On the basis of all 96 studies, the upper estimate (indeterminate PCR results excluded) of the maximum joint sensitivity and specificity was 98.1%, and the lower estimate (indeterminate PCR results counted as false-positive or false-negative results) was 97.0% (Figure 3, Table 3). The corresponding log odds ratios (± SE) are 7.93 ± 0.330 and 6.96 ± 0.195, respectively (Table 3). The exclusion of four studies that reported sensitivity and specificity on the basis of the number of samples rather than the number of study participants did not significantly affect our results (P > 0.2). The exclusion of studies that used heparin to preserve blood samples provided slightly but not statistically significantly higher estimates of joint sensitivity and specificity (upper estimate, 98.5% [P > 0.2]; lower estimate, 97.1% [P > 0.2]). Figure 3 shows the tradeoff between sensitivity and specificity. For example, if a cut-off point for an abnormal PCR result is chosen so that the specificity of PCR is 99.0% (false-positive rate, 1.0%), the sensitivity decreases to approximately 91.0% to 96.0%.
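The relation between the reported log odds ratios and the points on the summary curve can be sketched in Python. The sketch assumes a symmetric summary ROC curve along which the log odds ratio D is constant, so that logit(sensitivity) + logit(specificity) = D. This is an assumption made for illustration, because the Appendix is not reproduced here, and the function names are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def expit(x):
    return 1 / (1 + math.exp(-x))


def max_joint(log_odds_ratio):
    """Common value Q where sensitivity == specificity, from D = 2 * logit(Q)."""
    return expit(log_odds_ratio / 2)


def sensitivity_at(specificity, log_odds_ratio):
    """Sensitivity on the assumed symmetric curve for a chosen specificity."""
    return expit(log_odds_ratio - logit(specificity))


# Log odds ratios reported in the text (upper and lower estimates).
for d in (7.93, 6.96):
    print(d, round(max_joint(d), 3), round(sensitivity_at(0.99, d), 3))
```

Under this assumption, log odds ratios of 7.93 and 6.96 correspond to joint values of about 98.1% and 97.0%. Fixing specificity at 99.0% gives sensitivities of roughly 96.6% and 91.4%, close to the approximate range quoted above.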
Subgroup Analysis
Our subgroup analyses (Table 3) indicated that studies published only as abstracts reported lower values for sensitivity and specificity than did studies published as articles. The upper estimate of maximum joint sensitivity and specificity based on studies that received scores of 2, 3, or 4 did not differ significantly from the estimated performance based on studies that received a score of 1. However, the lower-bound estimate of joint sensitivity and specificity was significantly lower in studies with better study design scores (96.2 compared with 97.7 [P = 0.02]; see Table 3). In addition, rather than finding that the reported accuracy of PCR was greater in more recent studies, we found that studies published during or after 1991 gave lower estimates of the accuracy than did studies published before 1991. We analyzed studies reported in and after 1991 because we believed that PCR technology had matured by 1991. Finally, the upper estimate of sensitivity and specificity based on studies in which the primary purpose was to evaluate the accuracy of PCR did not differ from estimates based on other studies. Subgroups defined by reference test criteria and by study objective showed significant differences only as judged by lower-bound estimates of joint sensitivity and specificity (Table 3).
The criteria for determining when PCR gave a positive result varied among the studies. Two of the 23 studies in which study design was rated as 2, 3, or 4 considered a PCR test result to be positive if reactivity with any one primer pair was seen. Thirteen studies required reactivity with two primer pairs, 7 did not specify explicit criteria, and 1 used variable criteria depending on the PCR assay used. A summary ROC curve based on the 13 studies that required reactivity with two primer pairs yielded an upper estimate of the joint sensitivity and specificity of 98.0%.
Post-Test Probability
Post-test probabilities depend on the sensitivity and specificity of PCR. Figure 4 shows the post-test probability of disease after positive and negative PCR test results as calculated using Bayes theorem [17] if the threshold for an abnormal test result has been chosen so that the test has maximum joint sensitivity and specificity. For example, if the pretest probability of HIV infection is 10%, the post-test probability of disease after a positive PCR test result (positive predictive value) increases to between 78% (thin curve, Figure 4) and 85% (thick curve, Figure 4). At a pretest probability of 1.0% and a sensitivity and specificity of 98.1% (the upper estimate), the post-test probability of HIV infection after a positive PCR test result is only 34%.
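The post-test probabilities quoted above follow directly from Bayes theorem. A short Python sketch reproduces them; the function name and signature are illustrative only.

```python
def post_test_probability(pretest, sensitivity, specificity, positive=True):
    """Probability of infection after a positive or negative test result."""
    if positive:
        true_pos = sensitivity * pretest
        false_pos = (1 - specificity) * (1 - pretest)
        return true_pos / (true_pos + false_pos)
    false_neg = (1 - sensitivity) * pretest
    true_neg = specificity * (1 - pretest)
    return false_neg / (false_neg + true_neg)


# Pretest probability 10%, lower (97.0%) and upper (98.1%) estimates.
print(round(post_test_probability(0.10, 0.970, 0.970), 2))  # 0.78
print(round(post_test_probability(0.10, 0.981, 0.981), 2))  # 0.85
# Pretest probability 1.0%, upper estimate.
print(round(post_test_probability(0.01, 0.981, 0.981), 2))  # 0.34
```

At a pretest probability of 1.0%, uninfected persons outnumber infected persons 99 to 1, so even a 1.9% false-positive rate produces about twice as many false-positive results as true-positive results.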
Discussion
We sought to critically and systematically examine the many published studies that have reported on the use of PCR for the diagnosis of HIV infection in adults. If it is sufficiently accurate and inexpensive, PCR could supplant standard antibody tests for diagnosis and screening. Our investigation produced two main findings. First, the false-positive and false-negative rates of PCR that we determined are too high to warrant a broader role for PCR in either routine screening or in the confirmation of diagnosis of HIV infection. This conclusion is true even for the results reported from more recent, high-quality studies that used commercially available, standardized PCR assays. We did not address the emerging potential uses of PCR for use in quantification of viral load [14] or in the prediction or assessment of response to antiviral therapy [15], areas in which PCR may prove to have an important clinical role. Second, our evaluation of study design suggests several modifications of design that would substantially reduce susceptibility to bias.
We estimated the maximum joint sensitivity and specificity of PCR to range from 97.0% to 98.1%, with corresponding false-positive and false-negative rates between 1.9% and 3.0%. Our analysis of the post-test probability of disease (Figure 4) indicates that if we use the joint maximum sensitivity and specificity for PCR, the proportion of false-positive tests would be unacceptably high for screening or other common clinical applications. The post-test probability of disease will vary depending on the sensitivity and specificity, which in turn depend on the cut-off point used to define an abnormal test result. The summary ROC curve indicates how specificity will decrease (or increase) as sensitivity increases (or decreases).
To put the diagnostic performance of PCR in context, the conventional antibody test sequence of an enzyme immunoassay followed by confirmatory Western blot analysis has a sensitivity that exceeds 99% and a specificity greater than 99.5% (corresponding to a false-positive rate ≤ 0.5%) in high-quality screening programs [114-116]. Although the meta-analytic techniques that we used have not been applied to HIV antibody tests, a study including 1400 participating laboratories, done by the Centers for Disease Control and Prevention (CDC), found the sensitivity and specificity of the enzyme immunoassay to be 99.68% and 98.46%, respectively, in 1988 (6566 infected samples, 3051 negative samples) and 99.3% and 99.7%, respectively, in 1989 [115]. False-positive rates as low as 6 per million have been reported in blood-bank screening programs [116], although such low rates may not be attainable in all programs. The log odds ratio associated with the 1988 findings of the CDC study is 9.90 ± 0.26, which substantially exceeds the log odds ratio we found for PCR (7.93 ± 0.33); the sensitivity and specificity of the enzyme immunoassay in 1989 were even higher. The studies included in our analysis suggested that the sensitivity of the p24 antigen assay (in contrast to that of antibody tests) is inferior to that of other tests. For example, p24 antigen was detected in only 14% of HIV-infected hemophiliacs [3] and in only 8% to 32% of participants with PCR-positive, antibody test-positive test results [50, 63]. Although these studies suggest that PCR is superior to the p24 antigen assay, we cannot directly compare the sensitivities and specificities produced by the two assays, because the p24 antigen assay has not been evaluated formally with summary ROC curves.
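The log odds ratio quoted for the 1988 CDC data can be approximated from the reported figures. In the Python sketch below, the cell counts are back-calculated by rounding the reported sensitivity and specificity against the reported sample sizes, which is an assumption. The standard error uses the usual large-sample formula for a log odds ratio, which may differ from the method the authors used.

```python
import math


def log_odds_ratio(tp, fn, fp, tn):
    """Diagnostic log odds ratio and its large-sample standard error."""
    d = math.log((tp * tn) / (fn * fp))
    se = math.sqrt(1 / tp + 1 / fn + 1 / fp + 1 / tn)
    return d, se


# Counts inferred (assumption) from 99.68% of 6566 infected samples
# and 98.46% of 3051 negative samples.
tp = round(0.9968 * 6566)   # 6545 true-positive results
fn = 6566 - tp              # 21 false-negative results
tn = round(0.9846 * 3051)   # 3004 true-negative results
fp = 3051 - tn              # 47 false-positive results

d, se = log_odds_ratio(tp, fn, fp, tn)
print(round(d, 2), round(se, 2))
```

Under these assumptions the result is close to the 9.90 ± 0.26 reported in the text.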
Our subgroup analyses show that studies published only as abstracts provided lower estimates of the sensitivity and specificity of PCR. This may indicate publication bias (the preference for publishing favorable rather than unfavorable studies). Although publication bias is a concern in meta-analyses, few examples of it have been documented. Studies with more rigorous designs provided similar upper estimates of joint sensitivity and specificity but decreased lower estimates of joint sensitivity and specificity relative to other studies. Rigorous study design (for example, blinding) may prevent the inadvertent overestimation of test performance.
We did not find evidence that the performance of PCR improved over time. The problem of false-positive and false-negative PCR results for the diagnosis of HIV infection has led to efforts to develop quality assurance programs for the performance of PCR [90, 95, 100]. For example, laboratory personnel now take extensive precautions to prevent carryover contamination, which was an important cause of false-positive test results in early studies. A particularly rigorous program of quality assurance was instituted recently by the AIDS (acquired immunodeficiency syndrome) Clinical Trials Group investigators. Because training of laboratory personnel is probably an important component of laboratory performance, the study done by these investigators [95] used only experienced laboratories that met strict performance criteria. The sensitivity and specificity of PCR were found to be 97.4% and 94.8%, respectively, in an ongoing quality assurance program that used the latest generation of commercially available PCR kits [95] and standardized protocols for the performance of PCR. These results are consistent with the results of our analysis and, along with the findings of another multicenter quality assurance study [90], indicate that the problem of false-positive and false-negative results persists in currently available test programs, including those that use commercially available standardized PCR tests rather than assays developed in-house.
Recommendations for the Clinical Role of Polymerase Chain Reaction
Our analysis confirms that at present, PCR is not sufficiently accurate to be a reference or gold standard test. The frequency of false-positive and false-negative results, even in more recent studies, precludes this. Clearly, the performance of PCR is not adequate to justify its use as a clinical screening test. The PCR assay will be most useful in settings in which conventional antibody tests are indeterminate or are likely to be inaccurate. Depending on the criteria used, 13% to 48% of Western blot analyses in low-risk persons who have repeatedly reactive enzyme immunoassay results may be indeterminate [48]. In these situations, PCR is a useful alternative test.
The PCR assay may also be useful in persons who have recently had a known or suspected exposure to HIV whose infection status must be determined urgently (for example, health care workers who have sustained a percutaneous exposure to HIV-infected blood). Although the PCR assay provides interim information that may be useful in selected cases, clinicians and health care workers should be aware that the false-positive rate of PCR probably exceeds that of conventional antibody tests. Therefore, the benefit of early detection should be weighed against the increased risk for a false-positive result. Conventional antibody tests and clinical follow-up can minimize the effect of false-negative or false-positive PCR test results. We conclude that for the diagnosis of HIV infection in adults, the role of PCR should continue to be limited to circumstances in which antibody tests are known to be insufficient or indeterminate.
Recommendations for Study Design
Our analysis highlights the importance of a crucial aspect of study design: the choice, use, and description of the index test (PCR) and reference tests. Whenever possible, studies of the performance of a diagnostic test should use reference tests that unequivocally establish the true state of disease or health. Because PCR can detect HIV infection before antibodies have developed, a positive PCR test result in a person with negative results on an HIV enzyme immunoassay could represent either a false-positive PCR result or a false-negative enzyme immunoassay result. Evaluation of PCR is challenging because no single diagnostic test can resolve this dilemma with certainty. For current studies of HIV infection, the discrepancy can be resolved by serially testing seronegative persons with enzyme immunoassay and Western blot analysis and doing clinical follow-up for a period long enough to exclude acute infection. If a person is truly infected with HIV, then eventually peripheral blood mononuclear cell culture or plasma culture should become positive, the enzyme immunoassay and Western blot analysis should become reactive, or clinical illness should ensue. Although some reports indicate that the period between infection and antibody production may last as long as 4 years, more than 95% of HIV-infected persons seroconvert within 9 to 12 months [117]. Studies of other diagnostic tests for HIV have successfully used serial testing and clinical follow-up to determine true infection status [118]. In high-risk populations, however, the value of long-term serial testing may be attenuated by incident infections. In many of the studies that we reviewed, longer follow-up would have enabled the investigators to convincingly establish the disease status of antibody test-negative participants.
Once the procedure for determining the infection status has been chosen, it should be applied consistently to all the study participants, regardless of their PCR test results. A particular PCR test result should not be used to decide which persons are given the reference test, because such a selection procedure can create “referral bias.” Referral bias spuriously reduces the number of true-negative and false-negative PCR test results in the study population and thereby overestimates sensitivity and underestimates specificity [17]. Investigators can further avoid potential bias in the interpretation of test results by doing the PCR assays and the reference tests while blinded to the results of other tests for HIV and to all clinical information. Investigators were blinded to previous test results in only 40% of the 96 studies that we evaluated. In addition, interpretation of studies can be enhanced if both the PCR assays and the reference tests are described in sufficient detail to allow another investigator to reproduce the test procedures. Descriptions should address how the tests were done and how the results were interpreted.
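The direction of referral bias described above can be checked with expected counts. The sketch below is illustrative only: it assumes that every PCR-positive person receives the reference test but only a fraction of PCR-negative persons do. The population sizes, the true accuracy values, and the name `apparent_accuracy` are hypothetical.

```python
def apparent_accuracy(n_diseased, n_healthy, sensitivity, specificity,
                      verify_neg_fraction):
    """Apparent (sensitivity, specificity) when only some PCR-negative
    persons are referred for the reference test."""
    # Expected counts in the full population.
    tp = n_diseased * sensitivity
    fn = n_diseased - tp
    tn = n_healthy * specificity
    fp = n_healthy - tn
    # All PCR-positive persons are verified; PCR-negative persons are
    # verified only with probability verify_neg_fraction.
    fn_verified = fn * verify_neg_fraction
    tn_verified = tn * verify_neg_fraction
    apparent_sens = tp / (tp + fn_verified)
    apparent_spec = tn_verified / (tn_verified + fp)
    return apparent_sens, apparent_spec


# Hypothetical study: true sensitivity and specificity are both 95%.
for fraction in (1.0, 0.5, 0.2):
    sens, spec = apparent_accuracy(100, 900, 0.95, 0.95, fraction)
    print(f"verified negatives={fraction:.0%}  "
          f"apparent sensitivity={sens:.3f}  apparent specificity={spec:.3f}")
```

With complete verification the apparent values equal the true values; as fewer PCR-negative persons are verified, apparent sensitivity rises and apparent specificity falls.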
Many of the studies we analyzed had design limitations that are commonly found in studies of other types of diagnostic tests: incomplete representation of the spectrum of patients in the study population, insufficient sample size, and incomplete reporting of test results. To increase the generalizability of study results, the study sample should reflect the entire spectrum of disease encountered in the clinical population of interest [119]. For example, the nondiseased population should include persons who are at risk for HIV infection and would be candidates for testing rather than healthy controls. The usefulness of the study will be enhanced if the study sample is described in enough detail to 1) enable readers to determine whether the sample is sufficiently similar to their clinical setting to permit application of the study findings and 2) allow another investigator to assemble a cohort similar to the sample to confirm the study findings [120]. Investigators can reduce uncertainty to acceptable levels in the estimates of sensitivity and specificity by increasing the sample size. As shown in Figure 1, the 95% CIs for sensitivity and specificity are broad if the sample size is small. Recommendations for determining appropriate sample sizes have been published [121].
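The effect of sample size on the width of a confidence interval can be sketched as follows. This example uses the Wilson score interval for a binomial proportion, which is an assumption made for illustration; the method used to compute the intervals in Figure 1 is not stated in this section, and the counts below are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical studies that all observe a sensitivity of 95%.
for detected, diseased in ((19, 20), (95, 100), (475, 500)):
    low, high = wilson_interval(detected, diseased)
    print(f"n={diseased:4d}  sensitivity=0.95  95% CI {low:.3f} to {high:.3f}")
```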
Finally, studies of test performance can be improved if investigators report the sensitivity and specificity of a test for various definitions of test reactivity [114]. Because both sensitivity and specificity are determined by the choice of the threshold for an abnormal test result, there is an inherent tradeoff between them. The threshold for a reactive test can be chosen so that PCR is 100% sensitive or 100% specific, but usually not both (unless the test is perfect and the diseased and nondiseased populations have no overlap for the attribute being measured). Thus, a study that evaluates only the sensitivity of PCR (that is, that includes only diseased persons) or the specificity of PCR (that is, that includes only nondiseased persons) provides insufficient information for an evaluation of test performance. Investigators can develop an ROC curve by calculating sensitivity and specificity for varying definitions of test reactivity [122]. The ROC curve represents the performance of a test much more thoroughly than do single values of sensitivity and specificity, in which differences in test performance may merely indicate that different criteria for test positivity were used. Such reporting also facilitates the development of summary ROC curves, such as those used in our meta-analysis and used by others in the analysis of other diagnostic tests [123, 124].
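The construction of an ROC curve from varying definitions of test reactivity can be sketched as follows. The scores are hypothetical (for example, a signal intensity or a count of reactive primer pairs), and the function name `roc_points` is an assumption; a test is called reactive when its score is at or above the threshold.

```python
def roc_points(diseased_scores, nondiseased_scores):
    """Return (threshold, sensitivity, specificity) for every candidate
    threshold, from the most lenient to the most stringent."""
    thresholds = sorted(set(diseased_scores) | set(nondiseased_scores))
    thresholds.append(thresholds[-1] + 1)  # stricter than every observed score
    points = []
    for t in thresholds:
        sens = sum(s >= t for s in diseased_scores) / len(diseased_scores)
        spec = sum(s < t for s in nondiseased_scores) / len(nondiseased_scores)
        points.append((t, sens, spec))
    return points


# Hypothetical scores; the two groups overlap, so no threshold is perfect.
diseased = [3, 5, 6, 7, 8, 9]
nondiseased = [1, 2, 2, 3, 4, 6]

for threshold, sens, spec in roc_points(diseased, nondiseased):
    print(f"reactive if score >= {threshold:2d}: "
          f"sensitivity={sens:.2f}  specificity={spec:.2f}")
```

Because the hypothetical groups overlap, moving the threshold trades sensitivity for specificity, and only the extreme thresholds reach 100% for either measure.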
Technical advances will probably improve the performance of the PCR assay. As the sensitivity and specificity of PCR for the diagnosis of HIV improve, the clinical role of PCR may change. Such a change should occur only after a rigorous evaluation of test performance that incorporates the recommendations for study design discussed above. Currently, interpretation of PCR test results for the diagnosis of HIV infection should be combined with careful consideration of the clinical circumstances and with the use of confirmatory tests and clinical follow-up whenever possible.
Appendix
In this Appendix, we describe the methods we used to search the literature and develop summary ROC curves.
Literature Search
Two literature searches were done by a professional research librarian to identify pertinent published data. For articles published in or before 1991, 17 databases were searched: MEDLINE, AIDSline, Cancerlit, Embase, Federal Research in Progress, Compendex, Scisearch, Inspec, Conference Papers, Diogenes, Chemical Abstracts, Biosis, Life Sciences Collection, Biobusiness, Pharmaceutical News Index, National Technical Information Service, and International Pharmaceutical Abstracts. For articles published in 1992 through 1994, we limited our computer-based search to MEDLINE because we found other databases to be redundant. In the initial search, we used the following strategy.
1. S1 Acquired (W) Immunodeficiency OR Acquired (W) Immune (W) Deficiency OR AIDS
2. S2 HIV OR HIV1 OR HIV2 OR HIV-1 OR HIV-2
3. S3 Human (W) (Immunodeficiency OR Immune [W] Deficiency) (W) (Virus OR Viruses)
4. S4 HTLV3 OR HTLVIII OR HTLV (5W) (3 OR III)
5. S5 Human (W) T (W) Cell (W) (Leukaemia OR Leukemia) (W) (Virus OR Viruses) (5W) (3 OR III)
6. S6 LAV OR Lymphadenopathy (W) Associated (W) (Virus OR Viruses)
7. S7 ARC
8. S8 PCR OR Polymerase (W) Chain (W) Reaction
9. S9 PCR OR Polymerase (W) Chain
10. S10 Amplif? (3N) (Gene OR Genes OR Genetic OR DNA OR Deoxyribonucleic)
11. S11 Sequence (W) Tagged (W) Site?
12. S12 (S1 OR S2 OR S3 OR S4 OR S5 OR S6 OR S7) AND (S9 OR S10 OR S11)
13. S13 Remove Duplicates S12
This search was updated with a slightly different strategy.
1. S1 Acquired (W) Immunodeficien? OR Acquired (W) Immune (W) Deficien? OR AIDS
2. S2 HIV OR Human (W) Immunodeficien? (W) Virus? OR Human (W) Immune (W) Deficien? (W) Virus? OR HIV-1 OR HIV-2

3. S3 DC equals D24.611.216.327.570.470.?
4. S4 ARC
5. S5 Polymerase (W) Chain OR PCR
6. S6 (Gene OR Genetic OR DNA OR Sequence? OR Deoxyribonucleic OR Nucleic OR Nucleotide? OR Genome?) (5N) Amplif?
7. S7 Amplicon OR Amplicons
8. S8 Sequence (W) Tagged (W) Site?
9. S9 (S1 OR S2 OR S3 OR S4) AND (S5 OR S6 OR S7 OR S8)
10. Limit S9 to Updates since the earlier search
11. Eliminate Duplicates
When the Chemical Abstracts database was searched, the following strategy was used.
1. L1 Acquired (W) Immunodeficien? OR Acquired (W) Immune (W) Deficien?
2. L2 AIDS OR HIV OR Human (W) Immunodeficien? (W) Virus?
3. L3 Human (W) Immune (W) Deficien? (W) Virus?
4. L4 HIV-1 OR HIV-2 OR ARC
5. L5 Polymerase (W) Chain OR PCR
6. L6 (Gene OR Genetic OR DNA OR Sequence? OR Deoxyribonucleic) (3A) Amplif?
7. L7 S Amplicon OR Amplicons
8. L8 Amplicon OR Amplicons OR Sequence (W) Tagged (W) Site?
9. L9 (L1 OR L2 OR L3 OR L4) AND (L5 OR L6 OR L8)
Summary Receiver-Operating Characteristic Curves
We used two approaches for estimating summary ROC curves. The first method, described previously [16], uses a logistic transformation of sensitivity and specificity so that a summary ROC curve can be fitted with linear regression. To do the logistic transformation, we added a correction factor of 0.5 when the data for a study included zero values (which occurred when either the number of false-positive tests or the number of false-negative tests was zero). The ROC curve was then determined by back transformation of the fitted linear regression line. The method also provides a statistical test to evaluate whether the ROC curve is symmetrical. If the summary ROC curve is symmetrical, a common log odds ratio uniquely determines the entire ROC curve. The test of symmetry is to determine whether the slope of the fitted regression line differs significantly from zero. Regression lines with a slope near zero can be represented by a common log odds ratio; if the slope differs from zero, the odds ratio changes for different points on the ROC curve. Our analysis indicated that the slope for both our upper estimate (slope [±SE] = −0.156 ± 0.118 [95% CI, −0.369 to 0.078]; P = 0.10) and our lower estimate (slope = −0.174 ± 0.114 [CI, −0.40 to 0.05]; P = 0.13) of the summary ROC curves did not differ significantly from zero. We therefore felt justified in estimating a common odds ratio, and we used the Mantel-Haenszel estimator [29].
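The logistic-transformation approach can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure of reference 16: the regression is unweighted ordinary least squares, the 0.5 correction is added to all four cells of any study that contains a zero, and the study counts and function names are hypothetical. For each study, D = logit(TPR) − logit(FPR) is regressed on S = logit(TPR) + logit(FPR), and the fitted line D = a + bS is back-transformed to give the true-positive rate as a function of the false-positive rate.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def fit_sroc(studies):
    """studies: list of (TP, FN, FP, TN). Returns (a, b) for D = a + b*S."""
    d_vals, s_vals = [], []
    for tp, fn, fp, tn in studies:
        if 0 in (tp, fn, fp, tn):
            # Assumed form of the correction: add 0.5 to every cell.
            tp, fn, fp, tn = (x + 0.5 for x in (tp, fn, fp, tn))
        tpr = tp / (tp + fn)   # sensitivity
        fpr = fp / (fp + tn)   # 1 - specificity
        d_vals.append(logit(tpr) - logit(fpr))   # log odds ratio
        s_vals.append(logit(tpr) + logit(fpr))   # proxy for the threshold
    n = len(studies)
    s_mean, d_mean = sum(s_vals) / n, sum(d_vals) / n
    sxx = sum((s - s_mean) ** 2 for s in s_vals)
    sxy = sum((s - s_mean) * (d - d_mean) for s, d in zip(s_vals, d_vals))
    slope = sxy / sxx
    return d_mean - slope * s_mean, slope


def sroc_tpr(fpr, intercept, slope):
    """Back-transform the fitted line to the TPR at a given FPR."""
    v = (intercept + (1.0 + slope) * logit(fpr)) / (1.0 - slope)
    return 1.0 / (1.0 + math.exp(-v))


# Hypothetical study counts (TP, FN, FP, TN).
studies = [(90, 10, 5, 95), (80, 20, 2, 98), (48, 2, 6, 44), (30, 0, 1, 29)]
a, b = fit_sroc(studies)
print(f"intercept={a:.3f}  slope={b:.3f}  TPR at FPR 0.05={sroc_tpr(0.05, a, b):.3f}")
```

A slope b near zero corresponds to a symmetrical curve, in which case the intercept a alone (a common log odds ratio) describes the curve.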
We also chose the Mantel-Haenszel method because the alternative method uses a logistic transformation that requires a correction factor for zero values. The correction factor can introduce bias in the estimation of summary ROC curves for highly accurate tests such as PCR. We calculated the SE of the estimated log odds ratio using both the method of Robins and coworkers [31] and the jackknife and bootstrap methods [30]. Reported comparison statistics are based on the SE as calculated by using the method of Robins and coworkers because this method produced the most conservative estimates of statistical significance (that is, the largest SEs). To determine whether the sensitivity and specificity of PCR differed among certain subgroups, we compared the Mantel-Haenszel estimated common log odds ratio for each group in terms of their SEs. We compared both the upper and lower estimate of sensitivity and specificity in the subgroups.
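The Mantel-Haenszel common odds ratio and the variance estimator of Robins and coworkers can be sketched as follows, treating each study as one 2 × 2 table of true-positive, false-negative, false-positive, and true-negative counts. The study counts are hypothetical, and the subgroup comparison is written as a z statistic on the difference in log odds ratios, which is an assumption about how a comparison "in terms of their SEs" might be carried out. No zero-cell correction is needed as long as the pooled sums are nonzero.

```python
import math


def mantel_haenszel_log_or(studies):
    """studies: list of (TP, FN, FP, TN).
    Returns (log of the Mantel-Haenszel odds ratio, SE of that log)."""
    r_sum = s_sum = 0.0
    pr_sum = psqr_sum = qs_sum = 0.0
    for tp, fn, fp, tn in studies:
        n = tp + fn + fp + tn
        r = tp * tn / n            # concordant product
        s = fn * fp / n            # discordant product
        p = (tp + tn) / n
        q = (fn + fp) / n
        r_sum += r
        s_sum += s
        pr_sum += p * r
        psqr_sum += p * s + q * r
        qs_sum += q * s
    log_or = math.log(r_sum / s_sum)
    variance = (pr_sum / (2 * r_sum ** 2)
                + psqr_sum / (2 * r_sum * s_sum)
                + qs_sum / (2 * s_sum ** 2))
    return log_or, math.sqrt(variance)


def compare_groups(group_a, group_b):
    """z statistic for the difference in log odds ratios (assumed test)."""
    log_a, se_a = mantel_haenszel_log_or(group_a)
    log_b, se_b = mantel_haenszel_log_or(group_b)
    return (log_a - log_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Hypothetical subgroups of studies, each study as (TP, FN, FP, TN).
group_a = [(90, 10, 5, 95), (30, 0, 1, 29), (48, 2, 6, 44)]
group_b = [(80, 20, 2, 98), (45, 5, 4, 46)]
print("group A:", mantel_haenszel_log_or(group_a))
print("z for A vs. B:", compare_groups(group_a, group_b))
```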
Dr. Holodniy: Veterans Affairs Palo Alto Health Care System, 3801 Miranda Avenue (111-ID), Palo Alto, CA 94304.
Mr. Scott: 1 Cloister Court, Apartment 205, Bethesda, MD 20814-1460.
Ms. Sonnad: University of Michigan, SPHII, Department of Health Management and Policy, 109 Observatory Road, Ann Arbor, MI 48109-2029.
Dr. Moses: Department of Health Research and Policy, Stanford University, Redwood Building, Room T-160, Stanford, CA 94305-5092.
Dr. Kinosian: Veterans Affairs Medical Center, Hospital Based Home Care Program (111F), University and Woodland Avenues, Philadelphia, PA 19104.
Dr. Schwartz: Leonard Davis Institute of Health Economics, 3641 Locust Walk, Room 209, Philadelphia, PA 19104.
- Copyright ©2004 by the American College of Physicians