1 February 1997 | Volume 126 Issue 3 | Pages 210-214
Background: Important management decisions depend on results of the tuberculin skin test. However, the test is subject to several potential errors, and its reliability has not been adequately studied.
Objective: To ascertain the reliability of tuberculin skin testing.
Design: Cross-sectional study.
Setting: University hospital.
Participants: 96 persons who received a tuberculin skin test.
Measurements: Ballpoint-pen and palpation measures of induration.
Results: Global intra- and interobserver reliability coefficients of the ballpoint-pen technique were high. Five percent of the time, however, a second measurement by the same observer could be at least 2.7 mm less to 3.0 mm more than the first measurement and the measurement from the second observer could be at least 3.4 mm less to 3.7 mm more than the measurement from the first observer. This could lead to the reclassification of a positive test result as negative or vice versa. The area of imprecision was 38% less broad for the ballpoint-pen technique than for the palpation technique.
Conclusion: Reading of tuberculin skin tests may frequently result in misclassifications when measurements are close to the cutoff point that separates negative from positive results.
Patients and health care personnel who were in an internal medicine department and needed a tuberculin skin test were invited to participate. Persons who had received bacille Calmette-Guérin vaccine were enrolled preferentially. Ninety-six persons who provided informed consent ultimately participated in the study.
Tuberculin Skin Tests and Measurement Methods
Ten units of tuberculin from Pasteur Merieux, Lyon, France (corresponding to the recommended 5 IU of purified protein derivative tuberculin), were injected intradermally on the volar surface of the forearm (Mantoux technique) [11]. Readings were done on the third day after the test was administered, and the diameter of induration was measured along the long axis of the forearm. Two experienced investigators each independently did three measurements. The first two measurements were taken with a blinded caliper using the ballpoint-pen technique [3]. With this technique, a medium-point ballpoint pen is used to draw a line starting 1 to 2 cm away from the skin reaction and moving toward its center. When the pen reaches the margin of the induration, an increased resistance to further movement is felt and the pen is lifted. The procedure is repeated on the opposite side of the skin reaction. The distance between the ends of the opposing lines at the margins of the induration is measured. In our study, the lines were erased and the measurement process was repeated. The lines were then erased again, and the third measurement was done by palpation [2]. To reproduce the usual conditions of testing, we used a flexible ruler.
The data were collected during eight sessions; 11 to 14 participants were tested per session. To reduce the chance that an observer would remember previous readings, three things were done. First, the results of measures that were obtained with the blinded caliper were recorded by a third investigator. Second, the first ballpoint-pen measure was done for all participants at each session, then the second ballpoint-pen measure, and then the palpation measure. Third, before the second and third readings, the third investigator verified that no minor landmarks persisted.
Statistical Analysis
To analyze the reliability of quantitative data, we used statistical methods that have been described elsewhere [12]. Intraclass correlation coefficients and their 95% CIs were computed using SAS software (SAS Institute, Cary, North Carolina) [13]. Induration diameters were used to classify skin reactions as positive or negative according to the 5-, 10-, and 15-mm cutoff points that have been recommended as indicating positivity in various situations [2]. Reliability was then assessed with
Reliability of the Ballpoint-Pen Technique
Intraobserver Reliability
In persons who had no response to the tuberculin skin test, the intraobserver reliability was perfect (intraclass correlation coefficient = 1.0). Intraclass correlation coefficients were high for both observers and decreased only slightly after the nonresponders were excluded.
BRIEF COMMUNICATION
Reliability of Tuberculin Skin Test Measurement
The tuberculin skin test has many potential sources of error and variability. Standardization of the tuberculin reagent and the meaning of the test results have been considered in some detail [1, 2], but little attention has been paid to the reading itself [3-10]. Measurement of the induration, however, is one of the most important potential sources of error. If the customary technique of palpation is used, the margins of the induration may be difficult to define. The alternative ballpoint-pen method, although advocated as more reliable than palpation [3], has not been discussed in official statements on tuberculosis [1, 2]. We investigated the reliability of the ballpoint-pen technique and compared this technique with the palpation method.
Methods
Patients and Procedures
κ coefficients [14]. We also used a graphical analysis that focuses on the mean and the variation in the differences between repeated measurements [15]. Mean differences and the SD of the differences were calculated. An area of imprecision that was determined on the basis of the SD of the differences was placed around the arbitrarily chosen 10-mm cutoff value (10 mm ± 1.96 SD). If a first measurement fell within this area, particularly at or about the cutoff value, the likelihood that the second measurement would be sufficiently different to change the result of the tuberculin skin test from negative to positive (or vice versa) was high. Conversely, such reclassification would occur in only 5% of the cases that had values outside this area.
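The graphical analysis described above can be sketched in a few lines. This is an illustration only, not the authors' SAS code: the function names and the readings are hypothetical, and the sample SD (n − 1 denominator) is an assumption, because the text does not say which SD estimator was used.

```python
from statistics import mean, stdev

def limits_of_agreement(first, second, z=1.96):
    """Return (mean difference, SD of differences, lower limit, upper limit)."""
    diffs = [b - a for a, b in zip(first, second)]
    d_mean = mean(diffs)
    d_sd = stdev(diffs)  # sample SD; assumed, not stated in the article
    return d_mean, d_sd, d_mean - z * d_sd, d_mean + z * d_sd

def area_of_imprecision(sd_diff, cutoff=10.0, z=1.96):
    """Interval cutoff +/- z * SD of the differences, in mm."""
    return cutoff - z * sd_diff, cutoff + z * sd_diff

# Hypothetical induration diameters (mm), two readings per participant.
first = [8.0, 12.5, 10.0, 15.0, 6.5, 11.0]
second = [9.0, 11.5, 10.5, 14.0, 7.5, 11.5]

d_mean, d_sd, lower, upper = limits_of_agreement(first, second)
area = area_of_imprecision(d_sd)
print(f"mean difference {d_mean:.2f} mm, SD {d_sd:.2f} mm")
print(f"95% limits {lower:.2f} to {upper:.2f} mm")
print(f"area of imprecision {area[0]:.1f} to {area[1]:.1f} mm")
```

The area of imprecision is centered on the cutoff rather than on the mean difference, which matches the "10 mm ± 1.96 SD" definition given in the text.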
Results
Because of the study design, only 27 participants (28%) did not react to the tuberculin skin test.
The κ coefficients also suggested good intraobserver reliability but were lower with the 10- and 15-mm cutoff values than with the 5-mm cutoff value (Table 1).
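For readers who want to see how a κ coefficient [14] arises from dichotomized readings, here is a minimal sketch. The readings are hypothetical, and treating a diameter equal to the cutoff as positive (`>=`) is an assumption; the article does not state how ties at the cutoff were handled.

```python
def cohen_kappa(first, second, cutoff):
    """Cohen's kappa for two sets of readings dichotomized at `cutoff` (mm)."""
    a = [x >= cutoff for x in first]   # assumed rule: positive if >= cutoff
    b = [x >= cutoff for x in second]
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    p_a = sum(a) / n
    p_b = sum(b) / n
    # Agreement expected by chance from the marginal proportions.
    p_exp = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_exp == 1.0:
        return float("nan")  # undefined when only one category is ever used
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical induration diameters (mm) from two readings.
first = [0, 4, 6, 9, 10, 11, 14, 16, 18, 20]
second = [0, 5, 5, 11, 9, 12, 15, 14, 19, 21]

for cutoff in (5, 10, 15):
    print(cutoff, round(cohen_kappa(first, second, cutoff), 2))
```

Because κ depends on how many readings sit near each cutoff, the same pair of continuous readings can give different κ values at 5, 10, and 15 mm.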
The top panel of Figure 1 shows the difference between the two readings done by the first observer for each participant (range, −6.8 to +3.5 mm) plotted against the corresponding mean for each participant. The level of intraobserver reliability was evaluated by determining the 95% CI (−2.68 to +2.96 mm) within which most of the differences were seen. This means that 5% of the time, the second measure of the test results done by using the ballpoint-pen method would be at least 2.7 mm less than or 3.0 mm more than the first one. This lack of reliability could lead to the reclassification of a negative tuberculin skin test result as positive or vice versa.
As shown in the top panel of Figure 1, an area of imprecision that straddles the cutoff value (7.2 to 12.8 mm for a 10-mm cutoff value) was generated using the SD of the differences. Test results for 8 of the 69 patients (12%) were reclassified. The first measurement for 30 of the 69 patients (43.5%) fell within this area of imprecision; 7 of those 30 patients (23.3%) were among the 8 patients whose test results were reclassified.
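The counts reported in this paragraph (readings inside the area of imprecision, reclassified results, and their overlap) can be tallied as follows. This is a sketch with hypothetical readings and a hypothetical function name; the `>=` rule for positivity is again an assumption.

```python
def reclassification_summary(first, second, cutoff=10.0, half_width=2.8):
    """Tally reclassifications and first readings inside cutoff +/- half_width."""
    lo, hi = cutoff - half_width, cutoff + half_width
    in_area = [lo <= x <= hi for x in first]
    # A result is reclassified when the two readings fall on opposite
    # sides of the cutoff (assumed rule: positive if >= cutoff).
    flipped = [(x >= cutoff) != (y >= cutoff) for x, y in zip(first, second)]
    return {
        "n": len(first),
        "in_area": sum(in_area),
        "reclassified": sum(flipped),
        "reclassified_in_area": sum(i and f for i, f in zip(in_area, flipped)),
    }

# Hypothetical first and second readings (mm) for six participants.
first = [4.0, 8.0, 9.5, 10.5, 12.0, 16.0]
second = [4.5, 10.5, 9.0, 9.5, 12.5, 15.0]

print(reclassification_summary(first, second))
```

With a half-width of 2.8 mm the area runs from 7.2 to 12.8 mm, the interval quoted for intraobserver readings.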
Interobserver Reliability
Agreement between observers, estimated by using the intraclass correlation and κ coefficients, was high (Table 1). The first ballpoint-pen measures made by the two observers were used for these analyses.
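As an illustration of an intraclass correlation coefficient computed from analysis-of-variance mean squares in the manner of reference [12], the sketch below implements the two-way random-effects, single-measure form, often written ICC(2,1). Which ICC form the authors used is not stated here, so that choice is an assumption, as are the function name and the readings.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    ratings: one row per subject, each row holding one reading per rater.
    """
    n = len(ratings)          # subjects
    k = len(ratings[0])       # raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)                # between-subjects mean square
    jms = ss_cols / (k - 1)                # between-raters mean square
    ems = ss_err / ((n - 1) * (k - 1))     # residual mean square
    denom = bms + (k - 1) * ems + k * (jms - ems) / n
    if denom == 0:
        return float("nan")  # e.g. every reading identical
    return (bms - ems) / denom

# Hypothetical first readings (mm) by two observers for five participants.
readings = [[0.0, 0.0], [6.0, 7.5], [10.0, 9.0], [14.5, 15.0], [20.0, 18.5]]
print(round(icc_2_1(readings), 3))
```

A high coefficient of this kind summarizes agreement over the whole range of diameters, so it can coexist with frequent disagreement near a single cutoff.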
Differences between first measures done by the two observers were between −5.1 and +7.3 mm (Figure 1, middle). The 95% CI of the differences was −3.39 to +3.69 mm; this means that 5% of the time, the result of a second tuberculin skin test measurement by another investigator would be at least 3.4 mm less than or 3.7 mm more than that of a first investigator. As in the top panel of Figure 1, an area of imprecision (6.5 to 13.5 mm) is shown in the middle panel of Figure 1; this area is slightly broader than that calculated for intraobserver reliability. Test results for 8 of the 69 patients (12%) were reclassified. The first measurement for 40 of the 69 patients (58%) fell within this area of imprecision; 7 of those 40 patients (17.5%) were among the 8 patients whose results were reclassified.
Reliability of the Palpation Technique
Except for the κ coefficients at the 15-mm cutoff, assessment of agreement between observers showed that all reliability coefficients obtained with the palpation technique were slightly lower than those obtained with the ballpoint-pen method (Table 1).
The 95% CI of the differences between the measures of the two observers was −4.6 to +5.2 mm (Figure 1, bottom). This resulted in a much broader area of imprecision for the readings (5.1 to 14.9 mm). Test results were reclassified for 12 of the 69 patients (17.4%). The first measure of 43 of the 69 patients (62.3%) fell within this area of imprecision, and all 12 patients whose test results were reclassified were among those 43 (27.9%).
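The reported areas of imprecision can be checked against the reported 95% limits. Because the limits are the mean difference ± 1.96 SD, half of their span equals 1.96 SD, which is the half-width placed around the 10-mm cutoff. The sketch below uses only figures quoted in this article; the function name is hypothetical.

```python
def area_from_limits(lower, upper, cutoff=10.0):
    """Area of imprecision implied by 95% limits of the differences (mm)."""
    half_width = (upper - lower) / 2   # equals 1.96 * SD of the differences
    return cutoff - half_width, cutoff + half_width

# 95% limits of the differences as reported in the Results (mm).
reported = {
    "ballpoint pen, same observer": (-2.68, 2.96),
    "ballpoint pen, two observers": (-3.39, 3.69),
    "palpation, two observers": (-4.6, 5.2),
}

for label, (lower, upper) in reported.items():
    lo, hi = area_from_limits(lower, upper)
    print(f"{label}: {lo:.1f} to {hi:.1f} mm")
```

The three computed intervals round to 7.2 to 12.8 mm, 6.5 to 13.5 mm, and 5.1 to 14.9 mm, in line with the values given in the text.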
Agreement between Ballpoint-Pen and Palpation Methods
Although all the intraclass correlation coefficients were high, the κ coefficients that were produced after persons with no response to the test were excluded suggested only moderate to good reliability (Table 1). The 95% CIs of the differences between the first ballpoint-pen and the palpation measures were −3.0 to +4.1 mm for readings taken by the first observer and −2.5 to +3.9 mm for readings taken by the second observer. The areas of imprecision for the measurements were from 6.4 to 13.6 mm for readings taken by the first observer and 6.8 to 13.2 mm for readings taken by the second observer. Reclassification occurred in 8 of 69 patients (12%) for both observers.
Discussion
Only one study [10] has addressed the interobserver reliability of the ballpoint-pen technique. That study relied on simple correlation coefficients to determine reliability. Reanalysis of the data from that study provided a κ coefficient of 0.74 (using a cutoff point of 10 mm). Previous studies of the reliability of the palpation method [5, 6, 10] have also been restricted primarily to the assessment of interobserver agreement and have provided conflicting results. Recalculation from the data of one large survey of six studies on tuberculin skin testing [4] gave slightly wider CIs for the differences between readings by two observers (range, −3.1 to +3.3 mm to −4.2 to +5.0 mm) than for the differences between two readings by the same observer (range, −2.7 to +3.3 mm to −3.5 to +4.5 mm). Four studies [7-10] compared the ballpoint-pen and palpation methods and had discordant results. None of these studies used reliability coefficients, which are the generally recommended measures of agreement.
Recall bias would make our findings more important if it were present, because it would tend to improve repeatability. Because the tuberculin reaction could not be measured by a blinded technique, we cannot exclude an unconscious bias that would have favored the ballpoint-pen method. In addition, knowledge of the standard cutoff values may have influenced the observers when they used the ruler. Finally, the generalizability of our study results is limited because only two observers did the measurements, both of whom have considerable experience reading tuberculin skin test indurations. Persons who occasionally read induration from tuberculin skin tests may not have as much experience as these two observers have.
Results of tuberculin skin tests influence the decision to initiate antituberculous therapy [16-18]. Although errors in classification cannot be completely avoided, they should be minimized by the implementation of appropriate rules for measurement. Our study suggests that current recommendations for tuberculin skin testing should be reconsidered. We propose that induration be measured with the ballpoint-pen technique. Clinicians should be more cautious when interpreting a tuberculin skin test result that is close to the cutoff value than they would be when interpreting one that is far removed. To improve intraobserver reliability, the final result could be the average of two consecutive measures, preferably taken with a blinded gauge. Even when done in the most reliable way possible, however, the tuberculin skin test remains an imperfect diagnostic tool and should not replace clinical judgment.
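The proposal to average two consecutive measures can be given a rough size under a simple model that is not part of the study: if reading errors are independent with equal variance, the SD of the difference between two results that are each the mean of n readings shrinks by a factor of √n. The sketch below applies that assumption to the intraobserver half-width of about 2.82 mm implied by the reported limits. Correlated errors, for example from recall of the first reading, would make the real gain smaller.

```python
import math

def half_width_after_averaging(half_width_single, n_readings=2):
    """Half-width of the area of imprecision when each result is the mean
    of n_readings readings, assuming independent, equal-variance errors."""
    return half_width_single / math.sqrt(n_readings)

single = (2.96 + 2.68) / 2  # mm, from the intraobserver limits in the Results
averaged = half_width_after_averaging(single, n_readings=2)
print(f"single reading: 10 +/- {single:.2f} mm")
print(f"mean of two readings: 10 +/- {averaged:.2f} mm (model assumption)")
```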
Dr. Coste: Departement de Biostatistiques, Hopital Cochin, 75674 Paris Cedex 14, France.
Dr. Esdaile: Mary Pack Arthritis Centre and Vancouver Hospital, University of British Columbia, 895 West 10th Avenue, Vancouver, British Columbia V5Z 1L7, Canada.
Author and Article Information
References
1. American Thoracic Society. The tuberculin skin test. Am Rev Respir Dis. 1981; 124:356-63.
2. American Thoracic Society. Diagnostic standards and classification of tuberculosis. Am Rev Respir Dis. 1990; 142:725-35.
3. Sokal JE. Measurement of delayed skin-test responses [Editorial]. N Engl J Med. 1975; 293:501-2.
4. Nissen Meyer S, Hougen A, Edwards P. Experimental error in the determination of tuberculin sensitivity. Public Health Rep. 1951; 66:561-9.
5. Loudon RG, Lawson RA Jr, Brown J. Variation in tuberculin test reading. Am Rev Respir Dis. 1963; 87:852-61.
6. Bearman JE, Kleinman H, Glyer VV, Lacroix OM. A study of variability in tuberculin test reading. Am Rev Respir Dis. 1964; 90:913-9.
7. Jordan TJ, Sunderam G, Thomas L, Reichman LB. Tuberculin reaction size measurement by the pen method compared to traditional palpation. Chest. 1987; 92:234-6.
8. Howard TP, Solomon DA. Reading the tuberculin skin test. Who, when, and how? Arch Intern Med. 1988; 148:2457-9.
9. Bouros D, Zeros G, Panaretos C, Vassilatos C, Siafakas N. Palpation vs pen method for the measurement of skin tuberculin reaction (Mantoux test). Chest. 1991; 99:416-9.
10. Bouros D, Maltezakis G, Tzanakis N, Tzortzaki E, Siafakas N. The role of inexperience in measuring tuberculin skin reaction (Mantoux test) by the pen or palpation technique. Respir Med. 1992; 86:219-23.
11. Mantoux C. L'intradermo-reaction a la tuberculine et son interpretation clinique. Presse Med. 1910; 18:10-3.
12. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979; 86:420-8.
13. SAS/STAT User's Guide, Version 6. Cary, NC: SAS Institute; 1990:893-6.
14. Cohen J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement. 1960; 20:37-46.
15. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986; 1:307-10.
16. Guidelines on management of tuberculosis and HIV infection in the United Kingdom. Subcommittee of the Joint Tuberculosis Committee of the British Thoracic Society. BMJ. 1992; 304:1231-3.
17. Bass JB Jr, Farer LS, Hopewell PC, O'Brien R, Jacobs RF, Ruben F, et al. Treatment of tuberculosis and tuberculosis infection in adults and children. American Thoracic Society and The Centers for Disease Control and Prevention. Am J Respir Crit Care Med. 1994; 149:1359-74.
18. De Cock KM, Grant A, Porter JD. Preventive therapy for tuberculosis in HIV-infected persons: international recommendations, research, and practice. Lancet. 1995; 345:833-6.