Annals
Established in 1927 by the American College of Physicians

BRIEF COMMUNICATION

Reliability of Tuberculin Skin Test Measurement

Jacques Pouchot, MD; Anne Grasland, MD; Carole Collet, MD; Joel Coste, MD, PhD; John M. Esdaile, MD, MPH; and Philippe Vinceneux, MD

1 February 1997 | Volume 126 Issue 3 | Pages 210-214

Background: Important management decisions depend on results of the tuberculin skin test. However, the test is subject to several potential errors, and its reliability has not been adequately studied.

Objective: To ascertain the reliability of tuberculin skin testing.

Design: Cross-sectional study.

Setting: University hospital.

Participants: 96 persons who received a tuberculin skin test.

Measurements: Ballpoint-pen and palpation measures of induration.

Results: Global intra- and interobserver reliability coefficients of the ballpoint-pen technique were high. Five percent of the time, however, a second measurement by the same observer could be at least 2.7 mm less or 3.0 mm more than the first measurement, and the measurement from a second observer could be at least 3.4 mm less or 3.7 mm more than the measurement from the first observer. This could lead to the reclassification of a positive test result as negative or vice versa. The area of imprecision was 38% broader for the palpation technique than for the ballpoint-pen technique.

Conclusion: Reading of tuberculin skin tests may frequently result in misclassifications when measurements are close to the cutoff point that separates negative from positive results.


The tuberculin skin test has many potential sources of error and variability. Standardization of the tuberculin reagent and the meaning of the test results have been considered in some detail [1, 2], but little attention has been paid to the reading itself [3-10]. Measurement of the induration, however, is one of the most important potential sources of error. If the customary technique of palpation is used, the margins of the induration may be difficult to define. The alternative ballpoint-pen method, although advocated as more reliable than palpation [3], has not been discussed in official statements on tuberculosis [1, 2]. We investigated the reliability of the ballpoint-pen technique and compared this technique with the palpation method.


Methods

Patients and Procedures

Patients and health care personnel who were in an internal medicine department and needed a tuberculin skin test were invited to participate. Persons who had received bacille Calmette–Guérin vaccine were enrolled preferentially. Ninety-six persons who provided informed consent ultimately participated in the study.

Tuberculin Skin Tests and Measurement Methods

Ten units of tuberculin from Pasteur Merieux, Lyon, France (corresponding to the recommended 5 IU of purified protein derivative tuberculin), were injected intradermally on the volar surface of the forearm (Mantoux technique) [11]. Readings were done on the third day after the test was administered, and the diameter of induration was measured along the long axis of the forearm. Two experienced investigators each independently did three measurements. The first two measurements were taken with a blinded caliper using the ballpoint-pen technique [3]. With this technique, a medium-point ballpoint pen is used to draw a line starting 1 to 2 cm away from the skin reaction and moving toward its center. When the pen reaches the margin of the induration, an increased resistance to further movement is felt and the pen is lifted. The procedure is repeated on the opposite side of the skin reaction. The distance between the ends of the opposing lines at the margins of the induration is measured. In our study, the lines were erased and the measurement process was repeated. The lines were then erased again, and the third measurement was done by palpation [2]. To reproduce the usual conditions of testing, we used a flexible ruler.

The data were collected during eight sessions; 11 to 14 participants were tested per session. To reduce the chance that an observer would remember previous readings, three things were done. First, the results of measures that were obtained with the blinded caliper were recorded by a third investigator. Second, the first ballpoint-pen measure was done for all participants at each session, then the second ballpoint-pen measure, and then the palpation measure. Third, before the second and third readings, the third investigator verified that no minor landmarks persisted.

Statistical Analysis

To analyze the reliability of quantitative data, we used statistical methods that have been described elsewhere [12]. Intraclass correlation coefficients and their 95% CIs were computed using SAS software (SAS Institute, Cary, North Carolina) [13]. Induration diameters were used to classify skin reactions as positive or negative according to the 5-, 10-, and 15-mm cutoff points that have been recommended as indicating positivity in various situations [2]. Reliability was then assessed with κ coefficients [14]. We also used a graphical analysis that focuses on the mean and the variation in the differences between repeated measurements [15]. Mean differences and the SD of the differences were calculated. An area of imprecision that was determined on the basis of the SD of the differences was placed around the arbitrarily chosen 10-mm cutoff value (10 mm ± 1.96 SD). If a first measurement fell within this area, particularly at or about the cutoff value, the likelihood that the second measurement would be sufficiently different to change the result of the tuberculin skin test from negative to positive (or vice versa) was high. Conversely, such reclassification would occur in only 5% of the cases that had values outside this area.
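
The graphical analysis described above can be sketched in a few lines of Python. This is a minimal illustration, not the SAS code used in the study: the function names and the induration diameters are hypothetical, and the sample SD (n - 1 denominator) is an assumption, because the text does not state which form was used.

```python
from statistics import mean, stdev


def limits_of_agreement(first, second, z=1.96):
    """Return (mean difference, SD of differences, lower limit, upper limit)."""
    diffs = [b - a for a, b in zip(first, second)]
    d_bar = mean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 denominator); an assumption
    return d_bar, sd, d_bar - z * sd, d_bar + z * sd


def area_of_imprecision(sd, cutoff=10.0, z=1.96):
    """Interval cutoff +/- z * SD placed around the positivity cutoff."""
    return cutoff - z * sd, cutoff + z * sd


def in_area(measurement, area):
    """True if a measurement lies inside the area of imprecision."""
    low, high = area
    return low <= measurement <= high


# Hypothetical induration diameters in mm (illustrative, not study data)
reading_1 = [0.0, 6.5, 9.0, 11.5, 14.0, 18.0]
reading_2 = [0.0, 7.5, 10.5, 10.0, 15.0, 17.5]

d_bar, sd, lower, upper = limits_of_agreement(reading_1, reading_2)
area = area_of_imprecision(sd)
# First readings that would call for extra caution at the 10-mm cutoff
flagged = [m for m in reading_1 if in_area(m, area)]
```

With these made-up readings, the two first measurements nearest the cutoff (9.0 and 11.5 mm) fall inside the area of imprecision.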


Results

Because of the study design, only 27 participants (28%) did not react to the tuberculin skin test.

Reliability of the Ballpoint-Pen Technique

Intraobserver Reliability

In persons who had no response to the tuberculin skin test, the intraobserver reliability was perfect (intraclass correlation coefficient = 1.0). Intraclass correlation coefficients were high for both observers and decreased only slightly after the nonresponders were excluded. The κ coefficients also suggested good intraobserver reliability but were lower with the 10- and 15-mm cutoff values than with the 5-mm cutoff value (Table 1).
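
For readers who want to reproduce an intraclass correlation coefficient from paired readings, the following is a minimal sketch. It assumes the two-way random-effects, single-rating form, ICC(2,1), from Shrout and Fleiss [12]; the text does not state which form was computed, so this choice, the function name, and the readings are all illustrative.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, single rating (assumed form).

    `ratings` holds one row per participant and one column per reading.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Mean squares from the two-way analysis of variance
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical paired readings in mm (first, second), not study data
paired_readings = [[6.5, 7.5], [9.0, 10.5], [11.5, 10.0], [14.0, 15.0], [18.0, 17.5]]
icc = icc_2_1(paired_readings)
```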


Table 1. Reliability Study of the Ballpoint-Pen and Palpation Methods of Induration Measurement for the Tuberculin Skin Test

 

The top panel of Figure 1 shows the difference between the two readings for each participant that were done by the first observer (range, –6.8 to +3.5 mm) plotted against the corresponding mean for each participant. The level of intraobserver reliability was evaluated by determining the 95% CI (–2.68 to +2.96 mm) within which most of the differences were seen. This means that 5% of the time, the second measure of the test results done by using the ballpoint-pen method would be at least 2.7 mm less than or 3.0 mm more than the first one. This lack of reliability could lead to the reclassification of a negative tuberculin skin test result as positive or vice versa.



Figure 1. Top. Intraobserver reliability for the ballpoint-pen technique. Differences between the repeated measurements taken by observer 1 (M1Obs1 and M2Obs1) that were obtained using the ballpoint-pen technique are plotted against the mean of repeated measures. Middle. Interobserver reliability for the ballpoint-pen technique. Differences between the first measurement taken by each observer (M1Obs1 and M1Obs2) that were obtained using the ballpoint-pen technique are plotted against the mean of these first measures. Bottom. Interobserver reliability for the palpation technique. Differences between the measurement taken by each observer (Obs1 and Obs2) that were obtained using the palpation technique are plotted against the mean of these measurements. The horizontal lines in all panels indicate the mean difference (thick line) ± 2 SDs (thin lines). The closed circle with an asterisk in each panel corresponds to the superposition of the measures of 27 persons. Areas of imprecision (bars) that were derived from the SDs of the differences were defined for the 10-mm cutoff point for all comparisons. If a first measurement falls within this area, particularly at or about the cutoff value, a second measurement would probably be sufficiently different to lead to reclassification of a negative result of the tuberculin skin test as positive or vice versa. Conversely, such reclassification would occur in only 5% of the cases whose values lie outside this area. Open circles represent patients for whom the tuberculin skin test result was reclassified after the second measurement (all had a first measure that fell in the area of imprecision). Open circles with a number correspond to the superposition of two or four patients. The squares in the top and middle panels represent patients whose test results were reclassified after the second measurement even though their first measurement was outside the area of imprecision.

 

As shown in the top panel of Figure 1, an area of imprecision that straddles the cutoff value (7.2 to 12.8 mm for a 10-mm cutoff value) was generated using the SD of the differences. Test results for 8 of the 69 patients (12%) were reclassified. The first measurement for 30 of the 69 patients (43.5%) fell within this area of imprecision; 7 of those 30 patients (23.3%) were among the 8 patients whose test results were reclassified.

Interobserver Reliability

Agreement between observers, estimated by using the intraclass correlation and κ coefficients, was high (Table 1). The first ballpoint-pen measures made by the two observers were used for these analyses.
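
The κ analysis at a given cutoff can be illustrated as follows. This sketch uses hypothetical readings and assumes that a reaction is called positive when the induration is at or above the cutoff; the function names are illustrative, and the calculation follows Cohen's definition [14] for two raters and two categories.

```python
def classify(diameters_mm, cutoff_mm):
    """Call each reading positive when it is at or above the cutoff (assumption)."""
    return [d >= cutoff_mm for d in diameters_mm]


def cohen_kappa_binary(calls_a, calls_b):
    """Cohen's kappa for two raters making positive/negative calls."""
    n = len(calls_a)
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    p_a = sum(calls_a) / n  # proportion called positive by rater A
    p_b = sum(calls_b) / n  # proportion called positive by rater B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # chance agreement
    return (observed - expected) / (1 - expected)


# Hypothetical first readings in mm by two observers (not study data)
observer_1 = [0.0, 4.0, 9.5, 10.5, 12.0, 16.0, 8.0, 11.0]
observer_2 = [0.0, 5.5, 10.5, 9.0, 12.5, 15.0, 7.5, 11.5]

kappa_by_cutoff = {
    cutoff: cohen_kappa_binary(classify(observer_1, cutoff), classify(observer_2, cutoff))
    for cutoff in (5, 10, 15)
}
```

In this made-up data set, the two disagreements at the 10-mm cutoff both come from readings within 1 mm of the cutoff, which is the situation the study highlights.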

Differences between first measures done by the two observers were between –5.1 and +7.3 mm (Figure 1, middle). The 95% CI of the differences was –3.39 to +3.69 mm; this means that 5% of the time, the result of a second tuberculin skin test measurement by another investigator would be at least 3.4 mm less than or 3.7 mm more than that of a first investigator. As in the top panel of Figure 1, an area of imprecision (6.5 to 13.5 mm) is shown in the middle panel of Figure 1; this area is slightly broader than that calculated for intraobserver reliability. Test results for 8 of the 69 patients (12%) were reclassified. The first measurement for 40 of the 69 patients (58%) fell within this area of imprecision; 7 of those 40 patients (17.5%) were among the 8 patients whose results were reclassified.

Reliability of the Palpation Technique

Except for the κ coefficients at the 15-mm cutoff, assessment of agreement between observers showed that all reliability coefficients obtained with the palpation technique were slightly lower than those obtained with the ballpoint-pen method (Table 1).

The 95% CI of the differences between the measures of the two observers was –4.6 to +5.2 mm (Figure 1, bottom). This resulted in a much broader area of imprecision for the readings (5.1 to 14.9 mm). Test results were reclassified for 12 of the 69 patients (17.4%). The first measure of 43 of the 69 patients (62.3%) fell within this area of imprecision, and the 12 patients whose test results were reclassified were among those 43 (27.9%).
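
The reported areas of imprecision can be checked against the reported 95% limits. The sketch below assumes, following the Methods, that each limit is the mean difference ± 1.96 SD, so that half the distance between the limits equals 1.96 SD; the function and variable names are illustrative.

```python
def area_from_limits(lower, upper, cutoff=10.0):
    """Rebuild the area of imprecision from the 95% limits of the differences."""
    half_width = (upper - lower) / 2  # equals 1.96 * SD under the stated assumption
    return cutoff - half_width, cutoff + half_width


# 95% limits of the differences as reported in the Results (mm)
reported_limits = {
    "ballpoint, intraobserver": (-2.68, 2.96),
    "ballpoint, interobserver": (-3.39, 3.69),
    "palpation, interobserver": (-4.6, 5.2),
}

areas = {
    name: tuple(round(bound, 2) for bound in area_from_limits(lo, hi))
    for name, (lo, hi) in reported_limits.items()
}
```

After rounding to one decimal place, the three intervals agree with the 7.2 to 12.8 mm, 6.5 to 13.5 mm, and 5.1 to 14.9 mm areas given in the text.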

Agreement between Ballpoint-Pen and Palpation Methods

Although all the intraclass correlation coefficients were high, the κ coefficients that were produced after persons with no response to the test were excluded suggested only moderate to good reliability (Table 1). The 95% CIs of the differences between the first ballpoint-pen and the palpation measures were –3.0 to +4.1 mm for readings taken by the first observer and –2.5 to +3.9 mm for readings taken by the second observer. The areas of imprecision for the measurements were from 6.4 to 13.6 mm for readings taken by the first observer and 6.8 to 13.2 mm for readings taken by the second observer. Reclassification occurred in 8 of 69 patients (12%) for both observers.


Discussion

In our study, the ballpoint-pen technique was reliable, as evaluated by global reliability coefficients. However, the graphical analysis provided a more meaningful representation of the level of variation. Intraobserver reliability may be the most important factor for such diagnostic tests as the tuberculin skin test, which are usually done by only one examiner for any given patient. Lack of reliability may lead to the frequent reclassification of results, particularly if readings are at or about the cutoff values. Reliability coefficients were slightly higher for the ballpoint-pen technique than for the palpation method. In addition, the 95% CI of the differences of the measures taken by the two observers was 38% broader for the palpation method than for the ballpoint-pen technique; this could result in more frequent misclassification.
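
The 38% figure can be reproduced from the interobserver limits reported in the Results. This is simple arithmetic on the published values; the variable names are illustrative.

```python
# Interobserver 95% limits of the differences reported in the Results (mm)
ballpoint_width = 3.69 - (-3.39)  # ballpoint-pen technique
palpation_width = 5.2 - (-4.6)    # palpation technique

# Relative increase in width for palpation compared with the ballpoint pen
relative_increase = palpation_width / ballpoint_width - 1
percent_broader = round(relative_increase * 100)
```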

Only one study [10] has addressed the interobserver reliability of the ballpoint-pen technique. That study relied on simple correlation coefficients to determine reliability. Reanalysis of the data from that study provided a κ coefficient of 0.74 (using a cutoff point of 10 mm). Previous studies of the reliability of the palpation method [5, 6, 10] have also been restricted primarily to the assessment of interobserver agreement and have provided conflicting results. Recalculation from the data of one large survey of six studies on tuberculin skin testing [4] gave slightly wider CIs for the differences between readings by two observers (range, –3.1 to +3.3 mm to –4.2 to +5.0 mm) than for the differences between two readings by the same observer (range, –2.7 to +3.3 mm to –3.5 to +4.5 mm). Four studies [7-10] compared the ballpoint-pen and palpation methods and had discordant results. None of these studies used reliability coefficients, which are the generally recommended measures of agreement.

Recall bias, if present, would have tended to improve repeatability; the true variability would then be even greater than we observed, which would strengthen our findings. Because the tuberculin reaction could not be measured by a blinded technique, we cannot exclude an unconscious bias that would have favored the ballpoint-pen method. In addition, knowledge of the standard cutoff values may have influenced the observers when they used the ruler. Finally, the generalizability of our study results is limited because only two observers did the measurements, both of whom have considerable experience reading tuberculin skin test indurations. Persons who occasionally read induration from tuberculin skin tests may not have as much experience as these two observers have.

Results of tuberculin skin tests influence the decision to initiate antituberculous therapy [16-18]. Although errors in classification cannot be completely avoided, they should be minimized by the implementation of appropriate rules for measurement. Our study suggests that current recommendations for tuberculin skin testing should be reconsidered. We propose that induration be measured with the ballpoint-pen technique. Clinicians should be more cautious when interpreting a tuberculin skin test result that is close to the cutoff value than they would be when interpreting one that is far removed. To improve intraobserver reliability, the final result could be the average of two consecutive measures, preferably taken with a blinded gauge. Even when done in the most reliable way possible, however, the tuberculin skin test remains an imperfect diagnostic tool and should not replace clinical judgment.

Dr. Coste: Departement de Biostatistiques, Hopital Cochin, 75674 Paris Cedex 14, France.

Dr. Esdaile: Mary Pack Arthritis Centre and Vancouver Hospital, University of British Columbia, 895 West 10th Avenue, Vancouver, British Columbia V5Z 1L7, Canada.


Author and Article Information

From Hopital Louis Mourier, Colombes, France; Hopital Cochin, Paris, France; and Vancouver Hospital, Vancouver, British Columbia, Canada.
Acknowledgments: The authors thank Christine Chandemerle for her help in collecting the data.
Requests for Reprints: Jacques Pouchot, MD, Service de Medecine Interne V, Hopital Louis Mourier, 178 rue des Renouillers, 92700 Colombes, France.
Current Author Addresses: Drs. Pouchot, Grasland, Collet, and Vinceneux: Service de Medecine Interne V, Hopital Louis Mourier, 178 rue des Renouillers, 92700 Colombes, France.


References

1. American Thoracic Society. The tuberculin skin test. Am Rev Respir Dis. 1981; 124:356-63.

2. American Thoracic Society. Diagnostic standards and classification of tuberculosis. Am Rev Respir Dis. 1990; 142:725-35.

3. Sokal JE. Measurement of delayed skin-test responses [Editorial]. N Engl J Med. 1975; 293:501-2.

4. Nissen Meyer S, Hougen A, Edwards P. Experimental error in the determination of tuberculin sensitivity. Public Health Rep. 1951; 66:561-9.

5. Loudon RG, Lawson RA Jr, Brown J. Variation in tuberculin test reading. Am Rev Respir Dis. 1963; 87:852-61.

6. Bearman JE, Kleinman H, Glyer VV, Lacroix OM. A study of variability in tuberculin test reading. Am Rev Respir Dis. 1964; 90:913-9.

7. Jordan TJ, Sunderam G, Thomas L, Reichman LB. Tuberculin reaction size measurement by the pen method compared to traditional palpation. Chest. 1987; 92:234-6.

8. Howard TP, Solomon DA. Reading the tuberculin skin test. Who, when, and how? Arch Intern Med. 1988; 148:2457-9.

9. Bouros D, Zeros G, Panaretos C, Vassilatos C, Siafakas N. Palpation vs pen method for the measurement of skin tuberculin reaction (Mantoux test). Chest. 1991; 99:416-9.

10. Bouros D, Maltezakis G, Tzanakis N, Tzortzaki E, Siafakas N. The role of inexperience in measuring tuberculin skin reaction (Mantoux test) by the pen or palpation technique. Respir Med. 1992; 86:219-23.

11. Mantoux C. L'intradermo-reaction a la tuberculine et son interpretation clinique. Presse Med. 1910; 18:10-3.

12. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979; 86:420-8.

13. SAS/STAT User's Guide, Version 6. Cary, NC: SAS Institute; 1990:893-6.

14. Cohen J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement. 1960; 20:37-46.

15. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986; 1:307-10.

16. Guidelines on management of tuberculosis and HIV infection in the United Kingdom. Subcommittee of the Joint Tuberculosis Committee of the British Thoracic Society. BMJ. 1992; 304:1231-3.

17. Bass JB Jr, Farer LS, Hopewell PC, O'Brien R, Jacobs RF, Ruben F, et al. Treatment of tuberculosis and tuberculosis infection in adults and children. American Thoracic Society and The Centers for Disease Control and Prevention. Am J Respir Crit Care Med. 1994; 149:1359-74.

18. De Cock KM, Grant A, Porter JD. Preventive therapy for tuberculosis in HIV-infected persons: international recommendations, research, and practice. Lancet. 1995; 345:833-6.

