Annals
Established in 1927 by the American College of Physicians

Podcast Transcript - March 4, 2008


Topic Time
New Tests For Diagnosing Tuberculosis 00:46
Interview with Dr. Dick Menzies, MD, MSc 05:17
Other Articles In This Week’s Issue 14:10
Excerpts From An Institute of Medicine Panel Discussion on The Future Of Comparative Effectiveness Studies: Comments By Dr. Mark McClellan, MD, PhD 16:04

Hello, and welcome to this week’s Annals of Internal Medicine Audio Summary for our March 4, 2008 issue. I’m Michael Berkwits, Deputy Editor at Annals.

We have another terrific issue for you this week, with articles on new blood tests for diagnosing tuberculosis; the use of breast density and gene expression profiling assays to predict breast cancer risk; and pharmacologic treatments for dementia. And this week’s summary has a special feature; you’ll hear from Dr. Mark McClellan, Director of the Engelberg Center for Health Care Reform at the Brookings Institution in Washington DC and former CMS Director and FDA Commissioner, who gave his impressions on the future of comparative effectiveness studies in a recent Institute of Medicine panel discussion.

But first, here’s an in-depth summary of this week’s featured article.

New Tests For Diagnosing Tuberculosis (Time: 00:46)

Our feature article this week is a comparison of the diagnostic accuracy of new tests for tuberculosis.

Interferon-gamma release assays, or IGRAs, provide a measure of interferon-gamma levels released by a patient’s T cells in response to tuberculosis antigens. There are 3 such tests now commercially available, and since the FDA approved the first one in 2005, the race has been on to characterize how, when, and in whom the tests might best be used. At the time, CDC guidelines noted that the test couldn’t distinguish active from latent tuberculosis infection, and each new variation of the test has brought with it the need to compare diagnostic accuracies all over again. Also, concerns remain about the tests’ performance in patients with anergy or compromised immune function.

In this week’s article, lead author Davinder Dosanjh and his colleagues from the Tuberculosis Immunology Group at Imperial College in London try to address some of these areas of uncertainty. They performed tuberculin skin testing and drew blood for two different interferon-gamma release assays, known as ELISpot or T-SPOT.TB tests, on 389 patients presenting with symptoms suggestive of tuberculosis at two hospitals in England over about a 3-year period.

They found that the interferon-gamma release assays had good sensitivity in the 194 patients with culture-confirmed and highly probable active disease -- 85% for the ELISPOT and 89% for the ELISPOT-PLUS tests, which differ by a single MTB antigen -- and that sensitivity was near perfect, between 97 and 99%, when the blood tests were used in combination with the tuberculin skin test and when a positive test was defined as either one being positive. Specificity of the tests was less impressive, hovering around 70% for both assays.

Based on these findings, the authors conclude that the newer ELISPOT-PLUS assay appears to be more sensitive than the standard ELISPOT, and that the combination of the test with standard tuberculin skin testing can be used to exclude the diagnosis of tuberculosis in patient populations with a high prevalence of active TB. They acknowledge that the ELISPOT-PLUS assay is for now available only for research purposes, and that the tests’ specificity was modest because latent TB is also prevalent in the population, and the test can’t distinguish between latent and active disease. They also acknowledge that the high sensitivity of combination testing was dependent on the TB skin test and interferon-gamma release assay test results being frequently discordant; that is, when positive combination testing is defined as either of two tests being positive, and the tests frequently disagree, you’re likely to have a positive a lot of the time, driving up your estimate of sensitivity.

In an accompanying editorial, Dick Menzies of McGill University in Montreal, Canada, emphasizes that combination testing was highly sensitive only because of test discordance: if the tests always agreed with each other, the sensitivity of the combination, and therefore its ability to exclude tuberculosis, would have been less. Because the study authors could identify clinical reasons for the discordance in only about 1/3 of patients, he says, it’s hard to be confident that other investigators will be able to replicate the results. So in settings where TB prevalence is high and where the pretest probability of both latent and active infection in the population is therefore also high, a positive result from an interferon-gamma release assay will be useful only if the pretest probability of latent infection in a given patient is quite low, and a negative result may not be enough to exclude active infection. So these tests are of interest, he suggests, but they are probably not yet reliable enough for routine use in settings where TB infects a high proportion of the patient population. For these patients, he concludes, “the search for a rapid and accurate test must go on.”
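
The dependence of combined sensitivity on discordance can be illustrated with a little arithmetic. The following is a minimal sketch, not the study's analysis: it applies the inclusion-exclusion rule to an "either test positive" definition. The 89% figure is the ELISPOT-PLUS sensitivity quoted above; the 80% tuberculin skin test sensitivity and the function name are assumptions chosen only for illustration.

```python
def either_positive_sensitivity(sens_a, sens_b, both_positive):
    """Sensitivity of the rule 'positive if either test is positive'.

    sens_a, sens_b: sensitivity of each test alone (among patients with disease).
    both_positive:  fraction of diseased patients who are positive on BOTH tests.
    """
    lowest = max(0.0, sens_a + sens_b - 1.0)   # most discordance possible
    highest = min(sens_a, sens_b)              # most agreement possible
    if not (lowest - 1e-12 <= both_positive <= highest + 1e-12):
        raise ValueError("both_positive is inconsistent with the two sensitivities")
    # Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B)
    return sens_a + sens_b - both_positive


IGRA_SENS = 0.89  # ELISPOT-PLUS sensitivity reported in the summary above
TST_SENS = 0.80   # hypothetical skin test sensitivity, NOT taken from the study

# If the tests always agree as much as they can, the combination adds nothing
# beyond the better single test.
full_agreement = either_positive_sensitivity(
    IGRA_SENS, TST_SENS, both_positive=min(IGRA_SENS, TST_SENS)
)

# If the tests miss different patients (here: statistically independent misses),
# the combination picks up many of the cases that one test alone would miss.
independent = either_positive_sensitivity(
    IGRA_SENS, TST_SENS, both_positive=IGRA_SENS * TST_SENS
)

print(f"Tests agree as much as possible: {full_agreement:.3f}")
print(f"Tests miss independently:        {independent:.3f}")
```

Under these assumed inputs, the more often the two tests disagree in patients with disease, the higher the "either positive" sensitivity climbs, which is why the editorial treats the combined figure as fragile.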

Interview with Dr. Dick Menzies, MD, MSc (Time: 05:17)

So this study was a good update on TB testing for an old guy like me who’s still hung up on how to read mm of induration through the intradermal bruising of tuberculin antigen injection, but it only looked at one of the 3 clinically available assays, so I wasn’t sure how to interpret the findings in the context of the full range of TB testing options. So I called the study’s editorialist, and asked him for a sense of the Bigger Picture. Dr. Dick Menzies is Professor of Medicine and Epidemiology and Biostatistics at McGill University in Montreal, Canada, and is the author of numerous papers on tuberculosis, including a recent meta-analysis published in Annals in May of last year that clarified the many outstanding areas of uncertainty in TB testing, and recommended future areas for research. He was gracious enough to review for me what these new tests are, and how and in whom they should be used.

Q: Dr. Menzies, thanks for talking to me.

A: A pleasure. Good afternoon.

Q: Can you review for us what the interferon gamma release assays are, and what they do?

A: These are ex vivo blood tests that essentially look for an immune response to tuberculosis antigens. In some ways the principle is similar to the tuberculin skin test in that they’re measuring a T-cell based immune response. The difference is that it’s not in vivo; you take blood and stimulate the lymphocytes in the lab. And the second difference is that the antigens you stimulate the lymphocytes with are very specific for Mycobacterium tuberculosis.

Q: So now there’s three commercially available tests and one research assay. Can you review for us the difference between the two Quantiferon tests and the two ELISPOT tests?

A: The two Quantiferon tests are very similar. The only real difference is that in the Quantiferon Gold test you draw the blood up and then you have to transfer it in the lab in measured amounts to tubes where the stimulation goes on. In the Quantiferon In-Tube, it’s the same antigens that are used but the difference is that you draw the blood directly into small tubes, shake those tubes up, and then those tubes go directly in the incubator because the tubes are already pre-lined with the antigens for stimulating the lymphocyte to elicit the immune response to TB. The ELISPOT tests are more cumbersome because they take peripheral blood, spin it, you actually extract the white blood cells, or the lymphocytes. You then put a measured number of lymphocytes in each separate test tube, then add TB antigens to them and then you incubate them overnight. Then, the next day, you look to see which lymphocytes are producing interferon gamma which is a cytokine but it’s sort of a messenger of inflammation and appears to be an important mediator in TB immune response.

Q: Are the tests expensive, and given that tuberculin skin testing is relatively inexpensive, do preliminary data suggest anything about the cost-effectiveness of these interferon gamma release assays?

A: There is definitely quite a big difference in price of course between tuberculin and these tests. These new tests will cost anywhere from $35 to $75 a test. The manufacturer’s cost for the Quantiferon Gold In-Tube is $20 to $25 U.S. per test, and for the T-SPOT it’s around $50 per test. So then you have to add both handling and tech time and whatnot to that. Several cost-effectiveness studies have been done and in essence the most cost-effective is when you do a tuberculin test first and then if they are positive, and you are otherwise in a low-risk situation -- so a person who’s like a health care worker being screened for the first time, or maybe an immigrant being screened for the first time -- you do one of these interferon-gamma release assays. If that’s positive then you consider that’s true infection; if it’s negative you say okay it’s not infection: you don’t treat them, you don’t follow them, you don’t do more investigations. And in that scenario that is cost-effective. On the other hand, if you just take all comers and screen everybody it’s usually not cost-effective. Certainly as a first-line test, if you use them in close contacts, it’s generally not as cost-effective as tuberculin.

Q: Great, so let’s move on to this week’s study. The title of your editorial asks if we can eat our cake and have it too by using interferon-gamma release assays for the diagnosis of active infection. What were you getting at by asking that particular question?

A: Okay, so these are tests that are designed to be as sensitive as possible to pick up latent TB infection. Because in TB control we want to pick up people who are contacts, who are for some reason at future risk to develop active TB. That’s what the tuberculin test is really designed to do. And we know from many studies that that’s what it can do. So the interferon-gamma release assays are being touted as a replacement for the tuberculin test. They seem to have very similar sensitivity and in many populations they have better specificities. So they do seem poised to ultimately replace the tuberculin test for latent TB infection. But now the problem is, let’s say we want to use them for diagnosis of active TB. The immune-based tests don’t distinguish between latent and active. So if you have latent, it’s going to be positive just as much as if it’s active. So if you take, let’s say, an immigrant population coming from a high incidence country, that population, especially among adults, half of them may have latent TB infection. So if you do an immune-based test, you’re going to find half of them positive. And so you’re not going to be very much further ahead in diagnosing active TB, and in fact what you’re going to end up with is a lot of people with positive tests who actually have pneumonia or they have lung cancer or they have 101 other causes of a lung problem and they also have latent TB infection. And that’s what I mean: you can’t design a test that’s very sensitive to pick up all latent infection and then turn around and say that it’s going to be useful in diagnosing active TB because it’s going to pick up far more people with latent than with active.
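
The arithmetic behind this point can be sketched with Bayes' rule. What follows is an illustration under stated assumptions rather than a calculation from the study or the editorial: the sensitivity of 0.89 is borrowed from the ELISPOT-PLUS figure in the summary, the specificity of 0.50 stands in for "half of them may have latent TB infection" (so half of the patients without active TB still test positive), and the prevalences of active TB are hypothetical.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) for a test applied to a population.

    prevalence:  fraction of tested patients who truly have ACTIVE TB.
    sensitivity: P(test positive | active TB).
    specificity: P(test negative | no active TB).
    """
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)   # P(active TB | test positive)
    npv = true_neg / (true_neg + false_neg)   # P(no active TB | test negative)
    return ppv, npv


SENSITIVITY = 0.89  # ELISPOT-PLUS figure from the summary, used as an assumption
SPECIFICITY = 0.50  # assumption: half of patients without active TB have latent infection

# Hypothetical prevalences of active TB among the patients being tested.
for prevalence in (0.05, 0.20, 0.50):
    ppv, npv = predictive_values(prevalence, SENSITIVITY, SPECIFICITY)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

With inputs like these, a positive result in a low-prevalence group mostly reflects latent infection plus some other lung problem, which is the situation described in the answer above.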

Q: So what would you tell us this study tells us from a research perspective that we didn’t know before? And what would you say are the next steps?

A: I think that at the moment the work to find the characteristics of the immune response that would distinguish active TB from latent TB is ongoing, that is right now still at a much more fundamental level that involves exploration of multiple different antigens and multiple different responses. So instead of just interferon-gamma, people are looking at IL4, IL10, IL12, different cytokines which are inflammatory mediators. They are looking at totally different antigens and peptides than are used now. And I think that most people realize that the difference in immune response between latent and active TB is very complex, and it is very, very unlikely that a single quite simple test like these interferon-gamma release assays will accurately distinguish latent from active TB. And so in the future, I think that’s where research work is going. I think the promise of the interferon-gamma release assays is in the diagnosis of latent TB, and ultimately, we hope, in identifying people who are truly at increased risk for active disease in the future and who therefore would benefit from latent TB therapy with INH or other agents.

Q: So given the role of these interferon gamma release assays in the diagnosis of latent infection can you tell listeners in whom they should be used and when?

A: I think that these assays are very useful in someone who’s got a positive tuberculin response who is otherwise judged to be of low risk to develop TB. So a healthy person with a normal x-ray in whom you are really suspecting a false positive result, often they are foreign born, they’ve got a history or questionable history of BCG vaccination and you’re suspecting that this has nothing to do with TB and is just a BCG-related reaction. Probably the greatest utility of these assays is in this type of patient population. You may even want to argue that you could directly screen this population, skip the tuberculin and go straight to interferon-gamma release assays. The other group where interferon-gamma release assays may be considered is when you’ve done a tuberculin in an immunocompromised patient and it’s negative and you’re concerned it may be false negative because of anergy, and you’re still concerned about latent infection. Then an interferon-gamma release assay may be positive and certainly the combination may be more sensitive than one test or the other by itself. So when you want to enhance sensitivity you can do these tests in addition to tuberculin by considering either positive you will certainly enhance sensitivity, pick up more people at risk who would benefit from therapy of latent infection. And on practical grounds, I’d recommend a Quantiferon In-Tube because it seems to be the most practical and easy. If you want to choose on the basis of enhancing sensitivity without regard to cost, or technical simplicity, then the T-SPOT or the ELISPOT seems to be the better.

Q: Dr. Menzies, thanks so much for talking to me.

A: My pleasure.

That was Dr. Dick Menzies of McGill University discussing the findings of this week’s lead article defining the diagnostic accuracy of two interferon-gamma release assays in the diagnosis of active tuberculosis.

Other Articles In This Week’s Issue (Time: 14:10)

Other articles in this week’s issue include a nationally representative cohort study of variability in liver enzyme test results, suggesting that about 1/3 of people with initially abnormal values will have normal values on retesting.

We have two articles this week on prediction and breast cancer, the first of which reports the accuracy of a statistical model that uses measures of radiographic breast density to predict new breast cancers in a large cohort of women, and the other reviews the predictive accuracy of gene expression profiling assays for characterizing early-stage breast cancer;

This issue also contains a systematic review of evidence about the effectiveness of cholinesterase inhibitors and memantine for treating dementia, concluding that trials to date show improvements in cognition that are statistically but not always clinically significant; that review is accompanied by a clinical practice guideline recommending that decisions about starting treatments for dementia, and choices between those treatments, should be individualized;

We have a perspective on the challenges of measuring health care quality and physician performance in small office practice settings, with suggestions for potential solutions;

And we have an entry in our occasional Trials That Matter editorial series that reminds readers about large, recently published, important trials that they may have missed in other journals. This one covers the ADVANCE trial, published in The Lancet in September, which randomized over 11,000 patients with type 2 diabetes to a fixed combination pill containing an ACE inhibitor/thiazide diuretic or to placebo. The editorial has two bottom lines, neither of which is entirely new: first, in patients with type 2 diabetes, pay as close attention to lowering blood pressure as to lowering blood glucose and HbA1c levels; and second, throw away the standard threshold of 140/90 in diabetics; treatment should be started for any BP above 130/80, and should be intensified until pressures fall below those levels.

And this issue has this month’s In The Clinic supplement, on COPD.

Excerpts From An Institute of Medicine Panel Discussion on The Future Of Comparative Effectiveness Studies: Comments By Dr. Mark McClellan, MD, PhD (Time: 16:04)

We’ll close today with excerpts from a recent panel discussion conducted at the Institute of Medicine in Washington on comparative effectiveness studies. Regular Annals readers will have noticed that we’ve published a steady stream of comparative effectiveness reviews for common conditions, including diabetes, osteoporosis, and prostate cancer. These studies are funded by the Agency for Healthcare Research and Quality, which is charged by Congress with conducting evidence syntheses and research on topics of high priority to Medicare, Medicaid, and the State Children’s Health Insurance Program (SCHIP). These studies are a good fit with Annals priorities and those of our readers, but as I’ve mentioned in previous podcasts, their greatest contribution often lies in revealing what we don’t know. In the following excerpt, Dr. Mark McClellan, Director of the Engelberg Center for Health Care Reform at the Brookings Institution in Washington DC and former CMS Director and FDA Commissioner, outlines a vision of where comparative effectiveness studies have to go to be really relevant for the practice of medicine.

Broadening The Concept of Comparative Effectiveness

What a lot of people think of when they think of comparative effectiveness is a head-to-head trial: one specific treatment versus another. What I want to emphasize is that that is not the only kind of treatment decision, maybe not even the most important kind of treatment decision where we need better evidence.

In actual medical practice, styles of practice seem to differ in ways that go well beyond one major treatment versus another. It’s not just one specific procedure versus another or a brand-name drug versus generic that seems to differ systematically around the country, but rather much more subtle and complex differences in medical practices. When you are treating a patient with diabetes or heart failure or another chronic illness: how often do you see them in your office? How often do you refer them to a specialist? Which specialist do you refer them to? How often do you follow up lab tests? Which lab tests do you order? What imaging procedures do you do, and how often? These are more subtle but very important aspects of medical management and there is not any medical textbook where you can look up easily the answer to these kinds of questions. And these questions have very important implications for cost variation and, if you look over time, for cost growth as well, and it’s not clear that those kinds of questions are going to be answered by standard head-to-head clinical trials.

In addition, we’re seeing health care move in the direction of more targeted or personalized therapy. When you think about most clinical trials, you think about a report on average effect in a population of patients. Well, there are getting to be fewer and fewer “average” patients out there. Maybe you can address that by having trials that look at sub-group analyses or particular subpopulations, but that gets more challenging, requiring bigger sample sizes, larger studies, longer studies. In fact, we may need to make some kind of shift in the way that we approach answering these questions of what we are trying to learn, and focus more on how we develop policies and approaches to medical care to get the right treatment to the right individual patient, rather than focusing on estimating average effects alone. Relevant evidence for targeted health care.

Integrating Data Collection and Health System “Learning” With Actual Medical Practice

One aspect that I want to emphasize is whether this is an add-on or an approach that is more integrated with the actual delivery of health care. Now the reason I think this is important is that, again, some people may think of comparative effectiveness as being about providing new funding for large clinical studies on top of our existing health care system. After all, that’s how you do a typical head-to-head trial between two different drugs or surgical versus medical management, or something like that. I just want to admit, that if that’s the way we’re viewing this, even if we do find a few billion dollars or more to conduct these studies, I’m not sure that it’s going to fundamentally change the big gaps that we have in the availability of evidence that’s relevant to particular practical medical decisions for individual patients.

An alternative to this is to think about an approach, wherever it’s housed, that involves supporting a better infrastructure for learning from actual medical practice, not traditional clinical trials, but rather learning more from the increasingly detailed and sophisticated amount of information that’s available through our increasingly electronic health care system. We’ve got a long way to go. We’ve got an awful lot of data available now in medical practice that is potentially important for learning about variations in treatment and things that we aren’t using that well now. Is it possible to build what you might call a distributed data network where individual health plans or particular data sources within our health care system could learn more by finding ways to put their information together? Related to this is how these kinds of studies would be coordinated and supported. Where is the funding going to come from? And, again, if we can find ways to build this into our health care system it may well turn out to be significantly cheaper, more bang for the buck, than if we try to fund the additional traditional studies on top of it or outside of it.

There are ideas out there now about finding ways to build incentives and support for developing evidence development into health care delivery. While I was at CMS, we, and also many other private payers developed ideas around the concept of coverage with evidence development where there was sufficient evidence available to suggest that we didn’t want to turn down access to a medical treatment or a new approach to medical practice. But, at the same time, there’s a lot we didn’t know about potential benefits and risks for particular kinds of patients. And if there are ways to build in the development of better evidence to actual delivery of care as part of coverage, well that can help provide a funding mechanism in the way of supporting getting to this more of a learning health care system, as I’ve been talking about.

And then, how are these studies going to be conducted? The “gold standard” as everybody knows, is the randomized, controlled trial: double-blinding and other conditions set up so that you can be quite confident that if you see a statistically significantly different set of results in populations who got one treatment versus another it’s likely to be causally due to that treatment. The challenge is that if we’re trying to learn about practical questions and actual medical practice, it’s very difficult to develop that kind of evidence from many RCTs. Sometimes patients don’t want to be randomized, other times the treatments themselves are hard to study in a traditional clinical trial context. Most of the questions that we have today are related to issues of effectiveness or comparative effectiveness in particular sub-populations of patients that may have many comorbid diseases, that may be taking other medications or receiving other combinations of treatments that may influence their outcomes. All very hard to develop timely evidence at a low and a feasible cost through traditional clinical trials. This has led to a lot of emphasis on developing ways of learning from actual medical practice, which has its challenges. Actual medical practice is based on observational data, patients typically are not randomized or at least not cleanly randomized to a particular treatment over time. So, the challenge of sorting out the effect of the treatment from the effects of measurement problems or confounding patient factors or other treatments that are being given to the patient at the same time can make it difficult to reach important and competent conclusions about the impact of treatment. All that said, the growth in registries, the growth in the availability of observational data sets and the like all support finding ways to learn more and more competently from actual medical practice.

Developing Measures To Narrow The Gap Between Evidence and Practice

Add to that the last point, about evaluating impact, which is that if we are really focused on having an effect of these kinds of better evidence on medical practice, it’s not enough just to do the study itself; we need to learn more about finding ways to get the information or the evidence developed through these studies to the point where they’re actually influencing medical practice. There is a huge drop-off now between what is known and what is actually delivered in health care, and this is such a big gap that I think it’s not something we can ignore. It’s something we have to build into the methods that we’re developing and applying. And if you want to learn about policy approaches or strategies, again, to get the best treatment to each individual patient at the right time, you need to look at methods that are going to be focused on actual clinical practice. And to evaluate things like, not comparison of one treatment versus another, but methods that compare health care policies, and different formulary approaches which are going to have impact on how widely and which patients receive a particular drug in a population lead to better outcomes that will lower costs for that population. Do different dissemination strategies for helping physicians and patients get information about their treatment options lead to a positive impact on outcomes and costs? That’s the bottom line – improving health, avoiding unnecessary health care costs, and just looking at the evidence for trying to develop evidence on one treatment versus another isn’t going to necessarily address that problem. At least we’re seeing some very huge gaps in our health care system today.

Developing Better Methods for Understanding Observational Data

There are a number of, I think, promising methods in development for learning from imperfectly randomized or observational data. Data from our actual health care system. Data that could help us turn what we have now into a true learning health care system that can do a much better job of developing relevant evidence for a particular patient. These include methods like propensity scores and other approaches that adjust for observable and measurable differences between patients; instrumental variables analysis, something I used to do early in my career, a very familiar technique for economists looking at sources of variation unrelated to patient factors that influence treatment choices and thus can give you potentially unbiased estimates of the impact of changes in use of a treatment in a population; and Bayesian and other methods that enable you to combine information from previous or other types of studies with the further analysis that’s being done now to add power and borrow strength, which are also a potentially important approach.
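
To make the first family of methods concrete, here is a minimal sketch of propensity-score adjustment by inverse-probability weighting. Everything in it is invented for illustration: the counts, the single "severity" covariate, and the function names are assumptions, and real analyses estimate the propensity score from many covariates with a fitted model. In this toy data set, sicker patients are treated more often, so a crude comparison makes a beneficial treatment look harmful.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """Aggregated counts for patients who share the same observed covariates."""
    treated_n: int
    treated_events: int      # bad outcomes among treated patients
    untreated_n: int
    untreated_events: int    # bad outcomes among untreated patients


def crude_risk_difference(strata):
    """Treated risk minus untreated risk, ignoring who tends to get treated."""
    treated_n = sum(s.treated_n for s in strata)
    treated_events = sum(s.treated_events for s in strata)
    untreated_n = sum(s.untreated_n for s in strata)
    untreated_events = sum(s.untreated_events for s in strata)
    return treated_events / treated_n - untreated_events / untreated_n


def ipw_risk_difference(strata):
    """Risk difference after weighting each patient by 1 / P(treatment received)."""
    t_weight = t_events = u_weight = u_events = 0.0
    for s in strata:
        # Propensity score: probability of treatment given the stratum's covariates.
        propensity = s.treated_n / (s.treated_n + s.untreated_n)
        w_treated = 1.0 / propensity
        w_untreated = 1.0 / (1.0 - propensity)
        t_weight += w_treated * s.treated_n
        t_events += w_treated * s.treated_events
        u_weight += w_untreated * s.untreated_n
        u_events += w_untreated * s.untreated_events
    return t_events / t_weight - u_events / u_weight


# Hypothetical data: within each stratum, treatment lowers risk by 10 points,
# but severe patients (higher baseline risk) are much more likely to be treated.
severe = Stratum(treated_n=80, treated_events=32, untreated_n=20, untreated_events=10)
mild = Stratum(treated_n=20, treated_events=2, untreated_n=80, untreated_events=16)

crude = crude_risk_difference([severe, mild])
adjusted = ipw_risk_difference([severe, mild])
print(f"Crude risk difference:    {crude:+.2f}")
print(f"Weighted risk difference: {adjusted:+.2f}")
```

The weighting only corrects for differences that were actually measured, here severity; unmeasured differences between treated and untreated patients would still bias the estimate, which is where approaches such as instrumental variables come in.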

Over the last 30 years, I think we’ve seen a tremendous amount of progress in our sophistication in methods for randomized clinical trials. We’re just now starting to see, I think, some more interest in developing these methods for non-randomized studies for real world practice settings. We have a long way to go to get to a high level of competence. But, if we’re going to make the approaches to learning relevant evidence about a particular patient work this is clearly an important area for further work and support.

My hope is that through better answers to these kinds of questions we’re going to be able to do a much better job of getting to what the Institute of Medicine has called the goal of a learning health care system. As we develop a better evaluation infrastructure in health care delivery, we will be able to have available better measures of what we really want, which is the health result for our population and their cost. With more sophisticated infrastructure and better methods and measures we can conduct better studies to develop evidence that can help us get the right treatment to the right patient. Interpreting and communicating results is a key part of this, and with this better evidence we can do more to reform payments, reform health care policies, to support further improvements in care, and hopefully have the kind of positive impact on health care delivery and on innovation of our health care system that gets us both better health outcomes and much better results for patients at a more sustainable cost.

That was Dr. Mark McClellan discussing the future of comparative effectiveness studies in the context of a “learning” health care system. This is part 1 of a 2-part series; I’ll include comments from the other participants in our next summary, but listeners interested in hearing more can go to the IOM website where the interview is posted in its entirety. It’s a long web address, so I won’t spell it out for you here, but I’ll include it in the transcript of our podcast, available on our podcast homepage at www.annals.org/podcast.

[Dr. Mark McClellan Interview URL: www.iom.edu/?id=51642]

Well, that’s it for today.

As usual, our theme music is by Brian Poole and Kwesi Marles.

Send comments, criticisms, feedback, and suggestions about these summaries and the journal, to podcast{at}annals.org.

Technical support for this summary was provided by Andrew Langman, Neil Kohl, and Beth Jenkinson.

In honor of our lead article this week, we’ll go out today with a single off Document Records’ 2005 Leadbelly Volume 1 release, from 1939, here’s Huddie Ledbetter, better known as Leadbelly, singing the T.B. Blues.


Copyright © 2008 by the American College of Physicians.