15 October 1997 | Volume 127 Issue 8 Part 2 | Pages 743-750
QUALITY MEASUREMENT AND IMPROVEMENT
Generic Health Measurement: Past Accomplishments and a Measurement Paradigm for the 21st Century
Generic health surveys have been proposed for use in increasingly diverse applications and populations. This paper describes the history of generic tools in the past 30 years and suggests a more modern measurement platform for advances in the 21st century. Many generic tools lack the precision required for effective health care decision making. A meaningful goal for the next era of development of generic measures should be the generation of equiprecise measurement for generic health concepts. Equiprecise tests yield measures of equal precision at all levels of the underlying construct. Equiprecise measurement can be achieved through conjoint use of computerized adaptive testing as the survey platform and item response theory as the measurement theory.
Over the past 30 years, researchers have generated numerous tools that use self-reporting to measure functional status, emotional well-being, and subjective perceptions of health [1-22]. Uses of generic surveys have increased dramatically in recent years as a result of the outcomes movement. Some authors have advocated the routine inclusion of data on generic health status in large databases [23], but others, such as Liang and Shadick [24], caution that the utility of doing so remains largely unproven. The purpose of this paper is to describe the history of generic health measurement and to suggest a more modern measurement paradigm for the 21st century. The conjoint use of computerized adaptive testing and item response theory offers distinct advantages for health outcomes assessment that could improve the feasibility and utility of including patient-centered data in large administrative databases.
The Evolution of Generic Health Measurement
Figure 1 presents a timeline of the evolution of generic health measures with respect to broader developments in health policy and health status assessment. Roughly coincident with the publication of the World Health Organization's definition of health [25] was the emergence of clinically based, global rating scales whose content extended beyond organ function to encompass human function. Such measures as the Karnofsky performance status scale [26] and the function scale of the American Rheumatism Association [27] were intended to supplement physiologic measures in an attempt to better understand treatment effectiveness. Around the same time, efforts were initiated to modernize national health indicators [29], including the incorporation of single-item indicators of activity limitations and perceived health in the National Health Interview Survey [28].
The policy initiatives of the "War on Poverty" in the mid-1960s prompted two advances in health measurement. First, the social indicators movement ushered in measurement of quality of life in general populations [30, 31] and provided indicators of how well we lived, which were to be used with existing measures of how much we produced and spent [32]. Second, unified indexes of mortality and morbidity were developed for planning and evaluation purposes at the population health level [33-35].
A watershed for generic health assessment can be traced to the Human Population Laboratory, which launched measurement work in physical, mental, and social health [1-4]. As important, the Human Population Laboratory demonstrated that respondents will complete long surveys by mail [36], a finding that reduced the bias against mail surveys.
In the 1970s, the development of generic tools proliferated, in part as a result of extramural support from the National Center for Health Services Research. Definitional expansiveness was the signature of this era, and multi-item scales replaced single-item measures. The Quality of Well-Being Scale, developed for priority setting and program evaluation, represented a meaningful advance by measuring the value components of a social indicator of health [6, 37]. Next, the Sickness Impact Profile [7, 38] was developed for health care evaluation. The 136 items in this profile were obtained from patients, providers, and caregivers and yielded individual health profiles and summary scores. The McMaster Health Index Questionnaire [8, 39] followed. Intended for use in clinical and health services research, it measured physical, social, and mental health by using 59 items. The Health Perceptions Questionnaire [40] was constructed for use in health planning and evaluation and tapped the elusive realm of "positive health."
In 1979, health status measures for the adult general population emerged from the Health Insurance Experiment [9]. Next, the Nottingham Health Profile [10, 41] was developed for use in population surveys, clinical trials, and clinical practice. The 38 items in the Nottingham Health Profile tapped six health concepts and were derived from patients. The Duke-UNC Health Profile [11] was developed for use in research and clinical applications in primary care. The 63 items in this profile covered four health concepts and were obtained from the literature.
In the early 1980s, development of new measures took a respite but health research increasingly applied existing measures [42-44]. Interest in methodologic issues increased [45-48]. By the mid-1980s, interest had developed in the use of generic tools in everyday clinical practice, largely because of research showing poor correspondence between clinician and patient ratings of function and well-being [49-51]. In addition, growing recognition of the biopsychosocial model [52] and its relevance to an aging population resulted in increased appreciation that the preservation of function and well-being is an important goal of medical care [53]. Clinical practice applications ushered in the era of practicality. Shorter tools were developed: The Functional Status Questionnaire consisted of 34 items [13], and the Dartmouth COOP Charts had 9 items [14]. These tools were developed with measurement priorities directed toward practical efficiency (for example, ease of administration and scoring), which was achieved at the expense of measurement precision [54, 55].
The most recent era of health measurement is that of psychometric efficiency, which has several underpinnings. First, the outcomes movement gained momentum after Ellwood's Shattuck lecture was published [23] and the Agency for Health Care Policy and Research was established in 1989. Large-scale studies of patient-based outcomes were imminent. Second, burdened by study costs that spanned outcomes ranging from pathophysiology to quality of life, the clinical trials community sought more economical measures of health status. Third, concerns about respondent burden among severely ill patients encouraged shorter surveys.
The Medical Outcomes Study (MOS) Short Form (SF) 20 Survey [16, 56] was the first to surface. Its 20 items derived largely from the Health Insurance Experiment and tapped six health concepts. Next emerged the Duke Health Profile [17], a 17-item survey that was empirically derived from the original Duke-UNC Health Profile. The SF-36 [21] developed out of the SF-20 and the 149-item Functioning and Well-Being Profile, which measures 16 health concepts [19]. The SF-6 Survey, derived from the Functioning and Well-Being Profile, uses a single item to tap each of 6 health concepts [19]. The SF-12 Survey is an empirically derived short form of the SF-36 [22].
Over the past 30 years, we have greatly improved our measurement bandwidth in generic health assessment (the breadth of health dimensions measured). Many different health concepts are now measured across the armamentarium of generic tools, although specific surveys differ in bandwidth (for example, the Sickness Impact Profile measures 12 health concepts, whereas the McMaster Health Index Questionnaire measures just 3). However, many generic measures, even those with excellent bandwidth, still have problems of fidelity (that is, thoroughness and depth of measurement). Thus, although we now quantify many different dimensions of health, we often do so at the expense of precision. Overall, many generic tools lack the precision required for effective health care decision making. Precision is conceptualized here and elsewhere [57] as a property of a measure that encompasses both the range or depth of measurement and the number of distinct levels enumerated by a scale (fineness of specification).
Prevailing Measurement Paradigm
First, fixed-length health surveys tend to bore healthier respondents (because they have to wade through items that are easy for them to do, such as bathing) and frustrate more impaired respondents (because they have to respond to items that are clearly impossible for them to do, such as running one block). Such complaints about generic surveys are common from respondents. Respondents do not object to survey length itself; rather, they are frustrated by redundant items and items that to them are of low salience and relevance [58-60].
Second, because item selection is geared toward the middle of the road in content coverage and difficulty, the end points of the health continuum tend to be poorly defined. This yields ceiling effects for general populations and floor effects for more disabled populations. For many generic measures [54, 55, 61], score distributions are highly skewed, such that a plurality of respondents are classified as being in a state of "perfect" health at or near the ceiling of the scale. Very large ceiling effects (up to 70%) have been observed in general and primary care populations [41, 62-64]. Ceiling effects are more prevalent than floor effects because many generic tools represent health as the absence of limitations.
Score imprecision has two principal consequences. First, it is impossible to distinguish among persons at the ceiling or floor, even though they probably vary in the underlying construct. For example, as shown in Figure 2, 55% of patients participating in the MOS scored perfectly on the SF-6 measure of physical functioning [54], but 69% of those patients had less than a perfect score on the SF-36 physical functioning scale, a longer parallel-form measure. For health care managers and policy-makers, ceiling effects paint a more favorable image of population health than is true. For researchers, ceiling effects produce type II errors in hypothesis testing. For clinicians, ceiling effects yield false-negative outcomes (that is, sensitivity at the upper end of the scale is low). The second consequence is that it is impossible to measure decline in health over time for persons at the floor and improvement in health over time for persons at the ceiling. Thus, score distributions that are skewed at baseline underestimate or miss the effects of treatment or natural history on health status.
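As a minimal sketch of how floor and ceiling effects can be quantified, the following Python function reports the share of respondents who score at either end of a scale. The function name `floor_ceiling_share` and the example scores are hypothetical and are not drawn from the MOS or any other study cited here.

```python
from typing import Sequence, Tuple


def floor_ceiling_share(scores: Sequence[float], minimum: float,
                        maximum: float) -> Tuple[float, float]:
    """Return the fraction of respondents at the scale floor and at the ceiling."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    at_floor = sum(1 for s in scores if s <= minimum) / n
    at_ceiling = sum(1 for s in scores if s >= maximum) / n
    return at_floor, at_ceiling


# Hypothetical 0-100 scale scores for ten respondents (illustrative only).
example_scores = [100, 100, 85, 100, 60, 100, 0, 100, 95, 100]
floor_share, ceiling_share = floor_ceiling_share(example_scores, 0, 100)
print(f"floor: {floor_share:.0%}, ceiling: {ceiling_share:.0%}")
```

Respondents counted in either share cannot be distinguished from one another by the scale, and those at the ceiling cannot register improvement at follow-up.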
Paradigm To Achieve Precision across Populations and Applications
A logical progression of generic health assessment, especially if routine inclusion of these data in administrative databases is desirable, is to move from paper-and-pencil surveys to computerized adaptive testing of health status. This method uses a computer (or a computerized telephone interview) to administer items to respondents and is adaptive in a literal sense because each "test" is tailored to the unique ability level of each respondent. Each person taking a computerized adaptive test is taking a different version of the test because items are administered on the basis of the respondent's previous answers (for example, whether one passes or fails items). As discussed below, item response theory allows all of the different forms of a test to connect to each other on the same metric or yardstick.
Computerized adaptive testing is becoming commonplace in knowledge-based testing. For example, academic admissions examinations, such as the Graduate Management Admission Test, are increasingly based on computerized adaptive testing. National nursing licensing examinations now use this method [67, 68], and other medical boards are moving in that direction [69-72]. The advantages of computerized adaptive testing for large-scale testing, whether it be for credentialing or health status assessment, are numerous and include improved test security, increased flexibility of test scheduling, reduction of testing time by one quarter to one half, and rapid scoring and feedback of results [68, 72, 73].
How would computerized adaptive testing work for generic health assessment? Once important preconditions are met (outlined below), the following computerized algorithms would be developed: starting rules; item selection procedures; answering rules, such as time limits; scoring rules; stopping rules; and report generation. The following example illustrates the use of computerized adaptive testing. A 63-year-old woman comes to a clinic for an annual examination. She is seated at a station with a large screen. The only skills she requires are knowing how to use the space bar, enter key, and tab key, each of which is large and is color-coded. She is prompted for her age and sex, which are used to select the starting point for her "physical functioning computerized adaptive test." Her starting question asks about walking the length of a city block. If she passes this item (that is, she can accomplish the task), the computer bypasses all of the easier items (such as walking across the room) and moves to more difficult questions (such as walking 1 mile). As the test proceeds, she is given questions that narrow in on her zone of physical functioning.
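The starting, item selection, scoring, and stopping rules described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an operational algorithm: it assumes a one-parameter (Rasch) model with pass/fail items, scores by a posterior mean over a grid with a standard normal prior, and uses an invented item bank whose difficulty values (in logits) are made up for the example. The names `ITEM_BANK`, `estimate`, and `run_cat` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical item bank: item text -> Rasch difficulty in logits (made-up values).
ITEM_BANK: Dict[str, float] = {
    "walk across a room": -3.0,
    "bathe without help": -2.0,
    "walk one city block": -1.0,
    "climb one flight of stairs": 0.0,
    "walk several blocks": 1.0,
    "walk 1 mile": 2.0,
    "run one block": 3.0,
}

GRID = [g / 10.0 for g in range(-60, 61)]  # ability grid from -6 to 6 logits


def p_pass(theta: float, difficulty: float) -> float:
    """Rasch probability that a person at theta can do an item of this difficulty."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def estimate(responses: List[Tuple[float, bool]]) -> Tuple[float, float]:
    """Scoring rule: posterior mean and SD of ability, standard normal prior (assumed)."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)
        for difficulty, passed in responses:
            p = p_pass(theta, difficulty)
            w *= p if passed else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(answer: Callable[[str], bool], start_theta: float = 0.0,
            target_se: float = 0.6, max_items: int = 5) -> Tuple[float, float, List[str]]:
    """Administer items until the target precision or the item limit is reached."""
    theta, se = start_theta, float("inf")  # starting rule
    remaining = dict(ITEM_BANK)
    responses: List[Tuple[float, bool]] = []
    asked: List[str] = []
    # Stopping rule: bank exhausted, item limit reached, or precision target met.
    while remaining and len(asked) < max_items and se > target_se:
        # Selection rule: the unused item whose difficulty is closest to the
        # current estimate (the most informative item under the Rasch model).
        item = min(remaining, key=lambda name: abs(remaining[name] - theta))
        difficulty = remaining.pop(item)
        responses.append((difficulty, answer(item)))
        asked.append(item)
        theta, se = estimate(responses)
    return theta, se, asked


# Simulated respondent who can do every task easier than 1.5 logits.
theta, se, asked = run_cat(lambda item: ITEM_BANK[item] < 1.5, start_theta=-1.0)
print(asked, round(theta, 2), round(se, 2))
```

In this sketch the starting point of -1.0 stands in for a value chosen from age and sex, so the first question is the city-block item. Because the simulated respondent passes it, the easiest item in the bank is never administered.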
With computerized adaptive testing, the degree of precision obtained for a person depends on the uses to which the data will be put. For example, a clinician may want high precision when the consequences of assessment include nursing home placement. The value of computerized adaptive testing is that precision is determined by the user, not by the test or by the population. Use of computerized adaptive testing for generic health assessment could reduce the human capital involved in administering health questionnaires, challenge respondents at their ability level instead of boring or discouraging them, provide users with the exact amount of precision desired for each application, and provide "real-time" scores to users. This new era of measurement is probably 3 to 8 years away, but it is possible.
Development of a computerized adaptive testing strategy would require three phases of methodologic work. The first task would be to assemble item banks on different generic health concepts. An item bank consists of many questionnaire items that are matched to a given concept or task [74, 75]. Items would be assembled from existing measures. The language and structure of some items would have to be modernized, and the reading level of other items would need to be decreased. An appropriate rating scale would need to be selected to maximize reliable variance and minimize respondent burden and response invalidity.
The second task would be to conduct cognitive interviews with various patient groups to obtain in-depth information on the respondents' understanding and acceptance of the revised items. Previous research that used cognitive interviews showed numerous problems with the extent to which respondents understand health questions and the manner in which they answer them; these problems can compromise the validity of the obtained data [76, 77]. These interviews could also obtain input from respondents on gaps in item content.
The third methodologic task would be to use techniques subsumed under item response theory to evaluate the caliber of each banked item and its location (that is, relative difficulty) on the underlying trait (for example, physical functioning) and to build efficient tests from the newly evaluated item banks. Item response theory is both a theoretical framework and a collection of quantitative techniques used to construct tests, scale responses, and equate scores, as well as to identify item bias and facilitate computerized adaptive testing. Item response theory is increasingly embraced as an alternative to classic test theory, the theory under which many generic and disease-specific tools have been evaluated [74, 78].
There are several important differences between item response theory and classic test theory. First, quantitative indexes of psychometric performance (such as validity or reliability) derived under classic test theory are not intrinsically generalizable across populations (younger compared with older persons or sick compared with healthy persons), applications (cross-sectional compared with longitudinal studies), or testing situations (mail compared with telephone administration or completion at home compared with in the clinic), but most users naively assume that they are. This lack of generalizability undermines the integration of generic health data in large databases that span different patient groups and different testing situations. Second, as its name implies, classic test theory is test driven rather than item driven. Different tests cannot be placed along a common metric. Each test is its own separate yardstick: the tests occupy different planes of a space rather than different spots on a common, underlying continuum. Thus, a large database that contains the Sickness Impact Profile cannot be compared with one that contains the SF-20.
Because it is item focused, item response theory goes beyond the weak assumptions and the "boundedness" of classic test theory. The unit of analysis in item response theory is the item and, more specifically, its location on the underlying, continuous trait of interest (the latent trait). An important feature of item response theory is that it synergistically analyzes respondent ability and item difficulty and mathematically places them together on the same metric (unlike other measurement theories, which divorce ability from item difficulty). The item response theory model provides the empirical link between individual responses (observable performance) and the latent trait (unmeasured construct) and estimates a score for the respondent on the underlying trait and a difficulty estimate for the item on the underlying trait. Various mathematical models (for example, one-, two-, and three-parameter models and binary and polytomous response models), each with different assumptions, can be used to estimate the association between person ability and item difficulty.
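For binary items, the one-, two-, and three-parameter models are commonly written in the logistic form below. The notation is conventional and is an assumption here rather than taken from this paper: $\theta_j$ is the ability of person $j$, $b_i$ the difficulty of item $i$, $a_i$ its discrimination, and $c_i$ its lower asymptote.

$$
P(X_{ij} = 1 \mid \theta_j) = c_i + (1 - c_i)\,\frac{1}{1 + \exp\left[-a_i(\theta_j - b_i)\right]}
$$

The two-parameter model sets $c_i = 0$, and the one-parameter model additionally fixes $a_i$ at a common value (often 1) for every item. Because $\theta_j$ and $b_i$ enter only through their difference, persons and items are located on the same scale.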
The strengths of item response theory for advances in generic health assessment are twofold. First, the theory is a powerful technique for understanding the structure, order, and interrelations of items. For computerized adaptive testing, extensive item response theory modeling would be conducted on the item banks. Each item bank would have to be large enough to accommodate the fact that some items would perform poorly in terms of empirical tests and would have to be discarded or revised. Modeling according to item response theory would identify gaps in item difficulty and content coverage on the underlying trait. Items could then be developed to fill in these gaps; this process would reduce skewed score distributions and make headway toward equiprecise measurement. Second, empirical parameters of respondent ability and item performance would then be used to develop efficient computerized adaptive testing algorithms and to equate different forms of a given computerized adaptive test with each other. It is at this point that widespread computerized adaptive testing becomes a reality and that different users of the same item bank can speak a shared language through test equating.
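One simple way to look for gaps in difficulty coverage, once items have been calibrated, is to sort the difficulty estimates and flag large jumps between neighbors. The sketch below assumes difficulties expressed in logits; the function name `difficulty_gaps`, the 1-logit threshold, and the example values are all hypothetical.

```python
from typing import List, Tuple


def difficulty_gaps(difficulties: List[float],
                    max_gap: float = 1.0) -> List[Tuple[float, float]]:
    """Return adjacent pairs of calibrated difficulties separated by more than max_gap."""
    ordered = sorted(difficulties)
    return [(lo, hi) for lo, hi in zip(ordered, ordered[1:]) if hi - lo > max_gap]


# Made-up calibrations for a six-item bank; the upper range is thinly covered.
bank = [-2.0, -1.5, -1.0, 0.0, 0.5, 2.5]
print(difficulty_gaps(bank))
```

Each flagged interval marks a region of the trait where new items would be needed to move the bank toward equiprecise measurement.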
Item response theory methods have been used in rehabilitation medicine [79-81], mental health [82, 83], and disease-specific instruments [84-89]. These methods have been used more to validate tests [90-94] than to construct instruments or score scales. If item response theory and computerized adaptive testing seem so promising, why haven't they seen greater application in generic health measurement to date? First, item response theory requires measurement of unidimensional concepts; not all generic concepts meet this criterion. For example, multifactorial mental health scales that tap anxiety, depression, and psychological well-being would not satisfy unidimensionality criteria. Second, item response theory rests on an ordered continuum of items, which differs greatly from many generic scales constructed in the Likert tradition. Third, fitting of an item response theory model, which is an essential a priori task for computerized adaptive testing, requires iterative modeling and sensitivity tests. It is a very time-consuming method that requires highly specialized skills. Fourth, item response theory is a large-sample method, requiring hundreds of participants for modeling. Fifth, computerized adaptive testing requires large item banks, the construction of which is a substantial methodologic task. Finally, computerized adaptive testing would yield truly generic surveys. Scientists and users alike would need to support "no-name" tools, which may run counter to today's proprietary and trademarked instruments.
The most efficient way to proceed from paper-and-pencil surveys to a 21st century computerized adaptive testing paradigm would be to establish publicly supported measurement centers for different generic health concepts. Each center would be responsible for accomplishing the three broad agendas outlined above on a given health concept and for helping investigators to use generic adaptive tools and interpret the results. These centers would be clearinghouses for measurement, validation, calibration, and interpretation. With the support of the Department of Veterans Affairs, I am developing an item bank for physical health concepts. This project is a preliminary feasibility assessment of the use of item response theory models (and subsequently computerized adaptive testing) for measuring generic health status.
What To Do in the Meantime
For now, investigators will continue to grapple with the bandwidth-fidelity dilemma. The challenge is to match the focus of the study with generic concepts that are known or have been hypothesized to covary with it. Selection of generic concepts should be hypothesis driven [95]. Health concepts that have the greatest clinical, policy, and social bearing should be measured with the greatest precision possible. Because the fixed costs of data collection are high and survey response rates are inelastic (insensitive) to survey length [96, 97], it may be advantageous, when in doubt, to opt for more precise measurement. It will also be important to ascertain where a sample is likely to fall on the score distribution. Different generic measures vary in overall precision and in the location of their precision [57]. The power of hypothesis testing is enhanced when a suitable correspondence exists between a measure's precision position and where a given sample will be distributed at baseline and follow-up. As suggested above, the new era of computerized adaptive testing will facilitate the correspondence between respondent ability and item difficulty.
Individual-Patient Applications
Provision of reports on functional status and well-being to clinicians has not led to changes in practice style and has not improved patients' health outcomes [98-101]. These disappointing findings may be the consequence of using group-level tools for individual-patient assessment [55]. Group-level measures yield imprecise (as seen in large confidence intervals) and insensitive (as reflected by false-negative outcomes associated with large ceiling effects) scores for individual patients [55]. A major disadvantage of using group-level tools with individual patients is that the scores obtained are not easily interpretable: Scores between the lowest and highest possible values can be achieved by countless combinations of item responses. Item response theory, on the other hand, yields scores that can be more easily interpreted in terms of cause, and departures from the expected order can be determined. Item response theory also yields reliability estimates at the level of the individual person, which facilitates the assessment of longitudinal change in health.
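Person-level precision can be illustrated with the test information function. Under a one-parameter (Rasch) model, which is assumed here for simplicity, each item contributes p(1 - p) to the information at a given ability level, and the standard error of the ability estimate is approximately the reciprocal square root of the total. The form `MIDRANGE_FORM` and its difficulty values are hypothetical.

```python
import math
from typing import Sequence


def standard_error(theta: float, difficulties: Sequence[float]) -> float:
    """Approximate standard error of an ability estimate at theta under a Rasch model."""
    information = 0.0
    for b in difficulties:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        information += p * (1.0 - p)  # item information at theta
    return 1.0 / math.sqrt(information)


# Hypothetical fixed-length form whose items cluster in the middle of the trait.
MIDRANGE_FORM = [-1.0, -0.5, 0.0, 0.5, 1.0]
for theta in (-3.0, 0.0, 3.0):
    print(theta, round(standard_error(theta, MIDRANGE_FORM), 2))
```

With items concentrated in the middle of the continuum, the standard error is smallest for respondents of average ability and grows toward both extremes, which is the pattern that equiprecise measurement is meant to avoid.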
The Role of Item Response Theory-Based Computerized Adaptive Testing in Measuring Outcomes by Using Large Databases
Computerized adaptive testing-based measurement (generic and disease-specific) could yield effective information for health care decision making at many levels. Users would choose the precision desired for any given application, instead of being held hostage by the fixed parameters of existing instruments. Item response theory is the theoretical and mathematical glue that allows tests of different precision to be compared with each other. Successful item response theory modeling transforms scales that differ in precision into a common currency of ability (test equating). For example, a single-item measure of depression used in an employer survey could be equated to a somewhat longer measure used in a clinical trial, which in turn could be equated to an even longer measure used diagnostically in clinical practice. Item response theory-based computerized adaptive testing would enable the equating of scores into a common yardstick across tests, individuals, or time. The development of a shared language that goes beyond specific items to location on an ability scale would provide users tremendous flexibility in building and maintaining an outcomes capacity within and across different databases.
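As a hedged sketch of test equating, the following Python code applies one standard approach, true-score equating through test characteristic curves, to two forms built from the same calibrated bank. It assumes a one-parameter (Rasch) model; the forms `SHORT_FORM` and `LONG_FORM`, their difficulty values, and the function names are hypothetical. A raw score on the short form is converted to a location on the ability scale and then to the expected raw score on the long form.

```python
import math
from typing import Sequence


def expected_score(theta: float, difficulties: Sequence[float]) -> float:
    """Test characteristic curve: expected number of items passed at theta."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)


def theta_for_score(score: float, difficulties: Sequence[float],
                    lo: float = -8.0, hi: float = 8.0) -> float:
    """Invert the test characteristic curve by bisection on the interval [lo, hi]."""
    if not 0.0 < score < len(difficulties):
        raise ValueError("score must lie strictly between 0 and the number of items")
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if expected_score(mid, difficulties) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def equate(score_a: float, form_a: Sequence[float], form_b: Sequence[float]) -> float:
    """Map a raw score on form A to the equivalent expected raw score on form B."""
    return expected_score(theta_for_score(score_a, form_a), form_b)


# Two hypothetical forms calibrated on the same logit scale (made-up difficulties).
SHORT_FORM = [-1.0, 0.0, 1.0]
LONG_FORM = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(round(equate(1.5, SHORT_FORM, LONG_FORM), 3))
```

The intermediate ability value is the "common currency" described above: both forms are scored against it, so databases that hold different forms can still be compared.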
Conclusion
Author and Article Information
References
1. Breslow L. A quantitative approach to the World Health Organization definition of health: physical, mental and social well-being. Int J Epidemiol. 1972; 1:347-55.
2. Belloc NB, Breslow L, Hochstim JR. Measurement of physical health in a general population survey. Am J Epidemiol. 1971; 93:328-36.
3. Berkman PL. Measurement of mental health in a general population survey. Am J Epidemiol. 1971; 94:105-11.
4. Renne KS. Measurement of social health in a general population survey. Social Science Research. 1974; 3:25-44.
5. Dupuy HJ. The psychological section of the current health and nutrition examination survey. Proceedings of the Public Health Conference on Records and Statistics. U.S. Department of Health, Education, and Welfare publication no. (HRA) 74-1214. Washington, DC: US Gov Pr Office; 1973.
6. Bush JW, Chen MM, Patrick DL. Social indicators for health based on function status and prognosis. Proceedings of the American Statistical Association, Social Statistics Section. 1972:71-80.
7. Gilson BS, Gilson JS, Bergner M, Bobbitt RA, Kressel S, Pollard WE, et al. The sickness impact profile. Development of an outcome measure of health care. Am J Public Health. 1975; 65:1304-10.
8. Chambers LW, Sackett DL, Goldsmith CH, MacPherson AS, McAuley RG. Development and application of an index of social function. Health Serv Res. 1976; 11:430-41.
9. Brook RH, Ware JE Jr, Davies-Avery A, Stewart AL, Donald CA, Rogers WH, et al. Overview of adult health status measures fielded in RAND's health insurance study. Med Care. 1979; 17(7 Suppl):1-131.
10. Hunt SM, McEwen J. The development of a subjective health indicator. Sociol Health Illn. 1980; 2:231-46.
11. Parkerson GR Jr, Gelbach SH, Wagner EH, James SA, Clapp NE, Muhlbaier LH. The Duke-UNC Health Profile: an adult health status instrument for primary care. Med Care. 1981; 19:806-28.
12. Torrance GW, Boyle MH, Horwood SP. Application of multi-attribute utility theory to measure social preferences for health states. Operations Research. 1982; 30:1043-69.
13. Jette AM, Davies AR, Cleary PD, Calkins DR, Rubenstein LV, Fink A, et al. The Functional Status Questionnaire: reliability and validity when used in primary care. J Gen Intern Med. 1986; 1:143-9.
14. Nelson E, Wasson J, Kirk J, Keller A, Clark D, Dietrich A, et al. Assessment of function in routine clinical practice: description of the COOP Chart method and preliminary findings. J Chronic Dis. 1987; 40(Suppl 1):55S-69S.
15. Rosser R. A health index and output measure. In: Walker SR, Rosser RM, eds. Quality of Life: Assessment and Application. Lancaster, England: MTP Pr; 1987:138-60.
16. Stewart AL, Hays RD, Ware JE Jr. The MOS Short-Form General Health Survey. Reliability and validity in a patient population. Med Care. 1988; 26:724-35.
17. Parkerson GR Jr, Broadhead WE, Tse CK. The Duke Health Profile. A 17-item measure of health and dysfunction. Med Care. 1990; 28:1056-72.
18. EuroQOL: a new facility for the measurement of health-related quality of life. The EuroQOL Group. Health Policy. 1990; 16:199-208.
19. Stewart AL, Sherbourne CD, Hays RD, Wells KB, Nelson EC, Kamberg CJ, et al. Summary and discussion of MOS measures. In: Stewart AL, Ware JE, eds. Measuring Functioning and Well-Being: The Medical Outcomes Study Approach. Chapel Hill, NC: Duke Univ Pr; 1992:345-71.
20. Ware JE, Nelson EC, Sherbourne CD, Stewart AL. Preliminary tests of a 6-item general health survey. In: Stewart AL, Ware JE, eds. Measuring Functioning and Well-Being: The Medical Outcomes Study Approach. Chapel Hill, NC: Duke Univ Pr; 1992:291-303.
21. Ware JE Jr, Sherbourne CD. The MOS 36-item Short-Form Health Survey (SF-36): I. Conceptual framework and item selection. Med Care. 1992; 30:473-83.
22. Ware J Jr, Kosinski M, Keller SD. A 12-Item Short-Form Health Survey: construction of scales and preliminary tests of reliability and validity. Med Care. 1996; 34:220-33.
23. Ellwood PM. Shattuck lecture: outcomes management. A technology of patient experience. N Engl J Med. 1988; 318:1549-56.
24. Liang MH, Shadick N. Feasibility and utility of adding disease-specific outcome measures to administrative databases to improve disease management. Ann Intern Med. 1997; 127(8 pt 2):739-42.
25. World Health Organization. Constitution of the World Health Organization. New York: World Health Organization; 1947.
26. Karnofsky DA, Abelmann WH, Craver LF, Burchenal JH. The use of the nitrogen mustards in the palliative treatment of carcinoma: with particular reference to bronchogenic carcinoma. Cancer. 1948; 1:634-56.
27. Steinbrocker O, Traeger CH, Batterman RC. Therapeutic criteria in rheumatoid arthritis. JAMA. 1949; 140:659-62.
28. National Center for Health Statistics. Origin, program, and operation of the U.S. National Health Survey. Washington, DC: US Publ Health Serv; 1963.
29. Linder FE. National health survey. Science. 1958; 127:1275-80.
30. Andrews FM, Withey SB. Social Indicators of Well-Being: Americans' Perceptions of Life Quality. New York: Plenum Pr; 1976.
31. Campbell A, Converse PE, Rodgers WL. The Quality of American Life. New York: Russell Sage Foundation; 1976.
32. Bauer RA, ed. Social Indicators. Cambridge, MA: MIT Pr; 1966.
33. Sanders BD. Measuring community health levels. Am J Public Health. 1964; 54:1063-70.
34. Sullivan DF. A single index of mortality and morbidity. HSMHA Health Rep. 1971; 86:347-54.
35. Chen MK. The gross national health product: a proposed population health index. Public Health Rep. 1979; 94:119-23.
36. Hochstim JR. A critical comparison of three strategies of collecting data from households. Journal of the American Statistical Association. 1967; 62:976-89.
37. Patrick DL, Bush JW, Chen MM. Toward an operational definition of health. J Health Soc Behav. 1973; 14:6-21.
38. Bergner M, Bobbitt RA, Carter WB, Gilson BS. The Sickness Impact Profile: development and final revision of a health status measure. Med Care. 1981; 19:787-805.
39. Chambers LW. The McMaster Health Index Questionnaire: an update. In: Walker SR, Rosser RM, eds. Quality of Life: Assessment and Application. Lancaster, England: MTP Pr; 1988:113-31.
40. Ware JE, Karmos AH. Development and Validation of Scales to Measure Perceived Health and Patient Role Propensity. Carbondale, IL: Southern Illinois Univ; 1976.
41. Hunt SM, McEwen J, McKenna SP. Measuring Health Status. Dover, NH: Croom Helm; 1986.
42. Ott CR, Sivarajan ES, Newton KM, Almes MJ, Bruce RA, Bergner M, et al. A controlled randomized study of early cardiac rehabilitation: the Sickness Impact Profile as an assessment tool. Heart Lung. 1983; 12:162-70.
43. Toevs CD, Kaplan RM, Atkins CJ. The costs and effects of behavioral programs in chronic obstructive pulmonary disease. Med Care. 1984; 22:1088-100.
44. Bombardier C, Ware J, Russell IJ, Larson M, Chalmers A, Read JL. Auranofin therapy and quality of life in patients with rheumatoid arthritis. Results of a multicenter trial. Am J Med. 1986; 81:565-78.
45. McCusker J, Stoddard AM. Use of a surrogate for the Sickness Impact Profile. Med Care. 1984; 22:789-95.
46. Deyo RA. Pitfalls in measuring the health status of Mexican Americans: comparative validity of the English and Spanish Sickness Impact Profile. Am J Public Health. 1984; 74:569-73.
47. Patrick DL, Sittampalam Y, Somerville SM, Carter WB, Bergner M. A cross-cultural comparison of health state values. Am J Public Health. 1985; 75:1402-7.
48. Balaban DJ, Sagi PC, Goldfarb NI, Nettler S. Weights for scoring the quality of well-being instrument among rheumatoid arthritics. A comparison to general population weights. Med Care. 1986; 24:973-80.
49. Jachuck SJ, Brierly H, Jachuck S, Willcox PM. The effect of hypotensive drugs on the quality of life. J R Coll Gen Pract. 1982; 32:103-5.
50. Nelson E, Conger B, Douglass R, Gephart D, Kirk J, Page R, et al. Functional health status levels of primary care patients. JAMA. 1983; 249:3331-8.
51. Rubenstein LZ, Schairer C, Wieland GD, Kane R. Systematic biases in functional status assessment of elderly adults: effects of different data sources. J Gerontol. 1984; 39:686-91.
52. Engel GL. The need for a new medical model: a challenge for biomedicine. Science. 1977; 196:129-36.
53. Cluff LE. Chronic disease, function and the quality of care. J Chronic Dis. 1981; 34:299-304.
54. McHorney CA, Ware JE Jr, Rogers WR, Raczek A, Lu JF. The validity and relative precision of MOS short- and long-form health status scales and Dartmouth COOP charts. Results from the Medical Outcomes Study. Med Care. 1992; 30(5 Suppl):MS253-65.
55. McHorney CA, Tarlov AR. Individual-patient monitoring in clinical practice: are available health status surveys adequate? Qual Life Res. 1995; 4:293-306.
56. Stewart AL, Greenfield S, Hays RD, Wells K, Rogers WH, Berry SD, et al. Functional status and well-being of patients with chronic conditions. Results from the Medical Outcomes Study. JAMA. 1989; 262:907-13.
57. Kessler RC, Mroczek DK. Measuring the effects of medical interventions. Med Care. 1995; 33(4 Suppl):AS109-19.
58. Chen AL, Broadhead WE, Doe EA, Broyles WK. Patient acceptance of two health status measures: the Medical Outcomes Study Short-form General Health Survey and the Duke Health Profile. Fam Med. 1993; 25:536-9.
59. Beaton DE, Bombardier C, Hogg-Johnson SA. Measuring health in injured workers: a cross-sectional comparison of five generic health status instruments in workers with musculoskeletal injuries. Am J Ind Med. 1996; 29:618-31.
60. McHorney CA, Bricker DE, Wilson M, Martin J, Bukstein D, Thies S, et al. Barriers to practice-based functional health assessment: patient and physician perspectives. Scientific poster presented at the 14th Annual Meeting of the Association for Health Services Research, June 1997.
61. Essink-Bot ML, Krabbe PJ, van Agt HM, Bonsel GJ. NHP or SIP: a comparative study in renal insufficiency associated anemia. Qual Life Res. 1996; 5:91-100.
62. Brazier J, Jones N, Kind P. Testing the validity of the Euroqol and comparing it with the SF-36 health survey questionnaire. Qual Life Res. 1993; 2:169-80.
63. McHorney CA, Kosinski M, Ware JE Jr. Comparisons of the costs and quality of norms for the SF-36 health survey collected by mail versus telephone interview: results from a national survey. Med Care. 1994; 32:551-67.
64. Perneger TV, Leplege A, Etter JF, Rougemont A. Validation of a French-language version of the MOS 36-Item Short Form Health Survey (SF-36) in young healthy adults. J Clin Epidemiol. 1995; 48:1051-60.
65. Weiss DJ. Improving measurement quality and efficiency with adaptive testing. Applied Psychological Measurement. 1982; 6:473-92.
66. Weiss DJ. Adaptive testing by computer. J Consult Clin Psychol. 1985; 53:774-89.
67. Fields FA. Computerized adaptive testing for NCLEX-PN. J Pract Nurs. 1992; 42:8-10.
68. Bergstrom BA. Computerized adaptive testing for the national certification examination. Journal of the American Association of Nurse Anesthetists. 1996; 64:119-24.
69. Shea JA, Norcini JJ, Webster GD. An application of item response theory to certifying examinations in internal medicine. Eval Health Prof. 1988; 11:283-305.
70. Kelley PR, Schumacher CF. The Rasch model: its use by the National Board of Medical Examiners. Eval Health Prof. 1984; 7:443-54.
71. Kramer GA, DeMarais DR. Setting a standard on the pilot national board dental examination. Journal of Dental Education. 1992; 56:684-8.
72. Norcini J. Computers in physician licensure and certification: new methods of assessment. Journal of Educational Computing Research. 1994; 10:161-71.
73. Lunz ME, Deville CW. Validity of item selection: a comparison of automated computerized adaptive testing and manual paper and pencil examinations. Teaching and Learning in Medicine. 1996; 8:152-7.
74. Hambleton RK, Swaminathan H. Item Response Theory: Principles and Applications. Boston: Kluwer-Nijhoff; 1985.
75. Flaugher R. Item pools. In: Wainer H, ed. Computerized Adaptive Testing. Hillsdale, NJ: Lawrence Erlbaum Associates; 1990:41-63.
76. Jobe JB, Mingay DJ. Cognitive research improves questionnaires. Am J Public Health. 1989; 79:1053-5.
77. Jobe JB, Mingay DJ. Cognitive laboratory approach to designing questionnaires for surveys of the elderly. Public Health Rep. 1990; 105:518-24.
78. Hambleton RK, Swaminathan H, Rogers HJ. Fundamentals of Item Response Theory. Newbury Park, CA: Sage; 1991.
79. Ludlow LH, Haley SM, Gans BM. A hierarchical model of functional performance in rehabilitation medicine. Eval Health Prof. 1992; 15:59-74.
80. Silverstein B, Fisher WP, Kilgore KM, Harley JP, Harvey RF. Applying psychometric criteria to functional assessment in medical rehabilitation: II. Defining interval measures. Arch Phys Med Rehabil. 1992; 73:507-18.
81. Fisher AG. The assessment of IADL motor skills: an application of many-faceted Rasch analysis. Am J Occup Ther. 1993; 47:319-29.
82. Schaeffer NC. An application of item response theory to the measurement of depression. In: Clogg C, ed. Sociological Methodology. San Francisco: Jossey-Bass; 1988:271-307.
83. de Bonis M, Lebeaux MO, de Boeck P, Simon M, Pichot P. Measuring the severity of depression through a self report inventory. A comparison of logistic, factorial and implicit models. J Affect Disord. 1991; 22:55-64.
84. Gilbert FS. Development of a "Steps Questionnaire." J Stud Alcohol. 1991; 52:353-60.
85. McArthur DL, Cohen MJ, Schandler SL. Rasch analysis of functional assessment scales: an example using pain behaviors. Arch Phys Med Rehabil. 1991; 72:296-304.
86. McCown W, Johnson J. The Basic HIV Disease Knowledge Questionnaire: a Rasch-scaled instrument to measure essential HIV knowledge. Psychol Rep. 1991; 69:543-9.
87. Rosier MJ, Bishop J, Nolan T, Robertson CF, Carlin JB, Phelan PD. Measurement of functional severity of asthma in children. Am J Respir Crit Care Med. 1994; 149:1434-41.
88. Engberg A, Garde B, Kreiner S. Rasch analysis in the development of a rating scale for assessment of mobility after stroke. Acta Neurol Scand. 1995; 91:118-27.
89. Kopec JA, Esdaile JM, Abrahamowicz M, Abenhaim L, Wood-Dauphinee S, Lamping DL, et al. The Quebec Back Pain Disability Scale: conceptualization and development. J Clin Epidemiol. 1996; 49:151-61.
90. Haley SM, McHorney CA, Ware JE Jr. Evaluation of the MOS SF-36 physical functioning scale (PF-10): I. Unidimensionality and reproducibility of the Rasch item scale. J Clin Epidemiol. 1994; 47:671-84.
91. Teresi JA, Golden RR, Cross P, Gurland B, Kleinman M, Wilder D. Item bias in cognitive screening measures: comparisons of elderly white, Afro-American, Hispanic and high and low education subgroups. J Clin Epidemiol. 1995; 48:473-83.
92. Cella DF, Dineen K, Arnason B, Reder A, Webster KA, Karabatsos G, et al. Validation of the functional assessment of multiple sclerosis quality of life instrument. Neurology. 1996; 47:129-39.
93. Tennant A, Hillman M, Fear J, Pickering A, Chamberlain MA. Are we making the most of the Stanford Health Assessment Questionnaire? Br J Rheumatol. 1996; 35:574-8.
94. McHorney CA, Haley SM, Ware JE Jr. Evaluation of the MOS SF-36 Physical Functioning Scale (PF-10): II. Comparison of relative precision using Likert and Rasch scoring methods. J Clin Epidemiol. 1997; 50:451-61.
95. Cleary PD. Future directions of quality of life research. In: Spilker B, ed. Quality of Life and Pharmacoeconomics in Clinical Trials. 2d ed. Philadelphia: Lippincott-Raven; 1996:73-8.
96. Heberlein TA, Baumgartner R. Factors affecting response rates to mailed questionnaires: a quantitative analysis of the published literature. American Sociological Review. 1978; 43:447-61.
97. Dillman DA, Sinclair MD, Clark JR. Effects of questionnaire length, respondent-friendly design, and a difficult question on response rates for occupant-addressed census mail surveys. Public Opinion Quarterly. 1993; 57:289-304.
98. Rubenstein LV, Calkins DR, Young RT, Cleary PD, Fink A, Kosecoff J, et al. Improving patient function: a randomized trial of functional disability screening. Ann Intern Med. 1989; 111:836-42.
99. Kazis LE, Callahan LF, Meenan RF, Pincus T. Health status reports in the care of patients with rheumatoid arthritis. J Clin Epidemiol. 1990; 43:1243-53.
100. Calkins DR, Rubenstein LV, Cleary PD, Davies AR, Jette AM, Fink A, et al. Functional disability screening of ambulatory patients: a randomized controlled trial in a hospital-based group practice. J Gen Intern Med. 1994; 9:590-2.
101. Rubenstein LV, McCoy JM, Cope DW, Barrett PA, Hirsch SH, Messer KS, et al. Improving patient quality of life with feedback to physicians about functional status. J Gen Intern Med. 1995; 10:607-14.
102. Bice TW. Comments on health indicators: methodological perspectives. Int J Health Serv. 1976; 6:509-20.