Reliability and construct validity of the Chinese (Hong Kong) SF-36 for patients
in primary care
C L K Lam 林露娟
HK Pract 2003;25:468-475
Summary
Objective: To assess the internal and test-retest reliability, and
construct validity of the Chinese (Hong Kong) SF-36 for patients in primary care.
Design: Cross-sectional questionnaire survey by face-to-face interview, with
retest by telephone interview.
Subjects: 500 Chinese patients aged 18 or above attending a government
general outpatient primary care clinic in Hong Kong.
Main outcome measures: Internal reliability was measured by Cronbach's
alpha. Test-retest reliability was measured by the difference between test-retest
scores and intraclass correlation. Construct validity was assessed by the correlations
between the Chinese (Hong Kong) SF-36 scores and the Chinese COOP/WONCA Chart scores,
and the correlation between the Chinese (Hong Kong) SF-36 scores and the total number
of chronic diseases.
Results: Internal reliability coefficients of all the Chinese (Hong
Kong) SF-36 scales exceeded 0.7; there was no clinically important difference between
test-retest scores of the Chinese (Hong Kong) SF-36. The expected correlations were
observed between the Chinese (Hong Kong) SF-36 scores and the COOP/WONCA Chart scores.
There was a negative correlation between the total number of chronic diseases and
the scores of five scales of the Chinese (Hong Kong) SF-36.
Conclusion: The Chinese (Hong Kong) SF-36 was reliable for group
comparison and had good convergent and divergent construct validity for patients
in primary care.
Keywords: Quality of life, the SF-36, COOP/WONCA Charts, Chinese,
Validity, Reliability.
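Summary (translated from the Chinese)
Objective: To assess the internal and test-retest reliability, and the validity, of the Chinese (Hong Kong) translation of the SF-36.
Design: Cross-sectional questionnaire survey by face-to-face interview, with retest by telephone interview.
Subjects: 500 Chinese patients aged 18 or above who had attended a government general outpatient clinic.
Main outcome measures: Internal reliability was measured by Cronbach's alpha. Test-retest reliability was measured by the difference between test-retest scores and the intraclass correlation.
Validity was assessed by the correlations between the Chinese (Hong Kong) SF-36 scores and the Chinese COOP/WONCA scores, and by the correlation between the Chinese (Hong Kong) SF-36 scores and the total number of chronic diseases.
Results: The internal reliability coefficients of all scales of the Chinese (Hong Kong) SF-36 exceeded 0.7, and there was no clinically important difference between its test-retest scores. The expected correlations were found between the Chinese (Hong Kong)
SF-36 scores and the COOP/WONCA scores. The total number of chronic diseases correlated negatively with the scores of five domains of the Chinese (Hong Kong) SF-36.
Conclusion: In primary care, the Chinese (Hong Kong) SF-36 is reliable for group comparison and has good validity.
Keywords: Quality of life, SF-36, COOP/WONCA Charts, Chinese, validity, reliability.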
Introduction
The Chinese (Hong Kong) SF-36 is a Chinese translation of the MOS 36-item Short-form
Health Survey (SF-36) adapted to the Chinese population in Hong Kong. The SF-36
is a generic measure of health-related quality of life (HRQOL). It has eight scales:
physical functioning (PF); role-physical (RP), i.e. limitation of daily roles
due to physical problems; bodily pain (BP); general health (GH); vitality (VT);
social functioning (SF); role-emotional (RE), i.e. limitation of daily roles due
to emotional problems; and mental health (MH). Each scale is scored from 0 to 100, with
higher scores indicating better HRQOL.1 The Chinese (Hong Kong) version has been shown to be acceptable
and relevant to the Chinese in Hong Kong in an earlier study,2 and a
norm reference has been established for the general adult population in Hong Kong.3
The aim of this study was to assess the reliability and validity of the Chinese
(Hong Kong) SF-36 for patients in primary care. Reliability is defined as the degree
to which an instrument is free from errors.4,5 The level of reliability
determines the highest degree of validity possible but it does not automatically
imply validity. Validity means that the instrument really measures what it purports
to measure.4,6 These are the two most important properties of a measuring
tool, which must be confirmed before the instrument can be applied to the relevant
population.
There are two types of reliability. The first is scale internal reliability, which
is based on the theory that the result is likely to be an accurate representation
of the actual state if the results measured by different items are consistent.4,6
The other type of reliability is whether the measure gives reproducible results
on repeated measurements of the same condition.4,6 Earlier studies showed
that the internal reliability coefficients of the social functioning and general health scales
of the Chinese (Hong Kong) SF-36 were just short of the generally expected standard
of 0.7.2,3 This study further assessed the internal reliability and
determined the test-retest reliability of the Chinese (Hong Kong) SF-36 for Chinese
patients in primary care.
Ideally, the validity of a measure should be compared to a gold standard (criterion),
but this is not available for HRQOL measurement. In the absence of a gold standard,
the best that one can test is construct validity.4,7,8 A construct is
an abstract variable that is constructed to reflect a hypothesis on how measurable
variables will correlate with one another.6,9 There are three steps in
the testing of construct validity: the first is the construction of the
domain of variables; the second is the establishment of the internal structure of
observed variables, and the third is the verification of the hypothesised correlation
between the theoretical construct and other external criteria.4,7,10,11
The construct of the SF-36 is the eight domains of HRQOL measured by eight scales.
The construct validity of the internal structure of the observed variables of the
scales has been confirmed in an earlier study.2 This study sought
to verify the correlations between the SF-36 scale scores and external criteria.
Methods and subjects
Chinese patients aged 18 or above attending a government general outpatient clinic
in Hong Kong were randomly selected by a pre-determined random number table matching
the appointment number of the patient for the particular clinic session. Each eligible
patient was invited to be interviewed face-to-face by a trained interviewer with
the Chinese (Hong Kong) SF-36, the Chinese COOP/WONCA Charts, and a structured questionnaire
on sociodemography and the presence of chronic diseases. A copy of the Chinese SF-36,
the COOP/WONCA questions without the illustrations and the questionnaire can be
obtained from the author upon written request.
The COOP/WONCA Charts are an HRQOL measure that assesses six domains (physical fitness,
feelings, daily activities, social activities, change in health and overall health)
with six single-item charts.12,13 Each chart is rated on a five-point
scale with higher scores indicating worse HRQOL. The Charts have been translated and shown
to be valid, reliable and sensitive in Chinese patients in primary care.14,16
Chronic morbidity was measured by the total number and diagnosis of self-reported
chronic diseases. Each subject was asked if he/she had ever been diagnosed, for more
than one month, by a registered medical practitioner as having hypertension, diabetes
mellitus, heart disease of any kind, stroke, chronic pulmonary disease (asthma or
other chronic respiratory problems), chronic joint problem, psychological illness
or any other chronic disease. The total number of chronic diseases was calculated
by the summation of the number of positive responses to these questions.
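The counting rule described above can be sketched as follows. This is a minimal illustration that assumes each subject's answers are stored as yes/no values; the question keys and the example subject are hypothetical and are not taken from the study questionnaire.

```python
# Hypothetical keys, one per chronic disease question listed in the text.
CHRONIC_DISEASE_QUESTIONS = [
    "hypertension",
    "diabetes_mellitus",
    "heart_disease",
    "stroke",
    "chronic_pulmonary_disease",
    "chronic_joint_problem",
    "psychological_illness",
    "other_chronic_disease",
]


def count_chronic_diseases(responses):
    """Return the number of positive (True) responses; missing keys count as 'no'."""
    return sum(1 for question in CHRONIC_DISEASE_QUESTIONS
               if responses.get(question, False))


# Hypothetical subject reporting hypertension and a chronic joint problem.
example_subject = {"hypertension": True, "chronic_joint_problem": True, "stroke": False}
total_chronic_diseases = count_chronic_diseases(example_subject)
print(total_chronic_diseases)
```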
Five hundred and three eligible patients were sampled but three patients refused
to be interviewed. Five hundred (99.4%) subjects completed the initial interview.
Subjects were interviewed by telephone with the Chinese (Hong Kong) SF-36 again
within one week from the first interview to assess test-retest reliability. Three
hundred and sixty-two (72.4%) of those who had the first interview completed the
second one. The characteristics of all the subjects and those who completed both
the first and second interviews are shown in Table 1. There was
no significant difference between the two samples.
Data analysis and hypotheses
The responses to the Chinese (Hong Kong) SF-36 were re-coded and the scale scores
were calculated by the standard algorithm described in the SF-36 Manual.17
The distribution of the Chinese COOP/WONCA Chart scores was determined as the
proportion of subjects at each score.
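The SF-36 scale scoring referred to above ends with a linear transformation of each raw scale score to a 0 to 100 range. The sketch below shows only that final step and assumes the items have already been re-coded as the SF-36 Manual specifies; the function name and the example responses are hypothetical.

```python
def transform_scale(raw_score, lowest_raw, highest_raw):
    """Rescale a raw scale score to 0-100, where a higher score means better HRQOL."""
    raw_range = highest_raw - lowest_raw
    return (raw_score - lowest_raw) * 100.0 / raw_range


# Physical functioning (PF) has 10 items, each scored 1 to 3,
# so the raw scale score runs from 10 to 30.
pf_items = [3, 3, 2, 3, 2, 3, 3, 3, 3, 3]  # hypothetical re-coded responses
pf_score = transform_scale(sum(pf_items), lowest_raw=10, highest_raw=30)
print(pf_score)
```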
The internal reliability of the Chinese (Hong Kong) SF-36 was measured by Cronbach's
alpha and 0.7 or above was used as the standard for group comparison.5,18,19
Test-retest reliability was assessed by the difference between the test and retest scores,
the statistical significance of which was analysed by paired-samples t tests.
It was hypothesised that the difference should not be statistically significant,
and 95% of the test-retest differences should be within 2 standard deviations (SD)
of the mean differences if the measure was reproducible.4,20 Test-retest
reliability was further assessed by intraclass correlation (ICC), which measures
the average similarity of subjects' actual scores on the two ratings, and 0.7 or
above is the desirable standard for group evaluation.4,5
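As an illustration of the internal reliability statistic named above, Cronbach's alpha for a scale of k items can be computed from the item variances and the variance of the summed score. The sketch below uses only the Python standard library; the data are hypothetical and the study itself used SPSS.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores has one row per subject and one column per item."""
    k = len(item_scores[0])
    columns = list(zip(*item_scores))
    sum_item_variances = sum(variance(column) for column in columns)
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


# Hypothetical responses of five subjects to a three-item scale.
responses = [
    [1, 2, 2],
    [2, 2, 3],
    [3, 3, 3],
    [4, 4, 5],
    [5, 4, 5],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3), alpha >= 0.7)  # compare with the 0.7 standard for group comparison
```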
The Chinese COOP/WONCA Chart scores and the total number of chronic diseases were
used as external criteria for testing the construct validity of the Chinese (Hong
Kong) SF-36.
It was hypothesised that there should be significant correlations (convergent validity)
between related domains of the Chinese (Hong Kong) SF-36 and the Chinese COOP/WONCA
Charts: the SF-36 physical functioning (PF) score should correlate with the COOP/WONCA
physical fitness score; the SF-36 role-physical and role-emotional (RP and RE) scores
should correlate with the COOP/WONCA daily activities score; the SF-36 social functioning
(SF) score should correlate with the COOP/WONCA social activities score; the SF-36
general health (GH) score should correlate with the COOP/WONCA overall health score;
and the SF-36 mental health (MH) score should correlate with the COOP/WONCA feelings
score. A review by McDowell et al showed that 0.4 was generally accepted
as the minimal standard for convergent validity.4,21 On the other hand,
there should not be any significant correlation (divergent validity) between scores
of unrelated domains: the SF-36 PF score should not be related to the COOP/WONCA
feelings score, and the SF-36 MH score should not be related to the COOP/WONCA physical
fitness score. The correlations between the Chinese (Hong Kong) SF-36 and Chinese
COOP/WONCA Chart scores were measured by Spearman's rho correlations.
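A minimal sketch of the Spearman's rho calculation used for these hypotheses is given below, written with the standard library only. It ranks both variables (tied values share the average rank) and takes the Pearson correlation of the ranks. The scores are hypothetical. Because higher SF-36 scores mean better HRQOL and higher COOP/WONCA scores mean worse HRQOL, related domains are expected to give a negative rho.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda index: values[index])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / (spread_x * spread_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical SF-36 PF scores and COOP/WONCA physical fitness ratings (1 = best).
sf36_pf = [95, 80, 60, 45, 30, 85]
coop_physical_fitness = [1, 2, 3, 4, 5, 2]
rho = spearman_rho(sf36_pf, coop_physical_fitness)
print(round(rho, 3), abs(rho) >= 0.4)  # compare with the 0.4 convergent validity standard
```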
It was hypothesised that patients with chronic diseases should have worse HRQOL
than those without any chronic disease. Two-sample t tests were used to test the
statistical significance of the differences in SF-36 scores between the two groups. Furthermore,
there should be negative correlations between the Chinese (Hong Kong) SF-36 scores
and the total number of chronic diseases, which were measured by Pearson correlations.
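The hypothesised negative relationship between an SF-36 scale score and the total number of chronic diseases can be illustrated with a Pearson correlation. The sketch below is written with the standard library only, and the seven subjects are hypothetical.

```python
def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / (spread_x * spread_y) ** 0.5


# Hypothetical data: number of chronic diseases and SF-36 PF score per subject.
number_of_chronic_diseases = [0, 0, 1, 1, 2, 3, 4]
pf_scores = [95, 90, 85, 80, 70, 60, 50]
r = pearson_r(number_of_chronic_diseases, pf_scores)
print(round(r, 3))  # a negative value is in the hypothesised direction
```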
All data analyses were carried out with the SPSS Programme for Windows 11.0 (SPSS
Inc, 2002).
Results
One of the 500 subjects did not answer question 11d (item GH5) of the Chinese (Hong
Kong) SF-36. There were no missing or out-of-range data from the Chinese COOP/WONCA
Charts. The distribution of the Chinese COOP/WONCA Chart scores and the mean Chinese
(Hong Kong) SF-36 scores of the sample are shown in Table 2. Scores
of the COOP/WONCA Charts are presented in proportions because they are categorical.
Reliability of the Chinese (Hong Kong) SF-36
The internal reliability (Cronbach's alpha) and test-retest reliability of the eight
SF-36 scales are shown in Table 3. The internal reliability was
above the standard of 0.7 for all scales including the social functioning scale.
The differences between the test and retest scores were all less than five points;
a statistically significant difference was found only in the bodily pain (BP) and
social functioning (SF) scales. The proportions of differences that were within
2 SD of the mean difference were near 95% for all but the RP scale. Intraclass correlations
were above 0.7 for six scales; the value was just short of this standard for the role-emotional
(RE) scale and below 0.5 for the social functioning (SF) scale.
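The two test-retest checks reported above can be sketched as follows. The paper does not state which form of intraclass correlation was computed, so the one-way random-effects form for two ratings per subject is used here purely as an assumption. The scores are hypothetical.

```python
from statistics import mean, stdev


def proportion_within_2sd(test, retest):
    """Share of test-retest differences lying within 2 SD of the mean difference."""
    differences = [second - first for first, second in zip(test, retest)]
    centre, spread = mean(differences), stdev(differences)
    inside = [d for d in differences if abs(d - centre) <= 2 * spread]
    return len(inside) / len(differences)


def icc_one_way(test, retest):
    """One-way random-effects ICC for two ratings per subject (assumed form)."""
    n, k = len(test), 2
    subject_means = [(first + second) / 2 for first, second in zip(test, retest)]
    grand_mean = mean(subject_means)
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum((first - m) ** 2 + (second - m) ** 2
                    for first, second, m in zip(test, retest, subject_means)) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical scale scores of eight subjects at the first and second interview.
test_scores = [80, 65, 90, 50, 70, 85, 60, 75]
retest_scores = [78, 70, 88, 55, 72, 80, 62, 74]
within_2sd = proportion_within_2sd(test_scores, retest_scores)
icc = icc_one_way(test_scores, retest_scores)
print(within_2sd, round(icc, 3), icc >= 0.7)
```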
Construct validity of the Chinese (Hong Kong) SF-36
Table 4 shows the Spearman's correlations between the Chinese (Hong
Kong) SF-36 scores and the Chinese COOP/WONCA Chart scores that were statistically
significant (p<0.05); correlations with an absolute value greater than 0.4 are shown in bold. The
expected direction of correlations between scores of related domains was negative
because higher SF-36 scores indicate better HRQOL but higher COOP/WONCA Chart scores
represent poorer HRQOL. There were strong correlations (absolute value >0.4) between the physical
functioning (PF) and physical fitness scores, the general health (GH) and overall
health scores, the vitality (VT) and overall health scores, the role-physical
(RP) and daily activities scores, the social functioning (SF) and social activities
scores, and the mental health (MH) and feelings scores. The role-emotional (RE)
score correlated weakly (r = -0.13 to -0.27) with all COOP/WONCA Chart scores except
the physical fitness score. There was no significant correlation between the PF
and feelings scores or between the MH and physical fitness scores, supporting divergent
validity.
Table 5 compares the SF-36 scores of patients with and without
any chronic disease. The Hong Kong Chinese adult population mean and standard deviation
(SD) are also shown for comparison.3 The subjects' scores in the physical
health-related domains were generally lower than the general
population norms, but their scores in the mental health-related domains were higher. The
scores of patients with chronic diseases were lower than those of patients without
any chronic disease in five scales, and the differences were statistically significant
for physical functioning (PF), bodily pain (BP) and general health (GH). The social
functioning score of patients with chronic diseases was significantly higher than
that of patients without any chronic disease. There was a negative correlation
between the total number of chronic diseases and the physical functioning (PF),
bodily pain (BP), general health (GH), vitality (VT) and mental health (MH) scores,
but a significant positive correlation between the total number of chronic diseases
and the social functioning score (Table 6).
Discussion
The Cronbach's alpha coefficients for internal reliability of the SF-36 scales were all above
0.7, and those of the role-physical, bodily pain and social functioning scales exceeded
the standard of 0.9 for individual assessment. The results were better than those
found in an earlier study on patients in primary care,2 probably because
the sample of this study was larger and there was more variation between subjects.4
The Chinese (Hong Kong) SF-36 scores were generally reproducible on repeated measurements.
The direction of the change in the scores was not consistent among the different
scales, suggesting that there was no systematic bias and the variations were mostly
random. Although the differences in the BP and SF scores were statistically significant,
the effect sizes (mean score difference/SD of the first interview)
were less than 0.3, which is generally not considered to be clinically important.22-24
The intraclass correlation (ICC) of the role-emotional (RE) scale was just short
of 0.7, but many experts agree that 0.5 may be adequate for group assessments.4,5
However, the low ICC in the social functioning scale deserves further evaluation.
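The effect size used in this paragraph is the mean test-retest score difference divided by the SD of the first-interview scores. A minimal sketch with hypothetical scores:

```python
from statistics import mean, stdev


def effect_size(first_interview, second_interview):
    """Mean score difference divided by the SD of the first-interview scores."""
    mean_difference = mean(second_interview) - mean(first_interview)
    return mean_difference / stdev(first_interview)


# Hypothetical scale scores of eight subjects at the two interviews.
first_scores = [80, 65, 90, 50, 70, 85, 60, 75]
second_scores = [78, 70, 88, 55, 72, 80, 62, 74]
es = effect_size(first_scores, second_scores)
print(round(es, 3), abs(es) < 0.3)  # below 0.3 is taken here as not clinically important
```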
The repeat interview was carried out by telephone, which gave similar
results to those obtained by face-to-face interview, suggesting that the data collected
by these two methods can be pooled together. Lam et al also showed that
telephone interviews gave similar results on health service utilisation as those
found in the face-to-face household survey.25 These findings are important
because telephone interview is becoming a popular survey method in Hong Kong and
it is often used in combination with face-to-face interviews in the same study.
There may be a concern that subjects could remember their answers from the first interview
when the interview was repeated within one week, leading to falsely high test-retest
reliability. This was unlikely given the large number of questions that each subject
had to answer. Conversely, subjects' conditions may change if the test-retest interval
is too long, resulting in a falsely low reliability for a responsive measure. Most
experts recommend an interval of one to two weeks between interviews for assessing
test-retest reliability.4,26
Construct validity of the Chinese (Hong Kong) SF-36
The hypothesised correlations between the Chinese (Hong Kong) SF-36 scores and the
Chinese COOP/WONCA Chart scores were generally observed, confirming convergent and
divergent construct validity. As other studies have found, the COOP/WONCA daily
activities score correlated strongly with the role-physical (RP) score but only
moderately with the role-emotional (RE) score.27,28 The RE score correlated
significantly with the COOP/WONCA feelings score but not the physical fitness score,
supporting the construct validity of this scale in measuring role limitations related
to emotional rather than physical problems. The results support the construct of
RE and RP as two separate scales. The combination of these two SF-36 scales into
one single role functioning scale, as proposed by Fukuhara et al,29,30
may miss the limitations caused by emotional problems.
The hypothesised negative correlations between the Chinese (Hong Kong) SF-36 scores
and the total number of chronic diseases were found in only five of the eight scales.
It was unexpected that the role-physical, role-emotional and social functioning
scores of patients with chronic diseases were higher than those of patients without
any chronic disease. One possible explanation is that subjects without any chronic
disease who consulted the clinic were likely to have acute illnesses that interfered
with their daily or social activities.
Conclusion
The Chinese (Hong Kong) SF-36 has been shown to have good internal and test-retest
reliability among Chinese patients in primary care. There was little difference
between the results obtained by face-to-face and telephone interviews, suggesting
that data obtained by these two methods can be pooled together for analysis. The
construct validity of the Chinese (Hong Kong) SF-36 was confirmed by significant
correlations (convergent validity) with related domain scores and non-significant
correlations (divergent validity) with unrelated domain scores of the Chinese COOP/WONCA
Charts. There was a negative correlation between the total number of chronic diseases
and several scales of the Chinese (Hong Kong) SF-36, further supporting its construct
validity.
The Chinese (Hong Kong) SF-36 can be used to assess HRQOL of patients in primary
care in Hong Kong reliably and validly. The inclusion of HRQOL as an outcome measure
of the impact of illnesses and the effects of treatments can make health care more
patient-centred.
Acknowledgement
Parts of this paper have been submitted to the University of Hong Kong for the award
of the Doctor of Medicine degree.
Key messages
- The Chinese (Hong Kong) SF-36 is a health-related quality of life (HRQOL) measure.
- It has been shown to have good internal reliability and adequate test-retest reliability
for group assessment among Chinese patients in primary care.
- There was no clinically important difference between results obtained by face-to-face
and telephone interviews.
- It has been shown to have construct validity for Chinese patients in primary care.
- The Chinese (Hong Kong) SF-36 can be used to assess the patient-perceived effect of
an illness or treatment.
C L K Lam, MBBS, MICGP, FRCGP, FHKAM(Family Medicine)
Associate Professor,
Family Medicine Unit, The University of Hong Kong.
Correspondence to: Dr C L K Lam, 3/F, Ap Lei Chau Clinic, 161 Main Street,
Ap Lei Chau, Hong Kong.
References
1. Ware JE, Snow KK, Kosinski M, et al. SF-36 Health Survey Manual & Interpretation
Guide. Boston: The Health Institute, New England Medical Center; 1993.
2. Lam CLK, Gandek B, Ren XS, et al. Tests of scaling assumptions and construct validity
of the Chinese (HK) version of the SF-36 Health Survey. J Clin Epidemiol 1998;51:1139-1147.
3. Lam CLK, Lauder IJ, Lam TP, et al. Population based norming of the Chinese (HK)
version of the SF-36 Health Survey. HK Pract 1999;21:460-470.
4. McDowell I, Newell C. The theoretical and technical foundations of health measurement.
In: McDowell I, Newell C (Ed). Measuring Health - A Guide to Rating Scales and Questionnaires.
New York: Oxford University Press; 1996;10-46.
5. Nunnally JC, Bernstein IH. The Assessment of reliability. In: Nunnally JC, Bernstein
IH (Ed). Psychometric Theory. New York: McGraw-Hill, Inc; 1994;248-292.
6. Nunnally JC, Bernstein IH. Validity. In: Nunnally JC, Bernstein IH (Ed). Psychometric
Theory. New York: McGraw-Hill, Inc; 1994;83-113.
7. Guyatt GH, Jaeschke R, Feeny DH, et al. Measurements in Clinical Trials: Choosing
the Right Approach. In: Spilker B (Ed). Quality of Life and Pharmacoeconomics in
Clinical Trials. Philadelphia: Lippincott-Raven Publishers; 1996;41-48.
8. Muldoon MF, Barger SD, Flory JD, et al. What are quality of life measurements measuring?
BMJ 1998;316:542-545.
9. Ware JE, Keller SD. Interpreting General Health Measures. In: Spilker B (Ed). Quality
of Life and Pharmacoeconomics in Clinical Trials. Philadelphia: Lippincott-Raven
Publishers; 1996;445-460.
10. McHorney CA, Ware JE, Raczek AE. The MOS 36-Item Short Form Health Survey (SF-36),
II: Psychometric and clinical tests of validity in measuring physical and mental
health constructs. Med Care 1993;31:247-263.
11. Gandek B, Ware JE. Methods for validating and norming translations of health status
questionnaires: the IQOLA Project approach. J Clin Epidemiol 1998;51:953-959.
12. Scholten JHG, van Weel C. Functional status assessment in family practice: the Dartmouth
COOP functional health Assessment Charts/WONCA. Lelystad: Meditekst; 1992.
13. van Weel C, Kong-Zahn C, Touw-Otten FWMM, et al. Measuring Functional Status with
the COOP/WONCA Charts: A Manual. Groningen, The Netherlands: Northern Centre for
Health Care Research (NCH); 1995.
14. Lam CLK, van Weel C, Lauder IJ. Can the Dartmouth COOP/WONCA Charts be used to assess
the functional status of Chinese patients? Family Practice 1994;11:85-94.
15. Lam CLK, Lauder IJ. The impact of chronic diseases on the health-related quality
of life (HRQOL) of Chinese patients in primary care. Family Practice 2000;17:159-166.
16. Lam CLK, Lauder IJ, Lam DTP. How does a change in the administration method affect
the reliability of the COOP/WONCA Charts? Family Practice 1999;16:184-189.
17. Ware JE, Snow KK, Kosinski M, et al. Scoring the SF-36. In: Ware JE, et al. (Ed).
SF-36 Health Survey Manual & Interpretation Guide. Boston: The Health Institute,
New England Medical Center; 1993;6:1-6:22.
18. Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika
1951;16:297-334.
19. Bland JM, Altman DG. Statistics notes - Cronbach's alpha. BMJ 1997;314:572.
20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods
of clinical measurement. Lancet 1986;I:307-310.
21. Bullinger M, Anderson R, Cella D, et al. Developing and evaluating cross-cultural
instruments from minimum requirements to optimal models. Quality of Life Research
1993;2:451-459.
22. Norman GR, Sridhar FG, Walter SD, et al. The relation of distribution- and anchor-based
approaches in interpretation of changes in health related quality of life. Med Care
2001;39:1039-1047.
23. Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health
status. Med Care 1989;27:S178-S189.
24. Cohen J. The t test for means. In: Cohen J (Ed). Statistical Power Analysis for
the Behavioural Sciences. Hillsdale, New Jersey: Lawrence Erlbaum Associates; 1988;19-74.
25. Lam TH, Kleevans WL, Wong CM. Doctor-consultation in Hong Kong: a comparison between
findings of a telephone interview with general household survey. Community Medicine
1988;10:175-179.
26. Deyo RA, Diehr PD, Patrick DL. Reproducibility and responsiveness of health status
measures - Statistics and strategies for evaluation. Controlled Clinical Trials
1991;12:142s-158s.
27. van Weel C, Kong-Zahn C, Touw-Otten FWMM, et al. Validity. In: van Weel C et al.
(Ed). Measuring Functional Health Status with the COOP/WONCA Charts: A Manual. Groningen,
The Netherlands: Northern Centre of Health Care Research; 1995;12-15.
28. Siu AL, Ouslander JG, Osterweil D, et al. Change in self-reported functioning in
older persons entering a residential care facility. J Clin Epidemiol 1993;46:1093-1101.
29. Fukuhara S, Ware JE, Kosinski M, et al. Psychometric and clinical tests of validity
of the Japanese SF-36 Health Survey. J Clin Epidemiol 1998;51:1045-1053.
30. Fukuhara S, Bito S, Green J, et al. Translation, adaptation, and validation of the
SF-36 Health Survey for use in Japan. J Clin Epidemiol 1998;51:1037-1044.