The Hong Kong Practitioner

December 2008, Vol 30, No. 4

Original Articles

Psychometrics and population norm of the Chinese (HK) SF-36 Health Survey, Version 2

Elegance T P Lam 林定珮, Cindy L K Lam 林露娟, Yvonne Y C Lo 盧宛聰, Barbara Gandek

HK Pract 2008;30:189-197

Summary

Objective: To establish the psychometric properties and population norm of the Chinese (HK) SF-36 Health Survey version 2 (SF-36v2) for the adult population in Hong Kong (HK), to facilitate its application and interpretation.
Design: A cross-sectional random telephone survey of the general adult population.
Subjects: 2410 Chinese adults randomly selected from the general Chinese adult population in Hong Kong. The mean age of the subjects was 42.9 (S.D. 17.3) years, 48% were men and 38% had one or more chronic diseases.
Main outcome measures: Responses to the SF-36v2 Health Survey questions were extracted. Item-scale correlations, internal and test-retest reliabilities, and the factor structure of the SF-36v2 Health Survey scores were analysed. The SF-36v2 Health Survey scores were calculated by the standard algorithm to establish the population norm.
Results: All items had 100% scaling success, indicating discriminant validity. Internal consistency and test-retest reliabilities of all scales were good (coefficients 0.66 to 0.89). The hypothesized two-factor structure underlying the construction of the physical and mental health summary scales was confirmed. The psychometric properties of the SF-36v2 Health Survey were generally better than those of version 1. There were significant differences in the population norms between versions 1 and 2 of the Chinese (HK) SF-36 Health Survey, especially in the role-physical and role-emotional scales.
Conclusion: The Chinese (HK) SF-36v2 Health Survey is valid and reliable for measuring the health-related quality of life (HRQOL) of Chinese adults in Hong Kong, and a population norm is now available to support the interpretation of its scores.
Keywords: Quality of life, SF-36, Norm, Chinese, validity, reliability, psychometrics

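Summary (Chinese)

Objective: To determine the psychometric properties and norm of the Chinese (HK) SF-36 version 2 Health Survey for the adult population of Hong Kong, to facilitate its application and interpretation.
Design: A cross-sectional random-sampling telephone survey of the general adult population.
Subjects: 2410 people randomly sampled from the general Chinese adult population of Hong Kong. The mean age was 42.9 (S.D. 17.3) years; 48% were men and 38% had one or more chronic diseases.
Main outcome measures: Responses to the SF-36v2 Health Survey were collected. Item-scale correlations, internal and test-retest reliability, and the factor structure of the SF-36v2 were analysed. The SF-36v2 Health Survey scores were then calculated with the standard algorithm to derive the population norm.
Results: All items achieved 100% scaling success, indicating discriminant validity. Internal consistency and test-retest reliability were very good (coefficients 0.66 to 0.89). The hypothesized two-factor structure underlying the physical and mental health summary scores was confirmed. The SF-36v2 performed better than the SF-36v1. There were marked differences between the population norms of the SF-36v2 and the SF-36v1, especially in the role-emotional and role-physical scales.
Conclusion: The Chinese (HK) SF-36v2 Health Survey is reliable and valid and can accurately measure the quality of life of the Hong Kong adult population. Its population norm has now been established to support the interpretation of its scores.
Keywords: quality of life, SF-36, norm, Chinese, validity, reliability, psychometrics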

Introduction

The SF-36 Health Survey developed by Ware et al. is the most widely used health-related quality of life (HRQOL) measure in Hong Kong (HK) and worldwide.^1-3 The survey includes 35 items measuring HRQOL, which are summarized into eight multi-item scales, along with one item on health change. The first version of the SF-36 Health Survey (version 1) has been adapted and validated in more than 40 populations, with norm references available for 14 populations including Hong Kong.^{3, 4} Several weaknesses of version 1 were identified. The layout of the questions was inconsistent. Colloquial or double-negative wording was used. The psychometric performance of the role-physical (RP) and role-emotional (RE) scales (questions 4 and 5, respectively) was suboptimal because of their dichotomous (yes/no) responses. Finally, some respondents may find it difficult to distinguish between some of the six response options of question 9.^5-7 The original authors modified the SF-36 Health Survey to address these problems, and version 2.0 (SF-36v2 Health Survey) was produced in 1996.⁴

The changes included:

reformatting the layout of the questions and answers into a consistent horizontal format;
revising some of the wording of questions 3, 4, 5 and 9;
replacing the dichotomous response choices of the question 4 and 5 items with a 5-point frequency scale;
deleting the response option ‘a good bit of the time’ from question 9, changing it from a 6-point to a 5-point scale.

Most of the wording changes had already been incorporated into the Chinese (HK) SF-36 Health Survey during the version 1 translation process. The Chinese translations of all the statements and response options in the SF-36v2 Health Survey could also be taken from version 1, so re-translation was not necessary. The differences between the two versions of the Chinese (HK) SF-36 Health Survey are shown in the Appendix.

Studies in the United States, United Kingdom, Sweden and Australia have confirmed that the SF-36v2 Health Survey is superior to version 1. It has fewer missing data, lower floor and ceiling effects, better score precision and higher sensitivity in the role-functioning scales.^{4, 8-11} Version 2 is expected to replace version 1 of the SF-36 in the near future. Its psychometric properties and norm need to be established before it can be applied and interpreted properly in the Hong Kong population.

The population validation and norming study of the Chinese (HK) SF-36 Health Survey in Hong Kong also included the two role-functioning questions that were modified for version 2. This provided an opportunity to extract population data on the SF-36v2 Health Survey. The aim of this study was to determine the psychometric properties and population norm of the Chinese (HK) SF-36v2 Health Survey.

Methods

Sample

A total of 2410 Chinese adults randomly selected from the general Chinese adult population in HK were interviewed by telephone. The response rate was 84.4% (2410 of 2857 sampled). The mean age of the subjects was 42.9 years (range 18 to 88, S.D. 17.3), 48% were men and 38% had one or more chronic diseases. The sociodemographic characteristics of the subjects were comparable to those of the population census. Details of the sampling and survey methods are described in earlier papers.^{3, 12}

The first 240 subjects who agreed to a repeat survey were contacted 2 weeks later and asked the questions again, to determine test-retest reliability; 200 (83%) completed the retest.

Survey instruments

The survey instruments consisted of three parts:

the Chinese (HK) SF-36 Health Survey (version 1);
questions 4 (role-physical items) and 5 (role-emotional items) of the Chinese (HK) SF-36v2 Health Survey;
a structured questionnaire on sociodemography, morbidity and service utilization.

Trained interviewers administered the instruments in Cantonese.

Data analysis

All data analyses were carried out with SPSS for Windows 15.0. The level of statistical significance was set at p < 0.05.

The Chinese (HK) SF-36v2 Health Survey data were extracted from two sources. The first was the responses to questions 1, 3, 6, 7, 8, 9, 10 and 11 of version 1, which are the same as those of version 2. The second was the responses to SF-36v2 Health Survey questions 4 and 5. For question 9 items, the response ‘a good bit of the time’ was randomly recoded to one of the adjoining values, ‘most of the time’ or ‘some of the time’, for the calculation of the SF-36v2 vitality and mental health scale scores. Item response recoding and scale score calculations followed the standard methods described in the SF-36v2 Health Survey manual.¹⁰
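The random recoding of question 9 can be sketched as follows. This is a minimal illustration, not the SF-36v2 manual's scoring code. Three things are assumptions: the integer codes given to each response label, the equal 50/50 split between the two adjoining categories, and the helper name `recode_q9`.

```python
import random

# Assumed integer coding of question 9 responses (for illustration only):
# version 1 (6-point): 1=all, 2=most, 3=a good bit, 4=some, 5=a little, 6=none of the time
# version 2 (5-point): 1=all, 2=most, 3=some, 4=a little, 5=none of the time
V1_TO_V2 = {1: 1, 2: 2, 4: 3, 5: 4, 6: 5}
GOOD_BIT = 3  # 'a good bit of the time' under the assumed version 1 coding


def recode_q9(responses, seed=None):
    """Map version 1 question 9 codes onto the 5-point version 2 scale.

    'A good bit of the time' is assigned at random (assumed equal probability)
    to the adjoining 'most of the time' (2) or 'some of the time' (3).
    Missing responses (None) are passed through unchanged.
    """
    rng = random.Random(seed)  # seeded generator makes the recoding reproducible
    recoded = []
    for r in responses:
        if r is None:
            recoded.append(None)
        elif r == GOOD_BIT:
            recoded.append(rng.choice((2, 3)))
        else:
            recoded.append(V1_TO_V2[r])  # a KeyError flags an out-of-range code
    return recoded


print(recode_q9([1, 2, 3, 4, 5, 6, 3, None], seed=2008))
```

Fixing the seed keeps the recoded data set identical between runs. Any analysis that depends on this recoding can then be reproduced exactly.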

The construct validity of the Chinese (HK) SF-36v2 Health Survey scales was tested with item-scale Pearson correlations, to assess whether items:

were substantially correlated (r ≥ 0.4) with their hypothesized scale score;
had similar item-scale correlations and equal variances (standard deviations) within the same scale, justifying summation without weighting;
correlated significantly more highly (by more than two standard errors) with their hypothesized scale than with other scales. For each scale, the percentage of scaling successes on item discriminant validity, out of the total number of item-scale comparisons, was calculated.
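The convergent and discriminant checks above can be sketched in Python. This is an illustrative sketch under stated assumptions, not the software used in the study:

- `item_scale_checks` and `scale_sum` are hypothetical helpers.
- Items are assumed to be already recoded so that a higher value means better health.
- Raw item sums stand in for the 0-100 scale scores. Pearson correlations are unchanged by that linear transformation.
- The standard error of a correlation is approximated as 1/√n.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (both must vary)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def scale_sum(items, members, n, exclude=None):
    """Per-respondent sum of a scale's items, optionally leaving one item out."""
    return [sum(items[i][k] for i in members if i != exclude) for k in range(n)]


def item_scale_checks(items, scales, item, own_scale, min_r=0.4):
    """Convergent and discriminant checks for one item.

    items:  {item name: list of recoded responses, one per respondent}
    scales: {scale name: list of item names}
    Returns (r with own scale corrected for overlap, convergent ok,
             discriminant successes, number of comparisons).
    """
    n = len(items[item])
    se = 1 / sqrt(n)  # assumed approximation to the SE of a correlation
    # Correction for overlap: correlate the item with its own scale minus itself.
    r_own = pearson(items[item], scale_sum(items, scales[own_scale], n, exclude=item))
    successes = comparisons = 0
    for name, members in scales.items():
        if name == own_scale:
            continue
        r_other = pearson(items[item], scale_sum(items, members, n))
        comparisons += 1
        if r_own - r_other > 2 * se:  # higher by more than two standard errors
            successes += 1
    return r_own, r_own >= min_r, successes, comparisons
```

To get a scale's scaling success percentage, add up the successes and comparisons over all of its items. The result is 100 × successes / comparisons.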

Factor analysis with varimax rotation was carried out on the scale scores. Two principal components were extracted to test the hypothesized two-dimensional (physical and mental) structure of the SF-36v2 Health Survey. The two principal components should explain ≥ 60% of the total variance of the SF-36v2 Health Survey scores and ≥ 70% of the reliable variance of each scale score, as found in the US and other populations.^13 The pattern of correlations between the eight scales and the two rotated components was examined to determine the basis for interpreting the components as physical and mental summary measures.
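Both acceptance criteria can be checked from the rotated loadings. The sketch below rests on two definitions:

- The proportion of total variance explained is the mean communality (the sum of squared loadings on the two components) over the standardized scales.
- The proportion of reliable variance explained for a scale is assumed to be its communality divided by its reliability coefficient.

The function names and any loadings passed to them are illustrative, not values from Table 3.

```python
def variance_explained(loadings, reliabilities):
    """Return (share of total variance, {scale: share of reliable variance}).

    loadings:      {scale: (loading on component 1, loading on component 2)}
    reliabilities: {scale: reliability coefficient, e.g. Cronbach's alpha}
    """
    # Communality: variance of a standardized scale reproduced by both components.
    # It is unchanged by an orthogonal (e.g. varimax) rotation.
    communality = {s: a ** 2 + b ** 2 for s, (a, b) in loadings.items()}
    # With standardized scales, total variance equals the number of scales.
    total_share = sum(communality.values()) / len(communality)
    # Assumed definition: reliable variance explained = communality / reliability.
    reliable_share = {s: h2 / reliabilities[s] for s, h2 in communality.items()}
    return total_share, reliable_share


def failing_criteria(total_share, reliable_share, min_total=0.60, min_reliable=0.70):
    """List the criteria that are not met (an empty list means both are met)."""
    problems = []
    if total_share < min_total:
        problems.append(f"total variance {total_share:.0%} < {min_total:.0%}")
    for scale, share in reliable_share.items():
        if share < min_reliable:
            problems.append(f"{scale}: reliable variance {share:.0%} < {min_reliable:.0%}")
    return problems
```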

Internal consistency reliability of the scale scores was measured by Cronbach’s alpha coefficient. Test-retest reliability was assessed by the intra-class correlation coefficient (ICC). The recommended standard is 0.7 or greater for group comparisons.
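Both reliability coefficients can be computed from their textbook definitions. This is a minimal sketch with hypothetical function names. The source does not state which ICC model was used, so the one-way random-effects, single-measure form, ICC(1,1), is assumed here.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of k response lists over the same respondents."""
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]  # each respondent's raw scale sum
    sum_item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def icc_oneway(test, retest):
    """One-way random-effects ICC(1,1) for k = 2 administrations (assumed model)."""
    n, k = len(test), 2
    pairs = list(zip(test, retest))
    grand = mean(list(test) + list(retest))
    subject_means = [mean(p) for p in pairs]
    # Between-subjects and within-subject mean squares from a one-way ANOVA.
    bms = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    wms = sum((x - m) ** 2 for p, m in zip(pairs, subject_means) for x in p) / (n * (k - 1))
    return (bms - wms) / (bms + (k - 1) * wms)
```

If a two-way model was intended instead, such as absolute agreement between the two administrations, only the mean-square terms would change.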

Descriptive statistics of the SF-36v2 Health Survey scale scores were calculated for the whole sample and by age-sex group, to serve as the population norm reference. These were the mean, the standard deviation (SD) and the proportions of respondents at the ceiling and floor.
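The norm statistics can be sketched as below. The 0-100 transformation follows the usual SF-36 convention: a raw scale sum is rescaled between its lowest and highest possible values. The helper names and the (group, score) input layout are assumptions made for illustration.

```python
from statistics import mean, stdev


def transform_0_100(raw, lowest, highest):
    """Rescale a raw scale sum so the lowest possible score is 0 and the highest is 100."""
    return (raw - lowest) / (highest - lowest) * 100


def norm_summary(scores):
    """Mean, SD and floor/ceiling percentages of 0-100 scale scores."""
    n = len(scores)
    return {
        "n": n,
        "mean": mean(scores),
        "sd": stdev(scores),  # sample SD; needs at least two scores
        "floor_pct": 100 * sum(s == 0 for s in scores) / n,
        "ceiling_pct": 100 * sum(s == 100 for s in scores) / n,
    }


def norms_by_group(records):
    """records: iterable of (group label, score), e.g. (('F', '18-24'), 85.0)."""
    groups = {}
    for label, score in records:
        groups.setdefault(label, []).append(score)
    return {label: norm_summary(values) for label, values in groups.items()}
```

The floor and ceiling counts use exact equality. This works because the transformation maps the lowest and highest raw sums to exactly 0.0 and 100.0.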

Results

Validity and reliability of the scales

Table 1 shows the item-scale Pearson correlations between each item and the scales. The correlation between each item and its hypothesized scale (after correction for overlap) was >0.4 except for PF10 (0.38) and GH3 (0.32), supporting item internal consistency. Items of the same scale had similar item-scale correlations and standard deviations, supporting equal item weighting. The item-scale correlations of the SF-36v2 Health Survey RP items (0.75 to 0.78) were generally greater than those of version 1 (0.64 to 0.68). The same was found for the RE items (0.70 to 0.77 for version 2 vs. 0.62 to 0.71 for version 1). The item-scale correlations of the version 2 vitality (VT) and mental health (MH) items were similar to those of version 1, despite the change from a 6-point to a 5-point response scale. For all items, the correlation with the hypothesized scale was significantly higher than the correlations with the other scales. Scaling success on item discriminant validity was therefore perfect (100%) for all scales.

Cronbach’s alpha coefficients of internal consistency reliability were above 0.7 for all the Chinese (HK) SF-36v2 Health Survey scales except the general health (GH) scale (0.66). The GH value was the same in versions 1 and 2 (Table 2). The reliabilities of the RP and RE scales of version 2 were better than those of version 1 (0.89 for RP_v2 vs. 0.83 for RP_v1; 0.86 for RE_v2 vs. 0.82 for RE_v1). There was almost no difference in the internal reliability of the VT and MH scales between versions 1 and 2, despite the reduction in the number of response options in version 2. Intra-class correlation coefficients (ICC) measuring test-retest reliability were above 0.7 for all scales.

Validity of the two principal component factor structure

The results of the factor analysis with varimax rotation on the Chinese (HK) SF-36v2 Health Survey scale scores are shown in Table 3. Two principal component factors (physical and mental) with eigenvalues greater than 1.0 (3.61 for factor 1 and 1.01 for factor 2) were extracted from the eight scale scores. The two principal components explained 59% of the total variance of the SF-36v2 scores and 64 to 87% of the reliable variance of each individual scale. The correlations between the scale scores and the two factors were similar to those hypothesized and to those of version 1.¹⁴ Table 4 compares the population-specific factor coefficients of the Chinese (HK) SF-36v2 Health Survey with those of version 1 and with the US standard.

Population norm of the Chinese (HK) SF-36v2 Health Survey

Table 5 shows the distribution of the Chinese (HK) SF-36v2 Health Survey scale scores, compared with the corresponding values of version 1 where appropriate. The scores of the PF, BP, GH and SF scales were the same in versions 1 and 2 because their items and response options did not differ. There were significant differences in the mean and SD of the RP and RE scale scores between the two versions. The floor effect is the proportion of respondents with the lowest possible score. It fell markedly from version 1 to version 2: from 7.5% to 0.6% for the RP scale and from 16.4% to 0.3% for the RE scale. The ceiling effect is the proportion of respondents with the highest possible score. For these two role-functioning scales it improved slightly in version 2 but remained very large. Table 6 shows the population mean Chinese (HK) SF-36v2 Health Survey scores for all subjects and by age and sex group.

Discussion

The results confirmed that the Chinese (HK) SF-36v2 Health Survey satisfied all the scaling assumptions, with a scaling success rate of 100%. The correlation between item GH3 and the GH score was relatively low (0.32) in both versions. Interviewers reported that some respondents were unsure how to answer this item because they did not know the health status of others. This might also explain why the internal reliability of the GH scale did not reach the standard of 0.7.

The psychometric properties of version 2 were much better than those of version 1, especially for the two role-functioning (RP and RE) scales. Version 2 had higher internal reliability, smaller floor effects and smaller standard deviations, suggesting that it would be more sensitive and responsive. This illustrates the advantage of a 5-point response scale over a dichotomous one. The change from version 1 to 2 almost eliminated the floor effects, implying that the measure would be better able to detect deterioration in HRQOL. The ceiling effects of the SF-36v2 Health Survey were not much improved from those of version 1, as also found in population studies in the US and other countries.^{4, 8-10} High ceiling effects in the pain and role-functioning scales are intrinsic to general population studies because most subjects are healthy. In the normal population, it is more important for an HRQOL measure to have a low floor effect so that it can detect deterioration from ‘normal’, as the SF-36v2 Health Survey does. In patient populations, however, the ceiling effect should be low if the measure is to be used to assess the effectiveness of treatment. Further studies are required to determine the ceiling effect of the SF-36v2 Health Survey in patient populations.

The hypothesized two-principal-component factor structure was also confirmed; it is the conceptual basis of the SF-36 physical and mental summary (PCS and MCS) scores. The two components explained 59% of the total variance of the Chinese (HK) SF-36v2 Health Survey scores. This was slightly more than the 58% found for version 1¹⁴ and approached the expected standard of 60%. The two components explained 70% or more of the reliable variance of each scale score except BP (64%), VT (69%) and SF (65%), similar to the findings for version 1.¹⁴ Because the two-principal-factor structure was replicated, it is valid to sum the Chinese (HK) SF-36v2 Health Survey scale scores into physical and mental summary scores. The physical and mental factor coefficients are the weights used to calculate the SF-36 physical and mental summary scores. They were almost the same in versions 1 and 2 and comparable to the standard derived from the US population. A previous study confirmed the equivalence of the population-specific and the standard (US) summary scale scoring algorithms for the Chinese (HK) SF-36 Health Survey.¹⁴ The Chinese (HK) SF-36v2 Health Survey physical and mental summary (PCS and MCS) scales should therefore be scored by the standard algorithm for better international comparability.
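The standard summary-score algorithm referred to above works in three steps:

1. Standardize each scale score against the reference population.
2. Weight the standardized scores by the factor coefficients and sum them.
3. Rescale the sum to the T-score metric (mean 50, SD 10).

The sketch below shows that structure only; `summary_score` is a hypothetical helper. The reference means, SDs and coefficients must come from the scoring manual or from Table 4. None of the numbers used to exercise it are published values.

```python
SCALES = ("PF", "RP", "BP", "GH", "VT", "SF", "RE", "MH")


def summary_score(scores, norm_mean, norm_sd, coef):
    """Norm-based summary score (e.g. PCS or MCS) from the eight 0-100 scale scores.

    scores, norm_mean, norm_sd: {scale: value}
    coef: {scale: factor coefficient} for the summary measure being computed.
    """
    # 1. Standardize each scale against the reference population (z-score).
    # 2. Weight each z-score by its factor coefficient and sum.
    aggregate = sum(coef[s] * (scores[s] - norm_mean[s]) / norm_sd[s] for s in SCALES)
    # 3. Rescale the aggregate to a T-score (mean 50, SD 10).
    return 50 + 10 * aggregate
```

PCS and MCS come from the same function. The only difference is the set of coefficients passed as `coef`: the physical coefficients for PCS, the mental ones for MCS.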

The population mean RP and RE scale scores differed significantly between version 1 and version 2. The normative values of version 1 therefore cannot be used to interpret version 2 data. The total population mean scores and standard deviations shown in Table 6 should be used for norm-based scoring of the Chinese (HK) SF-36v2 Health Survey.^{14, 15} The data were collected nearly 10 years ago, but we believe the normative scores are still applicable. Studies in the US have shown that population mean SF-36 Health Survey scores remained very stable over 10 years, changing by less than 3% (3 points on a 100-point scale).¹⁵
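Norm-based scoring of a single scale uses the same linear transformation. In this sketch, `norm_based_score` is a hypothetical helper. Its mean and SD arguments stand for the Hong Kong population values in Table 6, which are not reproduced here, so the numbers in the usage lines are made up. A T-score of 50 means the respondent scored at the population mean, and each 10 points is one population SD.

```python
def norm_based_score(score, pop_mean, pop_sd):
    """Convert a 0-100 scale score to a T-score (population mean 50, SD 10)."""
    return 50 + 10 * (score - pop_mean) / pop_sd


# Hypothetical norm values for illustration only (not taken from Table 6).
HYPOTHETICAL_RP_NORM = (80.0, 20.0)
t = norm_based_score(60.0, *HYPOTHETICAL_RP_NORM)  # one SD below the mean
print(round(t, 1))  # prints 40.0
```

The version 1 and version 2 RP and RE means differ significantly. Version 2 scores should therefore be standardized only against version 2 norms.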

Limitation

The Chinese (HK) SF-36v2 Health Survey data presented in this paper came from two sources: answers to the relevant version 1 questions, and answers to the two SF-36v2 Health Survey role-functioning questions. The role-functioning questions were administered after the version 1 questions, which could have introduced an order effect on the responses. If resources are available, a general population survey using a stand-alone SF-36v2 Health Survey should be carried out to confirm the psychometric properties and update the population norm.

Conclusion

The validity and reliability of the Chinese (HK) SF-36 Version 2 have been confirmed for the adult population in Hong Kong. The population norm (mean and standard deviation) of the Chinese (HK) SF-36v2 Health Survey is now available to facilitate the interpretation of scores. Because the population means differed significantly between versions 1 and 2, the appropriate norm reference should be used for comparison. Version 2 of the Chinese (HK) SF-36 Health Survey should be preferred to version 1 in future applications because it has better psychometric properties. The Chinese (HK) SF-36v2 Health Survey is expected to be more sensitive and responsive than the original version; further studies will need to confirm this.

Acknowledgement

The SF-36® and SF-36v2® are registered trademarks of the Medical Outcomes Trust. A copy of the Chinese (HK) SF-36v2 Health Survey and a licence to use it can be obtained from QualityMetric http://www.qualitymetric.com.

Key messages

Version 2 of the SF-36 Health Survey (SF-36v2) has improvements in the clarity of wording, questionnaire format and number of response options over the first version.
The Chinese (HK) SF-36v2 is reliable and valid, with 100% scaling success in tests of convergent and discriminant validity.
The two-principal-component factor structure and factor coefficients of the Chinese (HK) SF-36v2 are equivalent to those of the US original. The standard scoring algorithm for the two summary scores is therefore applicable to the Chinese population.
The SF-36v2 is likely to be more sensitive than version 1 because it has smaller floor effects and better internal reliability.
The appropriate population norms should be used for the interpretation of the data of version 1 or 2 of the SF-36 Health Survey because there were significant differences in their population mean scores.

Funding: This study was funded by Health Services Research Grant (#711026), the Government of the HKSAR.

Elegance TP Lam, BSc, MMedSc
PhD Candidate,

Cindy LK Lam, MBBS, MD (HK), FRCGP, FHKAM (Family Medicine)
Clinical Professor,

Yvonne YC Lo, MBChB, FRACGP, FHKCFP, FHKAM (Family Medicine)
Clinical Assistant Professor,
Family Medicine Unit, Department of Medicine, the University of Hong Kong.

Barbara Gandek, MS
Scientist,
Health Assessment Lab, Waltham, USA

Correspondence to: Professor Cindy L K Lam, Family Medicine Unit, the University of Hong Kong, 3/F, Ap Lei Chau Clinic, 161 Main Street, Ap Lei Chau, Hong Kong SAR.

References

1. Ware JE Jr, Snow KK, Kosinski M, et al. SF-36 Health Survey: Manual and Interpretation Guide. Boston: The Health Institute, 1993.
2. Lam CLK, Gandek B, Ren XS, et al. Tests of scaling assumptions and construct validity of the Chinese (HK) version of the SF-36 Health Survey. J Clin Epidemiol 1998;51:1139-1147.
3. Lam CLK, Lauder IJ, Lam TP, et al. Population based norming of the Chinese (HK) version of the SF-36 health survey. HK Pract 1999;21:460-470.
4. Ware JE Jr. SF-36 Health Survey Update. Spine 2000;25:3130-3139.
5. Bullinger M, Alonso J, Apolone G, et al. Translating health status questionnaires and evaluating their quality: the IQOLA Project approach. International Quality of Life Assessment. J Clin Epidemiol 1998;51:913-923.
6. Keller SD, Ware JE, Gandek B, et al. Testing the equivalence of translations of widely used response choice labels: Results from the IQOLA Project. J Clin Epidemiol 1998;51:933-948.
7. Wagner AK, Gandek B, Aaronson NK, et al. Cross-cultural comparisons of the content of SF-36 translations across 10 countries: Results from the IQOLA Project. J Clin Epidemiol 1998;51:925-932.
8. Jenkinson C, Stewart-Brown S, Petersen S, et al. Assessment of the SF-36 version 2 in the United Kingdom. J Epidemiol Community Health 1999;53:46-50.
9. Taft C, Karlsson J, Sullivan M. Performance of the Swedish SF-36 version 2.0. Qual Life Res 2004;13:251-256.
10. Ware JE Jr, Kosinski MA, Dewey JE. How to score version 2 of the SF-36 health survey. Lincoln: QualityMetric Inc., 2000.
11. Hawthorne G, Osborne RH, Taylor A, et al. The SF36 Version 2: critical analyses of population weights, scoring algorithms and population norms. Qual Life Res 2007;16:661-673.
12. Lam CLK, Fong DY, Lauder IJ, et al. The effect of health-related quality of life (HRQOL) on health service utilisation of a Chinese population. Soc Sci Med 2002;55:1635-1646.
13. Ware JE Jr, Kosinski M, Gandek B, et al. The factor structure of the SF-36 Health Survey in 10 countries: results from the IQOLA Project. J Clin Epidemiol 1998;51:1159-1165.
14. Lam CLK, Tse EYY, Gandek B, et al. The SF-36 summary scales were valid, reliable, and equivalent in a Chinese population. J Clin Epidemiol 2005;58:815-822.
15. Ware JE Jr, Kosinski M. SF-36 Physical & Mental Health Summary Scales: A Manual for Users of Version 1. Lincoln, Rhode Island: QualityMetric, 2001.
16. Steiger JH. Tests for comparing elements of a correlation matrix. Psychol Bull 1980;87:245-251.

Appendix: Summary of Differences between Versions 1 and 2 of the Chinese (HK) SF-36 Health Survey

Question Number

3, introduction

Version 1

The following items are about activities you might do during a typical day. Does your health now limit you in these activities? If so, how much?

Version 2

The following questions are about activities you might do during a typical day. Does your health now limit you in these activities? If so, how much?