The Hong Kong Practitioner

October 2007, Volume 29, No. 10

Original Articles

To validate the Chinese version of the 2Q and PHQ-9 questionnaires in Hong Kong Chinese patients

Chi-man Cheng 鄭志文, Michael Cheng 鄭子誠

HK Pract 2007;29:381-390

Summary

Objective: To study the criterion validity of the Chinese version of the 2Q and the PHQ-9 questionnaires for screening of depression in primary care in Hong Kong.

Design: The 2Q and the PHQ-9 questionnaires from the Primary Care Evaluation of Mental Disorders Procedure (PRIME-MD) were translated into Chinese. Patients from 14 general practice clinics in Hong Kong were asked to fill in the questionnaires before they saw their doctors. The general practitioners, blind to the results, then administered the 17-item Chinese Hamilton Depression Rating Scale (CHDS) to the patients. The 2Q and the PHQ-9 were then validated against the CHDS, which served as the gold standard for depression detection.

Subjects: 357 patients from 14 general practice clinics in Hong Kong.

Main outcome measures: Sensitivity and specificity of 2Q and PHQ-9, Pearson Correlation between PHQ-9 and CHDS.

Results: Sensitivity of the 2Q was 96.7% and specificity was 73.4%. The sensitivity of the PHQ-9 at cut-off point of 9 was 80% and specificity was 92%. The Pearson Correlation between the PHQ-9 and the CHDS was 0.793 (p < 0.01).

Conclusion: The Chinese versions of the 2Q and the PHQ-9 were valid as instruments for screening of depression in primary care in Hong Kong. The characteristics of the questionnaires were comparable to those reported in studies in other countries.

Keywords: Depression, Validate, 2-Q, PHQ-9, Hong Kong GP Patients.

摘要

目的： 驗證以中文版的2Q問卷和PHQ-9問卷在香港基層醫療進行篩查抑鬱症的標準正確度。

設計： 將《基層醫療精神疾病評估工具(PRIME-MD)》中的2Q和PHQ-9翻譯成中文。在14所普通科醫務所，請求候診病人填寫該問卷。另外醫生在不知結果的情況下，對病人採用有17項問題的中文版漢密爾頓抑鬱量表(CHDS)評估，以此作為診斷抑鬱症的標準，再為2Q和PHQ-9的結果驗證其有效性。

研究對象： 357位在14所普通科醫務所候診的病人。

主要測量內容： 2Q和PHQ-9的敏感度和特異度；PHQ-9和CHDS之間的Pearson關聯。

結果： 2Q的靈敏度為96.7%，特異度為73.4%。當分界點為9，PHQ-9的靈敏度為80%，特異度為92%。PHQ-9和CHDS之間的Pearson關聯度為0.793(p<0.01)。

結論： 驗證結果可確定中文版2Q和PHQ-9該能有效地在香港基層醫療篩查抑鬱症。該問卷的各項特點可與其他國家研究相比。

主要詞彙： 抑鬱，校驗，2-Q，PHQ-9，香港普通病人。

Introduction

Depression is common

The WHO identified depression as the fourth leading cause of disease burden worldwide in 1990, and depressive illness is projected to be the second leading cause of disability worldwide in 2020.¹ Each year the WHO records 100 million cases of depression. For frontline primary care physicians this represents a formidable challenge in both the recognition and the management of this illness. In Hong Kong, a study showed the prevalence rate of functional disorders was 16.9% in family practice.² Another recent study by the Mood Disorder Centre of the Chinese University of Hong Kong found that about 8.3 per cent of people had symptoms of depression in the previous 12 months.³

Screening is useful

In 1996, the USPSTF (United States Preventive Services Task Force) found insufficient evidence to recommend for or against routine screening for depression with standardized questionnaires.⁴

In 2002, after a large systematic review,⁵ the USPSTF recommended screening adults for depression in clinical practices that had systems in place to assure accurate diagnosis, effective treatment, and follow-up. It found good evidence that screening improved the accurate identification of depressed patients in primary care settings and that treatment of depressed adults identified in primary care settings decreases clinical morbidity. It concluded that the benefits of screening were likely to outweigh any potential harms.⁶

Aims and objectives

We aimed to validate a Chinese instrument that can be used in the local Chinese population. This instrument should be simple and should take 2 minutes or less to complete. We chose two components of the Primary Care Evaluation of Mental Disorders Procedure (PRIME-MD): the 2-Question questionnaire (2Q) and the PHQ-9. We translated both the 2Q and the PHQ-9 into Chinese and combined them into a two-part questionnaire. We recruited a group of general practitioners (GPs) from different parts of Hong Kong to take part in the study.

We would like to study the criterion validity of the translated Chinese versions of these two components of the PRIME-MD (the 2-Q and the PHQ-9) in primary care patients in Hong Kong, and to compare the operating characteristics with other studies.

Methods

Instrument

The instruments chosen were the two-question test (2Q) and the Patient Health Questionnaire (PHQ-9). These were arranged as Part A and Part B of our questionnaire. The questionnaire was then translated into Chinese. The Chinese Hamilton Depression Rating Scale (CHDS) was chosen as the validation tool. The results obtained from the two parts of the questionnaire were then compared individually against the CHDS.

The 2Q was derived from the Primary Care Evaluation of Mental Disorders Procedure (PRIME-MD). The PRIME-MD has a 27-item screening questionnaire and follow-up clinician interview designed to facilitate the diagnosis of common mental disorders in primary care. The questionnaire includes two questions about depressed mood and anhedonia:

"During the past month, have you often been bothered by feeling down, depressed, or hopeless?", and
"During the past month, have you often been bothered by little interest or pleasure in doing things?"

The original PRIME-MD study reported that a "yes" answer to one of these two questions was 86% sensitive and 75% specific compared with a subsequent telephone interview diagnosis of major depressive disorder.⁷

The 2Q was chosen because of its high sensitivity and because it is simple to use. It is a useful measure for detecting depression in primary care. It has similar test characteristics to other case-finding instruments and is less time-consuming.

The Patient Health Questionnaire (PHQ) is a self-report version of PRIME-MD. The PHQ-9 is the depression module, which scores each of the 9 DSM-IV criteria from "0" (not at all) to "3" (nearly every day). The diagnostic validity of the PHQ has recently been established in 2 studies involving 3,000 patients in 8 primary care clinics⁸ and 3,000 patients in 7 obstetrics-gynaecology clinics.⁹ The PHQ-9 had been validated in the same study involving 3,000 primary care patients. A PHQ-9 score > 9 had a sensitivity of 88% and a specificity of 88% for major depression.¹⁰ In addition to making criteria-based diagnoses of depressive disorders, the PHQ-9 is also a reliable and valid measure of depression severity.¹⁰ These characteristics plus its brevity make the PHQ-9 a useful clinical and research tool. (Although the diagnostic validity of the PHQ-9 had been established in previous research, the aim of this study was to validate the PHQ-9 for screening rather than for diagnosis of depression.)
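The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `phq9_total` and `phq9_screen_positive` are hypothetical, and the default cut-off of 9 simply mirrors the "score > 9" threshold quoted in the text.

```python
def phq9_total(item_scores):
    """Sum nine PHQ-9 item scores, each from 0 (not at all) to 3 (nearly every day)."""
    if len(item_scores) != 9:
        raise ValueError("PHQ-9 requires exactly 9 item scores")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item score must be 0, 1, 2 or 3")
    return sum(item_scores)


def phq9_screen_positive(item_scores, cutoff=9):
    """Return True when the total score is strictly greater than the cut-off."""
    return phq9_total(item_scores) > cutoff
```

With this rule a total of exactly 9 counts as negative, and a total of 10 or more counts as positive.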

The Hamilton Depression Scale (HAMD) has been used as the gold standard for the diagnosis and outcome measurement of depression in research for 40 years.¹¹ It has remained one of the most commonly used measures of depression since its introduction in the late 1950s.¹² Although it was recently criticised as psychometrically and conceptually flawed in a review by Michael et al,¹³ it had been validated and was the available alternative to a structured interview.

Its Chinese version is also a validated tool.¹⁴ We used the 17-question version rather than the 21-question version because it has already been shown to be effective in the assessment of depression.

Translation

We had a bilingual person with experience in translation translate the 2Q and PHQ-9 into Chinese. This initial draft was then presented at a tutors' meeting for comment. All nine tutors present were invited to comment on the wording. Following the suggestions given during the presentation, the wording was modified in 2 of the questions, 1(f) and 1(i) in the PHQ-9. Question 1(f) asked whether the person felt bad about themselves or felt they had let themselves or their family down. Question 1(i) asked whether the person had thoughts of self-harm. Another bilingual person with experience in translation then translated the modified questionnaire back into English to ensure that the meaning was the same. The final draft was then presented at the next tutors' meeting for appraisal and this modified version was adopted (Appendix 1). The questionnaire was tried out in a pilot study involving 1 - 5 patients in each participating clinic. The patients interviewed had no difficulty in understanding and answering the questions.

Study population and sampling

GPs were recruited from graduates of the Postgraduate Diploma in Community Psychological Medicine. This is a course run by the University of Hong Kong. It is a course designed for GPs who have an active interest in managing patients with psychological problems.

A total of 14 GPs voluntarily participated in the study. They were all male doctors. Their respective clinics were distributed throughout Hong Kong.

Hong Kong Island: 7 (Central x 2, HK West x 3, HK South x 2 )

Kowloon: 5 (Mongkok x 1, Kowloon East x 4)

New Territories: 2 (Tsuen Wan, Tin Shui Wai).

The study population was patients who attended these clinics. Convenience sampling was used.

Inclusion criteria: adults (18 years or over), Chinese, literate.

Exclusion criteria: illiterate or unable to understand the questionnaire, bipolar illness, or recent bereavement.

Each clinic was allocated 30 questionnaires. A trial run involving 1-5 questionnaires was done in early September 2004. Feedback was collected from all the participating clinics. The study was then carried out over a 3-month period (September to December 2004).

Data collection

All participating GPs were briefed together beforehand by a psychiatrist in order to ensure that the CHDS was assessed in the same way and to minimize inter-rater variation. The briefing session was taped and the tape was sent to all participating doctors to use as reference. The respective doctor then briefed their clinic staff about the questionnaire and the protocol. The psychiatrist was consulted whenever there was any question about scoring the CHDS. Answers obtained were communicated to all participating doctors via email.

Study design

It was a multi-centre cross sectional study.

When subjects (patients) attended the target clinics, they were asked if they were willing to take part in a study by filling in a questionnaire while they were waiting to be seen by the GP. With the subjects' verbal consent, a set of questionnaires was presented together with a covering letter, which explained the purpose of the study (Appendix 2). The completed questionnaires were collected by the clinic staff before the GP's consultation.

For each patient, the GP had to fill in 2 forms. The first covered the patient's demographics and included a brief medical history (Appendix 3). Patient information included age, sex, marital status, education level, family history, medical history, current medication, substance abuse, recent bereavement and history of psychiatric illness. The second was the CHDS. During the consultation the GP was blinded to the results of the questionnaire that the patient had completed earlier. This whole process took about 15 minutes. The GP then proceeded with the rest of the consultation.

Statistical analysis

The data from the questionnaires collected were coded and entered into an Excel spreadsheet. SPSS version 12.0 was used to analyze the data. The sensitivity and specificity of the 2Q and PHQ-9 were compared separately against the CHDS. The Pearson Correlation test was applied to assess the relationship between the PHQ-9 and CHDS.
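For readers who want to see what the Pearson correlation measures, the following is a minimal sketch in plain Python. The study itself used SPSS; the function name `pearson_r` and the five score pairs are made up purely for illustration and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares about the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired scores for five patients (illustration only).
phq9_scores = [2, 5, 9, 14, 20]
chds_scores = [3, 6, 10, 18, 25]
r = pearson_r(phq9_scores, chds_scores)
print(f"Pearson r = {r:.3f}")
```

A value near 1 indicates that higher PHQ-9 scores go together with higher CHDS scores in an approximately linear way.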

Results

A total of 368 questionnaires were collected. Most of the clinics collected more than 20 questionnaires although one clinic only provided 2 responses. There was no record of any refusal. Eleven questionnaires were discarded because the age of the respondent was missing or they were younger than 18 years old. The remaining 357 questionnaires were used for analysis.

Demographics

There were 357 respondents, of whom 212 were female and 134 were male. The gender information was missing in 11 questionnaires, equivalent to 3.1% of the total number analysed. However, their data were still used in the final analysis, as missing gender was not one of the exclusion criteria and we felt it did not affect the validity of the results.

The respondents' ages ranged from 18 to 90 years. The mean age was 40.89 years. The descriptive statistics of the ages of the male and female respondents are shown in Table 1. Although there were more female than male respondents, their mean ages and standard deviations were similar.

Correlation

The results of the PHQ-9 showed a good correlation with those from the CHDS. This is shown in Figure 1. The Pearson Correlation was 0.793, which is statistically significant at the 0.01 level (2-tailed).

Sensitivity and specificity

The CHDS was used as the gold standard. The 17-question format was used. The cut-off point for depression was set at 16. This means that a person who had a score of 16 or less on the CHDS was considered not to have significant depression. Those that had a CHDS > 16 were considered positive for depression. There were 30 respondents with a positive score. This gave a prevalence for depression of 8.4% in our study.

The sensitivity and the specificity of the questionnaire were analyzed in 2 parts. The first part was a comparison of the results between the 2Q and the CHDS. Anyone who answered yes to either item in the 2Q was considered positive.¹⁵ These two groups (2Q = 0 and 2Q > 0) were then validated against those that had a CHDS ≤ 16 and CHDS > 16. The results are shown in Table 2.

The sensitivity was 96.7%, i.e. 29 out of 30 were positive for both the 2Q and the CHDS. The specificity was 73.4%, i.e. 240 out of 327 were negative in both the 2Q and the CHDS.

In the second part the results of the PHQ-9 score were compared against the CHDS. The PHQ-9 cut-off point was set at 9.¹⁸ A score of 9 or less was considered to be negative for significant depression and a score of >9 was positive for depression. The results are shown in Table 3.

The sensitivity was 80%, i.e. 24 out of 30 were positive for both the PHQ-9 and CHDS. The specificity was 92%, i.e. 301 out of 327 were negative in both the PHQ-9 and CHDS.
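The arithmetic behind these figures can be reproduced from the counts quoted in the text. The sketch below is an illustration only: the function name `screening_metrics` is hypothetical, and the false-negative and false-positive counts are derived by subtraction from the 30 CHDS-positive and 327 CHDS-negative respondents reported above.

```python
def screening_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and prevalence from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),   # true positives / all gold-standard positives
        "specificity": tn / (tn + fp),   # true negatives / all gold-standard negatives
        "prevalence": (tp + fn) / (tp + fn + tn + fp),
    }


# 2Q against CHDS: 29 of 30 positives detected, 240 of 327 negatives correctly negative.
two_q = screening_metrics(tp=29, fn=30 - 29, tn=240, fp=327 - 240)

# PHQ-9 (score > 9) against CHDS: 24 of 30 positives, 301 of 327 negatives.
phq9 = screening_metrics(tp=24, fn=30 - 24, tn=301, fp=327 - 301)

for name, result in (("2Q", two_q), ("PHQ-9", phq9)):
    print(name, {key: f"{value * 100:.1f}%" for key, value in result.items()})
```

Both calls share the same gold-standard totals, so the prevalence they return is identical (30 of 357 respondents).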

These results show that when validated against the CHDS, the 2Q has superior sensitivity whereas the PHQ-9 is more specific. To us this means that the 2Q with its high sensitivity can be used as the initial screening tool. Those that answer yes to either component in the 2Q can then proceed to the PHQ-9 for confirmation. The whole process should take about 2 minutes.

Discussion

The prevalence of depression was 8.4%. This figure cannot be treated as a statistically valid prevalence estimate, because convenience sampling was adopted in this study, and it was not representative of the general population. As subjects were only selected if they could be accessed easily and conveniently, the non-randomized selection might have led to sampling bias.

However, it was interesting to note that the prevalence rate was similar to that in other studies. It was comparable with the US figure for primary care (4.8%-8.6%). It was also similar to the 8.3% found in the recent local survey by the Chinese University of Hong Kong. Prevalence might be expected to be higher in our study than among GP cases as a whole because the doctors taking part in the study had a special interest in community psychological medicine.

The 2-Q

In our study, the sensitivity was 96.7% and the specificity was 73.4%. This was comparable and in fact better in respect of specificity when compared with other studies. A sensitivity of 96% and a specificity of 57% was found in the study by Whooley et al with 590 patients in an urgent care clinic in San Francisco.¹⁶ A sensitivity of 97% and a specificity of 67% was found in the study by Arroll et al with 421 patients from 15 general practitioners in Auckland.¹⁵

The PHQ-9

A Pearson Correlation Coefficient of 0.793 (p<0.01) was found between the PHQ-9 and the CHDS. This showed a strong linear relationship between the scores on the 2 instruments.

In our study, the sensitivity was 80% and the specificity was 92%.

In a large-scale validation study involving 3,000 patients of 62 primary care physicians conducted by Kroenke et al in the United States, the sensitivity of the PHQ-9 (with score > 9) was 88% and specificity was 88%.

Wulsin et al found the Spanish version of the PHQ-9 valid with a sensitivity of 77% and a specificity of 100% (cut-off point at PHQ-9 score >9) on 199 Honduran women in primary care clinics.¹⁷

Rizzo et al found a sensitivity of 78% and a specificity of 83% in their study involving 1413 primary care patients in Italy, using PHQ-9 score of > 8 as cut-off point.¹⁸

Dumont et al, in their study at the Geneva University Hospitals (HUG), found that the PHQ-9 distinguished subjects with and without depressive disorders and was a good screener for severe disorders but discriminated poorly when disorders were mild.¹⁹

Williams et al found that a PHQ-9 score > 9 had 91% sensitivity and 89% specificity for major depression, and 78% sensitivity and 96% specificity for any depression diagnosed in 316 post-stroke patients.²⁰

The results of our study were comparable and similar to studies in other countries.

The 2-stage questionnaire

In our study, the 2-Q and the PHQ-9 were combined. When the data were analysed on the assumption that only patients who answered positive to the 2-Q would proceed to answer the PHQ-9, a sensitivity of 76% and a specificity of 93% were found. Test-positive cases were defined as patients with a positive answer to the 2-Q and a PHQ-9 score > 9.

Screening

The Chinese versions of both the 2-Q and the PHQ-9 were found to be valid and useful for screening for depression in adult patients in primary care. Both could be self-administered by patients.

The 2-Q was simple, very sensitive (96.7%), though less specific (73.4%). It also bore the advantage of being easily asked as questions during consultation by the busy doctor. This could even cater for illiterate patients.

The PHQ-9 was also simple to use, less sensitive (80%), but more specific (92%). The selection of which instrument to use depended on a trade-off between the questionnaire characteristics. One useful strategy would be a 2-step test. Only patients with a positive response to the 2-Q would proceed to the PHQ-9. Those with a PHQ-9 score > 9 would be classified as positive in the depression screening and be offered a diagnostic interview or referral. Those who were 2-Q positive but had a PHQ-9 score ≤ 9 would be recorded and offered a follow-up appointment for a second screening after a specific time period, say, four weeks. It is important to note that all screen-positive cases should be clinically evaluated before a diagnosis of depression is made and treatment is given. One point to remember is that the USPSTF recommendation for screening for depression in adult patients only applied when positive results were followed by accurate diagnosis, effective treatment, and careful follow-up. Benefits from screening were unlikely to be realized unless such systems were functioning well. This point should be emphasized during the promotion and implementation of screening programmes.
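The 2-step strategy can be written out as a small decision function. This is a hedged sketch rather than a clinical protocol: the function name `two_step_screen` and the returned labels are hypothetical, and a PHQ-9 score of 9 or below is treated as not positive, in line with the > 9 cut-off used in the Results.

```python
def two_step_screen(two_q_answers, phq9_total=None, cutoff=9):
    """Return the next action under the 2-step screening strategy.

    two_q_answers: pair of booleans, True meaning "yes" to that 2-Q item.
    phq9_total: PHQ-9 total score (0 to 27), or None if not yet completed.
    """
    # Step 1: a "yes" to either 2-Q item counts as 2-Q positive.
    if not any(two_q_answers):
        return "screen negative"
    # Step 2: 2-Q positive patients go on to complete the PHQ-9.
    if phq9_total is None:
        return "administer PHQ-9"
    if phq9_total > cutoff:
        return "screen positive: clinical evaluation or referral"
    return "record and rescreen at follow-up"


print(two_step_screen((False, False)))
print(two_step_screen((True, False)))
print(two_step_screen((True, True), phq9_total=14))
print(two_step_screen((False, True), phq9_total=6))
```

Even the "screen positive" outcome is only a prompt for clinical evaluation; it is not a diagnosis.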

Conclusion

The Chinese versions of the 2Q and the PHQ-9 were valid as instruments for screening of depression in primary care in Hong Kong. The characteristics of the questionnaires were comparable to those reported in studies in other countries.

Key messages

Depression is common.
Depression screening for adults in primary care setting is useful.
It can be done as simply as asking 2 short questions (2-Q) or by distributing a short questionnaire (PHQ-9) to patients.
This study translated and validated the Chinese Version of 2-Q and PHQ-9 as instruments for screening of depression in primary care in Hong Kong.
Screened positive cases should be evaluated clinically before diagnosis and treatments are given.

Chi-man Cheng, MBBS(HK), MRCGP (UK), MFM (CUHK), PDipComPsychMed (HK)
Family Physician in Private Practice

Michael Cheng, MBBS (UNSW), MFM (CUHK), FRACGP, FHKCFP
Family Physician in Private Practice

Correspondence to: Dr Chi-man Cheng, Shop 5, G/F, Andes Plaza, 323 Queen's Road West, Hong Kong.

References

1. Murray CJ, Lopez AD. The global burden of disease: a comprehensive assessment of mortality and disability from diseases, injuries, and risk factors in 1990 and projected to 2020. Cambridge, Mass.: Harvard University Press, 1996.
2. Chan DSL, Wong MCS, Yuen NCL. The prevalence of functional disorders seen in family practice in Hong Kong. HK Pract 2003;25:413-418.
3. SCMP, April 25, 2005.
4. U.S. Preventive Services Task Force. Guide to Clinical Preventive Services. 2nd ed. Baltimore: Williams & Wilkins; 1996:541-546.
5. Pignone MP, Gaynes BN, Rushton JL, et al. Screening for depression in adults: a summary of the evidence for the U.S. Preventive Services Task Force. Ann Intern Med 2002;136:765-776.
6. www.preventiveservices.ahrq.gov
7. Spitzer RL, Williams JB, Kroenke K, et al. Utility of a new procedure for diagnosing mental disorders in primary care. The PRIME-MD 1000 study. JAMA 1994;272:1749-1756.
8. Spitzer RL, Kroenke K, Williams JBW. Patient Health Questionnaire Study Group. Validity and utility of a self-report version of PRIME-MD: the PHQ Primary Care Study. JAMA 1999;282:1737-1744.
9. Spitzer RL, Williams JBW, Kroenke K, et al. Validity and utility of the Patient Health Questionnaire in assessment of 3000 obstetric-gynecologic patients: the PRIME-MD Patient Health Questionnaire Obstetrics-Gynecology Study. Am J Obstet Gynecol 2000;183:759-769.
10. Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med 2001;16:606-613.
11. Michael R, Andrew G, Deborah R, et al. The Hamilton Depression Rating Scale: Has the Gold Standard Become a Lead Weight? Am J Psychiatry 2004;161:2163-2177.
12. Hedlund JL, Vieweg BW. The Hamilton Rating Scale for Depression: a comprehensive review. J Operational Psychiatry 1979;10:149-165.
13. Michael R, Andrew G, Deborah R, et al. The Hamilton Depression Rating Scale: Has the Gold Standard Become a Lead Weight? Am J Psychiatry 2004;161:2163-2177.
14. Zheng YP, Zhao JP, Phillips M, et al. Validity and Reliability of the Chinese Hamilton Depression Rating Scale. Br J Psychiatry 1988;152:660-664.
15. Arroll B, Khin N, Kerse N. Screening for depression in primary care with two verbally asked questions: cross sectional study. BMJ 2003;327:1144-1146.
16. Whooley MA, Avins AL, Miranda J, et al. Case-finding instruments for depression. Two questions are as good as many. J Gen Intern Med 1997;12:439-445.
17. Wulsin L, Somoza E, Heck J. The Feasibility of Using the Spanish PHQ-9 to Screen for Depression in Primary Care in Honduras. Prim Care Companion J Clin Psychiatry 2002;4:191-195.
18. Rizzo R, Piccinelli M, Mazzi MA, et al. The Personal Health Questionnaire: a new screening instrument for detection of ICD-10 depressive disorders in primary care. Psychol Med 2000;30:831-840.
19. Dumont P, Andreoli A, Borgacci S, et al. [Quick detection of depression: a significant clinical issue] [Article in French]. Rev Med Suisse 2005;1(5):344-346, 349.
20. Williams LS, Brizendine EJ, Plue L, et al. Performance of the PHQ-9 as a screening tool for depression after stroke. Stroke 2005;36:635-638. Epub 2005 Jan 27.