To validate the Chinese version of the 2Q and PHQ-9 questionnaires in Hong Kong
Chinese patients
Chi-man Cheng 鄭志文, Michael Cheng 鄭子誠
HK Pract 2007;29:381-390
Summary
Objective: To study the criterion validity of the Chinese version
of the 2Q and the PHQ-9 questionnaires for screening of depression in primary care
in Hong Kong.
Design: The 2Q and the PHQ-9 questionnaires from the Primary Care
Evaluation of Mental Disorders Procedure (PRIME-MD) were translated into Chinese.
Patients from 14 general practice clinics in Hong Kong were asked to fill in the
questionnaires before they saw their doctors. The general practitioners, blinded to
the results, then administered the 17-item Chinese Hamilton Depression Rating Scale
(CHDS) to the patients. The 2Q and the PHQ-9 were then validated against the CHDS,
which served as the gold standard for depression detection.
Subjects: 357 patients from 14 general practice clinics in Hong
Kong.
Main outcome measures: Sensitivity and specificity of 2Q and PHQ-9,
Pearson Correlation between PHQ-9 and CHDS.
Results: Sensitivity of the 2Q was 96.7% and specificity was 73.4%.
The sensitivity of the PHQ-9 at cut-off point of 9 was 80% and specificity was 92%.
The Pearson Correlation between the PHQ-9 and the CHDS was 0.793 (p < 0.01).
Conclusion: The Chinese versions of the 2Q and the PHQ-9 were valid
instruments for screening for depression in primary care in Hong Kong. The characteristics
of the questionnaires were comparable to studies in other countries.
Keywords: Depression, Validate, 2-Q, PHQ-9, Hong Kong GP Patients.
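Abstract (translated from the Chinese version)
Objective: To verify the criterion validity of the Chinese versions of the 2Q and PHQ-9 questionnaires for screening for depression in primary care in Hong Kong.
Design: The 2Q and the PHQ-9 from the Primary Care Evaluation of Mental Disorders Procedure (PRIME-MD) were translated into Chinese. Patients waiting to be seen at 14 general practice clinics were asked to complete the questionnaire. The doctors, without knowing the results, then assessed the patients with the 17-item Chinese Hamilton Depression Rating Scale (CHDS),
which served as the standard for diagnosing depression against which the 2Q and PHQ-9 results were validated.
Subjects: 357 patients waiting to be seen at 14 general practice clinics.
Main outcome measures: Sensitivity and specificity of the 2Q and PHQ-9; Pearson correlation between the PHQ-9 and the CHDS.
Results: The sensitivity of the 2Q was 96.7% and the specificity was 73.4%. At a cut-off point of 9, the sensitivity of the PHQ-9 was 80% and the specificity was 92%. The Pearson correlation between the PHQ-9 and the CHDS was 0.793 (p < 0.01).
Conclusion: The validation results confirm that the Chinese versions of the 2Q and PHQ-9 can effectively screen for depression in primary care in Hong Kong. The characteristics of the questionnaires are comparable to those found in studies in other countries.
Keywords: Depression, validation, 2-Q, PHQ-9, Hong Kong general practice patients.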
Introduction
Depression is common
The WHO identified depression as the fourth leading cause of disease burden worldwide in
1990, and depressive illness is projected to be the second leading cause of disability
worldwide in 2020.1 Each year the WHO records 100 million cases of depression.
For frontline primary care physicians this represents a formidable challenge in both
the recognition and the management of this illness. In Hong Kong, a study
showed the prevalence rate of functional disorders was 16.9% in family practice.2
Another recent study by the Mood Disorder Centre of the Chinese University of Hong
Kong found that about 8.3 per cent of people had symptoms of depression in the previous
12 months.3
Screening is useful
In 1996, the USPSTF (United States Preventive Services Task Force) found insufficient
evidence to recommend for or against routine screening for depression with standardized
questionnaires.4
In 2002, after a large systematic review,5 the USPSTF recommended screening
adults for depression in clinical practices that had systems in place to assure
accurate diagnosis, effective treatment, and follow-up. It found good evidence that
screening improved the accurate identification of depressed patients in primary
care settings and that treatment of depressed adults identified in primary care
settings decreased clinical morbidity. It concluded that the benefits of screening
were likely to outweigh any potential harms.6
Aims and objectives
We aimed to validate a Chinese instrument for use in the local Chinese
population. The instrument should be simple and take 2 minutes or less to complete.
We chose two components of the Primary Care Evaluation of Mental Disorders
Procedure (PRIME-MD): the 2-Question questionnaire, and the PHQ-9. We translated
both the 2Q and the PHQ-9 into Chinese and combined them into a two-part questionnaire.
We recruited a group of general practitioners (GPs) from different parts of Hong
Kong to take part in the study.
We set out to study the criterion validity of the translated Chinese versions
of these two components of the PRIME-MD (the 2-Q and the PHQ-9) in primary care
patients in Hong Kong, and to compare the operating characteristics with other studies.
Methods
Instrument
The instruments chosen were the two-question test (2Q) and the Patient Health Questionnaire
(PHQ-9). These were arranged as Part A and Part B of our questionnaire. The questionnaire
was then translated into Chinese. The Chinese Hamilton Depression Rating Scale (CHDS)
was chosen as the validation tool. The results obtained from the two parts of the
questionnaire were then compared individually against the CHDS.
The 2Q was derived from the Primary Care Evaluation of Mental Disorders Procedure
(PRIME-MD). The PRIME-MD has a 27-item screening questionnaire and follow-up clinician
interview designed to facilitate the diagnosis of common mental disorders in primary
care. The questionnaire includes two questions about depressed mood and anhedonia:
- "During the past month, have you often been bothered by feeling down, depressed,
or hopeless?", and
- "During the past month, have you often been bothered by little interest or pleasure
in doing things?"
The original PRIME-MD study reported that a "yes" answer to one of these two questions
was 86% sensitive and 75% specific compared with a subsequent telephone interview
diagnosis of major depressive disorder.7
The 2Q was chosen because of its high sensitivity and ease of use.
It is a useful measure for detecting depression in primary care, with test
characteristics similar to those of other case-finding instruments while being less time-consuming.
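The 2Q decision rule can be written as a minimal sketch. The function and parameter names are illustrative assumptions and do not come from the PRIME-MD materials; the rule itself (a "yes" to either question counts as screen-positive) is the one described above.

```python
def two_q_positive(depressed_mood: bool, anhedonia: bool) -> bool:
    """Return True when the patient answers "yes" to either 2Q item.

    depressed_mood: answer to the "feeling down, depressed, or hopeless" item.
    anhedonia: answer to the "little interest or pleasure" item.
    Both names are hypothetical labels chosen for this sketch.
    """
    return depressed_mood or anhedonia


# A single "yes" is enough to screen positive.
print(two_q_positive(True, False))   # True
print(two_q_positive(False, False))  # False
```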
The Patient Health Questionnaire (PHQ) is a self-report version of PRIME-MD. The
PHQ-9 is the depression module, which scores each of the 9 DSM-IV criteria from "0"
(not at all) to "3" (nearly every day). The diagnostic validity of the PHQ has
been established in 2 studies involving 3,000 patients in 8 primary care clinics8
and 3,000 patients in 7 obstetrics-gynaecology clinics.9 The PHQ-9 was
validated in the same study of 3,000 primary care patients, in which a PHQ-9 score
> 9 had a sensitivity of 88% and a specificity of 88% for major depression.10
In addition to making criteria-based diagnoses of depressive disorders, the PHQ-9
is also a reliable and valid measure of depression severity.10 These
characteristics plus its brevity make the PHQ-9 a useful clinical and research tool.
Although the PHQ-9 has been validated as a diagnostic instrument in previous research, the
aim of this study was to validate it for screening rather than for the diagnosis
of depression.
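The PHQ-9 scoring described above (nine items, each scored 0 to 3, with a total above 9 counted as screen-positive) can be illustrated with a short sketch. The function names and the input checks are assumptions made for illustration; only the item range and the cut-off come from the text.

```python
PHQ9_CUTOFF = 9  # a total score above 9 is treated as screen-positive


def phq9_total(item_scores):
    """Sum the nine PHQ-9 item scores after basic validation."""
    if len(item_scores) != 9:
        raise ValueError("the PHQ-9 has exactly 9 items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item is scored 0, 1, 2 or 3")
    return sum(item_scores)


def phq9_positive(item_scores, cutoff=PHQ9_CUTOFF):
    """Return True when the total score exceeds the cut-off."""
    return phq9_total(item_scores) > cutoff


# Hypothetical responses, not data from the study.
print(phq9_total([1, 1, 1, 1, 1, 1, 1, 1, 1]))     # 9
print(phq9_positive([1, 1, 1, 1, 1, 1, 1, 1, 1]))  # False
print(phq9_positive([2, 1, 1, 1, 1, 1, 1, 1, 1]))  # True
```

The possible total therefore ranges from 0 to 27, and a score of exactly 9 is still counted as negative.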
The Hamilton Depression Scale (HAMD) has been used as the gold standard for the
diagnosis and outcome measurement for depression in research for 40 years.11
It has remained one of the most commonly used measures of depression since its introduction
in the late 1950s.12 Although a recent review by Michael et al criticised it as psychometrically
and conceptually flawed,13 it has been
validated and was the available alternative to a structured interview.
Its Chinese version is also a validated tool.14 We used the 17-item
version rather than the 21-item version because it has already been shown to
be effective in the assessment of depression.
Translation
We had a bilingual person with experience in translation translate the 2Q and PHQ-9
into Chinese. This initial draft was then presented at a tutors meeting for comment.
All nine tutors present were invited to give comments about the wording. From the
suggestions given during the presentation the wording was modified in 2 of the questions,
1(f) and 1(i) in the PHQ-9. Question 1(f) asked whether the person felt bad about
themselves or felt they had let themselves or their family down. Question 1(i) asked
whether the person had thoughts of self-harm. Another bilingual person with experience
in translation then translated the modified questionnaire back into English. This
was to ensure that the meaning was the same. The final draft was then presented
at the next tutors meeting for appraisal and this modified version was adopted (Appendix 1). The questionnaire
was tried out in a pilot study involving 1 - 5 patients in each participating clinic.
The patients interviewed had no difficulty in understanding and answering the questions.
Study population and sampling
GPs were recruited from graduates of the Postgraduate Diploma in Community Psychological
Medicine. This is a course run by the University of Hong Kong. It is a course designed
for GPs who have an active interest in managing patients with psychological problems.
A total of 14 GPs voluntarily participated in the study. They were all male doctors.
Their respective clinics were distributed throughout Hong Kong.
Hong Kong Island: 7 (Central x 2, HK West x 3, HK South x 2)
Kowloon: 5 (Mongkok x 1, Kowloon East x 4)
New Territories: 2 (Tsuen Wan, Tin Shui Wai).
The study population was patients who attended these clinics. Convenience sampling
was used.
Inclusion Criteria: Adults (18 years or over), Chinese, literate.
Exclusion Criteria: Illiteracy or inability to understand the questionnaire,
bipolar illness, or recent bereavement.
Each clinic was allocated 30 questionnaires. A trial run involving 1-5 questionnaires
was done in early September 2004. Feedback was collected from all the participating
clinics. The study was then carried out over a 3-month period (September to December
2004).
Data collection
All participating GPs were briefed together beforehand by a psychiatrist in order
to ensure that the CHDS was assessed in the same way and to minimize inter-rater
variation. The briefing session was taped and the tape was sent to all participating
doctors for reference. Each doctor then briefed his own clinic staff
about the questionnaire and the protocol. The psychiatrist was consulted whenever
there was any question about scoring the CHDS. Answers obtained were communicated
to all participating doctors via email.
Study design
This was a multi-centre cross-sectional study.
When subjects (patients) attended the target clinics, they were asked if they were
willing to take part in a study by filling in a questionnaire while they were waiting
to be seen by the GP. With the subject's verbal consent, the questionnaire
was presented together with a covering letter, which explained the purpose of the
study (Appendix 2). The
completed questionnaires were collected by the clinic staff before the GP's consultation.
For each patient, the GP had to fill in 2 forms. The first one covered the patient's
demographics and included a brief medical history (Appendix 3).
Patient information included age, sex, marital status, education
level, family history, medical history, current medication, substance abuse, recent
bereavement and history of psychiatric illness. The second one was the CHDS. During
the consultation the GP was blinded to the results of the questionnaire that the
patient had completed earlier. This whole process took about 15 minutes. The GP
then proceeded with the rest of the consultation.
Statistical analysis
The data from the questionnaires collected were coded and entered into an Excel
spreadsheet. SPSS version 12.0 was used to analyze the data. The sensitivity and
specificity of the 2Q and PHQ-9 were compared separately against the CHDS. The Pearson
Correlation test was applied to assess the relationship between the PHQ-9 and CHDS.
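The study used SPSS for this analysis. As an illustration only, the Pearson correlation coefficient can be computed from paired scores as in the sketch below. The function name and the paired scores are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Hypothetical paired scores for five patients, NOT data from the study.
phq9_scores = [2, 5, 9, 14, 20]
chds_scores = [3, 6, 10, 17, 25]
r = pearson_r(phq9_scores, chds_scores)
print(round(r, 3))
```

A value close to 1 indicates that higher PHQ-9 totals tend to accompany higher CHDS totals in an approximately linear way.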
Results
A total of 368 questionnaires were collected. Most of the clinics collected more
than 20 questionnaires although one clinic only provided 2 responses. There was
no record of any refusal. Eleven questionnaires were discarded because the age of
the respondent was missing or they were younger than 18 years old. The remaining
357 questionnaires were used for analysis.
Demographics
There were 357 respondents, of whom 212 were female and 134 were male. Gender
information was missing in 11 questionnaires (3.1% of the
total). These data were nevertheless included in the final analysis,
as missing gender was not an exclusion criterion and we felt it did not affect the
validity of the results.
The respondents' ages ranged from 18 to 90 years, with a mean of 40.89 years. The
descriptive statistics of the ages of the male and female respondents are shown
in Table 1. Although there
were more female than male respondents, the mean ages and standard deviations of the two groups
were similar.
Correlation
The results of the PHQ-9 showed a good correlation with those from the CHDS. This
is shown in Figure 1. The
Pearson correlation coefficient was 0.793, which is statistically significant at the 0.01 level
(2-tailed).
Sensitivity and specificity
The CHDS was used as the gold standard. The 17-question format was used. The cut-off
point for depression was set at 16. This means that a person who had a score of
16 or less on the CHDS was considered not to have significant depression. Those
that had a CHDS > 16 were considered positive for depression. There were 30 respondents
with a positive score. This gave a prevalence of depression of 8.4% (30 out of 357) in our study.
The sensitivity and the specificity of the questionnaire were analyzed in 2 parts.
The first part was a comparison of the results between the 2Q and the CHDS. Anyone
who answered yes to either item in the 2Q was considered positive.15
These two groups (2Q = 0 and 2Q > 0) were then validated against those that had
a CHDS ≤ 16 and CHDS > 16. The results are shown in
Table 2.
The sensitivity was 96.7%, i.e. 29 out of 30 were positive for both the 2Q and the
CHDS. The specificity was 73.4%, i.e. 240 out of 327 were negative in both the 2Q
and the CHDS.
In the second part the results of the PHQ-9 score were compared against the CHDS.
A PHQ-9 cut-off point of 9 was used.18 A score of 9 or less was considered
to be negative for significant depression and a score of > 9 was positive for depression.
The results are shown in Table 3.
The sensitivity was 80%, i.e. 24 out of 30 were positive for both the PHQ-9 and
CHDS. The specificity was 92%, i.e. 301 out of 327 were negative in both the PHQ-9
and CHDS.
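The percentages above can be reproduced from the reported counts. In the sketch below the true-positive and true-negative counts are taken from the text, while the false-negative and false-positive counts are obtained by subtraction from the 30 CHDS-positive and 327 CHDS-negative respondents, which is an inference rather than a figure stated in the results. The function names are illustrative.

```python
def sensitivity(tp, fn):
    """Proportion of CHDS-positive patients flagged by the screening test."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of CHDS-negative patients cleared by the screening test."""
    return tn / (tn + fp)


# tp and tn are reported in the text; fn and fp are derived by subtraction
# from 30 CHDS-positive and 327 CHDS-negative respondents (an assumption).
two_q = {"tp": 29, "fn": 30 - 29, "tn": 240, "fp": 327 - 240}
phq9 = {"tp": 24, "fn": 30 - 24, "tn": 301, "fp": 327 - 301}

for name, c in (("2Q", two_q), ("PHQ-9", phq9)):
    sens = 100 * sensitivity(c["tp"], c["fn"])
    spec = 100 * specificity(c["tn"], c["fp"])
    print(f"{name}: sensitivity {sens:.1f}%, specificity {spec:.1f}%")
```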
These results show that when validated against the CHDS, the 2Q has superior sensitivity
whereas the PHQ-9 is more specific. To us this means that the 2Q with its high sensitivity
can be used as the initial screening tool. Those that answer yes to either component
in the 2Q can then proceed to the PHQ-9 for confirmation. The whole process should
take about 2 minutes.
Discussion
The prevalence of depression in our sample was 8.4%. This figure cannot be taken as a population estimate
because convenience sampling was adopted in this study, and the sample was not representative
of the general population. As subjects were selected only if they could be accessed
easily and conveniently, the non-randomized selection might have led to sampling bias.
However, it was interesting to note that the prevalence rate was similar to those in other
studies. It was comparable with the US figure for primary care (4.8%-8.6%). It was
also similar to the 8.3% found in the recent local survey by the Chinese University of
Hong Kong. Prevalence might be expected to be higher in our study than among
GP patients as a whole because the doctors taking part in the study had a
special interest in community psychological medicine.
The 2-Q
In our study, the sensitivity was 96.7% and the specificity was 73.4%. This was
comparable with other studies and, in respect of specificity, in fact better.
A sensitivity of 96% and a specificity of 57% were found in the study by
Whooley et al with 590 patients in an urgent care clinic in San Francisco.16
A sensitivity of 97% and a specificity of 67% were found in the study by Arroll et
al with 421 patients from 15 general practitioners in Auckland.15
The PHQ-9
A Pearson correlation coefficient of 0.793 (p < 0.01) was found between the PHQ-9
and the CHDS. This showed a strong linear relationship between the scores of the
two instruments.
In our study, the sensitivity was 80% and the specificity was 92%.
In a large scale validating study involving 3000 patients of 62 primary care physicians
conducted by Kroenke et al in the United States, the sensitivity of the PHQ-9 (with
score > 9) was 88% and specificity was 88%.
Wulsin et al found the Spanish version of the PHQ-9 valid with a sensitivity of
77% and a specificity of 100% (cut-off point at PHQ-9 score >9) on 199 Honduran
women in primary care clinics.17
Rizzo et al found a sensitivity of 78% and a specificity of 83% in their study involving
1413 primary care patients in Italy, using PHQ-9 score of > 8 as cut-off point.18
Dumont et al, in their study at the Geneva University Hospitals (HUG), found that the PHQ-9 distinguished subjects
with and without depressive disorders and was a good screener for severe disorders
but had a poor capacity of discrimination when disorders were mild.19
Williams et al found that a PHQ-9 score > 9 had 91% sensitivity and 89% specificity
for major depression, and 78% sensitivity and 96% specificity for any depression
diagnosed in 316 post-stroke patients.20
The results of our study were comparable and similar to studies in other countries.
The 2-stage questionnaire
In our study, the 2-Q and the PHQ-9 were combined. When the data were analysed on the assumption that
only patients who answered positive to the 2-Q would proceed to the PHQ-9, a
sensitivity of 76% and a specificity of 93% were found. Test-positive cases were
defined as patients with a positive 2-Q and a PHQ-9 score > 9.
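The two-stage analysis can be sketched as follows. The record layout, the function names and the seven patient records are hypothetical and serve only to show how a test-positive case is defined and counted; they are not the study data and do not reproduce the 76% and 93% reported above.

```python
from dataclasses import dataclass


@dataclass
class Record:
    two_q_yes: bool   # "yes" to either 2-Q item
    phq9_score: int   # PHQ-9 total, 0 to 27
    chds_score: int   # CHDS total, used as the reference standard


def two_stage_positive(rec, phq9_cutoff=9):
    """Test-positive only if the 2-Q is positive AND the PHQ-9 score exceeds the cut-off."""
    return rec.two_q_yes and rec.phq9_score > phq9_cutoff


def evaluate(records, chds_cutoff=16):
    """Return (sensitivity, specificity) of the two-stage rule against the CHDS.

    Assumes at least one CHDS-positive and one CHDS-negative record.
    """
    tp = fn = tn = fp = 0
    for rec in records:
        depressed = rec.chds_score > chds_cutoff
        flagged = two_stage_positive(rec)
        if depressed and flagged:
            tp += 1
        elif depressed:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical records, NOT data from the study.
records = [
    Record(True, 15, 20),   # detected at both stages
    Record(True, 7, 18),    # missed at the PHQ-9 stage
    Record(False, 12, 17),  # missed at the 2-Q stage
    Record(True, 11, 10),   # false positive
    Record(True, 4, 5),     # cleared at the PHQ-9 stage
    Record(False, 2, 3),    # cleared at the 2-Q stage
    Record(False, 0, 1),    # cleared at the 2-Q stage
]
sens, spec = evaluate(records)
print(f"sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Because a patient must pass both stages to be flagged, the combined sensitivity cannot exceed that of either stage alone, while the combined specificity cannot fall below that of either stage. This is consistent with the direction of the figures reported above.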
Screening
The Chinese versions of both the 2-Q and the PHQ-9 were found to be valid and useful
for screening for depression in adult patients in primary care. Both could be self-administered
by patients.
The 2-Q was simple and very sensitive (96.7%), though less specific (73.4%). It also
had the advantage that a busy doctor could easily ask the two questions verbally during
the consultation, which could even cater for illiterate patients.
The PHQ-9 was also simple to use, less sensitive (80%), but more specific (92%).
The choice of instrument depends on a trade-off between the questionnaire
characteristics. One useful strategy would be a 2-step test. Only patients with
a positive response to the 2-Q would proceed to the PHQ-9. Those with a PHQ-9 score
> 9 would be classified positive in the depression screening and proceed to a
diagnostic interview or referral. Those who were 2-Q positive but had a PHQ-9 score of 9 or less would be recorded
and offered a follow-up appointment for a second screening after a specific time
period, say, four weeks. It is important to note that all screened-positive cases
should be clinically evaluated before a diagnosis of depression is made and treatment
is given. One point to remember is that the USPSTF recommendation for screening
for depression in adult patients only applied when positive results were followed
by accurate diagnosis, effective treatment, and careful follow-up. Benefits from
screening were unlikely to be realized unless such systems were functioning well.
This point should be emphasized during the promotion and implementation
of any screening programme.
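The 2-step strategy proposed above can be summarised as a small decision function. This is only a sketch of the suggested workflow: the function name and the outcome labels are illustrative assumptions, and the four-week interval is the example period mentioned in the text rather than a fixed rule.

```python
def triage(two_q_yes, phq9_score=None, phq9_cutoff=9):
    """Return the suggested next step for one patient under the 2-step strategy."""
    if not two_q_yes:
        # A negative 2-Q ends the screening; the PHQ-9 is not administered.
        return "screen negative"
    if phq9_score is None:
        raise ValueError("a PHQ-9 score is needed after a positive 2-Q")
    if phq9_score > phq9_cutoff:
        # Screen-positive cases still require clinical evaluation before diagnosis.
        return "clinical evaluation (diagnostic interview or referral)"
    return "record and rescreen after about four weeks"


print(triage(False))
print(triage(True, 12))
print(triage(True, 9))
```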
Conclusion
The Chinese versions of the 2Q and the PHQ-9 were valid instruments for screening
for depression in primary care in Hong Kong. The characteristics of the questionnaires
were comparable to studies in other countries.
Key messages
- Depression is common.
- Depression screening for adults in primary care setting is useful.
- It can be done as simply as asking 2 short questions (2-Q) or by distributing a
short questionnaire (PHQ-9) to patients.
- This study translated and validated the Chinese Version of 2-Q and PHQ-9 as instruments
for screening of depression in primary care in Hong Kong.
- Screened positive cases should be evaluated clinically before diagnosis and treatments
are given.
Chi-man Cheng, MBBS(HK), MRCGP (UK), MFM (CUHK), PDipComPsychMed (HK)
Family Physician in Private Practice
Michael Cheng, MBBS (UNSW), MFM (CUHK), FRACGP, FHKCFP
Family Physician in Private Practice
Correspondence to : Dr Chi-man Cheng, Shop 5, G/F, Andes Plaza, 323 Queen's
Road West, Hong Kong.
References
- Murray CJ, Lopez AD. The global burden of disease: a comprehensive assessment of
mortality and disability from diseases, injuries, and risk factors in 1990 and projected
to 2020. Cambridge, Mass.: Harvard University Press, 1996.
- Chan DSL, Wong MCS, Yuen NCL. The prevalence of functional disorders seen in family
practice in Hong Kong. HK Pract 2003;25:413-418.
- SCMP, April 25, 2005
- Guide to Clinical Preventive Services. 2nd ed. U.S. Preventive Services Task Force.
Baltimore: Williams & Wilkins; 1996:541-546.
- Pignone MP, Gaynes BN, Rushton JL, et al. Screening for depression in adults: a
summary of the evidence for the U.S. Preventive Services Task Force. Ann Intern
Med 2002;136:765-776.
- www.preventiveservices.ahrq.gov
- Spitzer RL, Williams JB, Kroenke K, et al. Utility of a new procedure for diagnosing
mental disorders in primary care. The PRIME-MD 1000 study. JAMA 1994;272:1749-1756.
- Spitzer RL, Kroenke K, Williams JBW. Patient Health Questionnaire Study Group. Validity
and utility of a self-report version of PRIME-MD: the PHQ Primary Care Study. JAMA
1999;282:1737-1744.
- Spitzer RL, Williams JBW, Kroenke K, et al. Validity and utility of the Patient
Health Questionnaire in assessment of 3000 obstetric-gynecologic patients: the PRIME-MD
Patient Health Questionnaire Obstetrics-Gynecology Study. Am J Obstet Gynecol 2000;183:759-769.
- Spitzer RL, Williams JB, Kroenke K. The PHQ-9: validity of a brief depression severity
measure. J Gen Intern Med 2001;16:606-613.
- Michael R, Andrew G, Deborah R, et al. The Hamilton Depression Rating Scale: Has
the Gold Standard Become a Lead Weight? Am J Psychiatry 2004;161:2163-2177.
- Hedlund JL, Vieweg BW. The Hamilton Rating Scale for Depression: a comprehensive
review. J Operational Psychiatry 1979;10:149-165.
- Michael R, Andrew G, Deborah R, et al. The Hamilton Depression Rating Scale: Has
the Gold Standard Become a Lead Weight? Am J Psychiatry 2004; 161:2163-2177.
- Zheng YP, Zhao JP, Phillips M, et al. Validity and Reliability of the Chinese Hamilton
Depression Rating Scale. Br J Psychiatry 1988;152:660-664.
- Arroll B, Khin N, Kerse N. Screening for depression in primary care with two verbally
asked questions: cross sectional study. BMJ 2003;327:1144-1146.
- Whooley MA, Avins AL, Miranda J, et al. Case-finding instruments for depression.
Two questions are as good as many. J Gen Intern Med 1997;12:439-445.
- Wulsin L, Somoza E, Heck J. The Feasibility of Using the Spanish PHQ-9 to Screen
for Depression in Primary Care in Honduras. Prim Care Companion J Clin Psychiatry
2002;4:191-195.
- Rizzo R, Piccinelli M, Mazzi MA, et al. The Personal Health Questionnaire: a new
screening instrument for detection of ICD-10 depressive disorders in primary care.
Psychol Med 2000;30:831-840.
- Dumont P, Andreoli A, Borgacci S, et al. [Quick detection of depression: a significant
clinical issue] [Article in French]. Rev Med Suisse 2005;1(5):344-346, 349.
- Williams LS, Brizendine EJ, Plue L, et al. Performance of the PHQ-9 as a screening
tool for depression after stroke. Stroke 2005;36:635-638. Epub 2005 Jan 27.