Abstract
Survival and freedom from disability are arguably the most important patient-centered outcomes after surgery, but it is unclear how postoperative disability should be measured. The authors thus evaluated the World Health Organization Disability Assessment Schedule 2.0 in a surgical population.
The authors examined the psychometric properties of World Health Organization Disability Assessment Schedule 2.0 in a diverse cohort of 510 surgical patients. The authors assessed clinical acceptability, validity, reliability, and responsiveness up to 12 months after surgery.
Criterion and convergent validity of World Health Organization Disability Assessment Schedule 2.0 were supported by good correlation with the 40-item quality of recovery scale at 30 days after surgery (r = −0.70) and at 3, 6, and 12 months after surgery with physical functioning (the Katz index of independence in Activities of Daily Living; r = −0.70, r = −0.60, and rho = −0.47); quality of life (EQ-5D; r = −0.57, −0.60, and −0.52); and pain interference scores (modified Brief Pain Inventory Short Form; r = 0.72, 0.74, and 0.81) (all P < 0.0005). Construct validity was supported by increased hospital stay (6.9 vs. 5.3 days, P = 0.008) and increased day 30 complications (20% vs. 11%, P = 0.042) in patients with new disability. There was excellent internal consistency, with Cronbach’s α and split-half coefficients greater than 0.90 at all time points (all P < 0.0005). Responsiveness was excellent, with effect sizes of 3.4, 3.0, and 1.0 at 3, 6, and 12 months after surgery, respectively.
World Health Organization Disability Assessment Schedule 2.0 is a clinically acceptable, valid, reliable, and responsive instrument for measuring postoperative disability in a diverse surgical population. Its use as an endpoint in future perioperative studies can provide outcome data that are meaningful to clinicians and patients alike.
Supplemental Digital Content is available in the text.
Although survival is commonly measured after surgery, survival without disability is rarely measured, and it is unclear whether disability measures used in medical populations are appropriate to define disability after surgery.
In a multicenter, multinational study of over 500 patients, the World Health Organization Disability Assessment Schedule 2.0 was shown to be a clinically acceptable, valid, reliable, and responsive instrument for measuring postoperative disability in a diverse surgical population.
“To cure sometimes, to relieve often, to comfort always”
—Hippocrates
THE main aims of surgery are to cure or at least to relieve the distressing symptoms of many conditions. Anesthetic and other perioperative research outcome measures have traditionally centered on surrogate endpoints and recovery times1 and, far less often, on major complications and death.2,3 As exemplified in the PeriOperative ISchemia Evaluation (POISE) trial,4 it can be difficult to ascribe a relative weight or harm to outcomes such as myocardial infarction or stroke, particularly when the long-term consequences of these outcomes vary substantially. In addition, such endpoints may not reflect the patient’s perception of their subsequent health status after surgery.5,6
Previous research suggests that regaining or maintaining health, functional capacity, and emotional well-being are highly valued patient goals following surgery.2,7,8 Accordingly, contemporary anesthetic and other perioperative research sometimes includes patient-centered outcome measures such as quality of recovery7,9,10 and quality of life8,11,12 after surgery. However, new or residual disability after surgery is of particular concern to patients and clinicians alike.
Current definitions of disability distinguish between the physical or mental impairment caused by a health condition and the impact that impairment has on the person’s ability to work, care for themselves, and interact with society.13,14 The World Health Organization (WHO) International Classification of Functioning, Disability and Health classifies disability as “difficulties in any area of functioning as they relate to environmental and personal factors.”15
An instrument used to measure postoperative disability should include, but not be limited to, an assessment of physical functioning or quality of life. Furthermore, rather than focusing specifically on the presence (or even extent) of symptoms, it should assess the impact of these symptoms on the patient’s life in the dimensions of psychological well-being, social involvement, life role activities, and cognitive well-being.16
Although it is tempting to use quality of life measures as a proxy for measuring postoperative disability, this approach is scientifically unsound, and there is currently no validated generic measure of long-term postoperative disability that accords with the WHO classification. The ideal instrument should be easy to administer, reliable, and responsive to change, and it should be specifically validated in a surgical population.
The WHO Disability Assessment Schedule 2.0 (WHODAS) was developed to measure disability cross-culturally, in the aged, and for disease-related states.17 It asks about limitations over the last 30 days in six major life domains: cognition, mobility, self-care, interpersonal relationships, work and household roles, and participation in society. WHODAS has excellent psychometric properties, is easy to use and score, and is available in the public domain in self-report, proxy, and telephone-based versions that can be administered in around 5 min.18 WHODAS has been used to assess disability following trauma,19,20 stroke,21,22 and spinal cord injury,23 and in those with numerous and varied chronic diseases.24 It has not, however, been specifically evaluated in a surgical setting.
The aim of this study was to evaluate WHODAS in a diverse surgical cohort with varying degrees of comorbid medical disease, disability, and health. A secondary aim was to characterize disability-free survival after surgery.
Materials and Methods
This multicenter prospective observational cohort study was conducted in five hospitals in Australia and Hong Kong, and institutional review board approval was sought and obtained at each site (see table 1, Supplemental Digital Content 1, https://links.lww.com/ALN/B131). We specifically aimed to recruit a diverse low- to high-risk surgical population in order to properly evaluate diagnostic utility.25 Patients were included in the study if they were aged 18 yr or over, were able to provide informed consent, and were scheduled to have ambulatory, intermediate, major noncardiac, cardiac, or nonelective surgery. Patients were excluded if they were not expected to be available for follow-up over the following year or had poor language comprehension, known or suspected cognitive impairment, current psychiatric disease, or substance abuse. Although patients having nonelective surgery were included, patients having time-critical surgery (e.g., requiring urgent transfer to the operating theater) were excluded because there was insufficient time, or the patient was unable, to complete consent and baseline testing. Patients were excluded from analysis if they did not have surgery or if they consented but no further data were collected.
In an effort to maintain unbiased sampling, a broad range of patients undergoing different types of surgery of varying extent were selected in consecutive order from operating theater booking lists. To increase study power, we planned to recruit a greater proportion of patients from the major noncardiac, cardiac, and nonelective surgery groups as they were more likely to have a complicated recovery after surgery, with a broader range of outcomes (including disability).
After providing informed consent, patients were given instructions on completing the predetermined standardized questionnaires, which they then completed without prompting from research staff. Efforts were made to follow up all patients so that data were not lost from sicker or older patients.
We began our study with the intention to create a novel postoperative disability scale, but further literature review identified the WHODAS as being a likely valid measure that had as yet not been formally evaluated in a surgical population. We thus included this scale in our suite of perioperative measurements after commencement of the study and defined this as the revised primary aim of our study.
Following enrollment, patient medical and demographic data were collected and patients were provided with instructions in completing each of the health status questionnaires being used for validity testing:
The 12-item WHODAS.17
The 40-item quality of recovery (QoR-40) score,26 as a global, patient-centered measure of health status at 30 days after surgery.
The EQ-5D, measuring health-related quality of life.
The Katz index of independence in Activities of Daily Living (Katz ADL) scale,29 measuring physical functioning.
The modified Brief Pain Inventory Short Form (mBPI-sf),30 measuring daily pain.
The 12-item WHODAS (fig. 1) was scored as previously described.18,31 Each item was scored on a 5-point Likert scale: none = 0; mild = 1; moderate = 2; severe = 3; and extreme = 4. The total score, between 0 and 48, was then divided by 48 and multiplied by 100 to convert it to a percentage of the maximum disability score. One site (Hong Kong) scored WHODAS items from 1 to 5, as originally described in the WHODAS user manual; these scores were rescaled by subtracting 1 point from each WHODAS item score at that site. Missing data were handled according to guidelines in the WHODAS manual,18 whereby if a single item was missing, the mean value of the remaining items was assigned to the missing item. The WHODAS score was not calculated when more than one item was missing.
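The scoring rules above can be sketched as a short Python function. This is a minimal illustration under stated assumptions, not the study's analysis code (analyses were run in SPSS): the function name `whodas12_percent` and its `one_to_five` flag are hypothetical, responses are passed as a list of 12 numbers with `None` marking a missing item, and the single-item imputation follows the WHODAS manual rule described in the text.

```python
from __future__ import annotations

from statistics import mean
from typing import Optional, Sequence


def whodas12_percent(items: Sequence[Optional[float]],
                     one_to_five: bool = False) -> Optional[float]:
    """Score the 12-item WHODAS as a percentage of the maximum (0-100).

    Hypothetical helper. `items` holds 12 responses; None marks a missing item.
    `one_to_five=True` rescales responses recorded on a 1-5 scale (as at the
    Hong Kong site) to the 0-4 convention by subtracting 1.
    """
    if len(items) != 12:
        raise ValueError("expected exactly 12 WHODAS items")
    offset = 1 if one_to_five else 0
    # Convert to the 0-4 scale: none=0, mild=1, moderate=2, severe=3, extreme=4
    scored = [None if x is None else x - offset for x in items]
    answered = [x for x in scored if x is not None]
    if any(not 0 <= x <= 4 for x in answered):
        raise ValueError("item responses out of range after rescaling")
    n_missing = len(scored) - len(answered)
    if n_missing > 1:
        return None  # score not calculated when more than one item is missing
    if n_missing == 1:
        # A single missing item is assigned the mean of the 11 answered items
        answered.append(mean(answered))
    total = sum(answered)  # raw total, 0 to 48
    return total / 48 * 100
```

For example, a respondent answering "mild" (1) to every item has a raw total of 12 of 48, i.e., 25%, which sits exactly on the disability threshold used in this study.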
The 12-item World Health Organization Disability Assessment Schedule.18 Reproduced, with permission of WHO, from Measuring Health and Disability: Manual for WHO Disability Assessment Schedule. Geneva, World Health Organization, 2010 (WHODAS 2.0 12-item version self-administered www.who.int/classifications/icf/whodasii/en).
We considered a disability score of greater than or equal to 25% to indicate “disability,” based on the WHODAS and WHO International Classification of Functioning, Disability and Health categories: none (0 to 4%); mild (5 to 24%); moderate (25 to 49%); severe (50 to 95%); and complete (96 to 100%) disability.18 New disability was defined as an increase in WHODAS score of greater than or equal to 8% from the preoperative assessment.31
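The severity bands and the two study definitions can be expressed as simple threshold checks. This sketch makes two assumptions not stated in the text: the published bands are applied as lower bounds (so a score between bands, such as 4.2%, falls in the lower band), and the 8% change is read as 8 percentage points on the 0 to 100 scale. The function names are hypothetical.

```python
def whodas_band(pct: float) -> str:
    """Map a WHODAS percentage to the WHODAS/ICF severity bands.

    Assumption: the bands (0-4, 5-24, 25-49, 50-95, 96-100) are applied as
    lower bounds, so a score between bands is assigned to the lower band.
    """
    if not 0 <= pct <= 100:
        raise ValueError("WHODAS percentage must lie between 0 and 100")
    if pct >= 96:
        return "complete"
    if pct >= 50:
        return "severe"
    if pct >= 25:
        return "moderate"
    if pct >= 5:
        return "mild"
    return "none"


def has_disability(pct: float) -> bool:
    # Study definition: WHODAS >= 25% (moderate, severe, or complete)
    return pct >= 25


def new_disability(pre_pct: float, post_pct: float) -> bool:
    # Study definition: an increase of >= 8 from the preoperative score,
    # assumed here to mean 8 percentage points on the 0-100 scale
    return post_pct - pre_pct >= 8
```

With this reading, "disability" coincides exactly with the moderate, severe, and complete bands.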
A preoperative WHODAS was not included in the study procedures until 3 months after the commencement of the study, when we became aware of its potential utility in the perioperative setting. Up until that time, WHODAS had not featured in any surgical or anesthetic literature. As a result, 81 patients did not complete a preoperative WHODAS questionnaire. For these participants, we adopted strict criteria (preoperative Katz ADL = 12, EQ-5D 100-point scale ≥ 80, and QoR-40 ≥ 180) to classify them as free of baseline disability for some secondary evaluations. When these criteria were tested in the participants with a preoperative WHODAS, only 16 of 151 participants (10.6%) with a preoperative WHODAS score of less than 10 were misclassified as having preoperative disability.
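The proxy rule and its check against measured WHODAS scores can be sketched as follows. The `Baseline` record, its field names, and both functions are hypothetical; only the three cut-offs and the "WHODAS < 10" comparison group come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Baseline:
    # Hypothetical record of one participant's preoperative scores
    katz_adl: int                  # 0-12; the proxy rule requires exactly 12
    eq5d_vas: float                # EQ-5D 100-point scale
    qor40: float                   # QoR-40 total, 40-200
    whodas: Optional[float] = None  # preoperative WHODAS %, if collected


def proxy_disability_free(b: Baseline) -> bool:
    # Strict criteria from the text: Katz ADL = 12, EQ-5D >= 80, QoR-40 >= 180
    return b.katz_adl == 12 and b.eq5d_vas >= 80 and b.qor40 >= 180


def proxy_misclassification(cohort: list[Baseline]) -> tuple[int, int]:
    """Among participants with a measured preoperative WHODAS < 10, count
    those whom the proxy criteria would label as having baseline disability.
    Returns (misclassified, number assessed)."""
    low = [b for b in cohort if b.whodas is not None and b.whodas < 10]
    misclassified = sum(not proxy_disability_free(b) for b in low)
    return misclassified, len(low)
```

Applied to the study cohort, the reported result corresponds to a return value of (16, 151), i.e., 10.6%.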
The QoR-40 is a validated 40-item questionnaire measuring quality of recovery following anesthesia and surgery.26,32 It consists of five dimensions: (1) physical comfort (12 items), (2) emotional state (nine items), (3) physical independence (five items), (4) psychological support (seven items), and (5) pain (seven items). The QoR-40 has a possible score of 40 (extremely poor quality of recovery) to 200 (excellent quality of recovery). Missing data were imputed by assigning the mean value of the other items within that domain to the missing item. The EQ-5D has five dimensions, each ranked on a three-level scale, as well as a 100-point scale on which participants rate their health from 0 (“worst imaginable”) to 100 (“best imaginable”). The Katz ADL scale contains six domains of physical functioning, each scored between 0 (“unable”) and 2 (“little or no difficulty”), so that a total of 12 indicates full independence. The mBPI-sf has two parts. The first assesses “worst,” “least,” and “average” pain over the previous 24 h as well as pain “right now” from 0 (“no pain”) to 10 (“pain as bad as you can imagine”). The second part assesses the degree to which pain interferes with seven life domains and is again scored from 0 (“does not interfere”) to 10 (“completely interferes”). The mean interference score can be calculated if four or more of the seven items have been completed on a given administration.
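The two missing-data rules above (within-dimension mean imputation for the QoR-40 and the four-of-seven rule for mBPI-sf interference) can be sketched in Python. This assumes QoR-40 items are scored 1 to 5, which is implied by the 40 to 200 range but not stated item by item; the dictionary keys and function names are hypothetical.

```python
from __future__ import annotations

from statistics import mean
from typing import Optional

# Number of items per QoR-40 dimension, as listed in the text
QOR40_DIMENSIONS = {
    "physical_comfort": 12,
    "emotional_state": 9,
    "physical_independence": 5,
    "psychological_support": 7,
    "pain": 7,
}


def qor40_total(responses: dict[str, list[Optional[int]]]) -> float:
    """Total QoR-40 (40-200), imputing each missing item with the mean of the
    answered items in the same dimension. Assumes items are scored 1-5."""
    total = 0.0
    for dim, n_items in QOR40_DIMENSIONS.items():
        items = responses[dim]
        if len(items) != n_items:
            raise ValueError(f"{dim}: expected {n_items} items")
        answered = [x for x in items if x is not None]
        if not answered:
            raise ValueError(f"{dim}: no answered items to impute from")
        fill = mean(answered)
        total += sum(x if x is not None else fill for x in items)
    return total


def mbpi_interference(items: list[Optional[float]]) -> Optional[float]:
    """Mean mBPI-sf pain interference (0-10) over the 7 life domains;
    returned only when at least 4 of the 7 items were completed."""
    if len(items) != 7:
        raise ValueError("expected 7 interference items")
    answered = [x for x in items if x is not None]
    return mean(answered) if len(answered) >= 4 else None
```

Imputing within a dimension, rather than across the whole questionnaire, keeps a missing pain item from being filled with, say, psychological support responses.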
Baseline patient surgical risk and health status were assessed by classifying patients according to the Portsmouth Physiological and Operative Severity Score for the enUmeration of Mortality and Morbidity (P-POSSUM) score,33,34 the American Society of Anesthesiologists’ physical status (ASA) score, and the Canadian Study of Health and Aging Clinical Frailty scale.35
The P-POSSUM score estimates the risk of postoperative morbidity and in-hospital mortality using defined physiological and operative variables and was calculated as previously described.33,36 The physiological and operative severity scores were each summed and applied to the formula ln[R/(1 − R)] = −9.065 + (0.1692 × physiological score) + (0.1550 × operative severity score), where R is the predicted risk of in-hospital mortality. As no radiological data were collected, a history of congestive cardiac failure was substituted for cardiomegaly in the P-POSSUM scoring system. Other missing values (electrocardiogram, n = 53; hemoglobin, n = 37; urea, n = 35; potassium, n = 37; sodium, n = 37; and heart rate, n = 31) were assumed to be normal, as these were most likely not measured in healthy participants having minor procedures.
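Solving the P-POSSUM logistic equation for R gives R = 1/(1 + e^(−x)), where x is the right-hand side of the equation. The sketch below simply evaluates that expression; the function name is hypothetical, and computing the physiological and operative severity scores themselves (which requires the full POSSUM scoring tables) is outside its scope.

```python
import math


def p_possum_mortality(physiological_score: float,
                       operative_severity_score: float) -> float:
    """Predicted in-hospital mortality R (0-1) from the P-POSSUM equation
    ln[R/(1 - R)] = -9.065 + 0.1692*PS + 0.1550*OS, solved for R."""
    logit = (-9.065
             + 0.1692 * physiological_score
             + 0.1550 * operative_severity_score)
    # Inverse of the logit transform: R = 1 / (1 + exp(-logit))
    return 1.0 / (1.0 + math.exp(-logit))
```

With illustrative scores of 12 (physiological) and 6 (operative severity), the logit is about −6.10 and the predicted risk is roughly 0.2%.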
The Canadian Study of Health and Aging Clinical Frailty scale is a subjective measurement of patient frailty based on their appearance and history. Both the attending anesthesiologist and an investigator independently determined the patient’s level of frailty, and the average score was used to quantify the clinical level of frailty: no frailty, 1.0 to 3.9; vulnerable, 4.0 to 4.9; mild, 5 to 5.9; moderate, 6 to 6.9; and severe, ≥7.0. Where the anesthesiologist did not complete the clinical frailty score (n = 13), the investigator-determined score was used.
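The averaging and banding rules for the Clinical Frailty scale can be sketched as below. The function names are hypothetical; the band edges are taken from the text and applied as half-open intervals (e.g., 3.9 and 3.95 are both "no frailty"), which is an assumption for averaged scores that fall between the listed values.

```python
from typing import Optional


def combined_frailty(investigator: float,
                     anesthesiologist: Optional[float]) -> float:
    """Average of the two independent Clinical Frailty ratings; falls back to
    the investigator's rating when the anesthesiologist's is missing."""
    if anesthesiologist is None:
        return float(investigator)
    return (investigator + anesthesiologist) / 2


def frailty_level(score: float) -> str:
    # Bands from the text: none 1.0-3.9, vulnerable 4.0-4.9, mild 5-5.9,
    # moderate 6-6.9, severe >= 7.0 (applied here as half-open intervals)
    if score < 4.0:
        return "no frailty"
    if score < 5.0:
        return "vulnerable"
    if score < 6.0:
        return "mild"
    if score < 7.0:
        return "moderate"
    return "severe"
```

Because each rater gives a whole-number score, the average is either a whole number or ends in .5; for example, ratings of 3 and 4 average to 3.5, which falls in the "no frailty" band.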
Intraoperative data were recorded by the anesthesiologist and included the type, extent, duration, and urgency of surgery and estimated blood loss. The extent of surgery was classified according to the P-POSSUM system (minor, intermediate, major, and major+).34 Nursing staff collected postoperative data, including temperature on arrival and length of stay in the postanesthesia care unit. For patients going directly to the intensive care unit, the duration of tracheal intubation was recorded from intensive care unit charts or discharge summary.
Discharge data, including the occurrence of postoperative complications, duration of hospital stay, and discharge destination, were collected from the patient’s medical record and the hospital electronic discharge system. Duration of hospital stay was calculated as the number of full days spent in hospital as an inpatient.
Patients were followed up with self-assessment questionnaires and by telephone at 30 days, 3, 6, and 12 months after surgery. At 30 days, we ascertained whether they had experienced any postoperative complications (including readmission to hospital, readmission to the intensive care unit, myocardial infarction, respiratory complications, stroke, wound infection, or intraabdominal collection). At subsequent follow-up times, we recorded patients’ current living situation (home with or without nursing assistance, rehabilitation, nursing home, or hospital). At each telephone interview, patients were asked to rate how worthwhile they felt their surgery was and the effect the surgery had on their lives, using 5-point Likert scales. The batch of questionnaires (WHODAS, QoR-40 [30 days only], EQ-5D, Katz ADL, and mBPI-sf) was sent to patients with a stamped self-addressed envelope for return postage.
Psychometric Evaluation of WHODAS
Psychometric evaluation of a health status instrument should occur in the population and setting of interest and include assessment of the clinical acceptability, validity, reliability, and responsiveness of the scale.16,37 The WHODAS has previously undergone extensive psychometric evaluation,17,18,24,31,38 but not in a surgical population.
Clinical acceptability was assessed by measuring WHODAS completion rates over time and the comparative completion rates of WHODAS and the other instruments at 12 months after surgery. Analysis was limited to 12-month completion rates for pragmatic reasons and because 12 months was considered a relevant time point for measuring long-term disability after surgery. The denominator for completion rate included all living patients remaining in the study who had not actively withdrawn or been lost to follow-up (i.e., participants who answered calls but did not return surveys were included).
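The completion-rate denominator described above can be made explicit with a small sketch. The `FollowUp` record and its field names are hypothetical; the inclusion rule (alive, not withdrawn, not lost to follow-up, regardless of whether the survey came back) follows the text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FollowUp:
    # Hypothetical per-patient status at one follow-up time point
    alive: bool
    withdrawn: bool
    lost_to_follow_up: bool
    questionnaire_returned: bool


def completion_rate(patients: list[FollowUp]) -> float:
    """Share of eligible patients who returned a completed questionnaire.

    Denominator: living patients still in the study (not withdrawn and not
    lost to follow-up), including those who answered calls but did not
    return the postal survey.
    """
    eligible = [p for p in patients
                if p.alive and not p.withdrawn and not p.lost_to_follow_up]
    if not eligible:
        raise ValueError("no eligible patients")
    returned = sum(p.questionnaire_returned for p in eligible)
    return returned / len(eligible)
```

Keeping non-returners in the denominator means the rate reflects patients' willingness to complete the instrument, not just the study's ability to reach them.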
Content validity: WHODAS was developed and has been extensively validated as a responsive measure of health-related disability as defined by the International Classification of Functioning, Disability and Health.18
Concurrent (criterion) validity: WHODAS was compared to the QoR-40, Katz ADL, and mBPI-sf scales. Sensitivity analyses were done to explore whether correlations were modified by patient age or extent of surgery.
Convergent validity: WHODAS was compared to the EQ-5D 100-point quality of life health scale.
Construct validity:
a. Discriminative validity (construct validation by extreme groups): Good and poor quality of recovery at 30 days, and good and poor quality of life at 3, 6, and 12 months, were identified using the upper and lower quartiles of the day 30 QoR-40 score and of the EQ-5D 100-point scale at each time point, respectively. WHODAS scores were then compared between the good and poor groups.
b. We measured the relationship between WHODAS and clinical variables likely to be associated with higher rates of disability after surgery: duration of hospital stay, complications, and unplanned readmission within 30 days after surgery.
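The extreme-groups approach in item (a) can be sketched as follows. The function name is hypothetical, and two details are assumptions: quartile cut points use Python's default `statistics.quantiles` method (`"exclusive"`), and patients exactly on a cut point are placed in the extreme group. The between-group statistical test used in the study is not reproduced here.

```python
from __future__ import annotations

from statistics import quantiles


def extreme_groups(reference: list[float],
                   whodas: list[float]) -> tuple[list[float], list[float]]:
    """Split WHODAS scores into 'poor' and 'good' groups using the lower and
    upper quartiles of a reference scale on which higher = better health
    (e.g., day 30 QoR-40 or the EQ-5D 100-point scale)."""
    if len(reference) != len(whodas):
        raise ValueError("score lists must be paired")
    q1, _, q3 = quantiles(reference, n=4)  # default 'exclusive' method
    poor = [w for r, w in zip(reference, whodas) if r <= q1]
    good = [w for r, w in zip(reference, whodas) if r >= q3]
    return poor, good
```

A scale with good discriminative validity should then show clearly higher WHODAS scores in the poor group than in the good group.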
Reliability was assessed by measuring internal consistency. An interitem correlation matrix was visually inspected before measuring Cronbach’s α and split-half reliability coefficients. In other words, we assessed the degree to which different items in the WHODAS scale agree with each other and with the overall measure of disability.
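The two internal consistency statistics can be sketched in plain Python. Cronbach's α is α = k/(k − 1) × (1 − Σ item variances / variance of the total score). For the split-half coefficient, this sketch assumes a first-half versus second-half split of the items, with the half-test correlation stepped up using the Spearman-Brown formula 2r/(1 + r); the text does not state how the items were split, and the function names are hypothetical.

```python
from __future__ import annotations

from statistics import mean, variance


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items score matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x: list[float], y: list[float]) -> float:
    # Pearson correlation coefficient computed from centered sums
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_spearman_brown(rows: list[list[float]]) -> float:
    """Correlate first-half and second-half item totals, then step the
    half-length correlation up to full length: r_sb = 2r / (1 + r)."""
    half = len(rows[0]) // 2
    first = [sum(r[:half]) for r in rows]
    second = [sum(r[half:]) for r in rows]
    r = pearson(first, second)
    return 2 * r / (1 + r)
```

For the 12-item WHODAS, each row would hold one patient's 12 item scores at a given time point; both coefficients approach 1 as items move together across patients.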
The repeatability of WHODAS has been evaluated extensively in previous studies17 and was not assessed in this study as it would not be expected to be different in this population.
The responsiveness, or the ability of WHODAS to detect a meaningful change in the clinical state of a patient, was quantified using the Cohen effect size.39 This is the mean difference in scores from baseline to the time point of interest, divided by the SD at baseline. The subgroup of patients with a baseline WHODAS score of less than or equal to 4% was used to define a group of patients with little to no preoperative disability. An effect size of greater than 0.8 was considered to provide strong evidence that the score is responsive to change in health status.
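The effect size defined above can be sketched as a paired calculation restricted to the low-disability subgroup. The function name and the `max_baseline` parameter are hypothetical, and using the sample SD (n − 1 denominator) for the baseline SD is an assumption.

```python
from __future__ import annotations

from statistics import mean, stdev


def effect_size(baseline: list[float], follow_up: list[float],
                max_baseline: float = 4.0) -> float:
    """Mean change from baseline divided by the baseline SD, restricted to
    patients whose baseline WHODAS is <= max_baseline (4% in the text)."""
    pairs = [(b, f) for b, f in zip(baseline, follow_up) if b <= max_baseline]
    if len(pairs) < 2:
        raise ValueError("need at least two patients in the subgroup")
    base = [b for b, _ in pairs]
    change = [f - b for b, f in pairs]
    sd = stdev(base)  # sample SD (n - 1 denominator), an assumption
    if sd == 0:
        raise ValueError("baseline SD is zero; effect size undefined")
    return mean(change) / sd
```

Because the subgroup is selected for near-zero baseline scores, its baseline SD is small, so even moderate mean changes translate into large effect sizes.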
Following psychometric evaluation, disability-free survival was calculated as the percentage of participants who were both alive and had a WHODAS score of less than 25% at each time point after surgery. Further exploratory analyses of the surgical population were undertaken to examine the relationship between disability-free survival and patient age, medical comorbidity, and surgical type and extent.
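Disability-free survival at a single time point can be sketched as below. The input format and function name are hypothetical, and the handling of incomplete data is an assumption not specified in the text: deaths count as not disability-free, and survivors without a WHODAS score are left out of the denominator.

```python
from __future__ import annotations

from typing import Optional


def disability_free_survival(status: list[tuple[bool, Optional[float]]]) -> float:
    """Percentage of participants alive with WHODAS < 25% at a time point.

    Each entry is (alive, whodas_pct). Assumptions: deaths count as not
    disability-free; survivors with no WHODAS score are excluded from the
    denominator.
    """
    known = [(alive, w) for alive, w in status if not alive or w is not None]
    if not known:
        raise ValueError("no participants with known status")
    # Survivors in `known` always have a WHODAS score, so w < 25 is safe here
    free = sum(1 for alive, w in known if alive and w < 25)
    return 100 * free / len(known)
```

Because both death and a WHODAS score of 25% or more reduce the percentage, the measure combines mortality and disability in a single outcome.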
Statistical Analysis
Our sample size calculation was based primarily on data from our previous quality of recovery studies,26,40 using MedCalc version 12.3.0 (Ostend, Belgium). To have a probability of greater than or equal to 80% to detect a relationship between two variables at a two-sided 0.05 significance level, looking for a greater than or equal to 15% change in the dependent variable, with an assumption that the SD of the independent and dependent variables is 4 (on a 10-point scale), required 350 patients. To account for possible ineligible or incomplete questionnaires, and to support subgroup exploratory analysis, we increased the sample size to at least 500 patients.
Data are presented as mean ± SD, median [interquartile range], number (%), or 95% confidence intervals. All percentages of 10 or more are rounded to the nearest integer. Associations were measured using Pearson correlation coefficients (r) or, for nonnormal data, Spearman rank correlation (rho). Because higher WHODAS scores indicate more disability, correlations with scales on which higher scores indicate better health (e.g., QoR-40 and EQ-5D) are negative. Associations for ordinal data were measured using chi-square for trend. Internal consistency was measured using split-half reliability and Cronbach’s α.41 Changes in numerical data from baseline were compared using the paired t test. Interrater agreement was measured using Cohen’s κ coefficient.42 The null hypothesis was rejected if the two-tailed P was less than 0.05. All analyses were performed using SPSS for Windows v22.0 (SPSS Inc., Chicago, IL).
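Cohen's κ compares observed agreement with the agreement expected by chance from each rater's category frequencies: κ = (p_observed − p_expected)/(1 − p_expected). The sketch below is unweighted; the text does not say whether a weighted κ was used for the ordinal frailty ratings, and the study's own values came from SPSS. The function name is hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable],
                 rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters' categorical ratings:
    kappa = (p_observed - p_expected) / (1 - p_expected)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be paired and non-empty")
    n = len(rater_a)
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies
    categories = count_a.keys() | count_b.keys()
    p_exp = sum(count_a[c] * count_b[c] for c in categories) / n ** 2
    if p_exp == 1:
        raise ValueError("kappa undefined when both raters use one category")
    return (p_obs - p_exp) / (1 - p_exp)
```

For instance, two raters who agree on only half of four cases, with balanced category use, have κ = 0, i.e., no agreement beyond chance.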
Results
Patient demographics (table 1) indicate a typical population of patients presenting to a university hospital for a broad range of surgical procedures. The mean age of patients was 56 yr (range, 18 to 90 yr), and 42% had an ASA score of III or IV. There was a high level of agreement between investigator- and attending anesthesiologist–determined clinical frailty assessments (κ = 0.60, P < 0.0005). Additional results are provided in Supplemental Digital Content 1, https://links.lww.com/ALN/B131, tables 1–13.
Despite only 4.5% of patients having mild or moderate frailty before surgery (table 1), there was a high level of preoperative disability, with 115 patients (27%) having a WHODAS score greater than or equal to 25%. The mean P-POSSUM predicted in-hospital mortality was 2.0%. There was also a broad range of surgical type and extent (table 2), including 42 patients (8.3%) undergoing nonelective surgery. The median length of stay was 5 days (interquartile range, 2 to 8), and the majority of patients (90%) were discharged home or to a rehabilitation facility (5.8%). By day 30, 15% of patients had at least one postoperative complication, 5 patients (1%) had died, and 35 patients (7.2%) had an unplanned readmission to hospital (table 3).
Of the 510 patients enrolled in the study, 68 (13%) had either withdrawn or been lost to follow-up at 6 months and 72 (14%) by 12 months. When comparing the baseline characteristics of patients with complete data at 6 and 12 months to patients with incomplete data (withdrawn or lost to follow-up), those with incomplete data were more likely to be female and to have had minor surgery, with lower P-POSSUM scores but higher rates of preoperative disability (see tables 2 and 3, Supplemental Digital Content 1, https://links.lww.com/ALN/B131, which are tables describing baseline demographics for patients with complete and incomplete data at 6 and 12 months).
WHODAS demonstrated good clinical acceptability, with completion and postal return rates of greater than or equal to 88% at all time points. At 12 months after surgery, WHODAS, Katz ADL, and EQ-5D all had 92% completion and postal return rates. Five patients had a single missing WHODAS item, which could be validly imputed.
The correlation between preoperative WHODAS, QoR-40, and Katz ADL was tested to explore the relationship between baseline disability (WHODAS has been extensively validated in nonsurgical patients) and the instruments proposed for subsequent postoperative criterion validity testing. As expected, there was moderate correlation with the Katz ADL scale (r = −0.56, P < 0.0005) and QoR-40 score (r = −0.60, P < 0.0005). Preoperative WHODAS had modest correlation with the Canadian Study of Health and Aging Clinical Frailty scale (rho = 0.28, P < 0.0005) and ASA score (rho = 0.22, P < 0.0005). There was no correlation (r = 0.01) between patient age and preoperative disability (see table 4, Supplemental Digital Content 1, https://links.lww.com/ALN/B131, which is a table describing the preoperative correlations between WHODAS and other health assessment scales).
Concurrent validity of WHODAS in the early postoperative period was tested by measuring its correlation with the QoR-40 score at day 30. There was moderate to strong correlation globally (r = −0.70, P < 0.0005) and with each dimension of the QoR-40 scale, although, as might be expected, disability was less strongly related to the psychological support dimension (table 4). The correlation between WHODAS and QoR-40 was maintained in subgroups stratified by extent of surgery and patient age (see table 5, Supplemental Digital Content 1, https://links.lww.com/ALN/B131, which is a table describing the day 30 correlation between WHODAS and QoR-40).
The Association between WHODAS and the Five Dimensions of the QoR-40 Scale at 30 Days after Surgery (n = 298) and the EuroQOL EQ-5D Scale at 3, 6, and 12 Months after Surgery

Concurrent validity was further assessed by the correlation between the Katz ADL and WHODAS at 3, 6, and 12 months (r = −0.61, r = −0.60, and rho = −0.47, respectively; all P < 0.0005). Again, the correlation was maintained when stratified by extent of surgery and patient age (see table 7, Supplemental Digital Content 1, https://links.lww.com/ALN/B131, which is a table describing correlations between WHODAS and Katz ADL).
There was good correlation between WHODAS and the mBPI-sf pain scores (table 5) and strong correlation with mean pain interference scores, which strengthened over time: day 30 (r = 0.69), 3 months (r = 0.72), 6 months (r = 0.74), and 12 months (r = 0.81) after surgery (all P < 0.0005).
Correlations between WHODAS and the Modified Brief Pain Inventory at Day 30 and at 3, 6, and 12 Months after Surgery

Convergent validity was tested by the correlation between WHODAS and EQ-5D 100-point scale over time at day 30 (r = −0.55), 3 months (r = −0.57), 6 months (r = −0.60), and 12 months (r = −0.52) (all P < 0.0005) (table 4).
Discriminative validity was excellent. WHODAS was able to discriminate between those with a good and poor quality of recovery after surgery at day 30 and quality of life at 3, 6, and 12 months (table 6).
A Comparison of WHODAS Scores for Those with a Poor or Good Recovery and Quality of Life, Both Defined by the Upper and Lower Quartiles for QoR-40 and EQ-5D 100-Point Scale, Respectively

Construct validity was further assessed by comparing length of stay and complications in patients with and without new disability at day 30. Those with new disability had a longer hospital stay (median 6.89 vs. 5.34 days, P = 0.008) and were more likely to have a complication (20% vs. 11%, P = 0.042). There was a nonsignificant increase in unplanned hospital readmission (9.8% vs. 4%, P = 0.06) in patients with new disability. The direction and magnitude of change in WHODAS score at day 30 also varied with the type of complication, with a mean decrease in disability score of 16% (95% CI, −28 to −3.8, P = 0.01) in patients with postoperative myocardial infarction and a mean increase of 68% (95% CI, 33 to 100, P < 0.0005) in patients with a postoperative stroke (see table 10, Supplemental Digital Content 1, https://links.lww.com/ALN/B131, which is a table describing the association between day 30 complications and change in day 30 WHODAS score from baseline). Of the 12 patients who had a myocardial infarction by day 30, 9 had undergone cardiac surgery, so the observed decrease in WHODAS may reflect improved early postoperative function in the cardiac surgery cohort. As expected, older patients were more likely to develop disability postoperatively (r = 0.19, 0.21, and 0.22 at 3, 6, and 12 months, respectively; all P < 0.0005).
WHODAS demonstrated excellent reliability. The interitem correlation matrix for WHODAS at 6 months is shown in table 7 and demonstrated good correlation between items with no evidence of item redundancy, as indicated by almost all interitem correlations lying between 0.4 and 0.8. Similar results were obtained for the interitem matrices at day 30 and at 3 and 12 months after surgery (results not shown). Cronbach’s α and split-half coefficients were greater than 0.90 at all time points (table 8). The Cohen effect size was very high at all time points, demonstrating excellent responsiveness (table 8).
WHODAS had very good scaling properties. The 10th, 25th, 50th, 75th, and 90th centiles were 0, 0, 2.1, 17, and 33, respectively. A floor effect was present,43 with more than 40% of patients having little or no disability at 6 months, but otherwise there was very good spread of data. The scaling properties are demonstrated in figure 2, with 40% of patients having a score of zero and 85% of patients having a WHODAS score of less than 25%.
The cumulative percentage of World Health Organization Disability Assessment Schedule (WHODAS) scores at 6 months after surgery, depicting its scaling properties.
Disability-free survival at day 30 and at 3, 6, and 12 months after surgery was 72%, 74%, 80%, and 76%, respectively. Disability contributed more than mortality to the reduction in disability-free survival (table 9). The pattern of recovery after surgery varied according to the patient’s ASA physical status (fig. 3), with higher rates of disability-free survival for patients with lower ASA scores at all times (P for trend <0.0005). Compared to their preoperative state, ASA I and II patients tended to have less disability by day 30 and continued to improve out to 6 months. By contrast, ASA III and IV patients tended to have a more delayed recovery, with decreased disability-free survival at day 30 and significant recovery not occurring until 3 months. In general, all ASA groups plateaued by 6 months, with only slight decreases in disability-free survival afterward.
Disability-free survival after surgery according to American Society of Anesthesiologists (ASA) physical status score.
Disability-free survival and new disability also varied according to the type of surgery, with the lowest rates of disability-free survival at 6 months being in patients having orthopedic (67%) or neurosurgery (58%), and the highest rates of new disability occurring in patients having thoracic surgery (see table 12, Supplemental Digital Content 1, https://links.lww.com/ALN/B131, which is a table comparing the rate of disability-free survival and new disability at 6 months according to the type of surgery).
Discussion
This study was able to confirm that WHODAS retains its excellent psychometric properties found in community and medical populations when measuring disability in an adult surgical population. The broad range of patient demographics, medical comorbidities, surgical type and extent, and consistent psychometric indices in selected strata offer strong support for the generalizability of our findings to other surgical settings.
Overall, participant retention at 12 months after surgery was very good (85%), and clinical acceptability was excellent, as reflected by WHODAS completion rates of 88% to 92% via postal survey. Patient acceptability might be further improved if WHODAS were administered as the sole instrument in a telephone survey rather than as one of several postal surveys.
In the absence of a “definitive standard” patient-centered long-term outcome measure after surgery, validity was assessed by correlating WHODAS with existing well-validated health status instruments (QoR-40, Katz ADL, EQ-5D, and mBPI-sf) that measure related but different constructs. As expected, correlations between scores were moderate but not high (absolute r values of about 0.5 to 0.7), supporting the conclusion that these scales do not assess the same construct; if they did, WHODAS would be redundant. There was strong correlation between the WHODAS score and the mean pain interference score of the mBPI-sf. This correlation increased with time after surgery and may indicate the influence of chronic postsurgical pain on persistent postoperative disability.
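The convergent validity assessment rests on correlation coefficients between paired instrument scores. As a minimal sketch, the Pearson coefficient can be computed as follows. The strength bands in the hypothetical `describe_strength` helper (0.5 to 0.7 moderate, above 0.7 strong) are an assumption chosen to match the wording of this section, and the sample scores are invented for illustration. Because a higher WHODAS score indicates more disability while a higher EQ-5D score indicates better health, correlations with such instruments are expected to be negative, which is why strength is judged on the absolute value.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def describe_strength(r: float) -> str:
    """Label |r| using assumed bands: >0.7 strong, 0.5-0.7 moderate, else weak."""
    magnitude = abs(r)
    if magnitude > 0.7:
        return "strong"
    if magnitude >= 0.5:
        return "moderate"
    return "weak"


# Hypothetical paired scores: WHODAS % (higher = worse) and EQ-5D index (higher = better)
whodas = [0, 10, 25, 40, 60]
eq5d = [0.95, 0.85, 0.70, 0.60, 0.30]
r = pearson_r(whodas, eq5d)
print(round(r, 2), describe_strength(r))
```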
Construct validity testing revealed a number of interesting relationships between disability and patient characteristics in a surgical population. WHODAS demonstrated good discriminative validity, clearly distinguishing between patients with good and poor recovery at 30 days and between patients with good and poor self-rated quality of life at 3, 6, and 12 months. Although there was modest correlation between preoperative disability and ASA physical status, there was no correlation between age and preoperative disability. By contrast, disability is known to increase with age in the community setting.31 The lack of correlation in our study is most likely a true finding rather than a problem with the performance of WHODAS in a surgical population, because these scores were obtained before surgery. However, there was modest correlation (r = 0.17 to 0.21) between patient age and postoperative disability. This finding is consistent with most clinicians’ experience that older patients are more likely to develop difficulties after surgery.3,44,45
Orthopedic patients had a low rate (67%) of disability-free survival at 6 months after surgery. This may partly reflect our cohort, as three of the recruiting hospitals have a trauma focus. However, the 18 of the 75 orthopedic patients who had elective hip or knee arthroplasty had an even lower rate (53%) of disability-free survival at 6 months. Orthopedic patients thus seem to have poorer disability-free survival than anticipated, probably because of persistent postsurgical pain in this group.46,47
There was modest correlation between preoperative frailty and disability measures. There are several possible reasons why this correlation is lower than one might intuitively expect. Disability and frailty are different constructs: while most frail people are likely to have at least some disability, not all people with disability are frail. In addition, the weak correlation may reflect the difference between subjective clinician- or investigator-rated scales and patient-rated assessments. Finally, the rate of frailty in our cohort was low, which limited study power for this evaluation. As an aside, we found a high level of agreement between anesthesiologist- and investigator-determined clinical frailty.
Although the study was not powered to determine the discriminant validity of WHODAS with respect to postoperative complications, patients with new disability after surgery were more likely to have had one or more postoperative complications and also had a longer hospital stay. There was also a trend toward more unplanned readmissions in patients with new disability. The relationship between the type of postoperative complication and the subsequent change in WHODAS score highlights a major benefit of using a patient-centered outcome measure over traditional unweighted cardiovascular endpoints. The two patients who had a postoperative stroke had substantially increased disability, whereas the reverse was true for patients deemed to have had a postoperative myocardial infarction. Although most myocardial infarctions occurred after cardiac surgery, this finding still illustrates a potential problem with traditional outcome measures: from a patient’s perspective, stroke is likely to be a much more serious and disabling complication than myocardial infarction.
As in previous studies in other settings, WHODAS was highly reliable and very responsive to change.17 In our study, an expected floor effect was demonstrated, with 40% of patients having a WHODAS score of 0% (i.e., no disability) at 6 months after surgery. However, this is similar to the scaling properties of WHODAS in the general population31 and probably reflects a true proportion of people with no measurable disability rather than a problem with the lower end of the scale. Indeed, scores were well distributed across the remainder of the scale.
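The floor effect and responsiveness discussed here can both be summarized with simple statistics. The following sketch is hypothetical: `floor_proportion` counts scores of 0%, and `effect_size` uses one common definition of the standardized effect size (mean paired change divided by the standard deviation of the baseline scores). The text does not state the exact formula the study used, so this is an illustration rather than a reproduction of the reported values.

```python
import statistics
from typing import Sequence


def floor_proportion(scores: Sequence[float]) -> float:
    """Share of patients at the scale floor (WHODAS 0%, no measurable disability)."""
    return sum(1 for s in scores if s == 0) / len(scores)


def effect_size(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Mean paired change (baseline minus follow-up) divided by the baseline SD.

    Lower WHODAS scores mean less disability, so improvement gives a positive value.
    """
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need paired scores for at least 2 patients")
    changes = [b - f for b, f in zip(baseline, follow_up)]
    return statistics.mean(changes) / statistics.stdev(baseline)


six_months = [0, 0, 5, 30, 0]  # hypothetical WHODAS % scores
print(floor_proportion(six_months))  # 0.6
print(effect_size([20, 30, 40, 50], [0, 10, 20, 35]))
```

Dividing by the baseline standard deviation expresses change in units of between-patient spread, which lets responsiveness be compared across instruments with different scales.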
While health-related quality of life is an important outcome measure in its own right,8,48 such measures cannot simply be dichotomized, are not designed for repeated testing or to be responsive to change, are heavily influenced by social and economic circumstances, and may overlook important aspects of functional independence. More importantly, the general aims of surgery and other interventional procedures are to cure a disease or relieve its symptoms. Survival and freedom from disability should therefore be measured after surgery.49 Around 20% of elderly patients have one or more serious complications after surgery.3 Many more never fully recover after their surgery and seem to have accelerated disability in the months and years that follow.50–52
Postoperative disability, as measured by WHODAS, is a valid and reliable clinical endpoint that is well suited to future anesthetic and surgical research. WHODAS is simple to use and interpret, and it is meaningful to clinicians and patients alike. The high rates of clinically significant preoperative (27%) and postoperative (16 to 22%) disability mean that comparative studies using disability as an endpoint would require only modest sample sizes to achieve adequate statistical power. Because the current study included patients with a mixed risk profile, disability rates can be expected to be higher in clinical trials enrolling high-risk surgical patients.
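The link between a high event rate and a modest sample size can be illustrated with the standard normal-approximation formula for comparing two independent proportions. This is a sketch under hypothetical assumptions, not a trial design from this study: two-sided α = 0.05, 80% power, equal allocation, and a 25% relative reduction applied to a control-group disability rate of 20% (within the 16 to 22% range reported here) and, for comparison, to a rarer endpoint with a 5% control rate. The function name `n_per_group` is illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size for comparing two proportions (normal approximation).

    n = [z_(1-alpha/2) * sqrt(2 * pbar * qbar) + z_(1-beta) * sqrt(p1*q1 + p2*q2)]^2 / (p1 - p2)^2
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Hypothetical scenarios: the same 25% relative reduction at two event rates
common = n_per_group(0.20, 0.15)   # disability endpoint, about 20% control rate
rare = n_per_group(0.05, 0.0375)   # rarer endpoint, 5% control rate
print(common, rare)
```

Under these assumptions the endpoint with a 20% event rate needs roughly 900 patients per group, compared with roughly 4,200 for the 5% endpoint; the absolute numbers are illustrative only, but the several-fold difference shows why a common endpoint such as disability keeps trials smaller.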
At present, investigators designing randomized trials tend to combine several complications or outcomes into one composite endpoint to increase the event rate and thereby decrease the sample size required to demonstrate a clinically important effect. This can be problematic.53 Composite endpoints can be misleading when one component has a much higher incidence than the others or carries substantially less patient burden. Adopting disability-free survival as a primary endpoint in clinical studies should circumvent this problem.
Disability-free survival is an ideal study endpoint because it reflects the primary goal of most patients undergoing major surgery and can aid shared decision-making in surgical care.54 It can be used as a single primary endpoint and, when analyzed with survival methods, offers enhanced statistical power. It is particularly suitable for clinical trials in which groups have comparable baseline risk. However, a clear disability signal may be more difficult to detect in a group of patients with a mixed risk profile undergoing surgery of varying extent and type, because of likely confounding. In this situation, it may be more useful to measure either rates of new disability or a clinically important change in WHODAS score. In addition, WHODAS would be well suited to ongoing audit and clinical quality improvement processes.
Limitations of the Study
This study may be subject to nonresponder bias, as patients who withdrew or were lost to follow-up were more likely to have clinically significant preoperative disability. While some of these patients may have improved postoperatively, others may have developed worse disability, in which case rates of disability were underestimated. This potential bias was minimized by low overall rates of withdrawal and loss to follow-up. We acknowledge that not all postoperative disability may be directly attributable to the index surgery. This is particularly true at 6 and 12 months after surgery, when intervening unrelated life events may lead to overestimation of surgery-induced postoperative disability. On the other hand, the stress of surgery may precipitate a series of unrelated morbid events because of the patient’s vulnerable status, the so-called post-hospital syndrome.50 Of note, 54 (11%) patients in our study had further planned or unrelated procedures in the 12 months after their index surgery.
Based on previous literature,18 we used a WHODAS score of 25% or greater to define clinically significant disability and a change in WHODAS score of 8% or greater to define a minimal clinically important difference.31 These cut points require further verification to ensure that they correspond to clinically meaningful endpoints in surgical populations.
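The two cut points can be applied mechanically to paired preoperative and postoperative scores. In this minimal sketch, the 25% and 8% thresholds come from the text (the 8% change is interpreted here as 8 percentage points on the 0 to 100 scale). The percentage conversion in the hypothetical `whodas_percent` helper assumes the 12-item WHODAS 2.0 with each item coded 0 (none) to 4 (extreme or cannot do), so that the item sum is divided by 48; this scoring convention is an assumption to check against the scoring rules actually used. The `classify` function and its labels are likewise illustrative.

```python
from typing import Sequence

SIGNIFICANT_DISABILITY = 25.0  # WHODAS % score at or above which disability is clinically significant
MCID = 8.0  # minimal clinically important difference, in percentage points


def whodas_percent(items: Sequence[int]) -> float:
    """Convert 12 item responses (assumed coded 0-4) to a 0-100 percentage score."""
    if len(items) != 12 or any(not 0 <= i <= 4 for i in items):
        raise ValueError("expected 12 item scores coded 0 to 4")
    return sum(items) / 48 * 100


def classify(pre: float, post: float) -> dict:
    """Apply the disability and MCID cut points to one patient's paired scores."""
    change = post - pre
    return {
        "significant_disability": post >= SIGNIFICANT_DISABILITY,
        "important_worsening": change >= MCID,
        "important_improvement": change <= -MCID,
    }


pre = whodas_percent([0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0])   # 4/48, about 8.3%
post = whodas_percent([2, 2, 1, 2, 1, 1, 2, 1, 0, 2, 1, 0])  # 15/48 = 31.25%
print(classify(pre, post))
```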
In conclusion, WHODAS is a clinically acceptable, valid, reliable, and responsive instrument for measuring disability in a surgical population. Freedom from disability after surgery is a meaningful outcome for clinicians and patients alike. We recommend disability-free survival as an important endpoint in clinical trials.
Acknowledgments
The authors thank Ed O’Loughlin, M.B., B.S., F.A.N.Z.C.A., M.Clin.Research (Fremantle Hospital and University of Western Australia), Kate Turnahan, B.Sc.(nurs), M.N. (Fremantle Hospital, Western Australia), Daniel Myles, B.A., G.Dip.Psych. (Monash University, Melbourne, Australia), and Marie Backstrom, B.Sc.(nurs) (Monash Medical Centre, Clayton, Australia), for their assistance with data collection.
This study was supported by The Australian and New Zealand College of Anaesthetists (Melbourne, Victoria, Australia) and by a Direct Grant for Research (4054079) from The Chinese University of Hong Kong (Shatin, New Territories, Hong Kong).
Competing Interests
The authors declare no competing interests.