Abstract
An accurate risk score able to predict in-hospital mortality in patients undergoing surgery may improve both risk communication and clinical decision making. The aim of the study was to develop and validate a surgical risk score based solely on preoperative information, for predicting in-hospital mortality.
From January 1, 2010, to December 31, 2010, data related to all surgeries requiring anesthesia were collected from all centers (single hospital or hospital group) in France performing more than 500 operations in the year on patients aged 18 yr or older (n = 5,507,834). International Statistical Classification of Diseases, 10th revision codes were used to summarize the medical history of patients. From these data, the authors developed a risk score by examining 29 preoperative factors (age, comorbidities, and surgery type) in 2,717,902 patients, and then validated the risk score in a separate cohort of 2,789,932 patients.
In the derivation cohort, there were 12,786 in-hospital deaths (0.47%; 95% CI, 0.46 to 0.48%), whereas in the validation cohort there were 14,933 in-hospital deaths (0.54%; 95% CI, 0.53 to 0.55%). Seventeen predictors were identified and included in the PreOperative Score to predict PostOperative Mortality (POSPOM). POSPOM showed good calibration and excellent discrimination for in-hospital mortality, with a c-statistic of 0.944 (95% CI, 0.943 to 0.945) in the development cohort and 0.929 (95% CI, 0.928 to 0.931) in the validation cohort.
The authors have developed and validated POSPOM, a simple risk score for the prediction of in-hospital mortality in surgical patients.
An accurate risk score for in-hospital mortality is needed to guide future research, as well as preoperative decision making.
This multicenter study examining in-hospital mortality in over 5.5 million patients in France in a 1-yr period identified a 17-variable, highly sensitive and specific risk calculator for in-hospital mortality.
WORLDWIDE, more than 230 million major surgical procedures are carried out each year.1 Although in-hospital death after surgery occurs infrequently in the general population,2,3 among particular subgroups of patients, the rate can be greater than 1 in 10.4,5 In the light of this substantial mortality risk, it is important that the patient, their family, and the attending physician are able to accurately and objectively predict the preoperative risk (probability) of in-hospital mortality or the risk of developing other major perioperative complications (e.g., myocardial infarction, heart failure, or sudden cardiac death). Accurate risk prediction therefore becomes crucial for communicating operative risk to patients, guiding clinical decision making and management, and for forming realistic expectations of the value of undergoing surgery. Other uses include provider profiling as risk adjustment to account for differences between centers6,7 or in the design and analysis of clinical trials.8,9
Existing risk scores that have been externally validated for predicting the probability of in-hospital mortality after surgery, such as the American Society of Anesthesiologists (ASA) physical status score10 and the Physiological and Operative Severity Score for the enUmeration of Mortality and Morbidity (POSSUM) scoring system, have considerable limitations.11 The ASA score is based on the physician’s subjective assessment of a patient’s preoperative clinical status and categorizes patients into five broad risk groups.12 Although widely used worldwide, the ASA score does not consider the type of surgery the patient will undergo, makes no adjustment for age, and is based on subjective criteria, all factors contributing to its very limited accuracy13,14 and reliability.15 The POSSUM score has been widely validated16–18 but includes a number of risk factors collected at discharge (e.g., operative blood loss and the presence of malignancy), which precludes its preoperative use. Although other risk scores for predicting in-hospital or 30-day mortality after surgery have been developed, these have overwhelmingly been surgery-specific tools for patients undergoing cardiac surgery.19,20
To date, there are thus very few risk scores that use objective preoperative patient information to predict postoperative in-hospital mortality for patients scheduled for any type of surgery. The objective of this study was therefore to develop and validate a preoperative risk score that could be used to predict postoperative in-hospital mortality, based on objective and readily available preoperative clinical information.
Materials and Methods
Study Population
Since 1996, all French hospitals, both public and private, caring for medical, surgical, and obstetric patients have been required to submit anonymous patient data to the National Hospital Discharge Data Base (NHDBB). As such, the NHDBB has complete information on all patients, in all centers. Each discharge summary submitted to the NHDBB is linked to a national grouping algorithm leading to a French Diagnosis-Related Group,21 thereby allowing patient comorbidities to be recorded and linked.22 As required by the French Protection Act concerning the use of anonymized hospital data, we obtained permission to access this database from the “Commission Nationale de l’Informatique et des Libertés” (Paris, France).
Selection of Patients and Procedures
We identified all surgical procedures requiring anesthesia (i.e., surgical procedures conducted in the presence of an anesthesiologist or under their supervision, whether or not this included an anesthetic intervention) performed in France on patients aged 18 yr or older during a 1-yr period, from January 1, 2010, to December 31, 2010. From this cohort, we selected those centers that performed more than 500 surgical procedures of any type during the 1-yr period. A center is defined as a single hospital or a group of hospitals sharing the same administration. For each patient record, the following information was extracted: sex, age (years), length of hospital stay (days), primary diagnosis (i.e., the reason for the hospital stay as defined by attending physicians), and patient medical history, coded according to the International Classification of Diseases, 10th Revision (ICD-10).19 All medical (e.g., postoperative mechanical ventilation and postoperative dialysis) or surgical (e.g., reintervention related to complication and subsequent intervention(s) during the same stay) procedures performed during the hospital stay were recorded using the French classification for medical procedures “Classification Commune des Actes Médicaux” (CCAM).20 The surgical procedure used for the analysis is the index operation, which reflected the planned surgical procedure at admission. For patients with multiple surgical procedures during the same stay, the index procedure was defined as the first one performed during the stay. Subsequent surgeries and/or medical procedures were recorded, but they were not used in the analysis, as they were unlikely to have been planned at admission.
Definition of Predictors
The ICD-10 disease categorization is complex and not commonly used in clinical practice. We therefore aggregated codes into broader disease groups to resemble the clinical observations commonly recorded during preoperative assessment (see Supplemental Digital Content 1, https://links.lww.com/ALN/B228). To ensure the validity of this aggregation, the process was conducted independently by three physicians with expertise in ICD-10 code management (Y.L.M., C.L.B.-B., and P.L.). Full consensus between the experts was required to include a code in a disease group. Furthermore, the experts also determined whether the disease groups, or single ICD-10 codes, were related to preoperative comorbidities or rather to postoperative adverse events. Any codes for which full agreement among experts was not obtained were excluded from the list of potential predictors.
Endpoint Definition
The primary endpoint was in-hospital mortality, defined as death after surgery and before discharge, regardless of the length of stay. Deaths occurring after hospital discharge were not considered: for the purpose of the analyses, these patients were censored at discharge and recorded as alive.
Statistical Methods
Data are expressed as mean ± SD, number (percentage), odds ratio, and 95% CI. All P values were two tailed, and a P value less than 0.05 was considered significant.
We randomly divided the entire national cohort of French patients who underwent surgery in 2010 into two cohorts: one to develop the model and another to validate the model. Randomly splitting at the patient level is an inefficient approach to develop and validate a prediction model.23–25 Randomly splitting a single data set at the patient level creates two cohorts that are very similar, except for random variation. Consequently, for large sample sizes, the predictive performance of the model will be very similar when evaluated in both the derivation and the validation cohorts and is thus hardly a strong test of the internal validation of the model.26 We, therefore, split the cohort at the level of the center (i.e., one hospital or several hospitals sharing the same administration) where the surgery was carried out to produce derivation and validation cohorts. The centers were randomly selected to be included either in the derivation or in the validation cohort. Within a center, all patients were then either in the derivation or in the validation cohort.
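As an illustration only, the center-level split described above can be sketched as follows. The record layout, the `center_id` field, the `split_by_center` helper, and the equal allocation of centers to each cohort are assumptions made for this sketch and are not taken from the study.

```python
import random

def split_by_center(records, seed=0):
    """Randomly assign whole centers to the derivation or validation cohort.

    records: list of dicts carrying a (hypothetical) 'center_id' key.
    Every patient of a given center ends up in the same cohort.
    """
    centers = sorted({r["center_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(centers)
    # Assumption: half of the centers go to each cohort.
    derivation_centers = set(centers[: len(centers) // 2])
    derivation = [r for r in records if r["center_id"] in derivation_centers]
    validation = [r for r in records if r["center_id"] not in derivation_centers]
    return derivation, validation

# Toy data: 60 patients spread over 6 hypothetical centers.
records = [{"patient_id": i, "center_id": f"C{i % 6}"} for i in range(60)]
derivation, validation = split_by_center(records, seed=42)
```

Because allocation happens per center, no center contributes patients to both cohorts, which is the property that distinguishes this design from a patient-level random split.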
We constructed a multivariable logistic model that included age, all preoperative variables (listed in Supplemental Digital Content 1, https://links.lww.com/ALN/B228), and predefined surgical subgroups (listed in Supplemental Digital Content 2, https://links.lww.com/ALN/B229) to predict an individual patient’s probability of in-hospital mortality. We modeled age as both linear and nonlinear (using fractional polynomials and generalized additive models), but no sufficient gain in predictive performance of the model was identified and age was therefore retained as linear in the model. The large size of the derivation cohort made it likely that almost all variables examined would show a significant association with the primary outcome, resulting in an overly complex and clinically unusable final model. Therefore, we did not consider variables that occurred in the derivation cohort with a frequency less than 0.1% as potential predictors.
We developed an easy-to-use score (referred to as POSPOM [PreOperative Score to predict PostOperative Mortality]) from the derivation cohort using the approach described by Sullivan et al.27 This approach assigns a positive score to less healthy risk factor states and a negative score to reflect healthier states. To develop a clinically meaningful score, a single point was made to represent a standard increase in risk (i.e., a unit of risk). We defined this unit of risk as that risk associated with a 5-yr increase in age. This “amount” of risk is represented by the logistic regression coefficient associated with a 5-yr increase in age (βage5).
Based on the results of the logistic regression model, a score of 0 points was assigned to the surgical group with the lowest risk. Points were then assigned to all other groups by dividing their regression coefficients by the 5-yr age coefficient (i.e., βage5). Individual patient risk was then estimated by summing the points for the patient’s medical history and the points for the surgical subgroup.
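The frequency threshold for candidate predictors can be illustrated with a short sketch. The function name `frequent_predictors`, the row layout, and the toy variables are hypothetical; only the 0.1% cutoff comes from the text.

```python
def frequent_predictors(rows, candidates, min_prevalence=0.001):
    """Keep candidate binary variables present in at least 0.1% of patients.

    rows: list of dicts mapping variable name -> bool (hypothetical layout).
    """
    n = len(rows)
    kept = []
    for name in candidates:
        prevalence = sum(1 for row in rows if row.get(name)) / n
        if prevalence >= min_prevalence:
            kept.append(name)
    return kept

# Toy cohort of 2,000 patients: one common and one very rare condition.
rows = [{"diabetes": i < 200, "rare_condition": i == 0} for i in range(2000)]
kept = frequent_predictors(rows, ["diabetes", "rare_condition"])
```

In this toy cohort the rare condition occurs in 1 of 2,000 patients (0.05%) and is dropped before modeling, whereas the common one is retained.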
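A minimal sketch of this coefficient-to-points conversion is given below. All coefficient values and variable names are invented for illustration and are not the POSPOM coefficients. Rounding to the nearest integer is an assumption based on the usual points-system convention.

```python
def to_points(coefficients, beta_age_per_year):
    """Convert logistic regression coefficients into integer points.

    One point is the risk carried by a 5-yr increase in age (beta_age5).
    """
    beta_age5 = 5 * beta_age_per_year  # the "unit of risk"
    return {name: round(beta / beta_age5) for name, beta in coefficients.items()}

# Hypothetical coefficients (log-odds); the reference surgical group is 0.
coefficients = {
    "reference_surgery": 0.0,
    "major_vascular_surgery": 1.30,
    "chronic_heart_failure": 0.70,
    "dementia": 0.52,
}
points = to_points(coefficients, beta_age_per_year=0.05)

# A patient's score is the sum of the points for each factor present.
patient_factors = ["major_vascular_surgery", "chronic_heart_failure"]
patient_score = sum(points[f] for f in patient_factors)
```

With these made-up inputs, one point corresponds to a coefficient of 0.25, so a factor with a coefficient of 0.70 contributes 3 points. Age points are left out of the example for brevity.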
We used both the derivation and the validation cohorts to evaluate the performance of POSPOM for predicting in-hospital mortality. Performance was evaluated by assessing the calibration and discrimination of both models. Calibration was assessed graphically by plotting the observed outcome against the predicted in-hospital probability. A smooth, nonparametric calibration line was created with the LOESS algorithm (i.e., a locally weighted scatterplot smoothing) to estimate the observed probabilities of in-hospital mortality in relation to the predicted probabilities.28–30 Discrimination was quantified by calculating the concordance statistic (c-statistic). Additional performance measures include the Yates slope (the difference in mean-predicted probabilities between those with and without the outcome), Brier score (squared difference between patient outcome and predicted risk), and scaled Brier score (scaled by the maximum Brier score and ranging from 0 to 100%). The performance of the models predicting the secondary composite endpoints was also evaluated in the validation cohort. All statistical analyses were carried out in R software (version 3.1) (http://www.r-project.org, last date accessed, July 21, 2015).
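For readers less familiar with these measures, the following sketch computes them on a toy data set using plain Python. It is not the R code used in the study. The data are invented, and scaling the Brier score by prevalence × (1 − prevalence) is an assumption about how the maximum Brier score is defined.

```python
def _split(y, p):
    """Predicted risks of patients with (1) and without (0) the outcome."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    return events, non_events

def c_statistic(y, p):
    """Share of event/non-event pairs ranked correctly (ties count 0.5)."""
    events, non_events = _split(y, p)
    concordant = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e in events for n in non_events
    )
    return concordant / (len(events) * len(non_events))

def brier_score(y, p):
    """Mean squared difference between outcome and predicted risk."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def scaled_brier(y, p):
    """1 - Brier / Brier_max, with Brier_max = prevalence * (1 - prevalence)."""
    prevalence = sum(y) / len(y)
    return 1 - brier_score(y, p) / (prevalence * (1 - prevalence))

def yates_slope(y, p):
    """Difference in mean predicted risk between events and non-events."""
    events, non_events = _split(y, p)
    return sum(events) / len(events) - sum(non_events) / len(non_events)

# Toy outcomes (1 = died in hospital) and predicted probabilities.
y = [0, 0, 0, 1, 0, 1, 0, 1]
p = [0.1, 0.2, 0.1, 0.7, 0.5, 0.6, 0.2, 0.4]
```

The pairwise loop in `c_statistic` is quadratic in cohort size and is meant only to make the definition explicit. A rank-based formulation would be needed for cohorts of millions of patients.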
Results
Between January 1, 2010, and December 31, 2010, 7,059,447 eligible patients from 1,107 centers in France were recorded in the NHDBB. Of these, 5,507,834 patients met eligibility criteria and 2,717,902 (from 479 hospital centers) were allocated to the derivation cohort and 2,789,932 (from 479 hospital centers) were allocated to the validation cohort. The flowchart of the study is summarized in figure 1.
In-hospital mortality after surgery was 0.47% (95% CI, 0.46 to 0.48%) (12,786 deaths) in the derivation cohort and 0.54% (95% CI, 0.53 to 0.55%) (14,933 deaths) in the validation cohort. Baseline characteristics and surgical procedures distributions in both cohorts are shown in table 1.
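The text does not state how the confidence intervals were computed. As a plausibility check, a normal-approximation (Wald) interval, which is an assumption here, reproduces the reported derivation-cohort interval to two decimal places. The `wald_ci` helper is illustrative.

```python
import math

def wald_ci(events, n, z=1.96):
    """Proportion with a normal-approximation 95% confidence interval."""
    p = events / n
    se = math.sqrt(p * (1 - p) / n)
    return p, p - z * se, p + z * se

# Derivation cohort: 12,786 deaths among 2,717,902 patients.
p, lower, upper = wald_ci(12_786, 2_717_902)
print(f"{100 * p:.2f}% (95% CI, {100 * lower:.2f} to {100 * upper:.2f}%)")
# prints "0.47% (95% CI, 0.46 to 0.48%)"
```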
Final Model
In the derivation cohort, 29 potential predictors (27 binary, 1 continuous, and 1 categorical variable) were evaluated for model inclusion. Of these, only 17 were retained in the final logistic model: age, ischemic heart disease, cardiac arrhythmia or heart blocks, chronic heart failure or cardiomyopathy, peripheral vascular disease, dementia, cerebrovascular disease, hemiplegia, chronic obstructive pulmonary disease, chronic respiratory failure, chronic alcohol abuse, cancer, diabetes, transplanted organ(s), chronic dialysis, chronic renal failure, and type of surgery. Final logistic model characteristics and the POSPOM are presented in tables 2 and 3 and in Supplemental Digital Content 3, https://links.lww.com/ALN/B230. The range of attainable POSPOM is 0 to 50; the range observed in the derivation set was 0 to 45.
Model Performance
POSPOM showed excellent discrimination (c-statistic: 0.944 [95% CI, 0.943 to 0.945], Brier score: 0.004 [95% CI, 0.004 to 0.004], scaled Brier score: 4.06% [95% CI, 3.76 to 5.18%], and Yates slope: 0.058 [95% CI, 0.057 to 0.059]) in the derivation cohort. In the validation cohort, similar performance (c-statistic: 0.929 [95% CI, 0.928 to 0.931], Brier score: 0.005 [95% CI, 0.005 to 0.005], scaled Brier score: 4.31% [95% CI, 3.25 to 5.71%], and Yates slope: 0.058 [95% CI, 0.057 to 0.060]) was observed. Inspection of the calibration plot (fig. 2) demonstrates that POSPOM has good calibration with only a small underestimation of in-hospital mortality in the validation cohort for predicted probabilities ranging from 1 to 10%. The concordance between the predicted probabilities from the logistic regression model and the POSPOM scoring system was high (Lin concordance = 0.99), indicating little loss of predictive information when simplifying the logistic regression model to the POSPOM scoring system. In the validation cohort, a POSPOM score equal to 30 (i.e., predicted in-hospital mortality = 7.40%) was associated with an observed in-hospital mortality of 6.74% (95% CI, 6.40 to 7.08%). The distribution of POSPOM and the associated observed in-hospital mortality in the validation cohort are shown in figure 3. POSPOM values less than or equal to 20 were associated with a probability of in-hospital mortality less than or equal to 0.04% (i.e., less than the in-hospital mortality observed in the full population, that is, the average risk); a POSPOM value of 25 equates to a probability of in-hospital mortality of 1.73% (i.e., about three times the average risk), and POSPOM values of 30 and 40 equate to probabilities of in-hospital mortality of 5.65 and 11.77%, respectively (i.e., 20 times the average risk).
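The Lin concordance coefficient quoted above measures how closely two sets of predictions agree in both correlation and absolute value. A minimal sketch of the standard formula is shown below. The function name and the toy probabilities are illustrative and are not study data.

```python
def lin_concordance(x, y):
    """Lin's concordance correlation coefficient between two series.

    ccc = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2),
    using population (divide-by-n) variances and covariance.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)

# Hypothetical predicted probabilities from a full model and from a score.
model_probs = [0.001, 0.004, 0.020, 0.075, 0.120]
score_probs = [0.001, 0.005, 0.018, 0.074, 0.118]
agreement = lin_concordance(model_probs, score_probs)
```

Unlike the Pearson correlation, the coefficient falls below 1 when one series is systematically shifted or rescaled relative to the other, which is why it suits a comparison of a simplified score with the model it was derived from.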
Calibration plot: predicted versus observed in-hospital mortality for POSPOM (PreOperative Score to predict PostOperative Mortality) in the validation cohort (n = 2,789,932). Observed in-hospital mortality (bold line) with 95% CI for deciles of risk (triangles) and a smooth, nonparametric calibration line (dashed line) created with the LOESS algorithm (i.e., a locally weighted scatterplot smoothing).
Distribution of the POSPOM (PreOperative Score to predict PostOperative Mortality) values in the validation cohort (n = 2,789,932) in relation to the observed in-hospital mortality rate (solid line) at each POSPOM value.
Discussion
Our study demonstrated that a score using only clinically relevant preoperative patient characteristics (i.e., POSPOM) can accurately predict postoperative in-hospital mortality.
Traditionally, the risk associated with different types of surgery has been determined by observing the rate of outcome events after each type of surgery.31 However, this approach does not determine how much of the event rate is due to the surgery itself, and how much is due to the medical comorbidities of the patients undergoing the surgery. In this study, we have shown tremendous variation in patient risk within surgical categories, much of which was driven by preexisting medical conditions. For example, the unadjusted risk of in-hospital mortality after major orthopedic surgery related to trauma was 3.46% (95% CI, 3.29 to 3.63%) compared with 1.09% (95% CI, 1.02 to 1.12%) for minor vascular surgery. After adjusting for preexisting medical comorbidities, the risk of in-hospital mortality after an endovascular procedure was very similar to that of major orthopedic surgery (table 3). These findings suggest that traditional surgery-specific risk estimation based only on the observed postoperative outcome rate does not provide a true reflection of surgical risk.
During a 1-yr period in 2010, every patient in France requiring anesthesia in a center performing more than 500 procedures per year was included to develop the POSPOM. This combination of a sufficiently long period of inclusion and lack of patient selection resulted in a large cohort guaranteeing accurate estimates of postoperative in-hospital mortality in the overall population.
Although the inclusion of intraoperative events or early postoperative complications would have improved performance, a preoperative risk score intended to provide patients and clinicians with an estimate of in-hospital mortality before surgery cannot, by definition, include intra- and postoperative risk factors. Accordingly, the POSPOM only included preoperative risk factors.
Comparison of Postoperative Mortality with Existing Studies
In our population of 7 million nonselected patients, we observed a postoperative in-hospital mortality of 0.5%. This contrasts with several large-scale studies that have reported postoperative mortality rates ranging from 1.3 to 4%.2,5,32,33 For example, the recent European Surgical Outcomes Study (EuSOS) conducted in 28 European countries reported a mortality rate as high as 4% in their sample of nonselected patients.5 However, as noted by the authors of the EuSOS, this value was noticeably higher than those reported in previous studies.2,34 Their sampling strategy, which included a 7-day observation period, is likely to have affected the case mix. Furthermore, the small proportion of European hospitals in the sample, with an overrepresentation of university hospitals in some countries, is likely to have contributed to an overestimation of in-hospital mortality. The POSPOM study, however, involved all centers in France, both private and public, that carried out more than 500 surgical procedures during 2010 and is therefore more likely to be a true reflection of in-hospital mortality in France.
The American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) also provides substantial information on postoperative mortality.32 Between 2005 and 2007, the 30-day mortality was 1.3% in about 300,000 patients from 200 participating hospitals.2 As with the EuSOS, the sampling strategy, selection of centers and procedures, and duration of follow-up are likely to explain the difference in the observed mortality between their study and our study.35
Similarly, the Veterans Administration Surgical Quality Improvement Program (VASQIP) database (January 2005 to August 2010) included 136,745 patients treated at 104 centers. Among them 1,568 patients (1.1%) sustained the primary 30-day mortality outcome. The two-fold increase in the observed mortality is likely due to the use of 30-day mortality as an outcome, as well as the type of surgical subspecialties included in the study. Only seven surgical subspecialties (vascular, general, neurosurgery, orthopedics, thoracic, urology, and otolaryngology) were recorded, and some procedures requiring anesthesia such as ophthalmologic surgery or gastrointestinal endoscopy were not considered. The ratio between postoperative myocardial infarction frequency and in-hospital mortality in our study (0.16/0.47 = 0.36 in the development cohort and 0.15/0.54 = 0.29 in the validation cohort) was comparable with the Q-wave myocardial infarction frequency and 30-day mortality ratio observed in the VASQIP database (0.3/1.1 = 0.27).33
In the VISION cohort study, in-hospital mortality was 1.38% (95% CI, 1.18 to 1.57%).36 In the POSPOM study, when patients older than 45 yr undergoing non–same-day surgery were selected, in-hospital mortality was 1.38% (95% CI, 1.36 to 1.40%) in the derivation cohort and 1.51% (95% CI, 1.49 to 1.53%) in the validation cohort (Supplemental Digital Content 4, https://links.lww.com/ALN/B231, and Supplemental Digital Content 5, https://links.lww.com/ALN/B232). The large increase in the postoperative cardiac outcomes is related to a major shift in the definition of the postoperative cardiac events.
Comparison of POSPOM Performances with Existing Risk Scores
ASA physical status classification10 remains widely used to stratify the preoperative risk of postoperative mortality. However, its accuracy and more importantly its reproducibility among observers37 represent major limitations,38 which also has an impact on risk scores that include this classification as a variable.2,39,40 Davenport et al.41 demonstrated an interdependence between the ASA physical status and the clinical risk factors in the ACS-NSQIP and that the inclusion of ASA physical status was not associated with an increase in discrimination.
Using a total of 35,179,507 patient stay records from 2001 to 2006 Medicare Provider Analysis and Review (MEDPAR) files, Sessler et al.42 developed the risk stratification indices (RSIs) for risk adjustment (including 30-day mortality and in-hospital mortality) to enable healthcare provider profiling. Although the RSIs demonstrated excellent discrimination, their calibration limits their generalizability and usefulness for providing patients and clinicians with an accurate estimate of in-hospital mortality.43 Furthermore, the large number of variables used in the score (e.g., 184 variables for in-hospital mortality) and the infrequent occurrence of some of the preoperative variables (e.g., cardiopulmonary resuscitation) limit its applicability. In a subsequent study,44 the RSI was modified to incorporate the timing of diagnoses and procedures to develop the Present-On-Admission (POA) risk score. Observed discrimination remained high; however, almost 2,000 coefficients were used to fit the final model, which represents a clear limitation for clinical use. Both the RSI and the POA risk score were constructed primarily as research tools for adjusting for (baseline) confounding, not for individual patient prediction.
The collection of data from a single country is a limitation of this study. In the EuSOS,5 there were variations across hospitals regarding in-hospital mortality, even after adjusting for confounding. The odds ratio of in-hospital mortality varied from 0.44 (95% CI, 0.19 to 1.05; P = 0.06) for Finland to 6.92 (95% CI, 2.37 to 20.27; P = 0.0004) for Poland compared with that for the United Kingdom, the country with the largest data set.5 We chose a single country with a pay-for-performance system. Clearly, the motives underlying the introduction and development of Diagnosis-Related Group systems vary greatly from country to country, adapted to each country, according to their individual developmental contexts or to their conception of a welfare state.22 Variations may relate to differences in the health system model used, the relationship between providers and funders, the degree of centralization, the separation between purchasing and provision, the structure of the hospital market, the type of centers (e.g., profit vs. nonprofit), or the level of competition between public and private structures. In this study, we considered a wide range of public and private hospitals and we included more than 5 million procedures performed in more than 1,100 hospitals. Finally, we did not consider including any interactions between any of the risk factors and age, as our intention was to keep the model simple and easy to use and because interactions rarely add to the predictive ability of the models.26,30 POSPOM requires validation in other countries.
Conclusions
In summary, our POSPOM risk score is a robust tool for predicting in-hospital mortality in patients undergoing surgery and has very good discriminative and calibration properties. Physicians may find it practical to use and applicable to clinical practice.
Acknowledgments
Drs. Le Manach and Landais had full access to all the data in the study and took responsibility for the integrity of the data and the accuracy of the data analysis. Drs. Le Manach, Collins, and Landais helped in study concept and design; Drs. Le Manach, Le Bihan-Benjamin, and Landais in acquisition of data; Drs. Le Manach, Collins, and Landais in analysis and interpretation of data; Drs. Le Manach and Collins in drafting of the manuscript; Drs. Le Manach, Collins, Rodseth, Biccard, Devereaux, Riou, and Landais in critical revision of the manuscript for important intellectual content; and Drs. Le Manach and Collins in statistical analysis. Dr. Le Manach is supported by the Hamilton Anesthesia Associates and the Canadian Network and Centre for Trials Internationally, Hamilton, Ontario, Canada; Dr. Collins is funded by the UK Medical Research Council (grant number G1100513, London, United Kingdom); Dr. Rodseth is supported by a CIHR Scholarship (Canada-HOPE Scholarship, Toronto, Ontario, Canada), the College of Medicine of South Africa (Phyllis Kocker/Bradlow Award, Cape Town, South Africa), and the University of KwaZulu-Natal (Competitive Research Grant, Durban, South Africa); and Dr. Devereaux is supported by a Career Investigator Award of the Heart and Stroke Foundation of Ontario, Canada.
Competing Interests
The authors declare no competing interests.