Major bleeding can occur during cardiac surgery. Although different scoring systems exist, the assessment of bleeding can be variable, and the reliability of these scoring systems has not been determined.
Two consensus-based scoring systems for assessing bleeding were compared in a substudy of the Transfusion Avoidance in Cardiac Surgery trial. Both the Universal score and the European Coronary Artery Bypass Graft (E-CABG) score performed well and may be used as validated outcome measures in future clinical trials.
Research into major bleeding during cardiac surgery is challenging because of variability in how it is scored. Two consensus-based clinical scores for major bleeding, the Universal definition of perioperative bleeding and the European Coronary Artery Bypass Graft (E-CABG) bleeding severity grade, were compared in this substudy of the Transfusion Avoidance in Cardiac Surgery (TACS) trial.
As part of TACS, 7,402 patients underwent cardiac surgery at 12 hospitals from 2014 to 2015. We examined content validity by comparing scored items, construct validity by examining associations with redo and complex procedures, and criterion validity by examining 28-day in-hospital mortality risk across bleeding severity categories. Hierarchical logistic regression models were constructed that incorporated important predictors and categories of bleeding.
E-CABG and Universal scores were correlated (Spearman ρ = 0.78, P < 0.0001), but E-CABG classified 910 (12.4%) patients as having more severe bleeding, whereas the Universal score classified 1,729 (23.8%) patients as having more severe bleeding. Higher E-CABG and Universal scores were observed in redo and complex procedures. Increasing E-CABG and Universal scores were associated with increased mortality in both unadjusted and adjusted analyses. The discrimination of regression models based on predictors of perioperative mortality improved with the addition of the Universal score (c-statistic increase from 0.83 to 0.91) or E-CABG (c-statistic increase from 0.83 to 0.92). When other major postoperative complications were added to these models, the association of Universal or E-CABG bleeding severity with mortality persisted.
Although each offers different advantages, both the Universal score and E-CABG performed well in the validity assessments, supporting their use as outcome measures in clinical trials.
Excessive perioperative blood loss requiring transfusion is a common and clinically important complication of cardiac surgery.1,2 Despite its importance, there is significant variability in how this outcome is scored across clinical trials.1,2 Adoption of consensus-based outcome scoring methods has been advocated as a means of better standardizing endpoints in clinical trials.3–6 Such methods offer consistency across clinical trials, simplify the interpretation of trials with conflicting results, and facilitate evidence synthesis.7,8
Two such consensus-based scores for clinically important blood loss in cardiac surgery have recently been proposed, namely the Universal definition of perioperative bleeding9 and the European Coronary Artery Bypass Graft (E-CABG) bleeding severity grade (table 1).10,11 The Universal score is based on nine clinically important events that occur during surgery or within the first postoperative day. It was designed to capture significant bleeding independent of its source or clinical management decisions (table 1).9 Conversely, the E-CABG score is based on interventions that indirectly quantify perioperative blood loss (i.e., blood product transfusion, reoperation for bleeding) and occur at any point from surgery to the end of hospitalization (table 1).11
Precise quantification of bleeding during the perioperative period is difficult and prone to error. Direct estimation of surgical blood loss, even by experienced staff, is often inaccurate and unreliable.12,13 In addition, ongoing bleeding may be concealed or unrecognized until it causes hemodynamic instability or changes in laboratory parameters. For these reasons, these consensus-based scores grade bleeding using multiple items rather than relying on directly observed blood loss. Because such scores combine multiple items into a single construct for bleeding, they are often compared and evaluated for evidence of validity before use in clinical trials.14–16
No previous study has compared E-CABG and the Universal score for their intended purpose as clinical trial endpoints. We therefore conducted a substudy of the Transfusion Avoidance in Cardiac Surgery (TACS) trial,17 a stepped-wedge cluster randomized controlled trial evaluating point-of-care hemostatic testing for transfusion avoidance at 12 hospitals, to compare the content, construct, and criterion validity of the Universal and E-CABG scoring systems.
Content validity refers to whether all important domains of a given construct are included; it was assessed by examining the individual items graded by the Universal score and E-CABG. Construct validity refers to whether a measurement tool captures the phenomenon it claims to measure; it was assessed by examining whether patients who underwent procedures known to be associated with higher blood loss correspondingly had higher Universal and E-CABG scores. Last, criterion validity, the extent to which a score is related to an important outcome, was assessed by examining whether greater blood loss as assessed by either score predicted early postoperative mortality.
We hypothesized that both E-CABG and the Universal score would show evidence of content, construct, and criterion validity, but we expected that the Universal score might have better measurement properties overall because it captures a greater number of clinically relevant events in the perioperative period.
Materials and Methods
Study Setting and Population
This substudy was based on data prospectively collected in the TACS trial, which included all patients who underwent elective, urgent, and emergent cardiac surgical procedures with cardiopulmonary bypass at 12 Canadian study sites from October 6, 2014, to May 1, 2015. Institutional research ethics board approval for this substudy was obtained from the University Health Network in Toronto, Canada. All authors had full access to all the data in the study and were responsible for its integrity and the data analysis. If a patient was readmitted for additional operations requiring cardiopulmonary bypass during the study period, only data from the first admission were used. The eligible sample comprised all 7,402 patients enrolled in the TACS trial.
Predictor and Outcome Definitions
The primary outcome was 28-day in-hospital all-cause mortality. In the TACS trial, major bleeding was scored using the Universal definition of perioperative bleeding; E-CABG could be readily scored using variables already collected in the trial data set. Other events of interest included acute kidney injury (defined as an at least twofold postoperative increase in creatinine concentration or a new need for renal replacement therapy, corresponding to Kidney Disease: Improving Global Outcomes stage 2 or 3),18 sepsis, sternal infection, myocardial infarction, return to the operating room for reexploration, and cerebrovascular accident. In-hospital follow-up for complications was censored at postoperative day 28.
Important confounders for potential inclusion in risk-adjustment models were selected based on a review of published cardiac risk scores and of the literature examining the association between preoperative factors and outcomes after cardiac surgery.19–29 These potential confounders included demographics (age, sex); procedure urgency (elective, urgent, emergent); redo procedure; preoperative intraaortic balloon pump; preoperative hemoglobin concentration (g/l); preoperative renal dysfunction, defined as an estimated glomerular filtration rate of less than 60 ml/min (calculated using the Cockcroft–Gault equation)30 or preoperative dialysis; diabetes mellitus; extracardiac arteriopathy, defined as stroke, transient ischemic events, or peripheral arterial disease; chronic obstructive pulmonary disease; coronary artery disease; liver disease; heart failure; hypertension; and procedure complexity.20 Procedure complexity was classified as simple (isolated coronary artery bypass graft [CABG] or single valve procedure) or complex (all other procedures).
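The acute kidney injury definition above can be written as a small decision rule. The sketch below is illustrative only. It assumes creatinine values are available as (postoperative day, value) pairs and that new renal replacement therapy is recorded by its start day. The function and parameter names are hypothetical and are not part of the TACS data set.

```python
from typing import Iterable, Optional, Tuple

# In-hospital follow-up for complications was censored at postoperative day 28.
FOLLOW_UP_DAYS = 28


def has_aki(
    baseline_creatinine: float,
    postop_measurements: Iterable[Tuple[int, float]],
    rrt_start_day: Optional[int] = None,
) -> bool:
    """Flag acute kidney injury (KDIGO stage 2 or 3) as defined in the text.

    baseline_creatinine: preoperative creatinine (any unit; only the ratio matters).
    postop_measurements: hypothetical (postoperative_day, creatinine) pairs, same unit.
    rrt_start_day: postoperative day new renal replacement therapy began, or None.
    """
    if baseline_creatinine <= 0:
        raise ValueError("baseline creatinine must be positive")
    # New renal replacement therapy within follow-up qualifies on its own.
    if rrt_start_day is not None and rrt_start_day <= FOLLOW_UP_DAYS:
        return True
    # Otherwise require an at least twofold rise over baseline within follow-up.
    return any(
        creatinine >= 2.0 * baseline_creatinine
        for day, creatinine in postop_measurements
        if day <= FOLLOW_UP_DAYS
    )
```

For example, a baseline of 80 µmol/l followed by 170 µmol/l on day 3 meets the twofold threshold. The same rise first recorded on day 30 falls outside the censoring window and is not counted.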
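The sketch below shows how two of these covariates could be derived. It uses the standard Cockcroft–Gault form, (140 − age) × weight / (72 × serum creatinine in mg/dl), multiplied by 0.85 for women, and then applies the renal dysfunction and procedure complexity definitions above. It assumes creatinine is recorded in µmol/l and actual body weight in kg; the substudy does not report which weight or units were used. The procedure labels are hypothetical.

```python
MG_PER_DL_TO_UMOL_PER_L = 88.4  # serum creatinine: 1 mg/dl = 88.4 µmol/l


def cockcroft_gault(age_years: float, weight_kg: float, female: bool,
                    creatinine_umol_l: float) -> float:
    """Estimated creatinine clearance (ml/min) by the Cockcroft-Gault equation."""
    creatinine_mg_dl = creatinine_umol_l / MG_PER_DL_TO_UMOL_PER_L
    clearance = (140 - age_years) * weight_kg / (72 * creatinine_mg_dl)
    return clearance * 0.85 if female else clearance


def has_renal_dysfunction(age_years: float, weight_kg: float, female: bool,
                          creatinine_umol_l: float, on_dialysis: bool) -> bool:
    """Renal dysfunction: estimate below 60 ml/min or preoperative dialysis."""
    if on_dialysis:
        return True
    return cockcroft_gault(age_years, weight_kg, female, creatinine_umol_l) < 60


# Hypothetical procedure labels; "simple" = isolated CABG or a single valve procedure.
SIMPLE_PROCEDURES = {"isolated_cabg", "single_valve"}


def procedure_complexity(procedure: str) -> str:
    """Classify a procedure label as 'simple' or 'complex' (all other procedures)."""
    return "simple" if procedure in SIMPLE_PROCEDURES else "complex"
```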
The data set had at most 2% missing data; hence, missing data were not replaced or imputed. Initially, descriptive statistics (mean, SD, median, interquartile range, counts, proportion) were used to characterize the overall cohort, strata based on the presence versus absence of the primary outcome, and strata delineated by the E-CABG and Universal scores. Statistical significance was defined by a two-sided P value less than 0.05. All statistical analyses were conducted using SAS version 9.4 (SAS Institute Inc., USA).
Content validity refers to whether a measure reasonably represents all aspects of a given construct. The items graded by each score, as well as items incorporated into one score but not the other, were assessed for how they contributed to assigning patients to a given category of blood loss. Chest tube output was specifically examined because it can upgrade bleeding severity in the Universal score alone and could therefore produce discrepancies between the Universal score and E-CABG. The correlation between E-CABG and Universal scores was characterized using the Spearman rank correlation coefficient.
Construct validity refers to whether the Universal score and E-CABG behave as expected if they truly measure perioperative blood loss. We examined whether procedure types known to have higher rates of blood loss, such as complex and redo procedures, were associated with greater blood loss severity as graded by either score.
An important component of criterion validity is whether greater severity of blood loss (as assessed by the Universal score or E-CABG) is associated with subsequent events, such as postoperative mortality. We measured the adjusted association of the Universal score and of E-CABG with 28-day in-hospital all-cause mortality using multivariable logistic regression. We accounted for clustering by using hierarchical models with site random effects and patient-level factors as fixed effects. The Universal score and E-CABG were each treated as ordinal variables, with progressively increasing scores treated as higher severity categories. Model assumptions were verified, including linearity of continuous variables. Age was modeled using a B-spline to provide a robust and flexible way of modeling nonlinearity in the logit. Hemoglobin was treated as an untransformed continuous variable because it did not demonstrate significant nonlinearity. All other variables were binary or categorical.
A series of nested models was constructed for both the Universal score and E-CABG. Model 1 was the baseline model and incorporated only a parsimonious set of predictor variables, initially identified from the literature and published predictive indices. Bootstrapping was used to select variables for inclusion from this initial list: 300 bootstrap replicates, each of 5,000 patients, were generated by random sampling with replacement, and covariates included in more than 50% of bootstrap replicates were retained. Site was retained in all models. Model 2 included all predictor variables in model 1 plus Universal score categories. Model 3 included all predictor variables in model 2 plus major perioperative complications. This process was then repeated for E-CABG bleeding severity grades: model 4 comprised all predictor variables in model 1 plus E-CABG bleeding severity grades, and model 5 comprised all predictor variables in model 4 plus major perioperative complications. Model calibration was examined using the Hosmer–Lemeshow statistic, and discrimination was characterized using the area under the receiver operating characteristic curve (c-statistic).
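The Spearman coefficient is the Pearson correlation computed on ranks, with tied values assigned their average rank. Both scores are ordinal with only a few categories, so ties are very common and tie handling matters. The following is a minimal pure-Python sketch for illustration, not the SAS implementation used in the study.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)
```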
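One simple form of this construct validity check is to tabulate the proportion of complex (or redo) procedures within each bleeding category and see whether it rises with severity. The record field names below (`universal_class`, `complex_procedure`) are hypothetical placeholders. This descriptive check is a sketch, not the study's exact analysis.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def proportion_by_category(records: Iterable[Mapping], category_key: str,
                           flag_key: str) -> Dict[int, float]:
    """Share of records with a truthy flag, within each ordinal category."""
    totals: Dict[int, int] = defaultdict(int)
    flagged: Dict[int, int] = defaultdict(int)
    for record in records:
        category = record[category_key]
        totals[category] += 1
        flagged[category] += 1 if record[flag_key] else 0
    return {c: flagged[c] / totals[c] for c in sorted(totals)}


def increases_with_severity(proportions: Mapping[int, float]) -> bool:
    """True if proportions never decrease as the category increases."""
    values = [proportions[c] for c in sorted(proportions)]
    return all(a <= b for a, b in zip(values, values[1:]))
```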
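B-spline terms for age can be generated with the Cox–de Boor recursion. The sketch below evaluates a cubic B-spline basis at a given age. The knot locations are illustrative assumptions: boundary knots at ages 40 and 80, interior knots at 55, 65, and 75. The substudy does not report its knots or spline degree, and SAS spline effects may use a different parameterization.

```python
from typing import List, Sequence


def clamped_knots(lower: float, upper: float, interior: Sequence[float],
                  degree: int) -> List[float]:
    """Knot vector with each boundary knot repeated degree + 1 times."""
    return [lower] * (degree + 1) + list(interior) + [upper] * (degree + 1)


def bspline_basis(x: float, knots: Sequence[float], degree: int) -> List[float]:
    """Evaluate every B-spline basis function of `degree` at x (Cox-de Boor).

    Returns len(knots) - degree - 1 values; all are zero outside the boundary knots.
    """
    last = knots[-1]
    # Degree 0: indicator of the half-open knot interval containing x; the final
    # non-empty interval is closed on the right so x == last knot is covered.
    basis = []
    for i in range(len(knots) - 1):
        inside = knots[i] <= x < knots[i + 1]
        at_right_end = x == last and knots[i] < knots[i + 1] == last
        basis.append(1.0 if inside or at_right_end else 0.0)
    # Raise the degree one step at a time; zero-width spans contribute 0 (0/0 := 0).
    for d in range(1, degree + 1):
        higher = []
        for i in range(len(knots) - d - 1):
            value = 0.0
            left_span = knots[i + d] - knots[i]
            if left_span > 0:
                value += (x - knots[i]) / left_span * basis[i]
            right_span = knots[i + d + 1] - knots[i + 1]
            if right_span > 0:
                value += (knots[i + d + 1] - x) / right_span * basis[i + 1]
            higher.append(value)
        basis = higher
    return basis


# Hypothetical knots for age in years (not reported in the substudy).
AGE_KNOTS = clamped_knots(40.0, 80.0, [55.0, 65.0, 75.0], degree=3)
```

Each basis value becomes one covariate column in the logistic model. Within the boundary knots the basis functions sum to 1, so one column is usually dropped when the model also has an intercept.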
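The bootstrap variable selection step can be sketched as counting how often each candidate covariate is selected across resampled data sets. Here `select_covariates` is a hypothetical stand-in for the within-sample selection rule. The substudy does not state which rule was applied (for example, stepwise or backward elimination). The defaults mirror the 300 replicates of 5,000 patients and the 50% retention threshold described above, and site is always added to the retained set.

```python
import random
from collections import Counter
from typing import Callable, Iterable, List, Sequence, Set, Tuple


def bootstrap_inclusion(
    rows: Sequence[dict],
    select_covariates: Callable[[List[dict]], Iterable[str]],
    n_replicates: int = 300,
    sample_size: int = 5000,
    threshold: float = 0.5,
    forced: Tuple[str, ...] = ("site",),
    seed: int = 1,
) -> Tuple[Set[str], Counter]:
    """Retain covariates selected in more than `threshold` of bootstrap replicates."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(n_replicates):
        sample = rng.choices(rows, k=sample_size)  # sampling with replacement
        # Count each covariate at most once per replicate.
        counts.update(set(select_covariates(sample)))
    retained = {name for name, c in counts.items() if c / n_replicates > threshold}
    return retained | set(forced), counts
```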
Using SAS version 9.4, hierarchical models were fitted with the GLIMMIX procedure. The output data from the GLIMMIX procedure were then used in the LOGISTIC procedure to obtain the receiver operating characteristic curve and area under the curve for each model, and the receiver operating characteristic contrast statement of the LOGISTIC procedure was used to compare area under the curve values between models. The GLIMMIX output data sets were ranked into deciles of predicted risk, and the Hosmer–Lemeshow statistic was calculated for each hierarchical model. In addition, all adjusted models were internally validated by bootstrapping to estimate model optimism, which was then used to adjust the c-statistic and obtain a 95% CI using the Harrell Optimism SAS macro.
Results
The primary outcome, 28-day in-hospital mortality, occurred in 190 (2.6%) patients. The characteristics of patients stratified by mortality are presented in table 2. The Universal score could be fully calculated for 7,281 (98.4%) patients, of whom 168 (2.3%) died. E-CABG could be fully calculated for 7,347 (99.3%) patients, of whom 190 (2.6%) died.
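The decile-based Hosmer–Lemeshow statistic can be computed directly from predicted probabilities and observed outcomes. This minimal sketch assumes the standard form: the sum over groups of (O − E)² / (n·p̄·(1 − p̄)), compared against a χ² distribution with g − 2 degrees of freedom. It breaks ties in predicted risk by sort order, which may differ from how the ranked SAS output groups tied values. With an even number of degrees of freedom (8 for 10 groups), the χ² tail probability has a closed form, so no statistics library is needed.

```python
import math
from typing import Sequence, Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y: Sequence[int], p: Sequence[float],
                    groups: int = 10) -> Tuple[float, float]:
    """Hosmer-Lemeshow statistic and P value over equal-size risk groups."""
    if len(y) != len(p):
        raise ValueError("y and p must have the same length")
    pairs = sorted(zip(p, y))  # order patients by predicted risk
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        observed = sum(outcome for _, outcome in chunk)
        expected = sum(risk for risk, _ in chunk)
        mean_risk = expected / n_g
        if 0.0 < mean_risk < 1.0:  # skip degenerate groups to avoid division by zero
            stat += (observed - expected) ** 2 / (n_g * mean_risk * (1.0 - mean_risk))
    return stat, chi2_sf_even_df(stat, groups - 2)
```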
Universal score classes and E-CABG grades were moderately correlated (Spearman ρ = 0.78, P < 0.0001). Only 910 (12.4%) patients were classified as having more severe bleeding (grade 2 or 3) by E-CABG, whereas 1,729 (23.8%) patients were classified as having more severe bleeding (class 3 or 4) by the Universal score. Individual items in each scale were evaluated to explain this discrepancy. In total, 857 (11.9%) patients were classified by E-CABG as having lower severity bleeding (grade 0 or 1) but by the Universal score as having higher severity bleeding (class 3 or 4). Of these, 700 (81.7%) were classified as higher severity by the Universal score solely because of high chest tube output. Mortality among patients placed in higher severity categories by chest tube output alone was low: 6 (0.9%) of these patients died. In contrast, patients classified as higher severity by the Universal score because of factors other than, or in addition to, chest tube output had much higher mortality: 130 (12.6%) died. No patient classified as lower severity bleeding by the Universal score (class 0 to 2) was classified as higher severity bleeding by E-CABG (grade 2 or 3).
The proportion of complex and redo procedures was higher in each successive category of bleeding severity for both scores (fig. 1).
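The discordance analysis above cross-classifies each patient by the two dichotomized scores. Among patients rated higher only by the Universal score, it then records whether chest tube output alone drove the upgrade. In this sketch, `upgraded_by_chest_tube_only` is a hypothetical precomputed flag; deriving it would require the item-level Universal score rules, which are not reproduced here.

```python
from collections import Counter
from typing import Iterable, Mapping


def classify_discordance(records: Iterable[Mapping]) -> Counter:
    """Cross-classify patients by dichotomized Universal class and E-CABG grade."""
    counts: Counter = Counter()
    for r in records:
        universal_high = r["universal_class"] >= 3   # Universal class 3 or 4
        ecabg_high = r["ecabg_grade"] >= 2           # E-CABG grade 2 or 3
        if universal_high and not ecabg_high:
            # Split Universal-only upgrades by whether chest tube output alone drove them.
            key = "chest_tube_only" if r["upgraded_by_chest_tube_only"] else "other_items"
            counts[key] += 1
        elif ecabg_high and not universal_high:
            counts["ecabg_higher"] += 1
        else:
            counts["concordant"] += 1
    return counts
```

With the counts reported above, 700 of 857 discordant patients gives 700 / 857 ≈ 0.817, the 81.7% in the text.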
When blood loss severity increased as graded by either score, there was a corresponding increase in mortality (fig. 2). In unadjusted logistic regression analyses, increasing Universal and E-CABG scores were associated with increasing mortality (table 3). When used as the only predictor for the outcome of mortality, the Universal score and E-CABG did not differ in their discrimination, with the Universal score having an area under the curve of 0.84 (95% CI 0.81 to 0.87), and E-CABG having an area under the curve of 0.85 (95% CI 0.82 to 0.88). These two areas under the curve did not differ significantly (P = 0.25).
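For a single ordinal predictor, the area under the receiver operating characteristic curve equals the concordance (Mann–Whitney) probability. This is the chance that a randomly chosen patient who died has a higher score than a randomly chosen survivor, with ties counted as one half. Each score has only a handful of categories, so many pairs are tied and tie handling affects the result. A minimal sketch follows; the study itself used the SAS LOGISTIC procedure.

```python
from typing import Sequence


def c_statistic(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Concordance probability (ROC AUC) of a score for a binary outcome; ties count 1/2."""
    events = [s for s, y in zip(scores, outcomes) if y]
    nonevents = [s for s, y in zip(scores, outcomes) if not y]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for ne in nonevents:
            if e > ne:
                concordant += 1.0
            elif e == ne:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))
```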
Similar results were found in the adjusted analyses. In the hierarchical logistic regression model that incorporated important confounders but not bleeding events or other complications (model 1), the area under the curve for predicting 28-day in-hospital mortality was 0.84. Adding Universal classes (model 2) increased the area under the curve to 0.91, and further adding other important complications (model 3) increased it to 0.94. Adding E-CABG grades to model 1 (model 4) similarly increased the area under the curve to 0.92, and further adding other important complications (model 5) increased it to 0.94. In every model, bleeding severity, whether assessed by the Universal score or E-CABG, was significantly associated with 28-day in-hospital mortality. Details of the models are provided in Supplemental Digital Content 1, https://links.lww.com/ALN/B671. Detailed model calibration results are provided in Supplemental Digital Content 2, https://links.lww.com/ALN/B672.
Discussion
This study compared the Universal definition of perioperative bleeding and the E-CABG bleeding severity grade and sought evidence of their content, construct, and criterion validity in a high-quality data set from a multicenter clinical trial of patients undergoing cardiac surgery. Our findings indicate that both the Universal score and E-CABG are valid and acceptable scoring systems for use as bleeding endpoints in cardiac surgery clinical trials.
The Universal definition of perioperative bleeding has been studied in several patient samples since its development. In a single-institution European database of 1,144 adult cardiac surgical patients, increasing Universal classes were independently associated with 30-day mortality.9 The Universal score was further validated in an institutional data set of 2,764 cardiac surgical patients in Finland, where increasing classes were significantly associated with worse immediate and late outcomes.19
E-CABG has also been studied in a variety of patient samples. In a sample of 7,491 patients drawn from institutional databases at two hospitals in Italy, increasing E-CABG severity grades were independently associated with higher in-hospital mortality and composite adverse events.11 In a separate study of 3,730 patients from a prospective multicenter registry of 16 centers in six countries (England, Finland, France, Germany, Italy, and Sweden), six bleeding scores, including E-CABG and the Universal definition of perioperative bleeding, were compared against each other.20 Both E-CABG and the Universal score showed good discriminative and predictive ability, with acceptable area under the curve values for prediction of mortality, stroke, acute kidney injury, and sternal wound infection.20
Our study offers strong evidence supporting the use of either score in a clinical trial context, although we also noted some key differences. The higher frequency of complex and redo procedures in higher bleeding categories supports the view that the Universal score and E-CABG measure a similar construct. In terms of content validity, the Universal score captures a wider variety of clinical events associated with bleeding when assigning patients to a bleeding severity category. E-CABG may better capture the adverse impact of higher transfusion requirements because it is based almost exclusively on transfusion volume. The Universal score also assesses transfusion volume but may better capture coagulopathy related to massive bleeding, for example by scoring the administration of recombinant activated factor VII.31 Thus, although the two scores overlap, each captures somewhat different aspects of major bleeding.
For both scores, patients who start surgery with a lower total hemoglobin mass are more likely to receive red cell transfusions than those with a higher hemoglobin mass. Because the amount of red cell transfusion is a major component of both scores, investigators using these scores as endpoints should control for patients’ baseline hemoglobin mass. The same applies to transfusion of other products, such as platelets or factor replacement, which may be administered to correct preexisting deficiencies rather than in response to bleeding events.9
Although E-CABG assesses fewer domains than the Universal score, both scores showed evidence of construct and criterion validity. Interestingly, the scored domain of chest tube output does not appear to contribute significantly to the criterion validity of the Universal score, and any future revision of the Universal score could consider omitting this component. Although chest tube output has been associated with mortality in other studies, those samples did not exclude patients receiving transfusion or having other significant clinical events associated with ongoing bleeding, such as reoperation.2,32 In our study, chest tube output alone, without any other items indicative of bleeding, was not associated with increased mortality. Patients with high chest tube output and other items indicating bleeding did have increased mortality, which is largely consistent with the existing literature.
The time frames for scoring the Universal definition of perioperative bleeding and E-CABG differ notably. E-CABG is scored using data collected from the initial surgery through the patient’s entire hospital stay, whereas the Universal score is limited to events occurring from surgery to the first postoperative day. Because E-CABG primarily scores transfusion volume, transfusions during the hospital stay that are remote from or unrelated to the original surgery are captured and scored as bleeding and may be erroneously attributed to the original surgery. This distinction may affect the construct validity of E-CABG.
Generally, the more complex the endpoints and the greater the volume of data collected during a trial, the higher the risk of incomplete data. Collecting the data needed to assess either the Universal score or E-CABG was not a problem in the TACS study. Some investigators may prefer E-CABG because it scores fewer domains and imposes a lower data collection burden. However, any savings in data collection from using E-CABG as an outcome must be weighed against the smaller number of bleeding events it captures, particularly more severe bleeding. In our sample, the proportion of patients classified as having more severe bleeding by the Universal score (class 3 or 4) was nearly double that classified by E-CABG (grade 2 or 3) (23.75% vs. 12.39%). A greater proportion of patients are scored as having bleeding events, including more severe bleeding, when the Universal score is used. In addition, patients are distributed more evenly across bleeding categories with the Universal score.
Although we attempted to provide a thorough comparison of these two consensus-based scores, our work has limitations. In assessing criterion validity, our outcome was 28-day in-hospital all-cause mortality. These were the data available to us, but all-cause mortality beyond the hospital setting is also important. Although limiting mortality to within 28 days allowed us to focus on the immediate perioperative period, mortality beyond this time frame certainly matters. For example, complications of severe bleeding include acute kidney injury, which may affect mortality beyond 28 days and would not have been captured in our data set.33 Furthermore, in our multivariable logistic regression models examining the association of each bleeding score with mortality, we included as covariates variables identified as prognostically important in the literature. Our data set did not include important items scored by EuroSCORE II, a model used worldwide to predict cardiac surgical risk,34 which precluded us from including it as a covariate in our models.
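Total hemoglobin mass is hemoglobin concentration multiplied by circulating blood volume, which separates it from concentration alone. The sketch below assumes a crude blood volume estimate of 70 ml/kg, a common rough approximation for adults. This is an assumption for illustration only; investigators may prefer a validated blood volume formula such as Nadler's.

```python
# Rough adult blood volume approximation (assumption for illustration only).
ASSUMED_BLOOD_VOLUME_ML_PER_KG = 70.0


def hemoglobin_mass_g(hemoglobin_g_per_l: float, weight_kg: float,
                      ml_per_kg: float = ASSUMED_BLOOD_VOLUME_ML_PER_KG) -> float:
    """Estimated total hemoglobin mass (g) = concentration (g/l) x blood volume (l)."""
    blood_volume_l = weight_kg * ml_per_kg / 1000.0
    return hemoglobin_g_per_l * blood_volume_l
```

Under this assumption, two patients with the same preoperative hemoglobin concentration of 130 g/l but weights of 60 and 90 kg would have estimated hemoglobin masses of about 546 and 819 g, respectively.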
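The effect of the different scoring windows can be shown by counting red cell units within each window. In this hypothetical sketch, postoperative day 0 is the day of surgery, the Universal window covers days 0 and 1, and the E-CABG window covers the whole hospital stay. The `Transfusion` record and its field names are assumptions, and real scoring involves additional items and thresholds.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Transfusion:
    postop_day: int       # 0 = day of surgery (hypothetical convention)
    red_cell_units: int


UNIVERSAL_LAST_DAY = 1    # surgery through the first postoperative day
ECABG_LAST_DAY = None     # entire hospital stay (no upper limit)


def units_in_window(transfusions: Iterable[Transfusion],
                    last_day: Optional[int]) -> int:
    """Sum red cell units given on or before `last_day` (all units if None)."""
    return sum(t.red_cell_units for t in transfusions
               if last_day is None or t.postop_day <= last_day)
```

For example, a 3-unit transfusion on day 10 for an unrelated gastrointestinal bleed would count toward the E-CABG window but not the Universal window.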
The adoption of clinically sensible, consistent endpoints in cardiac surgery clinical trials evaluating bleeding has many advantages. In this study, we add to existing evidence supporting the use of these two consensus-based bleeding scores in cardiac surgery. Both the Universal definition of perioperative bleeding and E-CABG demonstrate evidence of criterion, construct, and content validity. However, the Universal score seems to capture more bleeding events and has a more uniform distribution across bleeding categories. This has implications for clinical trial sample size calculations and suggests that fewer patients may be required to demonstrate a difference between groups if the Universal score is used as an endpoint. The Universal score captures a variety of clinically important events related to bleeding beyond transfusion in the period immediately surrounding surgery. E-CABG primarily captures transfusion volume over the entire hospital stay, which may not always reflect bleeding associated with the original surgery. On the other hand, the individual components of E-CABG are easily collected from most hospital administrative databases and may impose a substantially lower workload on trial organizers who need to use their resources efficiently. The ongoing growth of clinical trials in cardiac surgery will benefit from the use of either scoring system in future trials measuring bleeding as an outcome of interest.
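The standard normal-approximation formula for comparing two proportions illustrates the sample size argument: n per group = (z₁₋α/₂ + z₁₋β)² [p₁(1 − p₁) + p₂(1 − p₂)] / (p₁ − p₂)². The sketch below assumes, purely for illustration, the same 25% relative reduction in more severe bleeding under either score. It applies that reduction to the baseline proportions observed in this sample: 23.75% for the Universal score and 12.39% for E-CABG. It ignores the clustered stepped-wedge design and any ordinal analysis, both of which would change the numbers.

```python
import math
from statistics import NormalDist


def n_per_group(p_control: float, relative_reduction: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n for a two-sided test of two independent proportions.

    Normal approximation without continuity correction; illustrative only.
    """
    p_treated = p_control * (1.0 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2)


# Observed proportions with more severe bleeding in this sample.
universal_n = n_per_group(0.2375, relative_reduction=0.25)
ecabg_n = n_per_group(0.1239, relative_reduction=0.25)
```

Under these assumptions, the higher baseline proportion captured by the Universal score requires roughly half as many patients per group (about 730 versus about 1,580).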
The authors thank the Transfusion Avoidance in Cardiac Surgery Investigators for access to the study data.
There are no sources of funding to declare for this substudy. The original Transfusion Avoidance in Cardiac Surgery Study was funded by a grant from the Canadian Institutes of Health Research and by unrestricted grants from Octapharma Canada Inc. (Toronto, Ontario, Canada) and Baxter Corp. (Mississauga, Ontario, Canada). In-kind financial support was provided by Tem International GmbH (Munich, Germany) and Helena Laboratories (Beaumont, Texas). The funders did not have a role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; or preparation, review, or approval of the manuscript. Drs. Karkouti and Wijeysundera are supported in part by merit awards from the Department of Anesthesia, University of Toronto. Dr. Wijeysundera is supported in part by the New Investigator Award from the Canadian Institutes of Health Research. Dr. Scales was supported by a Fellowship in Translational Research from Physicians’ Services Incorporated Foundation.
The authors declare no competing interests.
The Transfusion Avoidance in Cardiac Surgery (TACS) Research Group included:
Keyvan Karkouti, M.D.
Jeannie Callum, M.D.
Duminda N. Wijeysundera, M.D., Ph.D.
Vivek Rao, M.D., Ph.D.
Mark Crowther, M.D.
Hilary P. Grocott, M.D.
Ruxandra Pinto, Ph.D.
Damon C. Scales, M.D., Ph.D.
Blaine Achen, M.D.
Sukhpal Brar, M.D.
Doug Morrison, M.D.
David Wong, M.D.
Jean S. Bussières, M.D.
Tonya de Waal, M.D.
Christopher Harle, M.D.
Étienne de Médicis, M.D., M.Sc.
Charles McAdams, M.D.
Summer Syed, M.D.
Diem Tran, M.D.
Terry Waters, M.D.