“[In developing this tool,] Eikermann et al. … have lit a path of discovery toward greater efficiency of screening for postoperative respiratory failure.”
CLINICAL research in Anesthesiology increasingly focuses on the prediction of myriad perioperative complications. This research has been driven by the development of databases that capture various measures of patient health, anesthetic care, surgical procedure, and outcomes. One of the strengths of this approach is that it allows analysis of outcomes that were previously understudied because of their rarity or complexity. Postoperative respiratory failure is a perfect target for such research because it is relatively rare and has a significant impact on healthcare costs1 and patient mortality.2,3 Previous investigators have developed several screening tools for the prediction of postoperative respiratory failure,2–4 but most have focused on 30-day outcomes, a period that may well exceed the immediate influence of anesthesia technique and perioperative respiratory management. For example, the first 24 h after surgery represent the period of highest risk of unanticipated respiratory failure due to opioids,5,6 whereas postoperative hypoxemia has been shown to peak by the third night after major surgery.7–10 In this issue of the Journal, Eikermann et al. report the development and validation of a Score for Prediction of Postoperative Respiratory Complications (SPORC) focusing on the early postoperative period of 3 days after surgery.11
The investigators identified several independent predictors of reintubation, such as planned postoperative hospital admission, preoperative history of congestive heart failure, chronic pulmonary or cerebrovascular disease, emergency surgery, American Society of Anesthesiologists score of 3 or more, and high-risk surgical service. By using a weighted point system, the SPORC yielded a calculated area under the receiver operating characteristic curve of 0.84–0.87, with a stepwise increase in the odds of reintubation with an increasing number of risk factors. As previously reported, the development of respiratory failure was associated with a large increase in 30-day mortality.2 The SPORC tool is, thus, a simple way to identify high-risk patients in future studies and to prospectively evaluate the effectiveness of interventions in preventing or reducing the incidence and severity of postoperative respiratory failure. For example, the success of continuous positive airway pressure therapy in patients recovering from major abdominal surgery suggests that a screening tool such as SPORC may have a role in identifying the patients who are likely to benefit from this therapy.12 There are, however, several limitations to the authors’ approach to screening.
Bayes’ theorem describes the relation between the prevalence of a disease and the accuracy of prediction tools.13–15 For rare events, even highly accurate tests will generate many false positives, with the potential consequences of excess resource utilization and complications from unnecessary treatment. Most prediction models derived from outcomes databases have positive predictive values of less than 10% (low clinical precision) because the outcomes of interest are typically rare (0.1–4%).2,16 Because most patients who undergo surgery are at low risk and will not develop the complication in question, high specificity is readily achieved; however, because only a small fraction of patients will screen positive as high risk, sensitivity tends to be low. A tool with a sensitivity of less than 50% will, by definition, fail to identify the majority of patients who will develop the complication. This is the case with the majority of prediction models derived from outcomes databases for perioperative complications (low sensitivity and low technical precision). A consequence is that policies or care processes that preferentially allocate treatments to high-risk patients on the basis of these screening tools may expose more patients to harm through misdiagnosis (i.e., patients who should be given treatment do not receive it). Furthermore, heterogeneity in patient populations is high in most prediction models, resulting in significant variability in prediction accuracy.14,17 The inherent heterogeneity and the lack of technical or clinical precision are thus permanent limitations of screening tools designed to detect high risk of rare outcomes. Other challenges with rare outcomes from large databases include concerns over data reliability, missing data, unmeasured biases, unmeasured treatment effects, and lack of generalizability. These concerns apply broadly and are certainly not unique to the work of Eikermann et al.
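The dependence of positive predictive value on prevalence can be made concrete with a short calculation. The following is a minimal sketch of Bayes' theorem applied to a binary screening test; the function name and the sensitivity and specificity values are illustrative assumptions, not figures reported for SPORC or any other published tool. Only the prevalence values are chosen to fall within the 0.1–4% range cited above.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Bayes' theorem for a binary test: P(disease | positive screen)."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical test characteristics, chosen only for illustration.
    sensitivity = 0.50
    specificity = 0.95
    for prevalence in (0.001, 0.01, 0.04):
        ppv = positive_predictive_value(prevalence, sensitivity, specificity)
        print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}")
```

Under these assumed values, an outcome with a prevalence of 1% yields a positive predictive value of roughly 9%, so about 10 of every 11 patients flagged as high risk would not go on to develop the complication.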
The impact of clinical screening tools needs to be considered in the context of the population to which they are applied and the clinicians who will use them.14,15 Screening is one of several steps in the perioperative care of patients that could influence outcomes. The emerging bevy of outcomes research-based screening tools points to one common finding, i.e., sicker patients do worse after surgery! More interestingly, several of the independent predictors are common across multiple studies of diverse outcomes. Because these diseases are coded in a binary fashion, their severity spectrum and modifiability remain untested. Should patients be investigated more intensively when the expected risk of complications is high? Should surgical techniques be determined by the estimated risk of complications? Indeed, several care processes that significantly improve outcomes have been described for patients with specific conditions.18 An elegant way to assess prediction tools is to use a hierarchical model of diagnostic test effectiveness.19 The hierarchical model describes the impact of a test or screening tool on patient outcomes, where screening tests that address the higher-level questions have greater clinical impact. It is imperative to recognize that even a great screening tool is unlikely to change outcomes unless specific interventions exist that themselves modify the risk of the adverse outcome.20,21 The most important question among clinicians remains “What should I do differently to prevent postoperative respiratory failure?” Thus, to answer that specific question, iterative steps are needed to traverse the levels of test efficiency from accurate disease screening to identification of the patients most suited for effective therapy. In the process of achieving greater screening test effectiveness, we need to be wary of the statistical traps described above.
One way to overcome the stated permanent limitations could be to develop two-level screening tools, with an initial high-sensitivity test followed by a test with high positive predictive value. This approach has been particularly effective in screening for disease states such as obstructive sleep apnea, where determination of high risk based on a questionnaire (high sensitivity) followed by overnight home oximetry (high predictive value) provides highly accurate prediction.22 The countermeasure for the low sensitivity of prediction tools is to cast a wider net that includes all possible conditions that influence the risk of respiratory failure (or other clinical outcomes of interest). This approach comes with an attendant drop in the positive predictive value. The addition of a second-level screening process, which identifies the more severe forms of specific contributory diseases, could increase the positive predictive value of the tool in a subset of the study population. For example, second-level testing might evaluate lung volumes, exercise tolerance, radiographic measures, or other functional measures that are shown to increase the overall predictive value.
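The arithmetic behind two-level screening can be sketched in a few lines. The example below assumes that the two tests err independently of each other given the patient's true status, which is a simplifying assumption that real tests may not satisfy. All test characteristics are hypothetical and are not taken from any study cited here. The second test is applied only to patients who screen positive on the first.

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value of a single binary test (Bayes' theorem)."""
    tp = sensitivity * prevalence
    fp = (1.0 - specificity) * (1.0 - prevalence)
    return tp / (tp + fp)


def two_level_screen(prevalence, sens1, spec1, sens2, spec2):
    """Return (combined PPV, combined sensitivity) for sequential testing.

    Assumes conditional independence of the two tests given disease status.
    A patient counts as positive only if both levels are positive.
    """
    # The PPV after level 1 becomes the pretest probability for level 2.
    pretest_for_level2 = ppv(prevalence, sens1, spec1)
    combined_ppv = ppv(pretest_for_level2, sens2, spec2)
    combined_sensitivity = sens1 * sens2
    return combined_ppv, combined_sensitivity


if __name__ == "__main__":
    # Hypothetical values: a wide-net first test, a more specific second test.
    prevalence = 0.01
    level1_only = ppv(prevalence, 0.90, 0.60)
    both_ppv, both_sens = two_level_screen(prevalence, 0.90, 0.60, 0.80, 0.95)
    print(f"level 1 alone: PPV {level1_only:.1%}")
    print(f"both levels:   PPV {both_ppv:.1%}, sensitivity {both_sens:.0%}")
```

With these assumed values, the first-level test alone has a positive predictive value of about 2%, whereas the two-level sequence raises it to about 27% at the cost of lowering overall sensitivity from 90% to 72%. The trade-off illustrates why the second level is best reserved for the subset of patients already identified by the wider net.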
A significant strength of the study by Eikermann et al. is that it elucidates specific risk conditions, which become targets for mechanistic research to help us understand how these conditions influence outcomes. With the development of the SPORC screening tool, Eikermann et al. have targeted an important perioperative complication that markedly increases mortality and the cost of health care. In doing so, they have lit a path of discovery toward greater efficiency of screening for postoperative respiratory failure.