Background: Routine predictions made by surgeons or historical mean durations have only limited capacity to predict operating room (OR) time. The authors aimed to devise a prediction model using the surgeon's estimate and characteristics of the surgical team, the operation, and the patient.
Methods: Seventeen thousand four hundred twelve consecutive, elective operations from the general surgical department in an academic hospital were analyzed. The outcome was OR time, and the potential predictive factors were surgeon's estimate, number of planned procedures, number and experience of surgeons and anesthesiologists, patient's age and sex, number of previous hospital admissions, body mass index, and eight cardiovascular risk factors. Linear mixed modeling on the logarithm of the total OR time was performed.
Results: Characteristics of the operation and the team had the largest predictive performance, whereas patient characteristics had a modest but distinct effect on OR time: operations were shorter for patients older than 60 yr, and higher body mass index was associated with longer OR times. The surgeon's estimate had an independent and substantial contribution to the prediction, and the final model explained 27% of the residual variation in log (OR time). Using the prediction model instead of the surgeon's prediction based on historical averages would reduce shorter-than-predicted and longer-than-predicted OR time by 2.8 and 6.6 min per case (a relative reduction of 12 and 25%, respectively), assessed on independent validation data.
Conclusions: Detailed information on the operative session, the team, and the patient substantially improves the prediction of OR times, but the surgeon's estimate remains important. The prediction model may be used in OR scheduling.
What We Already Know about This Topic
❖ Efficient use of the operating room requires accurate estimates of procedure time
❖ Surgeon estimates of procedure duration are poorly predictive, and models have focused on specific procedures without including individual patient characteristics
What This Article Tells Us That Is New
❖ Using more than 17,000 cases in an academic practice, procedure, operating team, and patient characteristics added to the predictive value of surgeon estimate of duration
❖ Surgeon and anesthesiologist age in this teaching setting affected procedure duration
Operating rooms (ORs) are of pivotal importance to a hospital, consuming a considerable part of its total budget.1 Typically, more than 60% of patients admitted to a hospital are treated in the OR. Patient management, that is, the decision to treat a patient and the timing of treatment, is often constrained by limitations in the OR capacity or in the availability of surgeons and qualified OR personnel. For this reason, and for cost containment, the planning of care, that is, planning which patient to operate on when, is crucial.2,3 Emergency procedures, large diversity in processes, dependency on limited capacity in other parts of the care process such as intensive care units, and a large number of specialties competing for limited OR facilities make planning complex.
Optimal planning can be achieved only when reliable predictions are available about the time needed for elective operations. When an operation takes longer than predicted, subsequent operations may need to be postponed or even cancelled. When the actual time is shorter than predicted and planned, the OR remains unused for a while. Both are undesirable and could lead to suboptimal use of the OR.4 Furthermore, in the absence of reliable predictions, the use of advanced planning techniques makes no sense. Although much progress has been made in the planning methodology, particularly for planning on the day of surgery, there remains opportunity for additional improvement through a better preoperative prediction of OR times for individual cases.
Supplemental digital content is available for this article. Direct URL citations appear in the printed text and are available in both the HTML and PDF versions of this article. Links to the digital files are provided in the HTML text of this article on the Journal's Web site (www.anesthesiology.org).
In some hospitals, the surgeons make a routine prediction of the OR time needed, and in others, historical times are taken as a reference.5,6 However, the accuracy of these predictions is limited.7 If more accurate predictions of the OR time could be made for individual patients, planning would improve, and the potential benefits would be twofold: (1) the prediction for an individual patient would be more accurate than the average prediction for the group of patients undergoing the same operation, and (2) the variation around the prediction would be smaller than the variation for the group as a whole. Previous studies have aimed to develop predictive tools by statistical modeling of operation times.5,8–11 However, none of them aimed to make predictions for individual patients covering all operations from one surgical department. Selected operations from various surgical specialties were taken into account,8 or only one particular type of operation was considered.9 Silber et al.,11 using Medicare claims data, developed a prediction model to explain the differences between subgroups of patients and between hospitals, not to develop a tool for planning. The role of the prediction made by the surgeon is ignored, or it is compared with the predictions made by automated planning software.5
We aim to predict the total OR time by using the surgeon's estimate of operative time and procedure, team, and patient characteristics of individually specified operations of a general surgical department.
Materials and Methods
Subjects
All operative sessions at the Erasmus Medical Center (Rotterdam, The Netherlands) have been registered electronically since January 1993. For the purpose of this study, data from the operation database (OPERA, operation administration) were matched with global patient data from the general electronic hospital information system and with more detailed patient data from a previous study on risk factors for complications of surgery.12 For use of these data sources, approval of the Institutional Review Board of the Erasmus Medical Center was obtained. We initially selected 18,838 consecutive elective operations performed by the Department of General Surgery until June 2005. Emergency operations were not considered. Operations that had not been performed during the last 3 yr (n = 1,338), operative sessions for which no matching between the databases could be obtained (n = 21), and operations that were wrongly assigned to the Department of General Surgery (n = 67) were excluded. This left 17,412 operations for analysis. Operations were classified into 253 categories, according to the main procedure during the operation. These operations are typical for a surgical department in an academic, tertiary referral center.
The outcome to be predicted was total OR time, defined as the time from entry of the patient into the OR until leaving it. We will systematically use the term “operation” to characterize a session and use “surgical procedure” for the possibly multiple surgical activities that are part of an operation.
Operation characteristics were the number of separate procedures within the operation and whether it was a laparoscopic procedure. In case of multiple procedures, the operation was coded according to the main procedure, which was determined from a priority list that was constructed by surgeons from the general surgical department. We preferred this method over the statistical determination of the longest procedure from single procedure cases,13 because we observed that a number of procedures were never performed in isolation. Team characteristics were the total of the ages of the surgical team as a measure of combined experience, age of the youngest and oldest surgeon, the number of surgeons, and the ages and number of anesthesiologists.
Patient characteristics were age and sex, the number of admissions to the hospital before the operation, and the length of the current hospital admission. For patients who were operated on before 2001, additional data were available on the presence or absence of the following cardiovascular risk factors: diabetes, hypercholesterolemia, hypertension, history of heart failure, history of cerebrovascular accident, history of chronic obstructive pulmonary disease, history of renal insufficiency, and history of coronary artery disease.12 Body mass index (kg/m²) of the patient was known in 1,491 (8.6%) of the operations, as assessed during a previous study.
Prediction by the Surgeon
Before each operation, the surgeon's prediction of the total surgical time was routinely registered in the database and used for planning the operation. In an internal evaluation in 2002, it became evident that the time planned in this way systematically underestimated the total OR time, because anesthetic time was not taken into account. Starting in 2004, a computerized planning system was used, providing the surgeon with the mean duration of previous operations of the same type. Surgeons made a subjective adjustment when necessary, and the adjusted value was used in planning. We assessed whether this planning system had improved the accuracy by comparing the pre-2004 with the 2004–2005 data. For further analysis, the bias in the surgeon's estimates in the pre-2004 data was removed by adding to each estimate the median deviation between the estimates and the actual OR times.
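The bias removal described above can be illustrated with a minimal sketch. It assumes that the correction is a single additive shift in minutes, equal to the median of (actual OR time minus surgeon's estimate); the function name and the numbers are hypothetical and are not data from the study.

```python
from statistics import median

def debias_estimates(estimates, actual_times):
    """Shift every estimate by the median of (actual - estimate).

    Assumption: one additive shift in minutes is applied to all cases.
    """
    shift = median(a - e for e, a in zip(estimates, actual_times))
    return [e + shift for e in estimates], shift

# Hypothetical values in minutes (illustration only, not study data).
surgeon_estimates = [60, 90, 120, 45, 180]
observed_or_times = [95, 118, 150, 80, 205]

corrected, shift = debias_estimates(surgeon_estimates, observed_or_times)
print(shift)      # median underestimation in minutes
print(corrected)  # estimates after adding the shift
```

Using the median rather than the mean keeps the shift from being dominated by a few operations that ran far longer than expected.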
Recoding of the Operations
The operation and team characteristics were entered in the database after finishing the operation as they had turned out to be and not as they had been intended preoperatively. Some operations evolve differently than initially intended and planned. Examples are oncological operations with curative intent: the patient may turn out to be inoperable only during the operation. This may lead to fewer procedures being performed than intended and to a shorter OR time than is typical for such an operation. Further, operations that are planned laparoscopically may be changed to an open procedure during the operation. Finally, in case of complications during an operation, an experienced surgeon may be called in, who is afterward added to the list of surgeons performing the operation. This makes the data unsuited for prediction modeling, where only factors that are known preoperatively may be taken into account. Two of the authors (M.J.C.E. and G.K.) went through the list of operations, recoding when necessary the postoperative code as entered in the database to the preoperative code that was most likely initially planned and adjusting the number of procedures and the number of surgeons to the usual number for each specific operation. Procedure codes that had been changed over time, in particular, the coding for laparoscopic procedures, were reassigned to a unique coding. The recoded data were used in all subsequent analyses.
Statistics
We used imputation of missing data, as this is recommended as less biased than dropping cases with missing values when developing multivariable models.14 The multiple imputation technique, implemented by Harrell's aregImpute function in the Hmisc library in S-plus, was used to properly adjust standard errors and confidence intervals after imputation. Linear mixed modeling was used to build the prediction model, with the logarithm of the total OR time as the dependent variable. The 253-level variable indicating the type of operation was used as a random effect; all other variables were analyzed as fixed effects. From the random effects part of the mixed model, one can calculate empirical Bayes estimates, which are equivalent to the Bayesian estimates obtained by the method of Dexter et al.15 This approach allows for the inclusion of very infrequent operations, even operations that have been performed only once. The total OR time variable was log transformed because of its right skewness16 and because the log-normal distribution fits better to the data of multiple procedure operations.13 First, a base model was fitted, containing only the type of operation as a random effect. As a screening step before further model building, the nonlinearity of the association between the continuous predictor variables and the log (total OR time) was assessed by fitting a restricted cubic spline function, with knots at the 5th, 35th, 65th, and 95th percentiles of the predictor's distribution, as an extension to the base model. In this way, learning-curve-like nonlinear patterns, e.g., for the ages of the surgeons, may be detected and incorporated into the prediction model, using only two extra parameters in the model.17 In the following step, a test for interaction between the predictor and the type of operation was performed to assess whether the effect of the predictor depended on the type of operation.
To keep this analysis manageable, we tested for interaction with a version of the operation code condensed into 40 categories, corresponding to the organ or anatomical site involved. For example, when testing for interaction between type of operation and number of surgeons, we entered the operation code in 253 categories and the main effect of the number of surgeons, whereas the interaction between the number of surgeons and the operation code was entered using the condensed code in 40 categories.
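The restricted cubic spline screening step can be made concrete with a short sketch. It builds the commonly used truncated-power basis for a restricted cubic spline with four knots, which yields one linear term plus two nonlinear terms (the "two extra parameters"). This is an illustration under stated assumptions, not the authors' S-plus code: the function name, the scaling by the squared knot range, and the ages are hypothetical.

```python
from statistics import quantiles

def rcs_basis(x, knots):
    """Return [linear term, nonlinear terms...] of a restricted cubic spline.

    With k knots there are k - 2 nonlinear terms; the curve is linear
    below the first knot and above the last knot.
    """
    k = len(knots)
    t_last, t_prev = knots[-1], knots[-2]
    scale = (knots[-1] - knots[0]) ** 2  # keeps terms on the scale of x

    def cube_plus(u):
        # Truncated cubic: u^3 when u > 0, otherwise 0.
        return max(u, 0.0) ** 3

    terms = [x]
    for t_j in knots[: k - 2]:
        term = (
            cube_plus(x - t_j)
            - cube_plus(x - t_prev) * (t_last - t_j) / (t_last - t_prev)
            + cube_plus(x - t_last) * (t_prev - t_j) / (t_last - t_prev)
        )
        terms.append(term / scale)
    return terms

# Hypothetical ages of the youngest surgeon (illustration only).
ages = [26, 27, 28, 29, 30, 31, 32, 33, 35, 36, 38, 40, 42, 45, 48, 50, 55, 60]

# Knots at the 5th, 35th, 65th, and 95th percentiles, as in the text.
cuts = quantiles(ages, n=100, method="inclusive")
knots = [cuts[p - 1] for p in (5, 35, 65, 95)]

basis_at_34 = rcs_basis(34.0, knots)
print(knots)
print(basis_at_34)  # one linear and two nonlinear columns for age 34
```

The resulting columns would enter the regression as ordinary covariates, so a test of nonlinearity amounts to testing whether the coefficients of the two nonlinear columns are jointly zero.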
The surgeon's estimate and the operation, team, and patient characteristics were subsequently added to the model, and the improvement in predictive ability was assessed, using the nonlinear patterns and interactions when statistically and clinically significant. Selection of variables was applied conservatively to minimize the risk of overfitting: all predictors with a univariable P < 0.30 were included in the model.18 The predictive ability of the resulting extended models was expressed as a percentage of variation in log (OR time) that is explained by the model and measured by the model's adjusted R². To quantify the improvement in comparison with the base model, the gain in R² of the extended model was expressed as a percentage relative to the variation left unexplained by the base model: $(R^2_{\text{model}} - R^2_{\text{base}})/(1 - R^2_{\text{base}})$. The final model contained the type of operation as a random effect and the surgeon's estimate together with operation, team, and patient characteristics as fixed effects. The model predictions on the log (OR time) were back transformed to the original time scale, applying a correction for back-transformation bias, a smearing factor computed as the mean value of the exponentiated residuals of the model.19
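The back-transformation with a smearing factor can be sketched as follows. The sketch follows the definition in the text (the mean of the exponentiated residuals on the log scale); the function names and residual values are hypothetical, and the factor of 1.04 used in the last line is the value reported in the Results.

```python
import math

def smearing_factor(log_residuals):
    """Mean of exp(residual), with residuals taken on the log scale."""
    return sum(math.exp(r) for r in log_residuals) / len(log_residuals)

def back_transform(log_prediction, factor):
    """Convert a prediction of log(OR time) to minutes, corrected for bias."""
    return math.exp(log_prediction) * factor

# Hypothetical residuals on the log scale (illustration only).
residuals = [-0.3, -0.1, 0.0, 0.1, 0.3]
factor = smearing_factor(residuals)
print(factor)  # slightly above 1 whenever the residuals vary

# A log-scale prediction corresponding to 100 min, corrected with 1.04.
predicted_minutes = back_transform(math.log(100.0), 1.04)
print(predicted_minutes)
```

Without the factor, exponentiating the log-scale prediction would estimate the median rather than the mean OR time, which systematically understates the expected duration of a right-skewed outcome.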
To assess the potential impact of using the final model in planning, we split the data according to the date in 2004 at which the planning was changed from the surgeon's estimate of operative time to the surgeon's estimate based on the mean duration of all previous operations of the same kind. The pre-2004 data were used to reestimate the final prediction model, which was subsequently used to predict the durations of operations from 2004 onward. The difference between the observed and predicted OR times was assessed and compared with the difference between the observed and the actually planned durations. Analyses were performed with S-plus 7.0 (Insightful Corp, Seattle, WA).
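The comparison between planned and model-predicted durations can be illustrated with a small sketch. It assumes that shorter-than-predicted and longer-than-predicted time are each averaged over all cases in the validation period; the exact denominator is not spelled out in the text, and the function name and numbers are hypothetical.

```python
def shorter_and_longer_per_case(predicted, observed):
    """Mean minutes per case that cases ran shorter / longer than predicted.

    Assumption: both totals are divided by the number of all cases.
    """
    n = len(observed)
    shorter = sum(max(p - o, 0.0) for p, o in zip(predicted, observed)) / n
    longer = sum(max(o - p, 0.0) for p, o in zip(predicted, observed)) / n
    return shorter, longer

# Hypothetical validation-period values in minutes (illustration only).
observed = [90, 150, 60, 230]
planned = [100, 110, 60, 190]   # durations actually used for planning
modelled = [100, 120, 60, 200]  # durations predicted by a model

planned_shorter, planned_longer = shorter_and_longer_per_case(planned, observed)
model_shorter, model_longer = shorter_and_longer_per_case(modelled, observed)

# Relative reduction in longer-than-predicted time when using the model.
relative_reduction = (planned_longer - model_longer) / planned_longer
print(planned_shorter, planned_longer)
print(model_shorter, model_longer, relative_reduction)
```

Keeping the two directions separate matters because unused OR time and overrun have different consequences for scheduling.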
Results
There were 11,243 operations consisting of one surgical procedure, 3,580 of two surgical procedures, 1,289 of three surgical procedures, and 1,300 operations of four or more surgical procedures (see table, Supplemental Digital Content 1, which shows the list of operations, together with their frequency of occurrence and descriptive statistics of their duration, https://links.lww.com/ALN/A560). The OR times show considerable variation between operations with the median ranging from 42.5 to 504 min. The coefficient of variation illustrates that the variability within the same type of operation may also be considerable. The operation with the highest consistency in duration was nervous system—sympathectomy thoracal (coefficient of variation = 0.12), whereas trachea—tracheotomy—had the relatively most unpredictable duration (coefficient of variation = 0.95). After accounting for the operation code (the base model), the predicted OR time had a 95% prediction interval with relative bounds between 0.52 and 1.91. For any specific operation, this implies that the OR time may be from nearly half as short up to almost twice as long as the median for that operation.
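The relative bounds of the prediction interval follow from the log-normal model: an interval that is symmetric on the log scale becomes a multiplicative interval on the time scale. The sketch below assumes normally distributed errors on the log scale and ignores uncertainty in the estimated parameters; the function names and the 120-min median are hypothetical, and the standard deviation is back-calculated from the reported bounds rather than taken from the article.

```python
import math

def relative_bounds(sigma_log, z=1.96):
    """Multiplicative 95% bounds around the median for a log-normal outcome."""
    half_width = z * sigma_log
    return math.exp(-half_width), math.exp(half_width)

def implied_sigma(lower, upper, z=1.96):
    """Log-scale standard deviation implied by a pair of relative bounds."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# Reported relative bounds for the base model: 0.52 to 1.91.
sigma_base = implied_sigma(0.52, 1.91)
low, high = relative_bounds(sigma_base)
print(sigma_base, low, high)

# Absolute bounds for a hypothetical operation with a median of 120 min.
median_minutes = 120.0
print(median_minutes * 0.52, median_minutes * 1.91)
```

Because the bounds are multiplicative, the same relative uncertainty translates into a wider absolute interval for operations with a longer median duration.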
The historical pattern in the difference between the surgeon's expectation of operative time and the observed total OR time is depicted in figure 1. A systematic underestimation is evident until 2004; the median difference was 31 min. The use of a computerized planning system providing the surgeon with the mean of previous operations, introduced in planning in 2004, clearly resulted in improved correspondence between expectations and observations. The bias per 8 h of used OR time, as calculated per 4-week period, was on average 114 min (SD = 18) before 2004 and −2 min (SD = 11) from 2004 onward.
Table 1 shows the operation, team, and patient characteristics in our study population. On average, patients were 56 yr old, ranging from 11 to 98 yr, and the sex distribution was about equal. The predictive effects of the characteristics on the log (total OR time) are also shown in table 1, in addition to the significance of the nonlinearity in this association, as tested by the spline function. Figure 2 shows the six parameters that had a nonlinear association with the log (total OR time): age of the youngest surgeon, age of the oldest surgeon, summed age of the surgeons, age of the youngest anesthesiologist, patient's age, and number of previous hospital admissions of the patient. Further, the predictive effects of five variables were different for different types of operation, according to tests for interaction: the age of the youngest surgeon, summed ages of the surgeons, age of the patient, number of previous hospital admissions, and length of the current admission (all P < 0.0001). Most notably, for patients older than 60 yr, operations in general seemed to become shorter with increasing age, whereas abdominal surgery and general vascular surgery lasted longer with increasing age.
Table 2 summarizes the contribution to the model of the predictive factors. When the expected OR time (i.e., the surgeon's estimate) was added as a single factor to the base model, 76.4% of the variation was explained, an absolute improvement of 4.3%, corresponding to 15.3% of the variation left unexplained by the base model. The next largest improvement in adjusted R² is due to the session characteristics (the number of separate procedures within the operation, indicating the relative complexity of the operation, and the year of surgery) and, to a lesser extent, to the team characteristics. Patient characteristics have only a limited influence. The model extension with session, team, and patient characteristics combined explained 77.2% of the total variation or 18.3% of the variation left unexplained by the base model. Finally, the model containing all factors, including the surgeon's estimate, explains almost 80% of the total variation in log (OR times), which corresponds to 27.4% of the variation left unexplained by the base model. For any specific operation, the OR time predicted by the final model has a 95% prediction interval with relative bounds from 0.60 to 1.70.
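The relative gains quoted above can be checked with a few lines of arithmetic. The base-model R² is not stated directly in this paragraph; the sketch derives it as 76.4% minus the 4.3% absolute improvement, which is an assumption, and the function name is hypothetical.

```python
def relative_gain(r2_model, r2_base):
    """Gain in R-squared as a share of the variation the base model leaves unexplained."""
    return (r2_model - r2_base) / (1.0 - r2_base)

# Derived, not quoted: 76.4% explained minus the 4.3% absolute improvement.
r2_base = 0.764 - 0.043

gain_estimate_only = relative_gain(0.764, r2_base)   # surgeon's estimate added
gain_characteristics = relative_gain(0.772, r2_base)  # session, team, patient

# R-squared implied by the reported 27.4% relative gain of the final model.
r2_final = r2_base + 0.274 * (1.0 - r2_base)

print(round(gain_estimate_only, 3))
print(round(gain_characteristics, 3))
print(round(r2_final, 3))
```

The second and third values agree with the reported 18.3% and "almost 80%". The first comes out near 15.4% rather than the reported 15.3%, a difference that is presumably due to rounding of the published percentages.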
The goodness-of-fit of the model is shown graphically in figure 3A on the log-transformed scale. No substantial deviation from a symmetrical scatter around the regression line is present. Figure 3B shows the data on the original scale, where a correction has been used for the back-transformation bias19 (smearing factor: 1.04). Figure 3C shows the corresponding normal probability plot of the log-transformed data. The residuals of the model follow the diagonal line, except at the far ends of the normal scale, beyond a quantile (z-score) of ±2.
The potential added value of the model in daily planning is illustrated in table 3. The total amount of shorter-than-predicted and longer-than-predicted OR time is substantially reduced when using the model predictions, including the bias-corrected surgeon's estimate, instead of the surgeon's prediction based on historical data. The absolute reduction was on average 2.8 and 6.6 min per case, corresponding to a relative reduction of 12 and 25%, respectively.
Discussion
We have studied the influence of operation, team, and patient characteristics on the duration of operations from a general surgical department in an academic hospital, and we have assessed whether the surgeon's estimate had a predictive effect independent of the other factors. Given an individual operation, the surgeon's estimate and operation characteristics were the most important predictors of total OR time, followed in importance by team characteristics. A particular finding of this study was the nonlinear pattern in the effects of the ages of the team members on the total OR time. Effects of the teaching environment and growing experience were expressed in these patterns. Patient characteristics had a statistically significant influence, although limited in size, once the other factors were accounted for. Recently, the limited value of patient characteristics was also found in the thoracic surgery literature.20 In our data, the patients' body mass index was a significant predictor, but the effect per 10-point increase in body mass index was only a 6% relative increase in predicted total OR time. This may explain why the results in the literature so far have been conflicting on this factor.21–23 We estimated that the model might reduce shorter-than-predicted and longer-than-predicted OR time by 12 and 25%, respectively.
The nonlinear age patterns seen in figure 2, A to F may have the following interpretation. When the youngest member of the surgical team was younger than 30 yr, the total OR time was longer with younger age, reflecting both a learning curve and the teaching function of an academic hospital: the younger the resident, the more time is spent on teaching and practice. Between 30 and 35 yr, the total time increased with older age, reflecting the increasing complexity of cases that a young surgeon is allowed to perform with increasing age. When the youngest surgeon was older than 35 yr, the duration decreased with increasing age, reflecting the extensive experience of such a team. The pattern in the effect of the age of the oldest surgeon is reversed: the older the oldest surgeon, the longer the operation takes. If the oldest surgeon is very young, the operation is apparently of a simple enough type to allow for a relatively inexperienced team. For older ages, the operation is apparently so difficult that a very senior supervisor needs to be present. For anesthesiologists, only the age of the youngest member of the team has a nonlinear association with log (OR time). A clear learning curve is visible until the age of 35.
Our results show that the variation in OR times increases with increasing mean. Strum et al. have found that a “work rate effect,” one surgeon working at a different pace than another one, may explain why the differences between OR times increase with increasing mean.8 As a result, the distribution of OR times is a log-normal one, which implies a multiplicative error.16 However, it also implies that the prediction for long operations is less precise than that for short operations. It may seem that the model is, therefore, not very useful, because the danger of a large absolute deviation from the planned duration is biggest for the long durations. However, very long durations are not common, as can be seen from figure 3B. It is the large bulk of short-duration operations that determines the effectiveness of planning, and these can be predicted quite precisely. Further, operations that are anticipated to take a long time are usually planned as the only operation in the OR on that day.
The database used in this study was designed for administrative purposes, to have access to the production realized by the department. Therefore, it does not contain the operations that were planned but rather the operations that were actually performed. In this respect, our data collection is comparable with the Medicare data used recently by Silber et al.11,24 in a study on the influence of hospital, medical history, and sociodemographic variables on anesthesia procedure time. However, for scientific prediction research, there is a clear registration deficit in this respect: such a database cannot be used as it stands to predict operative time for scheduled procedures. We made a particular effort to retrospectively reconstruct the intended operations from the registered ones. Oncological procedures are particularly prone to deviation from the intended procedure because of unforeseen metastases that can force surgeons to refrain from a curative resection. Therefore, despite our efforts, the results should be used with care for planning oncological operations.
The current prediction model is in principle fit to be applied in practice. We used all operations, including the ones that were performed infrequently. The mixed model approach is capable of incorporating operations that occurred rarely or only once, because the distribution of OR times derived from all operations serves as a reference for the estimate of individual operations, similar to the Bayesian methodology described by Dexter et al.15 An issue for the practical usefulness of all statistical modeling is that the factors in the model need to be available online. Particularly, the coding system of the operations needs to be implemented electronically, patient data should be available online, and the calculations should preferably be performed electronically.
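The way a mixed model borrows information from all operations when estimating a rare one can be illustrated with the standard shrinkage formula for a random-intercept model. This is a simplified sketch, not the authors' implementation: it assumes known variance components, ignores the fixed effects, and uses hypothetical names and values on the log scale.

```python
def shrunken_mean(group_values, overall_mean, tau2, sigma2):
    """Empirical Bayes estimate of an operation type's mean log(OR time).

    tau2   : assumed variance between operation types
    sigma2 : assumed variance within an operation type
    """
    n = len(group_values)
    group_mean = sum(group_values) / n
    # The weight approaches 1 as n grows and is well below 1 for n = 1.
    weight = n * tau2 / (n * tau2 + sigma2)
    return overall_mean + weight * (group_mean - overall_mean)

# Hypothetical values on the log scale (illustration only).
overall_mean = 4.8
tau2, sigma2 = 0.25, 0.10

performed_once = shrunken_mean([5.8], overall_mean, tau2, sigma2)
performed_often = shrunken_mean([5.8] * 50, overall_mean, tau2, sigma2)
print(performed_once, performed_often)
```

An operation that was performed once is pulled noticeably toward the overall mean, whereas an operation with many recorded cases keeps an estimate close to its own average.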
Year of surgery seemed to be important in our analysis; the median operative time for all procedures increased significantly from 1996 and reached a plateau around 2003 (data not shown). This difference cannot be explained by changes in the operative portfolio or the introduction of laparoscopic surgery, because this was already corrected for. However, from 1996 onward, an active fellowship in upper and lower gastrointestinal surgery and hepatobiliary surgery with three junior surgeons was established at our institution. This apparently had implications for the operative time of these procedures. Also, the number of attending surgeons increased in that same period, because these fellows were more often supervised during surgery. For the aim of prediction of future operations, the year 2003 should be taken for a steady state estimate.
The surgeon's estimate of operative time was a strong predictor of total OR time and a significant addition to the more objective factors already in the model. Even when very specific cardiovascular risk factors were included that reflect overall comorbidity, the surgeon's estimate remained a very important factor. Nevertheless, the absolute contribution to the explained variance decreased from 4.3% when added to the base model to 2.6% when added to the final model (table 2). Apparently, the surgeon's estimate replaces part of the information of operation, team, and patient characteristics. A potential problem could be the reproducibility of this estimate: it is a subjective assessment by a surgeon, not an objective factor. However, there were many different surgeons involved, and the estimate could not have been such a strong predictor if it were largely arbitrary or random. Moreover, it is unlikely that surgeons from the academic hospital in Rotterdam, The Netherlands, would do worse or better than surgeons from elsewhere in estimating the duration of their operations.
Planning of operations is often considered difficult, because of the unpredictability of operations. Now, the reverse may become true: because we can predict operations for individual patients, serious planning becomes feasible. The amount of detail of the current model, using operation codes at the lowest level plus operation, team, and patient characteristics, allows for operational25 planning of care: the predictions provided by the model are directly applicable in scheduling the surgical workload into available OR working hours. Table 3 shows that the variation in discrepancy between predicted and realized OR time will be reduced considerably when using the prediction model instead of the surgeon's estimate, which is based on historical data. As was shown recently,26 this variation may be used to reserve “planned slack” time to accommodate unforeseen longer-than-expected OR times and control the risk of overtime. With the prediction model, less slack time needs to be planned, and therefore, the OR utilization rate may increase. The risk of overtime is usually constrained by the hospital board and may be controlled by optimal planning of the start of the last operation relative to the end of the workday.27 Reduced variability may allow certain operations to be started on a day on which, without precise prediction, they would run too high a risk of resulting in overtime.4 Not all anticipated gains will be realized in practice though, as it has been shown that flexible planning on the day of surgery, moving cases and add-on cases, will minimize the impact of uncertainty in case prediction on the amount of overtime.28
Previous studies on predictive factors for total OR time had aims that were different from ours: Strum et al.8 developed a regression model similar to our model, but applied it to only 40 procedure codes. Their aim was not to devise a prediction model but to assess the effect of surgeon and anesthesiologist on surgical and on OR time, accounting for other predictive factors. A further difference was that they fitted a separate model to each one of the 40 procedure codes. The analysis of Silber et al.,11 although very interesting from a methodological point of view, cannot be used for case prediction for daily planning because of the administrative post hoc nature of their data. A recent study from Dexter et al.29 showed that their Bayesian methodology15 may be used to estimate the remaining time from an already ongoing case. Although it was not the aim of this study, we note that our mixed effects model with its associated empirical Bayes estimates might also be used for this purpose.
We conclude that a prediction model could be developed containing detailed procedure codes and operation, team, and patient characteristics. The surgeon's estimate together with specific aspects of the operation and the experience of the surgical team are the best predictors of the OR time of a given operation. Use of prediction models can improve the planning of ORs.