Recently, two centers have independently developed a risk score for predicting postoperative nausea and vomiting (PONV). This study investigated (1) whether risk scores are valid across centers and (2) whether risk scores based on logistic regression coefficients can be simplified without loss of discriminating power.

Adult patients from two centers (Oulu, Finland: n = 520, and Wuerzburg, Germany: n = 2,202) received inhalational anesthesia (without antiemetic prophylaxis) for various types of surgery. PONV was defined as nausea or vomiting within 24 h of surgery. Risk scores to estimate the probability of PONV were obtained by fitting logistic regression models. Simplified risk scores were constructed based on the number of risk factors that were found significant in the logistic regression analyses. Original and simplified scores were cross-validated. A combined data set was created to estimate a potential center effect and to construct a final risk score. The discriminating power of each score was assessed using the area under the receiver operating characteristic curve.

Risk scores derived from one center were able to predict PONV in patients from the other center (area under the curve = 0.65–0.75). Simplification did not substantially weaken the discriminating power (area under the curve = 0.63–0.73). No center effect could be detected in the combined data set (odds ratio = 1.06, 95% confidence interval = 0.71–1.59). The final score consisted of four predictors: female gender, history of motion sickness (MS) or PONV, nonsmoking, and the use of postoperative opioids. If none, one, two, three, or four of these risk factors were present, the incidences of PONV were 10%, 21%, 39%, 61%, and 79%.

The risk scores derived from one center proved valid in the other and could be simplified without significant loss of discriminating power. Therefore, this risk score appears to have broad applicability in predicting PONV in adult patients undergoing inhalational anesthesia for various types of surgery. For patients with at least two of these four identified predictors, a prophylactic antiemetic strategy should be considered.

GENERAL anesthesia using volatile anesthetics is associated with an average incidence of postoperative nausea and vomiting (PONV) of between 20% and 30%.¹ It has been suggested that PONV may increase patients' discomfort as well as costs (*e.g.*, antiemetics, readmission) and the risk of unwanted complications (*e.g.*, pulmonary aspiration).² PONV is thought to be multifactorial, involving anesthetic, surgical, and individual risk factors.¹⁻³ A few studies have tried to quantify the relative impact of risk factors⁴⁻⁶ and to set up a risk model for the prediction of PONV.⁴,⁷,⁸ If such a model can be shown to have general applicability, it could provide a rational basis for deciding who might benefit from prophylactic antiemetic therapy.⁹

An initial step was to construct a risk table for PONV based on patient-related factors (*e.g.*, gender, history).⁴ However, because that study was restricted to one type of anesthesia and surgery, the relative impact of patient-related factors compared with anesthetic and surgical factors could not be quantified. This limitation was overcome by a prospective survey in Oulu, Finland, covering different types of anesthesia and surgery, which revealed that the most important predictors were patient-specific.⁷ The authors also reported a simplified risk score based on the number of equally weighted risk factors present (0–5). More recently, the incidences of postoperative nausea and postoperative vomiting were studied separately after different types of otolaryngologic surgery in Wuerzburg, Germany.⁸ Again, patient-specific predictors were the most relevant, so an operation-independent risk score for postoperative vomiting was constructed⁸ and was later shown to be applicable in patients undergoing general and ophthalmologic surgery.¹⁰ Because such predictive scores need to be validated in other centers,⁹ the two centers performed a cross-validation to answer the following questions:

Can a risk score derived from one center predict PONV in an individual from another center with a similar discriminating power?

Does a simplification of a risk score for PONV retain its discriminating power?

How accurate are calibration curves of a risk score in predicting the incidence of PONV in risk groups from another center?

In a combined data set, what are the most important predictors for a final score, and what is the impact of a possible center effect?

**Materials and Methods**

*Origin of Data*

The analyses are based on prospectively collected data from 520 (Oulu) and 2,202 (Wuerzburg) adult patients (age ≥ 18 yr) who underwent general anesthesia with volatile anesthetics. The data of the 520 patients are a subset of the 1,107 patients of the previous survey in Oulu⁷; the data of the 2,202 patients were taken from two other studies conducted in Wuerzburg.⁸,¹⁰ The latter studies applied the same eligibility criteria as the present study (table 1), whereas the Oulu survey initially included a broader spectrum of patients (covering, for example, children and patients receiving regional anesthesia). The distributions of patient characteristics and other variables are presented in table 2.

*Anesthesia*

All selected patients received an inhalational anesthetic technique as previously described.⁷,⁸ This included a benzodiazepine for premedication on the morning of the operation, induction with thiopental 3–5 mg/kg and either fentanyl up to 2 μg/kg or alfentanil up to 20 μg/kg, and the use of a volatile anesthetic (isoflurane, enflurane, or sevoflurane). No prophylactic antiemetics were given. Postoperative pain was treated with nonsteroidal analgesic drugs or, if needed, with opioids such as oxycodone or tramadol (table 2).

*Outcome*

Although both centers originally performed their studies without knowledge of each other, the outcome was assessed in a similar way. Postoperative nausea was assessed at 2 h on a binary scale (yes/no) by a trained nurse and at 24 h on an 11-point numeric scale (0–10) by a trained physician (the principal investigator of each center or one or two of his or her colleagues). Patients were considered nauseated if they answered yes to the question, “Are you or have you felt nauseated in the last 2 h?” or if they rated postoperative nausea greater than zero on the 11-point scale at the 24-h assessment in response to the question, “Have you felt nauseated since your discharge from the postanesthetic care unit and if so, what would be the average level of nausea you have felt until now on a 0 to 10 scale?” The number of episodes of postoperative vomiting was recorded for the same intervals. Patients were considered to have vomited if postoperative vomiting occurred at least once within the first 2 h or within the following 22 h. Patients who had either postoperative nausea or postoperative vomiting in either of these two periods were considered to have had PONV. PONV was treated as a binary outcome so that logistic regression analysis could be applied.
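The outcome rule above can be written as a small function. This is a minimal sketch, not the authors' code: the record type `Assessment` and its field names are hypothetical, and the function simply encodes the stated rule that any nausea (a "yes" at 2 h or a rating above zero at 24 h) or at least one vomiting episode in either period counts as PONV.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """Hypothetical record of one patient's two postoperative assessments."""
    nausea_2h: bool        # yes/no answer at the 2-h assessment
    nausea_24h_score: int  # 0-10 rating at the 24-h assessment
    vomits_0_2h: int       # vomiting episodes in the first 2 h
    vomits_2_24h: int      # vomiting episodes in the following 22 h


def had_ponv(a: Assessment) -> int:
    """Return 1 if nausea or vomiting occurred in either period, else 0."""
    nausea = a.nausea_2h or a.nausea_24h_score > 0
    vomiting = a.vomits_0_2h >= 1 or a.vomits_2_24h >= 1
    return int(nausea or vomiting)


print(had_ponv(Assessment(False, 0, 0, 0)))  # 0: no nausea, no vomiting
print(had_ponv(Assessment(False, 2, 0, 0)))  # 1: nausea rated above zero at 24 h
```

Returning an integer 0/1 rather than a boolean matches the binary coding used for the logistic regression.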

*Predictors*

The following variables were considered in the analysis: gender (female = 1, male = 0), age (< 50 yr = 1, ≥ 50 yr = 0), smoking status (nonsmoker = 1, smoker = 0), history of MS or PONV (yes = 1, no = 0), duration of operation (< 60 min = 0, ≥ 60 min = 1), use of postoperative opioids (yes = 1, no = 0), and type of surgery (orthopedic, ophthalmologic, otolaryngologic, laparoscopic, laparotomic, and other). Possible two-way (pairwise) interactions between these variables were also evaluated. Other variables (*e.g.*, body mass index and the type and dosage of volatile anesthetic), which have previously been shown not to contribute significantly to the prediction of postoperative nausea or postoperative vomiting,⁶⁻⁸,¹¹ were not considered in the current analysis.
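As an illustration of the coding scheme, the sketch below turns one patient's raw data into the 0/1 predictors listed above. The function and key names are hypothetical. Dummy coding of the six surgery types with "other" as the reference level, and the form of the example interaction term, are assumptions for illustration, since the text does not state how these were parameterized.

```python
SURGERY_TYPES = ("orthopedic", "ophthalmologic", "otolaryngologic",
                 "laparoscopic", "laparotomic", "other")


def code_predictors(sex, age_years, smoker, history_ms_or_ponv,
                    duration_min, postop_opioids):
    """Return the binary (0/1) predictors, coded as described in the text."""
    return {
        "female": int(sex == "female"),               # female = 1, male = 0
        "age_lt_50": int(age_years < 50),             # < 50 yr = 1, >= 50 yr = 0
        "nonsmoker": int(not smoker),                 # nonsmoker = 1, smoker = 0
        "history_ms_or_ponv": int(history_ms_or_ponv),
        "duration_ge_60": int(duration_min >= 60),    # >= 60 min = 1, < 60 min = 0
        "postop_opioids": int(postop_opioids),
    }


def surgery_dummies(surgery):
    """Dummy-code the type of surgery; "other" is an ASSUMED reference level."""
    if surgery not in SURGERY_TYPES:
        raise ValueError(f"unknown type of surgery: {surgery!r}")
    return {f"surgery_{t}": int(surgery == t) for t in SURGERY_TYPES if t != "other"}


x = code_predictors("female", 35, smoker=False, history_ms_or_ponv=True,
                    duration_min=45, postop_opioids=True)
x.update(surgery_dummies("laparoscopic"))
# Example two-way interaction term (male gender x history), as a product of 0/1 codes.
x["male_x_history"] = (1 - x["female"]) * x["history_ms_or_ponv"]
```

Note that the direction of coding differs between variables (younger age is coded 1, longer duration is coded 1), exactly as in the list above.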

*Analysis*

The most predictive factors were chosen by fitting a logistic regression model using a forward selection procedure (*P* < 0.05 to enter). In this model the estimated probability of PONV, denoted by $P$, depends on the score $S^{\mathrm{coeff}}$ according to the formula

$$P = \frac{1}{1 + \exp\left(-S^{\mathrm{coeff}}\right)} \qquad [1]$$

in which $S^{\mathrm{coeff}} = b_0 + b_1 x_1 + \dots + b_k x_k$ is a weighted sum of the values $x_1, \dots, x_k$ of $k$ risk factors or predictors, each coded as 1 if present and 0 if absent in a patient, with $b_1, \dots, b_k$ as the weights or estimated regression coefficients, each describing the log-odds ratio associated with the corresponding factor (so that the odds ratio for factor $j$ is $\mathrm{OR}_j = \exp(b_j)$). $b_0$ is the intercept term describing the baseline log-odds of PONV; that is, $P_0 = (1 + \exp(-b_0))^{-1}$ is the estimated baseline risk of PONV in a patient with no risk factors.
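Formula [1] and the odds-ratio relation can be checked numerically. In the sketch below the coefficients are hypothetical (an intercept of −2.0 and a coefficient of ln 2 for each factor, loosely mirroring the odds ratios of about 2 reported for Oulu); they are not the fitted values from either center. With all four factors present, the odds equal the baseline odds multiplied by $2^4 = 16$.

```python
import math


def logistic_probability(b0, coefs, x):
    """Formula [1]: P = 1 / (1 + exp(-S_coeff)), S_coeff = b0 + sum_j b_j * x_j."""
    s_coeff = b0 + sum(b_j * x.get(name, 0) for name, b_j in coefs.items())
    return 1.0 / (1.0 + math.exp(-s_coeff))


def odds_ratios(coefs):
    """OR_j = exp(b_j) for each factor j."""
    return {name: math.exp(b_j) for name, b_j in coefs.items()}


# Hypothetical coefficients: every odds ratio equals 2, intercept chosen arbitrarily.
B0 = -2.0
COEFS = {f: math.log(2.0) for f in ("female", "history_ms_or_ponv",
                                     "nonsmoker", "postop_opioids")}

p0 = logistic_probability(B0, COEFS, {})  # baseline risk P_0, no risk factors
p_all = logistic_probability(B0, COEFS, {f: 1 for f in COEFS})
print(round(p0, 3), round(p_all, 3))  # 0.119 0.684
```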

To estimate the discriminating power of a chosen model, a receiver operating characteristic (ROC) curve was plotted. An ROC curve shows the relationship between sensitivity and specificity across the possible decision criteria, that is, the score levels at which patients are classified as likely or unlikely to develop PONV. The areas under the ROC curves (AUCs) were calculated as previously described⁸ and estimate how well the score discriminates patients with PONV from patients without PONV (discriminating power). An AUC of 1.0 represents perfect discrimination; an AUC of 0.5 indicates no discrimination at all. The 95% confidence intervals of the AUC were approximated according to the formula

where *m* is the size of the smaller of the two groups: patients with PONV and patients without PONV.
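The AUC can be computed without drawing the curve, as the Mann–Whitney estimate of the probability that a randomly chosen patient with PONV has a higher score than a randomly chosen patient without PONV. The text does not reproduce the confidence-interval formula, so the second function assumes the conservative bound $\mathrm{SE} \approx \sqrt{\mathrm{AUC}(1-\mathrm{AUC})/m}$, which uses the same $m$ as defined above but may not be the exact expression the authors applied. The function names are hypothetical.

```python
import math


def auc_mann_whitney(scores, outcomes):
    """AUC = probability that a randomly chosen patient with PONV has a higher
    score than a randomly chosen patient without PONV (ties count 1/2)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need patients both with and without PONV")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_ci_conservative(auc, n_pos, n_neg, z=1.96):
    """Approximate 95% CI using the ASSUMED bound SE = sqrt(AUC(1 - AUC) / m),
    where m is the size of the smaller group; clipped to [0, 1]."""
    m = min(n_pos, n_neg)
    se = math.sqrt(auc * (1.0 - auc) / m)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


auc = auc_mann_whitney([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(auc)  # 0.75
```

The pairwise loop is quadratic in the number of patients, which is adequate for a few thousand patients; a rank-based formula would be faster for larger data sets.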

The calibration,¹² or accuracy, of a score in predicting the probability of PONV when applied to the patients of the other center was evaluated by fitting a linear regression model relating the observed proportions of PONV to the predicted probabilities in five groups sorted by increasing predicted probability. The slope and the intercept of the fitted regression line show whether the score under- or overestimates the occurrence of PONV overall or in a certain range. Provided that the relation is truly linear, a slope of 1 (45 degrees) with an intercept of 0 represents perfect calibration.
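The calibration check described above can be sketched as follows. Splitting patients into five groups of roughly equal size after sorting by predicted probability is an assumption (the text specifies five ordered groups but not how their boundaries were set), and the line is an ordinary least-squares fit through the five group means. A slope near 1 and an intercept near 0 indicate good calibration.

```python
def calibration_line(predicted, observed, n_groups=5):
    """Fit observed PONV proportion = intercept + slope * mean predicted probability.

    Patients are sorted by predicted probability and split into `n_groups`
    groups of (nearly) equal size; an ordinary least-squares line is then
    fitted through the group means.
    """
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    if n < n_groups:
        raise ValueError("need at least one patient per group")
    xs, ys = [], []
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        xs.append(sum(p for p, _ in group) / len(group))  # mean predicted probability
        ys.append(sum(y for _, y in group) / len(group))  # observed proportion with PONV
    mean_x = sum(xs) / n_groups
    mean_y = sum(ys) / n_groups
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predicted probabilities do not vary across groups")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, list(zip(xs, ys))
```

The returned group means are the points that would be plotted in a calibration figure; the slope and intercept summarize them.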

To answer the questions posed in the introduction, the following approaches were chosen. For each center, a score (generically denoted $S^{\mathrm{coeff}}$) based on the regression coefficients of the fitted logistic model (according to formula [1]) was developed to estimate the probability of PONV, following the same principles as previously described for postoperative vomiting.⁸ A score derived in this way from the data collected in Oulu is denoted score $O^{\mathrm{coeff}}$; a score derived from the data collected in Wuerzburg is denoted score $W^{\mathrm{coeff}}$. The discriminating power of both scores was tested by plotting ROC curves and calculating their AUCs, both in the data from which each score was derived and, for comparison, in the data of the other center.

Two corresponding simplified scores were constructed, each based on equally weighted factors instead of the estimated logistic coefficients (score $O^{\mathrm{fact}}$ and score $W^{\mathrm{fact}}$). Equal weighting means that each factor found to be significant in the logistic regression analysis was given a coefficient of 1, leading to a score of the form $S^{\mathrm{fact}} = x_1 + \dots + x_k$. Each factor contributes 1 to this score if present and 0 if absent, so the individual value of the simplified score is simply the number of risk factors present. Again, ROC curves were plotted, and the AUCs of the simplified scores were compared with those of the scores based on regression coefficients. Thus, a total of 2 × 2 × 2 = 8 AUCs were calculated (two centers of origin × two types of score × two data sets to which each score was applied).
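One way to see why equal weighting can cost little discriminating power: if all coefficients are equal, $S^{\mathrm{coeff}}$ is an increasing linear function of $S^{\mathrm{fact}}$, so both scores rank patients identically and produce the same ROC curve and AUC. The sketch below verifies this for a hypothetical model in which every odds ratio is exactly 2, close to the odds ratios of about 2 reported for Oulu in the Results. With unequal coefficients, patients with the same count can receive different coefficient-based scores, so the AUCs can diverge, which is why the comparison had to be made empirically.

```python
import math
from itertools import product

FACTORS = ("female", "history_ms_or_ponv", "nonsmoker", "postop_opioids")


def score_fact(x):
    """Simplified score: number of risk factors present (each weighted 1)."""
    return sum(x[f] for f in FACTORS)


def score_coeff(x, b0, coefs):
    """Coefficient-based score S_coeff = b0 + sum_j b_j * x_j."""
    return b0 + sum(coefs[f] * x[f] for f in FACTORS)


# Hypothetical model in which every odds ratio is exactly 2.
B0 = -2.0
COEFS = {f: math.log(2.0) for f in FACTORS}

# All 16 possible risk-factor patterns.
patterns = [dict(zip(FACTORS, bits)) for bits in product((0, 1), repeat=len(FACTORS))]

# With equal weights, S_coeff = B0 + ln(2) * S_fact for every pattern, so the two
# scores order patients identically and give the same ROC curve and AUC.
for x in patterns:
    assert math.isclose(score_coeff(x, B0, COEFS), B0 + math.log(2.0) * score_fact(x))
```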

In a second step, each simplified score was entered as a single linear variable into a logistic regression model fitted to its original data set, so that an expected incidence $P^{\mathrm{fact}}$ of PONV could be estimated for each risk group (that is, for each value of the simplified score $S^{\mathrm{fact}}$) according to the formula

$$P^{\mathrm{fact}} = \left(1 + \exp\left(-(a_0 + a_1 S^{\mathrm{fact}})\right)\right)^{-1}$$

where $a_0$ and $a_1$ are the estimated regression coefficients of this prediction model. The patients of the other center were classified according to the simplified risk score, and the theoretical incidences in five ordered groups were plotted against the actual incidences in the corresponding calibration curves.
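The second-step model is a one-variable logistic regression, so the expected incidences for 0 to 4 risk factors lie on a logistic curve whose logits are equally spaced by $a_1$. The values of $a_0$ and $a_1$ are not given in the text; the ones below (−2.2 and 0.88) are back-calculated for illustration from the incidences quoted in the abstract for the final four-factor score (10%, 21%, 39%, 61%, and 79%), whose logits (about −2.20, −1.32, −0.45, 0.45, and 1.32) are nearly evenly spaced.

```python
import math


def expected_incidence(s_fact, a0, a1):
    """P_fact = 1 / (1 + exp(-(a0 + a1 * S_fact))) for a simplified score S_fact."""
    return 1.0 / (1.0 + math.exp(-(a0 + a1 * s_fact)))


# Illustrative coefficients, back-calculated from the reported incidences;
# NOT estimates published in the text.
A0, A1 = -2.2, 0.88

incidences = [round(100 * expected_incidence(s, A0, A1)) for s in range(5)]
print(incidences)  # [10, 21, 39, 61, 79]
```

That two parameters reproduce all five reported incidences is consistent with the simplified score behaving approximately as a linear predictor on the log-odds scale.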

To ensure equal representation of both centers, all 520 patients from Oulu and 520 patients randomly chosen from the 2,202 Wuerzburg patients were included in a combined data set of 1,040 patients. Following the method described above, the estimated regression coefficients of the most relevant factors for the prediction of PONV in the combined data were used to develop a new risk score (score $OW^{\mathrm{coeff}}$), and a variable indicating the center of origin was introduced to assess any remaining impact of the center on the prediction of PONV. Finally, score $OW^{\mathrm{coeff}}$ was simplified by forming the equally weighted sum of the four most relevant factors (score $OW^{\mathrm{fact}}$), and its discriminating power was examined by calculating the area under its ROC curve.
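Building the balanced combined data set amounts to taking every Oulu record, drawing an equally sized random sample without replacement from the Wuerzburg records, and tagging each record with a center indicator that can then enter the logistic model as an additional predictor (its odds ratio being $\exp$ of its coefficient, as for any other factor). The sketch below uses placeholder records; the indicator name `center_wuerzburg`, its coding, and the seed are hypothetical, and the text does not state how the random sample was drawn.

```python
import random


def build_combined(oulu, wuerzburg, seed=0):
    """All Oulu records plus an equally sized random sample of Wuerzburg records.

    Each record is copied and tagged with a binary center indicator
    (hypothetical name ``center_wuerzburg``: 1 = Wuerzburg, 0 = Oulu).
    """
    rng = random.Random(seed)                    # fixed seed for reproducibility
    sample = rng.sample(wuerzburg, k=len(oulu))  # sampling without replacement
    combined = [dict(r, center_wuerzburg=0) for r in oulu]
    combined += [dict(r, center_wuerzburg=1) for r in sample]
    return combined


oulu = [{"id": f"O{i}"} for i in range(520)]
wuerzburg = [{"id": f"W{i}"} for i in range(2202)]
combined = build_combined(oulu, wuerzburg)
print(len(combined), sum(r["center_wuerzburg"] for r in combined))  # 1040 520
```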

**Results**

The prevalence and distributions of most factors, as well as the incidence of PONV, appeared to differ between the two centers (table 2). Only the duration of surgery, the age of the patients, and the proportion of nonsmokers were similar. The incidence of PONV still appeared to differ when corrected for any *single* variable, such as female gender, prior history of MS or PONV, nonsmoking, postoperative opioids, or type of operation (table 3).

The most predictive risk factors in the Oulu data were female gender, prior history of MS or PONV, nonsmoking, and use of opioids (table 4). For all these risk factors, the adjusted odds ratios in the multivariate model were approximately 2. For Wuerzburg, the important risk factors again included female gender, prior history of MS or PONV, and nonsmoking, but not the use of postoperative opioids. In contrast to Oulu, age, duration of operation, and the interaction of male gender and prior history of MS or PONV were additional significant predictors. When the risk scores based on estimated logistic regression coefficients were applied to their original data, the AUCs of score $O^{\mathrm{coeff}}$ and score $W^{\mathrm{coeff}}$ were 0.69 and 0.75, respectively (table 5). When the scores were applied to the data of the other center, the AUCs of score $O^{\mathrm{coeff}}$ and score $W^{\mathrm{coeff}}$ were 0.69 and 0.65, respectively (table 5). Thus, score $O^{\mathrm{coeff}}$ and score $W^{\mathrm{coeff}}$ yielded mean AUCs of 0.69 and 0.70 across both data sets.

The AUCs of the simplified scores, that is, those based on counting the number of significant risk factors present, were similar to those of the corresponding coefficient-based scores, and simplification did not lead to a relevant decrease in discriminating power (table 5). When applied to the data of Wuerzburg and Oulu, the simplified score $O^{\mathrm{fact}}$ and score $W^{\mathrm{fact}}$ resulted in calibration lines with slopes of 0.91 and 0.86 and intercepts of 0.01 and 0.13, respectively (fig. 1).

The analysis of the combined data set resulted in five significant predictors (table 4). When a center variable was included in the logistic model, its odds ratio (95% confidence interval) was 1.06 (0.71–1.59); the center thus had practically no impact on the predicted incidence of PONV (table 6). For the construction of score $OW^{\mathrm{fact}}$, the two-way interaction of male gender by prior history of MS or PONV was dropped, because it did not have a significant impact on the AUC (data not shown). Thus, the remaining four risk factors for score $OW^{\mathrm{fact}}$ were female gender, prior history of MS or PONV, nonsmoking, and the use of postoperative opioids. As depicted in the ROC curve, this score yields an AUC of about 0.75, with a best overall predictive value of about 0.71 (fig. 2). According to score $OW^{\mathrm{fact}}$, the estimated probabilities of PONV in the joint data set were 10%, 21%, 39%, 61%, and 78% if none, one, two, three, or four risk factors were present.
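Applying the final simplified score in practice only requires counting the four factors. The sketch below looks up the incidences reported above for the joint data set and flags patients with at least two factors, the threshold suggested in the abstract and Discussion for considering antiemetic prophylaxis. The names are hypothetical, and treating a missing factor as absent is an assumption.

```python
FINAL_FACTORS = ("female", "history_ms_or_ponv", "nonsmoker", "postop_opioids")

# Incidences (%) reported for score OW^fact in the joint data set,
# indexed by the number of risk factors present (0-4).
REPORTED_INCIDENCE_PCT = (10, 21, 39, 61, 78)


def final_score(patient):
    """Number of the four risk factors present; a missing key counts as absent (assumption)."""
    return sum(1 for f in FINAL_FACTORS if patient.get(f, 0))


def reported_incidence(patient):
    """Reported PONV incidence (%) for the patient's number of risk factors."""
    return REPORTED_INCIDENCE_PCT[final_score(patient)]


def consider_prophylaxis(patient, threshold=2):
    """True when at least `threshold` risk factors are present (two, per the text)."""
    return final_score(patient) >= threshold


patient = {"female": 1, "nonsmoker": 1, "history_ms_or_ponv": 0, "postop_opioids": 0}
print(final_score(patient), reported_incidence(patient), consider_prophylaxis(patient))
# -> 2 39 True
```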

**Discussion**

The analysis shows that a risk score for PONV derived in one center could be applied to another center, and that a simplified version of such a score, based only on counting how many of the significant risk factors were present, had discriminating power similar to that of a score based on regression coefficients estimated in a logistic regression model. In the combined data set, the four most important predictors were female gender, prior history of MS or PONV, nonsmoking, and the use of postoperative opioids. Although the distribution of risk factors and the incidence of PONV, even when adjusted for any *single* variable, appeared to be quite different in the two centers, the center had no detectable impact on the incidence of PONV once all *four* relevant predictors were taken into account. Thus, the final score may reliably predict PONV in a wide spectrum of patients undergoing various types of surgery during inhalational anesthesia.

Special attention was given to the type of operation. There is certainly an *association* between the type of operation and PONV.¹⁻³,¹³ However, its causal impact on PONV remains questionable, because a high incidence of PONV after certain operations might well be caused by the involvement of high-risk patients (*e.g.*, patients undergoing gynecologic laparotomies are women who are also more likely to receive postoperative opioids). Our analysis of the combined data set confirms that the type of operation is not a strong independent predictor of PONV, which is consistent with our previous studies.⁷,⁸ Nevertheless, we reviewed the literature on PONV in an attempt to find evidence for the assumed impact of the type of operation on PONV.¹⁴⁻¹⁸ However, apart from the observation that some operations appear to be associated with a higher incidence of PONV than others, it was and still is unclear whether this was caused by the different anesthetic agents,¹⁹ the different lengths of operation,¹⁵ or the operation itself.¹⁷ Even large prospective studies using logistic regression analyses have produced conflicting results.⁶,¹¹ In view of our results, it seems more appropriate to base risk prediction on the described risk score rather than on the type of operation, as there is insufficient evidence for an assumed *causal* impact of the type of operation on PONV.

The use of *postoperative* opioids as a predictor for PONV may be questioned, because it is not known with certainty at the time of the preoperative assessment. We included this predictor in the analyses because the use of opioids in daily practice is often foreseeable and depends largely on the institutional analgesic policy as well as on the duration and type of operation.²⁰

Although the raw data appeared to be quite different, three factors were significant in both centers, namely female gender, prior history of MS or PONV, and nonsmoking. The use of postoperative opioids was significant only in Oulu, not in Wuerzburg. This may well be a result of different approaches to postoperative pain management: in Oulu, more patients received postoperative opioids than in Wuerzburg (80% *vs.* 10%), and the analgesic dosage was much higher (20 mg oxycodone *vs.* 100 mg tramadol). The discriminating power of score $O^{\mathrm{coeff}}$ appeared to be independent of the center, whereas score $W^{\mathrm{coeff}}$ discriminated better when applied to its own data set than when applied to data from the other center. One reason might be that the Wuerzburg score included more risk factors than the Oulu score, which may also explain why the mean AUC of the scores from Wuerzburg was 0.72 and thus slightly higher than that of the score from Oulu (0.66).

For both centers, the simplified score obtained by counting the relevant risk factors had discriminating power similar to that of the score based on regression coefficients in the fitted logistic model. This is an important consideration if the score is to be applied in routine anesthetic practice. The only disadvantage of such a simple scoring system is that the likelihood of PONV cannot be derived directly from the number of risk factors. Therefore, the simplified score was again entered into a logistic model so that the theoretical risks could be calculated. When these were related to the actual incidences in the other data set, they yielded good calibration curves, irrespective of the center. Because the two studies were performed in two different countries, we expected some center effect due to differences in the patient population⁴ or in the manner of treatment that were not accounted for by the variables in our analysis. In addition, a marked center effect has been reported in the multicenter study of Cohen and colleagues⁶; however, their data may have been skewed because prophylactic antiemetic usage was not recorded. Because no prophylactic antiemetics were given in our study, we are inclined to conclude that a hypothesized center effect is negligible. The established patient-related factors seem to be the most important even across centers in different countries and can explain the different incidences of PONV.

The four risk factors included in the final simple sum score were female gender, prior history of MS or PONV, nonsmoking, and the use of postoperative opioids. If no or only one risk factor is present, the incidence of PONV is about 10% to 21%, whereas if at least two risk factors are present, it rises to between 39% and 78%. Consequently, a modification of the anesthetic technique might be considered if two or more risk factors are present. One approach would be prophylactic antiemetic treatment, because recent meta-analyses imply that its efficiency (in terms of the number needed to treat) may be reasonable only in high-risk patients.²¹,²² Another approach would be to avoid volatile anesthetics entirely by using a total intravenous anesthetic technique, which has been shown to be associated with significantly less PONV.²³,²⁴ Finally, this score might be useful for selecting patients for antiemetic trials.