Midazolam is used commonly for sedation in the surgical intensive care unit. A suboptimal dosing regimen may lead to relative overdosing, which could result in delayed extubation and increased cost. This multicenter trial characterized midazolam pharmacodynamics in patients recovering from coronary artery bypass grafting.

Three centers enrolled 90 patients undergoing coronary artery bypass grafting. All patients received sufentanil and midazolam via target-controlled infusion. After surgery, midazolam was titrated to a Ramsay sedation score of 5 for 2 h and then decreased to maintain a sedation score of 3 or 4 for at least another 4 h. Pharmacodynamic parameters were derived using NONMEM. The model was cross-validated to test performance.

The probability of a given level of sedation was related to the midazolam concentration by this equation: P(Sedation > or = ss) = Cn/(Cn + C(50,ss)n), where ss is the sedation score, C is the sum of the midazolam concentration and a term reflecting the dissipating effect of anesthesia: C = [midazolam] + theta x e(-Kt), where theta = 256 ng/ml and K = 0.19 h(-1). C(50,ss) values for Ramsay scores of 2 to 6 were 5.7, 71, 171, 260, and 659 ng/ml, respectively. The model predicted 57% of the data points correctly and 88% within one sedation score.

Despite previous reports of high interindividual variability in midazolam pharmacodynamics in patients in the surgical intensive care unit, these cross-validation results suggest that, when midazolam is administered using a target-controlled infusion device, the level of sedation can be predicted within 1 sedation score in 88% of patients based on the target midazolam concentration and the time since the conclusion of the anesthetic.

SEDATION is often necessary during the period of controlled ventilation after coronary artery bypass grafting (CABG). Inadequate sedation during controlled ventilation may lead to increased cardiopulmonary and metabolic demands, resulting in hypertension, tachycardia, arrhythmia, myocardial ischemia, tachypnea, and hyperventilation. [1]The principal goal of postoperative sedation is to minimize these physiologic and psychologic responses to stressful stimuli. Midazolam is a commonly used benzodiazepine for short-term sedation in critically ill patients. [2]It is a water-soluble benzodiazepine following a single intravenous bolus injection, midazolam rapidly crosses the blood-brain barrier with an onset of drug effect within 2 to 2.5 min after typical sedative doses. [3]The principal active metabolite of midazolam, 1-hydroxymidazolam, is nearly as potent as the parent compound. [4]Previous studies suggested that long-term midazolam infusions result in accumulation of 1-hydroxymidazolam glucuronide in patients in the intensive care unit (ICU) who have renal insufficiency. [5]

Midazolam pharmacodynamics have been studied widely using tools such as the electroencephalography, Digit Symbol Substitution Test, and saccadic eye movement, [6-11]but clinical measures of drug effect have not been applied to patients in the ICU who are sedated with midazolam.

Although midazolam infusions are used commonly to sedate patients who require mechanical ventilation, there are no well-established dosing recommendations. Wide ranges of midazolam plasma concentrations have been associated with adequate sedation, [12]and experienced clinicians can make empiric adjustments effectively. Nonetheless, suboptimal dosing may lead to inadequate or excessive sedation.

This study was designed to develop rational dosing guidelines for midazolam administration in patients requiring mechanical ventilation after CABG. The approach was to develop detailed pharmacokinetic and pharmacodynamic models for midazolam and to integrate the results in dosing guidelines. The study used a multicenter design to assess the typical pharmacokinetics and their variability. The pharmacokinetic analysis can be found in a companion article in this issue. [13]This article describes the pharmacodynamic analysis of midazolam in this population and integrates the pharmacodynamic and pharmacokinetic models to derive dosing guidelines for patients requiring mechanical ventilation after elective CABG.

## Materials and Methods

### Study Design

After we received institutional review board approval, we obtained written informed consent from 90 adult patients who required mechanical ventilation in the surgical ICU for a minimum of 6 h after elective CABG surgery. The methods are presented in detail in another article in this issue. [13]For brevity, only those methods relevant to the pharmacodynamic analysis are repeated.

Equal numbers of patients were recruited at three centers: Duke University Medical Center (Duke), Emory University Medical Center (Emory), and the VA Palo Alto Health Care System (PAVA). Patients with neurologic disorders, severe liver or renal disease, intraoperative complications, a history of recent drug abuse or long-term benzodiazepine use, and a history of allergy to benzodiazepine were excluded.

Midazolam (Versed; Roche Pharmaceuticals, Nutley, NJ) and sufentanil (Sufenta; Janssen, Titusville, NJ) were administered intravenously to all patients during and after operation using a target-controlled infusion (TCI) device. Midazolam and sufentanil were infused based on pharmacokinetic models reported by Buhrer et al. [8]and Hudson et al., [14]respectively.

All patients followed a standardized perioperative anesthetic and invasive monitoring protocol. They were administered 0.5 to 2 mg lorazepam orally and 5-10 mg methadone orally 1 h before surgery. Midazolam was administered using the TCI device to provide sedation when invasive monitors were inserted. Anesthesia was induced with midazolam and sufentanil infusions at target plasma concentrations of 150 and 1-2 ng/ml, respectively. Supplemental anesthesia was provided using isoflurane or enflurane to a maximum of 1%, as necessary. The sufentanil and midazolam infusions were suspended during transport from the operating room to the ICU.

At arrival in the ICU, the sufentanil infusion was restarted at a target concentration of 0.15 ng/ml. Additional analgesia was provided, as necessary, by administration of 0.25 [micro sign]g/kg sufentanil with the TCI device. Patients were allowed to regain consciousness, and sedation scores were evaluated using a modified version of the Ramsay sedation scale (Table 1). [15]When patients emerged from anesthesia with a sedation score of 5 or less, the midazolam TCI infusion was restarted. The target concentration of midazolam was then titrated up every 15 min as necessary to reach and maintain a sedation score of 5. If the sedation score was 6, the target concentration was decreased by 25-50 ng/ml every 30 min until a sedation score of 5 was achieved again. Patients were titrated to a midazolam target concentration that maintained a sedation score of 5 for at least 2 h. Subsequently, the target concentration was decreased every 30 min as clinically indicated to maintain a sedation score of 3 or 4. Midazolam infusions were continued after operation for a minimum of 6 h in all patients. The midazolam and sufentanil infusions were discontinued before tracheal extubation.

### Pharmacodynamic Analysis

Data Used for the Analysis. As described previously, plasma samples were assayed for midazolam and its main metabolite, 1-hydroxymidazolam. [13]Sufentanil was not assayed. Computer files generated by the CACI II [16]and STANPUMP[dagger, dagger] programs provided detailed descriptions of the drug infusions. These computer files were used to provide precise predictions of the midazolam and sufentanil concentrations for this analysis.

The pharmacodynamic models were based on model-predicted plasma midazolam and sufentanil concentrations instead of the concentrations actually measured. For midazolam, the concentrations were calculated from each patient's post hoc Bayesian pharmacokinetic estimates, based on the pharmacokinetic model reported in out companion article. [13]If the models are accurate, post hoc Bayesian predictions of concentration can filter out much of the intraindividual measurement noise. Thus, the post hoc Bayesian prediction is likely to be closer to the “true”(and unknowable) concentrations than the actual measured concentrations. For sufentanil, the concentrations were calculated from the pharmacokinetics reported by Gepts et al. [17]and were not adjusted for this clinical setting or individualized for each patient. The Gepts model of sufentanil was not available when this study was performed, therefore, the sufentanil delivery was based on the pharmacokinetics reported by Hudson et al. [14]Subsequently, Gepts et al. [17]published sufentanil pharmacokinetics based on 2,880 min of blood sampling. The extended sampling in the study by Gepts et al. [17]makes it more appropriate for our current study, therefore we chose it for the sufentanil simulations.

Pharmacodynamic Models. The pharmacodynamic model related the probability of a particular level of sedation to plasma drug concentrations. This relation followed a sigmoidal form:Equation 1where P(Sedation >or= to ss) is the probability that the sedation score would be equal to or more than (i.e., “deeper than”) a given sedation score, C is the drug concentration, C^{50}is the concentration associated with 50% probability, and n is the steepness of the concentration-probability relation, also called the “Hill coefficient.” The pharmacodynamic model actually consisted of five related curves, one each for sedation scores of 2, 3, 4, 5, or 6.

(Equation 1) does not predict a sedation score directly, but rather a probability of being at or deeper than a particular level. At any given concentration there is a finite probability of being at sedation scores 1, 2, 3, 4, 5, and 6 when the sum of those probabilities is 1. At any given concentration, it is possible to calculate the probability of a particular sedation score (ss) from the family of curves for sedation scores equal to 1, 2, 3, 4, 5, and 6:Equation 2

For example, in Equation 2, the probability of being at a sedation score of 3 is the probability of being at a sedation score >or= to 3 (Equation 1) minus the probability of being at a sedation score >or= to 4 (Equation 1). We defined the actual sedation score “predicted” by the model for any concentration as the sedation score with the highest probability.

The likelihood of the model is the product of the probability of each observed sedation score. The objective function (Obj, -2 times the logarithm of the likelihood) was minimized using the nonlinear regression program NONMEM Version IV (University of California, San Francisco, CA).[double dagger, double dagger] The treatment of interindividual variability is described in appendix 1.

The simplest interpretation of the concentration term, C, in Equation 1is that it represents the plasma midazolam concentration. However, other representations of C are possible, including that “concentration” represents midazolam plus sufentanil, midazolam plus the residual anesthetic effects, and so on. The underlying hypothesis is that sedation is caused by a drug, a combination of drugs, or some perioperative physiology (e.g., cooling) that can be modeled as if it were a drug. Table 2summarizes several different interpretations of concentration, C. In the simplest model (model A in Table 2), C is the post hoc Bayesian estimate of midazolam concentration. An alternate model is to represent concentration as some combination of midazolam and sufentanil (models B and C in Table 2). Another choice is to assume that there is a residual anesthetic effect, which might include sufentanil, isoflurane, or enflurane; rewarming; the residual effects of the anesthetic; premedication; and perhaps even acute tolerance. These can be grouped together as a “virtual drug” that is present in its highest concentration when the patient arrives in the ICU and washes out over time, as shown in model D in Table 2. The last model in Table 2(model E) represents drug concentration as a combination of midazolam, sufentanil, and the residual anesthetic effect (the virtual drug).

In addition to exploring a series of models for concentration, we investigated the role of the covariates age and study center. These covariates were evaluated by extending the models shown in Table 2to include age and center as covariates of C^{50}and evaluating the improvement in the NONMEM objective function. The role of 1-hydroxymidazolam was investigated by including additive and synergistic interaction models (similar to that described in Table 2for sufentanil) and evaluating the improvement in the NONMEM objective function.

Assessing the Goodness of Fit. As noted, the pharmacodynamic fits were obtained using midazolam concentrations based on individual post hoc Bayesian predictions of individual pharmacokinetics. However, the clinician treating the postsurgical patient will not have individualized pharmacokinetic estimates on which to make therapeutic decisions. Therefore, we assessed the goodness of our model fit using the typical values of the midazolam pharmacokinetics, as reported by Zomorodi et al. [13]in the companion article in this issue, which will be available to clinicians using a TCI device. These values are based on the population prediction of the midazolam concentrations, ignoring interindividual variability in the pharmacokinetics.

Classic tools [18,19]for assessing the goodness of fit for pharmacokinetic models (e.g., residual error plots, median performance error, absolute median performance error, root mean square error) cannot be used to assess the goodness of a probability model with a polychotomous (multiple discrete) response. Therefore, we assessed goodness of fit using the NONMEM objective function, the percentage of data points predicted correctly, and a newly described measure of performance, the prediction probability (P^{k}). [20]These measures of goodness of fit are described in detail in appendix 2.

Graphic Assessment of the Goodness of Fit. Visual assessment of the goodness of fit is difficult because the observations are sedation scores, not the probability of the scores as predicted by the models. The pharmacodynamic models predicted the probability of a particular observation and not the observation itself.

Therefore, residual errors could not be calculated and residual plots, a standard pharmacokinetic analysis tool, could not be constructed. The NONMEM objective function provides one measure of goodness of fit, but it has no clinical interpretation.

We created a “measurement” of the probability of each observation to compare with the probabilities predicted by the different models. The concentrations were rank ordered. Each concentration was considered together with its four lower and four higher neighbors. At each concentration, the probability of a given observation (e.g., sedation >or= to 3) was “measured” by calculating the fraction of the nine concentrations (the concentration of interest and its eight neighbors) for which the level of sedation was equal to or deeper than the given level of sedation. This “measurement” of the probability at each concentration was then compared with the prediction of the model. We chose nine concentrations as the basis for this measurement of probability because it limited the span of concentrations averaged to produce the measurement. Had we used more concentrations to produce the measurement, we would have averaged concentrations over a larger range. This would be expected to introduce a positive bias (with the measured value greater than the actual probability) for concentrations less than C^{50}and a negative bias (with the measured value less than actual probability) for concentrations more than C^{50}.

To assign a prediction to each concentration, we calculated the probability of each level of sedation according to Equation 2as a function of concentration. These were then plotted against the actual sedation scores, again as a function of concentration.

Cross-validation is a convenient alternative to a prospective looking at the predictions of those observations used to create the model is misleading, because the results are favorably biased. Ideally, the model should be tested prospectively. Prospectively testing results from a large, multicenter study is expensive and time consuming. Cross-validation is a convenient alternative to a prospective study [21,22]that should provide a measurement of the expected performance of the model during identical experimental circumstances. To perform a cross-validation, the model is refitted with one subject of the data set excluded. The new parameters, called jackknife estimates, are used to predict the data in the excluded subject. Jackknife estimates are obtained for every subject. This procedure yields a nearly unbiased estimate of the predictive ability of the model.

Jackknife estimates were used to cross-validate the percentage of accurately predicted observations and the graphic analyses of goodness of fit. Models estimated from mixed-effects modeling (MEM) and “naive pooled data”(NPD) methods were compared using cross-validation. The jackknife approach was also used to calculate the standard error of the pharmacodynamic parameters from both the NPD and the MEM estimates, as described by Efron and Tibshirani. [22]These were compared with the NONMEM estimates of the standard errors of the parameters.

## Results

Of the 90 patients enrolled, 27 were excluded from the pharmacokinetic analysis. This analysis and the reasons for the exclusions are described by Zomorodi et al. [13]From the 63 remaining patients, nine sufentanil infusion profile data files were either corrupted or unavailable. The final data set included 1,329 observations of sedation scores (Table 3) from 54 patients (17 from PAVA, 18 from Emory, and 19 from Duke), yielding an average of 25 observations of sedation per patient. Bayesian predictions for midazolam concentrations in the surgical ICU ranged from 5 to 189 ng/ml. The average was 53 ng/ml. Ninety-one percent of the predicted midazolam concentrations ranged from 10 to 100 ng/ml. Population predictions for sufentanil concentrations ranged from 0.05 to 2.8 ng/ml. However, only two patients had predicted sufentanil concentrations more than 1 ng/ml. The average was 0.24 ng/ml. Ninety-three percent of the predicted sufentanil concentrations ranged from 0.1 to 0.5 ng/ml. Only 6 of 54 patients had detectable levels of 1-hydroxymidazolam.

The selection of the final model is discussed in appendix 3. The final model was Equation 6where C^{50}= 5.7, 71, 171, 260, and 659 for sedation scores >or= to 2, 3, 4, 5, and 6, respectively, and C = midazolam + 256e^{-019}t, where t is the time, measured in hours, since the patient arrived in the ICU. In other words, sedation is a function of midazolam concentration plus a virtual drug with an initial concentration on arrival in the ICU equivalent to a midazolam concentration of 256 ng/ml and an elimination half-life of 3.7 h (Table 4).

(Figure 1), top graph, shows the predicted probability of a sedation score >or= to 3 as a function of concentration, as defined by the final model. It also shows the “measured” probability at each observation, as described in Materials and Methods. The final model provided a nearly unbiased prediction of the probability of sedation being >or= to 3 as a function of concentration. The performance for other sedation scores was similar.

(Figure 2) shows the probability of sedation scores 2, 3, 4, 5, and 6 as a function of concentration calculated using Equation 2, and based on the NPD parameter estimates. Interestingly, at a concentration of approximately 200 ng/ml, the patient has a nearly equal probability of having sedation scores 3, 4, or 5. A sedation score of 4 is spread out over such a large concentration range that it never has the highest probability among all sedation scores, and thus it is never the sedation score predicted for any concentration.

(Figure 3) shows all 1,329 sedation scores from the 54 patients (vertical dashes) and the predictions of the pharmacodynamic model D (solid line), using the midazolam concentrations predicted by the population pharmacokinetic model. [13]The prediction goes through the central portion of each sedation score. At a concentration of 200 ng/ml, the prediction increases abruptly from a sedation score of 3 to 5, which is consistent with the nearly identical densities of observed sedation scores of 3, 4, and 5 at this concentration.

In the cross-validation, the pharmacodynamic model estimated using the NPD approach provided 56% correct predictions and 88% close (within 1 sedation score) predictions. The pharmacodynamic model estimated using the MEM approach provided 49% correct predictions and 83% close predictions. The cross-validation thus confirmed the choice of pharmacodynamic model D, estimated using the NPD approach.

## Discussion

This study investigated the relation between clinical effect and the plasma drug concentration in ventilated patients recovering from CABG surgery. A better understanding of this relation should help clinicians to design rational infusion regimens to provide sedation for their patients in the surgical ICU.

### Study Limitations

The pharmacodynamic analysis was complicated by several problems with the study design. First, the Ramsay scale used to assess sedation has limitations. As an ordered categoric scale, a sedation score of 2 represents a deeper sedation level than does a sedation score of 1. However, 1 is agitated, whereas 2 represents calm, presumably the baseline. There is no provision in the Ramsay scale for a patient who is sedated yet agitated, although such patients are not uncommon. In addition, a sedation score of 2 may be the true baseline for no drug effect, which would clearly be the case if the patients were not subjected to noxious stimulation. At the opposite end of the scale, the distinction between a sedation score of 5 and 6 is the response to a painful stimulus, such as deep pressure to the nail bed (used as a torture in medieval times). A high sufentanil concentration may blunt the response to this stimulation in an otherwise lightly sedated patient. Thus, the Ramsay scale blurs the distinction between pain and sedation. Finally, the Ramsay sedation scale uses three different types of stimulation: verbal, tactile, and painful. We assumed that these are rank ordered, as shown in Table 1, and Figure 3provides evidence that this is the case. However, it is not necessarily the case that these are rank ordered. The good performance of concentration as a predictor of Ramsay score in this study (P (^{K})= 0.84) may offer as much evidence for the validity of the Ramsay scale as a measure of sedation as it does for our measure of concentration as a predictor of sedation.

A second complication is that mental status naturally changes in the absence of drugs. Most persons alternate from a level of 2 (awake) to 3 (sedated) and 4 (asleep) during the day. Clearly, the level of sedation is not entirely a function of plasma concentration, as represented in models A, B, and C. Perhaps the success of model D, the virtual drug model, was partly due to the ability of this model to account for the change in alertness during the first postoperative night. Model D included a nonspecific drug effect that dissipated progressively. The dissipation may have represented, in part, the passage of the first postoperative night when the patients would have been asleep anyway.

The Ramsay scale is subjective. We did not find a center effect, suggesting that the scale was consistently applied among study centers. However, individual centers reported considerable discussion about distinctions between levels 3, 4, and 5. This is clear from Figure 2, which shows that at a concentration of approximately 200 ng/ml it was difficult to distinguish among these three clinical states.

The study was complicated by the correlation between sufentanil and midazolam concentrations and the clustering of 93% of the sufentanil concentrations between 0.1 and 0.5 ng/ml. The interaction between opioids and benzodiazepines is well documented. [23-28]Modeling such an interaction necessitates that we characterize the effects of the two drugs administered together and then model the effect of each drug in the absence of the other. In this study, the drugs were administered concurrently to all patients. At arrival in the surgical ICU, the sufentanil and midazolam concentrations were initially high, both of which decreased progressively. In addition, the narrow span of sufentanil concentrations did not permit exploration of the response surface. Because opioids and benzodiazepines are known to interact synergistically, it is likely that part of the drug effect attributed to midazolam in our results is actually the contribution of sufentanil, just as part of the effect of the virtual drug is probably sufentanil.

Only six patients had detectable plasma levels of 1-hydroxymidazolam, and the levels were very low. This may account for the lack of effect for 1-hydroxymidazolam in our models. 1-Hydroxymidazolam has been shown clearly to have benzodiazepine properties when administered intravenously. [4]The lack of effect seen in our study indicates a lack of power to detect a subtle effect with this study design and is not a reflection on the potency of 1-hydroxymidazolam.

### Measures of Goodness of Fit

We tried to “measure” the probability of each observation, as shown in Figure 1. Even with this approach it was not possible to compute meaningful residuals from this figure, because the precision of each measured probability depends on the number of data points combined to produce the measurement. Increasing the number of data points will produce less apparent variability, at the expense of increased bias. Thus, Figure 1overstates the variability in the concentration-versus-response relation, because only nine points are combined at each measured probability.

### Dosing Implications

The final model included the residual effect of anesthesia as a simple monoexponential decay without incorporating likely covariates such as body temperature, anesthetic duration, or models of the individual drugs administered during operation. Because all three centers followed a uniform study protocol, the model for anesthetic wash-out developed for this study may not describe the time course for different anesthetic and surgical practices. In addition, the pharmacokinetic model of midazolam reported by Zomorodi et al. [13]is based on data collected primarily during the first 24 h. Therefore, extrapolation of the results beyond 24 h may be unreliable. However, we note that the final model predicted that the patient arrived in the surgical ICU showing the effects of both midazolam and a virtual drug. At arrival in the surgical ICU, the effect of the virtual drug was the equivalent of 256 ng/ml midazolam. The half-life of this virtual drug was 3.7 h.

It is still possible to derive insight from this model. The midazolam concentration necessary to maintain a particular sedation score will vary over time as the residual effect of anesthesia dissipates. With the anesthetic regimen outlined in this protocol, the half-life of the residual effect was 3.7 h. Figure 4shows how the probability of sedation score versus midazolam concentration varies with time. Figure 5depicts the midazolam plasma concentrations (top) and the corresponding midazolam infusion rates (bottom) needed to achieve a desired sedation score as a function of time.

(Table 5) illustrates midazolam dosing regimens necessary to maintain sedation scores of 3 (C^{50},3 = 71 ng/ml) and 5 (C^{50},5 = 260 ng/ml) after CABG surgery. Table 5is based on simulations performed with STANPUMP. Three midazolam infusion rates are calculated:(1) the rate to achieve a sedation score of 3 in half of the patients (C^{50},3), (2) the rate to achieve a sedation score of 5 in half of the patients (C^{50},5), and (3) the rate to achieve a sedation score of 5 in the most sensitive patient (the lowest midazolam concentration associated with a sedation score of 5 in Figure 3). The midazolam infusion rates were calculated to obtain the concentrations predicted by the model of midazolam that included a virtual drug wash out after anesthesia (model D: midazolam + virtual drug). Table 5shows the recovery times for these three cases. For a sedation score of 3, we calculated the time for a 50% reduction in plasma drug concentration after discontinuing the infusion (the “context-sensitive halftime”). [29]We computed the context-sensitive half-time instead of the time necessary to reach C^{50},2 after discontinuing the infusion, because we believe the estimate of C^{50},2 is unreliable as a result of the small number of data points for a sedation score of 1 (Table 3) and the large standard error of C^{50},2 (Table 6). For the two regimens based on a sedation score of 5, we calculated the time necessary to reach a sedation score of 3 after discontinuing the infusion on the basis that a mental status of 3 was suitable for tracheal extubation.

Based on Table 5, to maintain a sedation score of 3, the midazolam infusion should be started at 2 mg/h approximately 6 h after arrival in the surgical ICU. The 6-h delay is necessary to permit the patient to awaken from anesthesia and to reach a sedation level of 3. After starting the infusion, the accumulation of midazolam is balanced by the decrease in the anesthetic effect, eliminating the need for further adjustment of the infusion rate, except for the dosage adjustment necessary because of interindividual variability. After the infusion is discontinued, the time for the concentration to decrease by 50%(the context-sensitive half-life) increases from 1.5 h after a 12-hour infusion to 4.2 h after a 72-hour infusion.

Based on Table 5, to maintain a sedation score of 5, the midazolam infusion at arrival in the surgical ICU should be started at a rate of 3 mg/h. During the ensuing 3 h, the rate is increased to 8 mg/h to compensate for the loss of residual virtual drug effect (e.g., wash out of anesthesia, patient rewarming, and so forth). During the next 3 days, the rate needs to be decreased gradually to 6 mg/h. This is the rate to maintain C^{50}, the concentration associated with sedation scores of 5 or higher in half of the population. Although the infusion rate may seem high, this is readily understood in light of the Ramsay sedation scale. By definition, patients with a score of 5 are unresponsive to glabellar tap, and patients with a score of 6 are unresponsive to nail bed pressure. These are noxious stimulations, particularly the nail bed pressure. To maintain a score of 5 or 6 with midazolam means maintaining a concentration that is high enough to block the response to a noxious stimulation. Usual clinical practice would be to use an analgesic, rather than to administer a high dose of a benzodiazepine, to blunt any response to pain. Thus, the infusion rate shown in Table 5would likely need to be reduced if opioids were administered concurrently. Individual adjustments will be necessary based on pharmacodynamic variability, as indicated by the spread of concentrations in Figure 3, and the additional use of opioids, as noted before. For the first 12 h, the model predicts that it will take 4-6 h for the level of sedation to decrease from a sedation score of 5 to a sedation score of 3 after the infusion is discontinued. Beyond 24 h, the model predicts that 9-16 h will be necessary for the level of sedation to decrease to a sedation score of 3.

Using a conservative dosing regimen to maintain the lowest midazolam concentration at which the predicted sedation score was 5 (210 ng/ml in Figure 3), the midazolam infusion on entry to the surgical ICU is still started at 3 mg/h but only increases to 6 mg/h. During the first day, the model predicts that it will take 4-6 h for the sedation score to decrease to 3 after the infusion is discontinued. After the first day it will take approximately 12 h for the sedation score to decrease from 5 to 3 after the infusion is discontinued.

We noted this before but now repeat it for emphasis: The dosing guidelines and predicted time course of drug effect are based on a specific anesthetic protocol, described in the materials and methods section, and a pharmacokinetic model developed from only 24 h of samples in most patients. The predictability of the model may be very different when it is used with different anesthetic regimens. In addition, the extrapolation to infusions more than 24 h shown in Table 5is included to provide a plausible pharmacokinetic and pharmacodynamic explanation for the slow emergence sometimes observed after long-duration midazolam infusions in the surgical ICU and is not intended to suggest that these model predictions serve as a substitute for titration to individual patient responses.

In conclusion, the pharmacokinetics and pharmacodynamics of midazolam after CABG surgery can be used to predict the time course of sedation and the infusion rates necessary to maintain a given mental status. Modeling sedation necessitates modeling the dissipation of other anesthetic effects, which can be accomplished with a monoexponential model of dissipating anesthetic drug effect. Future studies may provide more complete models of dissipating anesthetic effect to account for differences in anesthetic drugs, patient temperature, age, and other covariates. Despite considerable intersubject variability in pharmacodynamic response, and the use of a subjective clinical assessment of sedation, it is possible to accurately predict the level of sedation in more than half of the observations, based solely on the infusion rates of midazolam and the time since arrival in the surgical intensive care unit. Population pharmacokinetic and pharmacodynamic models can be extended to clinical practice, by using target-controlled infusion devices, as in this study, or through dosing nomograms developed from the study results.

## Appendix 1: Mixed-Effects versus Naive Pooled Data Modeling Approaches

Interindividual variability in C^{50}was modeled as a log-normal distribution about the typical estimates of C^{50}for individuals for each sedation score (ss):Equation 3where (C^{50},ss)(^{j}) is the estimate of the concentration associated with 50% probability of a given sedation score, ss, in the jth individual, (C^{50},ss)^{TV}is the typical value (TV) of the estimate of the concentration associated with 50% probability of a given sedation score in the population, e is the base of natural logarithms, and [Greek small letter eta]^{j}is the value [Greek small letter eta] in the jth patient, where [Greek small letter eta] is a random variable with a mean of 0 and a variance of [small omega, Greek](^{2}). The same [Greek small letter eta] was applied to each value of C^{50}, so only a single [small omega, Greek]^{2}term was estimated by NONMEM. Interindividual variability was not modeled on n, the steepness factor in Equation 1. In addition, the same n was applied to the Equation forthe probability of each sedation score, forcing the family of curves (i.e., P(Sedation >or= to 2), P(Sedation >or= to 3), and so on) to be parallel.

Modeling interindividual variability as 0 is the NPD approach that treats the observations as though they arose from a single person. Modeling the interindividual variability as a distribution is the method typically taken in the MEM approach. [30]The MEM is computationally intensive but has the advantage of estimating both the interindividual and the intraindividual variability. It also depends more closely on modeling assumptions.

In several pharmacokinetic analyses, the NPD approach has been compared with the MEM approach. [18,19,31-33]These comparisons suggested settings in which the NPD approach offered some advantages. However, pharmacokinetic models are linear with respect to dose, and it is not surprising that the NPD approach might work well with linear models when the data have not been systematically censored (such as by having many observations below the limits of detection of the assay). Pharmacodynamic models are intrinsically nonlinear: Doubling the dose may produce far more, or less, than a doubling of effect, depending on where the patient is along the concentration-response curve. With nonlinear models, the MEM approach is expected to produce the shape of the concentration-response curve in the typical person, whereas the NPD approach is expected to produce an average concentration-response curve in the population. The NPD approach would be expected to estimate a more shallow slope (smaller n in Equation 1) than the MEM approach. With nonlinear models, the MEM model better predicts the typical individual, at the cost of less accurately predicting the observations in most individuals. The NPD approach better predicts the observations in most individuals, at a cost of estimating a concentration-response relation that is not representative of any individual.

In this case, the MEM and NPD models were not very different (Figure 1and Table 4). Because of the bias evident in Figure 1and the better predictive ability of the NPD model (Table 4), we preferred the NPD results. In theory, we could have used the structural and variance parameters from the MEM approach to simulate a series of curves for many individuals and from this calculated the average response in the population. However, the interindividual variability was so great that we had little confidence that this approach would work better than estimating the average concentration-response relation directly from the data.

Readers should not interpret these results as a recommendation for the general use of NPD models for nonlinear pharmacodynamic models. The model estimated by the NPD approach will not resemble any individual. The MEM approach is preferable if the response in the typical individual is more important than the average response of the population, if estimates of interindividual variability are needed, or if the data are censored systematically. We try both approaches and then choose between them in light of the study goals and measures of model performance. The relative merits of mixed-effect, NPD, standard two-stage, and other population modeling approaches have been recently reviewed. [30]

## Appendix 2: Measures of Goodness of Fit

### NONMEM Objective Function

The NONMEM objective function was used to compare full models with reduced models. Reduced models were identical to full models, except that one or more parameters of full models are fixed to zero. For example, model A (Table 2) is a reduced model for models B, C, D, and E. It has been suggested that full models may be better if the objective function decreases by 4, [18,19]because this exceeds the value of the chi-squared distribution for one degree of freedom at [small alpha, Greek]= 0.05 ([tilde operator][chi squared]^{0}.95 [1]). This assumes that in 5% of cases the more complex model will be chosen even though it is not actually better. It is necessary to apply a correction when many models are tested to prevent more complex models from being chosen by chance. We evaluated 20 models (19 comparisons with the simplest model). Were we to choose [small alpha, Greek]= 0.05 for each comparison, the likelihood of inappropriately selecting a more complex model by chance would be 100%-(95%)[19]= 62%. To prevent our selecting a better model by chance, we selected a value of 11 ([tilde operator][chi squared]^{0}.999 [1]) as the criterion for significant improvement of the objective function. There is only a 2.5% chance that in 19 comparisons any model will be selected as better by chance with an improvement in the objective function by 11 (e.g., [small alpha, Greek]= 0.025 for all 19 comparisons taken together).

It is expected that the objective function with the MEM approach will be less than with the NPD approach because of the inclusion of additional parameters to describe the interindividual variance. Although the NPD model is a reduced form of the MEM model, the interpretation of the decreased objective function with the MEM approach is unclear. Therefore, we used other means to compare the results of the MEM and NPD approaches.

### The Percentage of Correct Predictions

Ideally, we would like to predict every sedation score exactly. However, uncertainty is inevitable considering both the diversity of the patient population and the subjective nature of the sedation scores (Table 1). Targeting a sedation score of 3 and observing a score of 2 or 4 is, in most circumstances, acceptable. For each model, we calculated the actual prediction at each observation using Equation 2and then computed the percentage of correct predictions (observed sedation score = predicted sedation score) and close predictions (observed sedation score = predicted sedation score +/− 1).

(Table 7) summarizes the measures of goodness of fit. As noted before, these measures of goodness of fit were based on the typical value of the population pharmacokinetic model for midazolam, [13]rather than post hoc Bayesian estimates used to estimate the pharmacodynamics. This was done because the prediction from the typical value of the pharmacokinetic model is available to the clinician administering midazolam using a TCI device.

### Prediction Probability

Smith et al. [20]recently described a performance measure, P (^{K}), to assess anesthetic depth indicators. The measure of anesthetic depth was described as a monotonically nondecreasing function of an indicator x. In our study, we wanted to evaluate the performance of “concentration” based on the midazolam pharmacokinetic model reported by Zomorodi et al., [13]the sufentanil pharmacodynamic models reported by Gepts et al., [17]and the possible interactions proposed in Table 2, as a predictor of the sedation score. Using the definition of Smith et al. [20]for P^{K}, it is possible to determine nonparametrically which of the possible measures of concentration provides the best prediction of sedation score.

The relation between concentration, C, and sedation score is described in terms of rank ordering for pairs of data points. A concordance occurs when C and sedation score for a pair of data points are rank ordered in the same direction. A discordance occurs when they are rank ordered in opposite directions. An “x-only” tie occurs when two data points have the same C and different sedation scores.

Let P^{c}, P^{d}, and P^{tx}be the probabilities that two data points are in concordance (have the proper order), in discordance (have the wrong order), or represent an x-only tie, respectively. Then P^{K}is defined as:Equation 4

In this study, concentrations were reported with four significant digits, resulting in no identical concentrations. Therefore, there were no “ties” in x. Because P^{tx}= 0, Equation 4can be simplified as Equation 5P^{K}will range from 0 to 1. An ideal indicator will yield a P^{K}= 1. The best model will have the highest P^{K}. We calculated P (^{K}) for each model of concentration in Table 2(Table 7). The standard error for P^{K}was estimated using the jackknife method suggested by Smith et al. [20](Table 7). These values were computed using Excel for Windows 95 (Microsoft, Redmond, WA).

There are limitations to using P^{K}to assess goodness of fit. P^{K}measures the ability of changes in one measurement to predict the direction of changes in a second measurement. If, for example, increases in concentration always lead to no change or to an increase in the sedation score, then P^{K}for concentration as a predictor of sedation score will be 1. In theory, one could obtain a perfect P^{K}= 1 for a model that has the wrong shape (for example, modeling a sigmoid model with a linear model) or a strong bias. Thus, P^{K}does not measure how well the model actually predicts the observations. It is not a measure of the ability of the model of concentration to actually predict the sedation score and thus does not directly assess goodness of fit. P^{K}is a measure of association, and it answers the question “Does the observed sedation score tend to increase as C (^{p}) increases?” The P^{K}is not sensitive to the accuracy of the prediction. Indeed, the P^{K}does not even consider the predicted score but only reports the extent to which the observed sedation score increases (or at least does not decrease) as C^{p}increases.

## Appendix 3: Model Selection

(Table 4) shows the pharmacodynamic parameters estimated by NPD and MEM approaches. The NPD and MEM approaches both show a benefit in adding predicted sufentanil concentrations to the model for concentration (model B), which decreased the NPD and MEM objective functions by 58 and 201 points, respectively. Both approaches also suggested a synergistic interaction between sufentanil and midazolam, which decreased the objective function from the additive midazolam-sufentanil model by 21 and 43 points for the NPD and MEM approaches, respectively. The NPD and MEM approaches strongly preferred the “virtual drug” model of concentration (model D; exponentially decreasing the model of residual “anesthetic” effect) to the other models of concentration. The virtual drug model decreased the MEM and NPD objective functions by 540 and 290 points, respectively, from the simple model. The addition of sufentanil to the virtual drug model of concentration (model E) produced a negligible improvement in the fit using the NPD approach and an improvement of 10 with the MEM approach. The latter improvement in objective function did not meet criteria for significant improvement as described in appendix 2. On this basis, model D, the model of concentration based on post hoc Bayesian estimates of midazolam concentration and a model for dissipation of the residual effects of anesthesia, was selected as the preferred model of the sedative concentration after CABG surgery.

The inclusion of 1-hydroxymidazolam, using additive and synergistic models both, did not improve the quality of the models. There was no evidence of a study-center effect or an age effect in the covariate analysis.

(Table 6) indicates the standard errors of the model estimates for model D from both the NPD and the MEM approaches, as calculated by NONMEM and by using the jackknife approach. Standard errors calculated using the jackknife approach typically were approximately 40% more than those calculated by NONMEM using the NPD approach, and approximately 60% more than those calculated using the MEM approach.

(Table 7) summarizes the measures of goodness of fit. Using both the NPD and the MEM approaches, model D (concentration defined as predicted midazolam concentration plus a dissipating anesthetic effect) provided the most accurate predictions of sedation score (57% and 47% with the NPD and MEM approaches, respectively). Model D was able to predict the observed level of sedation within 1 sedation score in 88% of cases using the NPD approach and 81% of cases using the MEM approach. As suggested by the analysis of objective functions (Table 4), the inclusion of sufentanil concentration (model E) did not further enhance the predictive ability of model D. For all models tested, the pharmacodynamic model estimated using the NPD approach more accurately predicted the observed sedation scores than did the model estimated using the MEM approach.

Given the limitations of the sedation assessment described in our discussion, the 57% exact predictions and 88% close predictions (within 1 unit) suggested reasonable predictive ability of the final model. With the data clustered from sedation scores of 2 to 5, it is not difficult to be within 1 unit of the correct prediction. A model that predicted a sedation score of 4 at all concentrations would be within 1 sedation score 74% of the time ([374 + 248 + 360]/1,329, from Table 3). However, a model that predicted a sedation score of 4 at all concentrations would be off by more than 1 sedation unit 26% of the time, whereas the proposed model missed the prediction by more than 1 sedation unit only 12% of the time. The cross-validation result suggests that a similar performance can be expected prospectively during similar clinical circumstances.

The nonparametric analysis of P^{K}confirmed the usefulness of concentration, as defined for model D, as a predictor of sedation score (Table 7). The P^{K}for model D was greater than the P^{K}for models A, B, and C; and the addition of sufentanil concentrations to model D (model E) did not further enhance the usefulness of concentration as a predictor of sedation scale. The MEM and NPD approaches both yielded similar descriptions for the dissipating anesthetic effect included in model D. As a result, the P (^{K}) results from the MEM and NPD approaches were identical. However, Figure 1shows an obviously better fit with NPD. For MEM model A, B, and C, it is interesting that as the objective function improved, the P^{K}became worse. This could be explained by the finding that P^{K}is a nonparametric assessment of the general trend of the pooled data, whereas MEM is a parametric method that accounts for the interindividual variability. Overall, the P^{K}helped to assess the validity of model D. But the other tools, such as visual inspection of Figure 1, computation of the number of good predictions, and objective function have proved to be the most useful.

(Figure 1) shows that the probability estimated using the NPD approach provides a nearly unbiased estimate of the measured probability. The probability predicted by the MEM approach was steeper (an expected result) and the prediction was biased at concentrations < C^{50}. The results were also similar for models of sedation score >or= to 2, 4, 5, and 6. Based on the goodness-of-fit results shown in Table 7, and the graphic results illustrated in Figure 1, pharmacodynamic model D estimated with the NPD approach was selected as the preferred model for calculating dosage regimens and predicting sedation scores of patients requiring ventilation after CABG surgery.

The authors thank David Greenblatt, M.D., for performing the midazolam and 1-hydroxymidazolam assays and Maria Bergamo, M.D., Carl Hug, M.D., and Jerry Reves, M.D., for help designing and implementing this study.

[dagger, dagger] STANPUMP source and object code are freely available via the World Wide Web at the URL http://pkpd.icon.palo-alto.med.va.gov

The pharmacokinetic and pharmacodynamic data from this study, including the NONMEM control files, can be found on the World Wide Web at pkpd.icon.palo-alto.med.va.gov in the directory /data.dir/midazolam.icu.dir. The version of STANPUMP in the directory /stanpump.dir incorporates the ICU midazolam pharmacokinetics described by Zomorodi et al. [13]

[double dagger, double dagger] Beal SL, Sheiner LB: NONMEM User's Guide. San Francisco, University of California, San Francisco, 1979