Artifact robustness (i.e., size of deviation of an electroencephalographic parameter value from baseline caused by artifacts) and baseline stability (i.e., consistency of median baseline values) of electroencephalographic parameters profoundly influence electroencephalography-based pharmacodynamic parameter estimation and the usefulness of the processed electroencephalogram as a measure of the arousal state of the central nervous system (depth of anesthesia). In this study, the authors compared the artifact robustness and the interindividual and intraindividual baseline stability of several univariate descriptors of the electroencephalogram (Shannon entropy, approximate entropy, spectral edge frequency 95, delta ratio, and canonical univariate parameter).

Electroencephalographic data of 16 healthy volunteers before and after administration of an intravenous bolus of propofol (2 mg/kg body weight) were analyzed. Each volunteer was studied twice. The baseline electroencephalogram was recorded for a median of 18 min before drug administration. For each electroencephalographic descriptor, the authors calculated the following: (1) baseline variability (= (median baseline - median effect) [i.e., signal]/SD baseline [i.e., noise]) without artifact rejection; (2) baseline variability with artifact rejection; and (3) baseline stability within and between individuals (= (median baseline - median effect) averaged over all volunteers/SD of all median baselines).

Without artifact rejection, Shannon entropy and canonical univariate parameter displayed the highest signal-to-noise ratio. After artifact rejection, approximate entropy, Shannon entropy, and the canonical univariate parameter displayed the highest signal-to-noise ratio. Baseline stability within and between individuals was highest for approximate entropy.

With regard to robustness against artifacts, the electroencephalographic entropy parameters and the canonical univariate parameter were superior to spectral edge frequency 95 and delta ratio. Electroencephalographic approximate entropy displayed the best interindividual and intraindividual baseline stability.

Univariate descriptors of the electroencephalogram have been applied as surrogate endpoints for quantification of anesthetic drug effect (*e.g.*, determining relative potencies) [1] and for quantification of depth of sedation or anesthesia [2]. For both applications, high artifact robustness and interindividual and intraindividual baseline stability (minimal variability in the absence of drug between and within individuals) are essential.

Two of the four parameters of a sigmoid $E_{\max}$ model usually describing the concentration–effect relation of central nervous system (CNS) active drugs [3] are influenced by data recorded in the absence of drug. Baseline ($E_0$) is estimated directly from those data. The estimated concentration corresponding to the half maximal effect ($EC_{50}$) is influenced by the baseline value. Therefore, baseline variations may lead to erroneous estimates of these parameters.

The usefulness of a univariate electroencephalographic parameter for quantification of anesthetic depth is also determined by baseline variation. For a univariate parameter, if the CNS arousal state corresponds more closely to a particular percentage decrease from baseline rather than to a particular absolute value, high interindividual variability of the baseline value will broaden the range of measured values corresponding to a certain CNS arousal state, decreasing the predictive ability of the electroencephalogram. Therefore, parameter values corresponding to a certain level of sedation or anesthesia will vary between individuals.
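To make the role of the baseline concrete, the following minimal Python sketch evaluates a sigmoid Emax model in one common parameterization. The function name, the parameterization, and all numeric values (a parameter that falls from baseline toward a lower value at maximal effect) are illustrative assumptions and are not taken from the study.

```python
def sigmoid_emax(c, e0, emax, ec50, gamma):
    """Effect at concentration c for a sigmoid Emax model (assumed parameterization).

    e0 is the baseline (no drug), emax the value at maximal drug effect,
    ec50 the concentration giving half of the maximal change, gamma the steepness.
    """
    if c <= 0:
        return e0
    return e0 + (emax - e0) * c**gamma / (ec50**gamma + c**gamma)


# Hypothetical values: two individuals who differ only in baseline.
shared = dict(emax=5.0, ec50=2.0, gamma=3.0)
effect_a = sigmoid_emax(2.0, e0=20.0, **shared)  # baseline 20
effect_b = sigmoid_emax(2.0, e0=24.0, **shared)  # baseline 24

# Same concentration and same fractional drug effect (50%),
# but different absolute parameter values.
print(effect_a, effect_b)  # 12.5 14.5
```

In this sketch, a fixed absolute threshold of 13 would classify the two individuals differently although the fractional drug effect is identical, which is the situation described above for a parameter with an unstable baseline.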

For those reasons, baseline stability is more than a “nice-to-have” feature of a univariate descriptor of the electroencephalogram. It can profoundly affect both electroencephalography-based pharmacodynamic parameter estimation and the usefulness of the processed electroencephalogram as a measure of the arousal state of the CNS (depth of anesthesia).

In this study, we compared the artifact robustness and the interindividual and intraindividual baseline stability of the following univariate descriptors of the electroencephalogram: Shannon entropy [4], approximate entropy [5], spectral edge frequency 95 (SEF95) [6], delta ratio [7], and canonical univariate parameter (CUP) [8–10]. Bispectral index was not calculated because the electroencephalographic data in the study had already been filtered and digitized, rendering bispectral analysis impossible.

## Methods

### Clinical Protocol

We reanalyzed electroencephalographic data recorded before, during, and after administration of propofol to volunteers [11]. After approval by the Stanford University Institutional Review Board (Stanford, CA) and written informed consent were obtained, 16 volunteers aged between 25 and 65 yr, receiving a 2-mg/kg propofol bolus dose, were studied. Only the electroencephalograms recorded during the baseline and during the propofol bolus were analyzed.

The volunteers were asked to lie quietly with closed eyes for baseline recording over a median time of 18 min (range, 9.1–40.9 min). After baseline recording, all subjects received a 2-mg/kg bolus dose of propofol over a median time of 18 s (range, 13-24 s). Each volunteer was studied twice. Both electroencephalographic sets were included in the analysis, allowing calculation of the interindividual and intraindividual variability.

### Electroencephalographic Analysis

The electroencephalogram was recorded continuously with a frontal montage (Fp3–Cz) (international 10-20 system). After gently rubbing the scalp with an abrasive gel (Omniprep; D.O. Weaver Co., Aurora, CO), the electrodes were fixed to the skin with a sticky electrode cream (Grass EC2; AstroMed Inc., West Warwick, RI). The electrodes were manipulated until the impedance was less than 1,500 Ω. The electroencephalogram was digitized at 128 Hz, 12-bit resolution, and stored on a computer hard disk for subsequent processing.

The following electroencephalographic parameters were calculated from $2^{10}$ data points (= 8-s epochs):

1. SEF95: 95th percentile of the power distribution.

2. Delta ratio: The percent of total power in the delta band (0.5–4 Hz).

3. CUP: The electroencephalographic power spectrum from 0 to 30 Hz was divided into 10 frequency bins of 3 Hz each. The power in each bin was converted to its natural logarithm, and each of the 10 bins was multiplied by a weighting factor. The 10 weighting factors for propofol were previously estimated concurrently with the other pharmacodynamic parameters for the data set used [11]. The sum of the 10 weighted bins is the CUP [8].

4. Shannon entropy: The Shannon entropy was calculated according to the following algorithm [12]:

$$H = -\sum_{i} p_i \log p_i$$

where $i$ extends over all observed amplitude values of the data time series, and $p_i$ is the probability that the amplitude value $v_i$ occurs anywhere in the data time series. Thus, $p_i$ is the ratio of the number of data points with the amplitude value $v_i$ to the total number of data points in the data time series.

5. Approximate entropy: The approximate entropy quantifies the predictability of subsequent amplitude values of the electroencephalogram, based on the knowledge of the previous amplitude values. The absolute value of the approximate entropy is influenced by three parameters: the length of the epoch (N), the number of previous values used for the prediction of the subsequent value (m), and a filtering level (r). In this study, N was fixed at 1,024; thus, one value of approximate entropy could be calculated for each 8-s electroencephalographic epoch. The noise filter r was defined as a relative fraction of the SD of the 1,024 amplitude values. We used the parameter set m = 2 and r = 0.2 × SD, which showed the best performance for electroencephalographic approximate entropy in a preliminary study [5].
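The two entropy measures can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the implementation used in the study: the function names, the natural logarithm, the population SD, and the synthetic test signals are assumptions. The approximate entropy follows the usual template-matching formulation with self-matches counted.

```python
import math
import random
import statistics
from collections import Counter


def shannon_entropy(samples):
    """Shannon entropy of the amplitude distribution of one epoch.

    p_i is the relative frequency of each distinct (digitized) amplitude value.
    The natural logarithm is an assumption; the base only rescales the result.
    """
    n = len(samples)
    return -sum((k / n) * math.log(k / n) for k in Counter(samples).values())


def approximate_entropy(samples, m=2, r_factor=0.2):
    """Approximate entropy with tolerance r = r_factor * SD of the epoch.

    Templates of length m and m + 1 are compared with the maximum-norm
    distance; self-matches are counted, so the logarithm is always defined.
    """
    n = len(samples)
    r = r_factor * statistics.pstdev(samples)  # population SD (assumption)

    def phi(length):
        templates = [samples[i:i + length] for i in range(n - length + 1)]
        total = 0.0
        for a in templates:
            matches = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)


# Synthetic epochs (not study data): a regular sine and uniform noise.
rng = random.Random(0)
sine = [math.sin(2 * math.pi * i / 20) for i in range(200)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(200)]

# The predictable sine is expected to score lower than the noise.
print(approximate_entropy(sine) < approximate_entropy(noise))
```

Because r is defined relative to the SD of the epoch, multiplying all amplitudes by a constant leaves the approximate entropy unchanged in this sketch. The pairwise comparison is quadratic in the epoch length, so a 1,024-point epoch is slow in pure Python; a vectorized implementation would be preferable for real recordings.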

### Statistical Analysis

The following distinct periods were defined for comparison: baseline (from start of measurement to start of bolus) and maximum drug effect, *i.e.*, the time when the maximum electroencephalographic effect was observed (from 1 to 3 min after bolus).

Three ratios were defined:

1. Baseline variability without artifact rejection was calculated as the difference between the median baseline value and the median maximum effect (signal) divided by the SD of the baseline values (noise).

2. Baseline variability with artifact rejection was calculated as the difference between the median baseline value and the median maximum effect (signal) divided by the SD of the baseline values after discarding the upper and lower 10% of the baseline values for each electroencephalographic parameter (noise). We discarded all electroencephalographic parameter values calculated from 8-s epochs above the 90th percentile and below the 10th percentile of the electroencephalographic parameter values calculated from 8-s epochs during baseline. For example, if baseline consists of 100 calculated 8-s-epoch SEF95 values, we discarded the highest 10 SEF95 values and the lowest 10 SEF95 values, and we discarded the highest 10 approximate entropy values and the lowest 10 approximate entropy values, and so forth. This should eliminate most of the artifacts occurring during baseline. Therefore, this ratio is meant to be the signal-to-noise ratio for the electroencephalographic values after excluding most of the artifacts.

Both signal-to-noise ratios were calculated for both study sessions of each volunteer and each electroencephalographic parameter and in a separate second step only for the first study sessions of each volunteer to correct for intraindividual variations.

3. Baseline stability within and between individuals was calculated as the difference between median individual baseline value and median individual maximum effect averaged over all volunteers divided by the SD of all individual median baseline values. This ratio measures the consistency of the absolute baseline values between different study days and different individuals.

This signal-to-noise ratio was calculated, including the data of both study sessions of each volunteer and each electroencephalographic parameter, and in a separate second step, only including the data of the first study sessions of each volunteer to correct for intraindividual variations.
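The three ratios defined above can be sketched as follows. The function names, the use of the absolute difference (so that parameters that rise and parameters that fall with drug effect are comparable), the sample SD, and the example values are assumptions made for illustration; they are not taken from the study data.

```python
import statistics


def baseline_snr(baseline, effect, trim_percent=0):
    """|median baseline - median effect| / SD of the baseline values.

    trim_percent=10 drops the lowest and highest 10% of baseline values
    before the SD is computed (the percentile-based artifact rejection).
    The absolute value and the sample SD (n - 1) are assumptions.
    """
    signal = abs(statistics.median(baseline) - statistics.median(effect))
    ordered = sorted(baseline)
    k = len(ordered) * trim_percent // 100
    kept = ordered[k:len(ordered) - k]
    return signal / statistics.stdev(kept)


def baseline_stability(sessions):
    """Mean per-session signal divided by the SD of all median baselines.

    sessions is a list of (baseline_values, effect_values) pairs,
    one pair per recording session.
    """
    medians = [statistics.median(b) for b, _ in sessions]
    signals = [abs(statistics.median(b) - statistics.median(e)) for b, e in sessions]
    return statistics.mean(signals) / statistics.stdev(medians)


# Made-up epoch values with two artifact-like outliers (30 and -10).
baseline = [10, 11, 9, 10, 30, 10, 9, 11, 10, -10]
effect = [4, 5, 6]
print(round(baseline_snr(baseline, effect), 2))                   # 0.53
print(round(baseline_snr(baseline, effect, trim_percent=10), 2))  # 6.61
```

Symmetric trimming leaves the median unchanged, so in this sketch only the noise term differs between the first two ratios.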

We compared the values of the two baseline variability ratios for SEF95, delta ratio, CUP, Shannon entropy, and approximate entropy using the Wilcoxon rank test. Statistical significance was assumed at probability levels of *P* ≤ 0.05.

## Results

### Baseline Variability without Artifact Rejection

Without artifact rejection, electroencephalographic Shannon entropy and the CUP displayed the best signal-to-noise ratios ([average baseline − average effect]/SD baseline). Shannon entropy displayed significantly better signal-to-noise ratios (3.08 ± 0.39; mean of the ratios in the study population ± SEM) than approximate entropy (2.48 ± 0.44), delta ratio (2.09 ± 0.23) (*P* < 0.05), and SEF95 (1.69 ± 0.32) (*P* < 0.01), but not better than CUP (2.61 ± 0.21). In addition, the signal-to-noise ratios for CUP, approximate entropy, and delta ratio were significantly better than for SEF95 (*P* < 0.05).

Correcting for intraindividual variations by only considering the first study session of each volunteer did not yield relevant differences of the calculated signal-to-noise ratios (Shannon entropy, 3.31 ± 0.49; approximate entropy, 2.56 ± 0.55; delta ratio, 2.39 ± 0.35; SEF95, 1.67 ± 0.37; CUP, 2.62 ± 0.29).

### Baseline Variability with Artifact Rejection

Electroencephalographic approximate entropy benefitted most from rejection of the baseline values below the 10th percentile and above the 90th percentile. The signal-to-noise ratios with artifact rejection were 2.79 times the signal-to-noise ratios without artifact rejection for approximate entropy, compared with 1.92 times (Shannon entropy), 1.91 times (CUP), 1.87 times (SEF95), and 1.55 times (delta ratio) for the other electroencephalographic parameters. The signal-to-noise ratios with artifact rejection for approximate entropy (6.92 ± 1.27) (mean of the ratios in the study population ± SEM), Shannon entropy (5.91 ± 0.61), and CUP (4.97 ± 0.37) were significantly better than for delta ratio (3.24 ± 0.39) and SEF95 (3.17 ± 0.62) (*P* < 0.05).
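As a quick arithmetic check, the improvement factors can be approximated by dividing the mean signal-to-noise ratios reported with artifact rejection by those reported without it. The sketch below assumes that the quotient of the reported means is a reasonable stand-in for the published factors; how those factors were actually derived is not stated here.

```python
# Mean signal-to-noise ratios reported in the Results:
# (without artifact rejection, with artifact rejection).
reported = {
    "approximate entropy": (2.48, 6.92),
    "Shannon entropy": (3.08, 5.91),
    "CUP": (2.61, 4.97),
    "SEF95": (1.69, 3.17),
    "delta ratio": (2.09, 3.24),
}

# Quotient of the means for each parameter.
gains = {name: with_rej / without for name, (without, with_rej) in reported.items()}

for name, gain in gains.items():
    print(f"{name}: {gain:.2f}")
```

The quotients agree with the stated factors to within about 0.01 (for example, 6.92/2.48 ≈ 2.79); small differences would be expected if the published factors were averaged per recording rather than derived from the means.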

Correcting for intraindividual variations by only considering the first study session of each volunteer did not yield relevant differences of the calculated signal-to-noise ratios (approximate entropy, 6.85 ± 1.40; Shannon entropy, 6.16 ± 0.74; CUP, 4.82 ± 0.50; delta ratio, 3.61 ± 0.60; SEF95, 3.18 ± 0.77).

### Baseline Stability within and between Individuals

The interindividual and intraindividual median baseline values for approximate entropy vary less than those for the other electroencephalographic parameters (fig. 1). Although the median baseline values for the 16 volunteers on the two study days (*i.e.*, a total of 32 median baseline values) are in a narrow range for approximate entropy, differences between two median baseline values for the other electroencephalographic parameters may be even bigger than the difference between median baseline value and mean maximum electroencephalographic effect after the 2-mg/kg propofol bolus dose.

Approximate entropy displayed an average maximal electroencephalographic effect exceeding interindividual and intraindividual baseline variability by a factor of 5.19. The average maximal electroencephalographic effect for the other investigated parameters was 2.89 (CUP), 2.60 (delta ratio), 2.37 (Shannon entropy), and 2.22 (SEF95) times the interindividual and intraindividual baseline variability throughout all volunteers and study days.

Correcting for intraindividual variations by only considering the first study session of each volunteer did not yield relevant differences of the calculated signal-to-noise ratios (approximate entropy, 7.47; Shannon entropy, 2.98; CUP, 3.66; delta ratio, 3.31; SEF95, 2.71).

## Discussion

In this study, we compared the baseline variability without artifact rejection, the baseline variability after artifact rejection, and the baseline stability within and between individuals for five univariate electroencephalographic parameters. Although we investigated only the pharmacodynamic effect on the electroencephalogram and not hypnosis or anesthetic depth, the results are relevant both to the use of electroencephalographic parameters in pharmacologic research and to the assessment of hypnosis and anesthetic depth.

### Baseline Variability without Artifact Rejection

Shannon entropy and CUP had the best signal-to-noise ratio without artifact rejection. This can be explained by the influence of total power on the value of the respective parameters, which directly translates into resistance against typical electroencephalographic artifacts in the awake state, as will be shown herein. The calculations of delta ratio, SEF95, and approximate entropy take total power into account, which is not the case for Shannon entropy and CUP. The delta ratio is a percentage of the total power. SEF95 is a percentile of the total power. The filter level r, a substantial part of the approximate entropy algorithm, is calculated as a percentage of the SD of the amplitude values [13,14]. The main sources of artifacts during the awake state are eye and lid movements. The amplitude of these artifacts is much larger than the small average amplitude observed during the awake state. Therefore, these artifacts contribute significantly to total power and consequently influence the absolute values of delta ratio, SEF95, and approximate entropy.

In contrast, Shannon entropy and CUP are not normalized to total power and are therefore less influenced by artifacts substantially altering total power. Furthermore, the Shannon entropy algorithm weights infrequently occurring amplitude values very slightly. Therefore, outliers do not greatly contribute to the Shannon entropy value even when substantially altering total power. The robustness of the CUP against artifacts is due to splitting the power spectrum into frequency bins before determining the parameter values [8]. A slow-frequency artifact, such as eye movement, will change the value in the bin corresponding to this frequency, leaving the other frequency bins untouched. Based on this assessment alone, Shannon entropy and CUP seem to be the preferable univariate descriptors of the electroencephalogram.
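The dependence of delta ratio and SEF95 on total power can be illustrated with a toy power spectrum. The sketch below is an assumption-laden illustration, not the study's signal processing: the function names, the 0.5 to 30 Hz frequency grid, the flat spectrum, and the size of the simulated slow artifact are all made up.

```python
def delta_ratio(freqs, power, band=(0.5, 4.0)):
    """Percent of total power that lies in the delta band."""
    total = sum(power)
    delta = sum(p for f, p in zip(freqs, power) if band[0] <= f <= band[1])
    return 100.0 * delta / total


def sef95(freqs, power):
    """Lowest frequency at or below which 95% of total power is found."""
    threshold = 0.95 * sum(power)
    running = 0.0
    for f, p in zip(freqs, power):
        running += p
        if running >= threshold:
            return f
    return freqs[-1]


# Hypothetical flat spectrum from 0.5 to 30 Hz in 0.5-Hz steps.
freqs = [0.5 * k for k in range(1, 61)]
clean = [1.0] * len(freqs)

# Same spectrum plus a large slow artifact (e.g., eye movement) at 1 Hz.
contaminated = list(clean)
contaminated[freqs.index(1.0)] += 60.0

print(delta_ratio(freqs, clean), sef95(freqs, clean))
print(delta_ratio(freqs, contaminated), sef95(freqs, contaminated))
```

In this made-up case, the artifact raises the delta ratio from about 13% to about 57% and lowers SEF95 from 28.5 to 27.0 Hz, because both parameters are defined relative to total power. A parameter built from separate frequency bins would change only in the bin that contains the artifact.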

### Baseline Variability after Artifact Rejection

Approximate entropy, Shannon entropy, and CUP had the best signal-to-noise ratio after artifact rejection. Our approach to the problem might be questioned because of apparent arbitrariness and not adhering to standard procedures. There are three different approaches to artifact detection and rejection in electroencephalographic signals:

1. The raw electroencephalographic data is visually inspected by a blinded, experienced neurophysiologist before the analysis process.

2. The algorithms include simple threshold values of atypical parameters (*e.g.*, amplitude artifacts, slope detection, testing for normal distribution).

3. The algorithms rely on a comparison with the electroencephalographic parameter values of surrounding epochs.

The first approach might not solve the problem in a reliable and reproducible manner. Van de Velde *et al.* [15] found only a 76% mean consensus between human observers marking electroencephalographic artifacts, with a consensus down to less than 60% in some patients. The second approach is based only on the properties of a single electroencephalographic epoch and succeeds only for a few, mostly very clear, artifacts. No artifact rejection algorithm detecting all artifacts in an electroencephalographic signal has been published [16]. In addition, some artifacts, such as slow body movements, can mimic low-frequency–dominated electroencephalography as it occurs in the presence of anesthetic drug effect. Consequently, these artifacts cannot be detected by common artifact detection algorithms, such as testing for normal distribution [16]. Therefore, the third approach is increasingly used during electroencephalographic monitoring [17].

Although using the third approach on-line is mathematically more sophisticated (knowing only the parameter values of the previous epochs and not of the subsequent epochs necessitates a smoothing and a time-dependent adaptation), using the third approach off-line (knowing the parameter values of the previous and of the subsequent epochs) is quite simple and is exactly what we all intuitively do while screening data series visually for outliers.
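An off-line comparison with surrounding epochs can be sketched as below. This is a hypothetical illustration of the general idea only: the study itself discarded a fixed upper and lower 10% of baseline values. The function name, the window size, the median-absolute-deviation rule, and the thresholds are all assumptions.

```python
import statistics


def flag_outlier_epochs(values, half_window=4, n_mads=5.0, min_spread=0.0):
    """Return a keep (True) / reject (False) mask for epoch-wise values.

    Each value is compared with the median of its neighbors (up to
    half_window epochs on either side, the epoch itself excluded). It is
    rejected when it deviates from that median by more than n_mads times
    the neighbors' median absolute deviation. min_spread is a floor for
    the deviation scale, in the units of the parameter, so that a locally
    flat series does not reject every small fluctuation.
    Requires at least two values. All thresholds are assumptions.
    """
    keep = []
    for i, value in enumerate(values):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        neighbors = values[lo:i] + values[i + 1:hi]
        center = statistics.median(neighbors)
        spread = statistics.median([abs(v - center) for v in neighbors])
        limit = n_mads * max(spread, min_spread)
        keep.append(abs(value - center) <= limit)
    return keep


# Made-up parameter values for nine consecutive epochs; the fifth is an artifact.
epochs = [10, 11, 9, 10, 50, 10, 11, 9, 10]
mask = flag_outlier_epochs(epochs, min_spread=1.0)
print(mask)
```

Unlike a fixed-percentage rule, a rule of this kind discards a different number of epochs for each parameter and each recording, which would complicate a comparison between parameters.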

We admit that choosing to discard the upper and lower 10% of the parameter values may seem somewhat arbitrary, but choosing a fixed percentage guaranteed that for each different electroencephalographic parameter, the same number of epochs in one patient was discarded. The overall 20% discarded epochs during baseline are in good accordance with the published average range of 7–30% of electroencephalographic epochs contaminated with artifacts [15,17].

Surprisingly, SEF95 and delta ratio continued to have a significantly worse signal-to-noise ratio than Shannon entropy and CUP. In contrast, approximate entropy benefitted most and now had the best signal-to-noise ratio. As shown in figure 2, the baseline of the approximate entropy parameter is disturbed by infrequent but pronounced outliers, which can easily be eliminated by an artifact rejection algorithm. Based on this assessment alone, approximate entropy, Shannon entropy, and CUP seem to be the preferable univariate descriptors of the electroencephalogram.

### Baseline Stability within and between Individuals

Minimal interindividual and intraindividual variation of the mean baseline value of an electroencephalographic parameter is essential for a clinically applicable parameter of CNS suppression. Otherwise, any prediction of the state of arousal based on a “standard” value (*e.g.*, a Bispectral Index between 40 and 60, an SEF95 between 8 and 12 Hz, as suggested for clinically adequate anesthesia) independent of the observed individual is of questionable value.

The values of Shannon entropy and CUP are not normalized to total power, which changes with skin impedance while frequency distribution remains unchanged. Therefore, interindividual differences or intraindividual changes of skin impedance might influence Shannon entropy and CUP, while approximate entropy, SEF95, and delta ratio remain unchanged. From this, it immediately follows that Shannon entropy and CUP must display low interindividual and intraindividual baseline consistency, as displayed in figure 1. Although SEF95 and delta ratio are normalized with regard to power, their interindividual and intraindividual baseline consistency was similarly low. From the parameters investigated, approximate entropy displayed the highest baseline stability and therefore seems to be the clinically most useful indicator of anesthetic depth. However, the filter level r of the approximate entropy algorithm has to be set as a percentage of the SD of the amplitude values, as recommended previously [13,14]. Fixing the filter level r, as Sleigh and Donovan [18] did, inherently leads to higher interindividual baseline variation and therefore to a weaker power to predict awake *versus* asleep.

For technical reasons, we did not assess the Bispectral Index in this study. Despite the widespread adoption of the Bispectral Index for clinical determination of anesthetic adequacy, it has been shown that other electroencephalographic parameters can be more sensitive as measures of drug effect for pharmacodynamic modeling [7]. We cannot guess *a priori* how stable the Bispectral Index is by our measures.

Two possible limitations must be considered:

1. Our calculations have been normalized to the maximal electroencephalographic effect after a bolus dose of 2 mg/kg propofol. If a given parameter is more sensitive to propofol effect, normalizing to this effect will make the parameter seem better in relation to others. This may not be true for other drugs or other dosage levels, but our approach is near to clinical practice: propofol is one of the most commonly used drugs for induction and maintenance of anesthesia, and 2 mg/kg propofol is a clinical standard bolus dose for induction of anesthesia.

2. Intraindividual baseline variability over time may be a biologic phenomenon, depending on vigilance, habituation, distraction, and so forth. We experienced that phenomenon in other settings, especially with constant low doses of hypnotic drugs. However, the variability (*e.g.*, caused by changing vigilance) was a slower fluctuant change, consistently seen at all different electroencephalographic parameters, and was not like the seemingly arbitrary jumps between 8-s epochs with big differences, induced by artifacts, between the different electroencephalographic parameters as observed in the current investigation.

The electroencephalographic parameter most suitable for a certain application depends on the requirements at hand. If intraindividual and interindividual baseline stability is not required, as in the experimental setting, and artifact rejection is not available, Shannon entropy and CUP are most appropriate. An example of this application is the determination of the potency of CNS active drugs, in which the baseline measurement in each patient serves as a control and change from baseline is more important than the absolute value. If artifact rejection can be introduced before further processing of the electroencephalographic signal, approximate entropy becomes equally suitable. Approximate entropy showed the most stable interindividual and intraindividual baseline value, making this parameter ideal for clinical applications where therapeutic decisions (dosing of anesthetic drugs) are based on absolute values rather than on change from baseline.