A recent U.S. Food and Drug Administration warning advised that prolonged or repeated exposure to general anesthetics may affect neurodevelopment in children. This warning is based on a wealth of preclinical animal studies and relatively few human studies. The human studies include a variety of different populations with several different outcome measures. Interpreting the results requires consideration of the outcome used, the power of the study, the length of exposure and the efforts to reduce the confounding effects of comorbidity and surgery. Most, but not all, of the large population-based studies find evidence for associations between surgery in early childhood and slightly worse subsequent academic achievement or increased risk for later diagnosis of a behavioral disability. In several studies, the amount of added risk is very small; however, there is some evidence for a greater association with multiple exposures. These results may be consistent with the preclinical data, but the possibility of confounding means the positive associations can only be regarded as weak evidence for causation. Finally, there is strong evidence that brief exposure is not associated with any long-term risk in humans.
The US Food and Drug Administration (FDA) recently issued a change in labeling regarding the safe use of anesthetic and sedative agents (http://www.fda.gov/Drugs/DrugSafety/ucm532356.htm; accessed November 10, 2017).
The opening sentence states that “repeated or lengthy use of general anesthetic and sedation drugs during surgeries or procedures in children younger than 3 years or in pregnant women during their third trimester may affect the development of children’s brains.” This warning then suggests that brief exposure is probably safe and continues to a summary of available preclinical and human evidence. It also encourages healthcare professionals to consider the risks of delaying surgery. The issue of how, or even if, anesthetic agents affect the developing brain has been the subject of a great deal of research over the last couple of decades. There is now overwhelming preclinical evidence that most anesthetic agents can, in some situations, modulate various aspects of brain development, but there is great uncertainty over how these findings in animal models translate to clinically relevant human scenarios.1–3 Evaluating the strengths and limitations of human evidence is essential when determining potential change in practice. Several important human studies have been published recently, and with the recent FDA warning, it is an opportune time for this review which aims to critically appraise the human evidence. This review will focus on studies that provide the strongest evidence, and will provide a summary of the current human evidence.
What Are We Looking for in Human Studies?
The question of how anesthesia exposure in childhood influences neurodevelopmental outcome arose after findings made in the laboratory. Before these findings, a link between major surgery in the neonate and increased risk of neurodevelopmental problems was suspected; however, anesthesia toxicity was not considered as the cause for this association.4–6 No link between surgery and neurodevelopmental outcome had been noticed outside the neonatal period; there was no obvious clinical problem. However, as the preclinical data have grown, there has been an increasingly pressing need to determine if anesthesia exposure does indeed cause clinically relevant changes in neurodevelopment of children. Answering this question is not straightforward. There are several facets or domains of neurodevelopmental function that could be affected. The effect may be dependent on time or duration of anesthesia administration, or the effect may be only apparent in subpopulations. In the absence of an obvious clinical problem when designing human studies, the preclinical data should drive not only the choice of population to be studied in terms of age at exposure and duration of exposure, but also the choice of domain of neurodevelopmental function to be assessed. Unfortunately, translating preclinical data to humans is an inherently imprecise science. Also, even if the preclinical data could be easily translated, they show a range of effects over a range of ages and durations of exposure.
A recent review of 440 preclinical studies found a greater proportion of studies reporting abnormalities with progressively longer exposures, but no clear exposure duration threshold below which no abnormalities were seen.2 Many of the early preclinical studies suggested that only newborn animals were affected; however, the same review of the entire range of literature found that while the vast majority of preclinical studies were in very young animals (equivalent to the fetus or neonate), there was no clear evidence that abnormalities were absent in older animals.2 The neurodevelopmental functional domain that is likely to be affected is also uncertain. The diffuse nature of the injury seen in preclinical studies would suggest that domains such as higher executive function are most likely affected in humans. Animal studies have demonstrated functional defects in memory and learning, while recent non-human primate studies have shown functional defects in behavior. Thus, in terms of population and neurodevelopmental outcome, it is difficult to know where to look in human studies.
Given the uncertainty of translating preclinical data, it is impossible to design a single definitive human study that could confirm, or rule out, an effect of anesthesia on neurodevelopment, and it is appropriate for a range of studies to examine a range of outcomes, anesthetic durations and ages at exposure. In this review, we will group the human studies in terms of outcomes assessed. When considering the evidence provided by each study, it is important to consider the size or power of the study, and the likelihood of confounding factors.
The clinical studies can be broadly divided into three groups in terms of outcomes: group or population administered academic performance or school readiness tests; a diagnosis of a neurodevelopmental or behavioral disorder or learning disability; and abnormalities in neurocognitive function or behavior based on validated neuropsychologic or behavior assessment tools.7 One study has also looked at structural changes on magnetic resonance imaging.8 It is important to consider all domains, as brain insult early in life may result in a deficit in one of these outcome groups without a deficit in the others. For example, an event may increase the risk of behavioral disorder without having an impact on cognition, and vice versa.
Academic Performance and School Readiness as Outcome Measures
Measures of academic performance or school grades are administered in a standard way in large populations of children, and school readiness tests are administered to preschool children to determine if they are ready for school. Neurodevelopmental outcome data for very large numbers of children can be relatively easily extracted from existing databases. School grades are important outcomes from the child’s perspective but they are imperfect ways to assess possible brain injury and may be insensitive to defects in some domains of neurodevelopment. Many other factors can affect academic performance. School grades are also only available for those children that have sufficient capacity to be in the school system. Table 1 summarizes the studies which have used academic performance or school readiness as an outcome measure. School grades are important, and easy to obtain in large numbers, but imperfect measures of neurodevelopment.
Using the Young Netherlands Twin Registry, Bartels et al. examined two different outcomes: education achievement (using standardized achievement scores) and cognitive problems (identified using the Conners’ Teacher Short Form).9 Twin studies are particularly useful as twins are very similar in terms of genetics and environment. Parents were questioned with regard to whether their children had received anesthesia both before the age of 3 yr and by the age of 12 yr. The twins in the registry were then divided into three groups: twins who were concordant for exposure to anesthesia at age 3 or by age 12, twins who were concordant for no anesthesia exposure before age 12, and twins who were discordant for anesthesia exposure before age 3 or by age 12. The analysis detected no differences in outcomes between twins who were discordant for exposure; however, the small number of study participants limits the power of this study. Interestingly, the concordant, unexposed twins had more favorable outcomes than both of the other two groups. The authors suggested that this may indicate that the need for anesthesia is a marker of a genetically mediated vulnerability, and that exposure was not, per se, the cause of any adverse neurodevelopmental outcome.
Hansen et al. conducted a birth cohort study using the Danish National Patient Register (1986 to 1990) in infants (under 1 yr) exposed to anesthesia during inguinal hernia surgery,10 or in those under 3 months of age who were exposed to anesthesia during pyloromyotomy11 and compared them to a random age-matched 5% sample of the entire population using ninth grade test scores and teachers’ scores as the primary outcome. Their results found no evidence for differences between anesthetized and control children in primary outcomes, but children exposed to anesthesia were more likely to be categorized by “nonattainment of score.” Nonattainment is defined as the child being unable to sit for the test, for whatever reason. This includes children that are unable to sit for tests due to significant neurodevelopmental or behavioral problems. In another population-based Danish study, Clausen et al. compared academic achievement in 509 adolescents with cleft lip and/or palate that had received anesthesia compared to a 5% sample of the population (14,677 adolescents).12 The analysis was adjusted for sex, birth weight, parental age, and parental level of education. Compared to controls, there was evidence that children with cleft palate alone had lower test scores, while there was no evidence of differences compared to controls for either cleft lip alone or combined cleft lip and palate. The cleft palate alone group was also more likely to have nonattainment of scores.
A similar but smaller study from the University of Iowa13 examined infants (under 1 yr) who had one of three surgeries: inguinal hernia, with or without orchiopexy; pyloromyotomy; and circumcision. The outcome measure used was the composite score on the Iowa Tests of Basic Skills and Educational Development (IOWA test scores). The authors examined both the mean scores and the proportion of the group that had scores in the lowest fifth percentile of the population norm. From the initial total of 519 subjects whose relevant medical records were retrievable, 287 had available IOWA test scores (group 1). Of these, 133 agreed to have their medical records examined (group 2). The authors then applied a list of 18 prespecified central nervous system–related problems or risk factors, and 58 of the 133 were found to have no risk factors (group 3). In both groups 1 and 2, the mean scores were significantly lower than the population mean, but not in group 3. However, in all three groups, a disproportionately large percentage of the group scored in the lowest fifth percentile for the IOWA tests (12% in group 1, 11% in group 2, and 14% in group 3). The authors concluded that anesthesia and surgery during infancy were associated with over-representation of very low IOWA test scores. The Iowa study is consistent with the Danish studies, where mean scores were similar between those with and without exposure to surgery/anesthesia; however, there was a slightly increased risk of very poor performance scores or nonattainment of the standardized tests.
Williams et al. examined academic performances of individuals who received spinal anesthesia as infants in Vermont for the same three surgical procedures that the Iowa study evaluated, and found no evidence of any differences between infants who had surgery/spinal anesthesia and the general population.14 Bong et al.15 matched 100 healthy children in Singapore who had minor surgery (inguinal hernias, circumcisions, cystoscopies, and pyloromyotomies) under general anesthesia before age one with 106 age-matched children without anesthesia or sedation exposure. Two different outcome measures were measured at age 12 yr: academic performance using the standardized aggregate Primary School Leaving Examination scores; and diagnosis of learning disability. They found no evidence for a difference in examination scores but children having surgery had a greater risk for having a formal diagnosis of a learning disability.
The Bartels, Block (the Iowa study), Williams, and Bong studies all had relatively small samples. Small sample sizes will inevitably provide imprecise results with wide 95% CIs that may overlap with a clinically relevant added risk. Thus, when small studies find no evidence for an association or added risk, it should be noted that they may not be sufficiently large to rule out a clinically relevant risk.
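The effect of sample size on precision can be sketched numerically. The following is a minimal illustration, not an analysis from any of the studies: it assumes two equal-sized groups, an outcome with a standard deviation of 15 (as for an IQ-type scale), and a normal approximation. The function name and the sample sizes are hypothetical.

```python
import math

def ci_half_width(sd, n_per_group, z=1.96):
    """Approximate 95% CI half-width for a difference between two group means.

    Assumes equal group sizes, a common standard deviation, and a normal
    approximation (illustrative assumptions, not taken from any study).
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return z * standard_error

# Hypothetical outcome scaled like an IQ score (SD = 15).
# The sample sizes below are illustrative only.
for n in (50, 100, 1000, 30000):
    half_width = ci_half_width(15, n)
    print(f"n per group = {n:>6}: difference +/- {half_width:.2f} points")
```

Under these assumptions the half-width shrinks in proportion to the square root of the sample size, so groups of about 100 children leave an interval several points wide, whereas groups in the tens of thousands narrow it to a fraction of a point.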
A recent large Swedish study compared 33,514 children that had received one anesthetic before age four with 159,619 matched controls.16 They also included a subgroup of 3,640 children who had been exposed to multiple anesthetics. The primary outcome was school grades at age 16 yr. In a subgroup, they also compared the IQ in boys that were tested as part of their national military service. The analysis was adjusted for sex, month of birth, gestational age, Apgar score, parental education, household income, cohabiting parents, and number of siblings. They found strong evidence of a small difference. One anesthetic was associated with a 0.41% (95% CI, 0.12 to 0.70%) lower score in school grades. In those with IQ scores, one anesthetic was associated with a 0.97% (95% CI, 0.15 to 1.78%) lower IQ score. The impact was greatest with ear, nose, and throat surgery, and interestingly, the impact was greater in children exposed at an older age. There was also some evidence that the impact was greater with multiple anesthetic exposures, as compared to single anesthetic exposure. Exposure to two anesthetics was associated with a 1.41% (95% CI, 0.50 to 2.31%) lower score in school grades and having three or more was associated with a 1.82% (95% CI, 0.15 to 3.49%) lower score. While this study did find evidence of an association, it is important to note that the added risk was very small: a difference of 1% or less after a single anesthetic, and less than 2% after multiple anesthetics. This is compared to a difference of 10% in school grades that is associated with sex or maternal education.
Two very similar Canadian studies examined the association between surgery in early childhood and the Early Development Instrument (EDI). The EDI is a test of readiness for school and is administered at around 5 yr of age. It has five domains (physical health and well-being, social knowledge and competence, emotional health and maturity, language and cognitive development, and communication skills and general knowledge).
The first study, from Ontario, matched 28,366 children that had surgery before the EDI, with 55,910 controls.17 They excluded children with physical disability, health-related causes of impaired development, and any diagnosis of a behavioral or learning problem. Children were matched on gestational age, maternal age, rurality, sex, and year and quartile of birth. The primary outcome was any EDI domain score below the tenth percentile. The analysis was adjusted for aboriginal status, age and household income. They found weak evidence for a small difference in the percentage of children with one or more domain scores below the tenth percentile. There were 25.6% with such low scores in the surgical group compared with 25.0% in the controls. The difference was largest in the physical health and well-being, and social knowledge and competence domains. In sub-analyses, there was evidence for a difference in scores in children aged 2 to 4 yr at time of surgery, but insufficient power to conclude whether or not there was any difference in scores in the 0 to 2 yr old group. They found no evidence that number of surgeries had an impact. The other Canadian study from Manitoba compared 4,470 children that had surgery before the age of four with 13,586 matched controls, excluding children with any diagnosed developmental disability.18 They matched children according to gestational age, maternal age, rurality, income quartile, sex, and year of birth; their analysis adjusted for welfare status, being small for age at birth, maternal age, child’s age, and Johns Hopkins Resource Utilization Band. They found strong evidence of a very small difference, with surgical children doing slightly worse. The effect was greatest in communication skills and general knowledge, and language and cognitive development domains. Of note, the developmental areas affected were different compared to the other Canadian study.
The Manitoba study also found no evidence of a difference between single and multiple exposure, but did find strong evidence of an interaction between age of exposure and outcome, with the risk greater in older children. Studies using databases with large numbers of patients, such as the Canadian studies and the Swedish study, can detect differences that are very small in magnitude, but are statistically significant. The clinical significance of the very small added risk reported in these studies remains uncertain.
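To illustrate how small a difference such sample sizes can resolve, the sketch below computes an unadjusted risk difference and Wald 95% CI from the Ontario figures quoted above (25.6% of 28,366 surgical children versus 25.0% of 55,910 controls). This is a back-of-envelope approximation only: it ignores the matching and covariate adjustment used in the published analysis, so it should not be read as reproducing that study's result. The function name is hypothetical.

```python
import math

def risk_difference_ci(p1, n1, p2, n2, z=1.96):
    """Unadjusted Wald 95% CI for the difference between two proportions.

    Treats the two groups as independent samples, which is a simplifying
    assumption (the actual study used matching and adjustment).
    """
    diff = p1 - p2
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * standard_error, diff + z * standard_error

# Proportions and group sizes as quoted in the text for the Ontario study.
diff, lower, upper = risk_difference_ci(0.256, 28366, 0.250, 55910)
print(f"difference = {diff:.4f}, 95% CI {lower:.4f} to {upper:.4f}")
```

With groups of this size the interval extends only about 0.6 percentage points either side of the estimate, which is why differences of well under one percentage point can approach or reach statistical significance.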
In summary, most, but not all, large population-based studies looking at school performance have found evidence of small differences in performance in children that had surgery in early childhood. The increased risk is considerably smaller than other factors that have an impact on performance. The studies do not indicate that exposure at 0 to 2 yr is worse than 2 to 4 yr (indeed, some show the opposite). They are mostly insufficiently powered to determine if multiple exposures pose any greater risk than single exposures. Interestingly, some studies have found that surgery increased the risk of nonattainment without a difference in mean scores. This may be consistent with exposure being associated with a small increase in risk for more severe learning disabilities. This is addressed further in the studies in the next section.
As mentioned above, school assessment may be insensitive. Using the Raine birth cohort, Ing et al. examined academic performance, clinical diagnoses, and direct neuropsychologic testing results.19 They demonstrated that academic performance measures using standardized test scores were the least sensitive, and those from direct assessment using validated neuropsychologic instruments were most sensitive in detecting a difference between exposure to surgery and no exposure.
Clinical Outcomes or Diagnoses as Outcome Measures
Clinical outcomes and diagnoses of particular behavioral or learning disabilities are also important outcomes for the child. Like school grades, they too can be accessed using data linkage in large population-based studies. Table 2 summarizes the studies that used these outcome measures. Attention Deficit Hyperactivity Disorder (ADHD) and autism both have genetic risk factors; however, a variety of early life insults have also been found to be associated with an increased risk of these disorders.20 It is therefore biologically plausible to explore an association with surgery and anesthesia. It is important to note the limitation of these studies. Definitions of behavioral disorders and learning disability change over time and are inconsistently applied across populations. This makes comparisons between studies difficult and limits the generalization of results.
A learning disability in reading, verbal language, or mathematics, or a clinical outcome/diagnosis were outcome measures used in several studies that examined the effects of early childhood exposure using the Olmsted County birth cohort from 1976 to 1982.21 The first study had a total of 539 children who underwent a total of 875 procedures that required general anesthesia before the age of four.21 Anesthetics used were predominantly halothane and nitrous oxide (88% received halothane, 91% received nitrous oxide), and ketamine was used in most of the remaining cases (9% received ketamine). The authors reported an increased risk for learning disability associated with multiple, but not single, anesthetic exposure. To further determine whether the frequency of exposure may be a consequence of the health status of the child, the authors used the same cohort in a second study but used a matched cohort design in the subgroup that had general anesthesia before age two.22 They used two methods to adjust for health status: the American Society of Anesthesiologists Physical Status assignment and the Johns Hopkins Adjusted Clinical Groups case-mix system. Their results were the same as in the original study, showing that multiple, but not single, anesthetic exposure was associated with increased risk of learning disability. In addition, there was also a demonstrable increased risk of needing language Individualized Education Programs. The same authors also used the Olmsted County birth cohort from 1976 to 1982 in another study, but the clinical condition of ADHD was the outcome measure.23 They found multiple, but not single, anesthetic exposures before age two increased the risk for ADHD. The same group recently performed another study looking at children from Olmsted County exposed to surgery between 1996 and 2000 (when they would have had more modern anesthetic care).
In this study they also found that multiple, but not single, exposures increased the risk of a diagnosis of a learning disability and ADHD as compared to children that had no exposure to surgery and anesthesia. Multiple and single exposures were also associated with decreases in some aspects of academic achievement.24 The Olmsted County cohort studies have consistently found that multiple, but not single, exposures to anesthesia in early childhood increased the risk of some learning disabilities and subsequent diagnoses of ADHD.
Using the International Classification of Diseases, Ninth Revision (ICD-9) coded diagnoses for developmental delay, mental retardation, autism spectrum disorders, speech/language problems, and behavior problems as the outcome variable, DiMaggio et al. created a birth cohort from all children enrolled in the New York State Medicaid program from 1999 to 2001. The ICD-9 code for inguinal hernia surgery before age 36 months was used as the exposure variable.25 For comparison, the authors used a group derived from random sampling, matched for age that did not have the procedure code for inguinal hernia surgery. In their analyses, they adjusted for low birth weight, perinatal hypoxia, perinatal infections, and central nervous system anomalies. Their conclusion was that inguinal hernia surgery and anesthesia were associated with increased risk of developmental and behavioral disorders. A second study by DiMaggio et al. used a birth cohort of twins from the New York State Medicaid dataset (1999 to 2005).26 Any child was considered to have been exposed to anesthesia if there was an ICD-9 procedure code for any type of surgery before 36 months of age, and was included in the analysis if there was no history of developmental disorder at the time of surgery. ICD-9 codes for developmental delay, mental retardation, autism spectrum disorders, speech/language problems, and behavior problems were used as the outcome variable. The analysis was adjusted for low birth weight, perinatal hypoxia, perinatal infections, and neurologic anomalies. The study found that anesthesia exposure and surgery before age three were associated with an increased risk of subsequent diagnosis of developmental or behavior disorders.
A Taiwanese study by Ko et al. examined a birth cohort of 114,435 children, amongst whom 5,197 received general anesthesia before 2 yr of age.27 These were matched with 20,788 unexposed children. They found no evidence of increased risk of autism in the exposed group, after single or multiple exposures. In another manuscript, Ko et al. also examined the association with ADHD.28 In a cohort of 16,465 children, 3,293 children exposed to anesthesia before the age of three were matched to unexposed children. Unlike the Olmsted County studies, the Taiwanese study found no evidence for increased incidence of ADHD in the exposed group, after multiple or single exposures.
In summary, several, but not all, studies have found evidence for an association between surgery and anesthesia in early life and increased risk of behavioral disorder or learning disability diagnoses. The association is greater with multiple exposures.
Neuropsychologic Testing Results as Outcome Measures
Table 3 summarizes the studies that have used neuropsychologic tests as outcomes. There are many different neuropsychologic tests available, and they are divided into domains: intelligence, language, learning and memory, visual-spatial skills, attention and executive functions, and motor and psychomotor abilities.7 Broadly speaking, there are apical tests, such as the intelligence quotient (IQ), which measure and collate function over a range of domains, and other tests that focus on particular subdomains or functions. Apical tests have better psychometrics and can more accurately predict a child’s overall future functioning. Tests focusing on particular subdomains may indicate a deficit that is not detectable with apical tests; however, the relevance of that deficit on future function is less certain. Usually, a battery of tests is applied. These will be a mix of apical tests that have clearer implications for the child’s future, and other tests that focus on particular sub-domains of interest. The choice of domains that are examined in more detail should be driven by the findings from preclinical studies. Preclinical studies have suggested that cognition, learning, memory, and executive function should be of particular interest.
Testing at an older age is generally preferable as tests in older children have greater capacity to predict future function. Some domains, such as executive function and some aspects of memory, are not fully developed until the child is older. Some children will also “grow into” a defect; wherein the defect is only apparent as the period between exposure and test lengthens. In other cases, a child may recover over time and the defect becomes undetectable. Using neuropsychologic tests as outcomes is logistically more difficult and considerably more expensive than using data linkage to identify school grades or diagnoses.
Several studies have made use of data already collected in various birth cohorts. The Western Australian Pregnancy Cohort (Raine) study enrolled 2,900 pregnant women in their early pregnancy.29 The Raine birth cohort includes 2,868 children born to these mothers. The health and other related data in these children have been reviewed in detail at ages 1, 2, 3, 5, 8, 10, 14, 17, 18, 20, and 23 yr. Direct neuropsychologic testing and parental interviews occurred at these different ages. The largest number of tests were performed at age 10, and included testing for cognitive function (using Symbol Digit Modalities Test and Raven’s Colored Progressive Matrices), language (using Clinical Evaluation of Language Fundamentals and Peabody Picture Vocabulary Test), and motor function (using McCarron Assessment of Neuromuscular Development), as well as reports of behavior (using Child Behavior Checklist). A total of 2,608 children, age 10, were included in the analysis. Ing et al. divided the children into an exposed cohort, those who had received surgery/anesthesia before age 3 yr (n = 321), and an unexposed cohort, those who had not (n = 2,287);29 the analysis was adjusted for family income, maternal education, and birth weight. The exposed cohort had significantly lower scores in language (receptive, expressive, and total), as well as in abstract reasoning, but not in any other neuropsychologic domains or in behavior. This was found with either single or multiple episodes of anesthesia exposure. In another analysis of this cohort, Ing et al. examined outcomes in children exposed between the ages of 3 and 5 yr, and 5 and 8 yr, as compared to unexposed children.30 In both of these relatively older age groups, there was no evidence of worse cognitive and language outcomes in the exposed groups; however, both exposed age groups did have decreased motor function.
De Heer et al.31 examined data from the Dutch birth cohort, “Generation R.” The cohort consisted of 9,901 children born between 2002 and 2006. The outcome of interest was nonverbal IQ measured at age six. IQ data was available for 3,441 children. Of these, 415 had been exposed to anesthesia before the age of five. The authors reported an association between anesthesia and IQ, after adjusting for sex, prematurity, maternal education, IQ, smoking history, and alcohol use.
Stratmann et al.32 studied 28 children aged 6 to 11 yr who had general anesthesia exposure before age one. They compared recollection and familiarity memory, IQ scores, and behavior with 28 age- and sex-matched control children. They found impaired recollection memory, but otherwise no differences were found between groups.32 Backeljauw et al., from Cincinnati Children’s Hospital, used a cohort of healthy participants, ages 5 to 18, in a language development magnetic resonance imaging study that examined the effects of anesthesia and surgery before age four.8 The comparison group was matched for age, sex, handedness and family income. The exposed group (n = 53) scored lower than the control group (n = 53) in performance IQ scores, and in listening comprehension. This was for both single and multiple episodes of anesthesia exposure. Another small study compared 68 children who had surgery before 3 yr of age for glaucoma, with 47 children that had not.33 They also examined a subgroup of children that had received multiple anesthetics. They found no evidence of differences in verbal fluency or digit span tests when the children were tested at 5 to 16 yr of age.
The cognitive outcomes following a single and relatively brief anesthesia exposure were examined in the Pediatric Anesthesia NeuroDevelopment Assessment (PANDA) study in a sibling-matched cohort of healthy children.34 The PANDA study is an ambidirectional cohort study of American Society of Anesthesiologists Physical Status I or II children who had a single episode of general anesthesia for inguinal hernia surgery at 36 months or before, as compared with their siblings who received no anesthesia or sedation before age 36 months. Use of siblings as controls is a powerful technique to reduce many of the confounding influences. The median duration of exposure was 80 min. The assessment was performed in both siblings using a comprehensive neuropsychologic battery between 8 to 15 yr of age. One-hundred-five sibling pairs were recruited (mean age: 17.3 months at surgery/anesthesia); with 95 males/10 females in the surgical group and 59 males/46 females in the unexposed siblings group. Full scale IQ was the primary outcome. The mean IQ was similar in both groups: 111 in the exposed group and 111 in the unexposed group, with a difference between groups (exposed – unexposed) of only 0.2 points (95% CI, −2.6 to 2.9). There was also no evidence for any significant difference in IQ sub-domains with a difference of 0.5 (95% CI, −2.7 to 3.7) in performance IQ and −0.5 (95% CI, −3.2 to 2.2) in the verbal IQ. There was no evidence of any significant difference in the other tests of memory/learning, motor/processing speed, visuospatial function, attention, executive function, and language. In the unadjusted analysis, differences were seen in some aspects of behavior and verbal fluency, but these differences were no longer apparent once adjustments were made for sex. In a secondary analysis, proportions of children scoring below clinically relevant cut offs for behavior were also compared. 
Twenty-one percent of exposed and 10% of unexposed siblings had abnormal Child Behavior Check List internalizing scores (greater than 60), a difference that was statistically significant after adjusting for sex. This result should be interpreted with caution, as it came from a secondary analysis of a secondary outcome. Subanalyses looking at those with longer exposure found no evidence of differences in IQ in those exposed to up to 120 min of anesthesia, and age of exposure had no impact on the outcome.
In the PANDA study, the mean differences in IQ scores were 0.2 to 0.5 points in magnitude, and the 95% CIs around these differences lie within ±4 points. This precision reflects the design of the PANDA study, which was powered to detect differences of the size reported to be significant in developmental neurotoxicology studies. Thus, the study was adequately designed to provide evidence that a single episode of anesthetic exposure is unlikely to have adverse effects on neurodevelopment.
Both single and multiple episodes of anesthesia exposures are being examined in the Mayo Anesthesia Safety in Kids study, which also uses an ambidirectional study design. The study’s population included children who received general anesthesia before age three, either as a single episode of exposure or with multiple episodes of exposure.35 The control group is a propensity age-matched group. Testing of the study subjects occurred at two later ages (age 8 to 12 yr or 15 to 19 yr). The neuropsychologic battery included the Operant Test Battery, a test that was used in an earlier nonhuman primate study. The results of the Mayo Anesthesia Safety in Kids study are still pending.
In summary, there is mixed evidence for an association between anesthesia exposure and deficits in neuropsychologic testing. Some well-powered and carefully conducted studies such as PANDA found no evidence of association with any deficit. Other studies found deficits in IQ, language, abstract reasoning, or some aspects of memory.
Surgery in the Neonatal Period
There have been several cohort studies that specifically examined the impact of surgery in the neonatal period.36,37 In one study, more than half of children with tracheoesophageal fistula repair had neurodevelopmental delays requiring referral to early intervention services.38 Similarly, children who have had congenital diaphragmatic hernia repair had a high rate of poor neurodevelopmental outcomes,39 and extremely premature neonates who underwent laparotomy had poorer neurodevelopmental outcomes compared with matched controls.5 Another study looked at surgery in very preterm infants, and found that when compared to a nonsurgical group, the surgical group had lower mental development index scores, lower brain volumes, smaller deep nuclear gray matter volumes, and more white matter injury.40 However, there was no evidence of differences in mental development scores when adjusted for potential confounding influences. In another matched cohort study, infants who underwent major surgery had poorer school grades compared to a matched control group of infants who had major nonsurgical medical conditions.6 In children weighing less than 1,000 g at birth, neurologic impairment was present in more children who had undergone surgery for patent ductus ligation compared with those that had received medical therapy.41 In another study of extremely preterm infants, the IQ of those who had undergone surgery was lower at 5 yr of age and they exhibited more sensorineural disability than those who had not undergone surgery.4 A recent systematic review that examined 23 studies found developmental delays in 23% of children that had neonatal surgery for non-cardiac conditions.37 These studies used the Bayley Scales of Infant Development as a measure of neurodevelopment and when compared to population normative data, the meta-analysis of these studies found evidence of delays in both motor and cognitive subscales. 
While the majority of these studies show good evidence for significantly increased risk of poorer neurodevelopmental outcomes in neonates that had major surgery, nearly all these studies involve neonates that have many other risk factors for poor neurodevelopment, including presence of syndromes or premature birth.
Cardiac surgery in infants often requires long surgery with substantial exposure to anesthetic agents; however, children having surgery for congenital heart disease are at risk for adverse neurodevelopmental outcomes for many reasons. In neonatal cardiac surgery, Andropoulos et al. found an association between the dose of volatile anesthetic and neurodevelopmental outcome measured at 12 months of age.42 In contrast, Guerra et al. found no evidence of an association between cumulative dose of sedative/analgesia used before, during, or after cardiac surgery in infants younger than 6 months of age, and adverse neurodevelopmental outcome measured at 18 to 24 months of age.43 However, when assessed at 4 yr of age they did find evidence of an association between cumulative dose of chloral hydrate and lower performance IQ, and between cumulative dose of benzodiazepines and lower visual motor integration scores.44 In both studies it should be noted that children who required longer surgery or longer sedation were more likely to be the children that were sicker and had more major surgeries. Cohort studies cannot determine the impact of anesthesia exposure in children who have cardiac surgery.
Limitations of Cohort Studies: Confounding Factors
All previously reported studies are cohort studies. Cohort studies are inherently limited due to the possibility of confounding factors. Confounding occurs when another factor directly influences both the likelihood of needing anesthesia and neurodevelopmental outcome, resulting in the false assumption that anesthesia causes the neurodevelopmental outcome. It is difficult to eliminate known or probable confounding factors in studies examining possible effects of anesthetics on neurodevelopment. Children receive anesthetics for a procedure or surgery. The procedure or surgery is often being done because the child has a condition that may directly affect neurodevelopment or is associated with conditions that affect neurodevelopment (e.g., prematurity, syndromes, chromosomal abnormalities, or cerebral palsy). Other examples of potential confounding influences are that children with poor hearing are more likely to have a myringotomy to optimize their hearing, or children from low socioeconomic circumstances are more likely to have poor dentition. Similarly, uncooperative children with a yet to be diagnosed behavioral problem may be more likely to need anesthesia for procedures that healthy children may tolerate awake. The surgery itself may also have an impact on neurodevelopment. Major surgery is associated with a significant inflammatory response which may have an impact on the developing brain. There are other perioperative factors such as pain, low cerebral perfusion, hypoxia, and electrolyte disturbances that could all impact brain development. In summary, there are a great many possible sources of confounding factors in the association between anesthesia and neurodevelopmental outcome.
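The way a confounder can create an association where no causal effect exists can be illustrated with a small simulation. This is only a sketch with invented numbers, not data from any study discussed here: a hypothetical comorbidity makes exposure more likely and independently lowers a test score, while the simulated anesthetic itself has no effect at all. All parameters (30% prevalence, exposure probabilities, a 10-point deficit) are assumptions chosen for illustration.

```python
import random
from statistics import mean

def simulate(n=20000, seed=0):
    """Simulate children; exposure has NO true effect on the score."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        comorbid = rng.random() < 0.3                  # hypothetical confounder
        exposed = rng.random() < (0.5 if comorbid else 0.1)
        # Score depends only on the comorbidity, never on exposure.
        score = 100 - (10 if comorbid else 0) + rng.gauss(0, 15)
        rows.append((comorbid, exposed, score))
    return rows

def mean_diff(rows):
    """Mean score of exposed minus mean score of unexposed."""
    exposed = [s for _, e, s in rows if e]
    unexposed = [s for _, e, s in rows if not e]
    return mean(exposed) - mean(unexposed)

def stratified_diff(rows):
    """Exposed-minus-unexposed difference within each comorbidity
    stratum, averaged with weights proportional to stratum size."""
    total = len(rows)
    result = 0.0
    for level in (True, False):
        stratum = [r for r in rows if r[0] == level]
        result += len(stratum) / total * mean_diff(stratum)
    return result

rows = simulate()
naive = mean_diff(rows)            # biased by the confounder
adjusted = stratified_diff(rows)   # close to the true effect of zero
print(f"naive difference:    {naive:.2f}")
print(f"adjusted difference: {adjusted:.2f}")
```

In this toy setting the naive comparison shows a deficit of several points in the exposed group, whereas the stratified comparison is close to zero. The adjustment works here only because the confounder is known and measured, which is exactly what cannot be guaranteed in real cohort studies.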
Careful patient selection, matching, and adjusted analyses can reduce some of the effects of known confounding; however, the matching and adjustments are not perfect and no adjustment can reduce the possible impact of unknown confounding factors. Most importantly, the studies cannot adjust for the confounding effect of the surgery itself. Thus, a positive finding of association in a cohort study can never be assumed to be due to the effect of the anesthetic. Sibling or twin matching is perhaps the best way to reduce the confounding influences of genetics and environment. It is important to note that the PANDA study, which used sibling matches, and the Bartels twin study found no evidence for an association.
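The rationale for sibling matching can be sketched in the same spirit. In the hypothetical simulation below, each family contributes a shared effect (genetics and environment) to the scores of both siblings, and families whose child needs surgery are assumed to have a lower baseline by 5 points. The exposure itself has no effect. The numbers are invented and do not reproduce the PANDA or twin study data.

```python
import random
from statistics import mean

rng = random.Random(1)
n_pairs = 5000
family_shift = -5.0   # assumed baseline deficit in families needing surgery

paired_diffs = []
exposed_scores = []
unrelated_scores = []
for _ in range(n_pairs):
    # Shared family effect, identical for both siblings in a pair.
    family = rng.gauss(0, 10) + family_shift
    exposed = 100 + family + rng.gauss(0, 10)   # no true exposure effect
    sibling = 100 + family + rng.gauss(0, 10)
    # Unrelated control drawn from the general population (no shift).
    unrelated = 100 + rng.gauss(0, 10) + rng.gauss(0, 10)
    paired_diffs.append(exposed - sibling)
    exposed_scores.append(exposed)
    unrelated_scores.append(unrelated)

unmatched_diff = mean(exposed_scores) - mean(unrelated_scores)
sibling_diff = mean(paired_diffs)
print(f"exposed vs unrelated controls: {unmatched_diff:.2f}")
print(f"exposed vs own sibling:        {sibling_diff:.2f}")
```

Because the family effect is subtracted out within each pair, the sibling comparison stays near zero while the comparison with unrelated controls inherits the family-level deficit. Factors that differ between siblings, including the surgery itself, are not removed by this design.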
The most effective way to reduce non-random confounding influences is with a randomized controlled trial. To reduce the confounding effect of surgery, the trial needs to randomly allocate surgical patients to anesthesia or no anesthesia. This is clearly not feasible, but spinal anesthesia is a feasible alternative in some circumstances. It is unlikely that spinal anesthesia has any direct toxic effects on the brain.45 This was the rationale behind the General Anesthesia compared to Spinal anesthesia trial (GAS).46 Seven-hundred-twenty-two infants under 60 weeks postmenstrual age were randomized to awake-regional (predominantly spinal) or sevoflurane-based general anesthesia for inguinal hernia repair. The median duration of anesthesia in the general anesthesia group was 54 min. The primary outcome was full scale IQ using the Wechsler Preschool and Primary Scale of Intelligence, Third Edition to be assessed at 5 yr of age. These results will not be available until early 2018. Neurodevelopment assessed by the Bayley-III at 2 yr of age was a predefined secondary outcome. The Bayley-III has five domains: cognitive, language, motor, social emotional, and adaptive behavior. Each domain has a normalized mean of 100 and a SD of 15 points. In the GAS trial, the difference in the cognitive composite score was 0.17 points (95% CI, −2.30 to 2.64). This was within the predefined equivalence margin of 5 points, which provides strong evidence for no difference between groups, and is well within any margin that would be regarded as clinically relevant. There was also strong evidence of equivalence in all the other four domains of the Bayley-III. There was no difference in results comparing intention-to-treat and as-per-protocol analyses, implying that the 19% failed spinal cases did not bias the outcome, and no difference between multiple imputation and complete case analyses, implying the 15% loss to follow-up did not bias the results either.
It must be stressed, however, that neurodevelopmental testing at 2 yr of age is inherently limited for higher executive function and some aspects of memory.47 Thus, while the preliminary results of the GAS trial found strong evidence of no added risk associated with an hour of general anesthesia as compared to spinal anesthesia, the results are not definitive.
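The equivalence reasoning applied to the GAS cognitive composite score amounts to checking that the whole 95% CI lies inside the predefined margin. A minimal sketch follows, using the interval and margin quoted above; the function name and the counterexample interval are illustrative assumptions, not part of the trial analysis.

```python
def ci_within_margin(ci_low, ci_high, margin):
    """True when the whole confidence interval lies strictly inside
    (-margin, +margin), i.e. the groups can be called equivalent."""
    return -margin < ci_low and ci_high < margin

# Values reported for the GAS cognitive composite score.
gas_ci = (-2.30, 2.64)
margin = 5.0      # predefined equivalence margin, in score points
sd = 15.0         # normalized SD of each Bayley-III domain

equivalent = ci_within_margin(*gas_ci, margin)
margin_in_sd = margin / sd   # the margin expressed in SD units

# Hypothetical interval crossing the margin: equivalence is not shown,
# even though the interval also contains zero.
not_shown = ci_within_margin(-6.0, 1.0, margin)

print(equivalent, round(margin_in_sd, 2), not_shown)
```

The counterexample marks the distinction between "no significant difference" and "equivalence": an interval can include zero and still be too wide to exclude a difference larger than the margin.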
Collating the Human Data
It is not possible to make a single definitive conclusion as to whether or not the human evidence supports or refutes the possibility that anesthetic exposure in children causes adverse effects on neurodevelopment. As Ted Eger pointed out previously, it is essentially impossible to prove absolutely that a technique is always safe, or a drug completely nontoxic.48 No number of negative studies will ever prove that anesthetics have no impact on neurodevelopment; however, human studies can provide some idea of the likelihood for an underlying causative association, which populations and domains might be most at risk, and which strategies reduce the risk.
Overall, human studies have found mixed evidence for an association between anesthesia exposure in early life and neurodevelopmental outcome. This is not inconsistent with an underlying effect, given the range of outcomes and populations studied. Some, but not all, large population-based studies have found evidence for a small difference in tests of academic achievement and school readiness, while some studies have found an increased risk of not being able to be tested. The difference in school grades is small and unlikely to have a measurable impact on a child’s wellbeing. The added risk is far less than that associated with other factors such as sex or maternal education. Similarly, some, but not all, studies have found evidence of an association between surgery and anesthesia in early life and increased risk of a diagnosis of a behavioral disorder or learning disability. The added risk is small, but given the implications of such a diagnosis, the overall impact on society is potentially worrisome. Lastly, there is mixed evidence of an association between anesthesia exposure and poorer outcome in some domains of neuropsychologic testing.
Confounding factors are the greatest limitation for all the human cohort studies. The GAS trial is the only trial where randomization would minimize confounding factors, and the PANDA and Bartels studies are perhaps the most carefully matched cohort studies. Both the GAS and PANDA studies found no evidence of any difference in neurodevelopmental outcome in children having less than 2 h of anesthesia in infancy.
The FDA warns that anesthesia exposure in children younger than 3 yr of age having long duration or multiple exposures may have an impact on neurodevelopment. The age limit is presumably derived from both preclinical and human data. There are very few human data to support or refute using 3 yr of age as the limit. Most human studies have focused on younger children, so there are very limited data on children over 3 yr of age. A few studies have examined different age subgroups, and those have not consistently found evidence for greater risk in younger children. The impact of multiple exposures compared to single exposure is also unclear. Several, but not all, studies have found a greater impact associated with multiple exposures, and most studies do not have sufficient power to differentiate between single and multiple exposures. The greater impact with multiple exposures may also be explained by the greater influence of confounding factors: children who have more comorbidities require more surgeries. Human studies also shed little light on what duration is “safe.” Most anesthetics in children are under 2 h in duration. Thus, the large population-based studies would have a considerable number of these relatively “short” exposures. If the effect were duration dependent, this could explain the modest effect sizes seen in these studies. Some studies have looked at the impact of duration of exposure; PANDA found no evidence of a difference in outcome when comparing less than 1 h and 1 to 2 h exposure. There are very few human data about exposures greater than 3 or 4 h. Relatively few children have long procedures; longer procedures, like multiple procedures, may be associated with a greater influence of confounding factors. Many of the cases described in the neonatal and cardiac studies would be of long duration; however, these are also the groups with potentially the greatest likelihood of confounding influences.
In summary, there is only extremely weak human evidence to support the FDA warning that repeated or lengthy use of anesthetic drugs may affect the development of children’s brains. The warning should be regarded as being largely based on the extensive and much more robust evidence from preclinical studies. There is, however, more substantial human evidence to support the FDA statement that single, relatively short exposures are not associated with increased risk. There is very little, if any, human evidence to support a recommendation that a particular age is safe or unsafe.
Other Implications of the Human Studies
If the results from the preclinical studies were to be completely ignored, then what could be concluded from the results of the human studies alone? The majority of the studies found some association between surgery in early childhood and increased risk of adverse neurodevelopmental outcomes. The association may be due to pathology, or indication for surgery, but it may also be due to other perioperative factors which are under the control of the anesthesiologist. The PANDA study results could indicate that these perioperative factors are not likely to be a problem for healthy infants having short procedures, but they may still be important in other populations. When a detailed examination of the association between surgery and neurodevelopmental outcome is performed, consideration should be given to all potentially reversible causative factors, and not just neurotoxicity.
To further define whether or when anesthetics have a direct impact on neurodevelopment requires more high quality human studies, especially in children having prolonged or repeated exposure to anesthesia. Ideally, these would be randomized trials in healthy infants comparing anesthesia regimens that do and do not produce the changes seen in preclinical studies. These would not be easy trials to perform. Particular problems are the paucity of healthy infants having long procedures or repeated procedures, the uncertainty over which “nontoxic” anesthetic regimens would be clinically feasible, and the long delay between randomization and the ideal outcome measure. Another useful line of research would be to do larger and more detailed cohort studies to identify and further characterize those at greatest risk and determine what psychometric domains are most affected. Due to confounding factors, these more detailed cohort studies will always have limited capacity to provide evidence about whether or not anesthetics directly impact on neurodevelopment, but they may give a clue as to the other possible causes of the poor outcome. They will also provide valuable information about how to design future clinical trials comparing different strategies to reduce any impact on neurodevelopment.
The human studies provide mixed evidence of an association between anesthesia exposure in early childhood and later deficits in a range of neurodevelopmental outcomes. When added risk has been observed, it is very small. The variations in examined outcomes and generally small differences seen in human studies are not inconsistent with the preclinical data given the predominantly short exposures and ranges of populations and outcomes assessed. However, the strong likelihood of confounding influences in these studies, which are predominantly cohort studies, means that the human evidence for any association can only be regarded as very weak evidence that anesthesia actually causes these poorer outcomes. Thus, any recommendations for changing practice, including the FDA warning, continue to be driven largely by the preclinical evidence. In contrast, there is stronger human evidence that a single brief exposure in a healthy infant is not associated with poorer neurodevelopmental outcome.
Support was provided solely from institutional and/or departmental sources.
The authors declare no competing interests.