Few studies of how exposure of children to anesthesia may affect neurodevelopment employ comprehensive neuropsychological assessments. This study tested the hypothesis that exposure to multiple, but not single, procedures requiring anesthesia before age 3 yr is associated with adverse neurodevelopmental outcomes.
Unexposed, singly exposed, and multiply exposed children born in Olmsted County, Minnesota, from 1994 to 2007 were sampled using a propensity-guided approach and underwent neuropsychological testing at ages 8 to 12 or 15 to 20 yr. The primary outcome was the Full-Scale intelligence quotient standard score of the Wechsler Abbreviated Scale of Intelligence. Secondary outcomes included individual domains from a comprehensive neuropsychological assessment and parent reports.
In total, 997 children completed testing (411, 380, and 206 unexposed, singly exposed, and multiply exposed, respectively). The primary outcome of intelligence quotient did not differ significantly according to exposure status; multiply exposed and singly exposed children scoring 1.3 points (95% CI, −3.8 to 1.2; P = 0.32) and 0.5 points (95% CI, −2.8 to 1.9; P = 0.70) lower than unexposed children, respectively. For secondary outcomes, processing speed and fine motor abilities were decreased in multiply but not singly exposed children; other domains did not differ. The parents of multiply exposed children reported increased problems related to executive function, behavior, and reading.
Anesthesia exposure before age 3 yr was not associated with deficits in the primary outcome of general intelligence. Although secondary outcomes must be interpreted cautiously, they suggest the hypothesis that multiple, but not single, exposures are associated with a pattern of changes in specific neuropsychological domains that is associated with behavioral and learning difficulties.
There is strong evidence from preclinical studies that most general anesthetics modulate brain development. There is mixed evidence in humans that anesthesia exposure in early life is associated with changes in neurodevelopmental outcomes. The association may be stronger after multiple exposures.
This matched cohort study found that anesthesia exposure before age 3 yr was not associated with deficits in the primary outcome of general intelligence.
Single exposures were not associated with deficits in other neuropsychological domains (assessed as secondary outcomes). However, multiple exposures were found to be associated with modest decreases in processing speed and fine motor coordination. Parents also reported that multiply exposed children have more difficulties with behavior and reading.
Drugs producing general anesthesia can cause neurodegeneration and long-term deficits in learning and behavior in young animals (including nonhuman primates).1–3 Numerous studies have sought evidence for similar effects in children. Most observational studies find that multiple exposures to procedures requiring general anesthesia are associated with deficits in learning and behavior, albeit with small effect sizes in some studies.4–10 Some, but not all, human studies also find an association between single exposures and a variety of outcomes.4,8,9,11–18 These studies employed a wide range of designs and outcomes. Only two studies reported a comprehensive assessment of neuropsychological function: an unmatched cohort study14 and another that carefully matched subjects who were and were not exposed to anesthesia, but included only children undergoing herniorrhaphy, who were predominantly male.18 Thus, any specific pattern of neuropsychological changes associated with the exposure of a general population of children to procedures requiring anesthesia, if present, is still poorly defined.
The aim of the Mayo Anesthesia Safety in Kids (MASK) study was to test the hypothesis that exposure to multiple, but not single, procedures requiring general anesthesia before a child’s third birthday is associated with adverse neurodevelopmental outcomes. Using a matched-cohort design, this hypothesis was evaluated by prospective neuropsychological testing of a propensity-guided sample of children born in Olmsted County, Minnesota, from 1994 to 2007. The primary outcome for analysis was the Full-Scale intelligence quotient score of the Wechsler Abbreviated Scale of Intelligence. This score was chosen as the primary outcome based on comparability with other studies10,12,13,18 and the availability at the time of study design of school achievement test data that permitted power calculations, with the assumption that intelligence quotient is related to achievement test performance.5 Secondary outcomes included the results of a comprehensive battery of neuropsychological assessments and parent reports of behavior and learning difficulties.19
Materials and Methods
This study was approved by the Mayo Clinic and Olmsted Medical Center Institutional Review Boards (Rochester, Minnesota), and written informed consent/assent was obtained. Study methods have been previously published19 and are here summarized. In addition to the neuropsychological testing reported here, children were tested on the National Center for Toxicological Research Operant Test Battery; results of this testing will be presented in a future analysis.
Children born from January 1, 1994, to December 31, 2007, in Olmsted County, Minnesota, who resided within Olmsted County until their third birthday and who resided within 25 miles of Rochester, Minnesota, according to available records at study onset were identified using the resources of the Rochester Epidemiology Project, a medical records linkage system that provides access to the complete medical records of all Olmsted County residents, and birth certificate information obtained from the Division of Vital Statistics, Minnesota Department of Health. This date range was chosen as approximately coinciding with the more widespread use of sevoflurane in clinical practice and as providing a sufficient number of children who could be tested at age 8 or greater during the study period. Birth certificate information was used to establish that children were born in Olmsted County, an approach that minimized the potential for referral bias and facilitated recruitment for testing.
Subjects were eligible for testing if enrolled between the ages of 8 and 12 yr or 15 and 19 yr to allow evaluation of any evolution of anesthesia-associated changes. These age ranges were chosen to represent two developmental stages (preadolescence and adolescence) and based on preliminary estimates of the number of children who would be available for testing. Those who enrolled at age 19 yr were tested even if they turned 20 yr before testing could be scheduled.
Through medical records review, each eligible child was classified as unexposed, singly exposed, or multiply exposed to anesthesia before their third birthday. Target recruitment goals were initially determined based on considerations of statistical power and feasibility, goals that were adjusted at the approximate midpoint of subject recruitment to account for actual recruitment patterns while maintaining statistical power (Supplemental Digital Content 1, https://links.lww.com/ALN/B695, and 2, https://links.lww.com/ALN/B696, showing enrollment goals and statistical power).
It was initially estimated that it would be necessary to contact all eligible multiply exposed children to meet recruitment goals. To minimize the potential for confounding, we sought to recruit singly exposed and nonexposed children who were best matched with multiply exposed children on a variety of characteristics potentially affecting the outcomes of interest. Singly exposed and nonexposed children were selected for recruitment using a frequency-matched approach, with strata defined based on their propensity for receiving single and multiple exposures to general anesthesia. Propensity scores were calculated using multinomial logistic regression including data available from the birth certificate (sex, gestational age at birth, birth weight, Apgar scores at 1 and 5 min, and mother’s and father’s age and level of education), and health status from data available in the medical record as estimated using the Johns Hopkins Adjusted Clinical Group Case Mix System, which calculates 32 binary indicator variables representing comorbidity clusters (aggregated diagnostic groups). Based on quintiles of the observed distribution of propensity scores for single and multiple exposures, 50 sex-specific propensity-matched strata (25 each for males and females) were defined and used to select those singly exposed and unexposed children to be randomly sampled within each stratum that included at least one multiply exposed child. Children were excluded if conditions that would preclude testing were present, including severe intellectual disability, limited English language proficiency, autism, and spastic cerebral palsy.
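The stratum definition described above can be illustrated with a minimal sketch. It assumes that each child has an estimated propensity for single exposure and for multiple exposure, that quintile cut points are taken over the pooled sample (the text does not state whether they were computed within sex), and that strata are labeled 0 to 49; the function names `quintile_index` and `assign_strata` are hypothetical and are not taken from the study's SAS code.

```python
import numpy as np


def quintile_index(scores):
    """Return a quintile label (0 to 4) for each score."""
    scores = np.asarray(scores, dtype=float)
    # Interior cut points at the 20th, 40th, 60th, and 80th percentiles.
    cuts = np.quantile(scores, [0.2, 0.4, 0.6, 0.8])
    # side="right" places a score equal to a cut point in the upper bin.
    return np.searchsorted(cuts, scores, side="right")


def assign_strata(p_single, p_multiple, is_female):
    """Cross quintiles of the two propensity scores within each sex.

    Yields 5 x 5 = 25 strata per sex, 50 in total (labels 0 to 49).
    The labeling scheme is an illustrative assumption.
    """
    q_single = quintile_index(p_single)
    q_multiple = quintile_index(p_multiple)
    sex = np.asarray(is_female, dtype=int)
    return sex * 25 + q_single * 5 + q_multiple
```

Singly exposed and unexposed children would then be sampled at random only within strata that contain at least one multiply exposed child.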
Each subject was assessed by a trained psychometrist who tested domains typically measured in clinical practice (Supplemental Digital Content 3, https://links.lww.com/ALN/B697, which lists all tests).19 The study psychometrists were chosen from a pool of 18 who staff our clinical Psychometric Assessment Lab. Initially, psychometrists undergo at least 4 months of full-time training and are not deemed independent for roughly 12 months after they begin training. Periodically thereafter, they are observed to assure test administration fidelity. After each testing session, as a quality control measure, all neuropsychometric assessment data were reviewed by another psychometrist for accuracy. Parent/guardian questionnaires assessed perceived behavior and learning difficulties. A summary of how selected tests of primary interest would be interpreted in terms of underlying domains measured was formulated before analysis (table 1), including study-specific composite scores to increase the ability to detect effects when more than one instrument assessed a particular domain. This a priori approach provided an overall roadmap for how any observed differences would be interpreted.
Details of the analysis plan were made available via a web-based repository before commencing analysis (https://osf.io/k93nb/; accessed March 1, 2018).
The sampling strategy planned to invite all available multiply exposed children to participate and sample propensity-matched singly exposed and unexposed children in fixed ratios to the multiply exposed children for each stratum. With ideal sampling, the characteristics of these singly exposed and unexposed children would be distributed similarly to the multiply exposed children. However, because not all who were invited participated, some sampling strata were missing subjects with a given exposure status. Thus, the primary analysis used inverse probability of treatment weighting to account for imbalances across exposure categories among children actually tested.20 The approach weighted the observed sample of singly exposed and unexposed children to mimic the originally planned fixed-ratio sampling and thus balance potential confounders across exposure groups. The need for this procedure was identified early during the accrual period and reflected in the analysis plan developed before completing subject accrual and posted before analysis.
Propensity scores were estimated from a multinomial model. Using Z to denote exposure and X to denote the vector of explanatory covariates in the propensity model, three probabilities were estimated for each individual: P(Z = 2|X), P(Z = 1|X), and P(Z = 0|X), using Z = 2 as shorthand for two or more exposures. Then, for a given individual with observed exposure group z (0 = none, 1 = single, 2 = multiple) and explanatory covariates X, the weight for the individual is given by
$$w = \frac{P(Z = 2 \mid X)}{\sum_{z=0}^{2} I(Z = z)\,P(Z = z \mid X)}$$
where P(Z = z|X) indicates the probability of being in group z given covariates X, and I(Z = z) is an indicator function taking values 1 if the individual was in group z and 0 if not. The weighted sample is expected to be balanced with respect to the distribution of baseline covariates across the three exposure groups, with a distribution similar to the population of multiply exposed individuals. Standardized differences in explanatory covariates between singly exposed versus unexposed and multiply exposed versus unexposed were compared before and after weighting to evaluate balance.
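As an illustration of the weighting and balance check described here, the following sketch assumes the weight for each child is the estimated probability of multiple exposure divided by the estimated probability of the exposure group actually observed, a standard form for weighting a sample toward a target group (here, the multiply exposed). The function names and the pooled-variance form of the standardized difference are assumptions for illustration, not the study's implementation.

```python
import numpy as np


def att_weights(probs, z, target=2):
    """Weights that reweight each group toward the target group.

    probs: (n, 3) array with columns P(Z=0|X), P(Z=1|X), P(Z=2|X).
    z:     observed exposure group for each child (0, 1, or 2).
    """
    probs = np.asarray(probs, dtype=float)
    z = np.asarray(z, dtype=int)
    # Probability of the group each child was actually observed in.
    p_observed = probs[np.arange(len(z)), z]
    return probs[:, target] / p_observed


def weighted_std_diff(x, z, w, group, reference=0):
    """Weighted standardized difference of covariate x, group vs. reference."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=int)
    w = np.asarray(w, dtype=float)

    def mean_and_var(mask):
        m = np.average(x[mask], weights=w[mask])
        v = np.average((x[mask] - m) ** 2, weights=w[mask])
        return m, v

    m1, v1 = mean_and_var(z == group)
    m0, v0 = mean_and_var(z == reference)
    return (m1 - m0) / np.sqrt((v1 + v0) / 2)
```

Under this form, multiply exposed children always receive a weight of 1, and singly exposed or unexposed children who resemble the multiply exposed receive larger weights.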
Prior literature suggests that explanatory variables in propensity score models should include factors associated with the outcome and factors that occur temporally before the exposure.21 This would include potentially confounding variables. For the inverse probability of treatment weighting propensity score, most explanatory variables were retrieved from birth certificate data. Only aggregated diagnostic groups occurring before age 3 were included in the model to best reflect subject characteristics over the period they were at risk for anesthesia exposure. Because some aggregated diagnostic groups are sparsely represented or omnipresent, not all of them could be included in the propensity model without overfitting the model. Further, some may be highly correlated, which may lead to collinearity in the propensity score model. We thus identified the subset of aggregated diagnostic groups associated with the primary outcome of intelligence quotient using multivariable linear regression. Explanatory variables including sex, gestational age at birth, birth weight, Apgar scores at 1 and 5 min, mother’s and father’s age and level of education, and socioeconomic status as measured by the HOUSES index,22 and all aggregated diagnostic groups were considered in a linear regression of the Wechsler Abbreviated Scale of Intelligence Full-Scale intelligence quotient score outcome. Backwards selection was used to assess which aggregated diagnostic groups were associated with this outcome, while keeping other explanatory variables in the model. A threshold of P < 0.1 was used as the stay criterion to conservatively include aggregated diagnostic groups associated with the outcome.
Using this procedure, the final model for the inverse probability of treatment weighting propensity score thus included sex, gestational age at birth, birth weight, Apgar scores at 1 and 5 min, mother’s and father’s age and level of education, socioeconomic status, dermatologic aggregated diagnostic group, psychosocial aggregated diagnostic group, minor infection aggregated diagnostic group, asthma aggregated diagnostic group, and major infection aggregated diagnostic group. Sex by characteristic interactions were also included. The data for birth characteristics and aggregated diagnostic groups were complete for all individuals, whereas parental characteristics and socioeconomic status were complete for at least 97% of the study sample (socioeconomic status incomplete for 3%, father’s age missing for 3%; at most 2% missing for other data). Multiple imputations (n = 50 imputations) were performed to obtain complete data sets of characteristics necessary for the calculation of the inverse probability of treatment weighting propensity scores.
Each endpoint was analyzed as a continuous variable, with transformations used as necessary to satisfy the distributional assumptions (normally distributed errors) implicit in the analysis model. Linear regression incorporating the inverse probability of treatment weights evaluated the relationship between exposure status and each outcome, using generalized estimating equations and a robust variance. For both the primary and secondary endpoints (table 1; Supplemental Digital Content 3, https://links.lww.com/ALN/B697), a two-tailed P value of less than 0.05 was considered statistically significant for the overall two degrees of freedom tests across exposure categories. Pairwise comparisons of single and multiple exposures versus no exposure were performed using P < 0.025 (Bonferroni adjustment) to denote statistical significance. For all comparisons, findings were summarized using point estimates and the corresponding 95% CI values, reflecting the combined analysis of multiple imputations. Age at testing was evaluated as a potential moderator of the effect of exposure on outcomes, with the interaction between exposure group and age at testing group (8 to 12 vs. 15 to 20 yr) assessed for all outcomes.
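The weighted regression with robust variance can be sketched as follows. This is not the SAS procedure used in the study; it assumes one observation per child, in which case generalized estimating equations with an independence working correlation reduce to weighted least squares with a sandwich variance estimator. The function names and the design-matrix layout (intercept, single exposure, multiple exposure, older age group) are illustrative assumptions.

```python
import numpy as np


def exposure_design(z, older):
    """Design matrix: intercept, single, multiple, older age group (0/1)."""
    z = np.asarray(z, dtype=int)
    older = np.asarray(older, dtype=float)
    return np.column_stack(
        [np.ones(len(z)), z == 1, z == 2, older]
    ).astype(float)


def weighted_fit_robust(X, y, w):
    """Weighted least squares with a sandwich (robust) variance estimate.

    Returns the coefficient vector and robust standard errors.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    XtW = X.T * w                       # scale each observation by its weight
    bread = np.linalg.inv(XtW @ X)      # (X'WX)^-1
    beta = bread @ (XtW @ y)
    resid = y - X @ beta
    scores = X * (w * resid)[:, None]   # per-subject estimating function
    meat = scores.T @ scores
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))
```

With this layout, the second and third coefficients estimate the adjusted differences of singly and multiply exposed children from unexposed children, and each pairwise comparison would be judged against the Bonferroni threshold of 0.025.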
Some of the variables measured have established cutoffs for defining clinically meaningful deficits, including the Wechsler Abbreviated Scale of Intelligence Full-Scale intelligence quotient score (less than 85) and the Child Behavior Checklist (more than 60). These variables were dichotomized accordingly and analyzed using logistic regression in additional analyses.
For endpoints found to be significantly associated with exposure, four potential moderators (sex, gestational age, birth weight, and socioeconomic status) were examined.23 For each, regression analyses were performed that included explanatory variables for exposure category, the potential moderator variable, and the moderator-by-exposure interaction effect.
In the inverse probability of treatment weighting analysis, some combinations of covariates in the propensity score model may lead to small or large weights such that individuals have small or large amounts of influence on the exposure comparisons, which can lead to large variation of effect estimates. Weight truncation was performed to evaluate a possible bias-variance tradeoff and sensitivity to the original weights.21,24 Truncations were performed on the distribution of weights using the 1st and 99th percentiles, 5th and 95th percentiles, and 10th and 90th percentiles. As another method to explore the potential effect of extreme weights, for each sex, the subjects were stratified by quintiles of the propensity score distribution (10 strata total) among the multiply exposed. For this procedure, the propensity to be multiply exposed was estimated as in the primary analysis. Among multiply exposed, separately for each sex, the quintiles were obtained, and the participants were stratified according to those quintile and sex combinations. Because quintiles reflect the distribution of multiply exposed, this analysis also targets the average treatment effect among the multiply exposed. The results of the stratified analyses reflect the combined estimate across the strata.
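The weight truncation step can be sketched as below, assuming that truncation means replacing weights beyond the chosen percentiles with the percentile values themselves (rather than discarding those subjects). The function name `truncate_weights` is hypothetical.

```python
import numpy as np


def truncate_weights(w, lower_pct, upper_pct):
    """Clip weights to the given lower and upper percentiles."""
    w = np.asarray(w, dtype=float)
    lo, hi = np.percentile(w, [lower_pct, upper_pct])
    return np.clip(w, lo, hi)


# The three truncation levels named in the text.
TRUNCATION_LEVELS = [(1, 99), (5, 95), (10, 90)]


def truncated_weight_sets(w):
    """Return one truncated weight vector per truncation level."""
    return {levels: truncate_weights(w, *levels) for levels in TRUNCATION_LEVELS}
```

Refitting the outcome model with each truncated weight vector shows how sensitive the estimates are to the most extreme weights: tighter truncation reduces variance at the possible cost of reintroducing some covariate imbalance.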
In additional sensitivity analyses, multivariable regression models were used rather than inverse probability of treatment weighting to adjust for potential confounders. These models used those variables previously identified and used in the propensity score model, providing an estimate of an effect of exposure on outcome, conditional on baseline covariates. In another analysis, crude estimates of the differences between exposure groups were also calculated without adjustment or weighting; this approach does not account for the sampling framework or any imbalance of covariates among the exposure groups. Finally, a post hoc sensitivity analysis using the primary inverse probability of treatment weighting approach excluded 18 children with cardiopulmonary bypass or intracranial procedures.
At the time of study design, no estimate of effect size for the primary endpoint of intelligence quotient was available.19 Prior work found that mean group academic achievement test scores in the multiply exposed were lower than for those not exposed by approximately 0.4 SD units and that the scores of those with single exposures were similar to those with no exposure.5 Power calculations were based on these data. The originally targeted sample sizes provided statistical power (two-tailed, α = 0.025) of 80% to detect a difference of 0.37 and 0.32 SD units within each age group for pairwise comparisons of multiply and singly exposed children versus those not exposed, respectively. Power calculations based on the actual numbers tested are provided in Supplemental Digital Content 2 (https://links.lww.com/ALN/B696). All data were analyzed using SAS 9.4 TS1M3 (SAS Institute, Inc., USA).
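The relationship between group sizes and the detectable difference can be illustrated with a normal-approximation sketch for a two-group comparison of means. The method the authors actually used is not stated here, and the targeted group sizes appear only in the supplemental content, so the function name and any sample sizes passed to it are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def detectable_difference(n1, n2, alpha=0.025, power=0.80):
    """Smallest difference (in SD units) detectable with the given power.

    Uses the normal approximation for a two-sided, two-sample comparison
    of means with equal variances.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return (z_alpha + z_power) * sqrt(1 / n1 + 1 / n2)


# Hypothetical group sizes, for illustration only.
example = detectable_difference(100, 100)
```

Because the detectable difference scales with the square root of 1/n1 + 1/n2, enlarging only the comparison group yields diminishing returns when the multiply exposed group is the limiting resource.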
Subjects were tested from November 2012 to November 2016. From the 19,296 children initially screened as potentially eligible for recruitment, 3,106 were invited to participate, and 998 (32%) enrolled, with highest enrollment rates in the multiply exposed (26%, 35%, and 43% of unexposed, singly exposed, and multiply exposed children, respectively; fig. 1). One subject refused all testing subsequent to enrollment. Those who enrolled had parents who were older, better educated, more likely to be married, and more likely to be white, but child characteristics were not different except that enrolled children were more likely to be the product of multiple births, and small differences were present in the frequency of some individual aggregated diagnostic group comorbidity clusters (table 2).
The median cumulative duration of anesthesia was 45 and 187 min in singly and multiply exposed children tested, respectively, with two-thirds of multiply exposed children receiving more than 2 h of anesthesia (table 3). The most common procedure type was otorhinolaryngologic (42% of all procedures; table 4); cardiovascular and neurologic surgeries comprised 4% of procedures (Supplemental Digital Content 4, https://links.lww.com/ALN/B698, presents details of all procedures for the multiply exposed). The most common anesthetic agents utilized included sevoflurane and nitrous oxide (79% and 90% of procedures, respectively; Supplemental Digital Content 5, https://links.lww.com/ALN/B699, which lists agents utilized). Approximately half of children received at least one anesthetic after their third birthday (Supplemental Digital Content 6, https://links.lww.com/ALN/B700, which presents this subsequent exposure history). Parent and child characteristics were similar among exposure categories; exceptions included small differences in the education of the father, delivery method, and some individual aggregated diagnostic group categories (table 5). The standardized differences between the factors used in the propensity scoring for the primary analysis were small after inverse probability of treatment weighting adjustment (fig. 2).
Interactions between exposure and age at testing were not significant for any outcome (P > 0.05); i.e., any effects of exposure did not depend on the age at testing. Age at testing (8 to 12 vs. 15 to 20 yr) was still included in all models to account for any differences in outcomes that would depend on age.
The primary outcome of Wechsler Abbreviated Scale of Intelligence Full-Scale intelligence quotient did not differ significantly according to exposure status, with multiply exposed children scoring 1.3 points (95% CI, −3.8 to 1.2; P = 0.32) lower and singly exposed children scoring 0.5 points (95% CI, −2.8 to 1.9; P = 0.70) lower than unexposed children on average (table 6). For the other psychometrist-assessed neuropsychological testing scores of a priori primary interest as secondary outcomes (table 1), only processing speed/automaticity associated with reading skills (as determined by the rapid naming composite of the Comprehensive Test of Phonological Processing) and the fine motor study composite differed significantly between multiply exposed and unexposed children (differences of −3.5 [−6.3 to −0.7] and −5.5 [−8.4 to −2.6] respectively, both standard scores; table 6). These scores did not differ significantly between singly exposed and unexposed children. There were no significant differences in measures of attention, memory, executive function, expressive language, visual–motor abilities, or visual–spatial abilities between unexposed and either exposure category (table 6).
When all psychometrist-assessed scores were considered (Supplemental Digital Content 7, https://links.lww.com/ALN/B701 and 8, https://links.lww.com/ALN/B702, which present statistical comparisons and estimates, respectively, for all psychometrist-assessed scores and parent reports), of the eight scores that were both dependent on motor ability and had a timed component (Grooved Pegboard dominant and other hand, Beery Motor Coordination, and the five Delis–Kaplan Executive Function System trail making tasks), seven were significantly lower in multiply exposed children. Thus, the results suggest a consistent impairment of fine motor function and processing speed associated with multiple exposures. No score was significantly different in singly exposed children.
For the secondary outcome of parent reports, the Behavior Rating Inventory of Executive Function and the Colorado Learning Difficulties Questionnaire Reading (but not Math) Scale were significantly greater (indicating more problems) in both singly and multiply exposed children (table 6) compared with unexposed children. All scales of the Child Behavior Checklist were significantly greater (indicating more problems) in multiply but not singly exposed children compared with unexposed children. The proportion of children with clinically abnormal parent-reported scores was significantly greater for the Child Behavior Checklist Externalizing Problems Scale in both singly and multiply exposed children (Supplemental Digital Content 7, https://links.lww.com/ALN/B701, and 8, https://links.lww.com/ALN/B702).
In moderator analyses, examining those scores significantly different in multiply exposed children, sex, gestational age, birth weight, and socioeconomic status did not moderate the association between exposures and any score, with the exception of the interaction term for socioeconomic status and the Comprehensive Test of Phonological Processing (table 7). Given the multiple interaction terms sought across multiple outcomes, the significance of this isolated interaction term is unclear.
In sensitivity analyses (Supplemental Digital Content 9, https://links.lww.com/ALN/B703, which provides the results of these analyses), the crude analysis (i.e., no adjustments via weighting or other methods) produced the largest effect sizes for most scores, with trends observed in the primary analysis now statistically significant for several scores. For most scores, absolute effect sizes also were larger for covariate-adjusted analysis and increased as the degree of inverse probability of treatment weighting truncation increased. Stratification of inverse probability of treatment weighting scores and imputation of missing outcome values had little effect. For all adjusted analyses, there was still little evidence in any analysis for exposure effects on any measure of attention, executive function, memory, expressive language, visual–motor abilities, or visual–spatial abilities. For the parent reports, several of the sensitivity analyses now demonstrated significant differences in all measures reported for singly exposed children. These results suggest that despite the propensity-guided recruitment strategy employed, there were still imbalances in the baseline characteristics among children who actually enrolled that affected some interpretations of exposure effects in terms of statistical significance for secondary outcomes (fig. 2).
Regarding the primary outcome, exposure to procedures requiring general anesthesia before the age of 3 yr was not associated with significant differences in general cognitive ability as quantified by the Full-Scale intelligence quotient score, relative to unexposed children. Regarding secondary outcomes, multiple, but not single, exposures were associated with decreases in a processing speed task related to retrieval of verbal codes associated with reading and fine motor coordination but not other psychometrist-assessed domains. The parents of multiply exposed children reported more problems related to executive function, behavior, and reading (but not math); the parents of singly exposed children reported more problems related to executive function and reading. These findings did not depend on age at testing.
The absence of association between exposure and Full-Scale intelligence quotient is consistent with several smaller prior studies.12,13,18 A large population-based study10 found small effects of exposures before age 4 yr (decreases of 0.97 points [95% CI, −1.78 to −0.15] for a single exposure and 1.02 points [95% CI, −3.43 to 1.39] for two exposures) of similar magnitude to the present study, which was not powered to detect this small effect size. Thus, the present results add to the evidence that exposure is associated with no effect or a small effect on general intelligence.
Analysis of secondary outcomes revealed a specific pattern of changes. This analysis is of importance (albeit with appropriate cautions in interpretation because these are secondary outcomes) given that there was little understanding of a likely phenotype at the time of study design. Two prior studies have utilized comparable comprehensive neuropsychological assessments. A sibling-matched cohort study of 105 children singly exposed before age 36 months (the PANDA study) found no significant differences in a battery of tests similar to ours, with the exception of more exposed children having abnormal Child Behavior Checklist Internalizing Scores.18 Unlike our results, they found no differences in the Behavior Rating Inventory of Executive Function parent assessment of executive function. In a study of approximately 200 children who were singly or multiply (20% of children) exposed to anesthesia before age 3,14 exposure was associated with significant deficits in performance intelligence quotient and language abilities, as well as tendencies for decreases in combined fine and gross motor performance and increased Child Behavior Checklist problems. In an additional analysis, among those exposed between ages 3 and 5 yr, motor performance but not other domains was significantly affected.25 Two other small studies evaluated a more limited range of domains. Single exposures were associated with decrements in listening comprehension and performance intelligence quotient.13 Children exposed before age 1 yr assessed with an object recognition test had lower recollection memory scores but no differences in the Child Behavior Checklist or familiarity scores.12 Due to differences in study design and assessments, it is difficult to directly compare all of these results with ours. Broad areas of consistency include some evidence for exposure being associated with differences in performance intelligence quotient, motor skills, and parent ratings of behavior.
In contrast, we failed to identify associations with measures of language processing or memory, although we utilized different assessments that may have measured different constructs. Thus, our study is unique in finding decreases in scores reflecting fine motor skills and processing speed, in the absence of changes in scores assessing other cognitive domains, in children receiving multiple exposures. These decreases are modest (effect sizes of less than 0.5 SD) and occur in the context of relatively normal performance in unexposed children (estimates in Supplemental Digital Content 8, https://links.lww.com/ALN/B702).
If confirmed in further analyses, would modest differences in these two domains be potentially relevant to children and their families? It is not possible to make definitive conclusions, but the parent-reported outcomes and results of prior studies may provide insights. Several reviews summarize studies examining the association between exposure and patient-relevant outcomes such as behavioral problems, learning difficulties, and academic achievement.26,27 Multiple, but not single, exposures to anesthesia are associated with an increased risk of attention deficit hyperactivity disorder, learning disabilities, and decreased performance in group-administered assessments of ability and achievement,5–7,23 outcomes of potential relevance to children and families. The association between multiple exposures and parent reports of attention deficit hyperactivity disorder problems on the Child Behavior Checklist is consistent with this prior work, also performed in children born in Olmsted County, Minnesota. The association of single exposures with reduced scores on reading, but not math, achievement tests11 is also consistent with the current Colorado Learning Difficulties Questionnaire results. The prior work also found an association of multiple exposures with decreases in both reading and math achievement tests;5 in the current study, the Colorado Learning Difficulties Questionnaire differed significantly only for reading. However, most of the neuropsychological test results in the current study did not depend on exposure status, including some that may reflect problems with behavior or learning. For example, children with learning disabilities often exhibit impairment in domains such as attention, memory, and executive function,28,29 yet these domains were not affected. 
Many children with attention deficit hyperactivity disorder exhibit deficits in executive function and attention tests,30–32 but others do not, especially when evaluated in a focused laboratory setting as contrasted with their natural environment.33 Also in the current study, children being treated for attention deficit hyperactivity disorder were not instructed to discontinue medications for testing, which also could have affected results.
Nonetheless, motor deficits and decreases in processing speed are common in children with attention deficit hyperactivity disorder or reading disabilities.34–36 For example, motor deficits are characteristic of developmental coordination disorder, which is associated with attention deficit hyperactivity disorder and learning difficulties.35,37 The fine motor composite was significantly correlated with both Child Behavior Checklist: Attention Deficit Hyperactivity Disorder Problems (Spearman’s ρ [rs] = −0.22) and reading difficulties (rs = −0.27), and the Comprehensive Test of Phonological Processing was significantly correlated with both Child Behavior Checklist: Attention Deficit Hyperactivity Disorder Problems (rs = −0.14) and reading difficulties (rs = −0.31; all P < 0.0001), suggesting that these changes may be related to these behavioral and learning difficulties. The finding of a correlation between the Comprehensive Test of Phonological Processing and parent report of reading difficulties could also be explained by weaknesses with other fundamental skills needed for successful reading (e.g., phonological awareness, sight word vocabulary, and/or phonics), which in part also determine performance on the Comprehensive Test of Phonological Processing. However, these were not formally assessed due to time constraints with the testing session. In addition, attention deficit hyperactivity disorder and learning disabilities frequently cooccur, and other studies suggest that defects in processing speed may be an underlying explanatory cognitive risk factor for both conditions.28,38 Our prior work found a high rate of concordance between attention deficit hyperactivity disorder and learning disabilities in children multiply exposed to anesthesia.23 Consistent with this observation, Child Behavior Checklist: Attention Deficit Hyperactivity Disorder Problems and Colorado Learning Difficulties Questionnaire Reading Scales were correlated (rs = 0.42, P < 0.0001).
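For readers less familiar with the statistic, Spearman's ρ is the Pearson correlation computed on ranks rather than raw scores. The minimal Python sketch below illustrates the calculation. The function names and the five pairs of scores are invented for illustration; they are not study data, and this is not the authors' analysis code.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical scores (NOT study data): a fine motor composite and a
# parent-rated problem scale for five imaginary children.
fine_motor = [85, 92, 100, 108, 115]
problem_scale = [64, 66, 55, 58, 50]
rho = spearman_rho(fine_motor, problem_scale)
```

A negative value, as in the correlations reported above, means that lower test scores tend to accompany higher parent-rated problem scores.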
Although diagnoses such as attention deficit hyperactivity disorder and learning disabilities are clinically useful, their causes are multifactorial, and similar phenotypes may result from different neuropsychological deficits.33,39,40 If further evidence supports the hypothesis that anesthesia exposure causes a phenotype diagnosed as attention deficit hyperactivity disorder or learning disabilities, the underlying mechanism may be unique, and the pattern of neuropsychological abnormalities may differ from that of other children diagnosed with attention deficit hyperactivity disorder or learning disabilities who are not exposed to anesthesia.
These observations also provide context to interpret preclinical studies, although correlations between measures in humans and animals must be made cautiously. Most rodent studies consistently find sustained impairments in learning and memory,1,12,41 as do the limited primate studies,42,43 but we find no evidence for an association between exposure and these domains in humans. Most rodent studies find little effect of exposure on measures of attention or locomotor activity or behavioral tests,44–46 although some recent studies in mice suggest effects on social behavior.47 In contrast, primate studies find effects on anxiety-related behaviors48,49 and motor reflex deficits.48 Deficits in response rates to operant test battery tasks dependent on motor skills and processing speed are also consistently observed in ketamine-exposed macaques.43
As with all observational studies, unmeasured confounders may affect outcomes.6,50,51 A propensity-guided strategy attempted to recruit children who were comparable for health status and other factors potentially relevant to neurodevelopment within a population-based sample to reduce the potential for referral bias, with inverse probability of treatment weighting used to account for residual imbalances between exposure groups. Still, children who need procedures differ from those who do not, and it is not possible to fully account for such differences.6,23,52 This raises the potential for confounding by indication if the procedural indication affects neurodevelopmental outcomes,53 such as may be the case with cardiopulmonary bypass and intracranial procedures.54,55 However, a post hoc sensitivity analysis excluding children who received at least one of these procedures had little effect on the results (Supplemental Digital Content 9, https://links.lww.com/ALN/B703). The finding of a specific pattern of changes in secondary outcomes also argues against confounding by indication, because it is not immediately apparent what common underlying condition across all children receiving procedures would produce such a specific pattern. Finally, it is also possible that elements of procedural experience other than anesthesia exposure, such as a stress response to surgery and pain, may affect neurodevelopment. Thus, these findings cannot directly demonstrate causality but should be interpreted in the context of other animal and human data.52
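The general idea behind inverse probability of treatment weighting can be illustrated with a minimal Python sketch. It assumes that each child already has an estimated probability of belonging to each exposure group given his or her covariates. The function names, propensity values, and scores are hypothetical, and the sketch does not reproduce the propensity model or the weighting details used in this study.

```python
def iptw_weights(groups, propensities):
    """Weight each subject by 1 / P(observed exposure group | covariates)."""
    return [1.0 / p[g] for g, p in zip(groups, propensities)]


def weighted_group_mean(outcomes, weights, groups, target):
    """Weighted mean outcome among subjects observed in the target group."""
    num = sum(w * y for y, w, g in zip(outcomes, weights, groups) if g == target)
    den = sum(w for w, g in zip(weights, groups) if g == target)
    return num / den


# Hypothetical inputs (NOT study data): observed exposure group, estimated
# group-membership probabilities, and a test score for three children.
groups = ["unexposed", "unexposed", "multiple"]
propensities = [
    {"unexposed": 0.8, "single": 0.1, "multiple": 0.1},
    {"unexposed": 0.2, "single": 0.4, "multiple": 0.4},
    {"unexposed": 0.5, "single": 0.25, "multiple": 0.25},
]
scores = [100, 110, 95]

weights = iptw_weights(groups, propensities)
unexposed_mean = weighted_group_mean(scores, weights, groups, "unexposed")
```

A child who was unlikely, given covariates, to fall into the group actually observed receives a larger weight, so that each weighted group better resembles the full sample on the measured covariates. Unmeasured confounders are not addressed by such weighting.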
Selection bias is possible given that not all who were invited accepted, and some characteristics of parents who accepted differed from those who did not. Parents who accepted may have been more concerned about their child’s development than those who did not, which could bias parent reports of behavior and learning if such concerns differ according to exposure status. However, the alignment of the current results with our prior population-based studies of attention deficit hyperactivity disorder and learning disabilities based on records review5–7,23 (in which potential selection bias is not an issue) argues against significant bias.
The need to adjust for residual imbalances even after propensity-guided sampling raises the potential for statistical artifacts. Sensitivity analyses revealed that although effect size estimates depended on the adjustment method, the overall pattern of results was little affected. Testing multiple secondary endpoints also has the potential to detect spurious associations (type 1 error), prompting an a priori analysis plan that specified how results would be interpreted (table 1) and the creation of study composites that reduced the number of comparisons. Although we did note a specific pattern of effects on these secondary endpoints, these results must be interpreted cautiously because they are secondary endpoints, and multiple comparisons were made.
Although this study represents the largest in the field to employ detailed neuropsychological assessments, there may still be limitations in the ability to detect small differences according to exposure category. For example, in singly exposed children, the mean effect sizes for some scores, including the Comprehensive Test of Phonological Processing and fine motor composite, were between 0 and the effect sizes for multiply exposed children, but their CI included 0 (i.e., they were not statistically significant). It is thus possible that even single exposures were associated with subtle changes in some scores but that our study lacked sufficient power to detect these differences.
Other potential limitations include that (1) although most characteristics of Olmsted County residents resemble those of other Minnesotans, some differ from the U.S. population as a whole;56 (2) neuropsychological tests were selected to assess important domains across a wide range of ages within a feasible testing period, but there are strengths and weaknesses for all tests, and some relevant domains may have been missed; (3) approximately half of subjects had exposure after the age of 3 yr, and these exposures could bias against finding differences if they too affect outcomes;8–10 and (4) the analysis examined mean effects over all children tested; this approach may not be sufficiently sensitive to detect significant effects if only some children are affected.
Exposure of children to procedures requiring general anesthesia before the age of 3 yr is not associated with lower Full-Scale intelligence quotient in later life (assessed as a primary outcome). In addition, single exposures are not associated with deficits in other neuropsychological domains (assessed as secondary outcomes). These findings should be reassuring to clinicians and families. However, multiple exposures are associated with modest decreases in processing speed and fine motor coordination but not changes in other neuropsychological domains. Parents report that multiply exposed children have more difficulties with behavior and reading. These secondary outcomes must be interpreted cautiously, but suggest the hypothesis, which will need to be evaluated in future work, that exposure to multiple procedures requiring general anesthesia is associated with a subtle, specific pattern of injury that may have consequences for subsequent learning and behavior.
The authors thank the staff at the Psychologic Assessment Lab (Mayo Clinic, Rochester, Minnesota) for their dedicated work in subject testing, Bradley Peterson, M.D. (Children’s Hospital of Los Angeles and the Keck School of Medicine at the University of Southern California, Los Angeles, California), for his helpful comments, and Young Juhn, M.D. (Mayo Clinic), for assistance with determining socioeconomic status as measured by the HOUSES index. The authors also recognize the extraordinary contributions of Robert Colligan, Ph.D., who was crucial in the design of the study and died during its conduct; he is sorely missed.
This document has been reviewed in accordance with U.S. Food and Drug Administration policy and approved for publication. Approval does not signify that the contents necessarily reflect the position or opinions of the Food and Drug Administration, nor does mention of trade names or commercial products constitute endorsement or recommendation for use. The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the Food and Drug Administration.
Supported by grant No. R01 HD071907 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development of the National Institutes of Health, Bethesda, Maryland. This study also utilized the resources of the Rochester Epidemiology Project, supported by grant No. R01 AG034676 from the National Institute on Aging of the National Institutes of Health, Bethesda, Maryland.
The authors declare no competing interests.