Part task training (PTT) focuses on dividing complex tasks into components followed by intensive concentrated training on individual components. Variable priority training (VPT) focuses on optimal distribution of attention when performing multiple tasks simultaneously with the goal of flexible allocation of attention. This study explored how principles of PTT and VPT adapted to anesthesia training would improve first-year anesthesiology residents' management of simulated adverse airway and respiratory events. The authors hypothesized that participants with PTT and VPT would perform better than those with standard training.
Twenty-two first-year anesthesia residents were randomly divided into two groups and trained over 12 months. The control group received standard didactic and simulation-based training. The experimental group received similar training but with emphasis on PTT and VPT techniques. Participant ability to manage seven adverse airway and respiratory events was assessed before and after the training period. Performance was measured by the number of correct tasks, whether a correct diagnosis was made, an assessment of perceived workload, and an assessment of scenario comprehension.
Participants in both groups exhibited significant improvement in all metrics after a year of training. Participants in the experimental group were able to complete more tasks and answered more comprehension questions correctly. There was no difference in perceived workload or the number of correct diagnoses between groups.
This study in part confirmed the study hypotheses. The results suggest that VPT and PTT are promising adjuncts to didactic and simulation-based training for management of adverse airway and respiratory events.
Part task training (PTT) and variable priority training (VPT) are techniques that have been developed by psychologists to optimize human performance when completing complex tasks. These techniques have been successfully implemented in a number of simulator-based professional training arenas and have led to higher pass rates in settings where students are asked to manage multiple tasks.1
Part task training is defined as the decomposition of large multicomponent tasks into a set of component tasks that, when trained as individual components either separately or in various combinations, can become highly automatized.2–4 This training reduces processing demands by streamlining effort associated with the individual elements of the task. Focused training also leads to more rapid development of automatic skills that might otherwise not be achieved in the context of the whole task. Variable priority training is a method for training people to flexibly distribute attention over multiple aspects of a task. Participants in VPT learn to coordinate and control how attention is allocated to components of a task and assign different processing priorities to the components as they are performed in concert. VPT fosters a flexible cognitive style that reduces the likelihood of cognitive tunnel vision.5
We have used PTT and VPT techniques as part of the didactic and simulation-based training for first-year anesthesia residents (CA-1s) over a 12-month period. Training was directed toward detection and appropriate treatment of adverse airway and respiratory events reported in the closed anesthesia malpractice claims database.6–8 These events were made up of unrecognized esophageal intubations as a result of difficult intubations,8 airway trauma, pneumothorax, airway obstruction, aspiration, and bronchospasm.7 These adverse events occur with a higher frequency in pediatric patients with more severe consequences (e.g., higher rate of mortality or brain injury).6 Airway management difficulties, impaired vigilance, inadequate supervision, poor judgment, diversion of attention, and misinterpretation and misuse of data were also noted as potential sources for bad outcomes associated with adverse airway and respiratory events.6
The aim of this study was to demonstrate that PTT and VPT would improve CA-1 management of simulated adverse airway and respiratory events. Compared with CA-1s with conventional simulator training, we hypothesized that CA-1s with PTT- and VPT-oriented simulator training would (1) complete more critical tasks essential to managing an adverse event, (2) reach a correct diagnosis more often, (3) report a decreased perception of workload, and (4) demonstrate an increased level of comprehension when managing simulated adverse airway and respiratory events than CA-1s with conventional training.
Materials and Methods
After University of Utah institutional review board approval (Salt Lake City, Utah), 22 University of Utah CA-1s were consented to participate over a 2-yr period. Training consisted of 12-month-long rotations in general adult anesthesia, pediatric anesthesia, and surgical intensive care practice. Participants were randomly divided into two equal-size groups: control and experimental. The random allocation process was computer generated. To identify potential differences between groups, preliminary in-training examination scores and US Medical Licensing Examination parts I, II, and III were compared between groups with a two-tailed Student t test. Operating room supervision, simulation training, and didactic sessions were conducted by board-certified or board-eligible staff anesthesiologists.
Simulation-based assessment of anesthesia provider skill is an emerging method of characterizing performance in managing critical events.9–11 Previous work has led to the development of checklist, time-based, and participant self-assessment methods of measuring performance. Using these techniques, participants cared for simulated patients using an adult and a pediatric human simulator (HPS version 5.55; METI, Sarasota, FL) to establish a baseline skill level. Scenario topics and performance expectations are presented in table 1.
A physiologic monitor (Datex AS/3; Helsinki, Finland) displayed the electrocardiogram, pulse oximeter (with tone), and capnogram waveforms and digital values for heart rate, blood pressure, oxygen saturation, end-tidal carbon dioxide, and fraction of inspired oxygen. All standard alarms were set to default limits. Mechanical ventilation was provided by an anesthesia machine (Narcomed 2B; North American Dräger, Telford, PA).
A preanesthetic evaluation, anesthetic record, and scenario introduction describing the recent course of events were prepared for six scenarios: three adult and three pediatric scenarios. The second adult scenario consisted of two parts, each treated as a unique scenario, for a total of seven adverse events. Participants reviewed the preanesthesia evaluation and scenario introduction in a quiet room before entering the simulation laboratory. Once in the simulation laboratory, participants were encouraged to think aloud. Video and audio information were recorded for all simulations. A video image of the physiologic monitor was inset into a video image of the participant caring for the simulated patient. A timer with a resolution to 1 s was superimposed on the video image. The order of scenario presentation was randomized for all participants at two levels: Participants were randomized first as to which set of scenarios they would receive first, pediatric or adult, and second as to the order of the three scenarios in each set. Each scenario lasted 7 min from the start of the adverse event. Testing of all participants was conducted over a 3-day period at the beginning of the academic year and again 12 months later. Two hours were allotted to each participant to complete all scenarios.
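The two-level randomization of scenario order described above can be sketched as follows. This is a minimal illustration rather than the procedure the authors actually ran: the scenario labels, the function name `scenario_order`, and the seeded generator are all assumptions introduced for the example.

```python
import random

# Hypothetical labels; the study used three adult and three pediatric scenarios.
ADULT = ["adult 1", "adult 2", "adult 3"]
PEDIATRIC = ["pediatric 1", "pediatric 2", "pediatric 3"]


def scenario_order(rng):
    """Return one participant's presentation order using two-level randomization."""
    # Level 1: decide which set is presented first, adult or pediatric.
    sets = [list(ADULT), list(PEDIATRIC)]
    rng.shuffle(sets)
    # Level 2: shuffle the three scenarios within each set.
    order = []
    for scenario_set in sets:
        rng.shuffle(scenario_set)
        order.extend(scenario_set)
    return order


# A seeded generator makes the (hypothetical) allocation reproducible.
order = scenario_order(random.Random(0))
```

Because the sets are shuffled before their contents, adult and pediatric scenarios are never interleaved, which matches the description of participants receiving one full set before the other.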
Anesthesiology faculty at the authors' institution developed by consensus a list of appropriate diagnostic and therapeutic tasks for each scenario. A series of three pilot studies was conducted with volunteer residents not assigned to a study group. A case report form was created that recorded the start of each adverse event, the time when the appropriate diagnosis was made, and the task lists for each scenario.
Validation of the task list was accomplished by distributing the list among six experts consisting of three board-certified anesthesiologists from the authors' institution (different from those involved in developing the task lists) and three from outside institutions. A modified Delphi technique was used to gain a consensus among experts12,13 and develop a weighted task score list for each scenario (appendix 1).
During each scenario, two investigators, blinded to participant group assignment, watched the video image in an adjacent room. Based on previous work, two observers were considered adequate to properly capture data of this type and achieve adequate interrater reliability.12,14 They independently checked off tasks from the task list as they were performed and recorded whether participants identified the correct diagnosis. Interrater reliability values for the number of tasks completed and the number of participants making a correct diagnosis were assessed with a κ measure of agreement. Weighted task scores were defined as the sum of all correctly performed items on the weighted task score list. The number of correct diagnoses was defined as the number of correct diagnoses each participant made in the seven scenarios.
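The two scoring quantities in this paragraph can be illustrated with a short sketch. It assumes that the κ measure is Cohen's κ computed on the two observers' item-by-item check marks, which the text does not state explicitly; the task names, weights, and observer ratings are hypothetical.

```python
def weighted_task_score(weights, performed):
    """Sum the weights of the tasks checked off as correctly performed."""
    return sum(weight for task, weight in weights.items() if task in performed)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items.

    Undefined (division by zero) when chance agreement equals 1, for
    example when both raters give every item the same single label.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    labels = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n) for label in labels
    )
    return (observed - expected) / (1 - expected)


# Hypothetical weighted task list and one participant's performed tasks.
weights = {"auscultate lungs": 4.5, "check capnogram": 5.0, "call for help": 3.0}
score = weighted_task_score(weights, {"auscultate lungs", "check capnogram"})

# Hypothetical check marks (1 = task performed) from two blinded observers.
observer_1 = [1, 1, 0, 1, 0, 1, 1, 0]
observer_2 = [1, 1, 0, 0, 0, 1, 1, 1]
kappa = cohen_kappa(observer_1, observer_2)
```

In this made-up example the observers agree on six of eight items, but because chance agreement is already about 0.53, κ is only about 0.47, which is why κ rather than raw percent agreement is reported.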
After each scenario, participants completed a comprehension questionnaire (appendix 2). The number of correct responses to comprehension questions was defined as the number of correct responses to questions presented in appendix 2 for each participant. After each scenario, participants also completed a self-assessment of their perceived physical and cognitive workload using the National Aeronautics and Space Administration Task Load Index (NASA-TLX) questionnaire.15 The NASA-TLX evaluated six areas: mental demand, physical demand, temporal demand, self-assessment of performance, self-assessment of effort, and self-assessment of frustration. From these six areas, a composite score was derived for each scenario.
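The text does not say how the composite NASA-TLX score was derived from the six areas. The sketch below shows two common conventions, an unweighted mean of the subscale ratings and the weighted version in which each subscale's weight is its tally from 15 pairwise comparisons; which one was used in this study is an assumption left open here, and the ratings, weights, and 0 to 100 scale are hypothetical.

```python
SUBSCALES = (
    "mental demand",
    "physical demand",
    "temporal demand",
    "performance",
    "effort",
    "frustration",
)


def raw_tlx(ratings):
    """Unweighted composite: mean of the six subscale ratings."""
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)


def weighted_tlx(ratings, weights):
    """Weighted composite: weights are tallies from 15 pairwise comparisons."""
    if sum(weights[s] for s in SUBSCALES) != 15:
        raise ValueError("pairwise-comparison tallies must sum to 15")
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / 15


# Hypothetical ratings (0-100) and tallies for one participant and scenario.
ratings = dict(zip(SUBSCALES, (70, 20, 60, 40, 50, 30)))
weights = dict(zip(SUBSCALES, (5, 0, 4, 2, 3, 1)))

raw_score = raw_tlx(ratings)
weighted_score = weighted_tlx(ratings, weights)
```

The two conventions can give noticeably different composites for the same ratings, so comparisons across studies should use the same one.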
Training interventions were of identical duration for both study groups; however, the teaching method differed between groups. The control group received standard didactic and simulation training. The experimental group received PTT-based didactic sessions and VPT-based simulation sessions.
All participants in both groups received forty 45-min didactic sessions. The didactic sessions consisted of grand rounds, case conferences, textbook chapter reviews,16 and visiting professor lectures. For 15 of the 40 didactic sessions, the control group received instruction on airway management, management of a difficult airway, cardiopulmonary physiology, and administration of anesthesia to patients with cardiopulmonary disease, and the experimental group received 15 PTT sessions focused on information from four competence areas: the American Society of Anesthesiologists Difficult Airway Algorithm,17 a differential diagnosis for hypoxia, treatment options for each item in the differential diagnosis, and knowledge of normal ranges of cardiopulmonary variables and key relations between selected variables. Study sheets were created for each of these areas and were distributed to all participants in both groups at the beginning of the 12-month intervention period.16,18,19
As a study aid, computerized flash cards were developed to review study sheet content (Java; Sun Microsystems Inc., Santa Clara, CA). A personal computer was used to automate question presentation and to record response time and accuracy. The flash cards were time limited, randomly reintroduced incorrectly answered questions, presented the correct response when an incorrect answer was entered, and recorded and presented the number of correctly answered questions.
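The flash card behavior described above (a time limit, random reintroduction of missed questions, display of the correct response, and a running count of correct answers) can be sketched as follows. The original tool was written in Java and its code is not given; this Python sketch is an independent illustration, and the function name, the 10-s limit, the rule that a late answer counts as incorrect, and the card content are all assumptions.

```python
import random
import time


def run_flash_cards(cards, get_answer, time_limit_s=10.0, rng=None, max_attempts=100):
    """Present cards until each is answered correctly within the time limit.

    cards: list of (question, correct_answer) pairs.
    get_answer: callable that takes a question and returns the response.
    Returns (number_correct, log); log holds (question, is_correct, seconds).
    """
    rng = rng or random.Random()
    queue = list(cards)
    log = []
    number_correct = 0
    while queue and len(log) < max_attempts:
        question, correct_answer = queue.pop(0)
        start = time.monotonic()
        response = get_answer(question)
        elapsed = time.monotonic() - start
        # Assumption: a response given after the time limit counts as incorrect.
        is_correct = response == correct_answer and elapsed <= time_limit_s
        log.append((question, is_correct, elapsed))
        if is_correct:
            number_correct += 1
        else:
            # Show the correct response, then reinsert the card at a random position.
            print(f"Correct answer: {correct_answer}")
            queue.insert(rng.randint(0, len(queue)), (question, correct_answer))
    return number_correct, log


# Hypothetical "qualitative assessment" cards.
answers = {
    "Oxygen saturation of 85% is (high, low, normal)?": "low",
    "End-tidal carbon dioxide of 40 mm Hg is (high, low, normal)?": "normal",
}
attempts = {}


def scripted_answer(question):
    # Hypothetical learner: misses each card once, then answers correctly.
    attempts[question] = attempts.get(question, 0) + 1
    return answers[question] if attempts[question] > 1 else "?"


number_correct, log = run_flash_cards(
    list(answers.items()), scripted_answer, rng=random.Random(1)
)
```

Passing the answer source in as a callable keeps the session logic separate from the user interface, so the same loop can be driven by keyboard input or by a scripted learner as in this example.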
Three types of flash cards were developed. The first type, “fill in the blank,” introduced information by having participants look up answers to questions and then fill in blanks. After the first three PTT sessions, participants were asked to fill in the blank from memory. The second type, “qualitative assessment,” was designed to solicit a qualitative evaluation of values from the study sheets (e.g., a qualitative response of high, low, or normal for selected vital signs). The intent was to accelerate interpretation of physiologic monitor and ventilator values. The third type, “patient management,” was designed to put into practice information contained in the study sheets. Participants were presented with cardiopulmonary data and were asked to comment on their state (high, low, or normal) and identify the most likely diagnosis and treatment consistent with the variable profile.
During each session, 25 min was allocated to flash card use. Participants were not allowed to use the electronic flash cards outside of the 15 didactic sessions dedicated to PTT. The remaining time was dedicated to small group discussions where participants were asked to recall information from the study sheets in the presence of their peers and discussion proctor. The goal of PTT was to achieve a level of mastery of the study sheet material such that it would be easily recalled during stressful moments.
All participants in both groups received five 90-min simulation sessions covering topics on the difficult airway, hypoxia, hypertension, tachycardia, bradycardia, and hypovolemia. Scenarios were preprogrammed into the simulator computer. Teaching objectives were standardized for each session. Participants were divided into smaller subgroups of five or six to facilitate simulation training. Each participant managed at least one adverse cardiopulmonary event per 90-min session while the remaining participants observed. After each scenario, a short debriefing was conducted to review participant performance. In the control group, instruction focused on the teaching objectives; in the experimental group, instruction focused on the teaching objectives using VPT.
During each adverse cardiopulmonary event, VPT consisted of participants reviewing four areas of patient data: relevant segments of the patient history, a targeted physical examination (airway, breathing, and circulation), physiologic data from the monitors, and mechanical ventilation data. Participants were trained to quickly go through all the items contained within the VPT checklist (table 2). At the same time, participants were asked to synthesize and order a differential diagnosis during data collection and prioritize their therapeutic interventions according to the differential diagnosis. The main purpose of this technique was to ensure that information from all four areas of patient data was reviewed, to allocate additional attention as needed to accurately describe abnormal findings, and to reduce the likelihood of cognitive tunnel vision, all while managing an adverse event.
For the first two simulation sessions, participants were allowed to look at table 2. During the last three simulation sessions, they were asked to use table 2 from memory. After each scenario, participants who observed their peer managing an adverse event provided an item-by-item critique of their peer's performance using table 2. The goal was to ensure that all available data were considered in a timely and consistent manner and to ensure that participants flexibly distributed attention to abnormal findings.
After the 12-month training period, participants underwent an assessment of their skill in managing seven adverse events as described for the baseline assessment. The vignettes used to introduce the participant to each scenario were altered. The adverse events remained the same as those used in the baseline analysis.
Four metrics were compared for the effect of teaching method using statistical software (Statview, version 5.1; SAS Institute, Cary, NC): weighted task scores, the number of correct diagnoses, the number of correct responses to comprehension questions, and the NASA-TLX scores. Weighted task scores were compared with a repeated-measures multivariate analysis of variance (MANOVA). Response variables were the weighted task scores for each scenario before and after 1 yr of training. Interactions between teaching method, results before and after 1 yr of training, and the seven scenarios were explored. MANOVA statistical tests were performed with the Roy largest root criterion transformed to an F statistic; statistical significance was declared for α < 0.05. If MANOVA tests were significant, a post hoc analysis between teaching methods by individual scenario was performed using a Bonferroni/Dunn test.
The numbers of correct diagnoses for all scenarios were compared between teaching methods with a Kruskal–Wallis test. If significant, paired comparisons within each group before and after training were made with a Wilcoxon signed rank test, and unpaired comparisons between groups before and after training were made with a Mann–Whitney U test. To account for multiple comparisons, a Bonferroni-corrected P value less than 0.01 was required to declare significance at a nominal α of 0.05. A similar approach was taken with the number of correct responses to comprehension questions. Composite NASA-TLX scores for all scenarios were compared between groups with repeated-measures MANOVA.
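The Bonferroni arithmetic behind the 0.01 threshold can be made explicit. The correction divides the nominal α by the number of comparisons, so a threshold of 0.01 at α = 0.05 corresponds to five comparisons; that count is inferred here from the two stated values and is not given in the text. The function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison P value threshold that preserves the family-wise alpha."""
    return alpha / n_comparisons


def is_significant(p_value, alpha=0.05, n_comparisons=5):
    """Apply the corrected threshold to a single comparison."""
    return p_value < bonferroni_threshold(alpha, n_comparisons)


# Nominal alpha of 0.05 spread over an assumed five comparisons.
threshold = bonferroni_threshold(0.05, 5)

# The between-group comparison of correct diagnoses reported in the Results.
diagnosis_difference = is_significant(0.249)
```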
Results
Of the 22 participants enrolled, 1 did not complete the study. That participant pursued training in a different specialty. Eleven (3 female, 8 male) and 10 (3 female, 7 male) participants were randomly assigned to the control and experimental groups, respectively. No differences in US Medical Licensing Examination parts I, II, and III and preliminary year in-training examination results were observed between groups (P values of 0.712, 0.297, 0.609, and 0.182, respectively). All participants completed the baseline assessment, the simulation sessions, the didactic sessions, and the postintervention assessment.
Using κ as a measure of agreement between two observers, there was substantial interobserver agreement (κ between 0.6 and 0.79)20; interrater reliability values for the task list scoring and correct diagnoses were 0.70 and 0.82, respectively, at baseline and 0.72 and 0.86, respectively, after 12 months of training.
Weighted task scores for each scenario before and after training are presented in table 3. The repeated-measures MANOVA revealed that both groups demonstrated a significant increase in their weighted task scores after 12 months of training (P < 0.0001); there was a significant difference between groups after 12 months of training (P = 0.014; fig. 1A). A post hoc analysis of the posttraining weighted task scores by scenario indicated a significant difference between groups in scenario 4, aspiration pneumonia (P = 0.005; fig. 1B).
The numbers of correct diagnoses for each scenario before and after training are presented in table 3. Participants improved their number of correct diagnoses from baseline to 12 months after training (P < 0.001). After training, there was no difference between groups in determining the correct diagnosis (median count of 5 out of 7 and 4 out of 7 for the experimental and control groups, respectively; P = 0.249; fig. 2A).
Participants improved their number of correct responses to comprehension questions from baseline to 12 months after training (P < 0.0001; fig. 2B). Both groups answered approximately 15 of the questions correctly at baseline. After training, the experimental group completed more questions correctly (median count of 23 out of 30) than participants in the control group (median count of 19 out of 30; P < 0.001).
Participants reported a decrease in their composite NASA-TLX scores of perceived workload between baseline and posttraining (P < 0.001); however, there was no difference between groups (P = 0.259; fig. 3).
Discussion
We explored how PTT and VPT techniques applied over a 12-month period would improve CA-1 management of adverse airway and respiratory events. The results in part confirmed our hypotheses. Participants with PTT and VPT outperformed those in the control group in two of three performance metrics. In the remaining one and the assessment of perceived workload, there was no difference between groups.
After a year of training, we observed an improvement in all performance metrics in both groups. Whether this is a function of the clinical training, didactic training, or simulation-based training is difficult to discern. This finding also suggests that the measurement methods used have construct validity for assessing resident performance in managing adverse airway and respiratory events.
Participants in the experimental group were able to complete on average 9% more tasks than those in the control group. Those trained with PTT and VPT may have allocated their attention more effectively and were mentally prepared to assess and prioritize the information available to them in an efficient manner. This may be a function of an efficient yet complete survey of all available information and use of easily retrieved mental templates to organize and use information to implement therapeutic interventions. Another potential explanation for this finding is that the PTT and VPT were directly applicable to managing the seven simulated adverse events and may be a function of the focused training time rather than the training techniques.
A large improvement in the number of correct diagnoses (fig. 2) was observed from baseline (35–39% correct) to the end of training (61–73%) independent of group assignment. A subtle trend suggested that PTT and VPT may improve diagnostic performance, but the trend was not significant. Interpretation of this result is difficult because in several instances the correct diagnosis was mentioned amid verbalization of a list of potential diagnoses. Furthermore, participants may have known the correct diagnosis but neglected the “talk aloud” protocol. One nuance of this analysis is the possibility that VPT may have slowed participants in the experimental group in reaching a diagnosis within 7 min. Although we did not ask participants in the experimental group whether they used VPT to reach their diagnosis, it is conceivable that going through the items in table 2 would prolong the time required to reach a diagnosis.
Although there is overlap, clinical task completion and diagnostic accuracy differentiate two important aspects of adverse event management. Clinical task completion reveals what participants do while managing an adverse event under stressful conditions but does not explicitly reveal their diagnostic assumptions. By contrast, diagnostic accuracy reveals the participant's verbal acknowledgment of the diagnosis, whereas it does not describe what the participant does with that information. Our results indicate that PTT and VPT improved task performance under stressful conditions but did not improve diagnostic accuracy.
After 12 months of training, perceived workload decreased by 30% from baseline. This is an expected finding given that participants, with a year of clinical experience, are more likely to have a better subjective belief about their performance than after 1 week of residency. PTT and VPT made no impact on perceived workload. It is interesting to note that additional work generated by participants using cognitive aids developed through PTT and VPT did not contribute to increasing the perceived workload.
The finding that participants in the experimental group improved their number of correct responses to comprehension questions from baseline more than the control group suggests that PTT and VPT allowed participants to collect and retain more pertinent information about the adverse event. This may be a direct result of participants in the experimental group completing more diagnostic and therapeutic tasks. In so doing, they gained a better understanding of the adverse events and were able to answer more questions correctly.
In terms of study limitations, general concerns with simulation-based evaluations include the difficulty of creating scenarios that mimic real patients,21 controlling for atypical participant behavior (i.e., a hypervigilant or cavalier participant),22 and controlling for scenario recognition from the beginning of the year to the end of the year. In this study, the scenario vignettes were changed, but the adverse events were identical.
An additional confounder to this study is that conventional training may already contain many elements of PTT and VPT, diminishing the usefulness of these techniques. Although we found significant improvements with PTT and VPT, the long-term benefit beyond 1 yr is unknown. Furthermore, the 15 PTT and 5 VPT sessions were spread out over 12 months and led to long time periods between training sessions. Shortening the time between training sessions may have improved use of these techniques during the end-of-year testing phase of this study.
When interpreting results presented in figure 1, participants who performed lifesaving maneuvers without completing all of the tasks for a given scenario listed in table 1 may have been inadvertently penalized. For example, should a participant have immediately recognized an esophageal intubation and immediately intubated the trachea without recognizing low saturation, noting the absence of carbon dioxide, or auscultating the lungs, the participant would have successfully managed the adverse event but would have been penalized for not completing several tasks. This limitation underscores the importance of using the Delphi method to weight absolutely essential tasks higher than other useful, but not essential, tasks. It also emphasizes the need to use other assessment methods alongside task lists.
In summary, we implemented PTT and VPT techniques aimed at improving management of adverse airway and respiratory events as part of CA-1 training. The goal of these training techniques was to develop a flexible cognitive style that would increase vigilance and produce a high level of automaticity during critical events. After a year of didactic and simulation-based training, resident performance significantly improved in both groups. PTT and VPT led to modest improvements in performance when compared with conventional training. Further work is warranted to explore the potential value of using innovative teaching techniques such as PTT and VPT to better prepare anesthesia personnel for their role in managing adverse events.
References
Appendix 1: Modified Delphi Analysis to Develop a Weighted Task Score List for Each Scenario
Anesthesiology faculty at the authors' institution created a list of appropriate diagnostic and therapeutic tasks for each of the seven scenarios. Three board-certified anesthesiologists from the authors' institution different from those who created the lists and three from outside institutions were independently and iteratively asked to rank tasks according to importance using a Likert scale, where 1 = not important and 5 = extremely important. Median and interquartile ranges for each task and suggested additions and eliminations to the task list were redistributed to the experts. Experts were then asked to consider modifying their rankings that deviated from the median or justify their reasons for not changing scores. This process was repeated until an acceptable level of concordance between experts (overall concordance >0.75 using a Kendall W statistic) was observed. The content validity index of the task list was calculated as the percentage of tasks rated by the experts as 4 or 5. The task lists were weighted from 1 to 5 based on median rankings from the final round.
Acceptable concordance among experts was reached within three rounds. The Kendall coefficients of concordance (Kendall W) for rounds 1, 2, and 3 were 0.312, 0.532, and 0.752. The final scoring system consisted of 110 tasks weighted for importance with a total score of 520.5. The content validity of the final weighted task list as a percentage of items ranked greater than 3 was very high (95%) (table 4).
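The two statistics used in this appendix can be sketched in a few lines. The sketch computes Kendall's W with the standard correction for tied ranks, which matters because Likert ratings produce many ties; whether the authors applied this correction is not stated. The content validity index is computed here from each task's median rating, which is also an assumption about how the six experts' ratings were aggregated. The ratings are hypothetical.

```python
from collections import Counter
from statistics import median


def average_ranks(scores):
    """Rank scores from 1 to n, giving tied scores the mean of their positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def kendall_w(ratings):
    """Kendall's W for m experts (rows) rating n tasks (columns), tie corrected.

    Undefined (division by zero) if every expert gives all tasks the same rating.
    """
    m, n = len(ratings), len(ratings[0])
    rank_sums = [0.0] * n
    tie_term = 0
    for row in ratings:
        for item, rank in enumerate(average_ranks(row)):
            rank_sums[item] += rank
        tie_term += sum(t ** 3 - t for t in Counter(row).values())
    mean_rank_sum = m * (n + 1) / 2
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * tie_term)


def content_validity_index(ratings):
    """Fraction of tasks whose median expert rating is 4 or 5."""
    n_items = len(ratings[0])
    medians = [median(row[i] for row in ratings) for i in range(n_items)]
    return sum(value >= 4 for value in medians) / n_items


# Hypothetical Likert ratings: three experts (rows) by four tasks (columns).
ratings = [[5, 4, 2, 5], [5, 5, 3, 4], [4, 4, 2, 5]]
w = kendall_w(ratings)
cvi = content_validity_index(ratings)
```

In a Delphi process, W would be recomputed after each round, and iteration would stop once it exceeded the chosen threshold (0.75 in this study).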
Appendix 2: Comprehension Questions for Each Scenario
Correct answers are marked with an asterisk (*).
Adult Scenario 1: Unanticipated Difficult Airway
The room air oxygen hemoglobin saturation was (low*, normal, high).
The arterial partial pressure of oxygen was (low*, normal, high).
The relation between the room air oxygen saturation and arterial partial pressure of oxygen is (normal, suggestive of an intrapulmonary shunt*, suggestive of an enlarged functional residual capacity).
Structures or conditions that impaired successful intubation of this patient include all of the following: (tongue*, posterior pharynx*, laryngospasm, bronchospasm, nasopharyngeal airway).
Transtracheal jet ventilation would have been a logical next step in airway management (true, false*).
Adult Scenario 2: Bronchospasm on Emergence and Development of a Tension Pneumothorax
After extubation, airway resistance was (low, normal, high*).
After reintubation, the patient developed (endobronchial intubation, fat embolism, negative-pressure pulmonary edema, an increase in the vital capacity, none of these*).
After reintubation, the peak airway pressure was (low, normal, high*) and the measured tidal volume was (low*, normal, high).
Before extubation, the depth of the endotracheal tube was (deep, normal*, shallow).
Auscultation of lung sounds after reintubation revealed (coarse breath sounds, wheezing*, absent breath sounds, normal breath sounds).
Adult Scenario 3: Aspiration after Induction
After intubation, the delivered tidal volume was (low*, normal, high).
The breath sounds were (normal, coarse*, wheezing, absent).
The pulmonary compliance was (low*, normal, high).
The airway resistance was (low, normal*, high).
After aspiration, bronchoalveolar lavage is warranted (true, false*).
Pediatric Scenario 1: Esophageal Intubation
The peak airway pressure after intubation by the anesthesia resident you were supervising was (low, normal*, high).
The endotracheal tube cuff was intact (true*, false).
The end-tidal carbon dioxide after intubation by the anesthesia resident you were supervising was (low*, normal, high).
The delivered tidal volume after intubation by the anesthesia resident you were supervising was (low*, normal, high).
The most likely source of hypoxia was (inadequate ventilation*, bronchospasm, laryngospasm, pulmonary embolism, carboxyhemoglobin).
Pediatric Scenario 2: Laryngospasm, Bronchospasm, No Intravenous Access
After intubation, the peak airway pressure was (low, normal, high*).
Bronchospasm is an example of (dead space, intrapulmonary shunt*, chest wall rigidity, aspiration of foreign body).
After intubation, the delivered tidal volume was (low*, normal, high).
The inspiratory flow rate was set at (low, medium*, high).
The patient became hypoxic because of a ventilator malfunction (true, false*).
Pediatric Scenario 3: Faulty Ventilator Circuit
The peak airway pressure was (low*, normal, high).
The delivered tidal volume was (low*, normal, high).
The inspiratory flow rate was set at (low, medium*, high).
The endotracheal tube cuff was intact (true*, false).
The wall oxygen source was low (true, false*).