Abstract
Although feedback conversations are an essential component of learning, three challenges make them difficult: the fear that direct task feedback will harm the relationship with the learner, faculty cognitive biases that interfere with eliciting the frames that drive trainees’ performances, and time pressure. Decades of research on developmental conversations suggest solutions to these challenges: hold generous inferences about learners, subject one’s own thinking to test by making it public, and inquire directly about learners’ cognitive frames.
The authors conducted a randomized, controlled trial to determine whether a 1-h educational intervention for anesthesia faculty improved feedback quality in a simulated case. The primary outcome was the quality of the feedback conversation between faculty and a simulated resident (actor), measured as the average of six elements of a behaviorally anchored rating scale; feedback patterns were also assessed with an objective structured assessment of feedback. Seventy-one Harvard faculty anesthesiologists from five academic hospitals participated.
The intervention group scored higher when averaging all ratings (4.2 ± 1.28 vs. 3.8 ± 1.22; P < 0.0001). Scores for individual elements showed that the intervention group performed better in maintaining a psychologically safe environment (4.3 ± 1.21 vs. 3.8 ± 1.16; P = 0.001) and in identifying and exploring performance gaps (4.1 ± 1.38 vs. 3.7 ± 1.34; P = 0.048), and more frequently emphasized the professionalism error of failing to call for help over the clinical topic of anaphylaxis (66 vs. 41%; P = 0.008).
Quality of faculty feedback to a simulated resident was improved in the intervention group in a number of areas after a 1-h educational intervention, and this short intervention allowed a group of faculty to overcome enough discomfort in addressing a professionalism lapse to discuss it directly.
Feedback conversations are a critical part of effective teaching
Overcoming concerns about the faculty–resident relationship and understanding the learner’s frame of reference are challenging
The investigators conducted a randomized trial evaluating the effect of a 1-h simulation-based training session on feedback quality
Training improved faculty ability to maintain a psychologically safe environment, explore the resident’s frame of reference, and address professionalism along with technical issues
Direct and timely formative feedback to improve a learner’s ongoing practice is one of the strongest predictors of improved performance in learning.1–7 Such feedback conversations are difficult for faculty because of three challenges: (1) worry that honest, task-relevant critique will harm the relationship with the learner,8–11 (2) cognitive biases that impede faculty’s ability to diagnose the cognitive frames driving trainees’ performances10,12–15 and to tailor the feedback conversation appropriately,16 and (3) time pressure that makes the first two challenges even more acute.
Faculty typically resolve the “task versus relationship” dilemma in feedback conversations by emphasizing one over the other. Although skillful direct task feedback can be effective,12,17,18 harsh task feedback can (1) degrade performance;17–19 (2) prevent reflection,13 absorption, and retention of knowledge;20–23 or (3) harm the pair’s capacity to talk about difficult topics in the future.24,25 Alternatively, feedback that emphasizes relationship preservation usually employs kindly leading questions and strategic silence to camouflage the instructor’s critique and guide the learner to the instructor’s hidden answer.8,24–26 Although benign in intent, this approach is time consuming, conveys the meta-message that errors are not discussible, and obscures important task-related feedback.24,26
Diagnosing learners’ actual learning needs by uncovering the cognitive frames driving their actions poses a second challenge. Changing anyone’s knowledge, skills, or attitudes requires learning by both the teacher and the learner, at the level of visible actions and of invisible cognitive processes.27–33 Faculty’s ability to diagnose learner thought processes can be impaired by cognitive biases, such as mistaking their internal conclusions for reality,34 erroneously estimating the frequency of behaviors, or attribution error.35–38 These instructional mistakes are not the result of inadequate clinical expertise; rather, they are driven by ubiquitous cognitive biases propelling instructors to assume (often erroneously) that they know the reasons behind other people’s actions.16,39
Given the above conditions, specific, corrective feedback is exceedingly rare, measured as less than 3% of all utterances in some teaching encounters.40 Feedback on the delicate topic of professionalism is even rarer.41
We know little about how anesthesia faculty manage these feedback challenges, typical patterns of how anesthesia faculty provide feedback, or the impact of efforts to enhance their skills. A recent review of faculty development interventions found no assessment of faculty’s teaching or feedback skills in the acute care setting.42 A handful of studies outside the field of anesthesia examined the impact of interventions to improve faculty and resident feedback skills in the context of precepting. They reported only modest positive impact.40,41,43,44
To ascertain how anesthesia faculty handle these feedback challenges with and without training, and to characterize anesthesia faculty’s feedback patterns, we conducted a randomized, controlled trial to test the impact of an educational intervention as measured by rating feedback skills in a simulated case. The intervention addressed (1) how to manage the perceived task versus relationship dilemma by holding generous inferences about a learner; (2) how to provide direct, specific feedback, pairing a statement of observable action (“advocacy”) from the instructor’s point of view with an open-ended question (“inquiry”) designed to determine the learner’s thought processes;26,29,30,45 and (3) how to address a specific lapse in professionalism by using a combination of these techniques.
Materials and Methods
With approval of the Institutional Review Board (Boston, MA), a balanced randomization (1:1), rater-blinded, parallel-group–controlled experiment was conducted during a recurring, mandatory, simulation-based crisis management course for practicing anesthesiologists from five academic hospitals in greater Boston, Massachusetts (Children’s Hospital of Boston, Brigham and Women’s Hospital, Massachusetts General Hospital, Beth Israel Deaconess Medical Center, and Newton-Wellesley Hospital). To detect an improvement in feedback of two thirds of a unit on the rating scale with a one-sided 5% significance level and a power of 80%, a sample size of at least 30 subjects per group was required. It was anticipated that 10 subjects’ cases would be needed for rater training and that there would be other sources of attrition of subjects. Thus, the experiment was conducted during 71 regularly scheduled (approximately weekly) 6-h simulation-based teamwork and communication courses, each with four simulation scenarios, from March 2008 to February 2010. In advance of the first course session, each course day was designated as an intervention day or a control day from a computer-generated random number table. Each course hosted between three and eight anesthesiologists. No more than two individuals from any of the five hospital departments were present during a given course. At the beginning of the course day, a single course participant was randomly selected to be the experimental subject by arbitrarily assigning an integer to each participant and rolling one or two dice. Figure 1 shows a flow diagram of the experiment.46
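As an illustrative check of the sample-size statement above, the calculation can be reproduced with standard power-analysis software. The sketch below (Python/statsmodels) assumes a rating-scale standard deviation of roughly one unit; that assumption is ours and is not stated in the text.

```python
# Reconstruction of the a priori sample-size calculation (illustrative only).
# Assumption (not stated in the text): rating-scale SD of ~1 unit, so a
# two-thirds-of-a-unit improvement corresponds to an effect size of ~0.67.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.67,      # two thirds of a rating unit / assumed SD of 1
    alpha=0.05,            # one-sided 5% significance level
    power=0.80,            # 80% power
    alternative="larger",  # one-sided test for an improvement
)
print(round(n_per_group))  # ~28-29, in the neighborhood of the stated 30 per group
```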
Each course day started with a 1-h standardized introduction to the course and a simulation scenario and debriefing unrelated to this experiment. After the unrelated scenario and debriefing, participants in the intervention group were exposed to the intervention itself, a 1-h didactic, video assessment, and role-play workshop on how to resolve the perceived task versus relationship dilemma, how to diagnose trainees’ learning needs, and how to address different kinds of errors including professionalism lapses (appendix 1). The videos in this case were three prepared examples of a simulated faculty member engaged in a feedback conversation with a simulated resident after the resident committed a breach in sterility. Each video depicted different feedback skills and demonstrated the advantages of a “frame-based feedback” approach. Then the experimental case scenario was conducted. On control days, the 1-h workshop on principles of giving effective feedback was conducted after the experimental scenario was completed.
The experimental scenario was designed with two parts. The first part allowed the subject to observe a resident commit four errors while managing a simulated patient. In the second part, the subject conducted a feedback conversation with the resident about his performance. For each error, the resident had a set of standardized, unspoken reasons driving his actions (known as frames). These frames remained hidden unless the subject made appropriate inquiries during the feedback portion of the case scenario. Table 1 lists the four errors and their associated frames.
Scenario
All participants were given a written clinical stem of the case before it started. They were told that they would watch an anesthesia resident at the “end of his first year” (CA1, or PGY-2) give an anesthetic from an observation window outside the operating room. Then one of them would be asked to give feedback to the resident on his conduct of the case. All participants, including the subject, were then brought to the observation window, where they watched the operative procedure begin.
The patient was a 25-yr-old woman with a medical history significant only for hypertrophic obstructive cardiomyopathy who was undergoing an open appendectomy. There were four clinical actors in the scenario: a surgeon, a scrub technician, a circulating nurse, and an anesthesia resident (“resident”). The patient had already been anesthetized and intubated and was stable, with a heart rate (HR) of 60 beats/min, a blood pressure of 115/75 mmHg, and an Spo2 of 99%. During the surgical prep, the patient’s HR rose to 70 beats/min and the systolic blood pressure fell to approximately 95 mmHg. The surgeon noticed this and asked whether the patient was okay, to which the anesthesia resident responded that he would need to increase the anesthetic agent to blunt the rising HR. The anesthesia resident increased the inhaled sevoflurane concentration from 1 to 4% (with high fresh gas flows), a relative overdose. The HR returned to 65 beats/min and the systolic blood pressure to 100 mmHg, and the surgeon inquired whether the antibiotic had been given. This distracted the anesthesia resident from decreasing the sevoflurane, which led to the commission of Error 1: leaving the sevoflurane at 4%.
The surgeon requested a 2-min warning to allow the antibiotic to circulate, which led the resident to give the entire dose as an intravenous (IV) bolus. The HR rose to 70 beats/min and the systolic blood pressure fell to less than 100 mmHg but, unlike in the previous episode, the Spo2 decreased to 90%, the peak inspiratory pressure increased to 35 cm H2O, and the ETco2 waveform developed a sloped early expiratory phase, indicative of respiratory obstruction (from anaphylaxis). The resident then gave esmolol (20 mg IV bolus) in response to the increased HR (commission of Error 2). The circulating nurse, noticing the deteriorating vital signs, asked whether the resident would like help, which the resident declined. After further deterioration of the vital signs (blood pressure = 80/60 mmHg, HR = 90 beats/min, Spo2 = 85%, peak inspiratory pressure = 40 cm H2O, ETco2 = 28 mmHg with a sloped early expiratory phase), the circulating nurse again asked whether she should call the attending anesthesiologist; the resident again declined help (commission of Error 3).
The surgeon then inquired about a rash that had just developed (“Does this patient have a rash?”), and the resident replied that the patient’s skin was blotchy at the start (“I don’t know, her skin was kind-of blotchy to begin with….”).
At this point the simulation was paused. One of the investigators joined the participants at the now-closed observation window to conduct and guide a brief (3–5 min) discussion of the case to this point, to ensure a differential diagnosis was generated with anaphylaxis at the top of the list. Then one of the other participants (not the study subject) was asked to go into the operating room to help the resident, and the simulation was resumed.
The case continued with the vital signs still compromised, now under the direction of the course participant. The resident was cooperative and competent. If asked to give phenylephrine, the resident would reply, “Ok, but I have to make up a drip [infusion]. My attending doesn’t like me to make up phenylephrine ahead of time as he says it is a waste of money” (commission of Error 4). If the participant did not ask for phenylephrine, the resident asked whether phenylephrine would be useful in this case and then made the statement above regarding a previous attending’s preference. The resident treated the patient in whatever manner the course participant directed. If the initial treatment did not include epinephrine, the resident would suggest treating the anaphylaxis with a small dose of epinephrine in spite of the relative contraindication in a patient with hypertrophic obstructive cardiomyopathy. Once the patient was treated with epinephrine in any dose (all course participants gave epinephrine within a 5-min period), she recovered with near-normal vital signs and the simulation was ended.
Upon completion of the simulation case, the course participants returned to a debriefing room, where a trained facilitator (D.B.R. or J.W.R.) conducted a discussion of the four types of errors described by Bosk.47 The four errors committed by the observed resident were then listed on a whiteboard and each was assigned to one of the four categories.
These errors were briefly discussed among the participants, and in all cases consensus was reached that the normative error of refusing to call for help was the most troubling.
The experimental subject was asked to move to “the coffee room,” where the resident was taking a short break between cases, to give the resident feedback on his performance. The subject was suitably oriented to this exercise. Throughout the feedback session, the resident responded to the subject in a semiscripted manner according to a predetermined set of rules (appendix 2). The resident would try to keep the conversation as normal as possible, responding to direct questions only, answering at the level of a competent resident nearing the completion of his CA1 year, and being slightly evasive or defensive if asked vague questions about his performance. When offered an observation and asked a relevant question, the resident would reveal the first part of his mental frame, and he would reveal more of it if the subject pursued the topic further. If the subject lectured or taught about something that was not directly relevant to the resident’s frame, the resident would listen and respond neutrally, only nodding or saying “OK.” If the subject seemed to be finished with the feedback session, by explicitly saying so, by repeating the same material, or by summarizing the session, the resident would say that his pager had gone off and he needed to go do the next case.
After completing the feedback session, the subject returned to the debriefing room where an extensive discussion about the case and feedback was facilitated by one of the investigators (D.B.R. or J.W.R.).
The feedback session in the coffee room was video recorded for later analysis.
Feedback Performance Rating
The investigators developed a two-part rating instrument to assess the performance of the subjects in giving feedback. The first part was a behaviorally anchored rating scale (BARS) with six elements (fig. 2). This part was modified from another BARS tool that has been used for simulation debriefing, a specific form of feedback conversation, and has been partially validated.48 The average rating of the elements was intended to be a measure of feedback performance and was the primary outcome variable for this study. The second part of the instrument was an objective scoring of 12 aspects of the feedback session designed to capture patterns of feedback (fig. 3). This instrument was developed to mirror some of the generic feedback skills gleaned from the literature and expected to be discussed during the educational intervention.49
Feedback assessment instrument using a six-element behaviorally anchored rating scale.
Feedback assessment instrument using objective observation of 12 aspects of the feedback conversation. F/U = follow up; HOCM = hypertrophic obstructive cardiomyopathy.
Four blinded raters experienced in simulation, debriefing, and precepting residents or nurses, from different specialties (nursing education, endocrine surgery, pediatric anesthesiology, and intensive care medicine), were selected and trained to use the two-part instrument. Approximately 12 h of rater training was conducted using the first 10 videos, and those videos were then removed from the pool of study videos.50 Adequate calibration of raters was confirmed upon completion of training, when all the raters were within one point of each other on the three test videos. See appendix 3 for a full description of the rater training process.
Of the remaining 61 videos, three were excluded from performance rating because in two cases the video recording was found to be unusable and in one case the resident actor was absent and a substitute was used. Thus, video files from 30 control and 28 intervention cases, all using the same resident actor, were randomly distributed to the four raters such that each rater had a balanced number of cases from each group and two raters would rate each video.
Statistical Analysis
The institution at which the participants practiced anesthesia was compared between the intervention and control groups using a chi-square test. In addition, the years since completing medical school and anesthesia residency were computed for the participants in the intervention and control groups. A Mann–Whitney U test was used to test whether the experience level of the groups was different.
All ratings were entered into a spreadsheet for analysis. The consistency of feedback performance rating for all pairs of scores was analyzed by first averaging and rounding the element ratings and then computing the frequency of score pairs within one point of each other. Cohen’s κ-statistic with quadratic weighting was applied to these values as another measure of interrater reliability (VassarStats*). Videos with mean absolute differences in scoring greater than 1.5 points were independently rerated by one of the investigators, blinded to the experimental conditions. A tertium quid model of score resolution was used, in which the two closest scores were included in further analyses.50
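As a minimal sketch of these two steps (not the authors’ actual workflow, which used VassarStats), the quadratically weighted κ and the “closest two of three” resolution rule might be computed as follows; the score vectors and helper function are hypothetical.

```python
# Illustrative sketch of the interrater-reliability and score-resolution steps.
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical rounded average BARS scores (1-7 scale) for the same videos from two raters.
rater_a = np.array([4, 5, 3, 6, 4, 2, 5])
rater_b = np.array([4, 4, 3, 5, 6, 3, 5])

# Quadratically weighted Cohen's kappa as a measure of interrater reliability.
kappa = cohen_kappa_score(rater_a, rater_b, weights="quadratic")
print(f"weighted kappa = {kappa:.2f}")

# Tertium quid resolution: when two ratings differ by > 1.5 points, a third rating is
# obtained and the two closest scores are retained for further analysis.
def resolve_scores(a: float, b: float, c: float) -> tuple:
    pairs = [(a, b), (a, c), (b, c)]
    return min(pairs, key=lambda p: abs(p[0] - p[1]))  # keep the closest pair

print(resolve_scores(3.0, 5.0, 4.5))  # -> (5.0, 4.5)
```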
All the ordinal performance ratings for the intervention and control groups were computed, reported as mean ± 1 SD, and compared using a Mann–Whitney U test (VassarStats*), with a P value less than 0.05 considered significant. Performance ratings for each of the six elements were similarly compared between the intervention and control groups.
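For illustration, this kind of two-group comparison of ordinal ratings could be run with SciPy’s Mann–Whitney U test; the rating vectors below are hypothetical, not study data.

```python
# Illustrative Mann-Whitney U comparison of ordinal BARS ratings between groups.
from scipy.stats import mannwhitneyu

intervention = [5, 4, 6, 4, 5, 3, 6, 5]  # hypothetical element ratings
control = [3, 4, 3, 5, 2, 4, 3, 4]

u_stat, p_value = mannwhitneyu(intervention, control, alternative="two-sided")
print(f"U = {u_stat}, P = {p_value:.3f}")  # P < 0.05 considered significant
```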
The raters scored the 21 objective measures for each feedback performance video, and the results were transferred to a spreadsheet. The frequencies of the dichotomous scores for each objective measure were computed for the control, intervention, and combined groups. Where appropriate, a chi-square statistic was used to compare the intervention and control groups, with a P value less than 0.05 considered significant (VassarStats*).
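Likewise, the chi-square comparison of a dichotomous objective measure between groups amounts to testing a 2 × 2 contingency table; the counts below are illustrative only.

```python
# Illustrative chi-square test for one dichotomous objective measure.
from scipy.stats import chi2_contingency

# Rows: intervention, control; columns: behavior observed, not observed (hypothetical counts).
table = [[18, 10],
         [12, 18]]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {dof}, P = {p_value:.3f}")
```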
Results
Table 2 shows the demographic distribution of the subjects. There were no differences in institution or experience between the intervention and control group course participants.
Interrater Agreement
Two raters assessed each case, except for one case that was assessed by all four raters because of an error in distributing the videos. Including the case assessed by all four raters, 66% of all scores and 81% of averaged element scores were within one point. Excluding that case, moderate agreement was seen (κ = 0.53). Sixteen cases showed a between-rater average difference of 1.5 points or greater; these cases were independently rerated by a third rater, an investigator who did not participate in the conduct of the experiment. After score resolution, 88% of all score pairs were within one point and substantial agreement was achieved (κ = 0.72).
Quality of Feedback Measured by BARS
Averaging ratings across all six elements of the feedback rating scale, the intervention group scored higher (4.2 ± 1.28) than the control group (3.8 ± 1.22; P < 0.0001). Scores for individual elements are shown in table 3.
General Patterns of Feedback Assessed by Objective Structured Instrument
Twenty-eight percent of subjects in both groups executed an “entry phase” to the conversation by giving the resident an opportunity to state his or her reactions to the scenario or how the patient fared. Subjects in both the intervention and control groups explicitly stated they were about to give feedback in 49% of the cases and explicitly asked permission to do so in 20%. We analyzed the frequency of faculty members “previewing,” or verbally signaling that a feedback conversation was about to occur (an example would be, “I’d like to talk with you about your performance just now and give some feedback on A, B, and C.”). Subjects in the intervention group (18%) were more likely to use a preview statement to outline the upcoming feedback than the control group (10%; P = 0.05). Subjects in the intervention group more commonly used advocacy/inquiry language (24 vs. 9%, respectively; P = 0.04) and less commonly used “guess what I am thinking” questioning (13 vs. 34%, respectively; P = 0.01) than their peers in the control group as a primary style of giving feedback.
Intervention and control group subjects did not differ in their balance of talking and listening, with 66% being rated as talking more than listening, 32% balancing talking and listening, and 2% predominantly listening.
Subjects were rated as seeming activated (79%), neutral (16%), or deactivated (4%) and as displaying a pleasant (72%), neutral (22%), or unpleasant (6%) attitude, with no difference between groups. Raters scored their perception of how a trainee might feel receiving the given feedback as activated (68%), neutral (21%), or deactivated (11%) and as pleasant (59%), neutral (21%), or unpleasant (20%), regardless of subject grouping.
Thirty-five percent of subjects in both groups never checked the resident’s understanding of any feedback provided by the faculty. Closure, some form of summary or planning of next steps at the end of the conversation, was seen in 48% of feedback conversations in both the control and intervention groups.
The control and intervention groups differed in the emphasis they placed on different topics for feedback. Looking at how faculty distributed their time in the conversation, we found that the intervention group showed a greater frequency of emphasis on the normative/professionalism error of failing to call for help or accepting help when offered over the anaphylaxis treatment algorithm than the control group did (66 vs. 41%; P = 0.008). Conversely, emphasizing the anaphylaxis treatment algorithm over the normative/professionalism error regarding help was seen more commonly in the control group than in the intervention group (39 vs. 21%, respectively; P = 0.04).
Subjects in the intervention and control groups did not differ in their coverage of the other topics, such as managing hypertrophic obstructive cardiomyopathy (66%), leaving the anesthetic agent at a high level in the face of hypotension (43%), not having a phenylephrine infusion prepared (14%), and miscellaneous other topics (25%).
Discussion
After a short educational intervention, we were able to demonstrate a significant difference in the quality of feedback (as defined by the average of the rating elements of the BARS tool) that faculty gave to a simulated resident committing a series of scripted errors. The intervention group received higher scores than the control group. The intervention group also scored higher on creating and maintaining a psychologically safe environment and on exploring the simulated resident’s frame, and its members more frequently addressed the difficult-to-discuss professionalism lapse of rejecting offers to call for help.
Baseline patterns of feedback conversations were also established for the groups. The raters judged 59% of the feedback conversations as potentially causing the resident to feel activated and pleasant, whereas 20% were judged as potentially causing him to feel “unpleasant.” Our control and intervention groups of academic faculty members did not differ from each other on other confounding factors, and may therefore represent the population of faculty needing to provide feedback. Review of the feedback literature yielded many opinion articles and commentaries, and various theoretical methods have been published regarding how best to deliver or facilitate feedback. Theories and strategies are emerging about creating psychological safety, uncovering learners’ frames to tailor teaching, and holding generous inferences about learners to foster curiosity in the instructor. On the basis of these theories, we chose to teach advocacy/inquiry45 over other models (e.g., Pendleton’s self-assessment or the feedback “sandwich”)49 because it is a technique that efficiently solves a number of feedback challenges. (1) It allows faculty to quickly share and test the validity of their observations and opinion (the advocacy). (2) It ascertains the cognitive frames driving the learner’s performance or performance gaps (the inquiry). (3) By encouraging faculty to hold generous inferences about learners (e.g., the learner is capable, wants to learn, wants to improve) and curiosity about the learner’s thinking, it helps them resolve the difficult tension they feel regarding attending to task versus relationship. The generous inferences and curiosity solidify the relationship whereas the advocacy plus inquiry allows for direct task feedback.
Our current understanding of what constitutes quality feedback continues to expand. Creating a psychologically safe environment for conversation is increasingly recognized as the sine qua non of learning in groups and dyads.51–54 Psychological safety is a person’s assessment that the situation is safe for interpersonal risk-taking such as exposing one’s reasoning, asking for help, and speaking up. Creating a psychologically safe environment is a key factor in resolving the perceived task versus relationship dilemma; faculty rightly worry that providing direct feedback will feel hurtful when learners do not feel psychologically safe. The fact that the intervention group rated higher in creating a psychologically safe environment and were able to provide direct feedback suggests that the intervention helped faculty resolve the tension they experienced regarding the task versus relationship dilemma such that they could address both simultaneously.
The intervention group was more skillful at giving direct feedback and more able to explore the cognitive frame driving the resident’s performance gap. By the definitions of our rating scale, the control group was in the “mostly ineffective” to “somewhat effective” range, whereas the intervention group was between “somewhat effective” and “mostly effective,” a substantively and statistically significant difference in quality. This suggests that pairing an advocacy that reveals the instructor’s point of view with an open-ended inquiry that elicits the learner’s thinking may protect faculty from early closure on an erroneous hypothesis about what the resident needs and allow faculty to discover the resident’s actual thinking process. Making one’s reasoning public and inquiring into others’ reasoning is a hallmark of “reflective practice,” a discipline developed to help professionals assess and improve the cognitive routines and emotional reactions of subordinates, peers, and themselves.33,55–58 This finding indicates the positive potential for helping faculty to teach residents by using a simple conversational rubric.
The control group covered the topic of anaphylaxis treatment more extensively than the intervention group. However, the simulated resident in our scenario already understood the treatment algorithm for anaphylaxis; he did not have a gap in medical knowledge in this area, but rather he had difficulty in applying his knowledge to the actual clinical situation. With only limited time available for face-to-face teaching, the control group more frequently chose to spend this precious time on a comfortable, clinical topic of anaphylaxis treatment rather than spending needed time discussing the most important, but presumably more difficult-to-discuss lapse in the resident’s performance: not calling for or accepting help when indicated.
We helped intervention subjects address the more difficult topic of not calling for help by introducing Charles Bosk’s47 landmark study, completed in 1976, of how surgery faculty used residents’ errors as occasions for learning and socialization. This study presaged the barriers that faculty in many contemporary residency programs face in addressing lapses in professionalism.47 In Bosk’s study, faculty characterized technical and clinical judgment errors as “forgivable,” whereas “normative” errors, or lapses in professionalism, were considered “unforgivable.” When such errors make it difficult or impossible for faculty to hold the resident in high regard (to believe the best of the resident), the feedback conversation becomes much more difficult.
In our intervention, we contrasted the idea of “unforgivable errors” with the idea of holding “generous inferences” about the resident, at least for the period of the conversation. By generous inferences we mean something like “the resident is intelligent, capable, and trying to do the right thing,” or “innocent until proven guilty,” or “there is a 5% chance there is a good reason they did what they did.” These inferences free the faculty member to overcome two barriers to good feedback: (1) they make it easier to resolve the perceived task versus relationship dilemma because the faculty member does not have to cover up a negative assessment of the learner’s character; and (2) they free faculty to use their curiosity and diagnostic skills to understand the resident’s perspective and learning needs and thereby better close performance gaps.
Our BARS was based on the Debriefing Assessment for Simulation in Healthcare,48 because feedback is a form of debriefing conversation. Drawing on our understanding of debriefing conversations in simulation, we evaluated specific techniques such as using a reactions phase (to allow the resident to “vent” his or her feelings, or as an icebreaker/conversation starter), previewing (signaling to the resident that a feedback conversation was about to begin), and reaching closure (signaling the end of the conversation, summarizing). Reaching closure was seen in only half of the cases, and previewing and conducting a reactions phase were seen even less frequently, which suggests that faculty may need additional time and effort to learn and practice these techniques.
Although the majority of feedback conversations were judged likely to make the resident feel activated and pleasant, one in five was judged likely to make the resident feel unpleasant. We do not know whether the first finding reflects emphasis on the relationship at the expense of accurate task feedback or skillful feedback that attended to both task and relationship. The percentage of feedback conversations judged by raters as likely to elicit unpleasant reactions in the resident is quite large considering the importance of this feedback session and deserves more attention. Although we did not collect data on why raters felt the trainee would react negatively, this is an area of focus for the future, because it is likely that both language choice and nonverbal cues contributed.
There are a number of limitations to this study. First, the study was not designed to evaluate retention of feedback skills, and our findings for the intervention group could perhaps be explained by a “recency effect,” whereby the recently provided instruction was easily accessed in the moment without a long-term effect. In addition, not all clinical conditions necessitating feedback were replicated, which could limit the applicability of our findings. Second, as in any simulation, there is a risk that participants might not have taken the conditions seriously; however, in exit surveys done after each course, there was no indication that this was a problem for any participant. Furthermore, even if participants took the simulation seriously, they knew it was a simulation, and this may have affected their performance in some way. Third, although the performance-rating system we used was based on a partially validated subjective scoring instrument used for debriefing, it has not previously been used for clinical feedback conversations, so its psychometric properties for this purpose are not well known. Fourth, there are a variety of alternative feedback models in the published literature; further studies comparing these models could help determine whether a “best fit” for constructing feedback exists. Finally, although the raters were blinded, the resident actor was not. Even though he was working from a script, there may have been subtle differences in the resident’s responses to the feedback provided, and this could have biased the results.
Although review of the literature on feedback yielded many opinion articles, commentaries, and a handful of qualitative empirical studies, we present this report of a randomized, controlled trial of teaching feedback that demonstrated a real and immediate effect. Most importantly, this short educational intervention allowed a group of faculty to overcome enough of the discomfort of addressing a professionalism lapse to discuss it directly. Further studies will be needed to address the timing of faculty development sessions, the need for upkeep of skills and the duration of retention, and transferability to other practices and disciplines.
* VassarStats. Available at: http://vassarstats.net. Accessed September 20, 2012.
References
Appendix 1. Educational Intervention
One-hour Workshop on Giving Effective Feedback
Learning goals:
Appreciate the rationale for diagnosing the trainee’s learning needs by using frame-based feedback;
Appreciate the role of holding generous inferences about the learner as a way to create psychological safety and overcome the task versus relationship dilemma;
Gain skill in pairing advocacy with inquiry to give direct, corrective feedback and elicit learners’ frames to determine their learning needs.
Curriculum
Interactive slide and video presentation and exercises
Cognitive frame model of learning lecture (5 min)
Analyze example of clinical error by using cognitive frame model (5 min)
Fundamentals of feedback; especially the role of assuming the best about learners as a starting point lecture (5 min)
Assess video examples of attending giving feedback to residents (10 min)
Feedback “algorithm” lecture: how to use a previewing statement and pair advocacy (the instructor’s opinion about performance) with inquiry (about the learner’s frame) (5 min)
Coached role-play exercise by using algorithm (15 min)
Bosk error taxonomy lecture (5 min)
Categorize resident’s errors by using Bosk taxonomy (5 min)
Prepare participants to address not-calling-for-help error (5 min)
Appendix 2. Detailed Instructions for Both Study Subjects and Resident Actor
Instructions to Subject
She or he was also told that for the purpose of this exercise, she or he could assume that she or he had perfect knowledge of the case as an observer and that the resident would not find that surprising. On arrival at the simulated coffee room, the subject found the resident alone at a conference table using a computer. The subject was free to sit anywhere, shut or leave the door open, and attain any posture she or he chose. Upon engaging the resident, she or he was free to follow any lines of conversation.
Semiscripted Responses for Resident Actor
(1) While trying to keep the conversation as normal as possible, the resident responded to direct questions only and did not elaborate unless asked.
(2) If the resident was asked questions to elucidate his knowledge of a clinical subject, he responded with correct answers at the depth of knowledge expected from a competent trainee at the end-of-first-year level (in the opinion of the investigators). For example, if asked about the physiology of hypertrophic obstructive cardiomyopathy, he would describe a cardiomyopathy whereby the outflow tract of the left ventricle would become obstructed if the heart rate were increased.
(3) If asked questions about his performance, the resident replied evasively and defensively. For example, if asked whether he thought he asked for help early enough, he would reply that he was glad that someone came when he needed help.
(4) If the subject made a clear observation and asked a relevant question, the resident revealed the first part of his mental frame. For example, if the subject said, “I noticed the circulating nurse asked you twice if you needed help and you declined. How come?” the resident would reply, “I wanted to see if the esmolol worked and that I had done everything I was supposed to do in that situation.”
(5) If the subject pursued the topic with further questioning, the resident would reveal the second part of his mental frame. For example, if the subject said, “It seemed that the esmolol was not helping the blood pressure and I thought you could have used the help. How did you see it?” the resident would reply, “Actually, I have been chastised by my attending before for calling for help prematurely. I think I have gotten a reputation for being weak and I know that is hard to shake.”
(6) If the subject lectured or taught about something that was not directly relevant to the resident’s frame, the resident would listen and respond neutrally, only nodding or saying “OK.”
(7) If the subject seemed to be finished with the feedback session, by explicitly saying so, by repeating the same material, or by summarizing the session, the resident would say his pager had gone off and he needed to go do the next case.
Appendix 3. Rater Training Process
1. Raters read and discussed case materials.
2. An investigator introduced the instruments and discussed each item.
3. The raters watched as a group and independently rated a video.
4. An investigator and the raters discussed score matches and differences.
5. Items 3 and 4, above, were repeated until three videos were analyzed.
6. Each rater independently watched and rated three videos, and the results were compared to see whether all element ratings were within one unit and the objective measures matched.
7. Raters and an investigator repeated items 3 and 4, above, for one more video.
8. Item 6, above, was repeated.