The caffeine-halothane contracture test (CHCT) is the only recognized laboratory test to diagnose malignant hyperthermia (MH). The authors report the results of their analysis of pooled data from the North American Malignant Hyperthermia Registry database to determine the sensitivity and specificity of the CHCT.
The MH Clinical Grading Scale was used to identify 32 case subjects who were "almost certain" to be MH susceptible based on clinical criteria alone. Their CHCT results were compared with those of a group of 120 control subjects considered to be at low risk for MH. Diagnostic thresholds of the CHCT were adjusted, and its component tests were combined to generate receiver operating characteristic curves. The maximal Youden index for each component test was chosen as the diagnostic threshold indicative of MH susceptibility.
The highest sensitivity (97%; 95% CI, 84-100%) was achieved with a two-component test with thresholds of ≥ 0.5 g contracture for 3% halothane, ≥ 0.3 g contracture at 2 mM caffeine, or both, considered positive for MH. The test specificity was 78% (95% CI, 69-85%). The addition of other CHCT component tests did not improve CHCT sensitivity or specificity.
The CHCT achieves high sensitivity and acceptable specificity as a clinical laboratory diagnostic test when it is performed according to published standards. However, it cannot be used as a screening test because of the low prevalence of MH in the general population.
The caffeine-halothane contracture test (CHCT) is the only generally recognized test for the laboratory diagnosis of malignant hyperthermia (MH). Centers worldwide use one of two protocols: either the European Malignant Hyperthermia Group protocol or the standards published by the North American Malignant Hyperthermia Group. With a mandate from the North American Malignant Hyperthermia Group, the North American Malignant Hyperthermia Registry was created in 1987 to collect and analyze data from all participating biopsy centers. The Registry published a report on the specificity of the CHCT in 1992.
In 1992, the Registry published its initial findings on the sensitivity of the North American CHCT*. After more data were collected and analyzed, the North American Malignant Hyperthermia Group met in September 1994 to discuss the results. We now report the analysis of those data, which involve 120 control subjects and 32 case subjects. This represents the first full report of diagnostic thresholds derived from the Registry database, one of the largest disease registries in the world. We also discuss the implications of the results for clinicians who are treating patients thought to be MH susceptible and for future genetics studies of MH.
Materials and Methods
After we received institutional review board approval, we studied all persons reported to the Registry between 15 March 1989 and 19 August 1994 who underwent skeletal muscle biopsy by North American Malignant Hyperthermia diagnostic centers. The diagnostic centers' results were included for analysis if the center had reported at least 10 control subjects and at least 1 case subject (Table 1). An expert panel, blinded to CHCT results, excluded subjects according to explicit criteria determined at the study's outset (Table 2(A and B)). Later the panel reviewed the results for all muscle strips and excluded those that did not meet specific technical criteria (Table 2(C)). The CHCT results were reviewed only after subjects had been identified.
We searched the entire Registry database and identified case subjects using the MH Clinical Grading Scale as the clinical case definition of MH susceptibility; the scale does not rely on CHCT results. A case subject was ranked D6 on the scale; i.e., “almost certain” to be MH susceptible. A control subject (D1) was considered to be at low risk for MH susceptibility (negative personal and family history of MH, no history of myopathy) and underwent muscle biopsy during an unrelated surgical procedure, such as total hip arthroplasty.
All diagnostic centers followed the protocol for in vitro testing as published in 1989 by the North American Malignant Hyperthermia Group. The halothane and caffeine contracture tests were done in triplicate; an abnormal response in any muscle strip was considered diagnostic for MH susceptibility. The protocol defined an abnormal muscle contracture response as one of the following: for the halothane contracture test, 0.2–0.7 g contracture after exposure to 3% halothane for as long as 10 min; for the caffeine contracture test, (1) ≥ 0.2 g contracture at 2 mM caffeine, (2) caffeine-specific concentration (CSC) < 4 mM caffeine, or (3) > 7% increase in tension at 2 mM compared with maximal tension generated at 32 mM caffeine (percentage maximal response). These diagnostic thresholds were set by consensus at the 1987 meeting of the North American Malignant Hyperthermia Group. Each center was to determine its diagnostic threshold or cutpoint within the range agreed on for 3% halothane and to determine what method it would use to interpret the caffeine contracture test.
We compared the case and control responses to 3% halothane alone and to caffeine alone. This allowed us to calculate the sensitivity and specificity of the CHCT, assuming that the cases were truly MH susceptible and that the controls were not. These calculations used combinations of the halothane contracture test and the three methods of interpreting the caffeine contracture test (2 mM caffeine, CSC, percentage maximal response) to form two-, three-, and four-component test strategies.
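The way such strategies can be enumerated may be sketched as follows. The sketch assumes that every strategy contains the halothane test plus one, two, or three of the caffeine interpretations, and that a strategy is positive when any of its components is positive. Both assumptions are one reading of the description above, and all function and label names are hypothetical.

```python
from itertools import combinations

# Labels for the three caffeine interpretations (hypothetical names).
CAFFEINE_METHODS = ("caffeine_2mM", "CSC", "percent_maximal_response")


def component_strategies():
    """Return every strategy: halothane plus 1, 2, or 3 caffeine methods."""
    strategies = []
    for k in range(1, len(CAFFEINE_METHODS) + 1):
        for subset in combinations(CAFFEINE_METHODS, k):
            strategies.append(("halothane_3pct",) + subset)
    return strategies


def strategy_positive(strategy, component_results):
    """Assumed rule: a strategy is positive if any component is positive."""
    return any(component_results[name] for name in strategy)


strategies = component_strategies()
for s in strategies:
    print(len(s), "components:", ", ".join(s))
```

Under these assumptions there are three two-component, three three-component, and one four-component strategy.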
Demographic and muscle strip characteristics of case and control groups after exposure to halothane and caffeine were compared. Associations between MH status (MH case vs. control subjects) and dichotomous variables, controlling for the center where the subject had the biopsy, were assessed using Mantel-Haenszel methods. Differences in continuous measures by MH status were assessed using mixed-effects analysis of variance models, in which the individual subject was considered a random effect. To accommodate skewed distributions and to meet modeling assumptions, appropriate transformations of the dependent variable were used when necessary. For ease of interpretation, untransformed data are reported as mean ± 1 SD.
Demographic and muscle strip characteristics that appeared to differ between case and control subjects (P < 0.005 after multiple comparison testing) were further investigated. Characteristic tests were constructed using combinations of these factors to determine if these characteristic tests performed better than the component tests. For multiple-factor combinations of the characteristics, logistic regression was used to model MH status. In addition, logistic regression models were fit to determine if a component test was confounded with MH status in the presence of these demographic and muscle strip characteristics. 
Receiver operating characteristic curves were generated for each component and characteristic test, varying the diagnostic thresholds of each test. The diagnostic thresholds for multiple-factor characteristic tests were based on the predicted probabilities that the subjects would have MH based on the logistic regression models. The area under the receiver operating characteristic curve for each test was calculated and compared. The diagnostic threshold indicative of MH susceptibility for each test was selected based on the maximal Youden index (sensitivity + specificity - 1). Comparisons of sensitivity and specificity estimates between tests at selected diagnostic thresholds were made using generalized estimating equations in marginal regression models to accommodate unbalanced data and correlation between tests on the same persons.
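Threshold selection by the maximal Youden index can be illustrated with a minimal sketch. The contracture values below are hypothetical and are not Registry data; the function names are likewise illustrative, and the analysis itself was performed in SAS, not with this code.

```python
def sensitivity_specificity(case_values, control_values, threshold):
    """Call a contracture positive when it is >= threshold (in grams)."""
    true_pos = sum(v >= threshold for v in case_values)
    true_neg = sum(v < threshold for v in control_values)
    return true_pos / len(case_values), true_neg / len(control_values)


def best_threshold_by_youden(case_values, control_values, thresholds):
    """Return (threshold, J, sensitivity, specificity) with the largest J.

    J = sensitivity + specificity - 1. On ties, the first (lowest)
    threshold in the list is kept.
    """
    best = None
    for t in thresholds:
        sens, spec = sensitivity_specificity(case_values, control_values, t)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (t, j, sens, spec)
    return best


# Hypothetical contractures in grams, for illustration only.
cases = [0.4, 0.6, 0.9, 1.2, 2.5, 3.0]
controls = [0.0, 0.1, 0.1, 0.2, 0.3, 0.6]
thresholds = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]

best = best_threshold_by_youden(cases, controls, thresholds)
print("threshold %.1f g, J = %.3f, sens = %.3f, spec = %.3f" % best)
```

Each candidate threshold corresponds to one point on the receiver operating characteristic curve; the Youden index picks the point farthest above the chance diagonal.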
In this article, the sensitivity of the CHCT is defined as the percentage of positive test results in a population of “almost certain” (D6) MH susceptible subjects. Specificity refers to the percentage of negative CHCT results in a population of control, low-risk (D1) subjects. Thus an MH-susceptible subject who had a positive CHCT result would be described as a “true positive”; if the same MH-susceptible subject had a negative CHCT result, that result would be labeled as a “false negative.” The false-negative rate (%) is calculated using the formula: 100 - sensitivity. Similarly, if a subject who truly was not MH susceptible had a positive CHCT result, that result would be called a “false positive.” The false-positive rate (%) of a test is calculated by the formula: 100 - specificity.
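These definitions can be written out as a short sketch. The counts used here (31 of 32 case subjects positive, 94 of 120 control subjects negative) are an assumption: they are inferred from the reported percentages and group sizes and are not stated in the text. The function name is hypothetical.

```python
def diagnostic_rates(true_pos, false_neg, true_neg, false_pos):
    """Return sensitivity, specificity, and error rates as percentages."""
    sensitivity = 100 * true_pos / (true_pos + false_neg)
    specificity = 100 * true_neg / (true_neg + false_pos)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_negative_rate": 100 - sensitivity,
        "false_positive_rate": 100 - specificity,
    }


# Assumed counts, inferred from 97% of 32 cases and 78% of 120 controls.
rates = diagnostic_rates(true_pos=31, false_neg=1, true_neg=94, false_pos=26)
for name, value in rates.items():
    print(f"{name}: {value:.1f}%")
```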
All statistical analyses were performed with the SAS statistical package (SAS Institute, Cary, NC).
Results
We reviewed data from 19 diagnostic centers. We excluded data from nine centers for the following reasons: fewer than 10 control subjects (five centers), no case subjects (four centers), and use of non-vastus muscle (one center). This excluded data on 122 subjects (109 controls, 13 cases). Then we applied the subject and muscle strip exclusion criteria to the data from the remaining 10 centers (Table 3). After these exclusions, data from 120 control subjects (607 muscle strips) and 32 case subjects (183 muscle strips) remained for analysis.
Table 4 lists the characteristics of the two study groups. Case subjects were younger than control subjects and were more likely to be male. There was no difference in subject body build between groups. Muscle strips from case subjects did not differ from those of control subjects with respect to length, wet weight, or cross-sectional area. Muscle twitch height immediately before testing was higher in case subjects, and the interval from biopsy to the beginning of the CHCT was shorter.
Table 5 lists the sensitivity and specificity of the 3% halothane contracture test and the 2 mM caffeine contracture test at varying diagnostic thresholds. These data are displayed in Figure 1(A and B), where the areas under the curves are 0.908 and 0.859, respectively. For diagnostic testing of a potentially lethal condition, a high degree of sensitivity (few false negatives) and an acceptable level of specificity (few false positives) are desired. For the 3% halothane test, this occurred at a cutpoint of ≥ 0.5 g, where sensitivity equaled 87% and specificity 86%. For the 2 mM caffeine test, a cutpoint of ≥ 0.2 g produced a sensitivity of 84% and specificity of 79%. When the results of these two tests were combined, the cutpoints that produced the highest test sensitivity (97%; 95% CI, 84–100%) and specificity (78%; 95% CI, 69–85%) were ≥ 0.5 g for 3% halothane, ≥ 0.3 g for 2 mM caffeine, or both (Figure 1(C); area under the curve, 0.885).
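As a rough check on confidence intervals of this kind, the sketch below computes an exact binomial (Clopper-Pearson) interval by bisection. Two assumptions are made: the text does not state which interval method was used, and the counts 31/32 and 94/120 are inferred from the reported percentages, not taken from the tables. Function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target, iterations=60):
    """Bisection on [0, 1] for g(p) = target, where g increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(successes, n, alpha=0.05):
    """Clopper-Pearson interval for a binomial proportion."""
    lower = 0.0 if successes == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(successes - 1, n, p), alpha / 2)
    upper = 1.0 if successes == n else _solve_increasing(
        lambda p: 1 - binom_cdf(successes, n, p), 1 - alpha / 2)
    return lower, upper


# Assumed counts (inferred, not stated in the text).
sens_ci = exact_ci(31, 32)    # sensitivity: positive case subjects
spec_ci = exact_ci(94, 120)   # specificity: negative control subjects
print("sensitivity CI: %.3f to %.3f" % sens_ci)
print("specificity CI: %.3f to %.3f" % spec_ci)
```

With these assumed counts, the computed intervals should land close to the reported 84–100% and 69–85%.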
Table 6 summarizes the results of other combinations of CHCT data interpretation. These include the two other methods of analyzing the caffeine contracture test (CSC, percentage maximal response) and several multiple-component tests. None matches the sensitivity and specificity of the two-component CHCT with a cutpoint of ≥ 0.5 g for 3% halothane, ≥ 0.3 g for 2 mM caffeine, or both, while involving only two tests. Although the two-component test of 3% halothane ≥ 0.5 g, percentage maximal response > 7%, or both, had similar rates of sensitivity and specificity, data were lost because the muscle strips frequently tore apart when exposed to 32 mM caffeine.
When the area under the receiver operating characteristic curve of the two-component CHCT was compared with the area under the curve of each of the other component tests, only one, the curve for the test combining halothane and CSC < 4 mM, was significantly smaller (P = 0.01). This indicates that using the CSC < 4 mM as one arm of a two-component test does not provide as good diagnostic information as using other measurements of caffeine sensitivity.
Table 7 summarizes combinations of demographic and muscle strip characteristic tests. Cutpoints for the multiple-factor characteristic tests were chosen based on the maximal Youden index from logistic regression models. Only the characteristic test using age, sex, and predrug twitch tension per cross-sectional area of muscle strips exposed to caffeine had sensitivity and specificity rates that did not differ significantly from the two-component CHCT using 3% halothane and 2 mM caffeine. The area under the curve for the two-component CHCT was greater than the individual characteristic tests of sex (P = 0.001) and predrug twitch tension per cross-sectional area exposed to halothane or caffeine (P < 0.001).
The coefficient for the two-component CHCT in a logistic regression model using only this test was 0.32 (SE, 0.05). The coefficient for the same test in the presence of the factors of age, sex, predrug twitch tension and cross-sectional area for muscle exposed to halothane and caffeine was 0.34 (SE, 0.08). There was no significant difference between the coefficients or the areas under the curves. This suggests that the two-component test was not confounded with MH status in the presence of age, sex, or predrug twitch tension per cross-sectional area for muscle strips exposed to either halothane or caffeine.
Discussion
In 1987, the North American Malignant Hyperthermia Group reached a consensus on a protocol for performing the CHCT. At that time it was agreed that a muscle contracture ≥ 0.2 g to 2 mM caffeine would be considered abnormal in determining MH susceptibility. In addition, two other methods of interpreting the caffeine contracture test (CSC, percentage maximal response) were accepted. For the 3% halothane contracture test, a diagnostic threshold was not identified. Instead, a range of thresholds (0.2–0.7 g) was recommended, with each center to determine its own threshold after reviewing the results of at least 30 control subjects.
The present analysis refines the decisions made in 1987 by reviewing pooled data from 10 MH diagnostic biopsy centers. This analysis has determined the optimal diagnostic thresholds of ≥ 0.5 g for the 3% halothane contracture test, ≥ 0.3 g at 2 mM for the caffeine contracture test, or both (Table 5). The combination of these thresholds for a two-component test appears to be superior to other combinations of diagnostic thresholds, test components, or both (Table 6). However, when we compared the area of the receiver operating characteristic curve for this test with other component tests, no significant differences were found except for the halothane, CSC < 4 mM test.
In this study, we determined the sensitivity and specificity of the CHCT by comparing results from control subjects who were unlikely to be MH susceptible with results from case subjects who were “almost certain” to be MH susceptible. We calculated a sensitivity of 97% (95% CI, 84–100%) for the two-component CHCT (response to 3% halothane alone, to 2 mM caffeine alone, or both) with a specificity of 78% (95% CI, 69–85%). The determination of sensitivity and specificity requires two distinct populations of subjects. The MH Clinical Grading Scale was used as a clinical case definition for MH susceptibility; the scale does not rely on CHCT results to score subjects. Because MH susceptibility is expressed under general anesthesia with triggering agents, phenotyping subjects accurately is difficult. Before 1994, no consensus definition for MH existed, which made the comparison of results among subjects or biopsy centers highly subjective.
The MH Clinical Grading Scale is a great improvement over personal opinion as to what constitutes an MH episode. The scale has certain limitations: some D6 cases may have been excluded because of insufficient data reporting, most often laboratory results (arterial blood gases, serum potassium, urine myoglobin). The scale is somewhat subjective because it relies on the reporting clinician's suspicion that a given sign is abnormal and inappropriate. However, when the scale was created, its developers believed that the reporting clinician must have some discretion about the appropriateness of the signs. Only the clinician observing the event can consider a clinical sign in real-time and in light of the patient's premorbid condition, the surgical procedure, and any medications.
Our results may be limited by our exclusion of other subjects. For example, we excluded biopsy centers that did not report at least one case subject and ten control subjects, biopsy reports that were incomplete, and tests that did not adhere to the published CHCT standards. However, this was the only way to ensure a uniform study population and reduce possible bias.
Control subjects represent a nonrandom surgical population chosen because their surgical procedure allows an incidental muscle biopsy to be performed, such as total hip arthroplasty. Because of this, control and case subjects differ in age (mean age, 53 vs. 25 yr; Table 4), but all control subjects were ambulatory and had negative personal and family histories. Malignant hyperthermia appears to be a disorder that more frequently affects young males, so it is not surprising that the case subjects were predominantly men. However, the results of the two-component CHCT were not confounded by subject age or sex, supporting the validity of comparing the case and control subjects as we have done.
Other studies support the high sensitivity of the CHCT performed according to the North American standards. In swine studies, in which each animal's MH status is known, the CHCT has been accurate. The major advantage of animal studies is the ability to use anesthetic challenge with triggering agents as a final test of MH susceptibility. Another swine study questioned the role of the RYR1 mutation in MH susceptibility based on the animals' lack of response to halothane and succinylcholine. 
Recently, the European Malignant Hyperthermia Group published a report of the performance of the CHCT according to their protocol. Using the Clinical Grading Scale, 20 centers provided data on 105 patients thought to be MH susceptible who were rated as D6 and 202 low-risk subjects who underwent muscle biopsy and contracture testing. The sensitivity of the European protocol was determined to be 99% (95% CI, 94.8–100%); the specificity was 93.6% (95% CI, 89.2–96.5%). It was not clear how the 202 low-risk subjects were selected because the European Malignant Hyperthermia Group does not have a central registry, but all met criteria similar to those that we used in this study. Only 13 of 20 centers included D6 patients and low-risk subjects; in the present study, only centers that provided both types of subjects were included.
The European and North American CHCT protocols are similar but not identical. The European protocol uses slightly different caffeine concentrations for the caffeine contracture test. More importantly, it uses an incremental dosing technique for the halothane contracture test (0.5%, 1%, 2%, 3%) instead of the single 3% halothane exposure used in the North American protocol.
The European protocol uses a uniform diagnostic threshold of ≥ 0.2 g, with an abnormal response occurring at a caffeine concentration ≤ 2.0 mM or at a halothane concentration ≤ 2% (0.44 mM). An abnormal contracture response to caffeine and to halothane must occur for an “MHS” diagnosis to be given to the test results. If an abnormal response occurs only to caffeine or to halothane, the test result is labeled as “MHE” or equivocal. Patients are counseled as if they are thought to be MH susceptible, pending further refinements of various diagnostic tests. The North American protocol considers the CHCT results abnormal if there is an abnormal response either to caffeine or to halothane. The difference in the specificity between the two protocols may be due to the higher likelihood of a false-positive response with the single exposure to 3% halothane used in the North American protocol.
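The difference between the two decision rules can be made concrete with a small sketch. It takes as input only whether each drug produced an abnormal response under the relevant protocol; the function names and the "negative" label for a result with no abnormal response are illustrative assumptions, not terms taken from either protocol as quoted here.

```python
def european_label(caffeine_abnormal, halothane_abnormal):
    """European rule: both abnormal gives MHS, exactly one gives MHE."""
    if caffeine_abnormal and halothane_abnormal:
        return "MHS"
    if caffeine_abnormal or halothane_abnormal:
        return "MHE"
    return "negative"  # assumed label for no abnormal response


def north_american_abnormal(caffeine_abnormal, halothane_abnormal):
    """North American rule: abnormal if either response is abnormal."""
    return caffeine_abnormal or halothane_abnormal


for caffeine in (False, True):
    for halothane in (False, True):
        print(
            f"caffeine={caffeine!s:5} halothane={halothane!s:5} "
            f"European={european_label(caffeine, halothane):8} "
            f"North American abnormal={north_american_abnormal(caffeine, halothane)}"
        )
```

A subject with a single abnormal response is abnormal under the North American rule but only equivocal under the European one.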
Isaacs and Badenhorst reported four cases of possible false-negative CHCT results using the European protocol. All four patients had suspicious episodes under anesthesia, but subsequent CHCT results were negative for MH. One patient underwent muscle biopsy on two different occasions, and both CHCT results were negative. This case report illustrates that false-negative CHCT results may occur. Like nearly all tests used in clinical medicine, a degree of uncertainty exists in CHCT results. 
False-positive results are also possible with many tests, including the CHCT. Serfas et al. studied a large Mennonite family and compared CHCT results with the presence of the Arg614Cys MH mutation on chromosome 19. One subject had a positive CHCT (3.9 g response to 3% halothane; 0.8 g response to 2 mM caffeine) but lacked the chromosome 19 mutation. The authors suggested that the CHCT result was a false-positive one; otherwise there was no linkage of MH susceptibility to the chromosome 19 mutation in this family. Most MH biopsy center directors would argue that such CHCT responses are abnormal and unlikely to be a false-positive result. Perhaps the subject represents a new mutation. Because there is no ethical way to determine the correct diagnosis, clinically the patient would be considered MH susceptible. In humans, unlike in swine, several mutations appear to lead to MH susceptibility. The presence of heterogeneity is supported by another study from the United Kingdom of a large kindred in which linkage to the RYR1 region of chromosome 19 was demonstrated, but the Arg614Cys mutation was not present.
An important reason for reporting this study is to provide molecular geneticists with the sensitivity and specificity of the CHCT at various thresholds. Genetic studies require subjects who are likely to be MH susceptible when their CHCT results are positive; i.e., the test must have high specificity or a low false-positive rate. The diagnostic thresholds used by geneticists may be different from those used by clinicians. We suggest that genetics investigators may wish to use thresholds of ≥ 0.7 g for the halothane contracture test, ≥ 0.3 g for the caffeine contracture test, or both, which have a sensitivity of 88% (95% CI, 71–97%) and a specificity of 81% (95% CI, 73–88%). However, we caution that the confidence intervals for sensitivity and specificity at these thresholds and at those recommended for clinical diagnosis (i.e., halothane contracture ≥ 0.5 g, 2 mM caffeine contracture ≥ 0.3 g) are similar. We also believe that, for uniformity, the Clinical Grading Scale should be used as a clinical case definition to identify potentially affected persons for genetics studies.
In September 1994, members of the North American Malignant Hyperthermia Group met to discuss the results of this study at the Fifth Malignant Hyperthermia Biopsy Standards Conference. The group decided that an equivocal range of contracture responses should be adopted for both parts of the two-component test. A positive response to 3% halothane was defined as a contracture ≥ 0.7 g, an equivocal response as ≥ 0.5 g to < 0.7 g, and a negative response as < 0.5 g. A positive response at 2 mM caffeine was defined as a contracture ≥ 0.3 g, an equivocal response as ≥ 0.2 g to < 0.3 g, and a negative response as < 0.2 g. This yields a test sensitivity of 88% and a specificity of 81% for an unequivocally positive response. The equivocal category was suggested to give individual biopsy centers more latitude in diagnosing persons as MH susceptible. However, it was agreed that the Registry may use the thresholds of ≥ 0.5 g for 3% halothane and ≥ 0.3 g for 2 mM caffeine for investigations that require maximum test sensitivity.
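The three-way ranges adopted at the conference can be expressed as a minimal sketch. The thresholds are those given above; the function names are hypothetical, and the sketch classifies each component separately because the text does not specify how equivocal component results are combined into an overall diagnosis.

```python
def classify(contracture_g, positive_at, equivocal_at):
    """Map a contracture in grams to positive, equivocal, or negative."""
    if contracture_g >= positive_at:
        return "positive"
    if contracture_g >= equivocal_at:
        return "equivocal"
    return "negative"


def classify_halothane(contracture_g):
    """3% halothane: positive at >= 0.7 g, equivocal from 0.5 g to < 0.7 g."""
    return classify(contracture_g, positive_at=0.7, equivocal_at=0.5)


def classify_caffeine(contracture_g):
    """2 mM caffeine: positive at >= 0.3 g, equivocal from 0.2 g to < 0.3 g."""
    return classify(contracture_g, positive_at=0.3, equivocal_at=0.2)


for grams in (0.1, 0.2, 0.3, 0.5, 0.7):
    print(grams, "g:", "halothane", classify_halothane(grams),
          "| caffeine", classify_caffeine(grams))
```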
Because MH is potentially fatal, the thresholds for diagnostic testing require a high degree of sensitivity and an acceptable degree of specificity. That is, physicians are willing to accept false-positive responses to avoid false-negative ones because the consequences of a false-negative diagnosis might be disastrous. Sensitivity and specificity are stable properties of a test; i.e., they usually do not vary with disease prevalence. However, as one deals with less severe forms of a disease and with different patients, test sensitivity and specificity may decrease (spectrum bias) and reduce the utility of the test. This problem usually occurs when tests are used to screen asymptomatic populations. In the case of the CHCT, because it requires an open muscle biopsy and is performed at a limited number of centers, patients are carefully selected after their clinical records are reviewed and their pretest probability is already high. There may be differences in the onset and severity of MH among affected individuals, but such a theory cannot be tested in humans. Other factors, both intrinsic and extrinsic to the affected patient, may be involved. However, if MH is inherited in an autosomal dominant pattern, then susceptibility should be all-or-none, and no spectrum of susceptibility should exist. Therefore the estimates of sensitivity and specificity of the North American CHCT presented here are the best estimates available.
The CHCT is not a screening test and is poorly predictive of MH susceptibility when applied to the general population because of the low prevalence of the disorder. However, it has excellent predictive value at intermediate levels of probability (25–50%). In families, for example, CHCT results can be used to accurately predict risk of MH susceptibility once the proband has been correctly identified. Similarly, patients suspected of having an MH episode (intermediate pretest probability) should undergo muscle biopsy and contracture testing. Without correctly identifying the proband, many patients and families will be mislabeled as MH susceptible. Such mislabeling may have adverse effects on their ability to obtain optimal medical or dental care, or in career choices such as military service.
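The dependence of predictive value on pretest probability follows from Bayes' theorem and can be sketched as below, using the sensitivity (97%) and specificity (78%) of the two-component test. The general-population prevalence of 1 in 10,000 is a purely illustrative assumption, not a figure from this study, and the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (positive predictive value, negative predictive value)."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


SENSITIVITY = 0.97
SPECIFICITY = 0.78

# 0.0001 is an assumed, illustrative general-population prevalence;
# 0.25 and 0.50 span the intermediate pretest probabilities in the text.
for prevalence in (0.0001, 0.25, 0.50):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.4f}: PPV {ppv:.4f}, NPV {npv:.4f}")
```

At the assumed low prevalence nearly all positive results are false positives, whereas at intermediate pretest probabilities both predictive values become clinically useful.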
In conclusion, we have determined the sensitivity and specificity of the CHCT performed according to the North American Malignant Hyperthermia Group standards. We believe that these results can be used to improve risk assessment in patients and families, and to accurately identify subjects for future studies, especially for molecular genetics. The hope for the development of a less invasive test to screen for MH susceptibility lies in correctly identifying affected persons. This requires muscle biopsy and contracture testing under standardized conditions.
The authors thank Richard Landis, Ph.D., and Tom Ten Have, Ph.D., from the Center for Biostatistics and Epidemiology, for their statistical expertise during preparation of this manuscript; Julian Loke, M.D., and David MacLennan, Ph.D., of the University of Toronto, for their critical reading of the manuscript; and Cindy Brubaker for assistance in preparing the manuscript. The authors also appreciate the insightful discussions that occurred at the 1994 North American Malignant Hyperthermia Group meeting in Chicago that helped to shape this manuscript.
*Larach MG, Landis JR, Shirk SJ, Diaz M, and the North American Malignant Hyperthermia Registry: Prediction of malignant hyperthermia susceptibility in man: Improving sensitivity of the caffeine halothane contracture test. Anesthesiology 1992; 77:1052.