To the Editor:—
Anesthesiology recently published an editorial1 and a research article2 on the topic of the design of clinical trials. The journal and its editorial board are to be commended for again calling the attention of its readers to the role of appropriate clinical trial design in the ethical and scientific conduct of clinical trials and the meaningful interpretation of their results.
The scientific study was designed to compare the quality of clinical trials among four journals over 20 yr with the aim of improving the quality of future clinical trials in anesthesia through an analysis of the deficiencies of published studies. Unfortunately, the study contains a number of deficiencies in experimental design and statistical analysis that detract from its message.
Although it was the purpose of the study to compare ten characteristics of quality related to study design among the four journals and between time periods, data from the four journals within each time period were deemed to be “similar” and therefore “pooled” before statistical analysis. Examination of the data in figures 1–4 suggests that the data from the four journals within the various time periods could be shown to be different by standard statistical analysis (this is especially obvious in fig. 4). Even if the data within each period were compared statistically and shown to be “not different,” that would be insufficient justification for “pooling” the data because “not different” is not “the same.”
The authors planned to compare the frequency of reporting each characteristic among journals and between time periods. Had they made all planned statistical comparisons, 280 would have been made. Designating P < 0.01 as the criterion for rejection of the null hypothesis is insufficient to account for this number of comparisons and thus to avoid a Type I statistical error; indeed, P < 0.01 would be insufficient even if they ultimately made only 15 comparisons.
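The arithmetic behind this objection can be made explicit. The sketch below applies a Bonferroni correction (used here purely as an illustration; the letter does not specify which correction the original authors should have used) to the comparison counts of 280 and 15 cited above:

```python
# Bonferroni-adjusted per-test significance thresholds for the
# comparison counts mentioned in the letter: 280 planned
# comparisons and the 15 ultimately made.
family_alpha = 0.05  # conventional family-wise error rate

for n_comparisons in (280, 15):
    per_test_alpha = family_alpha / n_comparisons
    print(f"{n_comparisons} comparisons -> criterion P < {per_test_alpha:.5f}")

# 280 comparisons -> criterion P < 0.00018
# 15 comparisons  -> criterion P < 0.00333
# In both cases the required criterion is stricter than P < 0.01,
# so P < 0.01 does not control the family-wise Type I error rate.
```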
Although the authors were interested in evaluating the quality of study design during the 20 yr period from 1981–2000, they were able to compare only results from the period 1981–1985 with those from the period 1991–1995. They were unable to include in the statistical analysis the data from the last period in which they were interested, the first 6 months of 2000, because “the numbers were too small.” That is, the study was underpowered to accomplish its aims because of a deficiency in experimental design that invites a Type II statistical error. Nonetheless, this did not prevent the authors from including the data from the first 6 months of 2000 in the table and figures as though they were included in the statistical analysis.
The authors' hypothesis failed to clearly define the primary outcome variable and the magnitude of the change that would be sufficient to conclude that a difference was present. A pilot study of the 10 characteristics of quality related to study design could have provided an estimate of expected scores of studies published at the beginning of the period of interest (1981) and served as the basis of a well-formulated hypothesis and sample size calculation that could have prevented the above statistical errors.
Examination of table 1 suggests that the authors compared the percent of trials for which the criteria were present rather than the frequency of reporting each characteristic. If this was in fact done, it inflates the power of the study, because the sample size for each time period appears to be 100 (the percentage scale ranges from 0–100) rather than the 80 trials the authors actually studied in the time periods 1981–1985 and 1991–1995. Although the data from the first 6 months of 2000 were not included in the statistical analysis, reporting these data as a percent of the trials misrepresents the sample size of 20 and makes it appear equivalent to the sample sizes from the other two periods.
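One way to see why the percent scale conceals the sample size is to compute the standard error of an observed proportion, which shrinks with the square root of n. This calculation is an illustration of my own, not one made in the study under discussion:

```python
# The same reported percentage carries very different precision
# depending on the underlying sample size, which a bare percent
# scale conceals. Illustrative example: 50% observed in n = 80
# trials versus n = 20 trials.
import math

def se_percent(p_hat: float, n: int) -> float:
    """Standard error of an observed proportion p_hat, in percentage points."""
    return 100 * math.sqrt(p_hat * (1 - p_hat) / n)

for n in (80, 20):
    print(f"50% observed in n={n}: standard error ~ {se_percent(0.5, n):.1f} percentage points")

# The standard error for n = 20 is twice that for n = 80, a
# difference invisible when both are reported simply as percents.
```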
Other problems with the study exist, including the question of the validity of the evaluation instrument and the reporting of the results of the instrument as mean scores (fig. 1).
The point of this communication is that, while the authors are to be commended for their efforts to improve the quality of future clinical trials in anesthesia through an analysis of the deficiencies of published studies, their message would have had a greater impact and their study would have set a better example for others to follow had they avoided the common experimental design errors in their own study.