To the Editor: The editorial by Nuttall and Houle¹ on the article by Vincent et al.² is long on method but short on biology. Nonstatisticians, who make up the majority of readers, will be trying to place the article in clinical context. The editorial does not help them in this, and its pejorative title gets it off to a bad start. Nuttall and Houle¹ give a useful assessment of propensity scoring (in general) but barely mention the data (in this study), and so risk giving the reader the impression that the content should be given only limited credence. An editorial that gave more prominence to the biology would have collated the evidence and achieved a broader perspective.
The article of Vincent et al.² is a hypothesis-generating study that questions the current consensus on erythrocyte transfusion therapy, in a similar manner to the findings of Connors et al.³ (with respect to pulmonary artery catheterization) and, more recently, Karkouti et al.⁴ and Mangano et al.⁵ (with respect to aprotinin in cardiac surgery). The conclusions of Vincent et al.² may be disturbing, but to summarize the study with the truism “interpret with caution” on the grounds of methodology is an incomplete response that serves nobody. The key question is: Do the article’s findings reflect flawed methods, or do they suggest a problem with generalizability (e.g., might previous data derived from randomized controlled trials [RCTs] be driving current practice inappropriately)? The editorialists omit the latter possibility altogether, which is unfortunate because it may be the most important lesson from the article.
In looking at two studies with disparate results, such as those of Vincent et al.² and the landmark Transfusion Requirements in Critical Care (TRICC) study,⁶ the most useful initial response is to try to understand how they can be reconciled, or how what was apparently true before might not be true now. Transfusion practice has changed as a result of the TRICC trial, and transfusion of leukodepleted erythrocytes is now widespread. If these changes are truly beneficial, we would expect the impact of transfusion decisions to change also, with a reduction in “harmful transfusion.” If the changes in practice had resulted in overly conservative decision making, we might observe an increase in harm from “harmful nontransfusion.” Successful RCTs that are followed by evidence of a “downside” are not novel; the Randomized Aldactone Evaluation Study,⁷ which showed improved survival in patients receiving spironolactone, was followed by observational data suggesting an increase in morbidity and mortality from hyperkalemia.⁸ Although the “harm” component in the TRICC trial seemed to stem from liberal transfusion in younger, healthier patients, later analysis suggested possible harm also from not transfusing in TRICC participants with known coronary artery disease.⁹
The editorialists are right to address study methodology, but they should not leave the reader with an indictment of propensity scores and, by extension, observational studies. When discussing methodologic issues, we should keep in mind the suggestion that recent high-quality observational studies and RCTs often arrive at similar conclusions,¹⁰ the fact that highly cited randomized trials may produce incorrect or exaggerated results,¹¹ and the suggestion that the durability of medical knowledge is unrelated to methodologic quality.¹²
Even the best observational study is limited by an inability to draw causal inferences and by the presence of confounders. RCT design takes causality as a given and puts its trust in an ability to minimize (though of course not eliminate) confounders by randomization. But the problem of “unknown unknowns” remains, and the greater the number of unknown confounders that exist, the greater the likelihood of an imbalance. This problem is common to RCTs and observational studies alike and is probably most likely in small studies where our understanding of disease pathogenesis is limited. In a study with total n ≈ 1,600, where five independent confounders exist, each with an incidence of 20%, the probability of an imbalance for at least one confounder is almost 25%.¹³ So studies A and B might disagree because A has greater balance of unknown confounders than B, and thus a better balance of confounders in a large observational study might “trump” randomization in a small RCT. This does not upgrade the status of observational studies, but it does explain why well-designed observational studies often arrive at conclusions similar to those of RCTs, and why some of the time they will correctly contradict previous RCT data. The controversial articles by Karkouti et al.⁴ and Mangano et al.⁵ may exemplify this, as suggested by the results of the recent Blood Conservation Using Antifibrinolytics in a Randomized Trial.¹⁴
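The figure of almost 25% quoted above can be checked approximately with a short calculation. The letter does not say how “imbalance” is defined, so the sketch below makes an assumption of its own: a confounder counts as imbalanced when the difference in its frequency between two equal arms reaches two-sided significance at the 5% level. The function names, the equal split into two arms of 800, and the pooled z-test are illustrative choices, not the method of the cited reference. Under that assumption the chance of at least one imbalance among five independent confounders is 1 − 0.95⁵ ≈ 0.226, which is of the same order as the quoted value.

```python
import math
import random


def prob_any_imbalance(alpha: float, k: int) -> float:
    """Chance that at least one of k independent checks fires at level alpha."""
    return 1.0 - (1.0 - alpha) ** k


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled two-proportion z statistic; 0.0 if the pooled variance is zero."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return 0.0 if se == 0.0 else (x1 / n1 - x2 / n2) / se


def simulate(n_total=1600, k=5, prevalence=0.20, z_crit=1.96,
             trials=500, seed=0) -> float:
    """Monte Carlo estimate of P(at least one of k confounders is imbalanced).

    Assumption (not from the letter): two equal arms, and "imbalance" means
    |z| > z_crit for the between-arm difference in confounder frequency.
    """
    rng = random.Random(seed)
    n_arm = n_total // 2
    hits = 0
    for _ in range(trials):
        imbalanced = False
        for _ in range(k):
            # Each patient carries the confounder independently of arm.
            x1 = sum(rng.random() < prevalence for _ in range(n_arm))
            x2 = sum(rng.random() < prevalence for _ in range(n_arm))
            if abs(two_proportion_z(x1, n_arm, x2, n_arm)) > z_crit:
                imbalanced = True
        hits += imbalanced
    return hits / trials


if __name__ == "__main__":
    print(f"closed form: {prob_any_imbalance(0.05, 5):.3f}")
    print(f"simulation:  {simulate():.3f}")
```

Under this definition the result depends only on the significance threshold and the number of confounders, not on the sample size or the 20% incidence. A definition based on a fixed absolute difference between arms would instead make the probability fall as n grows, which is the sense in which a large study can be better balanced than a small one.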
The article of Vincent et al. discusses whether leukoreduction might account for the findings but provides no data;² the editorial does not mention it.¹ Neither the original article nor the editorial provides any convincing explanation (i.e., biologic basis) for the reported effect. We wonder whether additional analysis of the data in the article of Vincent et al.² might shed light on whether leukoreduction may be responsible for the apparently altered impact of transfusion, as has been suggested previously.¹⁵,¹⁶
The data of Vincent et al.² and the recent TRICC reanalysis by Deans et al.⁹ suggest that outcome is changing over time and that the interpretation of the TRICC trial is more complex than we thought. It will be some time before we get a clearer picture, but in the meantime, we should not treat propensity scoring as a straw man. Reading the article of Vincent et al.,² we experience the judgment under uncertainty that pervades clinical life. Decisions to transfuse, and not to transfuse, are not made lightly, so it is a truism that these data should be viewed with caution. The function of the article, however, is to make us view with caution things that we think we know.
*St. Vincent’s University Hospital, Dublin, Ireland. jboylan@iol.ie