

This editorial announces requirements for reporting experiments in animals, cells, molecules, or other biological foci that we will term “preclinical” in this editorial. So that reviewers, editors, and readers can better gauge the quality of research, journals often endorse reporting guidelines developed by consensus methods and promulgated by organizations focused on improving the quality of research conduct and reporting, such as Enhancing the QUAlity and Transparency Of health Research (EQUATOR; www.equator-network.org). At that site, you will find consensus recommendations for reporting a wide variety of research designs, including randomized clinical trials, observational studies, and systematic reviews. Included in that list are recommendations for reporting preclinical studies, as described in Animal Research: Reporting of In Vivo Experiments (ARRIVE).1  Based on the ARRIVE guideline, at Anesthesiology we will require all investigators to:

  1. Describe the experiments adequately to allow other researchers to replicate them,

  2. Report whether measures to reduce bias were used, including random allocation and blinding, and how they were performed,

  3. Report how the sample size was determined, and

  4. Report the data analysis plan.

The following sections describe why we require these elements and provide details for each.

Imagine reading a clinical study in which investigators gave patients either a drug thought to speed recovery from sedation after anesthesia or a placebo. The description and results of the study in the article include the following statements:

Patients received either study drug (n = 22) or placebo (n = 23), and sedation was assessed using standard questionnaires and a battery of motor tasks known to be affected by sedation at 30 min after admission to the recovery room. The primary outcome was speed to perform a finger-tracking task. Groups were compared using Student’s t test with P value less than 0.05 considered significant. Results showed that patients receiving the study drug recovered significantly faster after anesthesia by the primary outcome (P = 0.048).

What the investigators actually did was the following:

The investigators had not performed these tests before, so they decided to give placebo to the first 20 patients. The results were consistent with other studies of recovery from sedation, so they then gave active drug to the next 20 patients. They examined the results and noted that only one of the outcomes, speed of finger tracking, showed a large but variable drug effect in the anticipated direction, and only at 30 min after surgery (measurements were actually made at 15, 30, 45, and 60 min after surgery). They used several statistical tests to compare groups for this outcome, and the one that came closest to statistical significance showed P = 0.09 after they excluded one patient receiving active study drug whose time was longer than the others. Based on these promising results, the investigators enrolled two more patients per group. This resulted in P = 0.06, so they enrolled one more patient per group and observed a statistically significant effect (P = 0.048); they rejected the null hypothesis and stopped the study.
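Why this sequence of decisions is so damaging can be made concrete with a simulation. The following sketch is our illustration, not part of the study description above; the group sizes, stopping rule, and number of simulated trials are arbitrary choices. It shows that testing repeatedly and enrolling more patients until P < 0.05 produces false-positive rates above the nominal 5% even when the drug has no effect at all:

```python
# Illustrative simulation of "optional stopping": test after each enrollment
# and stop as soon as P < 0.05. Both groups are drawn from the same
# distribution, so every "significant" result is a false positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ALPHA, RUNS, START_N, MAX_N = 0.05, 5000, 20, 30  # arbitrary illustrative values

false_positives = 0
for _ in range(RUNS):
    a = list(rng.normal(size=START_N))  # "placebo": no true effect
    b = list(rng.normal(size=START_N))  # "drug": same distribution as placebo
    while True:
        if stats.ttest_ind(a, b).pvalue < ALPHA:
            false_positives += 1
            break
        if len(a) >= MAX_N:
            break
        a.append(rng.normal())  # "promising" result: enroll one more per group
        b.append(rng.normal())

print(f"False-positive rate: {false_positives / RUNS:.3f} (nominal alpha {ALPHA})")
```

Under this stopping rule, the observed false-positive rate runs noticeably above the nominal 5%, which is why the reported P = 0.048 cannot be taken at face value.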

Had the investigators completely reported their actual methods, it is unlikely that a journal would accept such an article, or that a reader would put much stock in its results. To ensure adequate reporting of clinical trials, Anesthesiology requires submitted research to conform to the CONsolidated Standards of Reporting Trials (CONSORT) guidelines. Among many reporting elements, CONSORT requires the following: (1) a description of the experiments adequate to allow other researchers to replicate them, (2) a report of the measures used to reduce bias, including whether and how random allocation and blinding were used, (3) a report of how the sample size was determined, and (4) the data analysis plan. These are the same reporting elements that we will now require in all preclinical studies.

It took decades for clinical investigators to embrace these elements as critical to interpretable, reproducible, and actionable science. Given the extent to which modern preclinical research lacks rigor regarding these elements, the reporting quality in such studies is “reminiscent of the situation in clinical research approximately 50 yr ago.”2  These elements are reported only a minority of the time (or, for sample size calculations, not at all), even in journals that strongly endorse the ARRIVE statement.3,4  The poor reporting quality in preclinical research may reflect many causes, but the consequences of poor reporting can be readily observed. The lack of reporting rigor may underlie the inability of independent industry laboratories to replicate a majority of landmark studies from academic laboratories performing cancer, cardiovascular, and stroke research.2,5,6  Failure of clinical translation and of replication of preclinical research was cited by leaders of the National Institute of Neurological Disorders and Stroke7  and the National Institutes of Health8  when they called on journals, investigators, and funders to improve education in good scientific design and in transparent reporting of essential research design elements.

Authors are encouraged to review the full ARRIVE guidelines1  (in addition to the citation, they can be directly accessed at www.nc3rs.org.uk/arrive-guidelines) before submission of preclinical studies to Anesthesiology. However, the following items will be particularly scrutinized in research submissions.

1. Describe the experiments adequately to allow other researchers to replicate them

This is unchanged from our current requirement: investigators should report the key aspects of the experiments that would allow an experienced investigator outside their laboratory to attempt replication of the study. All studies that were performed should be reported, not just those that support the hypothesis, including the number of animals in each study and the statistical analysis. Pilot studies used to define conditions should be described only to the extent that they would aid replication.

2. Report whether measures to reduce bias were used, including random allocation and blinding, and how they were performed

Some investigators argue that random allocation is unnecessary because they study inbred or highly homogeneous animal populations, and that blinding is unnecessary because the animal is effectively blinded to treatment. However, the need for these procedures is underscored by changes in animal behavior caused by seasonal changes in the protein source of commercial animal chow9  and by large interindividual variability in animal behavior before and after surgery.10  In addition, we now know that environmental influences can alter subsequent biology and physiology via epigenetic and other mechanisms, despite presumed identical genomes. Similarly, experimenter blinding is essential whenever possible, given that unintentional experimenter bias can influence measurements, as evidenced by the fact that effect sizes of interventions are smaller in studies in which blinding is performed.11 
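Random allocation can be both simple and fully reportable. The sketch below is a minimal illustration under assumptions of our choosing (the group labels, block scheme, and seed are arbitrary); it uses permuted blocks with a recorded seed so the allocation sequence can be reproduced exactly:

```python
# A minimal sketch (illustrative, not a prescribed method) of reproducible
# random allocation: permuted blocks keep group sizes balanced, and recording
# the seed lets the allocation sequence be reported and replicated.
import random

def block_randomize(n_subjects: int, groups=("drug", "vehicle"), seed: int = 42):
    """Return a balanced allocation sequence using permuted blocks."""
    rng = random.Random(seed)  # fixed seed so the sequence can be reported
    block = list(groups)
    allocation = []
    while len(allocation) < n_subjects:
        rng.shuffle(block)      # permute each block independently
        allocation.extend(block)
    return allocation[:n_subjects]

print(block_randomize(10))
# e.g. ['vehicle', 'drug', 'drug', 'vehicle', ...]
```

Reporting the method (permuted blocks), the block size, and the seed allows another laboratory to verify or reproduce the allocation.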

3. Report how the sample size was determined

Although many preclinical articles include multiple experiments, authors should report for each experiment whether a primary outcome measure and sample size were defined a priori based on estimates of variance and the minimum biologically meaningful effect size. We recognize the need for exploratory science, and unblinded, nonrandomized experiments may well be included in an article as preliminary observations. Very small sample sizes in preclinical research can result in a high likelihood of false results and in misestimation of the true effect size, and the ethics of such unreliable research have been questioned.12  Concerns over the unreliability of small samples have led at least one journal to accept only studies with a minimum sample size of five.13  Thus, in addition to a power calculation, at very small sample sizes the reliability of the observation should be considered.
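As a concrete illustration of an a priori sample size calculation, the following sketch (our example; the effect size, standard deviation, alpha, and power are hypothetical values, not recommendations) applies the standard normal-approximation formula for comparing two group means, n per group = 2(z1−α/2 + zβ)²σ²/δ²:

```python
# Illustrative a priori sample-size calculation for a two-group comparison
# of means, using the normal-approximation formula. All inputs are
# hypothetical and must come from pilot data or the literature in practice.
from math import ceil
from scipy.stats import norm

def n_per_group(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Subjects per group to detect a mean difference `delta` given SD `sigma`."""
    z_alpha = norm.ppf(1 - alpha / 2)  # two-sided test
    z_beta = norm.ppf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return ceil(n)

# Example: minimum biologically meaningful effect of 1.0 with an SD of 1.2
print(n_per_group(delta=1.0, sigma=1.2))  # ~23 per group
```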

4. Report the data analysis plan

Prospective definition of the primary outcome(s) and an analysis plan are needed to design a high-quality study that has a good chance of being replicated in future studies. In clinical research, prospective documentation of these design aspects is required through trial registration. Although trial registration is not required for preclinical research, authors should state whether primary outcomes and an analysis plan were established before the study started and declare which elements of the analysis were derived after examination of the data (i.e., post hoc). Clinical research investigators report the number of subjects recruited into the trial, the number randomized to each condition, and the number excluded from the analysis, along with the reasons for exclusion. The same practice should be followed for each experiment involving animals. Although there may be cases in which a majority of animals are excluded from data analysis because of technical failures, providing this information is extremely valuable to other investigators who wish to replicate the experiment or method. Whether any data were excluded as outliers should also be reported, including how outliers were defined and whether this was done prospectively and before unblinding. It is often advisable to report the analysis with and without outliers, to allow readers to evaluate the data in both contexts.
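For example, a prospectively defined outlier rule and the recommended with-and-without reporting might look like the following sketch (our illustration; the Tukey fence rule and the simulated data are hypothetical choices, not a required procedure):

```python
# Illustrative outlier handling: a rule defined prospectively (Tukey's
# 1.5 * IQR fences) and the analysis reported both with and without the
# excluded values. Data are simulated solely for demonstration.
import numpy as np
from scipy import stats

def tukey_inliers(x: np.ndarray) -> np.ndarray:
    """Boolean mask of values inside Tukey's 1.5 * IQR fences."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x >= q1 - 1.5 * iqr) & (x <= q3 + 1.5 * iqr)

rng = np.random.default_rng(1)
drug = rng.normal(10, 2, 20)
vehicle = np.append(rng.normal(12, 2, 20), 25.0)  # one extreme value

# Report both analyses so readers can judge the influence of exclusions.
p_all = stats.ttest_ind(drug, vehicle).pvalue
p_trim = stats.ttest_ind(drug[tukey_inliers(drug)],
                         vehicle[tukey_inliers(vehicle)]).pvalue
print(f"P with all data: {p_all:.3f}; P excluding outliers: {p_trim:.3f}")
```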

As noted, despite journal endorsement of these and other elements of the ARRIVE guidelines for reporting preclinical research, articles in those journals report the elements only a small minority of the time. Furthermore, there has been little improvement in reporting practices over the past 3 yr and little difference between journals with high and low impact factors.3,4  For the past several years, Anesthesiology has scanned all clinical trials with custom-designed software to identify elements of CONSORT that are not included, and we will do the same for preclinical research with these elements of ARRIVE. The goal of these efforts is not to reduce the amount of preclinical research we publish or to discourage authors from considering Anesthesiology for publication of their preclinical research. Rather, the goal is to enhance the trust of our readers in the quality of the science we publish and the trust of investigators that this published work is more likely to be replicated and perhaps translated into improved care of patients.
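Although the journal's actual software is not described here, an automated screen for these elements could in principle be as simple as the following sketch (purely illustrative; the element names and keyword patterns are our assumptions):

```python
# Illustrative keyword screen for reporting elements in a manuscript,
# flagging any element with no matching text. A real screening tool would
# be far more sophisticated; this only conveys the idea.
import re

CHECKS = {
    "randomization": r"\brandom(ly|ized|ised|ization|isation)?\b",
    "blinding": r"\bblind(ed|ing)?\b|\bmasked\b",
    "sample size": r"\bsample size\b|\bpower (analysis|calculation)\b",
    "analysis plan": r"\banalysis plan\b|\bprimary outcome\b",
}

def screen(manuscript_text: str) -> list[str]:
    """Return the reporting elements with no matching text in the manuscript."""
    return [name for name, pattern in CHECKS.items()
            if not re.search(pattern, manuscript_text, re.IGNORECASE)]

text = "Animals were randomly allocated; observers were blinded to treatment."
print(screen(text))  # ['sample size', 'analysis plan']
```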

Supported, in part, by grant R37-GM48085 from the National Institutes of Health, Bethesda, Maryland.

Dr. Eisenach is the Editor-in-Chief of Anesthesiology, and his institution receives salary support from the American Society of Anesthesiologists (ASA), Schaumburg, Illinois, for this position. Dr. Houle is the statistical Editor of Anesthesiology, and his institution receives salary support from the ASA for this position. Dr. Warner declares no competing interests.

1. Kilkenny C, Browne WJ, Cuthill IC, Emerson M, Altman DG: Improving bioscience research reporting: The ARRIVE guidelines for reporting animal research. PLoS Biol 2010; 8:e1000412
2. Begley CG, Ellis LM: Drug development: Raise standards for preclinical cancer research. Nature 2012; 483:531–3
3. Macleod MR, Lawson McLean A, Kyriakopoulou A, Serghiou S, de Wilde A, Sherratt N, Hirst T, Hemblade R, Bahor Z, Nunes-Fonseca C, Potluru A, Thomson A, Baginskaite J, Egan K, Vesterinen H, Currie GL, Churilov L, Howells DW, Sena ES: Correction: Risk of bias in reports of in vivo research: A focus for improvement. PLoS Biol 2015; 13:e1002301
4. Baker D, Lidster K, Sottomayor A, Amor S: Two years later: Journals are not yet enforcing the ARRIVE guidelines on reporting standards for pre-clinical animal studies. PLoS Biol 2014; 12:e1001756
5. Prinz F, Schlange T, Asadullah K: Believe it or not: How much can we rely on published data on potential drug targets? Nat Rev Drug Discov 2011; 10:712–3
6. Warner DS, James ML, Laskowitz DT, Wijdicks EF: Translational research in acute central nervous system injury: Lessons learned and the future. JAMA Neurol 2014; 71:1311–8
7. Landis SC, Amara SG, Asadullah K, Austin CP, Blumenstein R, Bradley EW, Crystal RG, Darnell RB, Ferrante RJ, Fillit H, Finkelstein R, Fisher M, Gendelman HE, Golub RM, Goudreau JL, Gross RA, Gubitz AK, Hesterlee SE, Howells DW, Huguenard J, Kelner K, Koroshetz W, Krainc D, Lazic SE, Levine MS, Macleod MR, McCall JM, Moxley RT III, Narasimhan K, Noble LJ, Perrin S, Porter JD, Steward O, Unger E, Utz U, Silberberg SD: A call for transparent reporting to optimize the predictive value of preclinical research. Nature 2012; 490:187–91
8. Collins FS, Tabak LA: Policy: NIH plans to enhance reproducibility. Nature 2014; 505:612–3
9. Shir Y, Ratner A, Seltzer Z: Diet can modify autotomy behavior in rats following peripheral neurectomy. Neurosci Lett 1997; 236:71–4
10. Peters CM, Hayashida K, Suto T, Houle TT, Aschenbrenner CA, Martin TJ, Eisenach JC: Individual differences in acute pain-induced endogenous analgesia predict time to resolution of postoperative pain in the rat. Anesthesiology 2015; 122:895–907
11. Sena E, van der Worp HB, Howells D, Macleod M: How can we improve the pre-clinical development of drugs for stroke? Trends Neurosci 2007; 30:433–9
12. Button KS, Ioannidis JP, Mokrysz C, Nosek BA, Flint J, Robinson ES, Munafò MR: Power failure: Why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci 2013; 14:365–76
13. Curtis MJ, Bond RA, Spina D, Ahluwalia A, Alexander SP, Giembycz MA, Gilchrist A, Hoyer D, Insel PA, Izzo AA, Lawrence AJ, MacEwan DJ, Moon LD, Wonnacott S, Weston AH, McGrath JC: Experimental design and analysis and their reporting: New guidance for publication in BJP. Br J Pharmacol 2015; 172:3461–71