Ecologists and evolutionary biologists can and should pre-register their research

I wrote a draft of this post a few weeks ago, and now seems like a good time for it to see the light of day given the great new pre-print just posted on OSF Preprints by Brian Nosek, David Mellor, and co-authors. They describe the utility of pre-registration across a variety of circumstances. I do something similar here, though I focus on ecology and evolutionary biology and I don’t try to be as thorough as Nosek et al. For greater depth of analysis, check out their paper. On to my post…

Transparency initiatives are gaining traction in ecology and evolutionary biology. Some of these initiatives have become familiar – data archiving is quickly becoming business as usual – though others are still rare and strange to most of us. Pre-registration is squarely in this second category. Although I know a number of ecologists / evolutionary biologists who are starting to pre-register their work (and I’ve participated in a few pre-registrations myself), I would guess that most eco/evo folks don’t even know what pre-registration is, and many who do know probably wonder if it would even be worth doing. My goals here are to explain what pre-registration is, why it’s useful, and why most ecologists and evolutionary biologists could be using it on a regular basis.

 

-What is pre-registration?

At its most thorough, a pre-registration involves archiving a hypothesis and a detailed study design, including a data analysis plan, prior to gathering data. However, as you’ll read below, the data analysis plan is typically the core element of a useful pre-registration, and a pre-registration can happen after data gathering as long as the analysis plan is declared without knowledge of the outcome of the analysis or its alternatives. Pre-registrations are archived in a public registry (the Open Science Framework, OSF, for example) so that they can later be compared to the analysis that is ultimately conducted. Depending on the pre-registration archive, the pre-registration may be embargoed to maintain confidentiality of a research plan until it is completed. Once a pre-registration is filed, it cannot be edited, though it could potentially be updated with further pre-registrations. When a pre-registered study is published, the paper should cite (or better yet, link to) the pre-registration to show the extent to which the plan was followed.

 

-Why is pre-registration a useful component of transparency?

People (including all of us) are worryingly good at filtering available evidence so that they end up seeing the world that they expect to see rather than the world as it actually is. In other circumstances, after noticing a pattern, we readily convince ourselves that we predicted (or would have predicted) that particular outcome. All the while, we fool ourselves into believing we’re being unbiased. Science is all about avoiding these biases and taking honest stock of available evidence, but in the absence of adequate safeguards, there is good evidence that scientists can fall prey to cognitive biases (for a striking example, see van Wilgenburg and Elgar 2013). Pre-registration is one of a number of tools that help scientists take a clear-eyed view of evidence, and it helps those of us reading scientific papers to identify evidence that is less likely to have been run through a biased filter. When scientists fiddle with analyses and can see how that fiddling impacts results, there is a great temptation to choose the analyses that produce the most desirable outcome. If this biased subset of results gets published and other results go unreported, we get a biased understanding of the world. In my ignorant past I’ve conducted and presented analyses this way, and nearly every other ecologist and evolutionary biologist I’ve talked to about this admits to doing this sort of thing at least once. For this and other reasons (Fidler et al. 2016, Parker et al. 2016), I think this problem is common enough to reduce the average reliability of the published literature. Pre-registration could improve average reliability of this literature and help us identify papers that are less likely to be biased.

 

-Why is pre-registration a viable tool for ecologists and evolutionary biologists?

I’ve written this section as a series of hypothetical concerns or questions from ecologists or evolutionary biologists, followed by responses to those concerns / questions.

 

“I work in the field and I have to refine my methods, or even my questions, over weeks or months through trial and error”

You can pre-register after your methods are finalized. When starting work in a new system or with a new method, you generally won’t be ready to complete a particularly useful pre-registration until you’ve gotten your hands dirty. You’ll need to figure out what works and what doesn’t work through trial and error. Unless you have excellent guidance from experts in the system / method, you probably want to hold off finalizing your pre-registration until you’ve been in the field and landed on a method that works. It would still be good to think long and hard about the project before heading to the field. Develop as detailed a methodological plan as is reasonable (in many cases, you’ll have done this already at the proposal stage) and talk to a statistician to develop a tentative analysis plan. Once you’ve begun to implement a set of methods you feel good about, then complete your pre-registration.

 

“What if I have to change my methods part way through the project?”

Of course, even if you go through the trouble of field testing your methods before finalizing your pre-registration, things still might change. You might come back a second year to find that conditions demand a revised protocol. If you have to scrap your first year’s data because you can’t continue, then you probably want to create an entirely new pre-registration based on your new methods. On the other hand, if your data from last year are still usable and you’ve just had to make modest changes, then you have some choices. You could just wait until you write the manuscript to explain why your data gathering methods changed, or you could file a new pre-registration that acknowledges (and links to) the earlier protocol but also introduces the new methods. The old protocol won’t disappear, but the evolution of your project is now transparent.

 

“I work with existing data (e.g., from long-term projects, from existing citizen science projects, from my own metaphorical file drawer, for meta-analysis, etc.), so I can’t pre-register prior to data gathering.”

Pre-registration can be useful at any point before you start to examine your data for biologically relevant patterns, either through examining data plots or through initial statistical analyses. If you haven’t peeked at the data yet, go for it. Pre-register a detailed analysis plan.

 

“What if I see patterns in my data that I want to follow-up on with analyses that I didn’t pre-register?”

Not a problem. Just distinguish your post hoc analyses from your pre-registered analyses in your paper. Ideally you’d also report all your post hoc exploration and declare that you have done so. If you have too many to report in your paper, present them in supplementary material or even in a data repository.

 

“I focus on discovery. I don’t typically have a priori hypotheses when I start a project.”

Pre-registration can still be for you. The primary purpose of pre-registration is to promote transparency. Exploratory work is vital. We just want to know that we’re not being shown a biased subset of your exploratory outcomes. Thus, if you have a study and analysis plan, you pre-register it, and then present results from the full set of analyses you described in your pre-registration, we know we’re not getting a biased subset.

 

“I don’t develop an analysis plan until I have my data so that I can see how they are distributed and how viable different modeling alternatives are with the real data”

There are several options here. You could develop a decision tree that anticipates modeling decisions you will need to make and lays out criteria for making those decisions. Other options include working with some form of your actual data in a trial phase. For instance, you could sacrifice a portion of your data for model exploration, select a set of models to test, pre-register those, and then assess them with your remaining (unexplored) data. Alternatively you could scramble your full data set, or add some sort of noise, refine your analysis plan with these ‘fake’ data, then pre-register and re-run the analysis with the real data.
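To make the last two options concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of my own: the function names, the toy `rows` data, and the 25% exploration fraction are all hypothetical, and a real analysis would use whatever software and split you commit to in your pre-registration.

```python
import random


def split_for_exploration(rows, explore_fraction=0.25, seed=2017):
    """Randomly split rows into an exploration set and a confirmation set.

    The exploration set is "sacrificed" for model selection; the
    confirmation set stays unexamined until the models are pre-registered.
    """
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(rows)      # copy, so the original order is untouched
    rng.shuffle(shuffled)
    n_explore = int(len(shuffled) * explore_fraction)
    return shuffled[:n_explore], shuffled[n_explore:]


def scramble_response(rows, response_key, seed=2017):
    """Return copies of rows with the response values permuted across rows.

    Predictors stay in place, so the data keep their structure and
    distributions, but any real predictor-response pattern is broken.
    """
    rng = random.Random(seed)
    values = [row[response_key] for row in rows]
    rng.shuffle(values)
    return [{**row, response_key: value} for row, value in zip(rows, values)]


# Hypothetical toy data set: 20 plots, two treatments, one response.
rows = [{"plot": i, "treatment": i % 2, "biomass": 10.0 + i} for i in range(20)]

explore, confirm = split_for_exploration(rows)  # option 1: hold out data
fake_rows = scramble_response(rows, "biomass")  # option 2: scrambled data
```

With either approach, the important step is the order of operations: refine the analysis on `explore` or `fake_rows`, file the pre-registration, and only then run the chosen analysis on `confirm` or on the real `rows`.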

 

“I don’t want to develop a detailed analysis plan. There are too many unforeseen circumstances and I’m bound to ultimately deviate from my plan”

I have two responses to this concern. The first is to see my previous reply – there are ways to pre-register after you have your data and have confirmed that an analysis is likely to be appropriate with your data. My second point is that, just as field methods change in response to circumstances, so do statistical methods. A pre-registration doesn’t prevent us from changing an analysis, it just helps us be transparent about these changes. Among other things, this transparency probably helps us make sure that when we do change our plan, we’re doing so for a good reason.

 

“If I can just pre-register an analysis plan after collecting my data, why should I bother to pre-register the other portions of my study methods?”

Although I think it’s much better to pre-register an analysis plan than to not pre-register at all, pre-registering the whole study design is helpful for a variety of reasons. For one, pre-registering prior to completion of data gathering (or better yet, before data gathering) helps make it clear that your pre-registered analysis plan could not have been influenced by any knowledge (conscious or unconscious) about patterns in the data. Early pre-registration also facilitates transparency about the project as a whole. Later, when you publish the results, other researchers can understand the scope of your work and can be shown (hopefully) that you’re not just publishing a subset (potentially a biased subset) of the project. And if you never publish your work, then your pre-registration is evidence that someone at least considered doing this project at some point, and this could be useful information to other researchers down the line. A well-executed pre-registration might also help set expectations for the role of individual collaborators.

 

“Pre-registration is just extra work”

In most cases, pre-registration should not dramatically change workload. If you’ve written a grant proposal, much of the work of pre-registration will already be done. If your grant proposal doesn’t include a detailed analysis plan, presumably the manuscript you write to report your results will include a detailed explanation of your analytic methods, and so a pre-registration just shifts the timing of this writing. Likewise, if this isn’t grant funded research, some other parts of your methods, and presumably parts of your introduction, will be ready and waiting in draft form when you complete your pre-registered study and go to write it up. To the extent that you end up writing more about your analyses in a pre-registration than you would have in a paper that reported only a subset of your analyses, this is the price for doing transparent and reliable science. You should have been reporting all this information somewhere anyway.

 

“If I pre-register, I might be scooped”

You can embargo your pre-registration so that it’s private until you choose to share it. Pre-registrations on the site AsPredicted can remain private indefinitely. On the OSF, embargos are limited to four years.

 

“I’m a student just starting a project and so I don’t know enough about my system to pre-register”

If you’re mentored by someone familiar with this system, then you’ll want to work closely with your mentor to develop your pre-registration. If this isn’t possible, read through my suggestions above. There are various paths forward, from waiting until you’ve worked out the kinks in your methods to various ways of pre-registering after you have data. Think carefully and identify the path that’s best for you.

 

If you have other concerns or questions about how you could apply pre-registration to your work, I’d love to hear about them. Let’s have a discussion.

Not all work needs to be pre-registered, but most work could be pre-registered. And this is important because pre-registration will help ecologists and evolutionary biologists improve transparency and thus, I expect, reduce bias in a wide array of circumstances.

 

Ecological Society of America Ignite Session on Replication in Ecology

by Hannah Fraser

Fiona Fidler and Tim Parker organized an Ignite session on Replication in Ecology at the Ecological Society of America Annual Meeting 2017 in Portland, U.S.A., a few weeks ago. Ignite sessions start with a series of 5-minute talks on a similar topic that are followed by a panel discussion. At Fiona and Tim’s session, more than 50 attendees listened to talks by Fiona Fidler, Clint Kelly, Kim LaPierre, David Mellor, Emery Boose, and Bill Michener.

Tim introduced the session by describing how it had arisen from discussions with journal editors. Tim and his colleagues have recently been successful in encouraging editors of many journals in ecology and evolutionary biology to support the Transparency and Openness Promotion guidelines, but one of these guidelines – the one which encourages the publication of articles that replicate previously published studies – has proven unpalatable to a number of journal editors. The purpose of the Ignite session was to discuss the purpose and value of replication studies in ecology, to raise awareness of the thoughts shared by members of Transparency in Ecology and Evolution, and to take initial steps towards developing a consensus regarding the role of replication in ecology.

Fiona Fidler

Fiona spoke first, describing the tension between the importance of replication and the perception that it is boring and un-novel. Replication is sometimes viewed as the cornerstone of science (following Popper): without replicating studies it is impossible to either falsify or verify findings. In contrast, replication attempts are deemed boring if they find the same thing as the original study (“someone else has already shown this”), and meaningless if they find different things (“there could be millions of reasons for getting a different result”). However, there is actually a range of different types of replication studies, which differ in their usefulness in terms of falsification, their novelty, and the amount of resources required to achieve them. Fiona broke this down into two scales: 1) from using the exact same data collection procedure to using completely different data collection procedures, and 2) from using the exact same analysis to using completely different analyses. A study that uses completely different data collection and analysis methods to investigate the same question is often termed a conceptual replication. Conceptual replication is reasonably common in ecology: people investigate whether an effect is true in a new context. However, there are very few studies that attempt to replicate studies more directly (i.e. by using the exact same data collection procedures and data analyses). Finding a different result in these contexts, though, doesn’t result in falsification or even, in many cases, scepticism about the findings of the original study, because there are often so many uncontrollable differences between the two studies and any of these could have caused the studies to find different results. Fiona suggested that one way to enhance the relevance of all replications, but particularly these conceptual replications, could be to adopt a proposal from psychology and include a statement in every article about the constraints on generality. If all articles described the circumstances in which the authors would and would not expect to find the same patterns, it would become possible to use conceptual replications to falsify studies, or further delimit their relevance.

Clint Kelly

Previous work has shown that 1% of studies in psychology are replication studies. Clint described some of his recent work aimed at determining how many replication studies there have been in ecology. He text-mined open access journals on PubMed for papers with the word ‘replic*’ anywhere in their full text. He found that only a handful of studies attempted to replicate a previous study’s findings, and of these, only 50% claimed to have found the same result as the original study. These analyses suggest that there are far fewer replication studies occurring in ecology than in psychology. However, this value only accounts for direct replications that discuss the fact that they are replications of previous work. Conceptual replications are not included in this because it is not common practice to mention that they are replications, possibly because it makes the paper seem less novel. Even so, this valuable work suggests that the rate of direct replication studies in ecology is abysmally low.

Kim LaPierre

Kim discussed the Nutrient Network (NutNet) project, which she described as a coordinated, distributed experiment but which could equally be seen as concurrent direct replications of the same experiment. The NutNet project aims to “collect data from a broad range of sites in a consistent manner to allow direct comparisons of environment-productivity-diversity relationships among systems around the world”. The same experimental design is used at 93 sites on all continents except Antarctica. It’s a massive effort that is achieved with almost no funding, as the participating researchers conduct the experiment using their existing resources.

David Mellor

David is from the Center for Open Science and discussed how to guarantee that results of replication studies are meaningful regardless of findings. Like Fiona, David advocated using constraints on generality statements in articles to describe the situations to which you would reasonably expect your results to extend. The majority of David’s talk, however, was about preregistration, which can be used for replication studies but is actually useful in many types of research. The idea is that, before you start your study, you ‘preregister’ your hypotheses (or research questions) and the methods you intend to use to collect and analyse the relevant data. This preregistration is then frozen in time and referenced in the final paper to clearly delineate what the original hypotheses were (ruling out HARKing – Hypothesizing After Results are Known) and which tests were planned (ruling out p-hacking). The Center for Open Science is currently running a promotion called the Preregistration Challenge, in which the authors of the first 1000 articles that pre-register and get published receive $1000.

Emery Boose

Emery discussed RDataTracker, a package that he helped develop that aids in creating reproducible workflows in R. RDataTracker gives information on data provenance including information on the hardware used to run the analyses and the versions of all relevant software. The package allows you to see what every step of the analysis does; what intermediate values are produced at all points of the analyses. This can be really useful for de-bugging your own code as well as determining whether someone else’s code is operating correctly.

Bill Michener

Bill Michener is from DataONE, an organization that aims to create tools to support data replication. They have developed workflow software (DMPtool.org) that makes it easier to collate metadata along with datafiles and analyses. The software links with Ecological Metadata, GitHub, Dryad and the Open Science Framework, among other programs.

 

The talks were compelling and most attendees stayed to listen and take part in discussions afterwards. Although the session was planned as a forum for discussing opposing views, there was no outright resistance to replication. It is probably the case that the session attracted people who were already in support of the idea. However, it’s also possible that strong opinions expressed in favour of replication made people reluctant to raise critical points, or (perhaps wishfully) maybe the arguments made in the Ignite session were sufficiently compelling to convince the audience of its importance. In any case, it was inspiring to be surrounded by so many voices expressing support for replication. The possibility of replication studies becoming more common in ecology feels real!