by Hannah Fraser
Fiona Fidler and Tim Parker organized an Ignite session on Replication in Ecology at the Ecological Society of America Annual Meeting 2017 in Portland, U.S.A., a few weeks ago. Ignite sessions begin with a series of five-minute talks on a common topic, followed by a panel discussion. At Fiona and Tim's session, more than 50 attendees listened to talks by Fiona Fidler, Clint Kelly, Kim LaPierre, David Mellor, Emery Boose, and Bill Michener.
Tim introduced the session by describing how it had arisen from discussions with journal editors. Tim and his colleagues have recently been successful in encouraging editors of many journals in ecology and evolutionary biology to support the Transparency and Openness Promotion guidelines, but one of these guidelines – the one encouraging the publication of articles that replicate previously published studies – has proven unpalatable to a number of journal editors. The purpose of the Ignite session was to discuss the purpose and value of replication studies in ecology, to raise awareness of the thoughts shared by members of Transparency in Ecology and Evolution, and to take initial steps towards developing a consensus regarding the role of replication in ecology.
Fiona spoke first, describing the tension between the importance of replication and the perception that it is boring and un-novel. Replication is sometimes viewed as the cornerstone of science (following Popper): without replicating studies it is impossible to either falsify or verify findings. In contrast, replication attempts are deemed boring if they find the same thing as the original study ("someone else has already shown this") and meaningless if they find something different ("there could be millions of reasons for getting a different result"). However, replication studies actually span a range of types that differ in their usefulness for falsification, their novelty, and the resources required to conduct them. Fiona broke this down into two scales: 1) from using the exact same data collection procedures to completely different ones, and 2) from using the exact same analysis to completely different analyses. A study that uses completely different data collection and analysis methods to investigate the same question is often termed a conceptual replication. Conceptual replication is reasonably common in ecology: people investigate whether an effect holds in a new context. In contrast, very few studies attempt to replicate previous work more directly (i.e. by using the exact same data collection procedures and data analyses). Moreover, finding a different result in these contexts doesn't result in falsification or even, in many cases, scepticism about the findings of the original study, because there are often so many uncontrollable differences between the two studies that any of them could have caused the differing results. Fiona suggested that one way to enhance the relevance of all replications, but particularly conceptual replications, would be to adopt a proposal from psychology and include a statement in every article about the constraints on generality.
If all articles described the circumstances in which the authors would and would not expect to find the same patterns, it would become possible to use conceptual replications to falsify studies, or to further delimit their relevance.
Previous work has shown that 1% of studies in psychology are replication studies. Clint described some of his recent work aimed at determining how many replication studies there have been in ecology. He text-mined open access journals on PubMed for papers with the word 'replic*' anywhere in their full text. He found that only a handful of studies attempted to replicate a previous study's findings, and of these, only 50% claimed to have found the same result as the original study. These analyses suggest that there are far fewer replication studies in ecology than in psychology. However, this figure only accounts for direct replications that explicitly identify themselves as replications of previous work. Conceptual replications are not captured, because it is not common practice to label them as replications, possibly because doing so makes a paper seem less novel. Nonetheless, this valuable work suggests that the rate of direct replication in ecology is abysmally low.
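As a rough illustration of the kind of full-text screening Clint described, the core step could be as simple as a pattern match for 'replic*' terms. This is a minimal sketch, not his actual pipeline; the regular expression and the sample texts are assumptions for demonstration only.

```python
import re

# An illustrative stand-in for the 'replic*' search: matches
# 'replicate', 'replication', 'replicated', etc.
REPLIC_PATTERN = re.compile(r"\breplic\w*", re.IGNORECASE)

def mentions_replication(full_text: str) -> bool:
    """Return True if a paper's full text contains any 'replic*' term."""
    return bool(REPLIC_PATTERN.search(full_text))

# Hypothetical snippets standing in for open-access full texts
papers = {
    "paper_a": "We attempted to replicate the findings of an earlier field study.",
    "paper_b": "Species richness declined with nutrient addition.",
}

hits = {pid: mentions_replication(text) for pid, text in papers.items()}
print(hits)  # paper_a matches, paper_b does not
```

Of course, a match on 'replic*' only flags candidate papers; as Clint's results suggest, distinguishing genuine replication attempts from incidental mentions still requires reading the papers.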
Kim discussed the Nutrient Network (NutNet) project, which she described as a coordinated, distributed experiment but which could equally be seen as concurrent direct replications of the same experiment. The NutNet project aims to "collect data from a broad range of sites in a consistent manner to allow direct comparisons of environment-productivity-diversity relationships among systems around the world". The same experimental design is used at 93 sites on all continents except Antarctica. It's a massive effort achieved with almost no funding, as the participating researchers conduct the experiment using their existing resources.
David is from the Center for Open Science and discussed how to ensure that the results of replication studies are meaningful regardless of their findings. Like Fiona, David advocated including constraints-on-generality statements in articles to describe the situations to which you would reasonably expect your results to extend. The majority of David's talk, however, was about preregistration, which can be used for replication studies but is useful in many types of research. The idea is that, before you start your study, you 'preregister' your hypotheses (or research questions) and the methods you intend to use to collect and analyse the relevant data. This preregistration is then frozen in time and referenced in the final paper to clearly delineate what the original hypotheses were (ruling out HARKing – Hypothesizing After Results are Known) and which tests were planned (ruling out p-hacking). The Center for Open Science is currently running a promotion called the Preregistration Challenge, in which the authors of the first 1000 preregistered articles to be published receive $1000.
Emery discussed RDataTracker, an R package he helped develop that aids in creating reproducible workflows. RDataTracker records data provenance, including information on the hardware used to run the analyses and the versions of all relevant software. The package lets you see what each step of an analysis does and what intermediate values are produced along the way. This can be really useful for debugging your own code, as well as for determining whether someone else's code is operating correctly.
Bill Michener is from DataONE, an organization that aims to create tools to support data replication. They have developed workflow software (DMPtool.org) that makes it easier to collate metadata along with data files and analyses. The software links with Ecological Metadata, GitHub, Dryad, and the Open Science Framework, among other tools.
The talks were compelling, and most attendees stayed to listen and take part in discussions afterwards. Although the session was planned as a forum for discussing opposing views, there was no outright resistance to replication. It is probably the case that the session attracted people who were already in support of the idea. However, it's also possible that strong opinions expressed in favour of replication made people reluctant to raise critical points, or (perhaps wishfully) maybe the arguments made in the Ignite session were sufficiently compelling to convince the audience of its importance. In any case, it was inspiring to be surrounded by so many voices expressing support for replication. The possibility of replication studies becoming more common in ecology feels real!