All posts by Tim Parker

Prior probability and reproducibility

Astronomer Carl Sagan famously said, “Extraordinary claims require extraordinary evidence.” I think it’s useful to extend this to the distinctly less elegant “surprising findings are less likely to be true, and thus require a higher standard of evidence.”

I started thinking more about what influences the reliability of a scientific result when analyses of my post-doc data weren’t lining up with published findings from other studies of the same species. When I encountered this problem with reproducibility, the causes I first focused on were old standbys like multiple tests of the same hypothesis driving up type I error and the flexibility to interpret an array of different results as support for a hypothesis.  What I wasn’t thinking about was low prior probability – if we test an unlikely hypothesis, support for that hypothesis (e.g., a statistically significant result) is more likely to be a false positive than if we’re testing a likely hypothesis. Put another way, a hypothesis that would be surprising if true is, in fact, less likely to be true if it contradicts well-supported prior empirical understanding or if it is just one of many plausible but previously unsupported alternate hypotheses. Arguments that I’ve heard against taking prior probability into account are that it isn’t ‘fair’ to impose different standards of evidence on different hypotheses, and that it introduces bias. I think the risk of bias is real (we probably overestimate the probability of our own hypotheses being true), but I think the argument about fairness is misleading. Let’s consider an example where we have a pretty good idea of prior probability.

A couple of months ago, I saw some photos on Twitter of a cat in the genus Lynx running down the street in Moscow, Idaho, a university town not far from where I live in neighboring Washington State. The tweet asked ‘bobcat or lynx?’ Bobcats (Lynx rufus) are fairly common in this part of North America, but Canada lynx (Lynx canadensis) are extremely rare, and it was exciting to contemplate the possibility that there was a Canada lynx right in the midst of Moscow. Most of the folks who replied to the tweet looked at the photos, decided that the cat looked a bit more like a lynx (spots somewhat indistinct, hind legs possibly longer, tail tip possibly lacking the white tuft), and thus voted ‘lynx’. But these folks seemed to be ignoring prior probability. Here in Washington State, there are probably something on the order of 1000 bobcats for every lynx, so let’s assume that a few miles across the border in Idaho it is still approximately 1000 times more likely that a bobcat is running down the streets of Moscow than a lynx. If the people weighing in on Twitter are good at distinguishing bobcats from lynx and only mistakenly call a bobcat a lynx 1 in 100 times, then out of 1000 bobcat photos they might see, they’re going to mistakenly call 10 of those bobcats ‘lynx’. However, if bobcats and lynx are photographed in proportion to their abundance, they are going to see only one lynx photo for those 1000 bobcat photos. Thus they’re going to make 10 times as many false positive ‘lynx’ identifications as actual lynx identifications (I’m ignoring the possibility of a false negative, but the results are about the same assuming people are good at identifying lynx). To make the same number of false positive ‘lynx’ identifications as actual lynx identifications, they’d have to mistakenly call a bobcat a ‘lynx’ only 1 in 1000 times. In other words, even if they were 99.9% reliable, they’d still have only a 50:50 chance of being correct when their call was ‘it’s a lynx’.
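This arithmetic is just Bayes’ rule, and it can be sketched in a few lines of Python. The numbers below (a 1000:1 bobcat-to-lynx ratio, perfect identification of true lynx, and false positive rates of 1 in 100 and 1 in 1000) are the illustrative figures from the paragraph above, not real survey data:

```python
def posterior_lynx(prior_lynx, false_positive_rate, true_positive_rate=1.0):
    """P(actually a lynx | observer says 'lynx'), via Bayes' rule."""
    prior_bobcat = 1.0 - prior_lynx
    # Total probability that an observer calls a photo 'lynx':
    # true lynx identified correctly, plus bobcats misidentified.
    p_say_lynx = (true_positive_rate * prior_lynx
                  + false_positive_rate * prior_bobcat)
    return true_positive_rate * prior_lynx / p_say_lynx

# 1 lynx per 1000 bobcats -> prior probability of lynx is 1/1001
prior = 1 / 1001

print(posterior_lynx(prior, 0.01))   # ~0.09: a 'lynx' call is ~90% likely wrong
print(posterior_lynx(prior, 0.001))  # 0.5: even 99.9% reliability is a coin flip
```

The posterior of roughly 1/11 with a 1-in-100 error rate matches the "10 false positives per real lynx" count in the text.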
So, when someone comes to me in Washington State and says “I saw a bobcat,” I’m more inclined to believe them than if someone comes to me and says “I saw a lynx.” I have different standards of evidence for these two claims, and that’s the way it should be. If I didn’t, I’d think there were far more lynx than there actually are. This mistake is what caused the California Department of Fish and Wildlife to conclude they had a healthy population of wolverines when in fact they had none.

So, if we have a notion of prior probability, it’s appropriate to adjust our confidence in results accordingly when we’re evaluating evidence, for instance, while reading scientific papers. If we encounter a finding that makes us say ‘that’s surprising’, then we ought to demand more evidence than if we encounter a finding that makes us say ‘that’s what we should have expected based on all this other stuff we’ve seen already’.

The problem, of course, is that unlike the lynx and bobcat situation, we rarely have a precise prior probability. We don’t even know what the typical range of prior probabilities is in ecology and evolutionary biology research, and since we have reasons to believe that there’s quite a bit of bias in the literature, we can’t easily figure this out. Further, I suspect that prior probabilities vary among sub-fields. My feeling is that in behavioral ecology, my primary subfield, we probably test hypotheses with lower prior probabilities than some other subfields of ecology or evolutionary biology. Where does that feeling come from? I like the ‘clickable headline’ test. The more ‘clickable’ the idea, the more likely it is to be surprising, and things that surprise us are presumably (on average) ideas that contradict existing evidence or understanding. This sort of clickable headline seems common (not ubiquitous, just common) in behavioral ecology, but I’m quick to admit that this is just an untested opinion. I’d like to see data. And regardless, it’s not wrong to test clickable hypotheses. If we didn’t test surprising hypotheses, we wouldn’t push the boundaries of our knowledge. However, I think we’d be better off if we treated clickable findings more cautiously. As authors, we can acknowledge the need for more evidence when we have a surprising finding, and as reviewers we can ask authors to provide these caveats. Further, before we re-direct our research program or advocate for major policy change in response to some new surprising result, we should accumulate particularly robust evidence, including one or more pre-registered replication studies (ideally registered replication reports, which are available to ecologists and evolutionary biologists at Royal Society Open Science).

p.s. Talking about prior probabilities means I should probably be explicitly discussing Bayes. I haven’t done so mostly because I’m not very knowledgeable about Bayesian statistics, but also because I think that just taking a step towards awareness of this issue can improve our inferential practices even if we’re using frequentist tools.

An iconic finding in behavioral ecology fails to reproduce

Just how reproducible are studies in ecology and evolutionary biology? We don’t know precisely, but a new case study in the journal Evolution shows that even textbook knowledge can be unreliable. Daiping Wang, Wolfgang Forstmeier, and co-authors have convinced me of the unreliability of an iconic finding in behavioral ecology, and I hope their results bring our field one step closer to a systematic assessment of reproducibility.

When I was doing my PhD, one of the hottest topics in behavioral ecology was the evolutionary origin of sexual ornaments. A tantalizing clue was the existence of latent female preferences – preferences that females would express if a mutation came along that produced the right male proto-ornament. One of the first hints of latent preferences was detected by Nancy Burley in female zebra finches by fitting male finches with leg bands of different colors. It turned out that a red band was attractive, a green band unattractive. Multiple studies appeared to support the original finding, and the story entered textbooks.

But now it’s non-reproducible textbook knowledge. Wang et al. report on multiple robust replication attempts that failed to reproduce this effect. So where does this leave us? It could be that the original effect was real, but contingent on some as-yet-undiscovered moderator variable. That hypothesis can never be disproven, but if someone wants to make that argument, it’s on them to identify the mysterious moderator and show how the color leg band effect can be reproduced. Until then, I’m adding the color band attractiveness effect to the list of things I learned in graduate school that were wrong.

By the way, in this case, ‘not reproducible’ means an average effect size that approximates zero. This is not just a case of one study crossing a significance threshold and another failing to cross the threshold. The sum of these replications looks exactly like the true absence of an effect.

It’s also worth noting that the distribution of published results from the lab that originally discovered the color band effect follows the pattern expected from various common research practices that unintentionally increase the publication of false positives and inflated effect sizes. I don’t mention this as an accusation, but rather as a reminder to the community that if we don’t take deliberate steps to minimize bias, it’s likely to creep in and reduce our reproducibility.

A conversation: Where do ecology and evolution stand in the broader ‘reproducibility crisis’ of science?

In this post, I float some ideas that I’ve had about the ‘reproducibility crisis’ as it is emerging in ecology and evolutionary biology, and how this emergence may or may not differ from what is happening in other disciplines, in particular psychology. Two other experts on this topic (Fiona Fidler and David Mellor) respond to my ideas, and propose some different ideas as well. This process has led me to reject some of the ideas I proposed, and has led me to what I think is a better understanding of the similarities (and differences) among disciplines.

Here’s why my co-authors are experts on this topic (more so than I am):

Fiona’s PhD thesis was about explaining disciplinary differences between psychology, ecology and medicine in their responses to criticism of null hypothesis significance testing, and she’s been interacting with researchers from multiple disciplines for 20 years. She often works closely with ecologists, but she has the benefit of an outsider’s perspective.

David has a PhD in behavioral ecology, and now works at the Center for Open Science, interacting on a daily basis with researchers from a wide range of disciplines as journals and other institutions adopt transparency standards.

TP: Several years ago, Shinichi Nakagawa and I wrote a short opinion piece arguing that ecology and evolutionary biology should look to other disciplines for ideas to reduce bias and improve the reliability of our published literature. We had become convinced that bias was common in the literature. Evidence of bias was stacking up in other disciplines as well, and the risk factors in those disciplines seemed widespread in ecology and evolution. People in those other disciplines were responding with action. In psychology, these actions included new editorial policies in journals, and major efforts to directly assess reproducibility with large-scale replications of published studies. Shinichi and I were hoping to see something similar happen in ecology and evolutionary biology.

To an important extent ecologists and evolutionary biologists have begun to realize there is a problem, and they have started taking action. Back in 2010, several journals announced they would require authors to publicly archive the data behind their reported results. This wasn’t a direct response to concerns about bias, but it was an important step towards ecologists and evolutionary biologists accepting the importance of transparency. In 2015 representatives from about 30 major journals in ecology and evolutionary biology joined advocates for increased transparency to discuss strategies for reducing bias. From this workshop emerged a consensus  that the recently-introduced TOP (Transparency and Openness Promotion) guidelines would be a practical way to help eco-evo journals implement transparency standards. Another outcome was TTEE (Tools for Transparency in Ecology and Evolution), which were designed to help journals in ecology and evolutionary biology implement TOP guidelines. A number of journals published editorials stating their commitment to TOP. Many of these journals have now also updated their editorial policies and instructions to authors to match their stated commitments to transparency. A few pioneering journals, such as Conservation Biology, have instituted more dramatic changes to ensure, to the extent possible, that authors are fully transparent regarding their reporting. A handful of other papers have also been published, reviewing evidence of bias or making recommendations for individual or institutional action.

Despite this long list of steps towards transparency, it seems to me that the groundswell seen in psychology has not yet transpired in ecology and evolution. For instance, only one ecology or evolution journal (BMC Ecology) has yet adopted registered reports (the most rigorous way to reduce bias on the part of both authors and journals), and there has been only one attempt to pursue a major multi-study replication effort, which has not yet gained major funding.

FF: At this point I feel the need to add that what Tim wrote above does describe an incredible amount of action in a short time in the disciplines of ecology and evolution. It might be harder to see the change when you’ve made it yourself 🙂

TP: I agree that there have been important changes, but it seems to me that many ecologists and evolutionary biologists remain unconvinced or unaware of the types of problems that led Shinichi and me to try to kick-start this movement in the first place. A few months ago the Dynamic Ecology blog conducted an informal survey asking “What kind of scientific crisis is the field of ecology having?” Only about a quarter of those voting were convinced that ecology was having a crisis, and only about 40% of respondents thought a reproducibility crisis was the sort of crisis ecology was having or was most likely to have in the future. So, ecologists (at least those who fill out surveys on the Dynamic Ecology blog) aren’t convinced there is a crisis, and even if there is a crisis, they’re not convinced that it’s in the form of the ‘reproducibility crisis’ discussed so much recently in psychology, medicine, economics, and some other disciplines. Of course not everyone in psychology thinks there’s a crisis either, but my sense is that the notion of a crisis is much more widely accepted there.

So why aren’t ecologists and evolutionary biologists more concerned? We’ve got the risk factors for a reproducibility crisis in abundance. What’s different about perceptions in ecology and evolutionary biology? I don’t claim to know, but I entertain several hypotheses below.

It seems highly plausible to me that many in ecology and evolution have simply not seen or appreciated the evidence needed to convince them that there is a problem. In psychology, one of the catalysts of the ‘crisis’ was the publication of an article in a respected journal claiming to have evidence that people could see into the future. The unintended outcome of this article, the conclusions of which were largely rejected by the field, was that many researchers in psychology realized that false results could emerge from standard research practices, and this was unsettling to many. In ecology and evolution, we haven’t experienced this sort of wake-up call.

DM: I think that was a huge wake-up call, that something so unlikely could be presented with the same standard techniques that every study used. In eco/evo, the inherent plausibility of a claim (dare I say, our priors) may be more difficult to judge, so a wild claim presented with flimsy evidence is not as easily spotted as being so wild.

However, I think a major underlying cause is the lack of value given to direct replication studies. Direct replications are the sad workhorse of science: they’re the best way to judge the credibility of a finding but virtually no credit is given for conducting them (and good luck trying to get one funded!). I think that a subset of psychological research was fairly easy to replicate using inexpensive study designs (e.g. undergraduate or online research participants), and so some wild findings were somewhat easy to check with new data collection.

In ecology, there are certainly some datasets that can be fairly easily re-collected, but maybe not as many. Furthermore, I sense that ecologists have an easier time attributing a “failure to replicate” to either 1) as of yet unknown moderating variables or 2) simple environmental change (in field studies). So the skepticism may be less sharp on published claims.

FF: At the moment, my research group is analysing data from a survey we did of over 400 ecology and evolution researchers, asking what they think about the role of replication in science. So far our results suggest that the vast majority of researchers think replication is very important. We’ve been a bit surprised by the results. We were expecting many more researchers to be dismissive of direct replication in particular, or to argue that it wasn’t possible or applicable in ecology. But in our survey sample, that wasn’t a mainstream view. Of course, it’s hard to reconcile this with the virtual non-existence of direct replication in the literature. We can really only explain the discrepancy by appealing to institutional norms (e.g., editorial and grant policies) and cultural norms (e.g., what we believe gets us promoted). Neither has been disrupted in ecology to the extent that it has in psychology, despite individual researchers having sound intuitions about the importance of replication.

TP: Another possibility to explain why so many ecologists and evolutionary biologists remain unconvinced that there is a replication crisis is that bias may actually be less widespread in ecology and evolutionary biology than in psychology. Let me be clear. The evidence that bias is a serious problem in ecology and evolutionary biology is compelling. However, this bias may be less intense on average than in psychology, and it may be that bias varies more among sub-disciplines within eco-evo, so there may be some ecologists and evolutionary biologists who can, with good reason, be confident in the conclusions drawn in their subdiscipline.

FF: Hmm, I think it’s more likely that psychologists are simply more accepting that bias is a real thing that’s everywhere, because they are psychologists and many study bias as their day job.

TP: OK, I buy that psychologists may be more open to the existence of bias because it’s one of the things psychologists study. However, I’d like to at least consider some possible differences in bias, and some other differences in the perception of bias.

For instance, maybe in subdisciplines where researchers begin with strong a priori hypotheses, they are more likely to use their ‘researcher degrees of freedom’ to explore their data until they find patterns consistent with their hypothesis. This is a seriously ironic possibility, but one I’ve warmed to. The relevant flip side to this is that many researchers in ecology and evolution (though I think more often in ecology) often conduct exploratory studies where they have no reason to expect or hope for one result over another, and readily acknowledge the absence of strong a priori hypotheses. This could lead to less bias in reporting, therefore greater reliability of literature, and more of a sense that the literature is reliable. I should point out, though, that bias can still emerge in the absence of a priori hypotheses if researchers are not transparent about the full set of analyses they conduct, and I know this happens at least some of the time.

FF:  So there are two claims. First, that if you have strong a priori hypotheses you might be more likely to use researcher degrees of freedom. This certainly seems plausible. You really want your hypotheses to be true, so you’re more inclined to make it so. Second, researchers in ecology and evolution are less likely to have strong a priori hypotheses than researchers in psychology. The latter is a disciplinary difference I just don’t see, but it’s an empirical question. It’s a great sociology of science question.

TP: Well, I like empirical questions, and I’d certainly like to know the answer to that one.

Moving on to throw out yet another hypothesis: it is my relatively uninformed perception that there is probably much more heterogeneity in methods across ecology and evolutionary biology than across psychology. If some methods present fewer ‘researcher degrees of freedom’, then bias may be less likely in some cases.

FF: This reminds me of older attempts to demonstrate grand differences between the disciplines. For example, there’s a common perception that the difference between hard and soft sciences is that physics etc. are more cumulative than psychology and the behavioural sciences. But attempts to pin this down, like this one from Larry Hedges, show there are more similarities than differences. I’m generally pretty skeptical about attributing differences in research practice to inherent properties of what we study. They usually turn out to be explained by more mundane institutional and social factors.

TP: Well, this next idea is subject to the same critique, but I’ll present it anyway. Statistical methods may be much more heterogeneous across sub-disciplines, and even across studies within subdisciplines of ecology and evolution. This could mean that some researchers are conducting analyses in ways that are actually less susceptible to bias. It could also mean that researchers fail to recognize the risks of bias in whatever method they are using because they focus on the differences between their method and other more widespread methods. In other words, many ecologists and evolutionary biologists may believe that they are not at risk of bias, even if they are.

FF: If you look at very particular sub-fields you may well find differences, but my bet is these can be explained by the cultural norms of a small group of individuals (e.g., the practices in particular labs that have a shared academic lineage).

TP: There certainly are some sub-disciplines where a given stats practice has become the norm, such as demographers studying patterns of survival by comparing and averaging candidate models using AIC and the ‘information theoretic’ approach. I’m not prepared to say how common this sort of sub-field standardization is, however.

Again, on to another hypothesis. Some ecologists and evolutionary biologists test hypotheses that are likely to be true, and some test hypotheses that are unlikely to be true. It is not widely recognized, but it is easily shown, that testing unlikely hypotheses leads to a much higher proportion of observed relationships being due to chance (when real signal is rare, most patterns are just noise). It may be that unlikely hypotheses are more common in psychology, and thus their false positive rate is higher on average than what we experience in ecology and evolutionary biology. I strongly suspect that the likelihood of hypotheses varies a good bit across ecology and evolutionary biology, but certainly if you’re in a subdiscipline that mostly tests likely hypotheses, it would be reasonable to have more confidence in that published literature.
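The claim that testing unlikely hypotheses inflates the false positive rate can be sketched with a standard positive predictive value calculation. The alpha of 0.05, power of 0.8, and the priors below are illustrative assumptions, not estimates for any actual subdiscipline:

```python
def ppv(prior_true, alpha=0.05, power=0.8):
    """Proportion of 'significant' results that reflect a real effect."""
    true_pos = power * prior_true          # real effects correctly detected
    false_pos = alpha * (1 - prior_true)   # null effects crossing the threshold
    return true_pos / (true_pos + false_pos)

# As the prior probability of a hypothesis drops, so does the chance
# that a statistically significant result is a true positive.
for prior in (0.5, 0.1, 0.01):
    print(f"prior {prior:.2f}: {ppv(prior):.2f} of significant results are true")
```

With these numbers, a field testing mostly 50:50 hypotheses would see the large majority of its significant results hold up, while a field testing long shots would see most of them fail.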

FF: I don’t really know what to say about this. It could be that better researchers test more hypotheses that are likely. Or maybe not. Maybe crummy researchers do, because they just go for low-hanging fruit. I concede that the a priori likelihood of a hypothesis being true would definitely be correlated with something, but not that it would be a property of a discipline.

TP: Well, I’m not quite done with my ‘property of a discipline’ hypotheses, so here’s another. In some subfields of psychology, conducting a publishable study requires substantially less work than in many subfields of ecology and evolutionary biology. For instance, as David mentioned earlier, papers in psychology are sometimes based on answers to a few hundred surveys administered to undergraduate students (a resource that’s not in short supply in a university). If studies are easy to come by, then opting not to publish (leaving a result in the proverbial file drawer) is much cheaper. In eco/evo, gathering a comparable amount of data might take years and lots of money, so it’s not so easy to just abandon an ‘uninteresting’ result and go out and gather new data instead.

FF: It’s not clear to me how big the file drawer problem is in any discipline. To be clear, I’m not saying publication bias isn’t a problem. We know it is. But are whole studies really left in file drawers, or are they cherry-picked and p-hacked back into the literature? There is a little less publication bias in ecology (~74% of papers publish ‘positive’ results compared to psychology’s ~92%), but there is probably also slightly lower statistical power. Tim’s explanation is not implausible, but I doubt we currently have enough evidence to say either way.

TP: As David mentioned briefly above, in ecology and evolutionary biology, dramatic differences among study systems (different species, different ecosystems, even stochastic or directional change over time in the ‘same’ system) make it easy to believe that differences in results among studies are due to meaningful biological differences. It seems that we do not take the inevitability of sampling error seriously, and thus rarely seriously consider the fact that many reported findings will be wrong (even WITHOUT the bias that we know is there and that should be elevating the rate of incorrect findings).

DM: This is related to the fact that in ecology and evolutionary biology, there’s no culture of direct replication. If most studies are conducted just once, there’s no reliable way to assess their credibility. If a study is replicated, it’s usually couched as a conceptual replication with known differences in the study. That new twist is the intellectual progeny of the author. If the results aren’t the same as the original, chalk it up to whatever those differences were. However, direct replications, where the expectation is for similar results, are the best way to assess credibility empirically.

This lack of direct replication has led to plausible deniability that there is any problem. And since there is no perceived problem, there is no need to empirically look for a problem (only a real troublemaker would do that!).

TP: We are clearly in agreement here, David. Now we just need to figure out how to establish some better institutional incentives for replication.

While we’re planning that, I’ll throw out my last hypothesis, which if right, would mean that all my other hypotheses were largely unnecessary. Psychology is a much larger discipline than ecology and evolutionary biology. Because of this, it may be that the number of people actively working to promote transparency in psychology is larger overall, but is a similar proportion to the number working in ecology and evolutionary biology.

FF: This seems very likely to me, and also something we should calculate sometime.

What I found in my PhD research on attempts to reform statistical practices through the 1970s–2000s (i.e., to get rid of Null Hypothesis Significance Testing) was that medicine banned it (and it snuck back in), psychology showed some progress, and ecology was behind at that time. But almost all disciplinary differences turn out to be institutional and social/cultural, rather than an inherent property of studying that particular science.

This scientific reform around reproducibility differs from the NHST one because the main players are much more aware of best behaviour-change practices. The NHST reform was led by cranky old men (almost exclusively!) writing cranky articles that often insulted researchers’ intelligence and motives. This new reform has by and large been led by people who know how to motivate change. (There are some early exceptions here.) Psychologists should be ahead of this game, given their core business.

DM: I think psychologists are certainly aware of bias, but ecologists are too. I suspect that a missing element is one of those outstanding claims that deserves to be checked. Results that seem “too good to be true” probably are, and identifying those will likely be the first step to assessing credibility of a field’s body of work through direct replication.

TP: Thanks to Fiona and David for engaging in this discussion. Here are some brief take-homes:

  1. It may be that psychologists are NOT considerably more concerned about the replication crisis than are ecologists and evolutionary biologists. Instead, it may be that the much larger number of psychology researchers means there are more concerned psychologists only in absolute numbers, but similar numbers proportionally.

  2. To the extent that psychologists may have greater levels of concern about reproducibility, much of this may be attributable to a single major event in psychology in which a result widely believed to be false was derived through common research practices and published in a respectable journal. It may also be that psychologists tend to be more comfortable with the idea that they have biases that could influence their research.

  3. Ecologists may recognize the value of replication, but their use of replication to assess the validity of earlier conclusions is too rare to have led them to see low rates of replicability.

  4. Some of the other ideas we discussed above may be worth empirical exploration, but we should be aware that hypotheses rooted in fundamental differences between disciplines have often not been strongly supported in the past.

guest post: Reproducibility Project: Ecology and Evolutionary Biology

Written by: Hannah Fraser

The problem

As you probably already know, researchers in some fields are finding that it’s often not possible to reproduce others’ findings. Fields like psychology and cancer biology have undertaken large-scale coordinated projects aimed at determining how reproducible their research is. There has been no such attempt in ecology and evolutionary biology.

A starting point

Earlier this year Bruna, Chazdon, Errington and Nosek wrote an article citing the need to start this process by reproducing foundational studies. This echoes early research undertaken in the psychology and cancer biology reproducibility projects, which attempted to reproduce those fields’ most influential findings. Bruna et al.’s focus was on tropical biology, but I say why not the whole of ecology and evolutionary biology!

There are many obstacles to this process, most notably obtaining funding and buy-in from researchers, but it is hard to obtain either of these things without a clear plan of attack. First off, we need to decide on which ‘influential’ findings we will try to replicate and how we are going to replicate them.

Deciding on what qualifies as an influential finding is tricky and can be controversial. In good news, this year an article came out that has the potential to (either directly or indirectly) answer this question for us. Courchamp and Bradshaw’s (2017) “100 articles every ecologist should read” provides a neat list of candidate influential articles/findings. There are some issues with biases in the list, which may make it unsuitable for our purposes, but at least one list is currently being compiled with the express purpose of redressing these biases. Once this is released, it should be easy to use some combination of the two lists to identify – and try to replicate – influential findings.

What is unique about ecology and evolutionary biology?

In psychology and cancer biology, where reproducibility has been scrutinised, research is primarily conducted indoors and is based on experiments. Work in ecology and evolutionary biology is different in two ways: 1) it is often conducted outside, and 2) a substantial portion is observational.

Ecology and evolutionary biology are outdoor activities

Conducting research outdoors means that results are influenced by environmental conditions. These conditions fluctuate through time, influencing the likelihood of reproducing a finding in different years. Further, climate change is causing directional changes in environmental conditions, which may mean that you should not expect to reproduce a finding from 20 years ago this year. I’ve talked to a lot of ecologists about this troublesome variation and have been really interested to find two competing interpretations:

1) trying to reproduce findings is futile because you would never know whether any differences were reflective of the reliability of the original result or purely because of changes in environmental conditions

2) trying to reproduce findings is vital because there is so much environmental variation that findings might not generalise beyond the exact instance in space and time in which the data were collected – and if this is true, the findings are not very useful.

Ecology and evolutionary biology use observation

Although some studies in ecology and evolutionary biology involve experimentation, many are based on observation. This adds even more variation and can limit and bias how sites/species are sampled. For example, in a study on the impacts of fire, ‘burnt’ sites are likely to be clustered together in space and share similar characteristics that made them more susceptible to burning than the ‘unburnt’ sites, biasing the sample of sites. Also, the intensity of the fire may have differed even within a single fire, introducing uncontrolled variation. In some ways, the reliance on observational data is one of the greatest limitations in ecology and evolutionary biology. However, I think it is actually a huge asset because it could make it more feasible to attempt reproducing findings.

Previous reproducibility projects in experimental fields have either focussed on a) collecting and analysing the data exactly according to the methods of the original study, or b) using the data collected for the original analysis and re-running the original analysis. While ‘b’ is quite possible in ecology and evolutionary biology, this kind of test can only tell you whether the analyses are reproducible… not the pattern itself. Collecting the new data required for ‘a’ is expensive and labour intensive. Given limited funding and publishing opportunities for these ‘less novel’ studies, it seems unlikely that many researchers will be willing or able to collect new data to test whether a finding can be reproduced. In an experimental context, examining reproducibility is tied to these two options. However, in observational studies there is no need to reproduce an intervention, so only the measurements and the context of the study need to be replicated. Therefore, it should be possible to use data collected for other studies to evaluate how reproducible a particular finding is.

Even better, many measurements are standard and have already been collected in similar contexts by different researchers. For example, when writing the literature review for my PhD I collated 7 Australian studies that looked at the relationship between the number of woodland birds and tree cover, collected bird data using 2 ha, 20 minute bird counts, and recorded the size of the patches of vegetation. It should be possible to use the data from any one of these studies to test whether the findings of another study are reproducible.

Matching the context of the study is a bit more tricky. Different inferences can be made from attempts to reproduce findings in studies with closely matching contexts than from those conducted in distinctly different contexts. For example, you might interpret failure to reproduce a finding differently if it occurred in a very similar context (e.g. same species in the same geographic and climatic region) than if the context was more different (e.g. sister species in a different country with the same climatic conditions). In order to test the reliability of a finding, you should match the context closely. In order to test the generalisability of a finding, you should match the context less closely. However, determining what matches a study’s context is difficult. Do you try to match the conditions where the data were collected or the conditions that the article specifies it should generalise to? My feeling is that trying to replicate the latter is more relevant but potentially problematic.

In a perfect world, all articles would provide a considered statement about which conditions they would expect their results to generalise to (Simons et al 2017). Unfortunately, many articles overgeneralise to increase their probability of publication which may mean that findings appear less reproducible than they would have if they’d been more realistic about their generalisability.

Where to from here?

This brings me to my grand plan!

I intend to wait a few months to allow the competing list (or possibly lists) of influential ecological articles to be completed and published.

I’ll augment these lists with information on the studies’ data requirements and (where possible) statements from the articles about the generalisability of their findings. I’ll share this list with you all via a blog (and a page that I will eventually create on the Open Science Framework).

Once that’s done I will call for people to check through their datasets to see whether they have any data that could be used to test whether the findings of these articles can be reproduced. I’m hoping that we can all work together to reproduce these findings (regardless of whether you have data and/or the time and inclination to re-analyse things).

My dream is to have the reproducibility of each finding/article tested across a range of datasets so that we can 1) calculate the overall reproducibility of these influential findings, 2) combine them using meta-analytic techniques to understand the overall effect, and 3) try to understand why they may or may not have been reproduced when using different datasets. Anyway, I’m very excited about this! Watch this space for further updates and feel free to contact me directly if you have suggestions or would like to be involved. My email is
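Point 2 above, combining results across datasets, typically relies on inverse-variance weighting of the per-dataset effect sizes. Here is a minimal fixed-effect sketch of that weighting logic; the function name, effect sizes, and standard errors are all hypothetical, purely for illustration:

```python
import math

def fixed_effect_meta(effects, ses):
    """Inverse-variance weighted (fixed-effect) summary: each effect is
    weighted by 1/SE^2, so more precise datasets count for more.
    Returns the pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# hypothetical slopes of bird abundance vs. tree cover from three replications
effects = [0.42, 0.35, 0.50]
ses = [0.10, 0.15, 0.12]
est, est_se = fixed_effect_meta(effects, ses)
```

A real synthesis across heterogeneous ecological datasets would more likely use a random-effects model (e.g., via the metafor package in R), since between-dataset variation is exactly what point 3 aims to understand; the fixed-effect version above just shows the core weighting idea.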

Replication: step 1 in PhD research

Here are a few statements that won’t surprise anyone who knows me. I think replication has the potential to be really useful. I think we don’t do nearly enough of it and I think our understanding of the world suffers from this rarity. In this post I try to make the case for the utility of replication based on an anecdote from my own scientific past.

A couple of years ago Shinichi Nakagawa and I wrote a short opinion piece about replication in ecology and evolutionary biology. We talked about why we think replication is important and how we can interpret results from different sorts of replications, and we also discussed a few ideas for how replication might become more common. One of those ideas was for supervisors to expect graduate students to replicate part of the previously published work that inspired their project. When Shinichi and I were writing that piece, I didn’t take the time to investigate the extent to which this already happens, or even to think of examples of it happening.

Then out of the blue the other day, it occurred to me that I’d seen this happen up-close with one of my own findings. First some background. Bear with me and I’ll try to be brief. When I was a naïve master’s student (with a hands-off adviser who had at least one foot in retirement), I decided to test Tom Martin’s ideas about nest predators shaping bird species co-existence, but in a new study system: the shrub nesting bird community at Konza Prairie in Kansas (by the way, this anecdote is NOT about my choice to do a conceptual replication for my MSc work). Anyway, I was gathering all the data myself, trying to find as many nests as I could from multiple species, monitoring those nests to determine predation outcomes, and measuring vegetation around each nest. I bit off more than I could chew, but I wanted to be done in one field season. I was in a hurry for some reason – not a recipe for sufficient statistical power. Instead, it was a recipe for an ambiguous test of the hypothesis since I didn’t find many nests for most bird species. I did, however, find a decent number of nests of one species: Bell’s vireo. Among the more than 60 vireo nests I found, I noticed something striking – brood parasitic cowbirds laid eggs in many of them, and if a cowbird egg hatched in a vireo nest, all vireo chicks were outcompeted and died. What was really interesting is that vireos abandoned many parasitized nests before cowbird eggs hatched and these vireos appeared to re-nest up to seven times in a season. I first thought this was evidence of an adaptation in Bell’s vireos to avoid parasitism by cowbirds via re-nesting (that’s another story), but I ended up publishing a paper that pointed out that the number of vireo eggs in the nest (rather than the number of cowbird eggs) was the best predictor of vireo nest abandonment. 
Thus it seemed like a response to egg loss (cowbirds remove host eggs) by Bell’s vireos might explain their nest abandonment and therefore how they could persist despite high brood parasitism. Now on to the heart of the story.

Several years later, after doing a PhD elsewhere, I found myself back in Kansas. A new K-State PhD student (Karl Kosciuch – who was one of Brett Sandercock’s first students) arrived and was excited about the Bell’s vireo–cowbird results I had reported. Looking back on it, this is a textbook case of how exploratory work and replication should go together. I found a result I wasn’t looking for. Someone else came along and thought it was interesting and wanted to build on it but decided to replicate it first. Karl did several things for his PhD, but one of them was simply to replicate my observational data set with an even bigger sample. He found the same pattern, thus dramatically strengthening our understanding of this system, and strongly justifying follow-up experiments. I actually joined Karl for one of these experiments, and it was very satisfying behavioral ecology. It turned out that it really is loss of their own eggs that induces Bell’s vireos to abandon, and that cowbird eggs do not induce nest abandonment on their own.

This study had a happy ending for all involved, but what if Karl’s replication of my correlative study had failed to support my result? Well, for one it hopefully would have saved Karl the trouble of pursuing an experiment based on a pattern that wasn’t robust. Such an experiment would presumably have failed to produce a compelling result, and then would have left Karl wondering why. Were the experimental manipulations flawed? Was his sample size too small? Was there some unknown environmental moderator variable? Further, although the population of Bell’s vireo we studied is not endangered, the sub-species in Southern California is, and one of the primary threats to that endangered population has been cowbird parasitism. My result had been discussed as evidence that Bell’s vireo populations might be able to evolve nest abandonment as an adaptive response to cowbird parasitism. If no replication had been conducted and only an unconvincing experiment had been produced, this flawed hypothesis might have persisted, with harmful outcomes for management practices of Bell’s vireo in California.

I think there’s a clear take-home message here. Students benefit from replicating previously published studies that serve as the basis for their thesis research. Of course it’s not just students who can benefit here – anyone who replicates foundational work will reduce their risk of building on an unreliable foundation. And what’s more, we all benefit when we can better distinguish reliable and repeatable results from those which are not repeatable.

I’m curious to hear about other replications of previously published results that were conducted as part of the process of building on those previously published results.

Is overstatement of generality an Open Science issue?

I teach an undergraduate class in ecology, and every week or two I have the students in that class read a paper from the primary literature. I want them to learn to extract important information and to critically evaluate that information. This involves distinguishing evidence from inference and identifying the assumptions that link the two. I’m just scratching the surface of this process here, but the detail I want to emphasize in this post is that I ask the students to describe the scope of the inference. What was the sampled population? What conclusions are reasonable based on this sampling design? This may seem straightforward, but students find it difficult, at least in part because the authors of the papers rarely come right out and acknowledge limitations on the scope of their inference. Authors expend considerable ink arguing that their findings have broad implications, but in so doing they often cross the line between inference and hypothesis with nary a word. This doesn’t just make life difficult for undergraduates. If we’re honest with ourselves, we should admit that it’s sloppy writing, and by extension, sloppy science. That said, I’m certainly guilty of this sloppiness, and part of the reason is that I face incentives to promote the relevance of my work. We’re in the business of selling our papers (for impact factors, for grant money, etc.). Is this sloppiness a trivial outcome or a real problem of the business of selling papers? I think it may lean towards the latter. Having to train students to filter out the hype is a bad sign. And more to the point of this post, it turns out that our failure to constrain inferences may hinder interpretation of evidence that accumulates across studies.

For years my work to encourage recognition of constraints on inference has been limited to my interaction with students in my class. That changed recently when I heard about a movement to promote the inclusion of ‘Constraints on Generality’ (COG) statements in research papers. My colleagues Fiona Fidler and Hannah Fraser made the jaunt from Melbourne over to the US to attend ESA in August (to join me in promoting and exploring replication in ecology), but they first flew to Virginia to attend the 2nd annual SIPS (Society for the Improvement of Psychological Science) conference where they heard about COG statements (there’s now a published paper on the topic by Daniel Simons, Yuichi Shoda, and Stephen Lindsay). In psychology there’s a lot of reflection and deliberation regarding reducing bias and improving empirical progress, and the SIPS conference is a great place to feel that energy and to learn about new ideas. The idea for a paper on COG statements apparently emerged from the first SIPS meeting, and the COG statement pre-print got a lot of attention at the 2nd meeting this year. It’s easy to see the appeal of a COG statement from the standpoint of clarity. But there’s more than just clarity. One of the justifications for COG statements comes from a desire to more readily interpret replication studies. A perennial problem with replications is that if the new study appears to contradict the earlier study, the authors of the earlier study can point to the differences between the two studies and argue that the second study was not a valid test of the conclusions of the original. This may seem true. After all, whenever conditions differ between two studies (and conditions ALWAYS differ to some extent), we can’t eliminate the possibility that a difference in results stems from the differences in conditions. However, we’re typically going to be interested in a result only if it generalizes beyond the narrow set of conditions found in a single study.
In a COG statement, the authors state the set of conditions under which they expect their finding to apply. The COG statement then sets a target for replication. With this target set, we can ask: What replications are needed to assess the validity of the inference within the stated COG? What work would be needed to expand the boundaries of the stated COG? As evidence accumulates, we can then start to restrict or expand the originally stated generality.

When writing a COG statement, authors will face conflicting incentives. They will still want to sell the generality of their work, but if they overstate that generality, they increase the chance of being contradicted by later replication. That said, it’s important to note that a COG doesn’t simply reflect the whims of the authors. Authors need to justify their COG with explicit reference to their sampling design and to existing theoretical and experimental understanding. A COG statement should be plausible to experts in the field.

I started this post by discussing the scope of inference that’s reasonable from a given study, but although this is clearly related to the constraints on generality, a COG statement could be broader than a statement about the scope of inference. Certainly as presented by Simons et al., COG statements will typically expand the scope of generality beyond the sampled population. I haven’t yet resolved my thinking on this difference, but right now I’m leaning towards the notion that we should include both a scope of inference statement and a constraints on generality statement in our papers, and that they should be explicitly linked. We could state the scope of our inference as imposed by our study design (locations, study taxa, conditions, etc.), but then we could argue for a broader COG based on additional lines of evidence. These additional lines of evidence might be effects reported by other studies of the same topic, or might be qualitatively different forms of evidence, for instance based on our knowledge of the biological mechanisms involved. Regardless, more explicit acknowledgements of the constraints on our inferences would clearly make our publications more scientific. I’d love to have some conversations on this topic. Please share comments below.

Before signing off, I want to briefly mention practical issues related to the adoption of COG (and/or scope of inference) statements. Because scientists face an incentive to generalize, it seems that a force other than just good intentions of scientists may be required for this practice to spread. This force could be requirements by journals. However, many journals also face incentives to promote over-generalization from study results. That said, there are far fewer journals than there are scientists, so it might be within the realm of possibility to convince editors, in the name of scientific quality, to add requirements for COG statements. I can think of roles that funders could play here too, but these would be less direct and maybe less effective than journal requirements. I’m curious what other ideas folks have for promoting COG / scope of inference statements. Please share your thoughts!

Ecologists and evolutionary biologists can and should pre-register their research

I wrote a draft of this post a few weeks ago, and now seems like a good time for it to see the light of day given the great new pre-print just posted on OSF Preprints by Brian Nosek, David Mellor, and co-authors. They describe the utility of pre-registration across a variety of circumstances. I do something similar here, though I focus on ecology and evolutionary biology and I don’t try to be as thorough as Nosek et al. For greater depth of analysis, check out their paper. On to my post…

Transparency initiatives are gaining traction in ecology and evolutionary biology. Some of these initiatives have become familiar – data archiving is quickly becoming business as usual – though others are still rare and strange to most of us. Pre-registration is squarely in this second category. Although I know a number of ecologists / evolutionary biologists who are starting to pre-register their work (and I’ve participated in a few pre-registrations myself), I would guess that most eco/evo folks don’t even know what pre-registration is, and many who do know probably wonder if it would even be worth doing. My goals here are to explain what pre-registration is, why it’s useful, and why most ecologists and evolutionary biologists could be using it on a regular basis.


-What is pre-registration?

At its most thorough, a pre-registration involves archiving a hypothesis and a detailed study design, including a data analysis plan, prior to gathering data. However, as you’ll read below, the data analysis plan is typically the core element of a useful pre-registration, and a pre-registration can happen after data gathering as long as the analysis plan is declared without knowledge of the outcome of the analysis or its alternatives. Pre-registrations are archived in a public registry (the Open Science Framework, OSF, for example) so that they can later be compared to the analysis that is ultimately conducted. Depending on the pre-registration archive, the pre-registration may be embargoed to maintain confidentiality of a research plan until it is completed. Once a pre-registration is filed, it cannot be edited, though it could potentially be updated with further pre-registrations. When a pre-registered study is published, the paper should cite (or better yet, link to) the pre-registration to show the extent to which the plan was followed.


-Why is pre-registration a useful component of transparency?

People (including all of us) are worryingly good at filtering available evidence so that they end up seeing the world they expect to see rather than the world as it actually is. In other circumstances, after noticing a pattern, we readily convince ourselves that we predicted (or would have predicted) that particular outcome. All the while, we fool ourselves into believing we’re being unbiased. Science is all about avoiding these biases and taking honest stock of available evidence, but in the absence of adequate safeguards, there is good evidence that scientists can fall prey to cognitive biases (for a striking example, see van Wilgenburg and Elgar 2013). Pre-registration is one of a number of tools that helps scientists take a clear-eyed view of evidence, and it helps those of us reading scientific papers to identify evidence that is less likely to have been run through a biased filter. When scientists fiddle with analyses and can see how that fiddling impacts results, there is a great temptation to choose the analyses that produce the most desirable outcome. If this biased subset of results gets published and other results go unreported, we get a biased understanding of the world. In my ignorant past I’ve conducted and presented analyses this way, and nearly every other ecologist and evolutionary biologist I’ve talked to about this admits to doing this sort of thing at least once. For this and other reasons (Fidler et al. 2016, Parker et al. 2016), I think this problem is common enough to reduce the average reliability of the published literature. Pre-registration could improve the average reliability of this literature and help us identify papers that are less likely to be biased.


-Why is pre-registration a viable tool for ecologists and evolutionary biologists?

I’ve written this section as a series of hypothetical concerns or questions from ecologists or evolutionary biologists, followed by responses to those concerns / questions.


“I work in the field and I have to refine my methods, or even my questions, over weeks or months through trial and error”

You can pre-register after your methods are finalized. When starting work in a new system or with a new method, you generally won’t be ready to complete a particularly useful pre-registration until you’ve gotten your hands dirty. You’ll need to figure out what works and what doesn’t work through trial and error. Unless you have excellent guidance from experts in the system / method, you probably want to hold off finalizing your pre-registration until you’ve been in the field and landed on a method that works. It would still be good to think long and hard about the project before heading to the field. Develop as detailed a methodological plan as is reasonable (in many cases, you’ll have done this already at the proposal stage) and talk to a statistician to develop a tentative analysis plan. Once you’ve begun to implement a set of methods you feel good about, then complete your pre-registration.


“What if I have to change my methods part way through the project?”

Of course, even if you go through the trouble of field testing your methods before finalizing your pre-registration, things still might change. You might come back a second year to find that conditions demand a revised protocol. If you have to scrap your first year’s data because you can’t continue, then you probably want to create an entirely new pre-registration based on your new methods. On the other hand, if your data from last year are still usable and you’ve just had to make modest changes, then you have some choices. You could just wait until you write the manuscript to explain why your data gathering methods changed, or you could file a new pre-registration that acknowledges (and links to) the earlier protocol but also introduces the new methods. The old protocol won’t disappear, but the evolution of your project is now transparent.


“I work with existing data (e.g., from long-term projects, from existing citizen science projects, from my own metaphorical file drawer, for meta-analysis, etc.), so I can’t pre-register prior to data gathering.”

Pre-registration can be useful at any point before you start to examine your data for biologically relevant patterns, whether through examining data plots or through initial statistical analyses. If you haven’t peeked at the data yet, go for it. Pre-register a detailed analysis plan.


“What if I see patterns in my data that I want to follow-up on with analyses that I didn’t pre-register?”

Not a problem. Just distinguish your post hoc analyses from your pre-registered analyses in your paper. Ideally you’d also report all your post hoc exploration and declare that you have done so. If you have too many analyses to report in your paper, present them in supplementary material or even in a data repository.


“I focus on discovery. I don’t typically have a priori hypotheses when I start a project.”

Pre-registration can still be for you. The primary purpose of pre-registration is to promote transparency. Exploratory work is vital. We just want to know that we’re not being shown a biased subset of your exploratory outcomes. Thus if you have a study and analysis plan, pre-register it, and then present results from the full set of analyses described in your pre-registration, we know we’re not getting a biased subset.


“I don’t develop an analysis plan until I have my data so that I can see how they are distributed and how viable different modeling alternatives are with the real data”

There are several options here. You could develop a decision tree that anticipates modeling decisions you will need to make and lays out criteria for making those decisions. Other options include working with some form of your actual data in a trial phase. For instance, you could sacrifice a portion of your data for model exploration, select a set of models to test, pre-register those, and then assess them with your remaining (unexplored) data. Alternatively you could scramble your full data set, or add some sort of noise, refine your analysis plan with these ‘fake’ data, then pre-register and re-run the analysis with the real data.
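The holdout and scrambling strategies just described can be sketched concretely. The code below is only an illustration, with made-up site records and hypothetical function names; fixing the random seed means the split itself is reproducible and could be reported in the pre-registration:

```python
import random

def exploration_split(records, explore_frac=0.2, seed=42):
    """Partition a dataset into an exploration set (for model tinkering)
    and a confirmation set held back until the analysis is pre-registered."""
    rng = random.Random(seed)  # fixed seed: the split is itself reproducible
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * explore_frac)
    return shuffled[:cut], shuffled[cut:]

def scramble_response(records, response_key, seed=42):
    """Alternative: shuffle the response variable across rows, producing
    realistic-looking but information-free 'fake' data for refining code."""
    rng = random.Random(seed)
    responses = [r[response_key] for r in records]
    rng.shuffle(responses)
    return [dict(r, **{response_key: y}) for r, y in zip(records, responses)]

# toy data: 10 sites with a burnt/unburnt flag and a bird count
data = [{"site": i, "burnt": i % 2 == 0, "count": i * 3} for i in range(10)]
explore, confirm = exploration_split(data)
fake = scramble_response(data, "count")
```

With the `explore` subset you can tinker freely; the pre-registered analysis is then run once on `confirm`. Under the scrambling option, you refine the analysis code on `fake` and re-run it, after pre-registering, on the real, unscrambled data.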


“I don’t want to develop a detailed analysis plan. There are too many unforeseen circumstances and I’m bound to ultimately deviate from my plan”

I have two responses to this concern. The first is to see my previous reply – there are ways to pre-register after you have your data and have confirmed that an analysis is likely to be appropriate with your data. My second point is that, just as field methods change in response to circumstances, so do statistical methods. A pre-registration doesn’t prevent us from changing an analysis, it just helps us be transparent about these changes. Among other things, this transparency probably helps us make sure that when we do change our plan, we’re doing so for a good reason.


“If I can just pre-register an analysis plan after collecting my data, why should I bother to pre-register the other portions of my study methods?”

Although I think it’s much better to pre-register an analysis plan than not to pre-register at all, pre-registering the whole study design is helpful for a variety of reasons. For one, pre-registering prior to completion of data gathering (or better yet, before data gathering) helps make it clear that your pre-registered analysis plan could not have been influenced by any knowledge (conscious or unconscious) of patterns in the data. Early pre-registration also facilitates transparency about the project as a whole. Later, when you publish the results, other researchers can understand the scope of your work and can see (hopefully) that you’re not just publishing a subset (potentially a biased subset) of the project. And if you never publish your work, your pre-registration is evidence that someone at least considered doing this project at some point, and this could be useful information to other researchers down the line. A well-executed pre-registration might also help set expectations for the role of individual collaborators.


“Pre-registration is just extra work”

In most cases, pre-registration should not dramatically change workload. If you’ve written a grant proposal, much of the work of pre-registration will already be done. If your grant proposal doesn’t include a detailed analysis plan, presumably the manuscript you write to report your results will include a detailed explanation of your analytic methods, and so a pre-registration just shifts the timing of this writing. Likewise, if this isn’t grant funded research, some other parts of your methods, and presumably parts of your introduction, will be ready and waiting in draft form when you complete your pre-registered study and go to write it up. To the extent that you end up writing more about your analyses in a pre-registration than you would have in a paper that reported only a subset of your analyses, this is the price for doing transparent and reliable science. You should have been reporting all this information somewhere anyway.


“If I pre-register, I might be scooped”

You can embargo your pre-registration so that it’s private until you choose to share it. Pre-registrations on the site AsPredicted can remain private indefinitely. On the OSF, embargos are limited to four years.


“I’m a student just starting a project and so I don’t know enough about my system to pre-register”

If you’re mentored by someone familiar with this system, then you’ll want to work closely with your mentor to develop your pre-registration. If this isn’t possible, read through my suggestions above. There are various paths forward, from waiting until you’ve worked out the kinks in your methods to various ways of pre-registering after you have data. Think carefully and identify the path that’s best for you.


If you have other concerns or questions about how you could apply pre-registration to your work, I’d love to hear about them. Let’s have a discussion.

Not all work needs to be pre-registered, but most work could be pre-registered. And this is important because pre-registration will help ecologists and evolutionary biologists improve transparency and thus, I expect, reduce bias in a wide array of circumstances.


Ecological Society of America Ignite Session on Replication in Ecology

by Hannah Fraser

Fiona Fidler and Tim Parker organized an Ignite session on Replication in Ecology at the Ecological Society of America Annual Meeting 2017 in Portland, USA, a few weeks ago. Ignite sessions start with a series of 5-minute talks on a similar topic, followed by a panel discussion. At Fiona and Tim’s session, more than 50 attendees listened to talks by Fiona Fidler, Clint Kelly, Kim LaPierre, David Mellor, Emery Boose, and Bill Michener.

Tim introduced the session by describing how it had arisen from discussions with journal editors. Tim and his colleagues have recently been successful in encouraging editors of many journals in ecology and evolutionary biology to support the Transparency and Openness Promotion guidelines but one of these guidelines – the one which encourages the publication of articles that replicate previously published studies – has proven unpalatable to a number of journal editors. The purpose of the Ignite session was to discuss the purpose and value of replication studies in ecology, to raise awareness of the thoughts shared by members of Transparency in Ecology and Evolution, and to take initial steps towards developing a consensus regarding the role of replication in ecology.

Fiona Fidler

Fiona spoke first, describing the tension between the importance of replication and the perception that it is boring and un-novel. Replication is sometimes viewed as the cornerstone of science (following Popper): without replicating studies it is impossible to either falsify or verify findings. In contrast, replication attempts are deemed boring if they find the same thing as the original study (“someone else has already shown this”) and meaningless if they find something different (“there could be millions of reasons for getting a different result”). However, there is actually a range of different types of replication studies, which differ in their usefulness for falsification, their novelty, and the resources required to conduct them. Fiona broke this down into two axes: 1) from using the exact same data collection procedures to completely different ones, and 2) from using the exact same analysis to completely different analyses. A study that uses completely different data collection and analysis methods to investigate the same question is often termed a conceptual replication. Conceptual replication is reasonably common in ecology: people investigate whether an effect holds in a new context. By contrast, very few studies attempt to replicate previous work more directly (i.e., by using the exact same data collection procedures and analyses). Moreover, finding a different result in these contexts doesn’t lead to falsification, or even, in many cases, to scepticism about the findings of the original study, because there are often so many uncontrollable differences between the two studies, and any of these could have caused the diverging results. Fiona suggested that one way to enhance the relevance of all replications, but particularly conceptual replications, could be to adopt a proposal from psychology and include a statement in every article about the constraints on generality. If all articles described the circumstances in which the authors would and would not expect to find the same patterns, it would become possible to use conceptual replications to falsify studies, or to further delimit their relevance.

Clint Kelly

Previous work has shown that 1% of studies in psychology are replication studies. Clint described some of his recent work aimed at determining how many replication studies there have been in ecology. He text-mined open-access journals on PubMed for papers with the word ‘replic*’ anywhere in their full text. He found that only a handful of studies attempted to replicate a previous study’s findings, and of these, only 50% claimed to have found the same result as the original study. These analyses suggest that there are many fewer replication studies in ecology than in psychology. However, this figure only accounts for direct replications that explicitly state that they are replications of previous work. Conceptual replications are not captured because it is not common practice to label them as replications, possibly because doing so makes a paper seem less novel. Even so, this valuable work suggests that the rate of direct replication in ecology is abysmally low.
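As a rough illustration of this kind of text-mining, the sketch below flags papers whose full text matches the ‘replic*’ pattern. The toy corpus is invented for illustration; a real pipeline would retrieve full texts from PubMed’s open-access subset rather than use hard-coded strings, and counting matches is only a first filter before reading the hits to confirm they are genuine replication attempts.

```python
import re

# Invented mini-corpus standing in for full texts retrieved from
# PubMed's open-access subset (paper IDs and texts are hypothetical).
papers = {
    "paper_a": "We replicate the field experiment of a 2005 study...",
    "paper_b": "Nutrient addition increased productivity at all sites.",
    "paper_c": "Our results failed to replicate earlier findings.",
}

# Case-insensitive search for 'replic' anywhere in the full text,
# approximating the 'replic*' query described above.
pattern = re.compile(r"replic", re.IGNORECASE)

hits = sorted(pid for pid, text in papers.items() if pattern.search(text))
print(hits)  # papers mentioning replication at least once
```

Running this prints `['paper_a', 'paper_c']`: the keyword filter finds candidate replication studies, which would then need manual screening.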

Kim LaPierre

Kim discussed the Nutrient Network (NutNet) project, which she described as a coordinated, distributed experiment but which could equally be seen as concurrent direct replications of the same experiment. The NutNet project aims to “collect data from a broad range of sites in a consistent manner to allow direct comparisons of environment-productivity-diversity relationships among systems around the world”. The same experimental design is used at 93 sites on all continents except Antarctica. It’s a massive effort that is achieved with almost no funding, as the participating researchers conduct the experiment using their existing resources.

David Mellor

David is from the Center for Open Science and discussed how to ensure that the results of replication studies are meaningful regardless of their findings. Like Fiona, David advocated including constraints-on-generality statements in articles to describe the situations to which you would reasonably expect your results to extend. The majority of David’s talk, however, was about preregistration, which can be used for replication studies but is also useful in many other types of research. The idea is that, before you start your study, you ‘preregister’ your hypotheses (or research questions) and the methods you intend to use to collect and analyse the relevant data. This preregistration is then frozen in time and referenced in the final paper to clearly delineate what the original hypotheses were (ruling out HARKing – Hypothesizing After Results are Known) and which tests were planned (ruling out p-hacking). The Center for Open Science is currently running a promotion called the Preregistration Challenge, in which the authors of the first 1000 pre-registered articles to be published receive $1000.

Emery Boose

Emery discussed RDataTracker, a package he helped develop that aids in creating reproducible workflows in R. RDataTracker records data provenance, including information on the hardware used to run the analyses and the versions of all relevant software. The package lets you see what every step of an analysis does and what intermediate values are produced at each point. This can be really useful for debugging your own code as well as for determining whether someone else’s code is operating correctly.

Bill Michener

Bill Michener is from DataONE, an organization that aims to create tools to support data replication. They have developed workflow software that makes it easier to collate metadata along with data files and analyses. The software links with Ecological Metadata Language, GitHub, Dryad, and the Open Science Framework, among other tools.


The talks were compelling, and most attendees stayed to listen and take part in the discussion afterwards. Although the session was planned as a forum for discussing opposing views, there was no outright resistance to replication. It is probably the case that the session attracted people who already supported the idea. However, it’s also possible that strong opinions expressed in favour of replication made people reluctant to raise critical points, or (perhaps wishfully) maybe the arguments made in the Ignite session were sufficiently compelling to convince the audience of its importance. In any case, it was inspiring to be surrounded by so many voices expressing support for replication. The possibility of replication studies becoming more common in ecology feels real!