Just how reproducible are studies in ecology and evolutionary biology? We don’t know precisely, but a new case study in the journal Evolution shows that even textbook knowledge can be unreliable. Daiping Wang, Wolfgang Forstmeier, and co-authors have convinced me of the unreliability of an iconic finding in behavioral ecology, and I hope their results bring our field one step closer to a systematic assessment of reproducibility.
When I was doing my PhD, one of the hottest topics in behavioral ecology was the evolutionary origin of sexual ornaments. A tantalizing clue was the existence of latent female preferences – preferences that females would express if a mutation came along that produced the right male proto-ornament. Nancy Burley detected one of the first hints of latent preferences in female zebra finches by fitting male finches with leg bands of different colors. It turned out that a red band was attractive, a green band unattractive. Multiple studies appeared to support the original, and the story entered textbooks.
But now it’s non-reproducible textbook knowledge. Wang et al. report on multiple robust replication attempts that failed to reproduce this effect. So where does this leave us? It could be that the original effect was real, but contingent on some as-yet-undiscovered moderator variable. That hypothesis can never be disproven, but if someone wants to make that argument, it’s on them to identify the mysterious moderator and show how the color leg band effect can be reproducible. Until then, I’m adding the color band attractiveness effect to the list of things I learned in graduate school that were wrong.
By the way, in this case, ‘not reproducible’ means an average effect size that approximates zero. This is not just a case of one study crossing a significance threshold and another failing to cross the threshold. The sum of these replications looks exactly like the true absence of an effect.
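To make that distinction concrete, here is a minimal sketch in Python. Every number in it is invented for illustration; none comes from Wang et al. or from the original studies. The inverse-variance (fixed-effect) pooling is my own assumption about a reasonable way to average effect sizes, not a description of the analysis the authors ran. The first part shows two hypothetical studies that land on opposite sides of the significance threshold while estimating nearly the same effect. The second part shows hypothetical replications whose pooled effect sits at zero.

```python
import math

def pooled_effect(effects, std_errors):
    """Inverse-variance weighted mean effect and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    mean = sum(w * e for w, e in zip(weights, effects)) / total
    return mean, math.sqrt(1.0 / total)

def is_significant(effect, se, z_crit=1.96):
    """Two-sided z-test at roughly the 5% level."""
    return abs(effect / se) > z_crit

# Case 1 (hypothetical): similar estimates, different sides of the threshold.
study_a = (0.40, 0.20)  # z = 2.00
study_b = (0.35, 0.20)  # z = 1.75
print("A significant:", is_significant(*study_a))
print("B significant:", is_significant(*study_b))

# Case 2 (hypothetical): replications that average out to no effect.
rep_effects = [0.10, -0.12, 0.03, -0.05, 0.04]
rep_ses = [0.10, 0.10, 0.10, 0.10, 0.10]
mean, se = pooled_effect(rep_effects, rep_ses)
low, high = mean - 1.96 * se, mean + 1.96 * se
print(f"pooled effect = {mean:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

In the first case the two studies disagree about significance but not about the effect. In the second, the pooled estimate is centered on zero with a narrow interval, which is the pattern I mean by ‘not reproducible’ here.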
It’s also worth noting that the distribution of published results from the lab that originally discovered the color band effect follows the pattern expected from various common research practices that unintentionally increase the publication of false positives and inflated effect sizes. I don’t mention this as an accusation, but rather as a reminder to the community that if we don’t take deliberate steps to minimize bias, it’s likely to creep in and reduce our reproducibility.