All posts by Shinichi Nakagawa

Open Science – what’s the way (for Australia)? – Notes from a panel discussion

by Malgorzata (Losia) Lagisz, i-deel, EERC, UNSW, Australia

We have all heard about Open Science, and particularly Plan S, which was announced in Europe last year (read more here). On 14th February 2019, I had an opportunity to be a panellist during a discussion on what it all could mean for Australia. The panel discussion was organised by Springer Nature as part of the ALIA conference, the main meeting for librarians and information specialists in Australia and New Zealand (I realised these are mostly lovely middle-aged ladies, although they said more men are starting to join the profession with the new technologies, closing the “gender gap”…).
 
The discussion panel itself was made up of different stakeholders and actors in scholarly communication: the Director of Policy and Integrity at the ARC, the Institutional Engagement Manager and the Head of Data Publishing from Springer Nature, the Associate Librarian from Scholarly Information Services & Campus Libraries at VU, and me as “the centrally important view of the researcher” (that’s at least how my presence was justified…).
 
After a short introduction, we had to answer three pre-determined questions:

  1. “Is Plan S the right plan for ANZ in the short term? And long term?”
  2. “What is the role of institutional repositories in a scholarly publishing system that is moving towards gold open access as the preferred model for funders and authors?”
  3. “Outline an example of an open scholarship or open data initiative and why this underlines the benefits of open research?”


We did not reach a strong conclusion on any of these questions, but there were a few emerging insights (at least for me):

  • “Open access” is a general term encompassing publications that are freely available, whether permanently archived in a public repository or published with more traditional publishers, and licensed in a way that allows broad use and reuse. While Plan S requires all publicly funded research outcomes to be made immediately available to the public and for the benefit of the public, it also places restrictions on where and how these outcomes can be published.
  • Depositing in institutional repositories is currently mandatory at most Australian research institutions and is also the preferred option for freely sharing research outputs. Plan S acknowledges the role of repositories in archiving research but does not see them as the main publishing venue. Importantly, institutional repositories and many other free non-profit repositories (like preprint servers) are unlikely to be compliant with Plan S requirements.
  • Another concern relates to academic freedom – the right of academics to decide where to publish their findings – and the impacts on the researchers themselves. Especially in the early stages of Plan S, few journals will be fully compliant with the requirements, reducing the choice of publishing venues. This may mean having to publish in less reputable or less impactful journals than the ones researchers would otherwise submit their work to. Such restrictions may strain international collaborations and also affect the career prospects of researchers.
  • Plan S would also negatively affect research societies and their role in fostering good-quality research and research careers. Many societies earn the bulk of their income from the subscription-based journals they publish. Flipping to a fully open access publishing model requires significant financial resources, and many societies will not be able to afford this. If they don’t flip, they may lose submissions, reputation and income (for more details read this opinion).
  • It is not clear whether there will be savings in overall research costs under Plan S. If not-for-profit publishers and repositories get marginalised, the overall bill might actually be higher, with costs shifting from reading/access fees to publishing fees (and it is not quite clear how the latter will be covered).
  • Finally, the change to Open Access and Open Science should not be rushed. Taking time will allow us to figure out the safest transition path for publishers and researchers. Changing the mindsets of academics via education, not enforcement, will be an important factor. The transition will also become easier as a new generation of young scientists, more ready to embrace open science, joins academia.

EcoEvoRxiv launched!

I am very excited to announce the launch of EcoEvoRxiv – a preprint server where ecologists and evolutionary biologists can upload their forthcoming papers. I am aware that many ecologists and evolutionary biologists already use the preprint server bioRxiv, and that’s great! I have used bioRxiv several times myself. EcoEvoRxiv is a more targeted server, and it is convenient because a preprint at EcoEvoRxiv can seamlessly integrate with a project that uses the services of the Open Science Framework (OSF). My group, like others, uses OSF for project management, so this is a great feature of EcoEvoRxiv.

There are several reasons I have taken on the challenge of kickstarting and leading EcoEvoRxiv with my colleagues, other than the reasons I already mentioned.

1. Having preprints online and citable is especially wonderful for my students and postdocs (and any other young scientists out there), because their potential employers can immediately read their work online. Last year, I wrote a reference for the Human Frontier Science Program (HFSP), and they asked whether the candidate had preprints in addition to published papers (a very nice change).

2. It is part of the Transparency in Ecology and Evolution (TEE) movement, so I’ve got a lot of support from Fiona and Tim (the co-founders of TEE). We believe that EcoEvoRxiv will raise awareness not only of preprint servers (including bioRxiv) but also of other transparency activities that are part of the credibility revolution.

3. The biggest reason is probably that I just cannot say NO when I get asked by people (but in 2019, I will be saying a record number of NOs – I am making a tally chart so that I can report to my mother, who Skypes me from Japan regularly, at the end of the year). Nonetheless, I am very glad to say YES to EcoEvoRxiv.

We hope EcoEvoRxiv will encourage more ecologists and evolutionary biologists to put their preprints online. More information is available at the dedicated website (ecoevorxiv.com). As you will find out, we have a wonderful team of committee members and ambassadors from 11 different countries helping me to launch EcoEvoRxiv. EcoEvoRxiv wants your preprints (and also postprints)!

Here I would like to acknowledge the people from the Center for Open Science (COS; especially Rusty, Rebecca, David and Matt – thank you) for their support in launching EcoEvoRxiv.

Join the Credibility Revolution!

Last week (14-15 Nov), I went to Melbourne for a workshop (“From Replication Crisis to Credibility Revolution”). The workshop was hosted by my collaborator and “credibility revolutionary” Fiona Fidler.

I suspect many workshops and mini-conferences of this nature are popping up all over the world, as many researchers are very much aware of the “reproducibility crisis”. But what was unique about this one was its interdisciplinary nature: we had philosophers, psychologists, computer scientists, lawyers, pharmacologists, oncologists, statisticians, ecologists and evolutionary biologists (like myself).

I really like the idea of calling the “reproducibility crisis” the “credibility revolution” (hence the title). A speaker at the workshop, Glenn Begley, wants to call it an “innovation opportunity” (he wrote this seminal comment for Nature). What a cool idea! These re-namings cast things in a far more positive light than the doom-and-gloom feel of “replication crisis”. Indeed, there are a lot of positive movements toward Open Science and Transparency happening to remedy the current ‘questionable’ practices.

Although I live in Sydney, I was also in Melbourne early last month (4-5 Oct) for a small conference. This is because Tom Stanley invited me over, as an evolutionary biologist, to give a talk on meta-analysis to a bunch of economists who love meta-analyses. To my surprise, I had a lot of fun chatting with meta-enthusiastic economists.

Tom is not only an economist but also a credibility revolutionary, like Fiona. He has asked me to invite ecologists and evolutionary biologists to comment on his blog post about the credibility revolution. It is an excellent read. If you can comment and join the conversation, Tom will appreciate it a lot, and it will get the discussion going. Disciplines need to unite to make this revolution successful, or to make the most of this innovation opportunity. So join the credibility revolution! (Meanwhile, I am now off to Japan to talk more about meta-analysis and sample nice food – I will rejoin the revolution once I am back.)

Why ‘MORE’ published research findings are false

In a classic article titled “Why most published research findings are false”, John Ioannidis explains five main reasons for just that. These reasons are largely related to large ‘false positive reporting probabilities’ (FPRP) in most studies and to ‘researcher degrees of freedom’, which facilitate practices such as ‘p-hacking’. If you aren’t familiar with these terms (FPRP, researcher degrees of freedom, and p-hacking), please read Tim Parker and his colleagues’ paper.

I would like to add one more important reason why research findings are often wrong (hence the title of this blog): many researchers simply get their stats wrong. This point has been talked about less in the current discussion of the ‘reproducibility crisis’. There are many ways of getting stats wrong, but I will discuss a few examples here.

In our lab’s recent article, we explore one way that biologists can produce unreliable results, especially when statistically accounting for body size. Problems arise when a researcher divides a non-size trait measurement by size (e.g., food intake by weight) and uses this derived variable in a statistical model (a worryingly common practice!). Traits are usually allometrically related to each other, meaning, for example, that food intake will not increase linearly with weight; in fact, food intake increases more slowly than weight. The consequence of using the derived variable is that we may find statistically significant results where no actual effect exists (see Fig 1). An easy solution is to log-transform, then fit the trait of interest as the response with size as a predictor (because allometrically related traits are log-linear with each other), as sketched below.
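To make this concrete, here is a minimal simulation sketch in Python (my own illustration, not code from our article): two groups differ in body weight, the “treatment” has no effect on food intake beyond allometric scaling, yet a t-test on the ratio intake/weight finds a spurious effect most of the time, while a log-log model behaves as it should.

```python
# Minimal simulation sketch (illustration only, not code from our article).
# Groups differ in body weight; the treatment has NO effect on food intake
# beyond allometric scaling (intake ~ weight^0.75).
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
n, reps, alpha = 50, 1000, 0.05
fp_ratio = fp_log = 0

for _ in range(reps):
    group = np.repeat([0, 1], n)                     # 0 = control, 1 = treatment
    weight = rng.lognormal(3.0 + 0.3 * group, 0.2)   # treated animals are heavier
    intake = weight**0.75 * rng.lognormal(0.0, 0.1, size=2 * n)  # no true effect

    # Flawed analysis: t-test on the derived ratio intake/weight
    ratio = intake / weight
    if stats.ttest_ind(ratio[group == 0], ratio[group == 1]).pvalue < alpha:
        fp_ratio += 1

    # Better analysis: log(intake) ~ group + log(weight)
    X = sm.add_constant(np.column_stack([group, np.log(weight)]))
    if sm.OLS(np.log(intake), X).fit().pvalues[1] < alpha:  # p-value of group term
        fp_log += 1

print(f"False-positive rate, ratio t-test:  {fp_ratio / reps:.2f}")  # far above 0.05
print(f"False-positive rate, log-log model: {fp_log / reps:.2f}")    # close to 0.05
```

The ratio analysis fails because intake/weight scales as weight^(-0.25), so any group difference in weight leaks into the ratio even when the treatment does nothing.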

But surprisingly, even this solution can lead to wrong conclusions. We discussed a situation where an experimental treatment affects both a trait of interest (food intake) and size (weight). In such a situation, size is known as an intermediate outcome, and fitting size as a predictor could result in a wrongly estimated experimental effect. I have made similar mistakes myself, because it is difficult to know when and how to control for size; it depends on both your question and the nature of the relationship between the trait of interest and size. For example, if the experiment affects both body size and appetite, and body size also influences appetite, then you do not want to control for body size, because part of the experimental effect on appetite flows through body size (complicated!).
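A toy simulation may help here (again my own sketch, not code from the paper): the treatment changes appetite only through body size, so adjusting for size “controls away” the very effect we want to estimate.

```python
# Toy sketch of an intermediate outcome (illustration only): the treatment
# increases body size, and appetite scales with size. Adjusting for size
# removes the treatment effect we actually want to estimate.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 2000
treat = np.repeat([0, 1], n // 2)
log_weight = 3.0 + 0.3 * treat + rng.normal(0, 0.2, n)    # treatment increases size
log_appetite = 0.75 * log_weight + rng.normal(0, 0.1, n)  # appetite driven by size

# Total treatment effect (correct here): appetite ~ treatment
total = sm.OLS(log_appetite, sm.add_constant(treat)).fit()

# Adjusting for the intermediate outcome: appetite ~ treatment + weight
X = sm.add_constant(np.column_stack([treat, log_weight]))
adjusted = sm.OLS(log_appetite, X).fit()

print(f"treatment effect, unadjusted: {total.params[1]:.3f}")    # ~0.75 * 0.3 = 0.225
print(f"treatment effect, adjusted:   {adjusted.params[1]:.3f}") # ~0, effect vanishes
```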

Although I said ‘getting stats wrong’ is less talked about, there are exceptions. For instance, the pitfalls of pseudoreplication (statistical non-independence) have been known for many years, yet researchers continue to overlook the problem. Recently, my good friend Wolfgang Forstmeier and his colleagues devoted part of a paper on avoiding false positives (Type I errors) to explaining the importance of accounting for pseudoreplication in statistical models. If you work in a probabilistic discipline, this article is a must-read! As you will find out, not all pseudoreplication is obvious. Modelling pseudoreplication properly can reduce Type I error rates dramatically, as the sketch below shows (BTW, we recently wrote about statistical non-independence and the importance of sensitivity analysis).
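Here is a minimal sketch of the idea (my own example, not code from their paper): repeated measurements per animal with no true treatment effect. A naive t-test that treats every measurement as independent rejects far too often, while a mixed model with a random intercept per animal keeps the Type I error rate near the nominal 5%.

```python
# Minimal pseudoreplication sketch (illustration only): 10 animals per group,
# 10 repeated measurements each, and NO true treatment effect.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(3)
reps, alpha = 200, 0.05
n_animals, n_meas = 10, 10
fp_naive = fp_mixed = 0

for _ in range(reps):
    animal = np.repeat(np.arange(2 * n_animals), n_meas)
    treat = (animal >= n_animals).astype(int)
    animal_effect = rng.normal(0, 1, 2 * n_animals)[animal]  # between-animal variation
    y = animal_effect + rng.normal(0, 0.5, animal.size)      # no treatment effect

    # Naive analysis: pretends all 200 measurements are independent
    if stats.ttest_ind(y[treat == 0], y[treat == 1]).pvalue < alpha:
        fp_naive += 1

    # Mixed model: random intercept for each animal
    df = pd.DataFrame({"y": y, "treat": treat, "animal": animal})
    fit = smf.mixedlm("y ~ treat", df, groups=df["animal"]).fit()
    if fit.pvalues["treat"] < alpha:
        fp_mixed += 1

print(f"Type I error, naive t-test: {fp_naive / reps:.2f}")  # far above 0.05
print(f"Type I error, mixed model:  {fp_mixed / reps:.2f}")  # close to 0.05
```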

What can we do to prevent these stats mistakes? Doing stats right is more difficult than I used to think. When I was a PhD student, I thought I was good at statistical modelling, but I made many statistical mistakes, including the ones mentioned above. Statistical modelling is difficult because we need to understand both the statistics and the biological properties of the system under investigation. A statistician can help with the former but not the latter. If statisticians can’t recognize all our potential mistakes, then we as biologists should become better statisticians.

Luckily, we have some amazing resources. I would recommend that all graduate students read Gelman and Hill’s book. You will also learn a lot from Andrew Gelman’s blog, where he often talks about common statistical mistakes and scientific reproducibility. Although no match for Gelman and Hill, I am doing my part to educate biology students about stats by writing a new kind of stats book, one based on conversations, i.e. a play!

I would like to thank Tim Parker for detailed comments on an earlier version of this blog.