The open science story of Justin van der Hooft
In the new series Open Science Stories, WUR researchers and instructors tell how they contribute to Open Science & Education. What are the benefits and challenges and what is needed to bring about the open science culture change at WUR? First in the series is Dr Justin van der Hooft, Assistant Professor in Computational Metabolomics at the Bioinformatics group.
Justin van der Hooft. Photo: Eric Scholten
What's your greatest accomplishment when it comes to open science?
Together with researchers from the Netherlands eScience Center (Amsterdam) and the University of California San Diego (USA), my team has developed the Paired Omics Data Platform (PoDP). This platform brings together two data types, genomic and metabolomic, in a form that is both human- and computer-readable. The platform ensures data interoperability and is based on FAIR data principles. This makes it easier to develop new algorithms that automatically link gene clusters and metabolite spectra, thus allowing us to link spectra of previously unknown bioactive (antibiotic) metabolites to the genetic machinery that produces them. It is a perfect example of community-driven Open Science. Many labs from various countries have already shared data on the platform.
How do you motivate researchers to share their data?
The importance of sharing research data is widely endorsed, but in practice researchers run into several barriers. As a result, data sharing is not yet as mainstream as I’d like it to be. One of the barriers is time. You need to make it as easy as possible for people to share their research data. At the PoDP, we’ve limited the number of required metadata fields to keep the time investment manageable.
Sometimes people are reluctant to share data because they’re afraid that others will run off with the data and “scoop” them. Although I do take this concern seriously, the chances that another group could repeat your entire study in a few months or years are generally very small, since they don’t have the same biochemical expertise as you.
However, we do need to be aware that the stakes can be influenced by the stage of your career. I can imagine that PhD candidates and early career researchers, especially in disciplines with high publication pressure, are not eager to share their data during their studies. It would help if journals agreed that if someone comes up with a study based on your data while your paper is under review, your submitted publication will still be considered new and original. Then a “scoop” by other researchers with your data will have much less impact: instead, one can see it as a sign that your data is valuable and leads to new insights.
What are the benefits of sharing data?
There are plenty of well-known benefits. By sharing your data, more research can be done with the same data, research can be repeated, and studies can be more easily compared. For research groups like mine, it’s crucial that people share their data; we need the data to improve and optimise our tools, which is an advantage for the data generators who are end-users of our tools as well.
I think we should more strongly emphasize the personal benefits of sharing data, such as the fact that mistakes are more easily filtered out and that your research data in a repository is also preserved for the long term, so you can always access it as well.
What can WUR do to encourage researchers?
This is where the Recognition and Reward programme comes in.
Providing reusable and accessible data files or building the tools for that should be formally appreciated given the efforts that go into this if you want to get it right. This is especially the case if these activities generate successes, such as publications or new research by third parties.
If you could invite three scientists (living or dead) to dinner to discuss open science, who would you invite and why?
It would be wonderful to meet Charles Darwin and Marie Curie and tell them about the Open Science movement! I imagine that they would be very surprised and enthusiastic. Actually, I’d very much like to go back in time myself and find out how they shared their knowledge in those days. Would it be possible to do anything at all with Open Science? What impact might it have had? Sure, we’re never going to find out, but given the travels Darwin made, some incredible datasets could have emerged!
From the present day, I would invite Pieter Dorrestein from the University of California San Diego (UCSD). I collaborate with him and his team intensively. His Global Natural Products Social Molecular Networking (GNPS) platform is something everyone can use. As such, he is a great role model for me. I’d like to discuss with him how to shift to even better data descriptions to enable and promote the reuse of data. How can we make it easy for researchers to deposit their data but also ensure well-curated metadata and reliable annotations?
What can WUR focus on to make it easier?
Many tools and templates are already available to the community. It’d be good to have more discipline-specific tutorials so that groups can register FAIR data faster and meet the requirements of funders.
In my opinion, the success of Open Science ultimately depends on both the intrinsic motivation of researchers and clear rewards for them. We should keep emphasizing the benefits, both for the scientific community and for individual researchers.