Advancements in capturing and mining mass spectrometry data are transforming natural products research

Jarmusch, Scott A.; Hooft, Justin J.J. van der; Dorrestein, Pieter C.; Jarmusch, Alan K.


Covering: 2016 up to 2021 Mass spectrometry (MS) is an essential technology in natural products research with MS fragmentation (MS/MS) approaches becoming a key tool. Recent advancements in MS yield dense metabolomics datasets which have been, conventionally, used by individual labs for individual projects; however, a shift is brewing. The movement towards open MS data (and other structural characterization data) and accessible data mining tools is emerging in natural products research. Over the past 5 years, this movement has rapidly expanded and evolved with no slowdown in sight; the capabilities of today vastly exceed those of 5 years ago. Herein, we address the analysis of individual datasets, a situation we are calling the '2021 status quo', and the emergent framework to systematically capture sample information (metadata) and perform repository-scale analyses. We evaluate public data deposition, discuss the challenges of working in the repository scale, highlight the challenges of metadata capture and provide illustrative examples of the power of utilizing repository data and the tools that enable it. We conclude that the advancements in MS data collection must be met with advancements in how we utilize data; therefore, we argue that open data and data mining is the next evolution in obtaining the maximum potential in natural products research.