Computational metabolomics

Molecules are everywhere. The diverse structures and functions of small, specialized molecules have always fascinated me. Specialized molecules play key roles in nature as inter-species and even inter-kingdom messengers, or as chemical weapons that deter or eliminate competing organisms nearby. They are also important components of food, often determining the growth, fitness, and nutritional value of crops. Knowing all small molecules and their functions would thus be of tremendous help in areas such as antibiotic discovery and the growing of healthy crops. However, these small molecules seldom occur on their own; they are usually part of a complex metabolite mixture containing molecules derived from various organisms and the environment. Methods that quickly capture a complete picture of the small-molecule profile of such mixtures are therefore needed. I recognise three paths to a better understanding of complex metabolite mixtures: i) increased metabolite annotation power, ii) chemically informed comparative metabolomics, and iii) linking metabolomics profiles to function and to genotype/genomic information.

Metabolomics is the scientific field that aims to map all molecules in an organism and is thus perfectly positioned to chart this structural diversity. Fortunately, technical advances in analytical chemistry now allow mass spectrometers to produce information-dense metabolic profiles. However, these profiles typically consist of mass spectra rather than structures. Spectral libraries containing spectra of known structures are growing, but they currently cover only about 2.5% of the known natural products, and typically only 2–25% of experimental data is annotated through library matching. A lot of information therefore remains hidden in metabolomics data, waiting to be uncovered. My research vision is to close the gap between what we can see in metabolomics data and what we can actually learn from it. This will enable biochemical interpretation of spectral data obtained from complex metabolite mixtures through structural and functional annotations. Achieving this depends on finding out: i) which structural information is encoded in metabolomics data; ii) how novel chemistry can be recognised in spectral data; and iii) how relevant metabolite groups can be identified effectively in metabolomics profiles of complex metabolite mixtures.
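
To make the library-matching step concrete, the following is a minimal sketch of how a query spectrum might be scored against a spectral library using cosine similarity on binned peaks. The bin width, the 0.7 score threshold, the function names, and the compound names are illustrative assumptions rather than values taken from any particular tool.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.1):
    """Sum intensities of (m/z, intensity) peaks that fall into the same m/z bin."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return dict(binned)


def cosine_score(spec_a, spec_b):
    """Cosine similarity between two binned spectra (dicts of bin -> intensity)."""
    dot = sum(value * spec_b.get(key, 0.0) for key, value in spec_a.items())
    norm_a = math.sqrt(sum(value * value for value in spec_a.values()))
    norm_b = math.sqrt(sum(value * value for value in spec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_library_match(query_peaks, library, threshold=0.7):
    """Return (name, score) of the best library hit, or (None, score) below threshold.

    `threshold` is a hypothetical cut-off chosen for illustration only.
    """
    query = bin_spectrum(query_peaks)
    best_name, best_score = None, 0.0
    for name, peaks in library.items():
        score = cosine_score(query, bin_spectrum(peaks))
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


# Toy library with made-up compounds and peak lists.
library = {
    "compound_A": [(85.03, 40.0), (127.04, 100.0), (163.06, 60.0)],
    "compound_B": [(91.05, 100.0), (119.09, 30.0)],
}
query = [(85.02, 35.0), (127.04, 100.0), (163.07, 55.0)]
name, score = best_library_match(query, library)
```

In this sketch, a spectrum that shares no binned peaks with any library entry is left unannotated, which is the situation for most experimental spectra given the limited library coverage described above.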

My research agenda is therefore to develop algorithms and models that improve the structural annotation of metabolite features and extract direct biochemical knowledge from metabolomics profiles. In my group, I will develop computational metabolomics approaches inspired by two other fields: natural language processing (NLP) and genomics. For example, I have demonstrated the use of topic modeling NLP algorithms to discover substructures from metabolomics profiles, and I am currently pioneering the use of word embedding NLP approaches to aid metabolomics analyses. Furthermore, genomics analysis tools have expanded rapidly over the last decade. By making metabolomics data fit those tools, it will be possible to exploit a large range of tools originally developed for genomics data and to make better use of the increasing amount of available, consistently curated sample information. In addition, linking the outcomes of genomics and metabolomics data mining tools will accelerate natural products discovery by connecting biosynthetic gene clusters to the spectra of the molecular structures they encode. This reveals which organisms can produce these molecules and allows structural information to be transferred in both directions. I will use the plant root microbiome and the human food metabolome as prime applications, since they represent complex metabolite mixtures full of as yet unknown metabolic matter that, once elucidated, will boost our insight into the molecular mechanisms underpinning the regulation of growth, development, and health.
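
The analogy between spectra and text can be sketched as follows: each fragment peak and each neutral loss becomes a "word", and a spectrum becomes a "document" that text-mining algorithms such as topic models or word embeddings could then consume. This is a minimal sketch; the word format, the rounding precision, the function names, and the toy spectra are illustrative assumptions, not the conventions of any specific tool.

```python
def spectrum_to_document(peaks, precursor_mz, decimals=2, min_intensity=0.0):
    """Turn (m/z, intensity) peaks into a list of fragment and neutral-loss 'words'.

    The 'fragment_'/'loss_' prefixes and the rounding are hypothetical choices.
    """
    words = []
    for mz, intensity in peaks:
        if intensity < min_intensity:
            continue
        words.append(f"fragment_{mz:.{decimals}f}")
        loss = precursor_mz - mz
        if loss > 0:
            # A neutral loss is the mass difference between precursor and fragment.
            words.append(f"loss_{loss:.{decimals}f}")
    return words


def shared_words(doc_a, doc_b):
    """Words that occur in both documents, sorted for reproducible output."""
    return sorted(set(doc_a) & set(doc_b))


# Two made-up spectra from different precursors that both lose about 18.01 Da.
doc_1 = spectrum_to_document([(85.03, 40.0), (127.04, 100.0)], precursor_mz=145.05)
doc_2 = spectrum_to_document([(163.06, 100.0)], precursor_mz=181.07)
common = shared_words(doc_1, doc_2)
```

Here the two toy spectra share no fragment masses, yet they share a neutral-loss word. Recurring co-occurrence patterns of this kind are what a topic model could group into a candidate substructure.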