Automatic Chemical Structure Annotation of an LC-MSn Based Metabolic Profile from Green Tea

Ridder, L.O.; Hooft, J.J.J. van der; Verhoeven, S.; Vos, C.H. de; Bino, R.J.; Vervoort, J.


Liquid chromatography coupled with multistage accurate mass spectrometry (LC–MSn) can generate comprehensive spectral information on metabolites in crude extracts. To support structural characterization of the many metabolites present in such complex samples, we present a novel method to automatically process and annotate LC–MSn data sets on the basis of candidate molecules from chemical databases, such as PubChem or the Human Metabolome Database. Multistage MSn spectral data are automatically annotated with hierarchical trees of in silico generated substructures of candidate molecules to explain the observed fragment ions, and alternative candidates are ranked on the basis of the calculated matching score. We tested this method on an untargeted LC–MSn (n = 3) data set of a green tea extract, generated on an LC-LTQ/Orbitrap hybrid MS system. For the 623 spectral trees obtained in a single LC–MSn run, a total of 116,240 candidate molecules with monoisotopic masses matching within 5 ppm mass accuracy were retrieved from the PubChem database, ranging from 4 to 1327 candidates per molecular ion. The matching scores were used to rank the candidate molecules for each LC–MSn component. The median and third quartile fractional ranks for 85 previously identified tea compounds were 3.5 and 7.5, respectively. The substructure annotations and rankings provided detailed structural information on the detected components, beyond annotation with elemental formula only. Twenty-four additional components were putatively identified by expert interpretation of the automatically annotated data set, illustrating the potential to support systematic and untargeted metabolite identification.
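The two numerical criteria mentioned above, the 5 ppm mass window for candidate retrieval and the fractional rank of a known compound among its candidates, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the example masses and scores, and the convention that a higher score indicates a better match are all assumptions made for illustration. Tied candidates are given the mean of the rank positions they occupy, which is one common definition of fractional rank.

```python
def within_ppm(candidate_mass, observed_mass, tol_ppm=5.0):
    """Return True if candidate_mass lies within tol_ppm of observed_mass."""
    error_ppm = abs(candidate_mass - observed_mass) / observed_mass * 1e6
    return error_ppm <= tol_ppm


def fractional_rank(scores, target):
    """Fractional rank of `target` among the candidates in `scores`.

    Assumption: a higher score means a better match. Candidates with
    equal scores share the mean of the rank positions they occupy.
    """
    target_score = scores[target]
    better = sum(1 for s in scores.values() if s > target_score)
    tied = sum(1 for s in scores.values() if s == target_score)
    return better + (tied + 1) / 2


# Hypothetical masses (Da), not taken from the data set.
observed = 290.0790
close_candidate = within_ppm(290.0795, observed)   # about 1.7 ppm off
far_candidate = within_ppm(290.0820, observed)     # about 10 ppm off

# Hypothetical matching scores for four candidates of one molecular ion.
scores = {"A": 0.9, "B": 0.7, "C": 0.7, "D": 0.2}
rank_b = fractional_rank(scores, "B")  # B and C tie for positions 2 and 3
```

With these illustrative values, candidates B and C share positions 2 and 3 and each receive a fractional rank of 2.5, which shows how non-integer values such as the reported median of 3.5 can arise.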