Ultrafast PubChem Searching Combined with Improved Filtering Rules for Elemental Composition Analysis

Lommen, A.


A new and improved software tool is provided for elemental composition annotation of molecular ions detected in mass spectrometry, based on improved filtering rules followed by ultrafast querying of publicly available compound databases. PubChem is used as a general source of 1.3 million unique chemical formulas. A plant metabolomics database containing ca. 100 000 formulas is used as a source of naturally occurring compounds. Four modes with different sets of rules for heuristic filtering of candidate formulas coming from elemental composition analysis are incorporated and tested on both databases. The elemental composition analysis is then coupled to ultrafast PubChem searching based on a mass-indexed intermediate system. The performance of the filters is compared and discussed. When reactive compounds are assumed not to be present, 99.95% of the 1.3 million PubChem formulas are correctly found, while ca. 30% fewer formulas per mass are given compared to previously published rules. For the ca. 100 000 plant metabolomics-based formulas, 100% fit the improved rules.
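
The idea of a mass-indexed lookup can be illustrated with a minimal sketch. This is not the tool's actual implementation, which the abstract does not describe. It assumes that formulas are stored sorted by monoisotopic mass, so that a mass window can be located by binary search. The names `MONO_MASS`, `formula_mass`, `build_index`, and `search` are hypothetical, the element table is deliberately limited to a few common elements with rounded masses, and the ppm tolerance is an illustrative parameter.

```python
import bisect
import re

# Approximate monoisotopic masses (u) for a few elements (rounded values;
# an assumption for this sketch, not data taken from the paper).
MONO_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "P": 30.9737615,
    "S": 31.9720707,
}

def formula_mass(formula):
    """Sum monoisotopic masses for a simple formula such as 'C6H12O6'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO_MASS[element] * (int(count) if count else 1)
    return total

def build_index(formulas):
    """Return (mass, formula) pairs sorted by mass: the 'mass index'."""
    return sorted((formula_mass(f), f) for f in formulas)

def search(index, mass, ppm=5.0):
    """Return formulas whose mass lies within +/- ppm of the query mass."""
    tol = mass * ppm * 1e-6
    # Sentinel strings make the tuple comparison inclusive at both ends.
    lo = bisect.bisect_left(index, (mass - tol, ""))
    hi = bisect.bisect_right(index, (mass + tol, "\uffff"))
    return [formula for _, formula in index[lo:hi]]

index = build_index(["C6H12O6", "C9H8O4", "C8H10N4O2"])
hits = search(index, 180.0634, ppm=5.0)
```

Because the index is sorted once, each query costs two binary searches plus the size of the returned window, which is one plausible way a mass-indexed intermediate layer could keep lookups fast over millions of formulas. The heuristic filtering rules would then be applied to the candidates in that window.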