Recent development of antiSMASH and other computational approaches to mine secondary metabolite biosynthetic gene clusters

Blin, Kai; Kim, Hyun Uk; Medema, M.H.; Weber, Tilmann


Many drugs are derived from small molecules produced by microorganisms and plants, so-called natural products. Natural products have diverse chemical structures, but the biosynthetic pathways producing those compounds are often organized as biosynthetic gene clusters (BGCs) and follow a highly conserved biosynthetic logic. This allows for the identification of core biosynthetic enzymes using genome mining strategies that are based on the sequence similarity of the involved enzymes/genes. However, mining for a variety of BGCs quickly approaches a complexity level where manual analyses are no longer possible and require the use of automated genome mining pipelines, such as the antiSMASH software. In this review, we discuss the principles underlying the predictions of antiSMASH and other tools and provide practical advice for their application. Furthermore, we discuss important caveats such as rule-based BGC detection, sequence and annotation quality and cluster boundary prediction, which all have to be considered while planning for, performing and analyzing the results of genome mining studies.