Computational identification of co-evolving multi-gene modules in microbial biosynthetic gene clusters

Del Carratore, Francesco; Zych, Konrad; Cummings, Matthew; Takano, Eriko; Medema, Marnix H.; Breitling, Rainer


The biosynthetic machinery responsible for the production of bacterial specialised metabolites is encoded by physically clustered groups of genes called biosynthetic gene clusters (BGCs). The experimental characterisation of numerous BGCs has led to the elucidation of subclusters of genes within BGCs that are jointly responsible for the same biosynthetic function in different genetic contexts. We developed an unsupervised statistical method that systematically and automatically detects a large number of modules (putative functional subclusters) within an extensive set of predicted BGCs. Our method recovered multiple previously known subclusters, demonstrating its effectiveness and sensitivity. In addition, the resulting large collection of newly defined modules provides new insights into the prevalence and putative biosynthetic role of these modular genetic entities. The automated and unbiased identification of hundreds of co-evolving groups of genes is a key advance for the discovery and biosynthetic engineering of high-value compounds.
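To make the idea of a "co-evolving group of genes" concrete, the following is a minimal illustrative sketch, not the authors' method. The abstract does not specify the statistical model, so this sketch assumes a generic approach: each BGC is reduced to a set of gene-family labels (the labels `pA`, `x1`, etc. and the functions `cooccurrence_pvalue`, `significant_pairs` and `merge_into_modules` are hypothetical), every pair of families is scored with an upper-tail hypergeometric test for more shared clusters than expected by chance, and significantly linked families are merged into candidate modules with union-find. The `alpha` threshold is uncorrected for multiple testing.

```python
from itertools import combinations
from math import comb


def cooccurrence_pvalue(n_total, n_a, n_b, k):
    """Upper-tail hypergeometric P(X >= k): chance that two families present in
    n_a and n_b of n_total clusters share at least k clusters at random."""
    upper = min(n_a, n_b)
    tail = sum(comb(n_a, x) * comb(n_total - n_a, n_b - x) for x in range(k, upper + 1))
    return tail / comb(n_total, n_b)


def significant_pairs(clusters, alpha=0.01, min_shared=2):
    """Return (family_a, family_b, shared, p) for pairs co-occurring more than expected.
    Assumption: alpha is a plain per-pair cutoff with no multiple-testing correction."""
    n_total = len(clusters)
    families = sorted(set().union(*clusters))
    # Map each family to the indices of the clusters that contain it.
    presence = {f: {i for i, c in enumerate(clusters) if f in c} for f in families}
    hits = []
    for a, b in combinations(families, 2):
        shared = len(presence[a] & presence[b])
        if shared < min_shared:
            continue  # too few joint occurrences to be informative
        p = cooccurrence_pvalue(n_total, len(presence[a]), len(presence[b]), shared)
        if p < alpha:
            hits.append((a, b, shared, p))
    return hits


def merge_into_modules(pairs):
    """Group families linked by significant pairs into connected components (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, *_ in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Toy data (hypothetical): pA, pB and pC always appear together; x1-x3 are scattered.
clusters = [
    {"pA", "pB", "pC", "x1"}, {"pA", "pB", "pC", "x2"}, {"pA", "pB", "pC", "x3"},
    {"pA", "pB", "pC", "x1"}, {"pA", "pB", "pC", "x2"}, {"x1", "x2"},
    {"x2", "x3"}, {"x1", "x3"}, {"x1"}, {"x2"}, {"x3"}, {"x1", "x2", "x3"},
]
print(merge_into_modules(significant_pairs(clusters)))  # [['pA', 'pB', 'pC']]
```

In this toy set each of `pA`, `pB` and `pC` occurs in 5 of 12 clusters and they always co-occur, giving p = 1/792 per pair, whereas the scattered `x` families never pass the threshold. Real data would additionally need homology-based family assignment and a correction for the many pairs tested.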