Machine Learning-Guided Optimization of p-Coumaric Acid Production in Yeast

Moreno Paz, S.; Van der Hoek, Rianne; Eliana, Elif; Zwartjens, Pricilla; Gosiewska, Silvia; Martins dos Santos, V.A.P.; Schmitz, Joep; Suarez Diez, M.


Industrial biotechnology uses Design–Build–Test–Learn (DBTL) cycles to accelerate the development of microbial cell factories, which are required for the transition to a biobased economy. Effective use of DBTL cycles depends on appropriate connections between their phases. Using p-coumaric acid (pCA) production in Saccharomyces cerevisiae as a case study, we propose one-pot library generation, random screening, targeted sequencing, and machine learning (ML) as links between the phases of DBTL cycles. We show that the robustness and flexibility of ML models enable pathway optimization, and we propose feature importance and Shapley additive explanation (SHAP) values as a guide for expanding the design space of the original libraries. This approach increased pCA production by 68% within two DBTL cycles, reaching a titer of 0.52 g/L and a yield of 0.03 g/g on glucose.
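
To illustrate how Shapley values attribute a predicted titer to individual design choices, the following is a minimal sketch and not the authors' pipeline. It computes exact Shapley values by enumerating feature coalitions for a toy predictor. The function `toy_model`, its three features (stand-ins for, say, expression levels of three pathway genes), and all coefficients are hypothetical and chosen only for illustration. In practice a trained ML model would take the place of `toy_model`, and an approximate method would be used for larger feature sets.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one design `x` relative to `baseline`.

    Features outside a coalition are set to their baseline value.
    The cost grows exponentially with the number of features, so this
    is only practical for small toy problems.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(design):
    """Hypothetical titer predictor (g/L) with one interaction term."""
    a, b, c = design
    return 0.2 + 0.1 * a + 0.05 * b + 0.04 * a * c


# Hypothetical design: all three features "high" (1.0) versus all "low" (0.0)
x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)

for name, value in zip(["feature_a", "feature_b", "feature_c"], phi):
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the prediction for the design and the prediction for the baseline, and the interaction term is split equally between the two features involved. A feature with a large positive attribution at the edge of its tested range is the kind of signal that could suggest where to extend the design space.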