Thesis subject

Bayesian approaches to omics data integration


In many studies, genome-wide measurements (omics) are performed at various levels of cellular organization: DNA (genome), mRNA (transcriptome), proteins (proteome), metabolites (metabolome) and phenotypes (phenome) [1]. Together, these data provide views of the state of the cell at each of these levels. These states are interdependent: DNA encodes proteins via mRNA, metabolites interact with proteins, proteins interact with each other, and proteins and metabolites ultimately give rise to the observed phenotypes. Although many of these interactions have been described, integrated analysis of omics data remains tremendously challenging because the data are heterogeneous, noisy, biased and sometimes contradictory.

The goal of this project is to investigate fundamental ways of integrating prior knowledge and omics data to obtain a more complete picture of cellular activity. In particular, we intend to study Bayesian methods, which are well equipped to handle the inherent uncertainty in the data (e.g. [2]). The desired outcome is a method that predicts a phenotype from one or more omics data sets while optimally exploiting all available knowledge. Such a tool would be useful not only for predicting phenotypes but, more importantly, for prioritizing possible mechanisms underlying the observed phenotype.

[1] A.R. Joyce and B.Ø. Palsson (2006) The model organism as a system: integrating 'omics' data sets. Nature Reviews Molecular Cell Biology 7:198-210.
[2] E.M. Jennings et al. (2013) Bayesian methods for expression-based integration of various types of genomics data. EURASIP Journal on Bioinformatics and Systems Biology 2013:13.

Skills used: programming, statistics

Requirements: INF-22306 Programming in Python, SSB-30306 Molecular systems biology, MAT-20306 Advanced statistics or ABG-30806 Modern statistics for the life sciences