Thesis subject

Exploring sequence features related to protein turnover

In many cases, measured mRNA levels correlate poorly with the levels of the corresponding proteins. This indicates that many protein levels are regulated post-transcriptionally, but the means by which the cell does so are poorly understood. To investigate this, a recent study measured the turnover of more than 600 proteins in yeast and found remarkable differences: some proteins have a half-life of less than 30 minutes, others of more than 20 hours [1].

Mechanisms that actively degrade specific proteins are likely to act on the sequence or structure of these proteins. In this project, we want to explore whether certain features of a protein's sequence predict its half-life. The dataset obtained in [1] will be the starting point; machine learning methods can be used to mine the sequences for motifs or other properties predictive of half-life, similar to the approach in [2]. Knowledge from the literature on protein degradation may be used to guide this sequence mining. The desired outcome is a better understanding of the process of protein degradation, ideally in the form of a simple model.
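
As a rough illustration of what "sequence features" could mean in practice, the sketch below turns a protein sequence into a small set of numbers that a machine learning method could later relate to half-life. The function names (composition_features, count_motif), the choice of features and the toy sequence are assumptions made for this example only; they are not taken from [1] or [2].

```python
from collections import Counter

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(sequence):
    """Return the fraction of each standard amino acid in a sequence.

    Hypothetical helper for illustration: non-standard characters are
    ignored when computing the fractions.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def count_motif(sequence, motif):
    """Count occurrences of a short motif, allowing overlaps (hypothetical helper)."""
    sequence = sequence.upper()
    motif = motif.upper()
    width = len(motif)
    return sum(
        1
        for start in range(len(sequence) - width + 1)
        if sequence[start:start + width] == motif
    )


# Made-up toy sequence, not a real yeast protein.
toy_sequence = "MKKLAPESTKK"
features = composition_features(toy_sequence)
lysine_pairs = count_motif(toy_sequence, "KK")
```

Applied to every protein in a dataset, such feature dictionaries would form the input table for a regression or classification model with half-life as the target; which features are actually informative is exactly what the project would need to find out.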

[1] A.O. Helbig et al. (2011). The diversity of protein turnover and abundance under nitrogen-limited steady-state condition in Saccharomyces cerevisiae. Molecular Biosystems 7(12):3316-26.
[2] B.A. van den Berg et al. (2012). Exploring sequence characteristics related to high-level production of secreted proteins in Aspergillus niger. PLoS ONE 7(10):e45869.

Skills used: programming, statistics

INF-22306 Programming in Python
BIF-30806 Advanced bioinformatics
MAT-20306 Advanced statistics or BRD-31806 Parameter estimation and model structure ident. or ABG-30806 Modern statistics for the life sciences