BSc and MSc Thesis Subjects of the Bioinformatics Group

On this page you can find an overview of the BSc and MSc thesis topics offered by our group. Please contact the project supervisor if you would like to learn more about a specific project.

MSc thesis: In the Bioinformatics group, we offer a wide range of MSc thesis projects, from applied bioinformatics to computational method development. Here is a list of available MSc thesis projects. If you are considering doing your thesis project in our group, please contact Dick de Ridder (MSc thesis coordinator).

BSc thesis: As a BSc student you will work as an apprentice alongside one of the PhD students or postdocs in the group. You will work on your own research project, closely guided by your supervisor. You will be expected to work with several tools and/or databases, be creative and potentially overcome technical challenges. Below you will find short descriptions of the research projects of our PhD students and postdocs. In addition, you can take a look at the list of MSc thesis projects above. Please contact Sandra Smit (BSc thesis coordinator) to discuss your interests.

BSc thesis topics

Linking the metabolome and genome

Justin van der Hooft
The central theme of my research is the integration of metabolome and genome mining tools to accelerate and improve functional annotations of biosynthesis genes and specialized molecules. On the one hand, I am working on improving workflows to maximize the structural information obtained from mass spectrometry fragmentation data. On the other hand, I will develop and extend workflows that recognize patterns of co-localized genes in predicted biosynthetic gene clusters. Both the metabolomics and genomics workflows focus on the recognition of molecular substructures as building blocks of more complex natural products. As these tools will provide complementary structural data on the specialized molecules produced by microbes, fungi, and plants, I will finally integrate those workflows to boost natural product discovery.

Biosynthetic Gene Clusters in the human microbiome

Victòria Pascal Andreu
The human body is colonized by billions of highly diverse microorganisms, collectively referred to as the human microbiota. Having co-evolved with their human hosts, these microorganisms are known to play key roles in maintaining healthy conditions through several intrinsic mechanisms: metabolizing and digesting food, synthesizing essential vitamins and nutrients, providing protection against pathogens and priming the immune response. While much effort has gone into cataloguing the human microbiota, less is known about the molecular mechanisms behind the beneficial or detrimental phenotypes it governs. Nonetheless, several recent studies indicate that human-associated bacteria produce a wide diversity of small molecules in high concentrations in vivo, many of which are likely to mediate specific interactions with the host and other microbes. In this project, we propose to develop and use computational methods to gain a comprehensive understanding of the small molecules by which different microbial communities can alter the host condition, with a special emphasis on anaerobic biosynthetic gene pathways.

Pangenomics for crops

Siavash Sheikhizadeh Anari
In this project, I aim to develop novel computational methods for integrating the genomic information found in a population of closely related species into a graph structure, called a pan-genome, which will be exploited for comparative analysis. Currently, the pan-genome consists of sequence, annotation and function layers, which are stored in a graph database. The sequence layer is a graph representing multiple nucleotide sequences, the annotation layer consists of genomic features of those sequences, such as genes and proteins, and the function layer groups genes with the same function. We believe pan-genomes will take over the role of linear genomes in the near future.
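The three layers described above can be illustrated with a small toy sketch. This is not the project's implementation: the class and method names (`PanGenome`, `add_genome`, `annotate_gene`, and so on) are hypothetical, plain in-memory dictionaries stand in for the graph database, and genomes are assumed to be supplied as pre-cut segments so that identical segments collapse into a shared node.

```python
from collections import defaultdict


class PanGenome:
    """Toy three-layer pan-genome (hypothetical names, in-memory only)."""

    def __init__(self):
        # Sequence layer: shared nodes, edges, and one path per genome.
        self.node_ids = {}                 # segment sequence -> node id
        self.edges = set()                 # (from_node, to_node)
        self.paths = {}                    # genome name -> list of node ids
        # Annotation layer: genomic features placed on the sequence layer.
        self.genes = {}                    # gene id -> (genome, node ids)
        # Function layer: genes grouped by shared function.
        self.functions = defaultdict(set)  # function label -> gene ids

    def add_genome(self, name, segments):
        """Thread a genome, given as a list of segments, through the graph."""
        path = []
        for seg in segments:
            # Identical segments from different genomes share one node.
            node = self.node_ids.setdefault(seg, len(self.node_ids))
            path.append(node)
        self.edges.update(zip(path, path[1:]))
        self.paths[name] = path

    def annotate_gene(self, gene_id, genome, start, end, function):
        """Attach a gene to path positions start..end (end exclusive)."""
        self.genes[gene_id] = (genome, self.paths[genome][start:end])
        self.functions[function].add(gene_id)

    def shared_nodes(self):
        """Nodes visited by every genome in the pan-genome."""
        node_sets = [set(path) for path in self.paths.values()]
        return set.intersection(*node_sets) if node_sets else set()

    def genes_with_function(self, function):
        """Gene ids that share a function label, across all genomes."""
        return sorted(self.functions.get(function, set()))


pg = PanGenome()
pg.add_genome("genome_A", ["ACGT", "TTG", "GGA"])
pg.add_genome("genome_B", ["ACGT", "CCC", "GGA"])
pg.annotate_gene("geneA1", "genome_A", 0, 2, "kinase")
pg.annotate_gene("geneB1", "genome_B", 0, 2, "kinase")
print(pg.shared_nodes())
print(pg.genes_with_function("kinase"))
```

In this toy example the two genomes share their first and last segments, so those two nodes form the shared part of the graph, while the function layer links one gene from each genome under the same label. A real pan-genome would build the sequence layer from the raw sequences rather than from pre-cut segments.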

Biosynthetic Gene Clusters in plants

Hernando Suarez
Recently, it has been found that the genes of some specialized metabolic pathways are physically clustered in plants, sometimes separated or flanked by other genes which may not serve a purpose in the pathway. These operon-like structures have been dubbed Biosynthetic Gene Clusters (BGCs) and so far more than 20 have been discovered (Nützmann & Osbourn, 2014), paving the way for a method of natural product discovery that had not previously been used in plants: BGC prediction.
Plants, however, are very complex organisms and, unlike in bacteria and fungi, the existence of a BGC does not necessarily mean that a pathway is associated with it. For this reason, physical clustering alone cannot reliably predict specialized metabolic pathways or the metabolites that they produce. My project therefore focuses on the development of a method for specialized metabolic pathway prediction that integrates transcriptomics, metabolomics and comparative genomics.

Evolution of interaction specificity of complex protein-protein interaction networks

Miguel Correa
MADS-box proteins are a large, ancient family of transcription factors. They are involved in flower development, where MADS-box complexes of different composition lead to the development of specific floral organs. Changes in interaction patterns have been linked to flower evolution. Thus, how MADS-box proteins select their interaction partners is crucial to their function; however, it is poorly understood. We seek to comprehend how protein sequence determines interaction specificity by exploiting sequence and interaction data with novel machine learning approaches, with the ultimate goal of obtaining insight into the intimately tied history of flowers and these intriguing proteins.

Detecting copy-number variation in plant genomes

Raúl Wijfjes
In recent years, we have sequenced an increasing number of plant species, such as A. thaliana, maize and potato. From this data, it is becoming increasingly clear that differences in the number of copies of genes between plant genomes, known as copy number variation, are used by plants to adapt to unfavourable environmental conditions. For instance, it was found that A. thaliana plants which were more resistant to salt stress contained a larger number of salt-resistance genes than non-resistant ones. It would be great if we could exploit this feature to breed stress-tolerant crop species, such as potatoes which are resistant to cold. However, the bioinformatics methods which can detect gene duplications were mainly developed for human genomes and have not been properly tested on plant data. In this project, we aim to solve this problem by applying available copy number variation tools to plant datasets and assessing which of them are the most suitable for our purposes. In this way, we hope to add copy number variation to the plant breeder's toolbox as a way to create crops which are more tolerant of extreme conditions.

Novel Enzymes for Fragrance and Flavour

Janani Durairaj
Terpenoids represent a vast and diverse group of natural compounds produced by many plants, and usually have a distinctive smell. This makes them valuable in natural flavour and fragrance products such as orange flavouring and sandalwood scents. Of late, such compounds are being produced by microbial production platforms. The enzymes responsible for the production of terpenoids, the terpene synthases, are very diverse in sequence, and there is great scope for improving their efficiency and specificity to produce compounds of interest.
This project aims to explore the properties and mechanisms of terpene synthases, and to improve their overall catalytic efficiency as well as their specificity for certain terpenoid products using machine learning techniques. Specifically, we look for features describing the functionally important motifs or residues in each sequence that are useful for predicting either the product formed or the catalytic efficiency of the enzyme. Good predictors trained using these features will then allow us to design sequences predicted to form a certain product and test their catalytic activity experimentally.

Biosynthetic Gene Clusters in metagenomic communities

Vittorio Tracanna
From soil to ocean, from plant roots to animal guts, the ecosystems in which natural products are found are highly diverse. Within these ecosystems the diversity is also enormous: a gram of soil is estimated to contain hundreds to thousands of different species that form an extremely intertwined society. The metabolic potential hidden in those communities is immense, and systematic analysis of soils across the globe shows very little overlap between the secondary metabolite repertoires of similar soils. Most of these compounds, however, are locked in unculturable bacteria, making standard approaches unfit for their investigation. Metagenomics can overcome culture restrictions by sampling material directly from the environment of interest. In my work, I study biosynthetic gene clusters found in metagenomic communities and attempt to prioritize and characterize them to find new useful molecules.