BSc and MSc Thesis Subjects of the Bioinformatics Group

On this page you can find an overview of the BSc and MSc thesis topics offered by our group. Please contact the project supervisor if you would like to learn more about a specific project.

MSc thesis: In the Bioinformatics group, we offer a wide range of MSc thesis projects, from applied bioinformatics to computational method development. Here is a list of available MSc thesis projects. These topics can be pursued for an MSc thesis or as part of a Research Practice. If you are considering doing your thesis project in our group, please email the thesis coordinators.

BSc thesis: As a BSc student you will work as an apprentice alongside one of the PhD students or postdocs in the group. You will work on your own research project, closely guided by your supervisor. You will be expected to work with several tools and/or databases, be creative and potentially overcome technical challenges. Below you will find short descriptions of the research projects of our PhD students and postdocs. In addition, you can take a look at the list of MSc thesis projects above. Please contact the thesis coordinators to discuss your interests.

BSc thesis topics

Meiotic recombination in crops

Roven Fuentes
Meiotic recombination is a fundamental biological process that ensures balanced chromosome distribution and the formation of new allelic combinations. Breeders rely on this mechanism to develop new crop varieties that combine alleles for higher and better yield, tolerance to certain diseases and stresses, and resilience to the effects of climate change. Understanding the genomic features that influence the non-uniform distribution of crossovers gives insight into possible barriers to or promoters of recombination, aiding introgression hybridization and precision breeding. For example, a large inversion results in unsuccessful chromosome synapsis, which prevents recombination in the inverted region. We aim to develop a high-throughput and cost-effective approach for profiling crossovers and to identify predictive features of crossovers in different crops. This profiling is particularly important for long-generation crops because it may reveal issues such as linkage drag or low recombination frequency in regions of interest in advance, saving precious time and allowing breeders to address them earlier. We also plan to develop a system for breeders that predicts the likely recombination landscape for a specific hybrid cross without the need for actual crossing.

From gene to networks: bridging the gap between two temperature-regulated plant reproductive traits

Aalt-Jan van Dijk

Flowering time controlled by ambient temperature and the setting of seed dormancy are both temperature-regulated traits, and they have recently been proposed to be interconnected. Plant reproductive success is the result of the evolution of complex gene regulatory networks. A challenge is to interconnect the variety of publicly available information, such as transcriptomic and genome-wide chromatin immunoprecipitation data sets. We hypothesize that the plant reuses the same regulatory modules throughout development. For instance, the same transcription factor can positively or negatively regulate a set of genes depending on its differential interaction with other proteins.
In this project, you will perform a network analysis starting from a set of 306 genes known to be relevant for flowering, and from publicly accessible gene expression data of Arabidopsis fruit tissue. Initially, a search for available high-quality data sets will be performed, followed by network analyses. Prior knowledge will be used to evaluate predictions and to improve the accuracy of the estimated gene network structure. The ultimate goal is to build a highly connected network and to identify potential hubs in it. Once the network is fully built and validated, it can potentially be used to predict relevant genes and their functions.
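To give an impression of what identifying hubs can look like in practice, here is a minimal Python sketch. It assumes a simple co-expression approach: two genes are connected when the absolute Pearson correlation of their expression profiles exceeds a threshold, and hubs are the genes with the most connections. The gene names, expression values, threshold and function names are all hypothetical and chosen for illustration only; the actual project may use different network inference methods.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def coexpression_network(expr, threshold=0.8):
    """Return the set of gene pairs whose |correlation| >= threshold."""
    edges = set()
    for g1, g2 in combinations(sorted(expr), 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


def hubs(edges, top_n=1):
    """Rank genes by degree (number of edges) and return the top ones."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return sorted(degree, key=lambda g: (-degree[g], g))[:top_n]


# Hypothetical expression profiles (one value per sample).
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "geneB": [1.8, 2.8, 1.4, 2.4, 5.8, 6.8],
    "geneC": [0.2, 1.2, 4.6, 5.6, 4.2, 5.2],
    "geneD": [1.0, 5.0, 1.0, 1.0, 5.0, 1.0],
}

network = coexpression_network(expression, threshold=0.8)
print(sorted(network))
print(hubs(network, top_n=1))
```

In this toy data, geneB and geneC each correlate with geneA but not strongly with each other, so geneA ends up as the most connected gene. Degree is only one of several possible hub criteria.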

Finding genes for traits using systems genetics

Margi Hartanto
Genome editing promises to revolutionize plant breeding because it allows accurate and efficient modification of genes to improve crop traits. A range of high-throughput methods is becoming available for both large-scale plant phenotyping and genotyping, but there are no systematic methods to subsequently link genes to traits and so find the targets for modification. A method potentially capable of this is Quantitative Trait Locus (QTL) analysis, which is used to identify genomic regions affecting a 'continuous' trait (like plant height or seed size). However, two main issues prevent QTL analysis from being used systematically: first, its low resolution, with identified DNA regions that can span hundreds of genes; and second, its lack of power when dealing with complex traits affected by many genes with possibly small effects. In this project we develop and apply systems genetics approaches to integrate QTL analysis with various kinds of molecular interaction data. By combining gene annotation and genetic variation with gene expression and phenotype measurements, we identify molecular networks underlying plant traits. These serve to identify key regulatory genes and predict the effects of naturally occurring genetic variants. The methods and predictions will be made available in the AraQTL workbench.
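The basic idea behind QTL analysis can be illustrated with a deliberately simplified single-marker scan in Python. The sketch below assumes a population in which every line carries either parental allele "A" or "B" at each marker, and it scores a marker by the difference in mean trait value between the two genotype groups. Real QTL mapping uses proper test statistics (for example LOD scores) and accounts for linkage between markers; the marker names, genotypes and trait values here are invented for illustration.

```python
from statistics import mean


def marker_effects(genotypes, trait):
    """Per marker: mean trait of 'B' lines minus mean trait of 'A' lines."""
    effects = {}
    for marker, calls in genotypes.items():
        group_a = [t for g, t in zip(calls, trait) if g == "A"]
        group_b = [t for g, t in zip(calls, trait) if g == "B"]
        if group_a and group_b:  # skip markers without both genotypes
            effects[marker] = mean(group_b) - mean(group_a)
    return effects


# Hypothetical genotype calls for six lines at three markers.
genotypes = {
    "m1": ["A", "A", "B", "B", "A", "B"],
    "m2": ["A", "B", "A", "B", "A", "B"],
    "m3": ["B", "A", "A", "B", "B", "A"],
}
# Hypothetical trait values (e.g. plant height) for the same six lines.
plant_height = [10.0, 11.0, 15.0, 16.0, 10.5, 15.5]

effects = marker_effects(genotypes, plant_height)
best = max(effects, key=lambda m: abs(effects[m]))
print(best, effects[best])
```

The marker with the largest absolute effect points to a genomic region, not to a single gene, which is exactly the resolution problem described above.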

Structure/function prediction of lipopeptides

Barbara Terlouw
In nature, microbes such as fungi and bacteria produce a vast range of secondary metabolites to gain a selective advantage over other organisms. I am working on a specific group of metabolites called lipopeptides, and am particularly interested in their antibiotic potential. Several lipopeptide antibiotics have already been discovered, and the immense structural diversity of lipopeptides suggests that there may be many as yet undiscovered lipopeptide antibiotics out there. Lipopeptides are often produced from biosynthetic gene clusters: groups of physically clustered genes that together encode a pipeline responsible for the production of a secondary metabolite. Unfortunately, it is still difficult to predict the structure and the function of a lipopeptide from the DNA sequence of a biosynthetic gene cluster. Therefore, my research attempts to first predict the structure of a lipopeptide from its biosynthetic gene cluster sequence, and then to infer its function from the structure. This will help with the discovery and possibly engineering of novel lipopeptide antibiotics.

Genome-guided discovery and structure prediction of novel bio-surfactants

Mohammad Alanjary
Surfactants are integral compounds found in cosmetics, industrial cleaners, and food. Pressure to replace synthetically derived surfactants with environmentally friendly, low-toxicity bio-surfactants has steadily grown in various applications (e.g. dispersants used in oil spill cleanup). Lipopeptides are naturally occurring compounds that show great promise as sustainable bio-surfactants and are found in a breadth of bacterial species, including well-studied Bacillus strains. This project aims to chart the diversity of lipopeptides from bacterial genomes and to further develop structure prediction methods to generate targeted leads with desirable properties. Comparative analysis of the genetic diversity of lipopeptides will also aid re-engineering efforts for effective production at industrial scale and reduce dependence on fossil fuels.

Linking the metabolome and genome

Justin van der Hooft
The central theme of my research is the integration of metabolome and genome mining tools to accelerate and improve functional annotations of biosynthesis genes and specialized molecules. On the one hand, I am working on improving workflows to maximize the structural information obtained from mass spectrometry fragmentation data. On the other hand, I will develop and extend existing workflows that recognize patterns of co-localized genes in predicted biosynthesis gene clusters. Both the metabolomics and genomics workflows focus on the recognition of molecular substructures as building blocks of more complex natural products. As these tools will provide complementary structural data on the specialized molecules produced by microbes, fungi, and plants, I will finally integrate those workflows to boost natural product discovery.

Pangenomic applications for plants and pathogens

Eef Jonkheer
As a result of advances in next-generation sequencing (NGS), there is a gradual shift from representing a species by a single reference genome sequence to representing it by a pangenome. A pangenome is a data structure containing all genomic variation in a species or population. In my project, we aim to develop pangenomic applications for plants and pathogens to demonstrate the advantages of pangenomes: pangenome-based discoveries and improved efficiency and/or accuracy. We will build upon the existing pangenome framework of PanTools, and exploit its existing features to develop new computational methods for highly efficient comparative analysis of large numbers of related genomes. One line of research focusses on gene-level analyses in large sets of genomes, for example inferring the evolutionary relationships across thousands of bacterial strains or identifying a subset of genes in tomato that contribute to a particular trait such as drought tolerance. The other line of research is aimed at the discovery and exploration of genome-wide variation, from single-nucleotide polymorphisms to large structural variation.
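A common gene-level analysis on a pangenome is to split genes into core genes (present in every genome), accessory genes (present in some) and unique genes (present in only one). The Python sketch below illustrates that classification on plain sets of gene names. It does not use PanTools or its data model; the strain names, gene names and the function `classify_pangenome` are hypothetical and serve only to illustrate the concept.

```python
from collections import Counter


def classify_pangenome(genomes):
    """Split genes into core, accessory and unique sets.

    genomes maps a genome name to the collection of genes found in it.
    """
    n_genomes = len(genomes)
    # Count in how many genomes each gene occurs (once per genome).
    counts = Counter(g for genes in genomes.values() for g in set(genes))
    core = {g for g, c in counts.items() if c == n_genomes}
    unique = {g for g, c in counts.items() if c == 1} - core
    accessory = set(counts) - core - unique
    return core, accessory, unique


# Hypothetical gene content of three bacterial strains.
strains = {
    "strain1": {"gyrA", "recA", "toxX"},
    "strain2": {"gyrA", "recA", "resY"},
    "strain3": {"gyrA", "resY"},
}

core, accessory, unique = classify_pangenome(strains)
print("core:", sorted(core))
print("accessory:", sorted(accessory))
print("unique:", sorted(unique))
```

In a real analysis, genes from different genomes first have to be grouped into homology groups before such presence/absence counting makes sense.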

Linking metagenomics and metatranscriptomics to study the endophytic root microbiome

Lotte Pronk
Like most higher organisms, plants have a microbiome that protects them against diseases and stimulates their growth. In this project, we will study the microbial community that lives inside plant roots. By linking metagenomics and metatranscriptomics, we attempt to find out what the function of these microbes is and how they interact with each other, with pathogens and with the plant. Microbes and plants produce a wide array of secondary metabolites, which are used to interact with other organisms in their environment. Secondary metabolites can function as antibiotics, pigments, and effectors. Additionally, they can help with nutrient acquisition and communication. The genes encoding the pathways for secondary metabolites are often co-localised in clusters on the genome. Over 700 of these so-called biosynthetic gene clusters (BGCs) have been predicted to be present in the endophytic root microbiome. We want to know which BGCs are involved in pathogen suppression and what their exact function is.

Exploiting variation in lettuce and its wild relatives

Dirk-Jan van Workum
As the genome era advances, sequencing plants to identify the variation underlying traits of interest is more accessible than ever. Comparing results between species (comparative genomics) has therefore become important as well. In this research, which is part of the LettuceKnow project, we use a pangenomic approach to search for genomic and transcriptomic variation across species within the genus Lactuca that can explain traits lettuce breeders are interested in. A pangenome is a hypothetical construct in which multiple fully sequenced genomes are combined into one representative reference genome. Specifically, we aim to develop new functionalities within the pangenomic framework PanTools (which is being developed at Wageningen University) to integrate genomic, transcriptomic and phenotypic data for accurate interspecies comparisons. These functionalities are immediately used for the analysis of the large amounts of sequencing and phenotype data within LettuceKnow, which will be a starting point for collaboration with molecular biologists.