Structural variation between genomes is usually assessed by comparing finished genomes. However, when comparing many genomes of one organism, it is rarely affordable to obtain sufficient next-generation sequencing (NGS) data to assemble a novel genome for each individual. In such cases, the genomes are resequenced at lower depth and the resulting reads are mapped to a reference genome. A drawback is that novel genomic content absent from the reference is not taken into account, and that any comparison between two individuals has to go through a (potentially dissimilar) reference genome.
A solution is to combine the NGS data into a colored co-assembly graph. In such graphs, nodes represent contigs, edges represent links between these contigs, and colors indicate which nodes and edges are present in which organism. Structural variation between genomes leads to particular structures in such graphs, such as branches, bubbles and cross-links. The goal of this project is to explore methods for mining such graphs for interesting structures that can be related to structural variation between genomes. The desired outcome is a method that takes assembly graphs as input and produces an annotated list of the structural variants detected.
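To make the idea of a colored graph and a bubble concrete, the following is a minimal sketch in Python, not the method to be developed in the project. It assumes a toy representation in which each edge carries the set of organisms (colors) that support it, and it only detects the simplest kind of bubble: two single-contig branches that leave the same contig and rejoin at the same contig. The function names `build_graph` and `find_simple_bubbles` and the contig and organism labels are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(edges):
    """Build a successor map from (source, target, colors) triples.

    The result maps each source contig to a dict {target: set of colors}.
    """
    succ = defaultdict(dict)
    for source, target, colors in edges:
        succ[source].setdefault(target, set()).update(colors)
    return succ


def find_simple_bubbles(succ):
    """Return bubbles whose two branches each consist of a single contig.

    Simplification (assumption): a branch contig qualifies only if it has
    exactly one outgoing edge; its incoming edges are not checked.
    """
    bubbles = []
    for start in sorted(succ):
        for a, b in combinations(sorted(succ[start]), 2):
            next_a = succ.get(a, {})
            next_b = succ.get(b, {})
            # Both branches must continue to exactly one, shared contig.
            if len(next_a) == 1 and len(next_b) == 1 and set(next_a) == set(next_b):
                end = next(iter(next_a))
                bubbles.append({
                    "start": start,
                    "end": end,
                    # Colors that support a branch on both of its edges.
                    "branches": {
                        a: succ[start][a] & next_a[end],
                        b: succ[start][b] & next_b[end],
                    },
                })
    return bubbles


# Toy graph: organisms "A" and "B" differ between contigs c1 and c4.
edges = [
    ("c1", "c2", {"A"}),
    ("c1", "c3", {"B"}),
    ("c2", "c4", {"A"}),
    ("c3", "c4", {"B"}),
    ("c4", "c5", {"A", "B"}),
]
bubbles = find_simple_bubbles(build_graph(edges))
for bubble in bubbles:
    print(bubble["start"], "->", bubble["end"], bubble["branches"])
```

In this toy graph the single bubble between c1 and c4 has one branch seen only in organism A and one seen only in organism B, which is the kind of pattern that could be annotated as a candidate variant. Real assembly graphs would also need longer branches, nested bubbles and cross-links to be handled.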
P. Medvedev et al. (2009). Computational methods for discovering structural variation with next-generation sequencing. Nature Methods 6(11 Suppl):S13-S20.
Z. Iqbal et al. (2012). De novo assembly and genotyping of variants using colored de Bruijn graphs. Nature Genetics 44(2):226-232.
J.F. Nijkamp et al. (2013). Exploring variation-aware contig graphs for (comparative) metagenomics using MaryGold. Bioinformatics 29(22):2826-2834.
Used skills: Genomics, programming.
Requirements: INF-22306 Programming in Python, BIF-30806 Advanced bioinformatics, ABG-30306 Genomics