Using graphical models for meta-analysis to study epistasis in Arabidopsis

A graphical model is a powerful statistical tool for studying dependency structures with the help of a graph that describes the conditional independence relationships between random variables. These graphs (or networks) consist of a set of nodes and edges, where nodes represent variables and edges represent conditional dependence relations: two variables that are not joined by an edge are conditionally independent given the remaining variables.
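To make the notion of conditional independence concrete, here is a minimal sketch for three variables. It assumes approximately Gaussian variables, for which a zero partial correlation corresponds to a missing edge; genotype data are discrete, so the method used in the project may differ. The correlation values are hypothetical.

```python
import math

def partial_corr(r_xz, r_xy, r_yz):
    """Partial correlation of X and Z given Y, from pairwise correlations."""
    return (r_xz - r_xy * r_yz) / math.sqrt((1 - r_xy**2) * (1 - r_yz**2))

# Hypothetical chain X - Y - Z: X and Z are related only through Y,
# so their marginal correlation is the product of the two links.
r_xy, r_yz = 0.8, 0.5
r_xz = r_xy * r_yz  # non-zero: X and Z are marginally correlated

# After conditioning on Y the association vanishes, so the graph
# has edges X - Y and Y - Z but no edge X - Z.
rho_xz_given_y = partial_corr(r_xz, r_xy, r_yz)
has_edge_xz = abs(rho_xz_given_y) > 1e-9
```

The point of the sketch is that a marginal correlation between two markers does not by itself imply an edge; only associations that remain after conditioning on the other variables appear in the network.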

In this project, we aim to reconstruct pairwise marker-marker association networks in six different F2 populations of A. thaliana, each derived from a cross between a different accession and a common parent (Col-0). In addition, to gain statistical power, we combine these F2 populations to reconstruct a “global” marker-marker association network for A. thaliana, while simultaneously correcting for heterozygous populations in the network analysis. The aim is to use undirected graphical models to detect combinations of alleles that do not work well when brought together in the genome of the progeny. Jointly studying the subpopulations and the integrated data with graphical models may help us understand the processes involved in epistasis in A. thaliana, a model organism in plant biology.
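As a rough illustration of what a depleted allele combination looks like in the data, the sketch below applies a plain chi-square test of independence to the joint genotype counts of two markers. This is a simpler pairwise test swapped in for illustration, not the graphical-model approach of the project: it ignores linkage and does not condition on the other markers. The counts are hypothetical, and the genotype order (AA, AB, BB) is an assumption.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: genotypes AA, AB, BB at marker 1; columns: the same at marker 2.
# Hypothetical counts for two markers that segregate independently (1:2:1 each).
independent = [[25, 50, 25],
               [50, 100, 50],
               [25, 50, 25]]

# Hypothetical counts where the combination (AA at marker 1, BB at marker 2)
# is strongly under-represented among the progeny.
depleted = [[30, 60, 5],
            [50, 100, 50],
            [25, 50, 30]]

CRITICAL_DF4_ALPHA05 = 9.488  # chi-square critical value, 4 degrees of freedom

independent_stat = chi_square_independence(independent)
depleted_stat = chi_square_independence(depleted)
flagged = depleted_stat > CRITICAL_DF4_ALPHA05
```

A graphical model goes further than this test by asking whether such an association persists after accounting for all other markers, which helps separate direct interactions from associations induced by linkage.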

The steps of this MSc project are to:

1. Download different A. thaliana genotype datasets,

2. Combine the A. thaliana datasets from step 1 into a single dataset,

3. Apply a network reconstruction algorithm to the individual A. thaliana populations (from step 1),

4. Apply a network reconstruction algorithm to the combined data from step 2,

5. Compare individual networks from step 3 to the estimated network from step 4 (similarities and differences between networks),

6. Use the estimated networks to identify candidate loci in the A. thaliana genome that are involved in epistasis,

7. Compare our findings with the literature and, where possible, confirm them.
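Steps 2 and 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: it assumes each individual is stored as a mapping from marker name to genotype code, that populations are pooled on the markers they share, and that networks are compared as undirected edge sets. The function names, marker names, and population names are hypothetical.

```python
def combine_populations(populations):
    """Pool individuals on the markers shared by all of them.

    `populations` maps a population name to a list of individuals,
    each a dict {marker_name: genotype_code}. A "population" label is
    added to every pooled row (assumes no marker is itself named "population").
    """
    shared = set.intersection(
        *(set(ind) for inds in populations.values() for ind in inds)
    )
    markers = sorted(shared)
    pooled = []
    for name, inds in populations.items():
        for ind in inds:
            row = {m: ind[m] for m in markers}
            row["population"] = name
            pooled.append(row)
    return markers, pooled

def compare_networks(edges_a, edges_b):
    """Compare two undirected networks given as lists of marker pairs."""
    a = {frozenset(e) for e in edges_a}  # frozenset ignores edge direction
    b = {frozenset(e) for e in edges_b}
    union = a | b
    jaccard = len(a & b) / len(union) if union else 1.0
    return {"shared": a & b, "only_a": a - b, "only_b": b - a, "jaccard": jaccard}

# Hypothetical toy data: two crosses genotyped on partly different markers.
populations = {
    "cross_A": [{"m1": 0, "m2": 1, "m3": 2}],
    "cross_B": [{"m1": 2, "m2": 1}],
}
markers, pooled = combine_populations(populations)

# Hypothetical estimated edge lists for an individual and the combined network.
result = compare_networks(
    [("m1", "m2"), ("m2", "m3")],
    [("m2", "m1"), ("m1", "m3")],
)
```

Keeping the population label in the pooled data leaves room to adjust for population differences in step 4, and the Jaccard index gives a single similarity score between networks alongside the lists of shared and network-specific edges.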