Storing research data according to internationally accepted standards and annotating them with ontologies allows these data to be integrated meaningfully. Existing datasets can then be combined more easily and used more effectively to uncover new relations within research data.

The main objective of this project, executed within DLO, is to design and test an infrastructure to allow genotype-phenotype analysis for crop plants based on the widest available public datasets.

This work will facilitate the analysis of many phenotypes against large panels of crop accessions through the aggregation of locally held data. It will thereby enable more powerful association analysis, opening the way to a better understanding of function, candidate gene prioritization, and improved crop breeding. To this end, we will develop the infrastructure according to the FAIR principles (Findable, Accessible, Interoperable and Reusable). Working on exemplar species such as tomato and potato, we will establish a sustainable model for the interaction of distributed phenotypic repositories with defined genomic and sample reference data. In this model, organizations can expose data to the system by conforming to standards for annotation and interface, which allows the approach to be expanded subsequently to other species and domains.

In addition, the objectives of this project lie at the core of the DLO strategic theme on Big Data, specifically the topic of access to data.

Output and impact

  1. Make data interoperable in accordance with the FAIR principles through the development of controlled vocabularies and standardised APIs, proving the concept of a common phenotypic API through which any participant in an open network can advertise the availability of their data in a common domain.
  2. Annotate and submit key exemplar datasets to relevant public archives.
  3. Engage industry in defining priorities in genotype/phenotype annotations.