Global access to plant breeding data resources

Use case

Plant breeding is a complex field that integrates data from many disciplines, each with its own standards and data structures. The datasets used in plant breeding research are growing in size, and the range of data types is expanding. This is driven by technologies such as genomics and high-throughput imaging from drones and satellites, by climate data, and also by data from further along the chain, from the processing industry to the consumer. Plant breeding is widely recognised as one of the approaches crucial to feeding a rapidly growing world population.

A major challenge is to assess how current food crop cultivars respond to different climates, and to use the phenotypic and genotypic observations made on these cultivars effectively, translating them into knowledge for breeding cultivars that perform optimally in a changing environment. This requires standardised access to breeding data, which is only possible through strong international collaboration.

Our approach: BrAPI

WUR is part of a strong international network that seeks to define a uniform language for exchanging plant breeding data. To provide uniform access to plant breeding data, a global consortium of leading plant breeding research institutes has defined a standardised communication language known as the Breeding API (BrAPI). Standardising data is a lengthy process, which is accelerated during week-long hackathons. The semantics of each attribute often have to be revisited once or twice before they are fully understood and accepted by the community, and open discussion, at hackathons and in the open-source Git repository, is the only way to reach that agreement. BrAPI re-uses existing data standards as much as possible and is compatible with both the FAO Multi-Crop Passport Descriptors (MCPD) and the minimum information standard for plant phenotyping experiments (MIAPPE). BrAPI specifies a standard interface through which plant phenotype and genotype databases serve their data to crop breeding applications. It is a shared, open API for use by all interested data providers and data consumers.
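
To make the idea of a standard interface concrete, here is a minimal Python sketch of a client that pages through a BrAPI list call. It assumes the BrAPI response envelope (pagination details under metadata.pagination, records under result.data) and the 0-based page/pageSize query parameters; the base URL, the fake_fetch stand-in and its records are hypothetical, so the sketch runs without a live server.

```python
from typing import Any, Callable, Dict, Iterator
from urllib.parse import parse_qs, urlparse

# A fetch function takes a URL and returns the decoded JSON body. In practice it
# would wrap an HTTP GET; it is injected here so the sketch runs without a network.
Fetch = Callable[[str], Dict[str, Any]]


def iter_records(base_url: str, call: str, fetch: Fetch,
                 page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield every record returned by a paged BrAPI-style list call."""
    page = 0
    while True:
        url = f"{base_url.rstrip('/')}/{call}?page={page}&pageSize={page_size}"
        body = fetch(url)
        yield from body["result"]["data"]            # records on this page
        total_pages = body["metadata"]["pagination"]["totalPages"]
        page += 1                                    # pages are 0-based
        if page >= total_pages:
            break


# Hypothetical in-memory "server" returning a (trimmed) BrAPI response envelope.
GERMPLASM = [{"germplasmDbId": str(i), "germplasmName": f"line-{i}"} for i in range(5)]


def fake_fetch(url: str) -> Dict[str, Any]:
    query = parse_qs(urlparse(url).query)
    page, size = int(query["page"][0]), int(query["pageSize"][0])
    return {
        "metadata": {"pagination": {
            "currentPage": page,
            "pageSize": size,
            "totalCount": len(GERMPLASM),
            "totalPages": -(-len(GERMPLASM) // size),   # ceiling division
        }},
        "result": {"data": GERMPLASM[page * size:(page + 1) * size]},
    }


names = [g["germplasmName"] for g in
         iter_records("https://example.org/brapi/v2", "germplasm", fake_fetch, page_size=2)]
print(names)  # ['line-0', 'line-1', 'line-2', 'line-3', 'line-4']
```

Because every compliant server is expected to return the same envelope, the same loop should work against any BrAPI endpoint; only the fetch function (for example a thin wrapper around an HTTP client) would change.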

BrAPI aims to solve more than theoretical problems: development of the API standard was driven by the implementation of several use cases and (br)apps. Examples include a full-text search mechanism over data available via BrAPI (developed within the H2020-funded ELIXIR project, including WUR), visualisation of geographic information about germplasm and trials by plotting data on a map (HIDAP, CIP), genotype visualisation in the stand-alone application Flapjack (James Hutton Institute), and a module that loads data from any BrAPI data source directly into the statistical analysis software R (CIP and WUR). BrAPI is the main data transfer protocol of the Integrated Breeding Platform, which is used by various institutes, including several CGIAR centres.
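
The full-text search use case can be illustrated with a generic inverted index over records harvested from several BrAPI endpoints. This is a sketch of the general technique, not the ELIXIR implementation; the endpoint URLs and germplasm records are hypothetical. Records are keyed by (endpoint, germplasmDbId) on the assumption that database identifiers are only unique within a single server.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Key = Tuple[str, str]  # (endpoint URL, germplasmDbId)


def tokenize(text: str) -> List[str]:
    """Split text into lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records: Iterable[Tuple[str, dict]]) -> Dict[str, Set[Key]]:
    """Map each token in a record's string fields to the records containing it."""
    index: Dict[str, Set[Key]] = defaultdict(set)
    for endpoint, record in records:
        key = (endpoint, record["germplasmDbId"])
        for value in record.values():
            if isinstance(value, str):
                for token in tokenize(value):
                    index[token].add(key)
    return index


def search(index: Dict[str, Set[Key]], query: str) -> Set[Key]:
    """Return the records that contain every query token (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    hits = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        hits &= index.get(token, set())
    return hits


# Hypothetical records harvested from two BrAPI servers.
records = [
    ("https://tomato.example.org/brapi/v2",
     {"germplasmDbId": "g1", "germplasmName": "Moneymaker", "commonCropName": "tomato"}),
    ("https://rice.example.org/brapi/v2",
     {"germplasmDbId": "g1", "germplasmName": "IR64", "commonCropName": "rice"}),
]
index = build_index(records)
print(search(index, "tomato moneymaker"))  # {('https://tomato.example.org/brapi/v2', 'g1')}
```

Since both servers expose the same field names, one indexing routine covers all of them; this uniformity is what BrAPI adds to cross-database search.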

(Expected) impact of the approach

BrAPI has introduced a standard exchange format for plant breeding data and has been implemented in more than thirty databases covering a range of crops, including tomato, rice and cassava. We have described several examples of applications that use data from these BrAPI-enabled databases and tools. In the short term, we expect to see the first pipelines that manage field trials, perform genomic selection or even support breeding decisions by seamlessly integrating data from multiple BrAPI endpoints. The ability to combine data easily, together with the development of standardised analysis pipelines, will help breeding research and the breeding industry achieve more stable crop yields in regions with suboptimal growing conditions.
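
Combining data from multiple BrAPI endpoints can be sketched as a pooling step ahead of analysis. The example assumes BrAPI-style observation records with germplasmName, observationVariableName and a text value, and, as a simplifying assumption, that germplasm and trait names are already harmonised across databases; in practice that mapping is often the hard part. All URLs and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def combine_observations(
    sources: Dict[str, List[dict]],
) -> Dict[Tuple[str, str], float]:
    """Pool numeric observations from several endpoints.

    `sources` maps an endpoint URL to the observation records fetched from it.
    Returns the mean value per (germplasmName, observationVariableName).
    """
    pooled: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for observations in sources.values():
        for obs in observations:
            try:
                value = float(obs["value"])
            except (KeyError, TypeError, ValueError):
                continue  # skip missing or non-numeric (e.g. categorical) values
            key = (obs["germplasmName"], obs["observationVariableName"])
            pooled[key].append(value)
    return {key: mean(values) for key, values in pooled.items()}


# Hypothetical observations from two trial databases.
sources = {
    "https://trial-a.example.org/brapi/v2": [
        {"germplasmName": "line-1", "observationVariableName": "grain_yield", "value": "5.0"},
        {"germplasmName": "line-2", "observationVariableName": "grain_yield", "value": "4.0"},
    ],
    "https://trial-b.example.org/brapi/v2": [
        {"germplasmName": "line-1", "observationVariableName": "grain_yield", "value": "6.0"},
        {"germplasmName": "line-1", "observationVariableName": "fruit_colour", "value": "red"},
    ],
}
combined = combine_observations(sources)
print(combined)  # {('line-1', 'grain_yield'): 5.5, ('line-2', 'grain_yield'): 4.0}
```

Averaging is only a placeholder for whatever statistical model a real pipeline would apply; the point is that the pooling code does not need to know which database each record came from.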

Next steps

  • Formalise the API using a proper ontology/data model to further decrease misunderstandings among implementing partners
  • Expand the API to include novel data types, such as images from drones
  • Enhance the current API by providing these semantics in a JSON-LD context
  • Add the missing metadata layers to the API to become fully compliant with the FAIR data principles
  • Develop tools and strategies that allow appropriate data to be discovered and integrated seamlessly, and further showcase the added value of this integration
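
One way to provide semantics in a JSON-LD context, sketched under assumptions: an @context is attached to an otherwise unchanged JSON record so that a JSON-LD processor can expand each field name into an IRI. The vocabulary IRIs below are example.org placeholders, not an agreed BrAPI ontology, and mapping countryOfOriginCode to the MCPD ORIGCTY descriptor is illustrative only.

```python
import json
from typing import Any, Dict

# Placeholder vocabulary: a real deployment would point to agreed ontology terms.
CONTEXT: Dict[str, Any] = {
    "@vocab": "https://example.org/brapi-vocab#",               # hypothetical default namespace
    "countryOfOriginCode": "https://example.org/mcpd#ORIGCTY",  # hypothetical term IRI
}


def with_context(record: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a plain JSON record with a JSON-LD @context attached."""
    return {"@context": context, **record}


# Hypothetical germplasm record as a BrAPI server might return it.
record = {"germplasmDbId": "g1", "germplasmName": "line-1", "countryOfOriginCode": "NLD"}
doc = with_context(record, CONTEXT)
print(json.dumps(doc, indent=2))
```

Clients that do not understand JSON-LD can ignore the @context key and read the record as before, which is what would make this an enhancement of the current API rather than a breaking change.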

Facts & Figures

The data exchanged via BrAPI comprise passport data (a description of the biological material, including name, origin, collection date, etc.), phenotype data (which trait was observed, how it was observed and what the observed value is, e.g. grain yield or fruit colour) and genotype data (plant variation at the molecular level in terms of single nucleotide polymorphisms (SNPs), and the technology used to determine SNP variation).
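
The three data types can be pictured as linked records. The Python dataclasses below are a simplified, hypothetical model (field names are illustrative and do not follow the BrAPI schema); they show that phenotype and genotype records both refer back to the passport record of the same accession.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Passport:
    """Description of the biological material (simplified, MCPD-inspired)."""
    accession_number: str
    name: str
    country_of_origin: str                  # three-letter ISO 3166-1 code, as in MCPD
    collection_date: Optional[date] = None


@dataclass
class PhenotypeObservation:
    """Which trait was observed, how, and the observed value."""
    accession_number: str                   # links back to Passport
    trait: str                              # e.g. "grain yield" or "fruit colour"
    method: str                             # how the trait was observed
    value: str                              # kept as text, since values may be categorical


@dataclass
class SnpGenotype:
    """Variation at one single nucleotide polymorphism (SNP)."""
    accession_number: str                   # links back to Passport
    marker: str                             # SNP identifier
    genotype: str                           # called alleles, e.g. "A/G"
    platform: str                           # technology used to determine the SNP


def observations_for(accession: Passport,
                     observations: List[PhenotypeObservation]) -> List[PhenotypeObservation]:
    """Return the phenotype observations recorded for one accession."""
    return [o for o in observations if o.accession_number == accession.accession_number]


# Hypothetical example records.
acc = Passport("ACC-001", "line-1", "NLD", date(2015, 6, 1))
obs = [
    PhenotypeObservation("ACC-001", "fruit colour", "visual scoring", "red"),
    PhenotypeObservation("ACC-002", "fruit colour", "visual scoring", "yellow"),
]
snp = SnpGenotype("ACC-001", "snp_0001", "A/G", "SNP array")
print([o.value for o in observations_for(acc, obs)])  # ['red']
```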

  • BrAPI community: 25 institutes/projects
  • 17 organisations with BrAPI endpoints
  • 15 tools/brapps available from the BrAPI community
  • 69 different API calls

Cooperation with partners

  • WUR Plant Science Group (Richard Finkers / Eliana Papoutsoglou / Maikel Verouden)
  • Bioversity International
  • Boyce Thompson Institute, Cornell University
  • Crop Ontology
  • Diversity Arrays Technology
  • Earlham Institute
  • CGIAR – Excellence in Breeding Platform
  • INRA
  • ICRISAT
  • CIMMYT
  • CIP
  • IRRI
  • CIRAD
  • IPK
  • VIB
  • iBET
  • GOBii
  • USDA-ARS
  • University of Arizona
  • Leafnode
  • John Innes Centre
  • Kansas State University
  • BMS
  • The James Hutton Institute
  • Bill & Melinda Gates Foundation
  • H2020 ELIXIR program (research infrastructure)
  • H2020 EMPHASIS
  • H2020 G2P-SOL project (testing the application on genebank collections; approx. 10K accessions for each of the following crops: tomato, potato, eggplant and pepper)