Use case

Improving access and interoperability of open data for the agri-food sector with the AgroDataCube

Many valuable open data sources are available for the Netherlands that can improve data science and decision making in agriculture and food. However, these data sources are still scattered and are published using a range of different, standardized and non-standardized formats and protocols. This means that substantial efforts are required to find, collect and combine such data over and over again, to feed the many applications that use such data. The AgroDataCube functions as a hub that brings together these heterogeneous data streams, enriches them, adding in-house analytics, and publishes the result as harmonized, up-to-date, standardized datasets accessible through an open REST API (agrodatacube.wur.nl).

In 2018, version 2 of the AgroDataCube was developed. Through integration with “GroenMonitor”, the AgroDataCube now also provides a remote sensing based vegetation index (NDVI) at sub-parcel resolution. Such vegetation indices are used for research (e.g. crop modelling and yield forecasting), by farmers to monitor the development of their crops, and to monitor agricultural practice, e.g. compliance with CAP regulations.

Our approach: Merge, harmonize and publish

Many distributed data services relevant for the agri-food domain already feed into the AgroDataCube. These sources are heterogeneous in several respects. For instance, remote sensing data and weather data are voluminous, available on a daily basis and processed in near-real time, whereas soil data and parcel data are smaller and relatively static. The AgroDataCube automatically structures and harmonizes the incoming data streams, and links their spatial and temporal dimensions. This means that, for example, time-series of weather data or NDVI (Normalized Difference Vegetation Index) data can be retrieved at the level of agricultural parcels. Data is delivered in a standardized format and is therefore easily reusable, for instance in data analytics tools and decision support systems.
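As an illustration of what linking the spatial and temporal dimensions makes possible, the sketch below selects the time-series for one parcel and averages it per month. It is a minimal example under stated assumptions: the record layout, the parcel identifiers and the NDVI values are hypothetical and are not actual AgroDataCube output.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical records: (parcel id, observation date, NDVI value).
# Identifiers and values are invented for illustration only.
observations = [
    ("parcel-1", date(2018, 5, 1), 0.31),
    ("parcel-1", date(2018, 5, 11), 0.42),
    ("parcel-2", date(2018, 5, 1), 0.28),
    ("parcel-1", date(2018, 6, 2), 0.67),
]


def series_for_parcel(records, parcel_id):
    """Return the date-sorted (date, value) series for a single parcel."""
    return sorted((d, v) for pid, d, v in records if pid == parcel_id)


def monthly_mean(series):
    """Average a (date, value) series per (year, month) bucket."""
    buckets = defaultdict(list)
    for d, v in series:
        buckets[(d.year, d.month)].append(v)
    return {month: round(mean(values), 3) for month, values in sorted(buckets.items())}


series = series_for_parcel(observations, "parcel-1")
print(monthly_mean(series))
```

The same pattern would apply to weather observations: once every record carries a parcel key and a date, per-parcel aggregation needs no further spatial processing on the client side.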

The AgroDataCube currently provides data services that publish spatially and temporally explicit data from the following resources:
-      Agricultural parcels and parcel attributes (parcel geometries and crop information from BRP, AAN)
-      Soil data (Soil map 1:50,000, BOFEK)
-      Weather data (observations from KNMI stations)
-      Vegetation data (NDVI, based on in-house analytics of data from different satellites)
-      Elevation (AHN)
-      Administrative regions (NUTS and postal codes)

Impact of the approach

The AgroDataCube has gradually evolved as a bottom-up, grassroots driven initiative (with contributions via GitHub and Gitter), taking stock of and implementing the requirements of a range of research and commercial projects and hackathon communities. In its first year of operation it had already attracted about 1,000 individual users. Its data services are now used by several research projects and by commercial and non-commercial applications. These users benefit from easy access to harmonized, curated and structured data provided by the AgroDataCube as a one-stop-shop for agri-food data. For example, the AkkerWeb farm management system uses AgroDataCube NDVI data to provide farmers with remote sensing based information on crop development. In European research, the AgroDataCube is used to feed large scale crop phenology algorithms and crop growth models. Because the number and quality of the services are growing, new applications are under development and interest in reusing the data is rising, we expect that the use of AgroDataCube services will increase further in the near future.

Next steps

In the future, datasets will be continuously updated and new, domain-relevant datasets added. Ambitious steps are being taken to improve the usability of the AgroDataCube and broaden the range of applications that can use it. To increase accessibility and to facilitate the use of AgroDataCube data with other data services while preserving privacy and data ownership, access to the AgroDataCube will be integrated with an authorisation register (JoinData). Demonstrators will be developed to show how open and private data can be safely and securely merged and reused; these can in turn serve as templates for new applications.

To prepare for future usage scenarios, the AgroDataCube is being readied to handle higher loads, and its infrastructure is being optimized for use in data science and big data applications, e.g. large scale modelling and machine learning. It is deployed in a new cloud-based infrastructure that uses, among others, cluster and container technology and high-performance computing to ensure better scalability, availability and performance. Better-suited paradigms and technologies (MapReduce, Spark, NoSQL databases, etc.) will then allow the development of more efficient big data analytics and contribute to a future-proof infrastructure for agri-food research.

The AgroDataCube offers its data services through an API. It provides a standardized interface and data format (JSON/GeoJSON) that integrates easily with the analytical, geospatial and software development tools commonly used by data scientists and application developers. The AgroDataCube adopts an open innovation approach. Data can be accessed and reused at no cost, but under a usage model, large-scale commercial users will become paying contributors in the coming years.
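To show why a GeoJSON response is easy to consume, the sketch below parses a FeatureCollection using only the Python standard library. The payload is a hand-written stand-in, not a real AgroDataCube response: the property names (fieldid, crop_name, year), the coordinates and the helper bounding_box are assumptions made for illustration.

```python
import json

# Hypothetical GeoJSON payload shaped like a generic FeatureCollection.
# Property names and coordinates are invented for this example.
payload = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Polygon",
                  "coordinates": [[[5.66, 51.97], [5.67, 51.97],
                                   [5.67, 51.98], [5.66, 51.98],
                                   [5.66, 51.97]]]},
     "properties": {"fieldid": 1, "crop_name": "Potatoes", "year": 2018}},
    {"type": "Feature",
     "geometry": {"type": "Polygon",
                  "coordinates": [[[5.70, 51.95], [5.72, 51.95],
                                   [5.72, 51.96], [5.70, 51.96],
                                   [5.70, 51.95]]]},
     "properties": {"fieldid": 2, "crop_name": "Maize", "year": 2018}}
  ]
}
"""


def bounding_box(feature):
    """Return (min_x, min_y, max_x, max_y) of a Polygon's outer ring."""
    ring = feature["geometry"]["coordinates"][0]
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))


collection = json.loads(payload)
for feature in collection["features"]:
    props = feature["properties"]
    print(props["fieldid"], props["crop_name"], bounding_box(feature))
```

Because GeoJSON is plain JSON with a fixed structure, the same payload could equally be handed to a geospatial library or a notebook environment without a custom parser.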

Tools used:

Apache Spark
Apache Cassandra
GeoMesa, GeoTrellis
Java, Scala, Akka framework
Python, Jupyter Notebooks
PostgreSQL / PostGIS
Microsoft Azure
Standards: HTTP REST, JSON, GeoJSON

Contact:
Rob Lokers (rob.lokers@wur.nl)