Identifying added value for data-driven consumer science by FAIR-ification of WUR consumer databases in the European Consumer Data Platform
Part of the Food, Nutrition and Health Research Infrastructure
Problem statement / motivation
Although the relation between food, health and sustainability receives considerable attention, obtaining the right data on food consumption and its socio-psychological determinants remains difficult, whether by re-using existing data or by collecting new data. Data are often fragmented, national and project-driven, and not standardised or harmonised across countries, which limits their value. Consequently, data are difficult to find and/or data from different domains cannot be combined. As a result, important insights are missed and the resolution of important social issues is delayed or prevented.
Data on food consumption tend to focus on either dietary intake or socio-psychological determinants rather than on a combination of both research domains. Furthermore, such data are often collected in experimental settings instead of real-life settings. In addition, current practice in handling consumer research data often conflicts with the FAIR data principles. Different research groups, or even research projects within the same research group, may effectively act as research data silos that prevent data sharing and reuse.
Solution / approach
To address these issues, we are developing a Consumer Data Platform (CDP) that applies the FAIR data principles as standard practice in consumer science, using a hybrid multi-actor approach. The CDP aims to establish an intermediate layer between researchers and their datasets. Its development requires expertise from different domains: consumer science, IT development, business intelligence and data science. The CDP delivers services that enable FAIR consumer data management: via the intermediate infrastructural layer, online services are provided to researchers for structured capturing and retrieval of data. The platform is designed so that the FAIR data principles continuously guide the process.
A proof of principle was designed to realise the desired services identified in the consumer researcher requirements analysis while implementing technical components from the technical reference framework. Two datasets were used in this proof of principle. They were selected because they are available to the project team, are comparable (but not identical), and differ in data format and language. These criteria reflect the challenges that we expect to face when implementing the CDP in a European setting. To compare health motives across both datasets, responses had to be processed to create homogeneous answer categories.
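The harmonisation of answer categories described above could be sketched as follows. This is a minimal illustration, not the project's implementation: the dataset identifiers, answer labels, scales and the common categories ("low", "medium", "high") are all hypothetical assumptions chosen only to show how responses in different languages and formats might be mapped onto one shared classification.

```python
from collections import Counter

# Hypothetical mapping from dataset-specific answers to shared categories.
# Labels, scales and category names are illustrative assumptions only.
COMMON_CATEGORIES = {
    "dataset_a": {  # assumed: English, 5-point agreement scale
        "strongly disagree": "low",
        "disagree": "low",
        "neutral": "medium",
        "agree": "high",
        "strongly agree": "high",
    },
    "dataset_b": {  # assumed: Dutch, 3-point importance scale
        "niet belangrijk": "low",
        "neutraal": "medium",
        "belangrijk": "high",
    },
}


def harmonise(dataset_id, responses):
    """Map raw responses of one dataset onto the shared categories.

    Responses without a mapping are flagged as "unmapped" so that they
    can be reviewed instead of being silently dropped.
    """
    mapping = COMMON_CATEGORIES[dataset_id]
    return [mapping.get(r.strip().lower(), "unmapped") for r in responses]


def category_counts(dataset_id, responses):
    """Count harmonised categories so that datasets become comparable."""
    return Counter(harmonise(dataset_id, responses))
```

Keeping the mapping as data rather than code means it could later be maintained alongside the metadata of each dataset, and the explicit "unmapped" value makes gaps in the mapping visible.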
Expected impact of the approach
The proof of principle identified benefits as well as opportunities for improvement. Providing datasets from various studies via a joint data platform offers the following benefits: datasets can be found and used through a uniform web interface, and multiple datasets can be combined to answer broad research questions, provided that the metadata are described well and that classification elements are linked to common concepts. With tools such as Power BI and Excel, data can be accessed without any knowledge of the underlying data sources.
The services provided to consumer researchers have to become easier to use and must ensure the highest degree of data safety and security while protecting the rights of the individual data suppliers. The next step will be an advanced authorisation and authentication module, which is required to ensure data safety and security. This functionality is essential for establishing a minimum viable product. Examples of other services that are on the shortlist for further development are:
- A quick search engine for smart searching through linked datasets
- Tooling for easy ontology creation and updating
- A permission registry module for setting data access rules
- A module for questionnaire management