Thesis subject

Application of NLP for a question-answering system for novice farmers

Agriculture is an important sector of the economy in almost every country. The Netherlands exported over €90 billion worth of agricultural goods in 2019, accounting for around one-fifth of the economy. In many African countries, agriculture is the largest sector of the economy and provides a livelihood for most of the population. In the Middle East, where most countries depended on food imports until recently, demand for locally produced food is increasing.

In almost all these regions, farming skills are generally passed from parent to child and developed over a long period of mentoring. However, a new generation of young farmers who have a modern education but no prior practical mentoring may lack detailed knowledge of specific farming practices such as dairy farming or vegetable production. These new-generation farmers can benefit from machine learning tools that answer the questions they come across in routine farming activities.

Background

Recently, machine learning techniques have shown promising results in answering questions through question answering (QA) systems and “man-in-the-middle” intelligent systems. These systems are mainly built using natural language processing (NLP) techniques. QA systems are built from several components, such as document processing, query reformulation, passage retrieval, and answer selection. With the advent of transformer models such as BERT and RoBERTa, researchers are now building end-to-end QA systems by fine-tuning these models. Such systems require training data consisting of sample questions and paragraphs in which the beginning and the end of the answer text are marked, as in the Stanford Question Answering Dataset (SQuAD).
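To make the data requirement concrete, the following is a minimal sketch of a SQuAD-style training record and of how the marked answer span can be recovered from it. The passage, the question, and the helper names make_record and answer_span are invented for illustration and are not taken from any existing dataset or library. The field names "context", "question", "text" and "answer_start" follow the public SQuAD JSON layout, which stores only the start character offset; the end offset is derived here from the length of the answer text.

```python
def make_record(context, question, answer_text):
    """Build a SQuAD-style record (hypothetical helper, for illustration only)."""
    start = context.find(answer_text)
    if start < 0:
        # Extractive QA assumes the answer appears verbatim in the passage.
        raise ValueError("answer text must appear verbatim in the context")
    return {
        "context": context,
        "question": question,
        "answers": [{"text": answer_text, "answer_start": start}],
    }


def answer_span(record):
    """Return (start, end) character offsets of the first answer; end is exclusive."""
    answer = record["answers"][0]
    start = answer["answer_start"]
    end = start + len(answer["text"])
    # Check that the stored offset really points at the answer text.
    if record["context"][start:end] != answer["text"]:
        raise ValueError("answer_start does not match the answer text")
    return start, end


# Invented example passage and question from the dairy farming domain.
record = make_record(
    context="Dairy cows are usually milked twice a day, in the morning and in the evening.",
    question="How often are dairy cows milked?",
    answer_text="twice a day",
)
start, end = answer_span(record)
print(start, end, record["context"][start:end])
```

A fine-tuned transformer model is trained to predict such a start and end position for a given question and passage; in practice the character offsets are first mapped to token positions by the model's tokenizer.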


Objectives

    There is only a limited overview and understanding of the state of the art and the state of practice of question-answering systems in the agriculture and food domains, and little is known about how NLP is applied in the agri-food sector. Therefore, this study aims to review the existing literature on this topic and demonstrate the value of NLP through a case study. The contributions of the study will be the following:

    • It will explore the state of the art in QA systems for the agri-food domain.
    • It will compile existing datasets and resources that can be used for agri-food QA systems. This might include developing new datasets, or tools that allow such datasets to be created quickly and with minimal effort.
    • It will design a working QA system based on the recent trends in QA, particularly using end-to-end neural network-based QA applications.
    • It will explore and recommend a multilingual and cross-modal QA system for the agri-food domain.
      A cross-modal QA system allows users to interact with it through different modalities, such as voice and image-based information access. This gives less literate farmers an easy-to-use interface to the QA system.

Literature

      • Kung, Hsu‐Yang, Ren‐Wu Yu, Chi‐Hua Chen, Chan‐Wei Tsai, and Chia‐Yu Lin. "Intelligent pig‐raising knowledge question‐answering system based on neural network schemes." Agronomy Journal (2020).
      • S. Gaikwad, R. Asodekar, S. Gadia and V. Z. Attar, "AGRI-QAS question-answering system for agriculture domain," 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Kochi, India, 2015, pp. 1474-1478, doi: 10.1109/ICACCI.2015.7275820.
      • Gangadharan, Veena, and Deepa Gupta. "Recognizing Named Entities in Agriculture Documents using LDA based Topic Modelling Techniques." Procedia Computer Science 171 (2020): 1337-1345.
      • Agriculture-related papers in ACL: Mukda Suktarachan, https://www.aclweb.org/anthology/people/m/mukda-suktarachan/
      • Wei Yang, Yuqing Xie, Aileen Lin, Xingyu Li, Luchen Tan, Kun Xiong, Ming Li, Jimmy Lin. End-to-End Open-Domain Question Answering with BERTserini. https://www.aclweb.org/anthology/N19-4013.pdf
      • Anusri Pampari, Preethi Raghavan, Jennifer Liang, Jian Peng (2018). emrQA: A Large Corpus for Question Answering on Electronic Medical Records.


Requirements

        Theme(s): Machine learning, Farm MIS, Natural language processing

        Contact person(s)

        • Ayalew Kassahun (ayalew.kassahun@wur.nl)
        • Seid Muhie Yimam (yimam@informatik.uni-hamburg.de)
        • Cagatay Catal (cagatay.catal@wur.nl)