Thesis subject

Predicting disease outbreaks through media articles with machine learning models (MSc)

When a new infectious disease emerges in a country, it can cause significant harm to both public health and the national economy. As a result, it is crucial to identify potential new diseases and develop strategies to mitigate their impact. Various infectious diseases are currently posing a threat to global public health, underscoring the importance of identifying patterns of emerging infectious diseases.

Short description

Newly emerging infectious diseases affect both human health and the national economy, so recognizing the patterns associated with them early is essential for anticipating and preparing for outbreaks.
This study will investigate the feasibility of using AI techniques, such as natural language processing (NLP) and machine learning, to analyze online media data and develop predictive models for disease outbreaks. With the increasing availability of real-time information in social media, news articles, and other online sources, this data can provide valuable insights into the spread of diseases and enable proactive measures for disease prevention and control.


Objectives

  1. Conduct a comprehensive literature review on AI techniques, disease surveillance, and prediction models.
  2. Identify, collect and preprocess relevant online media data, including news articles, social media posts, and health-related websites.
  3. Develop a data mining framework to extract relevant features from online media data, such as disease keywords, locations, and sentiment.
  4. Explore different AI algorithms, including NLP techniques, machine learning models (e.g., classification, regression, clustering), and deep learning architectures, for predicting disease outbreaks.
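
As a rough illustration of the feature extraction in objective 3, the sketch below counts disease keywords and location mentions in a single article. It is a minimal example under stated assumptions, not the framework the thesis would deliver: the vocabularies `DISEASE_KEYWORDS` and `LOCATIONS`, the function `extract_features`, and the sample article are all hypothetical, and sentiment is left out because it would need a dedicated model or lexicon.

```python
import re
from collections import Counter

# Hypothetical vocabularies, for illustration only; a real study would
# derive these from curated disease and gazetteer lists.
DISEASE_KEYWORDS = {"cholera", "measles", "ebola", "dengue"}
LOCATIONS = {"kenya", "brazil", "india", "netherlands"}


def extract_features(text):
    """Return disease-keyword and location mention counts for one article."""
    # Lowercase the text and keep alphabetic tokens only.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return {
        "diseases": {w: counts[w] for w in DISEASE_KEYWORDS if counts[w]},
        "locations": {w: counts[w] for w in LOCATIONS if counts[w]},
        "n_tokens": len(tokens),
    }


# Made-up sample sentence, not real data.
article = "Health officials in Kenya report a cholera outbreak; cholera cases are rising."
features = extract_features(article)
print(features)
```

Counts like these, aggregated per location and per week, could then serve as inputs to the models explored in objective 4.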

    Tasks

    The work in this master thesis entails:

    • Literature review: Conduct a review of existing research on AI techniques, disease surveillance, and prediction models. This will provide a foundation of knowledge and identify research gaps.
    • Data collection and preparation: Collect and preprocess online media data related to disease outbreaks, using (where possible) appropriate APIs, web scraping techniques, and data cleaning methods.
    • AI models development: Develop AI models capable of predicting disease outbreaks using online media data.
    • Results reporting and documentation: Prepare a comprehensive report summarizing the research methodology, results, and conclusions.


    Literature

    • Kim, J., Ahn, I. Infectious disease outbreak prediction using media articles with machine learning models. Sci Rep 11, 4413 (2021). https://doi.org/10.1038/s41598-021-83926-2
    • Marvin, H. J., Hoenderdaal, W., Gavai, A. K., Mu, W., van den Bulk, L. M., Liu, N., Frasso, G., Ozen, N., Elliott, C., Manning, L., & Bouzembrak, Y. (2022). Global media as an early warning tool for food fraud; an assessment of MedISys-FF. Food Control. https://doi.org/10.1016/j.foodcont.2022.108961
    • Gavai, A. K., Bouzembrak, Y., van den Bulk, L. M., Liu, N., van Overbeeke, L. F., van den Heuvel, L. J., Mol, H., & Marvin, H. J. (2021). Artificial intelligence to detect unknown stimulants from scientific literature and media reports. Food Control, 130, 108360.


    Requirements

    • Courses: Programming in Python (INF-22306), Data Science Concepts (INF-34306) or Machine Learning (FTE-35306)
    • Required skills/knowledge: Food and health, Machine Learning

      Key words: Artificial Intelligence, disease detection, early warning systems, food and health

      Contact person(s)

      Dr. Yamine Bouzembrak (yamine.bouzembrak@wur.nl)
      Prof. Bedir Tekinerdogan (bedir.tekinerdogan@wur.nl)