Thesis subject

MSc thesis topic: Topic Modeling of MGI MSc Research on the Dutch Landscape

Wageningen University’s (WU) Master’s in Geo-information (MGI) Program has, over the past 25 years, produced a large body of research analyzing the Dutch landscape from a variety of perspectives and with a variety of methods. Successfully defended MSc theses are available in electronic format via WU’s MSc Theses online database. The repository currently lists more than 360 MSc theses for the Laboratory of Geo-information Science and Remote Sensing (GRS). In this thesis project, we will train a machine learning algorithm to identify the themes and topics of thesis research produced by students of the MGI program, as well as their spatiotemporal distribution.

In natural language processing, Latent Dirichlet Allocation (LDA) is a topic modeling approach used to identify latent thematic structures in a large collection of texts (a corpus). LDA represents each document as a mixture of topics and each topic as a probability distribution over terms; the highest-weighted terms of a topic, in combination, describe a theme.

In this research, we will focus on MSc theses, which fall into the category of grey literature. Grey literature is often excluded from formal reviews, yet has significant potential to contribute to knowledge generation. We will first build a corpus from the successfully defended MSc theses. Then, we will transfer a topic modeling workflow developed by Wimhurst et al. (in preparation) to analyze which topics have been explored for which locations, and how these topics have changed over time. This will provide a systematic overview of how MSc research has contributed to the geospatial analysis of the Dutch landscape.
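To make the LDA description concrete, the sketch below fits a generic LDA model with collapsed Gibbs sampling in plain Python and then averages document-topic proportions per year, a simple form of the temporal analysis described above. It is not the Wimhurst et al. workflow. The function names (`lda_gibbs`, `top_terms`, `topic_share_by_year`), the hyperparameters (`alpha`, `beta`, `iterations`, `seed`), and the toy "abstracts" and years are hypothetical choices made for illustration only. In practice, libraries such as gensim or scikit-learn provide tested LDA implementations.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (vocab, phi, theta): phi[k][v] estimates P(term v | topic k) and
    theta[d][k] estimates P(topic k | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    doc_topic = [[0] * K for _ in docs]       # tokens in document d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]  # times term v is assigned to topic k
    topic_total = [0] * K                     # tokens assigned to topic k
    z = []                                    # current topic of every token

    # Start from a random topic assignment for every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the token out of the counts before resampling it
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized full conditional P(z = t | all other assignments)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Put the token back under its newly sampled topic
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    phi = [[(topic_word[k][v] + beta) / (topic_total[k] + V * beta) for v in range(V)]
           for k in range(K)]
    theta = [[(doc_topic[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    return vocab, phi, theta


def top_terms(vocab, phi, n=4):
    """Return the n highest-probability terms of each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]


def topic_share_by_year(theta, years):
    """Average document-topic proportions per year (a simple temporal trend)."""
    grouped = defaultdict(list)
    for row, year in zip(theta, years):
        grouped[year].append(row)
    return {y: [sum(col) / len(rows) for col in zip(*rows)]
            for y, rows in sorted(grouped.items())}


# Hypothetical, pre-tokenized toy "abstracts" and defence years (invented for illustration)
docs = [
    "land use change urban expansion land cover".split(),
    "urban sprawl land use change cover".split(),
    "lidar canopy height forest biomass".split(),
    "forest canopy lidar tree biomass".split(),
]
years = [2005, 2005, 2020, 2020]

vocab, phi, theta = lda_gibbs(docs, num_topics=2)
for k, terms in enumerate(top_terms(vocab, phi)):
    print(f"topic {k}: {terms}")
print(topic_share_by_year(theta, years))
```

The same grouping could, under the same assumptions, be applied to study-area labels instead of years to compare topics across locations. Real theses would first need text extraction, tokenization, and stop-word removal before they could be passed in as token lists.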

Objectives

  • Build a corpus of GRS MSc theses suitable for LDA analysis
  • Transfer an existing topic modeling workflow to a new domain

Research Questions

  • Which topics have been studied in GRS MSc research?
  • Which spatial and temporal trends can be observed in GRS MSc research?
  • Which topics remain under-analysed in GRS MSc research?

Requirements

  • Geo-information Tools (optional)
  • Geo Scripting (required)
  • Strong Python scripting skills

Literature and information

Expected reading list before starting the thesis research

  • Wimhurst et al. (in preparation). Identifying the Spatiotemporally Common and Unique Research Priorities of the Mississippi River Basin Using a Topic Modeling Approach.

Theme(s): Modelling & visualisation & Human – space interaction