Thesis subject
MSc thesis topic: Topic Modeling of MGI MSc Research on the Dutch Landscape
Wageningen University’s (WU) Master’s in Geo-information (MGI) program has, over the past 25 years, produced a large body of research analyzing the Dutch landscape from a variety of perspectives and with a range of methods. Successfully defended MSc theses are available electronically via WU’s MSc Theses online database. The repository currently lists more than 360 MSc theses for the Laboratory of Geo-information Science and Remote Sensing (GRS). In this thesis project, we will train a machine learning algorithm to identify the themes and topics of thesis research produced by MGI students, along with their spatiotemporal distribution.
In natural language processing, Latent Dirichlet Allocation (LDA) is a topic modeling approach used to identify latent thematic structures in a large text dataset (also called a corpus). A topic is a set of terms that, in combination, describe a theme. In this research, we will focus on MSc theses, which fall into the category of grey literature. Grey literature is often excluded from formal reviews, yet it has significant potential to contribute to knowledge generation. We will first build a corpus from the successfully defended MSc theses. Then, we will transfer a topic modeling workflow developed by Wimhurst et al. (in prep) to analyze which topics have been explored for which locations, and how these topics have changed over time. The result will be a systematic overview of how MSc research has contributed to geospatial analysis of the Dutch landscape.
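To make the idea of LDA concrete, the following is a minimal sketch of LDA fitted with collapsed Gibbs sampling, written in plain Python. It is not the workflow of Wimhurst et al., which is not reproduced here. The function name `lda_gibbs`, the hyperparameter values, and the four toy documents are all assumptions chosen for illustration. A real analysis would more likely rely on an established library.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Illustrative LDA via collapsed Gibbs sampling (hypothetical helper).

    docs is a list of token lists. Returns (doc_topic, topic_word, vocab),
    where each row of doc_topic and topic_word is a probability distribution.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)

    n_dk = [[0] * n_topics for _ in docs]            # tokens per (doc, topic)
    n_kw = [[0] * n_vocab for _ in range(n_topics)]  # tokens per (topic, term)
    n_k = [0] * n_topics                             # tokens per topic
    z = []                                           # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta)
                    / (n_k[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed estimates of the document-topic and topic-term distributions.
    doc_topic = [
        [(n_dk[d][t] + alpha) / (len(doc) + n_topics * alpha)
         for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(n_kw[t][v] + beta) / (n_k[t] + n_vocab * beta)
         for v in range(n_vocab)]
        for t in range(n_topics)
    ]
    return doc_topic, topic_word, vocab


# Toy documents (invented for illustration, not taken from real theses).
docs = [
    ["flood", "dike", "river", "flood"],
    ["forest", "lidar", "canopy", "forest"],
    ["river", "dike", "flood"],
    ["lidar", "canopy", "forest"],
]
doc_topic, topic_word, vocab = lda_gibbs(docs, n_topics=2)

# Print the three most probable terms of each topic.
for t, row in enumerate(topic_word):
    top = sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)[:3]
    print(t, [vocab[v] for v in top])
```

Each row of `topic_word` corresponds to one topic in the sense used above: a weighted set of terms that together describe a theme. Each row of `doc_topic` gives the share of each topic in one document.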
Objectives
- Build a corpus for the application of an LDA analysis
- Transfer an existing topic modeling workflow to a new domain
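As a rough illustration of the first objective, the sketch below turns raw thesis text into a bag-of-words corpus, the usual input for an LDA analysis. It assumes the thesis text has already been extracted from the PDF files. The stopword list, the helper names `tokenize` and `build_corpus`, and the two example sentences are hypothetical. The actual preprocessing steps in the transferred workflow may differ.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real project would use a fuller one.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "for", "is", "on", "with"}


def tokenize(text, min_len=3):
    """Lowercase the text, keep alphabetic tokens, drop short words and stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if len(t) >= min_len and t not in STOPWORDS]


def build_corpus(texts, min_doc_freq=1):
    """Return (vocab, bows): a sorted vocabulary and one bag of words per text.

    Each bag of words is a sorted list of (term_index, count) pairs. Terms
    that occur in fewer than min_doc_freq documents are dropped.
    """
    tokenized = [tokenize(t) for t in texts]
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    vocab = sorted(t for t, df in doc_freq.items() if df >= min_doc_freq)
    index = {t: i for i, t in enumerate(vocab)}
    bows = [
        sorted(Counter(index[t] for t in doc if t in index).items())
        for doc in tokenized
    ]
    return vocab, bows


# Made-up example sentences standing in for extracted thesis text.
texts = [
    "Mapping flood risk along the river Waal with LiDAR",
    "LiDAR-based forest canopy mapping in the Veluwe",
]
vocab, bows = build_corpus(texts, min_doc_freq=2)
print(vocab)
print(bows)
```

Raising `min_doc_freq` removes terms that appear in only a few documents, which is one common way to keep the vocabulary manageable.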
Research Questions
- Which topics have been studied in GRS MSc research?
- What are the spatial and temporal trends observed in MSc research?
- Which topics remain under-analyzed in GRS MSc research?
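One possible way to approach the temporal part of these questions is to average each topic's share over all theses defended in the same year. The sketch below assumes that a topic model has already produced a vector of topic shares per thesis and that the defense year of each thesis is known. The function name and the numbers are invented for illustration and are not results.

```python
from collections import defaultdict


def mean_topic_share_by_year(records):
    """Average the topic shares of all documents that share a year.

    records is an iterable of (year, shares) pairs, where shares is a list
    with one value per topic. Returns {year: [mean share per topic]}.
    """
    sums = {}
    counts = defaultdict(int)
    for year, shares in records:
        if year not in sums:
            sums[year] = [0.0] * len(shares)
        for k, share in enumerate(shares):
            sums[year][k] += share
        counts[year] += 1
    return {
        year: [total / counts[year] for total in sums[year]]
        for year in sorted(sums)
    }


# Invented values: (defense year, [share of topic 0, share of topic 1]).
records = [
    (2010, [0.8, 0.2]),
    (2010, [0.6, 0.4]),
    (2020, [0.1, 0.9]),
]
trend = mean_topic_share_by_year(records)
for year, shares in trend.items():
    print(year, [round(s, 2) for s in shares])
```

The same grouping could be applied to a location attribute instead of the year to look at spatial patterns, and topics with consistently low shares would be candidates for the under-analyzed topics in the third question.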
Requirements
- Geo-information Tools (optional)
- Geo Scripting (required)
- Strong Python scripting skills
Literature and information
- Chang et al. (2009) Reading Tea Leaves: How Humans Interpret Topic Models. Advances in Neural Information Processing Systems 22, 9 pp.
- Mohr & Bogdanov (2013) Introduction—Topic models: What they are and why they matter. Poetics 41(6), pp.545-569.
- Wimhurst, Koch & McPherson (2025) OSF Repository: Topic Modeling - Mississippi River Basin Literature.
Expected reading list before starting the thesis research
- Wimhurst et al. (in preparation) Identifying the Spatiotemporally Common and Unique Research Priorities of the Mississippi River Basin Using a Topic Modeling Approach.
Theme(s): Modelling & visualisation; Human-space interaction