Thesis subject

MSc thesis topic: Agent-based modelling of historical language contact

When speakers of different languages interact, the resulting language contact is known to influence the languages themselves. Words or other language features (e.g. grammatical structures) are transferred from one language to the other. Over decades, centuries and millennia, this may cause the languages spoken in a geographic area to become similar in their features. Detecting these contact areas is crucial for understanding the interaction of languages and for tracing human history.

However, finding features of one language in another does not automatically imply that there has been language contact. Two other processes may cause languages to share features: inheritance and universal preference. Inheritance happens when, over many generations, a language splits into dialects and eventually into new languages that retain features of the original language. Universal preference arises because our brains constrain how thoughts are processed and how they are expressed through our speech and gesture systems, so that some features are more likely than others in any language. As a result, languages may share a feature without any contact or inheritance.

It is not straightforward to attribute shared language features to contact, inheritance, or universal preference, because these processes interact and, on top of that, communities migrate. Ranacher et al. (2021) have developed a method, sBayes, to estimate the relative role of language contact, as opposed to the other two processes, in creating similarities between languages. The method promises to identify contact areas from empirical data using Bayesian inference.

Ranacher et al. (2021) tested their approach first on simulated data (951 languages randomly assigned to locations in space) and then on two case studies to reveal language contact in South America and the Balkans. Yet neither of these experiments can fully validate the approach, because the actual contributions of language contact, inheritance and universal preference are not known. This holds even for the simulated data, because these too are based on existing languages with unknown histories.

The aim of this thesis is to validate sBayes with artificial languages whose full evolution is known. An agent-based model (ABM) is to be developed for this purpose. Agents in this model represent language communities with language features as their attributes. The agents are initialized with some common language features to represent universal preference. When agents interact, features are exchanged with predefined probabilities, leading to contact areas. Over time, new agents are created that inherit features from their ‘parents’. Furthermore, agents may migrate. Model runs in which these three processes are switched on and off, one at a time, are used to generate datasets on which sBayes is validated. That is, sBayes is expected to identify each process and its effects, including the corresponding areas of interaction, when the process is turned on, and to attribute a low or zero probability to it when it is turned off.
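The model described above could be sketched roughly as follows. This is a minimal illustration, not a proposed design: the class `Community`, the function `simulate`, all parameter names and all default values are hypothetical. It assumes binary features, a unit-square space, and distance-based contact. It also assumes that the three switchable processes are contact, inheritance and migration, while universal preference is set through `p_universal` (a value of 0.5 would mean no preference).

```python
import random
from dataclasses import dataclass
from typing import List, Optional

N_FEATURES = 10  # hypothetical number of binary features per language


@dataclass
class Community:
    """One agent: a language community with a location and a feature vector."""
    x: float
    y: float
    features: List[int]
    parent: Optional[int] = None  # index of the parent community, if any


def _clamp(v: float) -> float:
    """Keep a coordinate inside the unit square."""
    return min(1.0, max(0.0, v))


def simulate(steps=50, n_init=20, contact=True, inheritance=True,
             migration=True, p_universal=0.8, p_borrow=0.1,
             p_split=0.02, radius=0.15, seed=0):
    """Run the toy model; return the final agents and the log of contact events."""
    rng = random.Random(seed)
    # Universal preference (assumption): each feature independently takes the
    # preferred value 1 with probability p_universal in every initial community.
    agents = [
        Community(rng.random(), rng.random(),
                  [int(rng.random() < p_universal) for _ in range(N_FEATURES)])
        for _ in range(n_init)
    ]
    contact_log = []  # ground truth: (step, donor, recipient, feature index)
    for t in range(steps):
        if contact:
            # Communities within `radius` may copy one random feature value.
            for i, a in enumerate(agents):
                for j, b in enumerate(agents):
                    if i == j:
                        continue
                    near = (a.x - b.x) ** 2 + (a.y - b.y) ** 2 <= radius ** 2
                    if near and rng.random() < p_borrow:
                        k = rng.randrange(N_FEATURES)
                        a.features[k] = b.features[k]
                        contact_log.append((t, j, i, k))
        if inheritance:
            # A community may split off a child that copies all its features.
            for i in range(len(agents)):
                if rng.random() < p_split:
                    p = agents[i]
                    agents.append(Community(
                        _clamp(p.x + rng.gauss(0, 0.05)),
                        _clamp(p.y + rng.gauss(0, 0.05)),
                        list(p.features), parent=i))
        if migration:
            # Every community takes a small random step.
            for a in agents:
                a.x = _clamp(a.x + rng.gauss(0, 0.01))
                a.y = _clamp(a.y + rng.gauss(0, 0.01))
    return agents, contact_log


# Example: one run per switched-off process.
for off in ("contact", "inheritance", "migration"):
    agents, log = simulate(**{off: False})
    print(off, "off:", len(agents), "communities,", len(log), "contact events")
```

Because the sketch logs every contact event and every parent link, the full history behind each shared feature is known, which is the property that the empirical and simulated data used so far lack.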

Relevance to research/projects

This project is a collaboration with Peter Ranacher and Robert Weibel (University of Zürich, Switzerland). These researchers can help you to conceptualize language contact for the ABM and to set up sBayes.

On a broader level, one can generalize the project to simulating and finding past traces of interaction in human evolution. Agents and their attributes represent culture, of which language is one example. Other examples are music, beliefs, social norms, agricultural practices, art, tools, cuisine, or even the prevalence of specific pathogens. Agents interact, which makes their attributes similar, corresponding to cultural accommodation. At the same time, confounding effects also shape the agents’ attributes, introducing additional similarities and dissimilarities. In the case of language evolution, the confounders are inheritance and universal preference. Other confounders could be climate or the environment. The project aims to show how and to what extent subtle traces of past interaction can be inferred from entangled similarities in cultural data even in the presence of confounders, with potentially significant implications for reconstructing human history.


Objectives

  • Develop an agent-based model of language communities.
  • Systematically perform model runs to serve as validation data for sBayes.
  • Validate sBayes with the generated datasets.
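One possible way to score the validation step is to compare the set of languages that the model placed in a contact area with the set that the inference assigns to it. The sketch below is only an illustration: the function `area_recovery` is hypothetical, it does not call sBayes, and it assumes that both the simulated ground truth and the inferred area have already been reduced to collections of language identifiers.

```python
def area_recovery(true_area, inferred_area):
    """Return (precision, recall) of an inferred contact area.

    Convention (assumption): an empty inferred area has precision 1.0
    (no false positives), and an empty true area has recall 1.0
    (nothing could be missed).
    """
    true_area, inferred_area = set(true_area), set(inferred_area)
    hits = len(true_area & inferred_area)
    precision = hits / len(inferred_area) if inferred_area else 1.0
    recall = hits / len(true_area) if true_area else 1.0
    return precision, recall


# Hypothetical example: languages 1-4 were in contact, 3-5 were inferred.
precision, recall = area_recovery({1, 2, 3, 4}, {3, 4, 5})
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

For runs in which contact is switched off, the true area is empty, so any inferred language lowers precision while recall stays at 1.0 under this convention.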


Literature

  • Ranacher, P., Neureiter, N., van Gijn, R., Sonnenhauser, B., Escher, A., Weibel, R., Muysken, P., & Bickel, B. (2021). Contact-tracing in cultural evolution: a Bayesian mixture model to detect geographic areas of language contact. Journal of the Royal Society Interface, 18(181), 20201031.
  • Bowern, C., & Evans, B. (Eds.). (2015). The Routledge handbook of historical linguistics. Routledge.
  • Civico, M. (2019). The dynamics of language minorities: Evidence from an agent-based model of language contact. Journal of Artificial Societies and Social Simulation, 22(4), 3. DOI: 10.18564/jasss.4097


Requirements

  • Passed the course Spatial Modelling & Statistics (30306), or another course in which agent-based modelling is taught.
  • Conceptual-thinking skills.

Theme(s): Modelling & visualisation; Human – space interaction