Modelling Word Order and Cognitive Constraints

Posted on: 27/09/2024 28/09/2024

RA position
Amsterdam

Website: Institute for Logic, Language and Computation, University of Amsterdam

The languages of the world vary widely, but this variation is often constrained. It has been hypothesized that various cognitive biases shape these constraints. One example of such a bias is Uniform Information Density: the hypothesis that the information being transmitted should be evenly distributed throughout an utterance, maximizing efficient communication. With large language models, we can now easily calculate to what extent linguistic structures adhere to this hypothesized principle, as the surprisal values that a language model yields when processing text have been found to correlate with the information load that humans experience when processing those texts.
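As a rough illustration of the idea, the sketch below computes per-token surprisal (the negative base-2 logarithm of a token's conditional probability) and uses the variance of those surprisals as one possible measure of how uniform the information density is. The function names and the probabilities are hypothetical and chosen only for illustration; in practice the probabilities would come from a language model, and variance is just one of several ways the principle could be operationalized.

```python
import math

def surprisals(probs):
    """Surprisal in bits (-log2 p) for each conditional token probability."""
    return [-math.log2(p) for p in probs]

def uid_variance(probs):
    """Variance of per-token surprisal; lower means more evenly spread information."""
    s = surprisals(probs)
    mean = sum(s) / len(s)
    return sum((x - mean) ** 2 for x in s) / len(s)

# Hypothetical conditional probabilities for two orderings of the same content.
# Both orderings carry 8 bits in total, but distribute them differently.
order_a = [0.25, 0.25, 0.25, 0.25]   # 2 bits per token
order_b = [0.5, 0.5, 0.5, 0.03125]   # 1, 1, 1 and 5 bits

print(uid_variance(order_a))
print(uid_variance(order_b))
```

Under this measure, the first ordering spreads its information perfectly evenly, while the second concentrates most of it on the final token, so a speaker following the hypothesized principle would be expected to prefer the first.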

While we use language models, the research question in this project is fundamentally about human language. You will use large language models to experiment with the information density of linguistic structures, using either artificial language data or natural language data from corpora. In particular, we are interested in cases where a language’s grammar allows multiple word orders: do speakers tend to choose the one that optimizes uniform information density? Another interesting avenue would be to explore data from languages with typological properties that are quite different from English, as these have not been studied much yet. So if you happen to speak such a language, please do apply to this project! Work on other cognitive biases that have been theorized to shape word order is also possible within this project.

You will be working in the context of a four-year project that started in September 2024. Research Assistantship funding is available for this project for particularly excellent candidates. The project takes place at the UvA in an interdisciplinary setting with PIs Jelke Bloem (NLP expertise) and Marieke Schouwstra (Linguistics expertise), and PhD student Maria Tepei (Linguistics & NLP). You will join the NLP & Digital Humanities group at the ILLC, where, in addition to the weekly supervision meetings, we have regular academic events such as the Computational Linguistics Seminar. You will also have opportunities to interact with linguists from the linguistics department, for example in the context of the ACLC research group “Computational and Corpus-based Approaches to Language and Literature”.

This is an academic research project, so we expect a scientific attitude towards the topic. Some background in linguistics and an interest in Natural Language Processing (NLP) are beneficial, and Python programming skills are a requirement for this project. If you speak a language that has not been studied yet in this context, that would be great, but it is not a requirement – English and Dutch have plenty of weird word order phenomena that can be investigated in this way as well.

To apply for this job, email your details to j.bloem@uva.nl