The aim of my 8-week visit to Austin was to collaborate with an expert on computational linguistics, Kyle Mahowald (pictured holding his dog ‘Fogo’). In this blog post, I'll discuss why computational modelling is useful, what large language models are, and how we used them to test our theory (described in part 1 of my blog), as well as share some of the things I got up to in Austin.
Why use computational modelling?
A computational model uses computer programs to simulate and study complex systems, such as human language. If computational models trained on huge amounts of text alone demonstrate the same broad patterns of pronoun interpretation as humans, this supports our theory, because it indicates that a model needs no in-built grammatical knowledge to acquire pronoun constraints.
Why large language models are more Austin-ishing than traditional computational models.
When we read or listen to language, we need to use contextual information to understand what words like pronouns mean. Whilst “traditional” computational models are very capable of processing and generating sequences of text (that is, they can make sense of a bunch of words that are connected to each other, like in a story or a conversation), a new generation of computational models has emerged that is better at handling language context, capturing relationships between words that are far apart in the text sequence. These models are called "large language" models and use "transformers" to process text. Transformers break the text up into small segments called tokens, then use attention mechanisms to help the model focus on the most relevant parts of the text as it processes each token. This allows large language models to capture language context and understand the relationships between different parts of the text more effectively than traditional models. A further but crucial advantage of the greater power of large language models is that there is little practical limit on how much text they can be trained on, so their training data comes in the form of millions of books and websites.
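To make "attention" a little more concrete, here is a minimal sketch of the scaled dot-product attention at the heart of transformers. This is illustrative only, with made-up numbers rather than a real trained model, and without the multiple attention heads and learned weights that real models use:

```python
import numpy as np

def attention(queries, keys, values):
    # Score how relevant every token is to every other token...
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    # ...turn the scores into weights that sum to 1 (a softmax)...
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # ...and build each token's new representation as a weighted
    # blend of all the tokens' representations.
    return weights @ values

# Toy example: 4 tokens, each represented by an 8-dimensional vector.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
contextualised = attention(tokens, tokens, tokens)
print(contextualised.shape)  # (4, 8): one context-aware vector per token
```

Because every token is scored against every other token, words that are far apart in the sequence can still influence each other directly – which is what gives transformers their edge over traditional models on long-range context.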
The current state-of-the-art in computational modelling.
Enter the family of large language models called “GPT-3”, which has been released in various forms since 2020. The coolest thing about these models for us regular folk is that they can show off their language knowledge in language tasks (e.g., filling in the blanks or question answering) without requiring much, if any, under-the-hood “fine-tuning” of their architecture. That is, they are becoming more “off-the-shelf" and accessible to the public. It’s timely here to mention where “ChatGPT” fits into all of this – if you haven’t heard of it yet, then a simple internet search will show that this large language model, nicknamed “the Google killer”, seems to be taking over industry whilst also merging computational modelling into pop culture! “ChatGPT” is indeed part of the “GPT-3” family of large language models, as are predecessor models like “text-davinci-003".
However, whilst “ChatGPT” and “text-davinci-003" are great for industry and pop culture, whose metric is essentially “the more human-like, the better", they are not well suited to testing our pronoun theory. In a nutshell, the reason is that those models are not trained on text alone, but also with something known as “reinforcement learning” – essentially, they are refined using reward models trained on comparisons made by human judges (for more info, see here). That makes them cleverer, but it meant we had to turn to simpler models, aptly named “text-davinci-002" and “davinci”, in order to show that large language models trained on huge corpora alone, with no in-built grammatical knowledge (albeit trained to understand how to perform tasks), can acquire the same constraints on pronoun interpretation that we have demonstrated empirically with adults.
Our Tex-cellent computational results.
Overall, the models were a pronoun-ced success, demonstrating broadly similar patterns to humans in response to our pragmatic manipulations. Take the following scatterplot as an example, with “davinci” responses on the vertical y-axis and adult responses on the horizontal x-axis: a clear positive correlation, with both showing that ‘telling’ events (red) bias a subject interpretation of a pronoun, whereas ‘asking’ events (blue) bias an object interpretation of the pronoun.
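For those curious about the mechanics, this is the kind of item-by-item comparison that sits behind such a scatterplot. A hedged sketch, with made-up numbers rather than our actual data:

```python
from scipy.stats import pearsonr

# Hypothetical proportions of subject interpretations per item (0 to 1);
# these numbers are invented for illustration, not our real results.
human_scores   = [0.91, 0.84, 0.22, 0.15, 0.78, 0.30]
davinci_scores = [0.88, 0.80, 0.35, 0.20, 0.70, 0.25]

r, p = pearsonr(human_scores, davinci_scores)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
```

A strong positive r means the model and the adults rank the items the same way: where humans favour a subject interpretation, so does the model.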
Our main task was to give the model a forced choice, with an explicit question to extract what it interprets the pronoun as referring to – for example:
Sentence: Samuel misled Oliver about himself.
Question: Who does "himself" refer to in the above sentence?
A: Samuel
B: Oliver
Answer: _________
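As a hedged illustration (not our exact experimental code), here is how an item like the one above might be sent to “davinci” using OpenAI's legacy completions API – the API key is a placeholder:

```python
import openai  # the legacy (pre-1.0) openai library

openai.api_key = "YOUR_API_KEY"  # placeholder

prompt = (
    "Sentence: Samuel misled Oliver about himself.\n"
    'Question: Who does "himself" refer to in the above sentence?\n'
    "A: Samuel\n"
    "B: Oliver\n"
    "Answer:"
)

response = openai.Completion.create(
    model="davinci",
    prompt=prompt,
    max_tokens=1,    # we only need the model's choice: "A" or "B"
    temperature=0,   # deterministic: always take the most likely token
)
print(response["choices"][0]["text"].strip())
```

Reading off whether the model completes the prompt with “A” or “B” gives us its interpretation for each item, which can then be tallied against the adult data.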
Computational linguistics isn’t the only scene to feast on in Austin.
Thanks to Kyle and some friends I made along the way, I dined at some top-tier BBQ and Tex-Mex restaurants, and got to some of the hip cocktail bars and coffee joints of East Austin. I'm also a fan of the spirit of the USA, and just enjoyed the optimism and big thinking of Americans, alongside learning about the history and nature of Texas at places like the Bullock Museum, the Lady Bird Johnson Wildflower Center, and the downtown riverside hiking trail. I saw some live music at jazz and hard rock clubs. And by the way, it cleared 20 Celsius every day except for one week – not a bad place to spend my entire January and February – and this enabled me to recharge my mental energy from time to time with some trips to the nearby tennis courts.
It would have been predictable to kick off this blog with a "thank you"-styled letter about the personal opportunity created through the generosity of Kyle, who made all that I learned so accessible, and through LuCiD's Travel Award, which funded everything. But when I personally reflect upon my visit, at the forefront of my thoughts is my overall gratitude for these inspiring levels of generosity and recognition. I should also give a shout-out to the 8 weeks of computational linguistics seminars that I attended, where we read a paper on each occasion – I asked a lot of naïve questions but learned a *lot* from everyone!