My friends don’t really understand what I do in my job. I study how children learn language. Half my friends believe that I spend all day attaching electrodes to people’s heads to study their brains. The other half assume I spend hours just listening to what children say.
But in fact, to unlock the mystery of language development we need to do both, and more. Language is the most complex, flexible communication system that has ever evolved, and we humans are by far and away the best at it. No AI robot or computer even comes close. To solve such a complex puzzle, we need to assemble a jigsaw of converging evidence from a whole host of different approaches: neuroscience, computational modelling, behavioural experiments, and analyses of natural language in conversations (corpus analysis). This is the only way we’ll have a chance of discovering how humans, alone amongst all species, learn language.
In a recently published paper in the journal Language Learning, Padraic Monaghan and I illustrate the distinct advantages of combining different approaches to language development. We argue that previous attempts to understand language learning have been limited by two things: our failure to appreciate the richness of the multi-sensory input that children receive, and our lack of understanding of how that input is processed by the learning mechanisms in the brain. Multi-methodological approaches are a potential solution. For example, corpus analysis allows us to explore the input in more detail, and computational modelling and experimental work allow us to explore how the brain might learn from the input. By combining these approaches, we come much closer to an understanding of how children might learn language.
Take the thorny issue of how children learn grammatical categories (e.g. which words are nouns and which are verbs). This is a crucial part of the learning problem because how words behave in sentences depends on their grammatical category. Thus, you can eat a variety of things (“I eat the cake/apple/universe...”; you can even eat your words), but all the things you eat must be nouns. You can’t eat prepositions (“I ate the of”), adverbs (“I ate the slowly”) or adjectives (“I ate the happy”). How do children learn that “cake”, “apple” and “universe” are nouns, but “of”, “slowly” and “happy” are not? And how do they learn it so early? Many children are using nouns, verbs and adjectives correctly in sentences before their second birthday.
In the last few years, we have come much closer to a solution than ever before, and we have done this by combining evidence from three different approaches: corpus analysis, computational modelling, and experimental work.
From corpus analysis (analysing real conversations between people) we have learnt that nouns and verbs in sentences form patterns in some quite transparent, distinct ways. For example, in English, some pairs of words occur very frequently as little frames around a single word (e.g. “the ... is”). These frames predict, with a high degree of accuracy, the category of the intervening word (e.g. the word between “the” and “is” is likely to be a noun: “the cake is...”, “the apple is...”, “the universe is...”).
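To make the idea of a frame concrete, here is a minimal sketch in Python. It is only an illustration on a made-up five-sentence corpus, not the analysis from our paper; the function name `frame_fillers`, the `min_count` threshold and the toy sentences are all assumptions chosen for the example.

```python
from collections import defaultdict

def frame_fillers(sentences, min_count=2):
    """Collect the words that occur inside each (left, right) word frame.

    Hypothetical helper for illustration: a frame is just the pair of words
    on either side of a single intervening word.
    """
    fillers = defaultdict(list)
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Slide a three-word window over the sentence.
        for left, middle, right in zip(tokens, tokens[1:], tokens[2:]):
            fillers[(left, right)].append(middle)
    # Keep only frames that were seen at least min_count times.
    return {frame: words for frame, words in fillers.items()
            if len(words) >= min_count}

# Toy corpus (invented for this example, not real child-directed speech).
toy_corpus = [
    "the cake is ready",
    "the apple is red",
    "the universe is big",
    "you eat it now",
    "you want it now",
]

frames = frame_fillers(toy_corpus)
for frame, words in frames.items():
    print(frame, words)
```

On this toy input, the frame “the ... is” collects only nouns and the frame “you ... it” collects only verbs. Real conversational corpora are far larger and messier, so a frame would need to be much more frequent than this before we would trust it.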
From computational modelling we have learnt that a particular kind of learning mechanism – one that analyses the statistical distribution of words in the input – can learn noun and verb categories from these kinds of cues. Models trained on input with the properties of real language can pick up on these patterns, and use them to cluster words together into categories.
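As a rough sketch of what “analysing the statistical distribution of words” can mean, the Python below represents each word by counts of the words that come immediately before and after it, then compares words by cosine similarity. This is a deliberately simple stand-in, not the models described in our paper; the function names, the `<s>` and `</s>` boundary markers and the toy sentences are assumptions made for the example.

```python
from collections import Counter, defaultdict
from math import sqrt

def context_vectors(sentences):
    """Count, for each word, which words precede and follow it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        # Hypothetical boundary markers so first and last words have context.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, word, nxt in zip(tokens, tokens[1:], tokens[2:]):
            vectors[word][("prev", prev)] += 1
            vectors[word][("next", nxt)] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm = (sqrt(sum(v * v for v in a.values()))
            * sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Toy corpus (invented for this example).
toy_corpus = [
    "the cake is ready",
    "the apple is red",
    "you eat the cake",
    "you want the apple",
]

vectors = context_vectors(toy_corpus)
noun_pair = cosine(vectors["cake"], vectors["apple"])
verb_pair = cosine(vectors["eat"], vectors["want"])
mixed_pair = cosine(vectors["cake"], vectors["eat"])
print(noun_pair, verb_pair, mixed_pair)
```

Words that keep the same company come out as similar (“cake” with “apple”, “eat” with “want”), while “cake” and “eat” share no contexts at all. A clustering step over these similarities would then group the words into candidate categories.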
And from experimental work, we know that these types of learning mechanisms do exist in the human brain, and are used by children from early infancy to pick up meaning from the input that they hear. So it is very likely that, as children, we use these cues to help us categorise words into noun and verb categories.
Unfortunately, this isn’t the whole solution because, unlike children, the models never get 100% on their categorisation tests. Even a 3-year-old knows that you can’t “eat a happy”, but many models don’t. But we already knew they wouldn’t be perfect at it; corpus analysis tells us that when people talk, their sentences are noisy, replete with false starts and other speech production errors, and models find it hard to cope with this noise. In fact, grammatical categories themselves are noisy; in English, for instance, many nouns can be verbed or can be adapted to be adjectivey. So we know that there is still a lot more work to be done.
But by combining modelling, experimental work and corpus analysis we have established an empirical starting point upon which we can build. We are now extending our models by adding other cues that are present in speech and that we know our brains can use; cues that give us information about the meaning of words, for example, or about the beginnings and ends of sentences and phrases. In this way, we are filling in more and more pieces of the puzzle.
In essence, these new approaches have allowed us to change, radically, the way we study language. They allow us to explore both the input and the brain’s learning mechanism in better, more sophisticated ways. These are exciting times. In the meantime, I’m off to attach electrodes to someone’s head.
Monaghan, P. & Rowland, C. F. (in press). Combining language corpora with experimental and computational approaches for language acquisition research. Language Learning.