Distributional learning and the development of word class categories in English, German and Dutch

How do children learn the grammatical categories of their language? For example, how do children learn that "dog" is a Noun and "chase" is a Verb? Recent research with computer models of language learning has shown that one way of doing this is to group words together on the basis of the words that come before and after them. For example, in English, words that come after "a" and "the" and before "is" and "can" tend to be Nouns, whereas words that come after "is" and "can" and before "a" and "the" tend to be Verbs.
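The grouping idea described above can be illustrated with a minimal sketch. It is not the project's model: the toy corpus, the function names (context_vectors, cosine) and the choice of cosine similarity are all assumptions made for illustration. Each word is represented by counts of the words that immediately precede and follow it, and words with similar counts are treated as candidates for the same category.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(utterances):
    """Map each word to counts of its immediate left and right neighbours."""
    vectors = defaultdict(Counter)
    for utterance in utterances:
        words = utterance.lower().split()
        for i, word in enumerate(words):
            if i > 0:
                vectors[word][("prev", words[i - 1])] += 1
            if i < len(words) - 1:
                vectors[word][("next", words[i + 1])] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy corpus, invented for this example (not child-directed data).
corpus = [
    "the dog can chase a cat",
    "a cat can see the dog",
    "the cat is happy",
    "a dog is here",
    "the dog can see a cat",
    "a cat can chase the dog",
]

vectors = context_vectors(corpus)
noun_noun = cosine(vectors["dog"], vectors["cat"])
verb_verb = cosine(vectors["chase"], vectors["see"])
noun_verb = cosine(vectors["dog"], vectors["chase"])
print(f"dog~cat: {noun_noun:.2f}  chase~see: {verb_verb:.2f}  dog~chase: {noun_verb:.2f}")
```

In this toy corpus "dog" and "cat" share several contexts (both follow "a" and "the" and precede "is" and "can"), "chase" and "see" occur in identical contexts, and "dog" and "chase" share none, so the similarity scores separate the Noun-like words from the Verb-like ones. Real child-directed speech is far noisier than this, and the sketch processes the whole corpus at once rather than learning gradually.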

However, at the moment, there are two problems with computer models that group words together in this way. The first is that they work better for some languages (such as English) than they do for others (such as German and Dutch). The second is that they tend to be unrealistic as explanations of human language learning because they do not learn gradually like children.

In this project we developed a more child-like model of category learning that works across three different languages (English, German and Dutch). We achieved this by taking ideas from recent computer models that group words together at a single point in time, and building them into a model called MOSAIC that learns language more gradually. MOSAIC takes as input speech directed at language-learning children in several different languages, and produces as output child-like utterances that get longer as the model learns more about the input. We can therefore test the model by comparing the utterances it produces with those of children learning different languages at different points in development, and so use it to develop a more realistic explanation of the way children learn grammatical categories. This work showed that MOSAIC was not only able to build word classes in English, German and Dutch, but was also able to simulate developmental changes in the noun richness of children’s speech across the three languages.

Selected Outputs

Freudenthal, D., Pine, J. M., Jones, G., & Gobet, F. (2016). Simulating developmental changes in noun richness through performance-limited distributional analysis. In A. Papafragou, D. Grodner, D. Mirman, & J. C. Trueswell (Eds.), Proceedings of the 38th Annual Conference of the Cognitive Science Society (pp. 602-607). Austin, TX: Cognitive Science Society.

Freudenthal, D., Pine, J. M., Jones, G., & Gobet, F. (2016). Developmentally plausible learning of word categories from distributional statistics. In A. Papafragou, D. Grodner, D. Mirman, & J. C. Trueswell (Eds.), Proceedings of the 38th Annual Conference of the Cognitive Science Society (pp. 674-679). Austin, TX: Cognitive Science Society.

Freudenthal, D., Pine, J. M., & Gobet, F. (2018). A computational model of the acquisition of German case. In T. T. Rogers, M. Rau, X. Zhu, & C. W. Kalish (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (pp. 1687-1692). Austin, TX: Cognitive Science Society.

Freudenthal, D., Pine, J. M., & Gobet, F. (2019). Learning cross-linguistic word classes through developmental distributional analysis. In A. Goel, C. Seifert, & C. Freksa (Eds.), Proceedings of the 41st Annual Conference of the Cognitive Science Society (pp. 1773-1779). Montreal, QB: Cognitive Science Society.

Data

Freudenthal, Daniel and Pine, Julian and Jones, Gary and Gobet, Fernand (2021). International Centre for Language and Communicative Development: Defaulting Effects Contribute to the Simulation of Cross-linguistic Differences in Optional Infinitive Errors, 2014-2020. [Data Collection]. Colchester, Essex: UK Data Service. 10.5255/UKDA-SN-853921

Freudenthal, Daniel and Pine, Julian and Gobet, Fernand (2021). International Centre for Language and Communicative Development: A Computational Model of the Acquisition of German Case, 2014-2020. [Data Collection]. Colchester, Essex: UK Data Service. 10.5255/UKDA-SN-853922

Freudenthal, Daniel and Pine, Julian and Jones, Gary and Gobet, Fernand (2021). International Centre for Language and Communicative Development: Using a Developmentally Realistic Model of Word Class Acquisition to Simulate Developmental Changes in the Noun-richness of Children's Early Language Across English, Dutch and German, 2014-2020. [Data Collection]. Colchester, Essex: UK Data Service. 10.5255/UKDA-SN-853923

Project Team: Julian Pine (Lead), Daniel Freudenthal, Fernand Gobet, Elena Lieven and Padraic Monaghan

Start Date: July 2015

Duration: 4 years

(Work Package 12)