Probabilistic Use of High Frequency Words Helps Language Acquisition.

Frost, R. L. A., Monaghan, P., & Christiansen, M. H. (2017). Probabilistic Use of High Frequency Words Helps Language Acquisition. Poster presented at the 23rd Annual Conference on Architectures and Mechanisms for Language Processing (AMLaP), Lancaster, UK.


High-frequency words have been found to benefit speech segmentation (Bortfeld, Morgan, Golinkoff, & Rathbun, 2005) and grammatical categorisation (Monaghan, Christiansen, & Chater, 2007), with recent research suggesting learners may be able to draw on these cues for both types of task at the same time (Frost, Monaghan, & Christiansen, 2016). For instance, in English “the” occurs frequently, punctuating sequences of words in speech, and “the” also reliably precedes nouns, thereby providing grammatical category information while also possibly assisting segmentation. Previous studies have tested the effect of high-frequency words on language acquisition by presenting them reliably within the experimental language. However, natural language contains noise and variability that may provide further opportunities for robust learning (Monaghan, 2017).

We tested the effect of variability on learning by familiarising adults with continuous speech comprising repetitions of target words. Targets were preceded by one of two high-frequency marker words 100%, 67%, or 33% of the time, and the marker words divided the targets into two otherwise unidentifiable categories. Participants completed a two-alternative forced-choice (2AFC) speech segmentation task and a similarity judgement categorisation test, followed by a cross-situational word learning task in which target words from the training speech were mapped, across multiple trials, onto actions and objects depicting two different grammatical categories (nouns and verbs). Critically, labelling was either consistent or inconsistent with the distributional categories (between subjects), to examine whether learners drew on the statistics of the input during subsequent language use. Participants also completed a vocabulary test, which assessed which mappings they had learned.
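As an illustration of the marker manipulation only, the following minimal sketch builds a familiarisation stream in which each target is preceded by its category's marker word on a fixed proportion of its occurrences. The abstract does not report the actual stimuli, the number of repetitions, or any ordering constraints, so the function name `build_stream`, the syllable strings, and the simple shuffling are all assumptions made for the example.

```python
import random


def build_stream(categories, p_marker, n_reps, seed=0):
    """Return a list of word tokens forming one continuous training stream.

    categories: dict mapping a marker word to the targets in its category
                (hypothetical items, not the study's stimuli).
    p_marker:   proportion of each target's occurrences preceded by its marker
                (e.g. 1.0, 2/3, or 1/3).
    n_reps:     number of times each target occurs in the stream.
    """
    rng = random.Random(seed)
    n_marked = round(p_marker * n_reps)

    # Build (marker-or-None, target) pairs so the proportion is exact per target.
    pairs = []
    for marker, targets in categories.items():
        for target in targets:
            pairs += [(marker, target)] * n_marked
            pairs += [(None, target)] * (n_reps - n_marked)

    # Assumed: plain random ordering, with no constraints on adjacent items.
    rng.shuffle(pairs)

    stream = []
    for marker, target in pairs:
        if marker is not None:
            stream.append(marker)
        stream.append(target)
    return stream


if __name__ == "__main__":
    # Invented marker words and targets, for illustration only.
    example = {"fu": ["noli", "samo"], "ge": ["tika", "pelu"]}
    print(" ".join(build_stream(example, 2 / 3, 9)))
```

Fixing the number of marked occurrences per target, rather than sampling each occurrence independently, keeps the proportion exact in a short stream; whether the study did this is not stated.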

There was a clear advantage of variability, with the 67% group performing best on measures of segmentation (100%: M = .62, SE = .03; 67%: M = .71, SE = .03; 33%: M = .67, SE = .02) and categorisation, giving significantly higher similarity ratings to test pairs containing items from the same (M = 3.80, SE = .17) versus different (M = 3.62, SE = .16) grammatical categories (t(23) = 2.194, p = .039). Data from the vocabulary test indicated that participants in all conditions were better able to map target words onto nouns (overall M = .693, SE = .03) than verbs (overall M = .59, SE = .03; t(71) = 3.146, p = .002). Trends in the means indicated that learning of verbs was better when target words labelled nouns and verbs in a way that was consistent (M = .63, SE = .035) rather than inconsistent (M = .55, SE = .038) with the distributional category distinction, though this difference was not significant (t(70) = 1.461, p = .149). The data indicate that variability can help learners draw on the same high-frequency words during speech segmentation and grammatical categorisation. Further, the findings suggest that high-frequency marker words denoting distributional categories may be especially helpful for learning of verbs.