We (that’s Tomoko Tatsumi, me, and Julian Pine) have just published a new paper, and I think everyone who does child language research should read it (or, at least, this summary of it). No surprise there, perhaps; we’d all like to think (or at least hope) that our colleagues will read our papers. But this one is different. The reason I think you should read this paper is that it has methodological implications for pretty much any study that looks at input effects on child language acquisition. (And since, even if you’re not looking at input effects per se, you almost certainly need to control for them, that means pretty much any child language study.) In particular, it is relevant for all studies that look at the effect of two competing forms (e.g., play vs plays; foot vs feet; one construction vs another) on what children say. (The paper has important theoretical implications too, showing, probably for the first time, that input frequency effects hold even after controlling for morphological complexity. But it’s the methodological implications that have the broader relevance here.)
In fact, the methodological implications are so broad that we’ll start somewhere else entirely: a trial of an imaginary drug for an imaginary disease. Of the 100 people who’ve been given this new drug, 70 have died and 30 have survived. Ban this killer drug, right? Not so fast. Of the 100 people in the control group, who had the disease and were not given the drug, 90 died and just 10 survived. The drug looks like a killer, but only if you ignore the base rate of deaths without it. It’s actually a life-saver. And of course, if these conclusions were based on a study with 1,000 people in each condition (700 deaths vs 900 deaths), they’d be even stronger. If they were based on a study with just 10 people in each condition (7 deaths vs 9 deaths), they’d be much weaker. These intuitions are captured mathematically by a chi-square (or Fisher’s exact) test that calculates a statistic based on the number of deaths versus survivals in the drug group (70 vs 30) and the control group (90 vs 10).
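To make this concrete, here’s a minimal sketch in Python (using scipy; the code and the use of `chi2_contingency` are my illustration, not anything from the paper itself) showing how the chi-square test captures both intuitions at once, the comparison with the base rate and the effect of sample size:

```python
from scipy.stats import chi2_contingency

# 2x2 tables: rows = drug vs control, columns = died vs survived
large_trial = [[70, 30], [90, 10]]  # 100 people per group
small_trial = [[7, 3], [9, 1]]      # 10 people per group

# correction=False gives the classic Pearson chi-square statistic
chi2_large, p_large, _, _ = chi2_contingency(large_trial, correction=False)
chi2_small, p_small, _, _ = chi2_contingency(small_trial, correction=False)

print(f"100 per group: chi2 = {chi2_large:.2f}, p = {p_large:.5f}")
# -> chi2 = 12.50, a clearly significant difference from the base rate
print(f" 10 per group: chi2 = {chi2_small:.2f}, p = {p_small:.5f}")
# -> chi2 = 1.25, the same death rates but far weaker evidence
```

(With counts as small as the 10-per-group trial, Fisher’s exact test, `scipy.stats.fisher_exact`, is the safer choice; the chi-square version is shown here just to make the sample-size contrast visible.)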
This is all Stats 101, and its implications are well understood by corpus linguists (e.g., Stefanowitsch & Gries, 2003). But as far as I know – and correct me if I’m wrong here – I haven’t seen a chi-square statistic used in this way in the child-language literature (though, in fairness, Mike Ramscar and colleagues have repeatedly emphasized the usefulness of a related contingency measure calculated from the Rescorla-Wagner learning rule, and I should have cottoned on to the implications of this – and Stefanowitsch & Gries’s – work much earlier).
Which brings us back to our paper. We were trying to explain children’s verb-by-verb production rates of simple versus stative Japanese past-tense forms (roughly pulled vs was pulling) in an experiment, on the basis of the relative frequency of these forms in the language that children hear. (We also did a second analogous study with simple vs completive past forms, but I’ll leave that out to keep things simple.) In other words, the more children hear the stative (was pulling) versus simple form (pulled) of a particular verb, the more they should produce the stative versus simple form of that verb in our study (it’s not rocket science!). But, once you’ve found a suitable input corpus and picked out the relevant verbs, just how do you measure “the relative frequency of these forms in the language that they hear”?
The most straightforward option is to just use a simple proportion; e.g., 90% simple for pulled (vs was pulling), 70% simple for cried (vs was crying). But this won’t do, as it treats 90% of 10 corpus occurrences of that verb (9/10) as the same as 90% of 1,000 occurrences (900/1000). This is like treating a drug trial with 10 people per group as just as informative as a drug trial with 1,000 people in each group. (And this isn’t just an “in principle” problem; quite a few verbs in our study had fewer than 10 occurrences in the corpus.)
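In code form (a toy illustration with made-up verb counts, not figures from the paper), the problem is simply that the proportion throws the sample size away:

```python
# Two hypothetical verbs with identical simple-past proportions
# but wildly different amounts of evidence behind them
verb_rare   = {"simple": 9,   "stative": 1}    # 10 corpus tokens
verb_common = {"simple": 900, "stative": 100}  # 1000 corpus tokens

def simple_proportion(counts):
    """Proportion of simple-past uses; blind to the total count."""
    return counts["simple"] / (counts["simple"] + counts["stative"])

# Both come out at exactly 0.9 - the measure can't tell them apart
assert simple_proportion(verb_rare) == simple_proportion(verb_common) == 0.9
```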
The next option is to use a measure of absolute bias, in the form of a binomial test. This gives you a p value which reflects the extent to which a particular bias (e.g., 90 vs 10) is different to chance (e.g., 50/50). Because “chance” is calibrated relative to the sample size (5/5 for 10 occurrences of the verb; 500/500 for 1000 occurrences) it avoids the pitfalls of the simple-proportion method. The p value will be a lot smaller for 900 simple past vs 100 stative past than for 9 simple past vs 1 stative past.
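As a sketch of that second option (again my own illustration, using scipy’s `binomtest` with a 50/50 chance baseline):

```python
from scipy.stats import binomtest

# Two-sided binomial test against a 50/50 split
p_small = binomtest(9, n=10, p=0.5).pvalue      # 9 simple vs 1 stative
p_large = binomtest(900, n=1000, p=0.5).pvalue  # 900 simple vs 100 stative

print(f"9/10:     p = {p_small:.4f}")  # ~0.0215: modest evidence of a bias
print(f"900/1000: p = {p_large:.3g}")  # astronomically small: overwhelming evidence
```

Unlike the raw proportion, the same 90% split now yields very different answers depending on how many tokens it rests on.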
But this still won’t do. Why not? Think back to our imaginary drug trial: like judging the drug on deaths alone, the binomial test neglects the base rate. In everyday spoken Japanese, simple past forms (VERB-ed) outnumber stative past forms (was VERBing) by around 9:1 (i.e., 90% are simple forms). This means that if you get a verb that occurs in simple past forms say 70% of the time (like nak, ‘cry’), it looks – on both the raw-proportion and binomial-test measures – like it’s biased towards simple past forms. But in the context of the language as a whole – i.e., how often Japanese speakers feel the need to mark stativeness, which is only around 10% of the time – it is actually biased in favour of stative forms: most verbs have only 10% stative uses; this verb has 30%. What is more, in this context, a seemingly “equi-biased” verb like asob (play) – which is split roughly 50/50 between stative (was playing) and simple (played) past forms – is actually showing a whopping bias towards the stative.
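One way to capture this is to treat each verb exactly like the drug group in the trial: compare its simple/stative counts against counts for the language as a whole. A sketch (the 9:1 base rate comes from the post; the corpus totals, the per-verb counts, and the `verb_bias` helper are all hypothetical):

```python
from scipy.stats import chi2_contingency

# Hypothetical corpus totals: the language as a whole is ~90% simple past
BASE_SIMPLE, BASE_STATIVE = 9000, 1000

def verb_bias(simple, stative):
    """Chi-square of this verb's simple/stative split against the
    language-wide base rate. Returns (chi2, p, direction), where
    direction says which form the verb favours RELATIVE to the base rate."""
    table = [[simple, stative], [BASE_SIMPLE, BASE_STATIVE]]
    chi2, p, _, _ = chi2_contingency(table, correction=False)
    verb_rate = stative / (simple + stative)
    base_rate = BASE_STATIVE / (BASE_SIMPLE + BASE_STATIVE)
    direction = "stative" if verb_rate > base_rate else "simple"
    return chi2, p, direction

# nak 'cry': 70% simple looks simple-biased in isolation...
print(verb_bias(70, 30))  # ...but is stative-biased against the base rate
# asob 'play': a 50/50 split is a whopping stative bias
print(verb_bias(50, 50))
```

(Strictly, the comparison row should be the corpus minus the verb in question, but with a large corpus the difference is negligible.)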
Now, to be completely honest, we hadn’t figured all of this out before running the studies. So we used a binomial test to choose (so we thought) simple- and stative-biased verbs, and then just the simple proportion as a measure of the extent of this bias. Consequently, our results were difficult to interpret. We had “simple-biased” verbs that were produced quite a lot in stative form, and we couldn’t work out why. But when we used a chi-square statistic as our measure of verb bias, it all made sense. In the context of the language as a whole, these “simple-biased” verbs were actually stative-biased, which is why kids kept producing them in stative form.
The take-home message is this: Whenever you are trying to explain children’s (or, for that matter, adults’) linguistic behaviour in terms of the relative frequency of competing forms in the input, don’t make the mistake we did. Don’t forget the base rate. It’s hip to be (chi) square.