Language Researchers' Toolkit

The LuCiD toolkit was a set of online tools developed in Phase 1 to make CHILDES corpora easier for developmental researchers to use. Before the toolkit was developed, researchers used CLAN to analyse the corpora. Because CLAN is not a programming language, the analyses that could be done were limited. As the use of programming languages like R and Python increased, more and more people wanted CHILDES corpora in a CSV/Excel-like format compatible with these languages. At that time, researchers had to write their own parsers for the XML or raw-text CHILDES files, and since this was difficult and time-consuming, it was rarely done. The toolkit provided childes2csv, a website that allowed users to select and download sets of corpora in CSV format (each row was a word or an utterance). Inspired in part by childes2csv, the CHILDES developers have implemented a talkbankDB query system which allows you to download CHILDES corpora by word or utterance in Excel format. It is available here.
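To illustrate why a row-per-utterance CSV format is convenient in a language like Python, here is a minimal sketch. The column names (corpus, speaker, utterance) and the sample rows are invented for illustration and are not the actual childes2csv or talkbankDB export schema.

```python
import csv
import io
from collections import Counter

# Hypothetical utterance-level export: one row per utterance.
# Column names and rows are illustrative assumptions only.
SAMPLE_CSV = """corpus,speaker,utterance
Demo,CHI,more juice
Demo,MOT,do you want more juice
Demo,CHI,want juice
"""


def utterances_by_speaker(csv_text):
    """Group utterances (as lists of words) by speaker code."""
    grouped = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        words = row["utterance"].split()
        grouped.setdefault(row["speaker"], []).append(words)
    return grouped


def word_counts(utterances):
    """Count word tokens across a list of tokenised utterances."""
    return Counter(word for utt in utterances for word in utt)


grouped = utterances_by_speaker(SAMPLE_CSV)
child_counts = word_counts(grouped["CHI"])
```

With the data in this shape, selecting a speaker or counting words takes a few lines, whereas the same analysis on the XML or raw-text files would first require a custom parser.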

The toolkit also provided an ngram page which allowed you to download ngrams for particular corpora in CHILDES.  This capability is now available in the talkbankDB by selecting the ngrams tab.
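For readers unfamiliar with ngrams, the following sketch shows one common way to count them: each utterance is split on whitespace and every run of n adjacent words within that utterance is tallied. The utterances and function names are made up for illustration, and this is not necessarily how the toolkit or the talkbankDB computes its ngrams.

```python
from collections import Counter


def ngrams(words, n):
    """Return all runs of n adjacent words as tuples."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def count_ngrams(utterances, n):
    """Count ngrams within each utterance; ngrams never span two utterances."""
    counts = Counter()
    for utterance in utterances:
        counts.update(ngrams(utterance.split(), n))
    return counts


# Invented example utterances, not taken from any CHILDES corpus.
utterances = ["do you want more juice", "you want more", "more juice"]
bigram_counts = count_ngrams(utterances, 2)
```

Counting within utterances, rather than across the whole transcript, avoids creating ngrams that join the end of one utterance to the start of the next.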

The toolkit also provided a browser page which allowed you to examine statistics for different corpora such as the number of utterances and speakers.  That capability is now provided by the participant tab in the talkbankDB.

At present, the talkbankDB has fewer visualization and analysis tools than the toolkit had, but the developers are working on adding such tools to their system.

Project Team: Franklin Chang (Lead), Elena Lieven, Julian Pine, Caroline Rowland and Anna Theakston

Start Date: September 2015

Duration: 2.5 years