The ACQDIV Corpus
The ACQDIV Corpus is a database that brings eleven corpora together in a formally and semantically standardized format:
- Allen Inuktitut Corpus
- Chintang Language Corpus
- Corpus of the Chisasibi Language Acquisition Study (Cree)
- Demuth Sesotho Corpus
- Koç University Longitudinal Language Development Database
- MiiPro Japanese Corpus
- Miyata Japanese Corpus
- MPI-EVA Jakarta Child Language Database
- Sarvasy Nungon Corpus
- Pfeiler Yucatec Child Language Corpus
- Stoll Russian Corpus
The corpus thus contains data from nine of the ten languages in the ACQDIV sample. To learn more about the corpus's linguistic design, structure, and technical realization, read the corpus manual (linked below).
Access to the corpus may be granted to researchers upon request to PI Sabine Stoll. The corpus is not made publicly available because it contains sensitive data from unpublished subcorpora. In the current initial research phase, access is restricted to the ACQDIV core team and official collaborators of the project. We plan to publish the subcorpus that is based on CHILDES data under the Creative Commons license CC BY-NC-SA 3.0, as stipulated by the TalkBank Ground Rules.