Developing corpora of spoken heritage languages
by Keith Plaster
Although corpus-based methods are increasingly used in heritage language (HL) studies, corpora of spoken HLs have only recently begun to be developed. These corpora are important because of the insight they may provide into speakers’ on-line production; many important HL features may not be evident from grammaticality judgment tests or text-based corpora, including HL speakers’ on-line sentence planning strategies, variation in their production, evidence of the extent of transfer from the dominant language, and methods of coping with grammatical and lexical uncertainties and gaps. However, developing spoken language corpora also raises a variety of considerations that may not arise with text-based corpora.
In this talk, we will present an overview of the development of a spoken HL corpus, with emphasis on the issues and questions that should be considered when creating such a corpus. We will also offer recommendations based on the Polinsky Language Sciences Lab’s experience developing corpora of several spoken heritage languages (Chinese, English, Korean, Japanese, Russian, and Spanish) over the past year. Topics to be discussed include the elicitation of spoken data, choice of prompts, transcription of recordings, annotation of transcribed data, use of spoken corpus data in research projects, and the development of cross-linguistic correspondences based on the corpora.