Complexity trade-offs in heritage languages: An information-theoretic analysis

By Ashvini Varatharaj (Worcester Polytechnic Institute), Gregory Scontras (University of California, Irvine) & Naomi Nagy (University of Toronto)


Heritage languages are often of interest because of the ways they differ from the relevant baseline. Many conceive of these differences as a process of simplification: a loss of inflectional morphology, less lexical richness, etc. Inspired by the finding that decreased complexity in one area of a language may lead to increased complexity in another (e.g., Koplenig et al. 2017), we take up the question of whether the changes during the development of heritage languages involve a general simplification, or whether complexity trades off in heritage languages as it does in other languages: as speakers rely less on word-internal structure, word order conveys more information, and vice versa.

Following Juola (1998), we apply information-theoretic, compression-based measures of complexity in the domains of word structure (i.e., morphology) and word order (i.e., syntax) to six heritage languages. The data come from the Heritage Language Documentation Corpus (Nagy 2011), which includes multiple generations of eight heritage languages in the Toronto metropolitan area, as well as homeland comparators. We calculate the word-structure metric by replacing each unique word type with a random number, which destroys any morphological relationships at the word level. The original raw text and the modified text with random numbers are then each compressed using the gzip compression algorithm, and the ratio of the compressed sizes of these two files serves as an index of complexity. We calculate the word-order metric in the same way, but with shuffled word order instead of words replaced by numbers.

Preliminary results from our analysis suggest a mixed picture of complexity shifts in heritage languages. Some languages (Ukrainian, Russian) trade off complexity in the expected way, as evidenced by a negative correlation between word-order and word-structure complexity over generations of heritage language speakers. Other languages (Cantonese, Faetar, Italian, Korean) exhibit a trend in the opposite direction, such that the two measures of complexity are positively correlated. Looking at the full results, we notice that heritage languages that are substantially more complex in the baseline than English (i.e., the dominant language in the context of the corpus) exhibit the expected complexity trade-off, whereas heritage languages that are similarly or less complex in the baseline than English do not. We confirmed this interpretation of the results by comparing the relative complexity of the baseline/homeland varieties, applying our complexity metrics to the Parallel Bible Corpus (Mayer & Cysouw 2014). From our analysis of word order and word structure, we find that Ukrainian and Russian are indeed more complex in the baseline than English; Korean and Italian are closer to English in complexity, while Mandarin (which we use as a proxy for Cantonese) is less complex. Thus, we find evidence for the role of specific language dyads in the changing complexity we observe across generations of heritage languages: depending on which languages are in contact, the outcomes may vary. Further analysis is required to understand why languages more complex than English show a trade-off while the other languages do not.
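
The two compression-based metrics described above can be illustrated with a minimal Python sketch. This is not the authors' code, and several details are assumptions the abstract does not specify: whitespace tokenization, seven-digit random numbers as replacement codes, shuffling over the whole text rather than within sentences, and the direction of the ratio (compressed modified text divided by compressed original text). All function names are hypothetical.

```python
import gzip
import random


def compressed_size(text: str) -> int:
    """Size in bytes of the gzip-compressed UTF-8 encoding of `text`."""
    return len(gzip.compress(text.encode("utf-8")))


def replace_types_with_numbers(tokens, rng):
    """Map each unique word type to a distinct random number (as a string).

    Assumption: fixed-width seven-digit codes. Every occurrence of a type
    gets the same code, so word order is preserved while any
    word-internal (morphological) regularity is destroyed.
    """
    types = sorted(set(tokens))
    codes = rng.sample(range(10**6, 10**7), len(types))  # distinct values
    mapping = {t: str(c) for t, c in zip(types, codes)}
    return [mapping[t] for t in tokens]


def shuffle_word_order(tokens, rng):
    """Return a copy of the tokens in random order.

    Assumption: shuffling is done over the whole text, which keeps the
    words intact but destroys word-order information.
    """
    shuffled = list(tokens)
    rng.shuffle(shuffled)
    return shuffled


def complexity_ratios(text: str, seed: int = 0) -> dict:
    """Compute both ratios as compressed(modified) / compressed(original).

    Assumption: the ratio direction is our choice; the abstract only says
    that the ratio of the two compressed sizes is used.
    """
    rng = random.Random(seed)
    tokens = text.split()  # assumption: whitespace tokenization
    original = compressed_size(" ".join(tokens))
    structure = compressed_size(" ".join(replace_types_with_numbers(tokens, rng)))
    order = compressed_size(" ".join(shuffle_word_order(tokens, rng)))
    return {
        "word_structure": structure / original,
        "word_order": order / original,
    }


if __name__ == "__main__":
    sample = "the cat sat on the mat and the dog sat on the cat " * 50
    print(complexity_ratios(sample))
```

Under these assumptions, a larger ratio means the distortion removed more compressible regularity, that is, the text carried more information in that domain. A trade-off across generations would then show up as the two ratios moving in opposite directions.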



Published: Thursday, April 22, 2021