Variation across individuals and domains in Norwegian heritage language

This paper investigates spontaneous production from 50 speakers of Norwegian heritage language in the Corpus of American Nordic Speech and studies the interplay between four linguistic properties: possessives and double definiteness, verb second word order, grammatical gender, and the amount of language mixing. It is shown that speakers cluster in the sense that some speakers produce more Norwegian-like structures across properties, whereas others produce more English-like structures across the same properties. Implications for the study of heritage grammars are also addressed.

Some produce more Norwegian-like structures, whereas others produce more English-like structures. Furthermore, we discuss what these patterns tell us about the study of the development of heritage speaker grammars. Throughout the article, we restrict our investigation to the 50 heritage speakers that were part of the study in Anderssen et al. (2018), who were the only ones available in the corpus at the time.
[2] background study 1: possessives and complex noun phrases
Anderssen et al. (2018) focuses on DP structure in heritage Norwegian. Norwegian has both pre- and postnominal possessives, the former co-occurring with a bare noun and the latter with a definite noun, as illustrated in (1).
(1) Min venn / Venn-en min
    My friend / Friend-the my
    'My friend.'
Although the two structures are associated with partly different semantic and pragmatic entailments, there are many contexts where either structure can be used (see discussion in Anderssen et al. 2018, section 2.3). Anderssen & Westergaard (2010) investigated spontaneous production of homeland Norwegian (HLN) and revealed that speakers predominantly use the postnominal possessive (around 75%), but that all speakers also use the prenominal possessive (on average 25%, range 7-35%). As the prenominal possessive is structurally identical to the English possessive, cross-linguistic influence could be expected in the heritage speakers' production, i.e., a higher proportion of prenominal structures.[2] However, Anderssen et al. (2018) found that the proportion of postnominal possessives was in fact slightly higher in the heritage speakers compared to HLN: 82.9%. Thus, as a group the heritage speakers produce fewer English-like structures than the homeland Norwegians. A closer look at the individual differences within the heritage speaker group showed that more than half of the participants (27/50) produced no prenominal possessives at all and that 32 of the 50 speakers produced below 7% prenominal possessives, i.e., below the range observed in HLN. The remaining 18 speakers, on the other hand, produced a very high proportion of prenominal possessives (45%).
Thus, what looked like a relatively normal production of possessives at the group level concealed considerable variation within the group: One large group overusing the typical Norwegian structure (postnominal possessives) and a smaller group overusing the English structure (prenominal possessives).
A potential problem with the two-group proposal is that the pattern may be due to few observations per speaker: If a speaker produces only a handful of possessives, it is not unlikely that all are postnominal. Here, we look closer at the relationship between the number of possessives per speaker and the proportion of postnominal possessives in order to address the following question: Is it the case that heritage speakers approach the HLN proportions once they produce enough possessive structures? In Figure 1, we show the relation between number of possessives (x-axis) and the proportion of prenominal possessives (y-axis) per speaker. In the graph, we mark the prenominal possessive range of the HLN speakers in shaded grey. As we see, few of the heritage speakers fall within this range. Importantly, producing many possessives does not increase the likelihood of falling within the HLN range. In our comparison of the choice of possessive and other linguistic variables below, we only include speakers who produce at least seven possessives, marked in Figure 1 with a vertical black line.[3] Only five of the speakers who produce seven or more possessives have a production that lies within the native range.
[2] This was found in English-Norwegian bilingual children (
[3] The cut-off point in Anderssen et al. (2018) was nine. We set the cut-off point lower here so as not to exclude too many speakers when we compare possessive structures with V2 production and language mixing.
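The cut-off and the comparison against the HLN range described above can be sketched in a few lines of Python. This is only an illustration: the per-speaker counts are invented, and the names `counts`, `MIN_POSSESSIVES`, `HLN_RANGE` and `classify` are hypothetical, not part of CANS or of the original analysis.

```python
# Hypothetical per-speaker counts: (speaker id, prenominal, postnominal).
# These numbers are invented for illustration and are not corpus data.
counts = [
    ("s01", 0, 12),
    ("s02", 1, 3),
    ("s03", 9, 11),
    ("s04", 3, 12),
    ("s05", 1, 19),
]

MIN_POSSESSIVES = 7        # cut-off used in this paper
HLN_RANGE = (0.07, 0.35)   # prenominal range reported for homeland speakers


def classify(prenominal, postnominal):
    """Return None below the cut-off, else 'below', 'within' or 'above'
    relative to the homeland range of prenominal possessives."""
    total = prenominal + postnominal
    if total < MIN_POSSESSIVES:
        return None  # too few possessives to include the speaker
    share = prenominal / total
    low, high = HLN_RANGE
    if share < low:
        return "below"
    if share > high:
        return "above"
    return "within"


labels = {speaker: classify(pre, post) for speaker, pre, post in counts}
print(labels)
```

A speaker labelled "below" would correspond to the group that almost exclusively uses postnominal possessives, and a speaker labelled "above" to the group that overuses the prenominal structure.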

[255] OSLa volume 11(2), 2020 Thus, we conclude that the low number of attestations per speaker does not explain the two-group pattern reported. Rather, the speakers genuinely cluster into two sub-groups: A large group that almost exclusively produces postnominal possessives, and a smaller group that uses a much higher rate of prenominal possessives than in HLN. Anderssen et al. (2018) suggest that the behaviour observed in both groups is due to influence from English. The preference for the prenominal possessive is the result of crosslinguistic influence (CLI) due to structural overlap between the two languages (Hulk & Müller 2000), while the high use of the postnominal possessive reflects a preference for the structure that saliently distinguishes Norwegian from English, a phenomenon referred to as crosslinguistic overcorrection (CLO) in Kupisch (2014). Note that the phenomenon of CLO is not the same as defaulting to an unmarked or more frequent structure; it is defaulting to the structures that are unique to the individual languages. Thus, at the group level, the production of the heritage speakers looks fairly similar to the Norwegian baseline group. However, at an individual level, the heritage speakers fall above or below the native range, and a relatively clear pattern emerges where they either prefer the English-like or the Norwegian-like structure. These groups are referred to as the English and Norwegian groups respectively. Anderssen et al. (2018) also investigated modified definites in the same population. Modified definites in Norwegian require two definiteness markers, a suffixal article, which also marks unmodified structures as definite, and a prenominal determiner, which only occurs in modified structures, shown in (2). The phenomenon is known as double or compositional definiteness. 
Some modifiers, such as andre 'other, second', venstre 'left' and øvre 'upper' may optionally occur without the prenominal determiner (Dahl 2015; van Baal 2020); see example (3).
(2) (Den store) bil-en
    the big car-the
    'The (big) car.'
(3) (Den) andre sko-en
    the other shoe-the
    'The other shoe.'
Given the complex nature of these modified structures, they were expected to be vulnerable in the heritage population. Thus, in addition to target-like production (DModNdef), the heritage speakers were predicted to produce non-target-like structures with either the suffixal article (*DModN) or the determiner (*ModNdef) missing. In addition, the determiner may be legitimately dropped with certain modifiers (ModNdef). Omission of the suffixal article results in a structure which is similar to English (e.g. den store bil_ 'the big car'), while omission of the determiner yields a structure that is 'typically Norwegian' (_ store bilen 'big car-the'). This leads Anderssen et al. (2018) to ask whether the speakers with a high proportion of prenominal possessives (N=7) also produce English-like modified DPs (*DModN), while the speakers who overused the postnominal possessive (N=21) would show a preference for the typically Norwegian structure without the determiner (*ModNdef). The result, provided in Figure 2, indeed reveals that the Norwegian group prefers Norwegian-like modified definites (omission of the determiner), while the English group is more likely to produce the English-like structure (dropping the suffix).[4]
figure 2: Distribution of modified definites by the English and Norwegian groups.
The Anderssen et al. (2018) study thus clearly illustrates that it is important to look for patterns within the group of heritage speakers. The study shows that these speakers should be divided into (at least) two sub-groups, none of which showed baseline-like behavior: One group shows cross-linguistic influence from English (CLI-ers), and the other one is affected by what Kupisch (2014) labels crosslinguistic overcorrection (CLO-ers), where forms that are unique to the heritage language are overused compared to the baseline, potentially to clearly mark the separation between the two languages. In sections 4 and 5, we show that these two groups also differ in other aspects of their syntactic preferences.
[4] In an elicited production study, van Baal (2020) also finds that all the speakers omit the determiner, while only a subset of them drop the suffixal article. She explains the former with reference to incomplete acquisition due to the low frequency of these structures in the input, while the latter is argued to be due to an impoverishment rule. These explanations are not incompatible with Anderssen et al. (2018).
[3] background study 2: v2 and subject-initial declaratives
Westergaard, Lohndal & Lundquist (2020) investigates V2 violations in the same group of heritage speakers (N=50). The study builds on a careful analysis of 10,609 declaratives from the CANS corpus. Overall, there are only 230 instances of V2 violations (2.2%). Similar results have previously been found in comparable groups (Håkansson 1995, Schmid 2002, Kühl 2018). Although 2.2% V2 errors may seem negligible, we find individual speakers that show severe problems with V2, as well as general group-level production patterns that hide potential V2 errors. As discussed in Westergaard et al. (2020), the low number of V2 violations may in fact be the result of a lack of contexts for V2, i.e., SVO word order. When the subject is in initial position, it is not possible to make a V2 error, unless the sentence contains a sentence adverbial, as in (4).
(4) *Johan muligens kjøpte ei ny bok
    Johan possibly bought a new book
    'Johan may have bought a new book.'
If we only consider contexts where a V2 violation could occur, i.e., non-subject-initial declaratives and subject-initial declaratives with a sentence adverbial, the proportion of V2 violations is 6.5% (230/3534).
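The two percentages follow directly from the counts reported above; the only thing that changes is the denominator. A minimal check in Python (the variable names are ours, chosen for illustration):

```python
# Counts as reported in the text for the 50 heritage speakers.
total_declaratives = 10_609   # all declaratives analysed
v2_contexts = 3_534           # declaratives where a V2 violation is possible
violations = 230              # observed V2 violations

overall_rate = 100 * violations / total_declaratives
context_rate = 100 * violations / v2_contexts

print(f"{overall_rate:.1f}% of all declaratives")
print(f"{context_rate:.1f}% of potential V2-violation contexts")
```

Restricting the denominator to contexts where a violation is possible roughly triples the violation rate, which is why the raw 2.2% figure understates the phenomenon.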
In Germanic V2 languages, roughly 30-40% of all declaratives are non-subject-initial (Lightfoot 1999, Bohnacker & Rosén 2008, Westergaard 2009), while the proportion for English is much lower (less than 10%), due to a preference for subjects to appear sentence-initially (Yang 2001, p. 242). Thus, there is both a syntactic and a pragmatic difference between the two languages: Norwegian has V2 and a high number of non-subject-initial declaratives, whereas English has SVO and strongly prefers subjects in sentence-initial position, i.e., fronting of adverbs and objects is relatively rare in English. In order to investigate the V2 grammars of Norwegian heritage speakers, it is thus not enough to look at the number of V2 violations; we also need to consider the distribution of subject- and non-subject-initial declaratives. Westergaard et al. (2020) find that the overall proportion of non-subject-initial declaratives is around 17%, which puts the Norwegian heritage speakers somewhere between the English and the Norwegian baselines. Interestingly, they also find a correlation between the proportion of non-subject-initial declaratives and V2 violations: Speakers who produce below 17% non-subject-initial declaratives make on average 10.3% V2 violations (in potential non-V2 contexts), while speakers who produce more than 17% non-subject-initial declaratives only make 3.2% V2 errors. This suggests that some speakers have a fairly normal Norwegian syntax, with target-like fronting patterns and few V2 violations, while others produce more English-like structures, few instances of non-subject-initial declaratives and a higher proportion of V2 violations.
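The comparison of speakers below and above the 17% mark amounts to splitting speakers by their proportion of non-subject-initial declaratives and averaging the V2 violation rate within each half. The sketch below shows one way to do this, assuming a median split; the six speakers and their values are invented for illustration and do not come from CANS.

```python
from statistics import mean, median

# Hypothetical speakers: (share of non-subject-initial declaratives,
# V2 violation rate in contexts where a violation is possible).
speakers = [
    (0.08, 0.15),
    (0.12, 0.09),
    (0.15, 0.11),
    (0.19, 0.04),
    (0.24, 0.02),
    (0.30, 0.03),
]

# Split point: the median share of non-subject-initial declaratives.
split = median(share for share, _ in speakers)

low = [rate for share, rate in speakers if share < split]
high = [rate for share, rate in speakers if share >= split]

low_mean = mean(low)    # speakers who rarely front non-subjects
high_mean = mean(high)  # speakers who front non-subjects more often

print(f"split at {split:.2f}")
print(f"mean V2 violation rate below split: {low_mean:.3f}")
print(f"mean V2 violation rate at or above split: {high_mean:.3f}")
```

With data shaped like the pattern reported here, the group below the split shows the higher mean violation rate.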
In the next section, we explore whether there is a correlation between word order at the clausal level and internal noun phrase syntax, specifically whether target-like production of V2 and non-subject-initial declaratives is more likely in speakers with a general preference for typically Norwegian possessives.
[4] correlations between v2 violations and np structural choices
Håkansson (1995) points out that V2 is relatively stable compared to other morphosyntactic properties such as agreement (see also Polinsky 2018 for an extensive review of syntactic and morphological patterns in heritage language). As discussed above, previous results from studies on Norwegian heritage language suggest that the most proficient speakers overuse typical Norwegian-like traits, such as postnominal possessives and postnominal definite marking. However, this generalization could also be interpreted as a tendency to overuse the more frequent structures, as in both cases, the typically Norwegian structure is also the more frequent one. As Anderssen et al. (2018) point out, this makes it difficult to distinguish between the effect of frequency and CLO. One should therefore try to consider properties where low frequency overlaps with structural difference in the heritage language (Anderssen et al. 2018: 760), and one candidate is V2. The more frequent word order in Norwegian is SVO, but overusing it would be indicative of CLI rather than CLO. In other words, the use of V2 in non-subject-initial declaratives represents a good test case to determine whether the behavior assumed to be the result of CLO in Anderssen et al. (2018) is simply overuse of the more frequent patterns (perhaps due to a lack of fine pragmatic distinctions), or whether it is in fact appropriate to distinguish between a Norwegian group influenced by CLO and an English group affected by CLI. Now, we want to investigate possible correlations between the speakers' V2 production and their structural choices in the production of DPs. In Figure 3, we plot the relation between prenominal possessives and non-subject-initial declaratives. We include information about the participants' V2 violations (color and shape), and mark the median split for non-subject-initial declaratives (17%). Only speakers who produce more than 80 declaratives are included, and the set is further restricted to speakers who produce at least seven possessives (as above). The graph shows that speakers who produce the most non-subject-initial declaratives tend to stick to the Norwegian-like postnominal possessor. These speakers also make few V2 violations. There is a positive correlation between prenominal possessors and V2 violations (beta = 0.015, st.err. 0.006, p < 0.05) in addition to a negative correlation between V2 violations and non-subject fronting. These results strongly suggest that the speakers who use mainly postnominal possessors (the CLO-ers) have a more intact Norwegian grammar at the clause-level as well. They are not simply defaulting to the more frequent Norwegian structures (N-Poss, SVO), but rather show a preference for using structures that are 'typically Norwegian'.
[5] correlating v2 and np syntax with other linguistic properties
So far we have investigated two relatively frequent syntactic phenomena, V2 and noun phrase syntax, and found a correlation between them. Based on this correlation, we suggested that there are two different groups of heritage speakers. However, the results do not necessarily suggest that one group is "better" than the other, as they say nothing about lexical or morphological proficiency. Furthermore, we have not seen anything suggesting that the two groups differ in their overall syntactic proficiency. In this section we consider two non-syntactic properties for possible correlations with the syntactic properties: Grammatical gender and language mixing.

[5.1] Syntax and gender
Anderssen et al. (2018) found a high proportion of non-target-consistent production with respect to definiteness (34%, only counted on modified definite noun phrases) and Lohndal & Westergaard (2016) found 22.7% non-target-consistent production of gender marking. However, we find no correlation between these properties and V2 violations: The proportion of definiteness and gender errors is equally high for speakers with few or no V2 violations as for the rest of the speakers.
A weak correlation between definiteness errors and possessive production was found in Anderssen et al. (2018): The English group, i.e., speakers who produced a high number of prenominal possessives, produced slightly more definiteness errors than the Norwegian group. No significant correlation between gender errors and possessives or definiteness was found in that study either, suggesting that lexical knowledge (here, measured in terms of lexical gender knowledge) is independent of syntactic proficiency (see Heegård et al. 2019 for a similar independence of gender proficiency and phonological proficiency in heritage Danish).
Regarding grammatical gender, we note a weak correlation between V2 violations and the proportion of masculine gender articles (beta = 0.005, st.err = 0.002, p = 0.02, r² = 0.18). That is, the speakers with many V2 violations produce mainly nouns with masculine gender and mainly make errors with neuter nouns (i.e., overusing the masculine article). Indirectly, this suggests that the speakers in the English CLI group stick to simple syntactic structures and a smaller Norwegian lexicon, although they do not necessarily make more errors than the CLO group. Furthermore, gender errors may be avoided by using a small set of high-frequency nouns and possibly by switching to English in more lexically demanding contexts. In the next section, we consider this in detail.
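For readers who want to see how a slope, its standard error and r² relate to one another, the sketch below fits a simple least-squares line. It rests on assumptions: the text does not state which model produced the reported values or which variable served as predictor, so we assume an ordinary linear regression of masculine-article share on V2 violation rate. The data are invented and `simple_ols` is a hypothetical helper.

```python
from math import sqrt


def simple_ols(x, y):
    """Fit y = a + beta * x by ordinary least squares.

    Returns (beta, standard error of beta, r squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    # Residual sum of squares around the fitted line.
    rss = sum((yi - intercept - beta * xi) ** 2 for xi, yi in zip(x, y))
    std_err = sqrt(rss / (n - 2) / sxx)
    r_squared = 1 - rss / syy
    return beta, std_err, r_squared


# Hypothetical per-speaker values, invented for illustration only:
# x = V2 violation rate (%), y = share of masculine articles.
v2_violations = [0.0, 2.0, 4.0, 6.0, 10.0, 15.0]
masculine_share = [0.60, 0.66, 0.63, 0.70, 0.68, 0.75]

beta, std_err, r_squared = simple_ols(v2_violations, masculine_share)
print(f"beta = {beta:.4f}, st.err = {std_err:.4f}, r2 = {r_squared:.2f}")
```

An r² of 0.18, as reported above, means that the predictor accounts for under a fifth of the variance, which is consistent with describing the correlation as weak even though the slope is significant.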

[5.2] Language mixing
We have seen that some of the heritage speakers have a more English-like syntax than others. We now ask if this same group of speakers also have a higher proportion of English words in their speech, i.e., more instances of language mixing. As suggested in the previous section, less proficient speakers may stick to simpler syntactic structures (e.g. SVO). They may also be more likely to switch to English in lexically more challenging contexts.[5] We now investigate possible correlations between syntactic simplification and the size of an active Norwegian vocabulary. A careful investigation of this topic is clearly beyond the scope of this article, so for now, we only make use of language mixing statistics that are easily obtained from the CANS corpus. We searched for all items tagged as English segments (or tagged "X" in the corpus), but as we were mainly interested in vocabulary size, we included only items from the open word classes noun, adjective and verb. We carried out searches for English lexical items for each speaker in the sub-corpus used in the studies reported on here (N=50) and obtained both token and type (lemma) frequencies, which we later compared to type and token frequencies for Norwegian lexical items.[6] For the token count, the proportion of English lexical words per speaker ranges from 2.5% to 70%. For the lemma count, the range goes from 6.9% to 71%. Note that we see more variation here than for the syntactic variables we have looked at so far. In what follows, we have chosen to focus on the lemma/type frequencies rather than token frequencies, since the former better capture the size of the active lexicon than the latter.
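The difference between the token count and the lemma (type) count can be made concrete with a small sketch. The item list, the language labels and the function `english_shares` are hypothetical and do not reflect the actual CANS tag format; the point is only that a frequently repeated English word inflates the token share but counts once in the lemma share.

```python
# Hypothetical open-class items for one speaker: (lemma, language).
# The words and tags are invented; CANS uses its own annotation scheme.
items = [
    ("hus", "no"), ("hus", "no"), ("kjøre", "no"), ("stor", "no"),
    ("farm", "en"), ("farm", "en"), ("farm", "en"), ("travel", "en"),
]


def english_shares(items):
    """Return (token share, lemma share) of English lexical items."""
    english_tokens = sum(1 for _, lang in items if lang == "en")
    lemmas = set(items)  # each (lemma, language) pair counted once
    english_lemmas = sum(1 for _, lang in lemmas if lang == "en")
    return english_tokens / len(items), english_lemmas / len(lemmas)


token_share, lemma_share = english_shares(items)
print(f"token share: {token_share:.2f}, lemma share: {lemma_share:.2f}")
```

In this toy example the English word "farm" is repeated three times, so the token share exceeds the lemma share; the lemma share is the measure that tracks how many distinct English words the speaker draws on.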
We now correlate the proportion of English lexical lemmas with the morphosyntactic variables we considered above. We find that the amount of language mixing correlates neither with noun phrase syntax nor with gender marking. However, we find a direct correlation between the presence of English words and non-subject-initial declaratives (t = -2.57, p = 0.014): The speakers who produce a high proportion of subject-initial declaratives (i.e., SVO) also have a high proportion of English lexical items in their production. As we have already discussed, there is a correlation between fronting and V2 violations. However, we do not see a significant correlation between V2 violations and English mixing. This is not necessarily surprising, given that a high proportion of SVO may mask a disappearing or weakening V2 system. In Figure 4, we illustrate the relationship between the proportion of language mixing, fronting and V2 violations (color and shape).
[5] Kühl & Heegård Petersen (2018) have shown that in Danish heritage language there is a correlation between V2 violations and the presence of an English word clause-initially.
[6] As in most corpora, there are instances of incorrectly tagged items. We have not done a careful quality control of the English item tagging here, but rather assumed that possible tagging errors affect the statistics of all participants equally.
figure 4: V2 violations in relation to the proportion of English mixing and non-subject-initial clauses.
In short, we see a tendency that speakers who use a high proportion of English nouns, adjectives and verbs also use more English-like sentence structures.
[6] discussion
Summarizing sections 2-5, we see that the 50 speakers cluster in essentially two groups: A group that produces more Norwegian-like structures across the linguistic variables investigated and a group that produces more English-like structures for the same variables. Interestingly, this does not hold for grammatical gender, yet there is a slight tendency for speakers who have a high proportion of non-V2 to default to masculine gender. Overall, we interpret our findings as suggesting that some speakers are in general prone to CLO, whereas others are prone to CLI. Speakers are not simply defaulting to the most frequent structure across the board; rather, they make fine-grained distinctions in their grammars that can be tracked across multiple linguistic properties. An obvious question is why we see the observed clustering effects. More systematic data about the linguistic and societal background of the 50 speakers would be helpful to probe this question. CANS contains sparse background data, which is self-reported and not consistently collected across all speakers. We have looked for patterns in the metadata, in particular concerning age of onset (for English) and use of Norwegian across the lifespan, but we have not been able to find clear systematic patterns. A category such as 'English from preschool' is often used across speakers, but that is too coarse-grained a category to provide reliable information about the input situation the speakers had as children. Furthermore, the problem is exacerbated by the fact that many speakers often reply inconsistently to questions regarding their own language use, making it hard to get a clear picture of what the facts are. That said, it is quite likely that factors such as age of onset, language exposure/input and actual usage are very important in understanding the nature of the grammatical representations for each individual speaker.
It is also possible that a study including all the 227 speakers in the corpus could reveal correlations between background factors and linguistic profiles.
A more general lesson from this paper is that it is important to study the variation across heritage speakers and not just group-level effects; focusing on proportions and means at a group level may even provide misleading results, as they may hide considerable and systematic variation. We have uncovered systematic patterns across individual speaker data and across grammatical properties. That does not mean that these patterns are linked in the mental grammar; after all, we are dealing with properties that correlate, not properties that are necessarily causally linked. Nevertheless, our results clearly suggest that CLI and CLO are phenomena that hold across linguistic properties: If a speaker produces Norwegian-like structures in one domain, this speaker is also likely to produce Norwegian-like structures in another domain. And conversely: A speaker who is likely to produce English-like structures in one domain, is also likely to do so in another domain. These are trends, not exceptionless generalizations, but they are nevertheless interesting because they inform our understanding of multilingual speakers and the way multiple grammars interact in the mind.
[7] conclusion
In this paper, we have investigated 50 speakers of Norwegian heritage language in the annotated Corpus of American Nordic Speech (CANS). We have studied the interplay between several linguistic properties: possessives and double definiteness, verb second word order, grammatical gender, and the amount of language mixing. The 50 speakers cluster into two groups: Some speakers produce more Norwegian-like structures across properties, whereas others produce more English-like structures across the same properties. We have argued that the latter group is affected by cross-linguistic influence (CLI) from English, while the production of the former group is the result of cross-linguistic overcorrection (CLO). Finally, we have discussed implications for the study of heritage speakers based on spoken corpora, by highlighting the importance of investigating the rich variation that often occurs among such speakers. Such investigations would not have been possible without CANS and the structured data collection initiated by Janne Bondi Johannessen.

acknowledgments
We are happy and grateful to get this opportunity to honor Janne and her memory. She played a vital role in the linguistics community in Norway and beyond and will be remembered for her many contributions, including her pioneering efforts to establish a rich source of spoken American Norwegian data. Without her initiative and great efforts, this article, as well as the numerous other publications on Norwegian heritage language, would not have been possible. Janne's untimely death is a great loss to the Norwegian linguistics community. It is unbelievable that her enthusiasm, wisdom, entrepreneurship and wonderful laughter will no longer enrich our lives.