New methodologies in the Nordic Syntax Database: word order variation in Norwegian wh-questions

Across Norwegian dialects, wh-questions vary in the word orders they permit, with many dialects allowing non-V2 word order. The acceptance of this order differs across dialects and depends on the complexity and function of the wh-element. This study examines data from 409 informants across 105 sites in the Nordic Syntax Database (NSD). Throughout the study, new methodologies are used in an attempt to overcome some of the limitations of the NSD map-building tool and to present new insights from a more detailed assessment of the acceptability judgements. Analysis of the frequency of these acceptability judgements on four test items showed that four grammars could be distinguished: grammars that allow only V2 word order; non-V2 word order across all wh-questions; non-V2 in all but long non-subject wh-questions; or non-V2 only with short wh's. An apparent-time study of the data supports a diachronic connection between some but not all of the varieties.

The acceptance of this non-V2 word order is subject to considerable variation at the more detailed level and has received quite a bit of attention in Norwegian dialectology. The information status of the subject (Westergaard 2003), the choice of verb and the form of the subject (Westergaard & Vangsnes 2005), the form of the wh-element (Åfarli 1986; Westergaard & Vangsnes 2005), and the possibility of inserting the complementizer som 'that' under embedded subject extraction (Westergaard et al. 2012) have all been claimed to influence word order possibilities and word order choice in wh-questions across dialects. The geographical distribution of non-V2 wh-questions across Norwegian dialects has been described thoroughly on the basis of data from the Nordic Dialect Corpus (Vangsnes & Westergaard 2014) as well as with maps from the Nordic Syntax Database (e.g. Westergaard et al. 2017).

Lie (1992), Vangsnes (2005), Westergaard (2009), Westergaard et al. (2012, 2017) and others have all proposed accounts of the historical development of non-V2 word order. Lie (1992) proposes that non-V2 developed from cleft sentences such as Hå e de du si? 'what is it you are saying?'. The non-V2 order arises when the expletive pronominal subject de 'it' in the matrix cleft sentence is deleted. This deletion subsequently leads to non-V2 order when the construction is phonologically reduced through haplology to Hå du si? lit. 'what you say?' (1992:72).

Using data from the Nordic Syntax Database (Lindstad et al. 2009), Westergaard et al. (2017) recently argued for a different and detailed diachronic development of the spread of non-V2 wh-questions. They discuss five stages in the diachronic development from V2 to non-V2, starting in simplex subject questions and gradually spreading to non-subject questions and questions with more complex wh-elements. The complementizer som 'that' plays a central role in the account by Westergaard et al.; non-V2 is realised in subject questions when som is inserted in the second position instead of the verb. This analysis will be discussed in more detail in Section 3.2.
Four items in the Nordic Syntax Database (Lindstad et al. 2009) exemplify the types of wh-interrogatives that allow non-V2 word order in Norwegian: simplex and complex wh-questions with either subject or non-subject wh-elements (Table 1). The notions 'simplex' and 'complex' will be used interchangeably with 'short' and 'long', as there is often a direct correspondence between complexity and length for the inventory of wh-items. Dialectal differences play a role here, and some examples will be discussed in later sections. Many of the studies mentioned above have used these test items and the corresponding maps and results from the NSD.

A significant drawback of the Nordic Syntax Database, which forms the basis of the Westergaard et al. (2017) proposal and many of the other studies mentioned above, is that individual speakers' results cannot be taken into account in the map view, only in the list view. That is, on the maps drawn up in the NSD, judgements from several speakers are collapsed into a single score per location, which discards individual variation. The internal hierarchical structure of the database, which includes speakers from different age groups and genders, can thus not be taken into consideration. Furthermore, the map-building feature of the database does not allow one to make maps for combinations of judgements: it only provides options to show high, medium or low scores for each location, not a combination of several differing scores. As a result, only the geographic distribution of single linguistic features can be studied. The variation within sites, as well as the role of sociolinguistic factors that may influence word order possibilities, such as age and gender, is understudied.
In this article I take into account the full range of data from 409 speakers from 105 locations across Norway in an attempt to overcome the aforementioned limitations of the map-building feature in the database. This aggregate perspective encompasses as much of the variation as possible. The methods used and the results from the NSD are presented in Section 2 below. Other data, e.g. from the Nordic Dialect Corpus (Johannessen et al. 2009), and theoretical issues are discussed in Section 3.

The method used in this study is based on the assumption that ongoing language change causes synchronic variation between old and new forms (Kay 1975; Weinreich et al. 1986). Synchronic variation as a consequence of diachronic change is typically found between generations, where the language use of older generations represents an older stage of the language while younger generations show a newer stage (Labov 1994). The differences between the language of multiple generations can be utilised to study language change without requiring longitudinal data, by instead making use of 'apparent time' (Labov 1965).
Rather than making use of the map-building tool in the NSD, all the Norwegian results for the four test items (Table 1) were downloaded and converted to a code based on the combination of acceptability scores the speakers assigned to the four non-V2 wh-questions. For this, all judgement scores were converted to dichotomous scores: low scores ('1' and '2') were converted to '0' (not accepted by the speaker) and medium and high scores ('3', '4', '5') to '1' (accepted) [1]. Subsequently, if a speaker for example accepts only subject non-V2 interrogatives (items #17 and #1228) but not non-subject non-V2 interrogatives (#988 and #33), this speaker gets the code '1 1 0 0' (see Table 1 for test items). This aggregate analysis encompasses as much of the variation between language varieties as possible rather than concentrating on single linguistic features. Dialectometrists such as Nerbonne (2011) have argued for such a perspective, claiming that linguistic variation is multifaceted and that the individual features studied in most non-dialectometric work often do not coincide or are geographically exception-ridden (2011:479).

The R environment (R Development Core Team 2016) is used to perform statistical analyses; maps are drawn using the same environment as well as Gabmap, a web-based application that facilitates explorations in quantitative dialectology (i.e. dialectometry). Gabmap allows even researchers with little computational expertise to create various maps and graphs of dialect data intended to illustrate quantitative results insightfully (Nerbonne et al. 2011).

[1] The distribution of the scores across the four items was bimodal to such an extent that it was judged to be reasonably representative to read scores '1' and '2' as 'not accepted' and scores '3' and up as 'accepted'.
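The coding step described above can be sketched in a few lines of Python. This is a minimal illustration, not the script used in the study (which was carried out in R): the function names, the dictionary layout and the three example speakers are hypothetical, and the item order (#17, #1228, #988, #33) is inferred from the example code '1 1 0 0' in the text.

```python
from collections import Counter

# Assumed item order, inferred from the running text:
# short subject (#17), long subject (#1228),
# short non-subject (#988), long non-subject (#33).
ITEM_ORDER = ("17", "1228", "988", "33")


def dichotomise(score):
    """Map a 1-5 judgement to 0 (not accepted) or 1 (accepted)."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError(f"unexpected judgement score: {score!r}")
    return 1 if score >= 3 else 0


def grammar_code(judgements):
    """Combine the four dichotomised scores into a code such as '1100'."""
    return "".join(str(dichotomise(judgements[item])) for item in ITEM_ORDER)


# Hypothetical speakers, for illustration only (not NSD data).
speakers = [
    {"17": 5, "1228": 4, "988": 1, "33": 2},  # subject questions only
    {"17": 5, "1228": 5, "988": 5, "33": 1},  # all but long non-subject
    {"17": 1, "1228": 1, "988": 2, "33": 1},  # V2 only
]

code_frequencies = Counter(grammar_code(s) for s in speakers)
for code, count in sorted(code_frequencies.items()):
    print(code, count)
```

Counting the resulting codes over all speakers gives the kind of frequency distribution over judgement combinations that the analysis is based on.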
In an attempt to minimise noise in the distribution, combinations containing medium judgements (score '3') were removed before calculating the combination frequencies again (Figure 1, dark blue bars). The resulting distribution is not significantly different from the original (chi-square analysis: χ²(14) = 10.9876, p = 0.687). Unexpectedly, the biggest differences between the two distributions are found not in the infrequent combinations (such as '1000' or '1100') but in the combinations that allow non-V2 in most or all wh-questions (i.e. '1110' and '1111'). The difference in relative frequency between the two distributions (with vs. without medium scores) was significant for both of these combinations ('1110': χ²(1) = 8.1169, p < 0.01; '1111': χ²(1) = 9.8182, p < 0.01). For the four most frequent variants, chi-square analysis showed that the gender of the participants did not play a role in the score distribution (χ²(4) = 1.8466, p = 0.764).

figure 2ab: Score distributions for item #1228 (long subject wh) in speakers with the 'mixed V2/non-V2' dialect (left) and item #33 (long non-subject wh) for 'non-V2'-speakers (right)

Looking closer at the distribution of the scores for speakers of the two variants '1110' and '1111', we find that the majority of the medium scores for the '1110'-speakers are given to complex subject wh-questions (Figure 2a). Speakers of the latter variant give most medium scores to complex non-subject questions (Figure 2b). The acceptance of item #1228 (Hvor mange elever som går på denne skolen?) is precisely what distinguishes speakers of dialect '1110' from speakers of one of the other frequent combinations, namely '1010', which only allows non-V2 order with short wh's. Similarly, item #33 (Når tid du gjekk ut av ungdomskolen da?) differentiates combination '1111' from '1110'. I take this as evidence for a link between these variants (mixed and fully non-V2; the dialect with non-V2 only with short wh's and mixed), as the speakers will come to fall into a different category when the acceptance of complex wh-questions with non-V2 order drops or rises. These high proportions of medium scores also fit with the documented low frequency of complex wh-questions (Vangsnes & Westergaard 2014); lack of input might make speakers insecure about the acceptability of the different word orders in complex interrogatives.
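For readers who want to check the reported p-values against the test statistics, the upper-tail probability of the chi-square distribution can be computed with the Python standard library alone. The sketch below is an illustration and not the R code used in the study; the function name `chi2_sf` is an assumption, and only integer degrees of freedom are handled.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable
    with a positive integer number of degrees of freedom."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    # Odd df: erfc(sqrt(x/2)) plus a finite sum of half-integer terms.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (k + 0.5)
    return total


# Statistics taken from the text; p-values are recomputed here.
print(chi2_sf(10.9876, 14))  # reported as p = 0.687
print(chi2_sf(1.8466, 4))    # reported as p = 0.764
print(chi2_sf(8.1169, 1))    # reported as p < 0.01
print(chi2_sf(9.8182, 1))    # reported as p < 0.01
```

The closed forms for even and odd degrees of freedom avoid any dependency on a statistics package; in practice a library routine would normally be used instead.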
Further evidence of the variability of non-V2 acceptance in complex wh-questions comes from two additional complex non-subject wh-questions (in addition to #33) that can be found in the database (see Table 3). These items were not included in the original typology because fewer than half of the participants gave judgements for them (N = 203 for #43; N = 153 for #1368). To examine the relationship between the acceptability judgement scores on these three wh-questions within speakers, a Spearman's rank-order correlation was run to determine whether there was a monotonic relationship between the variables. The correlations between the scores given to the different items were very weak to moderate (see Table 3). All correlations were significant, and so unlikely to have occurred by chance. A possible explanation for the difference in acceptability between #33 and the other two sentences is that the wh-phrase når tid lit. 'when time' can easily be reduced to the short wh-word når 'when'. For 7 locations in the NSD, this is indeed the wh-element given in the written dialect form of the test sentence. Simplex wh-questions are more frequent overall, and the overwhelming majority of non-V2 questions start with a short wh-word (92%; Vangsnes & Westergaard 2014). It is possible that this variability with respect to the wh-word has resulted in higher scores being assigned to item #33. The weak correlations between the items again confirm that there is considerable variation in the acceptability of this question type, which is likely due to the low frequency of complex wh-questions.

westendorp NALS Journal, Vol. 3, 1

Focussing on only the four most frequent groups ('0000', '1110', '1010', '1111'), the young (15-30 years old) and the old (50+ years old) speakers of these variants are examined further. Figure 3 shows the result of this analysis; here the codes are supplemented by a description of the different dialect types. Neither the difference between the two generations for each dialect type [2] nor the overall difference between all groups (chi-square analysis: χ²(3) = 4.139, p = 0.2468) was significant. However, the 'only V2' and the 'mixed V2/non-V2' variants are spoken by more old than young speakers and thus appear to be declining, while the use of the 'non-V2' and 'short wh non-V2' variants seems to be expanding, as these are used by more young than old speakers.
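The Spearman rank-order correlation used for the three complex non-subject items (#33, #43 and #1368) can be illustrated as follows. This is a sketch under stated assumptions rather than the analysis code of the study: the function names and the example scores are hypothetical, tied judgements receive average ranks, and speakers who did not judge both items are dropped pairwise, which is one plausible way of handling the incomplete data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data.
    Pairs with a missing judgement (None) are skipped."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs, ys = zip(*pairs)
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical 1-5 judgements by the same six speakers on two items.
item_a = [5, 4, 2, 1, 3, None]
item_b = [4, 5, 1, 2, 3, 5]
print(spearman_rho(item_a, item_b))
```

Because judgements on a five-point scale produce many ties, the rank-then-correlate formulation is used here instead of the shortcut formula based on squared rank differences, which is only exact without ties.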
[2] Results of chi-square analysis between age groups (Figure 3): only V2: χ²(1) = 0.439, p = 0.5076; mixed V2/non-V2: χ²(1) = 0.9608, p = 0.327; non-V2: χ²(1) = 0.0476, p = 0.8273; short wh non-V2: χ²(1) = 2.7222, p = 0.09896.

In splitting the data by age group, location is lost as a factor in the distribution of the different stages. Therefore, the differences and similarities between young and old speakers were also studied per location. In 15 of the 105 locations available in the database, there is an apparent disparity in dialect preference between the generations, with the older informants speaking one dialect and the younger generation another. Twelve of these locations included both speakers using the mixed V2/non-V2 dialect and speakers of the variant in which only short wh-words allow non-V2 order. The cross tabulation in Figure 4 shows the frequency of each combination of dialect stages between old and young speakers per location. Per location, each combination of a young and an old speaker was tallied. Only speakers without medium scores were included in the tally. The size of the circles is proportional to the size of the group of old and young speakers with the different combinations of language varieties as indicated on the axes.
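The per-location tally behind the cross tabulation can be sketched as follows. The record layout (site, age group, grammar code), the function name and the example data are hypothetical and serve only to make the pairing procedure concrete; speakers with medium scores are assumed to have been filtered out beforehand.

```python
from collections import Counter, defaultdict
from itertools import product


def cross_tabulate(speakers):
    """Tally every (old code, young code) pair within each location.

    `speakers` is an iterable of (site, age_group, code) tuples, where
    age_group is 'old' or 'young'; other age groups are ignored.
    """
    by_site = defaultdict(lambda: {"old": [], "young": []})
    for site, age_group, code in speakers:
        if age_group in ("old", "young"):
            by_site[site][age_group].append(code)

    tally = Counter()
    for groups in by_site.values():
        # Each old speaker is paired with each young speaker at the site.
        for old_code, young_code in product(groups["old"], groups["young"]):
            tally[(old_code, young_code)] += 1
    return tally


# Hypothetical speakers at two invented sites (not NSD data).
example = [
    ("site_A", "old", "1110"),
    ("site_A", "old", "1110"),
    ("site_A", "young", "1010"),
    ("site_A", "young", "1110"),
    ("site_B", "old", "0000"),
    ("site_B", "young", "0000"),
]

for (old_code, young_code), count in sorted(cross_tabulate(example).items()):
    print(f"old {old_code} / young {young_code}: {count}")
```

Pairs in which both codes are identical correspond to the diagonal of the cross tabulation, and the counts correspond to the circle sizes.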
The circles on the diagonal indicate the number of combinations of old and young speakers per location that agree on a particular dialect variant. We see that the mixed variant is not very stable (only two sets of an old and a young speaker agreeing), while the typologically most transparent stages (only V2, non-V2) are considerably more stable. The lower right corner of the diagram is filled more than the top left, which fits with the results presented in Figure 3 and supports the idea that young speakers use the dialects with non-V2 in all questions or only with short wh-words more than the older generation does. The high frequency of the combination of young speakers allowing non-V2 only with short wh-words and older speakers with the mixed variety is remarkable. This overlap shows that these varieties often occur together in the same location and suggests a historical connection, with the mixed variant being the archetype for the variant in which non-V2 is restricted to simplex wh-interrogatives. No connection between any of the other combinations of variants is as apparent.

[2.4] Interim summary
To conclude, the data presented in this section provide substantial evidence for the existence of four main wh-grammars across the Norwegian dialects [3]. We find support for the following grammars: one allowing the standard verb-second word order only; a grammar that allows non-V2 with all types of wh-questions except long non-subject questions; a grammar that accepts non-V2 across all wh-questions; and a grammar where the non-V2 order is restricted to questions starting with short wh-elements. The score distributions for the different test items, the comparison between old and younger speakers, and the cross tabulation of different judgement combinations per location showed evidence for a connection between the mixed V2/non-V2 variant and the variant restricting non-V2 to short wh-words. The apparent-time data do not show the grammar allowing non-V2 across all items to be connected to any particular other stage.
[2.5] Aggregate variation
Figure 5 plots the geographical distribution of the different non-V2 grammars across Norway. The size of the points is indicative of the number of speakers in each location using the variant. The mixed V2/non-V2 variant (pink dots) and the variant that allows non-V2 only with short wh-words (blue) are used mostly north of Trondheim, whereas the varieties preferring V2 or non-V2 across all types of wh-questions (red and green, respectively) are most prominent in the southern part of Norway. Based on the data in the previous section, the non-V2 variant could not be linked to any of the other grammars diachronically. However, the geographical distribution of the non-V2 and the 'only V2' variants may inform us about a connection between these two dialects. I propose that the increased use of the 'non-V2' variety is caused not by a spread of non-V2 word order to more types of wh-questions, as hypothesised in earlier studies, but is instead the result of linguistic borrowing of the non-V2 construction by speakers who originally had a strict V2 requirement across all interrogatives. As a result of the increased input of non-V2 wh-questions, speakers formerly disallowing non-V2 adopt non-V2 word order into their grammars. However, these speakers borrow this non-V2 word order and generalise it across all types of wh-questions, in the mirror image of their own dialect. This idea fits with the geographical distribution of the non-V2 dialect, which is spoken in a region between the northern counties, where non-V2 is widespread but most often not allowed across all question types, and the south of Norway, where non-V2 is not present.

Finally, the 'network' or 'beam' maps in Figures 6 and 7 visualise the aggregate linguistic distances between the locations in the dataset. These maps are based on the mean linguistic distances between pairs of sites in the dataset. For every site, all the individual data points for the four test items were included. The darkness of the lines is directly proportional to the linguistic similarity between the sites. These figures confirm the pattern in the earlier figures: Norway can be roughly divided into three regions by the level to which non-V2 wh-questions are accepted. Two of these regions are linguistically similar internally: Northern Norway (Trondheim and northwards) and a region in the southeast around Oslo. The third area, spanning West and Central Norway, is linguistically more diverse, as indicated by the group of lighter beams in Figure 7.

[3] This division into four groups of dialect varieties concerning (non-)V2 in wh-questions may be a consequence of the way the NSD is designed as well as of the selection of the four test items included in this study, i.e. the way the test sentences and test variables are grouped.
From Figures 5 and 6 we can conclude that this convergence of lighter beams has two separate explanations. Figure 5 showed that there was little agreement in the area in the south-east of Norway and that none of the speakers here used any of the four main grammars. Furthermore, Figure 6 shows that there is a split between the Oslo region (only V2) and the central west (non-V2 in all wh-questions).

The low frequency of complex wh-questions documented by Vangsnes & Westergaard (2014) has been put forth as a central part of the explanation that non-V2 word order originates in simplex wh-questions, as well as in explaining speakers' uncertainty concerning the acceptability of complex non-V2 interrogatives. The frequency of the particular four types of questions in the Nordic Syntax Database specifically was not tested by Vangsnes & Westergaard (2014). In the Nordic Dialect Corpus (Johannessen et al. 2009), a total of 880 examples of non-V2 wh-questions match the four types of wh-questions of the database. After manual exclusion of non-main clause sequences, 331 relevant results are left (see Table 4 below). Complex wh-questions are, as expected, very infrequent, accounting for only 6.3% of the total. The most frequent type of wh-question with non-V2 order found in the corpus corresponds to #988 in the NSD.
The relative infrequency of subject wh-questions is at odds with the hypothesis by Westergaard et al. (2017) that non-V2 starts in subject wh-questions. There are two main theories of how the non-V2 word order developed: either from wh-questions in cleft constructions which are reduced, as proposed by, amongst others, Lie (1992) and Westergaard et al. (2012, 2017); or from embedded questions, which always have non-V2 word order in Norwegian (e.g. Jeg lurer på hva han gjør. 'I wonder (about) what he is doing.') (Iversen 1918; Knudsen 1949; Fiva 1990). As is known from research in language change, frequency is often a driving force in phonetic reduction (Jurafsky et al. 2001). Hence, one would expect this reduction to occur in a frequent construction if we are to take cleft reduction as the starting point for non-V2. The same argument can be applied to the hypothesis that non-V2 word order originates from embedded questions, presuming of course that short non-subject wh-questions are also the most frequent type of embedded question. Whether the main clause remains unexpressed in such cases, or the embedded word order is adopted because it is more economical not to move the verb, frequency is likely to play a role here as well.

It is important to keep in mind, however, as is known from language acquisition research, that often it is not the mere number of examples but rather the extent to which a given construction provides a clue for the underlying grammar that is decisive in determining whether a (novel) construction is adopted (Diessel 2007). A first step to test the above speculations would be to verify the frequency of clefted and embedded wh-questions. Nevertheless, it is probable that frequency plays some role in the change from strict verb-second to non-V2 word order. On the basis of the corpus data, I would therefore tentatively suggest that non-V2 order developed in simplex non-subject wh-questions (i.e. type #988 from the NSD).
[3.2] Relation to Westergaard et al. (2012, 2017)
Westergaard, Vangsnes & Lohndal (2012, 2017) have previously studied the word order variation in Norwegian wh-questions based on the four items in the database that are also discussed in this article. They propose that the loss of the V2 requirement is related to changes in the properties of the complementiser som and distinguish five stages in the development (2016:27-8):

(2) stage 0: general V2
    stage 1: non-V2 in all subject questions with short and long wh-elements
    stage 2: non-V2 spreads to non-subject questions with short wh-elements
    stage 3a: non-V2 spreads to non-subject questions with complex wh-elements
    stage 3b: non-V2 is restricted to short wh-elements

The findings from the Nordic Syntax Database (Lindstad et al. 2009) presented here provide more evidence for some, but not all, of the stages above. That is, the four variants that were shown to be most frequent correspond to four of the five stages in Westergaard et al.'s (2017) proposal, i.e. stages 0, 2, 3a and 3b (see Figure 1). Stage 1 as described in (2) corresponds to the score combination '1 1 0 0', which was shown to be significantly less frequent across Norway [4]. Secondly, from the comparison between generations (Figure 3), we observe that the variants that are declining correspond to what Westergaard et al. (2017) propose to be older variants, while the other variants correspond to newer stages in their account of the development of non-V2 word order. Apart from a link between stages 2 and 3b ('1110' and '1010'), no evidence for the non-V2 word order spreading through the five stages 0 to 3b was found in the present study.

[3.3] Discussion of findings
The present study showed that four groups of dialects can be distinguished on the basis of acceptability judgements on four non-V2 wh-questions in the Nordic Syntax Database (Lindstad et al. 2009). These four grammars have either only V2 word order; non-V2 word order across all wh-questions; non-V2 in all but long non-subject wh-questions; or non-V2 only with short wh's. The data raise a few issues that require further exploration. In the first place, a striking finding is that not all the grammars could be linked to one another. The apparent-time study, as well as the cluster and linguistic distance maps, showed a clear connection between the grammar with non-V2 with all but long non-subject wh's ('1110') and the grammar that allows non-V2 only with short wh's. However, no link between the former and the grammar with non-V2 in all wh-questions was found in the apparent-time study, even though this was earlier hypothesised by Westergaard et al. (2017). I therefore proposed an alternative explanation: the grammar allowing non-V2 across all types of wh-questions is the result of the adoption of non-V2 by strict V2 speakers, who borrow the construction in the mirror image of their own underlying dialect type. Hence, the mixed grammar seems to be the archetype for the grammar with non-V2 with short wh's, which is an adaptation of this grammar but with a phonological restriction. The grammar with non-V2 in all wh-questions is the result of a syntactic generalisation.
Finally, though this analysis of the database material has provided new evidence on the types of wh-grammars in Norwegian dialects, no conclusive explanation can be given as to why the non-V2 word order arose in the first place. Westergaard et al. (2017) argue that the word order change starts with changes in the lexicalisation possibilities of the complementiser som 'that', but that hypothesis was not borne out by the data presented in this article. Alternatively, I presented data from the Nordic Dialect Corpus arguing in favour of the hypothesis that non-V2 first appeared in short non-subject wh-questions. Still, more research is needed to investigate what has caused the V2 requirement to change.

[4] Of course, a possible explanation of the unexpectedly low frequency of the assumed stage 1 in the scenario by Westergaard et al. (2017) is that this stage supposedly is the starting point of the whole development. It could well be the case that, exactly because it was the starting point, it is less frequent nowadays.
[4] conclusion
Throughout this study, new methodologies were used in an attempt to overcome some of the limitations of the map tool in the Nordic Syntax Database as well as to present new insights from a detailed examination of the acceptability judgements gathered in the database. The present study has investigated several hypotheses concerning the diachronic development and synchronic variation of non-V2 word order in Norwegian wh-questions. These hypotheses were tested by examining acceptability judgement data available in the Nordic Syntax Database from 409 informants at 105 sites across Norway. Examination of the frequency of acceptability judgements across individual speakers showed that four groups of dialects could be distinguished by the non-V2 variation across the four test questions. These four grammars have either only V2 word order; non-V2 word order across all wh-questions; non-V2 in all but long non-subject wh-questions; or non-V2 only with short wh's. Additionally, the geographical distribution of these four grammars was discussed. By using the apparent-time method, a historical connection between the latter two grammars was found.
figure 1: Frequency of use of different combinations of judgements on four non-V2 questions

Figure 1 provides a graphical overview of the frequency distribution of the different judgement combinations across Norway. Data from 373 participants are included; for the remaining 36 speakers, the data in the NSD were incomplete. Apart from the combination '0000', where only V2 order is accepted, three combinations of judgements stand out as very frequent: '1010', '1110' and '1111'. A breakdown of these combinations is provided in Table 2.
figure 3: Frequency of use of different dialect types split by age group

figure 4: Cross tabulation of different dialect type combinations between young and old age group per location (without medium scores)

figure 6: Aggregate linguistic distances between neighbouring sites

figure 7: Aggregate linguistic distances between all sites

table 3: Spearman's rank-order correlations between test items with complex, non-subject wh-elements