Competing Pressures in Diglossia: Avoiding Colloquial Elements in Writing Modern Standard Arabic

This article investigates speaker choice of variant lexemes and structures when writing in formal Modern Standard Arabic, using a multiple-choice survey that was distributed to 28 native speakers of Damascene Arabic. The study finds that speakers tend to avoid elements that are common in their local colloquial dialect, even if they are attested and permissible in Modern Standard Arabic, what might be called “negative interference.” However, in some cases interference from the colloquial form is so strong that speakers appear to be confused as to which form is correct (“positive interference”), and when given the choice, prefer to avoid problematic forms altogether. These results suggest that there are a number of competing pressures in diglossia, supplementing previous studies which have primarily found evidence of positive interference from the local dialects on Modern Standard Arabic. This study concludes that this avoidance behavior may explain the historical robustness of diglossia, as well as some of the regional variation that occurs in Modern Standard Arabic.


Introduction
The study of diglossia, since the publication of Ferguson's seminal paper (1959), has largely been dominated by a descriptive approach to register variation, especially with regard to Arabic diglossia. These studies involving Arabic (see for example Badawi 1973; Hary 1996; Meiseles 1980; Walters 2003) have attempted to further articulate the distinctions beyond Ferguson's simple dichotomy of "High" and "Low" and to catalogue the linguistic variables that characterize each level or register, but have done little to illuminate how these levels emerge from the complex interaction between a speaker's native dialect and Modern Standard Arabic (MSA), acquired largely through education.
An exception to this is Belnap and Bishop (2003), who used structured interviews to investigate how and why speakers produce a mixed register in personal correspondence that combines MSA and elements of their native spoken dialects. Some speakers would avoid forms which might betray an inability to correctly use the case system of MSA, though it is not clear from the article what exactly they substituted for the problematic forms. In other cases, speakers created hybrid forms that are [...] text-based, descriptive approach, and may be responsible for failing to uncover pressures pushing the speaker away from, rather than towards, their native dialect when writing formal MSA. These pressures that promote the avoidance of colloquial lexemes and structures might be termed "negative interference," in that their influence on MSA moves away from the dialect rather than towards it.
This study takes a novel approach to analyzing the pressures operating in diglossia: rather than attempting to analyze texts, it employs a survey to test what choices a speaker makes when producing a text and homogenizing its register. As the literature on Arabic registers has clearly shown, speakers are constantly drawing on different levels of language, and thus everyday discourse tends to be a heterogeneous mixture of elements from various levels of the language. In order to produce a text which is linguistically homogenous, speakers must meticulously filter out the elements which they feel do not belong to the targeted level. By looking at this process of filtering, we can discern the decisions that native speakers make when choosing what words and structures to use. Such an approach allows us to develop a better model for how speakers themselves conceive of linguistic levels and registers and the pressures they face in working within this system.
The focus of this study is on written MSA, which brings with it two primary advantages: first, focusing on a written register can control for the vagaries of pronunciation and accent, and second, writing generally carries with it pressure to use a higher register. However, it is not sufficient to assume that all written Arabic is in a homogenous register; indeed, Belnap and Bishop (2003), discussed above, showed that even the language of personal correspondence is at times subject to register mixing. Therefore, the register normally used in journalistic, narrative, and expository writing, the example par excellence of MSA, particularly for its extremely infrequent use of colloquial elements, was chosen as the basis for this research.
Inasmuch as this article is trying to determine how speakers themselves define colloquial and MSA, it is difficult but necessary to have a working definition of each register. MSA will be defined as the variety which uses MSA morphology, function words (such as the subjunctive particle ʾan) and constructions (e.g. the use of lā to negate present tense verbs) as described in reference grammars of MSA such as Dahūn (2003) and Buckley (2004) and dictionaries of MSA such as Wehr & Cowan (1994). Damascene Colloquial Arabic (DCA) is the variety of Arabic defined by the use of a different set of grammatical markers and constructions (such as the indicative b-marker, the nominal mu-negator, etc.) as described by Cowell (1964), whose excellent grammar of Levantine Arabic has not yet been surpassed.2 There is undoubtedly an area of overlap between the two varieties where the nature of a given word or construction is ambiguous, especially in a written context where pronunciation is not clearly indicated. In these cases, it was necessary to call upon the judgments of native speaker informants. Lexically, a word is defined as belonging to DCA if it is used frequently in daily, non-elevated speech contexts, as judged by both the author and the informants (given the absence of any studies of Syrian or Damascene Arabic word frequencies), while MSA lexemes are those which are used frequently in formal written contexts, as verified against the BYU Arabic newspapers corpus maintained by Dilworth Parkinson ("arabiCorpus") and native speaker judgments.
The primary finding of this study is that speakers appear to avoid colloquial forms rather than choosing standard forms, suggesting that formal written Arabic is not defined in its own right, but rather in contrast to colloquial dialects. The study begins with an explanation of how the research instrument was developed and employed in Section 2, followed by an analysis of the results in Section 3, specifically the avoidance of colloquial forms in Section 3.2 and the interference from colloquial varieties in Section 3.3. Section 4 discusses the wider implications of this study, while Section 5 offers a summary and conclusion. The Appendix contains the complete text of the survey.

Methodology
The instrument used in this study is an original multiple-choice survey developed by the researcher. This survey allows us to restrict the options of the speakers, and thus to generalize across a large number of respondents. Each question on the survey requires the respondent to make a decision, and thus the survey lays bare the decision-making process used when speakers attempt to produce a text in a single homogenous register. Thus, both the preferred and dispreferred responses give us insight into how this process works. A multiple-choice survey also has weaknesses, especially the danger that respondents would have preferred options that were not present; however, this study is best viewed as a baseline which further research can elaborate on.
The survey was developed in consultation with native speaker informants who work as teachers of both DCA and MSA at the University of Damascus. It consists of 39 items written in Arabic, each consisting of a sentence written in a formal style, in the register of journalistic prose, with a blank and two or three options for filling the blank. The prompt for respondents was ḍaʿū dāʾira ḥawla al-kalima al-ʾansab 'place a circle around the most appropriate word.' A complete copy of the survey is in the appendix.
Four different categories of elements which may vary between registers were included in the survey: (1) word choice, (2) derivational and inflectional morphology, (3) preposition use, (4) syntactic structures and collocations. For each category, pairs of synonymous words or phrases were chosen such that one of the forms was frequently used in DCA, while the other is less common or not used at all. Frequency judgments were informally obtained from native speaker informants. Most of the words or phrases were also chosen such that they are forms attested in formal Arabic writing. Four of the questions in the survey had three options, where one option was clearly colloquial in form, and the other two were closer to MSA. Sentences were ordered randomly, as were the response options.
Judgments as to what items are colloquial or standard were based on native speaker judgments and standard reference works, as described in the introduction, supplemented with searches in the BYU Arabic newspaper corpus. Where relevant, the results of searches in the BYU corpus are referenced in the analysis in the text. Native speaker consultants also checked the sentences for any linguistic errors and to ensure that they were indeed written in a homogenous register.
Demographic information such as age, gender and education level was collected and used in the analysis. Age may act as a proxy for changes in social norms in the use of language, while the fact that MSA is a variety acquired through education may be reflected in the relation between education and the respondent's choices. Gender was included because most of the research on language and gender in Western contexts has found that men tend to conform less closely to the standard language than women (for an overview, see Wodak and Benke 1998). However, few sociolinguistic studies of gender have been conducted in the Arab world, and thus it is unclear whether this variation based on gender lines is in line with or contradictory to local norms. While Bakir (1986) did find that men were significantly more likely to conform to MSA norms than women in Basra, which parallels results from Abd-el-Jawad (1983) in Amman, Jordan, both these studies focused on speech rather than writing. Nonetheless, the expectation from most of the sociolinguistic research is that women will conform more closely to the standard forms than men.

Results
The majority of the results of this study appear to stem from speakers' avoidance of forms that occur in colloquial speech, i.e. from negative interference from DCA. This avoidance goes beyond the lexical to include both morphological and syntactic structures. Additional influences include positive interference from colloquial patterns or a rejection of certain neologisms. This section first offers a broad overview of the data before analyzing the results in detail.

Overall Results
Of the 28 respondents, 18 were women and 10 were men. Seven of the respondents were either in the process of completing or had completed a high school degree, while 15 of the respondents were working on or had obtained a post-secondary degree. Five of the respondents had completed graduate degrees. One respondent did not indicate his degree status. The youngest respondent was 16, the oldest 65; the average age was 30.8 years, with a standard deviation of 10.69 years. All respondents were natives of Syria, and almost all were from Damascus or surrounding areas.
A baseline figure for the preference of DCA over MSA forms was obtained by counting the number of responses overall that chose the colloquial option for each question.3 Across all speakers, the colloquial option was chosen 20.7% (standard deviation: 7.8%) of the time. There were no statistically significant differences between male and female speakers; however, in some cases education was statistically significant. High school graduates chose the colloquial option 29.3% (std. dev.: 15.3%) of the time, while those with at least some university education chose the colloquial option 18.4% (std. dev.: 8.0%) of the time, and those with post-graduate education chose the colloquial option 16.4% (std. dev.: 4.0%) of the time. The difference between the high-school educated speakers and university educated speakers was statistically significant (two-proportion z-test: -3.64, p-value adjusted for three-way comparison: < .01), as was the difference between high-school educated speakers and those with post-graduate education (two-proportion z-test: -3.22, p-value adjusted: < .001), but there was no statistically significant difference between university and post-graduate educated speakers. There was no significant correlation between the number of colloquial options chosen and age.
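For readers retracing the education comparisons above, a two-proportion z-test is commonly computed in the pooled form sketched below. The text does not state which variant of the test or which multiple-comparison adjustment was used, so both the pooled statistic and the Bonferroni-style adjustment for three pairwise comparisons are assumptions, and the symbols are introduced here for illustration only.

```latex
\[
z = \frac{\hat{p}_1 - \hat{p}_2}
         {\sqrt{\hat{p}\,(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}},
\qquad
\hat{p}_i = \frac{x_i}{n_i},
\qquad
\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}
\]
\[
p_{\mathrm{adj}} = \min(1,\; m \cdot p), \qquad m = 3
\]
```

Here x_i is the number of colloquial choices made by education group i, n_i is the total number of responses from that group, p is the unadjusted two-sided p-value of z under the standard normal distribution, and m is the number of pairwise comparisons among the three education groups.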

Avoidance of Colloquial
The primary finding of this study is that speakers appear to deliberately avoid forms attested in classical and modern standard texts when they perceive these forms to be colloquial. Speakers not only avoid DCA lexemes, but also avoid morphological and syntactic forms associated with DCA. The general trends discussed above support this result, but in the sections below the results will be analyzed on the level of the individual items.

Word Choice
The results in Table 1 clearly show that speakers tend to avoid MSA lexemes associated with DCA. If this were not the case, we would expect approximately evenly split results; however, the results are often tipped strongly away from colloquial forms. In items 2, 4, 8, 14, 27 and 34 the non-colloquial option is preferred by 88% or more of respondents, a near-categorical rejection of the use of DCA lexemes. In all these cases, the "colloquial" words in their uses here are present in formal Arabic dictionaries and are attested both in classical and MSA texts.
In an ambiguous context, such as item 34, where the sentence 'When the child lost his new toy, his mother refused to buy ______ toy' allowed for either the interpretation 'another' or 'a second,' the phonological similarity between MSA ṯāniya 'second' and DCA tāniya 'another' seems sufficient to push speakers to avoid this word in favor of the MSA uḫrā, even though ṯāniya would have been acceptable in this context, and is clearly an MSA lexeme by virtue of containing an interdental, absent in DCA.
That this is avoidance behavior, rather than simply differences in word choice, is shown by the three-way split in item 20. Here, there is no clear preference in the choice between MSA إطار ʾiṭār and عجلة ʿaǧala, but when combined these two words account for 85% of the responses, against DCA دولاب dūlāb for 'tire.' Speakers are therefore not sure what the best alternative is, but are united in their dispreference for the lexeme used in DCA.4
Frequency of usage in colloquial also appears to play a role in avoidance behavior, with speakers preferring to choose a word less frequent in colloquial. In item 4, both اللازم al-lāzim, the primary colloquial modal used for obligation, and الضروري aḍ-ḍarūrī are attested in DCA. In the absence of any word frequency studies on DCA, there is no empirical evidence of the former being more frequent than the latter. However, it seems quite likely, since اللازم al-lāzim is a modal auxiliary, while الضروري aḍ-ḍarūrī is simply used as an adjective, and we expect function words to be of significantly higher frequency than content words, which does imply a frequency effect in the avoidance of اللازم al-lāzim.5
Item 5 tested whether terms of Arabic origin would be preferred to loan words of foreign origin. The results show that the Arabic-based neologism الرائي ar-rāʾī for 'television' was the least favored of the three choices, suggesting that Arabic origin does not grant favored status in formal writing. Indeed, a search of the BYU newspaper corpus shows only a single use of this lexeme in the meaning of 'television,' and it is used only parenthetically to clarify the loanword تلفاز talfāz.
Table 1: Word Choice. The DCA response is that element which is closest to the more frequent colloquial Arabic usage. The same convention will be followed in all following figures.
Item 10, containing words sharing the meaning 'to occur,' was intended to act as a control, as neither of the choices clearly represents the DCA usage, which would require the verb ṣār. The results are relatively more evenly divided than most of the other lexical items, but there is a clear bias towards حدث ḥadaṯa. This may be due to the relatively higher frequency of that word in the meaning 'to occur' in MSA, while حصل ḥaṣala is primarily used with a prepositional complement ʿalā with the meaning of 'to obtain.'6 This suggests that there are some frequency effects from MSA itself that are not caused simply by avoidance of DCA, though more similar items would be necessary to clarify the role of frequency.

Morphological Distinctions
Speakers also avoid the use of morphological forms, whether nominal or verbal, which are associated with colloquial. That is to say, even when the choice is between two words from the same consonantal root, speakers have clear preferences against certain derived morphological forms. These results are summarized in Table 2.
With regard to verbs, items 22 and 35 show that speakers tend to prefer the MSA internal passive forms (passives formed by modifying the vocalic melody of the original verb) over the morphologically derived "reflexive" forms (formed by the prefixing or infixing of certain consonants) which reflect DCA usage. Similarly, in items 13 and 15, respondents prefer verbal derivational forms (ʾawzān) which are not present in colloquial, even though the meaning is the same. Thus, they prefer the ʾafʿal causative form (IV) to the equivalent faʿʿala causative form (II) in item 13, and the iftaʿala reflexive form (VIII) to the tafaʿʿala reflexive form (V) in item 15, the latter forms being much more frequent in DCA. The two forms IV and VIII are not very productive in DCA, the former being almost exclusively used with classicisms, and thus speakers appear to think of them as MSA forms (Cowell 1964: 85, 100).
Nouns show the same pattern of avoidance of colloquial forms. In item 17, no respondent chose the feminine plural form samakāt for 'fish (pl.)', though it is an acceptable form in Syrian colloquial, over the "broken" plural ʾasmāk. The feminine plural for this word is, however, rare in MSA, with only 10 occurrences in the BYU newspaper corpus versus over 3,500 for the broken plural form. Similarly, in item 19, the broken plural form was heavily preferred to the feminine plural suffix which is used frequently in colloquial. In general, it appears that speakers perceive the broken plural forms as somehow more standard than the feminine plural forms.
6 A search of the BYU newspaper corpus with the search terms حصل أن ḥaṣala ʾanna and حدث أن ḥadaṯa ʾanna 'it happened that,' designed to match synonymous uses, finds 1.93 instances of ḥadaṯa ʾanna per 100,000 words, versus 0.34 for ḥaṣala ʾanna.

Prepositions
The results regarding prepositions, summarized in Table 3, are somewhat less clear, but show a similar trend of avoiding colloquial forms. The clearest result is in the strong rejection of the colloquial من min 'since' in favor of منذ munḏu in items 18 and 23. Similarly, speakers rejected the use of على ʿalā for motion towards, as it is used in DCA, in item 31.
However, for a number of other prepositions, there is no clear avoidance of DCA structures. In item 1, there is no preference for using the preposition إلى ʾilā, which is absent in DCA, for the dative object. Similarly, in item 36, where the verb subcategorizes for على ʿalā, a preposition also used in DCA, speakers were surprisingly split between this and the less frequent إلى ʾilā, which is occasionally used with this verb in the same meaning.7 The cause of the split in the choice of إلى ʾilā versus على ʿalā in item 36 is rather unclear, since frequency of usage would predict a preference for على ʿalā. However, Syrian informants suggested that the phrase التعرف إلى at-taʿarruf ʾilā, possibly a form of hypercorrection, is seen by some as more "correct" than التعرف على at-taʿarruf ʿalā for various reasons and may be taught as such in schools.
The prepositions بـ bi- and في fī 'in' occur in complementary distribution in DCA, such that the preposition fi- is used only preceding pronouns, while bi- is used elsewhere (Cowell 1964: 479). Several items on the survey (1, 3, 11, 12 and 24) were designed to test how speakers would handle these prepositions in a variety of contexts, with the prediction that preposition use would be the opposite of DCA. This was indeed found to be the case. In item 24, where the prepositional phrase is an adjunct and therefore not subcategorized for by the verb, speakers overwhelmingly chose في fī before a full nominal, against the DCA pattern.
Table 3 (surviving entry): التعرف إلى at-taʿarruf ʾilā, 42.9
The items also tested the interaction between subcategorization and preposition choice. In items 3 and 11, where the non-colloquial structure is already subcategorized for by the verb, the results reflect the prediction. It is not clear whether speakers chose the preposition because of the verb or in order to avoid using a DCA preposition, but the result clearly shows that there is relatively little positive interference from DCA.
Where there is a conflict between the subcategorization of the verb and avoiding the colloquial structure, speakers hypercorrect away from the use of DCA. In item 12, speakers preferred to avoid the colloquial patterning for bi- and fi- over choosing the correct phrasal preposition. The correct preposition for the verb رغب raġiba 'he desired' is في fī; however, in this instance, the preposition is followed by a pronoun, making the use of في fī congruent with the DCA structure. Speakers, therefore, are left with a conundrum: whether to choose the correct preposition, or to avoid the appearance of colloquial. The data show a majority of respondents taking the latter strategy by choosing the preposition بـ bi- instead; thus they avoid using a DCA structure, but violate the rules of MSA. This suggests that the pressure to avoid DCA is stronger than the pressure to avoid errors in MSA.
The level of hypercorrection appears to differ along gender lines, with men choosing to hypercorrect to a much higher degree than women. Though women were perfectly evenly split in their choice of prepositions, men were significantly different (two-proportion z-test = 2.12, p = 0.034), preferring to hypercorrect by choosing به bi-hi (90.0%) versus 10.0% who chose فيه fīhi. The literature on the Arab world discussed above in Section 2 suggests that men are more likely to conform to MSA norms in their use of Arabic than women. This data might at first appear to contradict that research, since men are overwhelmingly choosing incorrect forms. However, since speakers in this study appear to consider whatever is furthest from DCA to be the most formal, the men are indeed attempting to use what they perceive to be correct MSA.8
Both secondary and university students appeared to prefer hypercorrecting (62.5% and 71.4% hypercorrected respectively), while the opposite was true for those with graduate degrees (only 40% chose to hypercorrect). This result is expected, as more educated speakers would presumably be more sensitive to making mistakes in MSA than those with less education. However, due to the very small number of graduate degree holders, the difference is not statistically significant. A larger sample size in a future study might be able to discern more clearly what influence, if any, education has on hypercorrection.
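The gender comparison for item 12 reported above can be retraced with a short calculation. The sketch below assumes a pooled two-proportion z-test with a two-sided p-value, which the text does not specify, and uses counts back-calculated from the stated group sizes and percentages (10 men, 90.0% hypercorrecting; 18 women, evenly split), on the further assumption that every respondent answered the item. The function name and variable names are illustrative only.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts back-calculated from the text (an assumption, not reported data):
# 10 men, 90.0% choosing bi-hi -> 9; 18 women, evenly split -> 9.
men_hyper, men_total = 9, 10
women_hyper, women_total = 9, 18

z, p = two_proportion_z(men_hyper, men_total, women_hyper, women_total)
print(f"z = {z:.2f}, p = {p:.3f}")  # prints "z = 2.12, p = 0.034"
```

Under these assumptions the calculation agrees with the reported statistic. With groups this small, an exact test might be a more cautious choice, though the text gives no indication that one was used.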

Syntax and Collocations
Finally, on the level of syntactic structures and collocations, we continue to find avoidance of colloquial forms, with the results summarized in Table 4. In items 16 and 32, speakers avoid the use of ما mā for negation of the past tense, a form which is not only acceptable in MSA, but which is also used as the primary form of negation in past and present in DCA.
In item 21, respondents reject the use of the active participle form كاتب kātib with a perfect meaning inside a construct phrase, a normal and acceptable usage in DCA (Cowell 1964: 262), but rare in MSA.9 Similarly, in items 26 and 29 speakers avoid using the complementizer ما mā, shared between MSA and DCA, preferring instead the exclusively MSA أن ʾan.
Items 7 and 9, which tested the placement of the word نفس nafs 'self, same', were included in the survey due to an informant's intuition that the placement of this word was a function of register, with the placement of nafs following the noun in apposition being more formal than when preceding the noun. Typically, appositional structures of this type are emphatic in nature, and do not necessarily appear to have any relation to formality. Furthermore, both structures can occur in MSA and in DCA. The evenly split responses in these questions suggest that indeed, the placement of nafs is probably not governed by concerns of formality or avoidance of colloquial style, and here the two placements stand in essentially free variation.
8 It would be incorrect to say that this data contradicts the results of much European and American sociolinguistic research that finds women are more likely to conform to linguistic standards, since the linguistic situation in the Arab world, the modality of this study (written as opposed to spoken) and the sharply different histories of education in the two regions make any results difficult to compare.
9 This use of participles with a verbal meaning is rare in Modern Standard Arabic but it was acceptable in classical Arabic. Wright (1896: 131-132) notes that some participles can be used both with the meaning of a permanent quality and as a "real participle, indicating a temporary, transitory or accidental action or state of being," a usage quite similar to that in Syrian colloquial Arabic.
Items 37-39 tested the choice between SVO and VSO word order. Both word orders occur in DCA, though Cowell (1964) notes that indefinite subjects tend to follow the verbs, while definite subjects occur either before or after the verb, suggesting that VSO may be the less marked order in DCA. Each item had definite subjects, allowing either word order, but for all three of these items, respondents strongly preferred the verbal (VSO) sentence. The preference for verbal sentences poses something of a problem given the fluidity of Syrian colloquial word order. One possibility is that VSO sentences are simply seen as more formal, regardless of their status in colloquial, a possibility supported by informal conversations with native speakers who prefer the use of VSO sentences in formal writing.
Thus, the evidence shown here suggests that, when writing in formal MSA, speakers deliberately avoid lexemes, morphological forms and syntactic structures which occur frequently in their native dialect. This reaches the level of hypercorrection, where some speakers will actually make linguistic errors in their use of MSA to avoid using a construction present in colloquial Arabic. However, some constructions, such as the VSO word order, appear to be thought of as appropriate for a formal context regardless of their presence or absence in DCA. In addition to this, there are certain neologisms that, though both native and formal, simply do not appear to be well accepted by speakers of Arabic.

Interference from Colloquial
One morphological form, verbs with double final radicals, was difficult for respondents as the conjugation of this form differs between DCA and MSA. In MSA, the past tense stem of these verbs for 1st and 2nd person has a vowel between the final radicals, whereas the 3rd person forms are geminated. In DCA, these verbs are treated as final weak verbs, with the vowel /e:/ inserted before the 1st and 2nd person suffixes, and gemination maintained in the 3rd person stem.
The results of the survey items that deal with these forms, detailed in Table 5, are strangely contradictory. In item 25, speakers clearly prefer the form closer to MSA, iṭmaʾnantu, while the opposite is true of item 33, with almost the same percentage breakdown. In item 30, speakers who chose the verb istamarra are evenly divided between the colloquial and standard forms, and indeed many of them chose an entirely different but semantically similar verb instead. The fact that speakers have difficulty with these forms may suggest that they have an imperfect mastery of the MSA conjugation system. This is somewhat unlikely, however, as speakers were able to choose the correct form in item 25, and moreover, no significant differences were found based on level of education in the pattern of responses to this question. A more likely explanation is that speakers' knowledge of colloquial interferes to a large enough degree that, in the case of words whose forms match that of DCA, they are unsure of the correct form of the verb. The data support this hypothesis: in the case of the verb اطمأننت iṭmaʾnantu in item 25, the colloquial form of the verb is recast as a form II (doubled middle radical) verb and therefore would not be conjugated as a final weak radical verb in DCA; thus the form *iṭmaʾinnayt does not appear in DCA, and cannot therefore interfere significantly.10 This contrasts with the form أصرينا ʾaṣarraynā in item 33, which is identical to the colloquial ʾaṣarrēna and which appears to interfere with the choice of the correct form. A similar effect occurs in item 30; however, half the respondents chose to completely sidestep the problem of how to conjugate the verb and instead chose a semantically similar but structurally less difficult verb.
These results suggest that while speakers do indeed avoid colloquial forms when possible, they may not actually be sure of which form is DCA and which is MSA, resulting in interference of DCA in their formal writing. For most of the forms surveyed in this study, speakers had an apparently quite clear idea of which of the choices was more formal than the others, but here the speakers are simply unable to determine unambiguously which variant is the more formal. Further research is needed to determine whether this is true of other forms in the language, and if so which forms, and why those forms specifically are difficult. The strategy of avoiding difficult forms altogether also highlights why previous studies based on texts are limited, since there is no way of knowing what forms were considered or avoided via ex post facto evidence.

Hypocorrection and Register
The results of this study offer a counterpoint to the research discussed in the introduction that suggests the major pressure in diglossia is toward hypocorrection (Belnap and Bishop 2003) or positive interference from speakers' native dialects as they write in MSA (Wilmsen 2010). This study instead found evidence of speaker avoidance of colloquial language, to the point of hypercorrection, what we refer to here as negative interference. At the same time, the study also found further evidence of positive interference from some colloquial structures, though this triggered a different type of avoidance behavior, such that speakers, when given the option, will simply sidestep difficult structures.
This evidence therefore suggests that within a diglossic environment, there exist multiple and at times conflicting pressures on a speaker trying to use a specific register. There are pressures to use neither too elevated a register, as noted by Belnap and Bishop (2003), nor too casual a register, as shown in this study. At the same time, a speaker's native dialect can interfere and cause them to choose structures from that dialect even when their target register is MSA (this study and Wilmsen 2010). What is unclear is why certain structures are avoided, and others are not. There may perhaps be issues of salience. The sociolinguistic distinction between indicators, which operate below the level of social awareness, markers, for which there is some social awareness and style shifting, and finally stereotypes, which are often an overt part of speakers' metalinguistic awareness of stylistic differences (Labov 2001: 196-197), may be relevant here. Within such a framework, those structures which are explicitly avoided in favor of MSA forms may be markers, while speakers presumably have less sociolinguistic awareness of the forms which demonstrate interference behavior. Structured interviews, similar to those used in Belnap and Bishop's study, could shed some light on this subject in the future.
Education, both in the sense of producing general linguistic competence and in the sense of the idiosyncratic "pet peeves" of a speaker's teachers, may also influence which structures are more or less salient or accessible to a speaker. Some speakers may not be as familiar with the prescriptive rules of MSA as others, and may even be aware of their linguistic limitations, leading them to avoid "tricky" forms in favor of less problematic ones, as seen in item 30. Teachers may also single out a particular structure as being "colloquial" while ignoring other structures that are indeed colloquial in form. This study does provide some evidence for the role of education, since the speakers with more education appeared to avoid the colloquial words and forms more fastidiously than those with less education.

Implications for Diglossia
A number of studies on variation in Arabic, especially those that explore the use of colloquial Arabic in "unexpected" places such as writing (Belnap and Bishop 2003; Hafez 1993), political speeches (Holes 1993), and television news (Al-Batal 2002), among others, suggest that perhaps MSA is in danger of disappearing or otherwise being subsumed by colloquial Arabic. However, history has shown the register of formal Arabic to be remarkably robust, and the results of this study suggest a mechanism that underlies this robustness. Speakers appear to view MSA as a type of linguistic "other," which is defined not solely by its own intrinsic properties, but rather as that which is not a part of the local vernacular, even if both registers share the same features. That is to say, when both the local dialect and MSA share form X, speakers will often prefer to use an equivalent form Y in MSA, as MSA is largely defined as that which is not the local dialect. Thus, as the dialects shift and change, speakers play an important role in shifting their definition of formality to maintain MSA's "otherness," which may be one of the properties that makes it formal. Therefore, formal Arabic may be defined not simply by the rules of grammar books, but by the speakers themselves.
The vast reserve of vocabulary and structures present in MSA makes it quite easy to find synonyms which differ from those used in everyday speech, and thus, while regional versions of MSA might differ (see below as well), they all draw on the same linguistic reservoir. It would be unexpected for Egyptians, for example, who do not have the same type of preposition usage as in Levantine dialects, to hypercorrect in the same direction as in Section 3. Nor would they be expected to avoid the word ṣār 'to become' in item 2 or indeed prefer it to ʾaṣbaḥa, as neither of these is used in their local dialects. For speakers from Egypt or the Levant, we might expect the verb baġā 'to desire' to be interpreted as quite formal, but in Morocco or the Arabian Peninsula, where it is used in the local dialects, it might be seen as overly informal and thus would be avoided in a formal written text. Future research should be conducted with speakers from multiple dialects to see whether these hypotheses hold true.
This avoidance behavior may also explain the regional variation in written formal Arabic shown by Ibrahim and Parkinson, among others (Ibrahim 1997; Parkinson and Ibrahim 1999). In her dissertation, a study of newspaper headlines, Ibrahim found that many Egyptians had difficulty understanding the headlines in Lebanese newspapers due to their use of terminology specific to Lebanese MSA. Ibrahim also relates a story from a Lebanese professor who wrote a letter using the plural form رساميل rasāmīl of the singular raʾs māl 'capital,' which the Egyptian professor with whom he was corresponding corrected to رؤوس مال ruʾūs māl (p. 129), clearly showing that the two speakers had very different ideas of what the proper MSA form is. Ibrahim and Parkinson also present statistical evidence that even writers working for the same newspaper, Al-Ḥayāt, but from different countries, make very different choices of lexical or morphological forms, similar to the results of Wilmsen's study (2010). The well-known variation between and within colloquial dialects would lead to differences in speaker avoidance strategies, thus leading to further variation in the terms regularly used in formal MSA. This process will not always be transparent: for example, the data showed that one strategy speakers used to avoid the problem of how to conjugate the verb istamarra 'to continue' was to simply choose a different but synonymous verb, tābaʿa. A researcher looking at the end result of this process would have a difficult time explaining this variation, much less tracing it to an avoidance of colloquial forms.
Furthermore, regional variation exists not only in the type of colloquial spoken, but even in the boundaries of what is or is not acceptable. Al-Batal's (2002) article on Lebanese news media shows that on the LBCI channel at least, the use of elements of Lebanese Arabic morphology and phonology is not only acceptable but seen as an important indicator of national identity. Ibrahim (1997: 87-89) also showed that the headlines she used from Lebanese newspapers contained a number of colloquial Lebanese expressions, which contrasts with the avoidance behavior shown here by Syrian speakers. In this case, Lebanese speakers simply appear to have different standards of what is acceptable in the formal register of news media than speakers in other Arabic-speaking countries. Variation in register would also change what is seen as MSA, such that the definition of formal Arabic would clearly differ between countries. Whatever the boundaries between registers might be, this study shows that an essential part of the definition of formal Arabic is its otherness and difference from a speaker's personal colloquial dialect.
One other noteworthy result of this study is the fact that speakers rejected certain neologisms such as ar-rāʾī for "television" and al-ḥāsūb for "computer" in spite of their Arabic origin. This is revealing on two levels: first, it shows that adoption of these neologisms versus foreign loan words is poor, and second, speakers do not appear to judge a word as more standard simply due to its having an Arabic root. Simply being derived from a native root is not sufficient to privilege these neologisms, and in the case of talafizyūn, this foreign word is acceptable even in formal contexts.
This study therefore suggests the possibility that the boundaries between colloquial and MSA are largely maintained by the speakers themselves rather than being constrained by a clear prescriptive definition of what constitutes MSA. If further evidence can be found for this phenomenon, it would suggest that the boundaries between the two are amazingly fluid and adaptable, which might explain why diglossia arose and how it has continued to exist for such a long period of time across a vast geographical expanse.
Due to the limited sample size of this study, future research should focus on using similar instruments with larger, international groups of subjects. These instruments will need to include a large number of items which vary in some dialects and not others, to provide confirmation that it is indeed the local native dialect that determines word choice in writing formal MSA. If possible, corpus studies should continue to be used to explore this phenomenon.

Conclusion
This study has presented evidence from a survey which suggests that, to a large degree, speakers define MSA as that which is not a part of the local colloquial. This results in an avoidance of lexical items, derivational and inflectional morphemes, and syntactic constructions and patterns that are associated with the colloquial dialect. The occurrence of hypercorrection further suggests that speakers are motivated primarily by the need to avoid colloquial forms, more so than by the need to conform to MSA norms. Avoidance of colloquial forms appeared to be the primary factor in speakers' responses on the survey, and while other factors were also influential, this suggests that colloquial Arabic is better defined in speakers' minds than is formal Arabic. At the same time, interference from some colloquial structures, and the definition of a small number of structures as being formal, suggest that this avoidance behavior is simply one of many competing pressures on speakers within a diglossic language.
This avoidance behavior helps explain the robustness of diglossia, as the definition of Ferguson's H becomes an ever-moving target, which makes it difficult for it to be subsumed by the L register. Thus this study gives a much less dire prognosis for Arabic diglossia than have previous works on the subject, but at the same time allows for greater regional variation in what speakers conceive of as the formal register.

Table 2: Morphological Distinctions
Further research could test whether masculine plural endings, as opposed to broken plurals, are considered more or less colloquial.