Vissen naar variatie: Digitaal op zoek naar onbekende Noord/Zuid-verschillen in de grammatica van het Nederlands

Stefan Grondelaers; Robbert De Troij; Dirk Speelman; Antal van den Bosch

doi:10.5117/NEDTAA2020.1.004.GRON

ISSN: 1384-5845
E-ISSN: 2352-1171

Publisher: Amsterdam University Press
Source: Nederlandse Taalkunde, Volume 25, Issue 1, Apr 2020, p. 73 - 99
Language: English
Published online: 01 Apr 2020

Abstract

Belgian Dutch (BD) and Netherlandic Dutch (ND) are known to exhibit phonetic and lexical differences, but national variation in the syntax of Dutch has often been claimed to be virtually non-existent. This view is rooted in the fact that both laypersons and researchers are oblivious to national divergences in the grammar of Dutch (unless they are categorical and/or heavily mediatized), but also in the unquestioned belief that BD and ND are different surface manifestations of ‘the same grammatical motor’. As a result, only a few syntactic phenomena have hitherto been shown to be sensitive to national constraints. In this paper we illustrate a computational bottom-up approach (pioneered by Bannard & Callison-Burch 2005) that casts the net as widely as possible. Building on statistical machine translation and a parallel corpus of Dutch translations of English subtitles, we identify plausible mappings between English n-grams and their Dutch translations. We use these mappings to obtain paraphrases, i.e., stretches of interchangeable Dutch text that carry approximately the same meaning. In a first case study, we found corroborating evidence among the discovered paraphrases for many syntactic variables that have previously been attested in Dutch, including complementizer variation, existential er-variation, word order phenomena, and inflection variation. Crucially, we also discovered a number of alternations we had not anticipated as interesting variables. To detect national constraints on the newly found variables, we carried out a second experiment with a smaller corpus of Belgian and Netherlandic subtitles: the two variables we investigated in this light – deictic strength variation and subordination variation – did indeed manifest national sensitivity.
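The paraphrase step described in the abstract can be illustrated with a minimal sketch of the pivot idea from Bannard & Callison-Burch (2005), applied here with Dutch as the language of interest and English as the pivot: two Dutch phrases count as candidate paraphrases when they translate the same English phrase, and the paraphrase probability is p(d2 | d1) = Σ_e p(e | d1) · p(d2 | e). The aligned counts, the phrases, and the function names below are hypothetical and chosen for illustration only. They are not taken from the article's data or pipeline, which builds on a statistical machine translation toolkit rather than a hand-made table.

```python
from collections import defaultdict

# Hypothetical toy counts of aligned (Dutch phrase, English phrase) pairs.
# In a real setting these would come from a phrase table extracted from
# a word-aligned parallel corpus; the numbers here are invented.
ALIGNED_COUNTS = {
    ("omdat", "because"): 8,
    ("want", "because"): 2,
    ("omdat", "since"): 3,
    ("aangezien", "since"): 1,
    ("sinds", "since"): 4,
}


def pivot_paraphrases(source, counts):
    """Return {candidate: p(candidate | source)} by pivoting through English.

    p(e | d) and p(d | e) are estimated as relative frequencies of the
    aligned pair counts.
    """
    dutch_totals = defaultdict(int)
    english_totals = defaultdict(int)
    for (dutch, english), n in counts.items():
        dutch_totals[dutch] += n
        english_totals[english] += n

    scores = defaultdict(float)
    for (dutch, english), n in counts.items():
        if dutch != source:
            continue
        p_pivot_given_source = n / dutch_totals[source]
        # Distribute the probability mass of this pivot over every Dutch
        # phrase aligned with it (including the source phrase itself).
        for (candidate, pivot), m in counts.items():
            if pivot == english:
                p_candidate_given_pivot = m / english_totals[english]
                scores[candidate] += p_pivot_given_source * p_candidate_given_pivot
    return dict(scores)


def ranked_paraphrases(source, counts):
    """Rank candidates other than the source phrase, best first."""
    scores = pivot_paraphrases(source, counts)
    scores.pop(source, None)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for phrase, prob in ranked_paraphrases("omdat", ALIGNED_COUNTS):
        print(f"{phrase}\t{prob:.3f}")
```

In this toy table, "sinds" surfaces as a candidate for "omdat" only because English "since" is ambiguous between a causal and a temporal reading. This shows why pivot-derived candidates are plausible mappings that still need linguistic inspection before they are treated as variants of one variable.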


References

Abraham, Werner & C. Jac Conradie (2001). Präteritumschwund und Diskursgrammatik: Präteritumschwund in gesamteuropäischen Bezügen: Areale Ausbreitung, heterogene Entstehung, Parsing sowie diskursgrammatische Grundlagen und Zusammenhänge. Amsterdam/Philadelphia: John Benjamins.
Ariel, Mira (1990). Accessing noun-phrase antecedents. London/New York: Routledge.
Bannard, Colin & Chris Callison-Burch (2005). Paraphrasing with bilingual parallel corpora. In: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, 597-604.
Bergen, Geertje van & Peter de Swart (2010). Scrambling in spoken Dutch: Definiteness versus weight as determinants of word order variation. Corpus Linguistics and Linguistic Theory 6, 267-295.
Bergen, Geertje van (2011). Who's first and what's next: Animacy and word order variation in Dutch language production. Doctoral dissertation, Radboud Universiteit Nijmegen.
Beveren, Amélie Van, Timothy Colleman & Gert De Sutter (2018). De om-alternantie: Een verkennende casestudy. Handelingen van de Koninklijke Zuid-Nederlandse Maatschappij voor Taal- en Letterkunde en Geschiedenis LXXI, 191-222.
Bouma, Gerlof & Helen de Hoop (2008). Unscrambled pronouns in Dutch. Linguistic Inquiry 39, 669-677.
Bouma, Gosse (2017). Om-omission. In: Martijn Wieling, Martin Kroon, Gertjan van Noord & Gosse Bouma (eds.), From semantics to dialectometry: Festschrift in honor of John Nerbonne. London: College Publications, 65-74.
Broekhuis, Hans & Marcel den Dikken (2012). Syntax of Dutch: Nouns and noun phrases, vol. 2. Amsterdam: Amsterdam University Press.
Burrows, John (2002). ‘Delta’: A measure of stylistic difference and a guide to likely authorship. Literary and Linguistic Computing 17(3), 267-287.
Callison-Burch, Chris (2007). Paraphrasing and translation. Doctoral dissertation, University of Edinburgh.
Colleman, Timothy (2010). Lectal variation in constructional semantics: ‘Benefactive’ ditransitives in Dutch. In: Dirk Geeraerts, Gitte Kristiansen & Yves Peirsman (eds.), Advances in Cognitive Sociolinguistics. Berlin/New York: De Gruyter Mouton, 191-221.
Daems, Frans (1981). Een eigen syntactische norm in Vlaanderen? Volgordevariatie in werkwoordsgroepen. In: Handelingen van het zesendertigste Nederlandse Filologencongres, 119-126.
Díaz Cintas, Jorge & Alice Remael (2014). Audiovisual translation: Subtitling. London/New York: Routledge.
Eder, Maciej, Jan Rybicki & Mike Kestemont (2016). Stylometry with R: A package for computational text analysis. R Journal 8(1), 107-121.
Fehringer, Carol (2017). Internal constraints on the use of gaan versus zullen as future markers in spoken Dutch. Nederlandse Taalkunde 22(3), 359-387.
Geeraerts, Dirk, Stefan Grondelaers & Dirk Speelman (1999). Convergentie en divergentie in de Nederlandse woordenschat: Een onderzoek naar kleding- en voetbaltermen. Amsterdam: Meertens.
Grondelaers, Stefan (2009). Woordvolgorde in presentatieve zinnen en de theoretische basis van multifactoriële grammatica. Nederlandse Taalkunde 14, 282-299.
Grondelaers, Stefan, Dirk Speelman & An Carbonez (2001). Regionale variatie in de postverbale distributie van presentatief er. Neerlandistiek.nl 01.04. <www.neerlandistiek.nl>
Grondelaers, Stefan, Dirk Speelman & Dirk Geeraerts (2002). Regressing on er: Statistical analysis of texts and language variation. In: Annie Morin & Pascale Sébillot (eds.), 6ièmes journées internationales d’analyse statistique des données. Rennes: Institut National de Recherche en Informatique et en Automatique, 335-346.
Grondelaers, Stefan, Dirk Speelman & Dirk Geeraerts (2008). National variation in the use of er ‘there’: Regional and diachronic constraints on cognitive explanations. In: Gitte Kristiansen & René Dirven (eds.), Cognitive Sociolinguistics: Language variation, cultural models, social systems. Berlin/New York: De Gruyter Mouton, 153-204.
Grondelaers, Stefan, Katrien Deygers, Hilde Van Aken, Vicky Van Den Heede & Dirk Speelman (2000). DigiTaal: Het CONDIV-corpus geschreven Nederlands. Nederlandse Taalkunde 5(4), 356-363.
Grondelaers, Stefan, Paul van Gent & Roeland van Hout (in press). On the inevitability of social meaning and ideology in accounts of syntactic change: Evidence from pronoun competition in Netherlandic Dutch. In: Tanya Christensen & Torben Juel Jensen (eds.), Explanations in sociosyntax: Dialogue across paradigms. Amsterdam/Philadelphia: John Benjamins.
Gyselinck, Emmeline & Timothy Colleman (2016). Je dood vervelen of je te pletter amuseren? Het intensiverende gebruik van de pseudo-reflexieve resultatiefconstructie in hedendaags Belgisch en Nederlands Nederlands. Handelingen van de Koninklijke Zuid-Nederlandse Maatschappij voor Taal- en Letterkunde en Geschiedenis LXX, 103-136.
Haeseryn, Walter (1990). Syntactische normen in het Nederlands: Een empirisch onderzoek naar volgordevariatie in de werkwoordelijke eindgroep. Doctoral dissertation, Katholieke Universiteit Nijmegen.
Haeseryn, Walter (1996). Grammaticale verschillen tussen het Nederlands in België en het Nederlands in Nederland: Een poging tot inventarisatie. In: Roeland van Hout & Joep Kruijsen (eds.), Taalvariaties: Toonzettingen en modulaties op een thema. Dordrecht: Foris Publications, 109-126.
Haeseryn, Walter, Kirsten Romijn, Guido Geerts, Jaap de Rooij & Maarten van den Toorn (1997). Algemene Nederlandse Spraakkunst. Second, fully revised edition. Groningen/Deurne: Martinus Nijhoff/Wolters Plantyn.
Haver, Jozef Van (1989). Noorderman & Zuiderman: Het taalverdriet van Vlaanderen. Tielt: Lannoo.
Hearne, Mary & Andy Way (2011). Statistical machine translation: A guide for linguists and translators. Language and Linguistics Compass 5(5), 205-226.
Hoppenbrouwers, Cor (1991). Nederlanders over het Nederlands van Vlamingen. In: Marysa Demoor (ed.), De kracht van het woord: 100 jaar Germaanse filologie aan de RUG (1890-1990). Gent: Studia Germanica Gandensia, 13-38.
Hosmer, David W. & Stanley Lemeshow (2000). Applied logistic regression. Second edition. New York: Wiley & Sons.
Jansen, Frank (1987). Omtrent de om-trend. Spektator 17, 83-98.
Keuleers, Emmanuel, Marc Brysbaert & Boris New (2010). SUBTLEX-NL: A new frequency measure for Dutch words based on film subtitles. Behavior Research Methods 42, 643-650.
Kirsner, Robert S. (1979). The problem of presentative sentences in Modern Dutch. Amsterdam: North-Holland Publishing Company.
Koehn, Philipp, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin & Evan Herbst (2007). Moses: Open source toolkit for statistical machine translation. In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics companion volume: Proceedings of the demo and poster sessions, 177-180.
Krawczak, Karolina (2018). Between grammar and semantics: A multivariate account of complement alternation in complex causal adpositions. Paper presented at the workshop Quantitative approaches to constructional variation: Corpus-driven studies of alternations, 11th International Conference of the Spanish Cognitive Linguistics Association (AELCO), 17-19 October 2018.
Labov, William (1993). The unobservability of structure and its linguistic consequences. Paper presented at NWAV 22, University of Ottawa.
Lavandera, Beatriz (1978). Where does the sociolinguistic variable stop? Language in Society 7(2), 171-182.
Levon, Erez & Isabelle Buchstaller (2015). Perception, cognition and linguistic structure: The effect of linguistic modularity and cognitive style on sociolinguistic processing. Language Variation and Change 27(3), 319-348.
Levshina, Natalia (2017). Online film subtitles as a corpus: An n-gram approach. Corpora 12(3), 311-338.
Levshina, Natalia, Dirk Geeraerts & Dirk Speelman (2013). Towards a 3D-grammar: Interaction of linguistic and extralinguistic factors in the use of Dutch causative constructions. Journal of Pragmatics 52, 34-48.
Lison, Pierre & Jörg Tiedemann (2016). OpenSubtitles2016: Extracting large parallel corpora from movie and TV subtitles. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation, 923-929.
Meyerhoff, Miriam & James A. Walker (2013). An existential problem: The sociolinguistic monitor and variation in existential constructions on Bequia (St. Vincent and the Grenadines). Language in Society 42, 407-428.
Mondorf, Britta (2002). Gender differences in English syntax. Journal of English Linguistics 30, 158-180.
New, Boris, Marc Brysbaert, Jean Veronis & Christophe Pallier (2007). The use of film subtitles to estimate word frequencies. Applied Psycholinguistics 28, 661-677.
Och, Franz J. & Hermann Ney (2003). A systematic comparison of various statistical alignment methods. Computational Linguistics 29(1), 19-51.
Oostdijk, Nelleke (2002). The design of the Spoken Dutch Corpus. In: Pam Peters, Peter Collins & Adam S. Cohen (eds.), New frontiers of corpus research. Amsterdam/New York: Rodopi, 105-112.
Oostdijk, Nelleke, Martin Reynaert, Véronique Hoste & Ineke Schuurman (2013). The construction of a 500-million-word reference corpus of contemporary written Dutch. In: Peter Spyns & Jan Odijk (eds.), Essential speech and language technology for Dutch. Heidelberg: Springer, 219-247.
Pijpops, Dirk & Freek Van de Velde (2018). A multivariate analysis of the partitive genitive in Dutch: Bringing quantitative data into a theoretical discussion. Corpus Linguistics and Linguistic Theory 14(1), 99-131.
Pijpops, Dirk (2019). How, why and where does argument structure vary? A usage-based investigation into the Dutch transitive–prepositional alternation. Doctoral dissertation, Katholieke Universiteit Leuven.
Poplack, Shana (2015). Pursuing symmetry by eradicating variability. Paper presented at NWAV 44, University of Toronto.
Prieels, Lynn & Gert De Sutter (2018). A mixed-method approach to the use of Colloquial Belgian Dutch in intralingual subtitling on Flemish television: Further evidence for the gradual acceptance of tussentaal. Taal en Tongval 70(2), 211-256.
Romaine, Suzanne (1984). On the problem of syntactic variation and pragmatic meaning in sociolinguistic theory. Folia Linguistica 18, 409-439.
Spärck Jones, Karen (2007). Computational linguistics: What about the linguistics? Computational Linguistics 33(3), 437-441.
Speelman, Dirk & Dirk Geeraerts (2009). Causes for causatives: The case of Dutch doen and laten. In: Ted Sanders & Eve Sweetser (eds.), Causal categories in discourse and cognition. Berlin/New York: De Gruyter Mouton, 173-204.
Speelman, Dirk, Stefan Grondelaers & Dirk Geeraerts (2003). Profile-based linguistic uniformity as a generic method for comparing language varieties. Computers and the Humanities 37, 317-337.
Sutter, Gert De, Dirk Speelman & Dirk Geeraerts (2005). Regionale en stilistische effecten op de woordvolgorde in werkwoordelijke eindgroepen. Nederlandse Taalkunde 10, 97-128.
Taeldeman, Johan (1992). Welk Nederlands voor Vlamingen? Nederlands van Nu 40(2), 33-51.
Tiedemann, Jörg (2012). Parallel data, tools and interfaces in OPUS. In: Proceedings of the 8th International Conference on Language Resources and Evaluation, 2214-2218.
Tummers, Jose (2005). Het naakt(e) adjectief. Kwantitatief-empirisch onderzoek naar de adjectivische buigingsalternantie bij neutra. Doctoral dissertation, Katholieke Universiteit Leuven.
Velde, Marc van de (1983). De volgorde binnen de drieledige werkwoordgroep. In: Bruno Callebaut (ed.), Linguïstische en socio-culturele aspecten van het taalonderwijs. Gent: Faculteit Letteren en Wijsbegeerte, 139-148.
Vogels, Jorrig & Geertje van Bergen (2017). Where to place inaccessible subjects in Dutch: The role of definiteness and animacy. Corpus Linguistics and Linguistic Theory 13(2), 369-398.


Article Type: Research Article

Keyword(s): computational linguistics; machine translation; national variation; subtitles; syntactic variation
