Volume 25, Issue 1
  • ISSN: 1384-5845
  • E-ISSN: 2352-1171

Abstract

Belgian Dutch (BD) and Netherlandic Dutch (ND) are known to exhibit phonetic and lexical differences, but national variation in the syntax of Dutch has often been claimed to be virtually non-existent. This view is rooted in the fact that both laypersons and researchers are oblivious to national divergences in the grammar of Dutch (unless these are categorical and/or heavily mediatized), but also in the undisputed belief that BD and ND are different surface manifestations of ‘the same grammatical motor’. As a result, only a few syntactic phenomena have hitherto been shown to be sensitive to national constraints. In this paper we illustrate a computational bottom-up approach (pioneered in Bannard & Callison-Burch 2005) that casts the net as widely as possible. Building on statistical machine translation and a parallel corpus of Dutch translations of English subtitles, we identify plausible mappings between English n-grams and their Dutch translations. We do this in order to obtain paraphrases, i.e., stretches of interchangeable Dutch text that carry approximately the same meaning. In a first case study, we found corroborating evidence among the discovered paraphrases for many syntactic variables that have previously been attested in Dutch, including complementizer variation, existential er-variation, word order phenomena, and inflection variation. Crucially, we also discovered a number of alternations we had not anticipated as interesting variables. In order to detect national constraints on the newly found variables, we carried out a second experiment with a smaller corpus of Belgian and Netherlandic subtitles: the two variables we investigated in this light, deictic strength variation and subordination variation, did indeed manifest national sensitivity.
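
The pivoting idea behind this approach (Bannard & Callison-Burch 2005) can be summarised as p(e2 | e1) ≈ Σ_f p(f | e1) · p(e2 | f): a Dutch phrase e2 counts as a paraphrase of a Dutch phrase e1 to the extent that both translate the same English n-grams f. The Python sketch below is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes both phrase-translation tables have already been estimated (for instance by an SMT toolkit such as Moses from aligned subtitles). The function name `pivot_paraphrases`, its `min_prob` parameter, and the toy entries for the om-alternation (proberen te vs. proberen om te) are hypothetical and invented purely for illustration.

```python
from collections import defaultdict

# Toy phrase-translation tables (hypothetical values, invented for
# illustration; not data from the study). In practice such tables would be
# estimated from word-aligned Dutch-English subtitle pairs.
# P_EN_GIVEN_NL[nl][en] = p(en | nl)
P_EN_GIVEN_NL = {
    "proberen te": {"try to": 0.8, "attempt to": 0.2},
}
# P_NL_GIVEN_EN[en][nl] = p(nl | en)
P_NL_GIVEN_EN = {
    "try to": {"proberen te": 0.5, "proberen om te": 0.4, "pogen te": 0.1},
    "attempt to": {"proberen te": 0.3, "pogen te": 0.7},
}


def pivot_paraphrases(phrase, p_pivot_given_src, p_src_given_pivot, min_prob=0.0):
    """Rank paraphrases of `phrase` by pivoting through another language.

    Computes p(e2 | e1) = sum over pivot phrases f of p(f | e1) * p(e2 | f),
    with Dutch as the source language and English as the pivot.
    Returns (candidate, probability) pairs, most probable first.
    """
    scores = defaultdict(float)
    for pivot, p_f in p_pivot_given_src.get(phrase, {}).items():
        for candidate, p_e2 in p_src_given_pivot.get(pivot, {}).items():
            if candidate != phrase:  # a phrase is not its own paraphrase
                scores[candidate] += p_f * p_e2
    # Drop candidates at or below the threshold; round for readable output.
    ranked = [(cand, round(score, 3)) for cand, score in scores.items() if score > min_prob]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(pivot_paraphrases("proberen te", P_EN_GIVEN_NL, P_NL_GIVEN_EN))
    # [('proberen om te', 0.32), ('pogen te', 0.22)]
```

Excluding the input phrase from its own candidate list mirrors the e2 ≠ e1 condition in Bannard & Callison-Burch. Real phrase tables are far larger and noisier, so a probability threshold such as the hypothetical `min_prob` is one simple way to prune implausible candidates before a linguist inspects them.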

DOI: 10.5117/NEDTAA2020.1.004.GRON
2020-04-01
2022-01-24

References

  1. Abraham, Werner & C. Jac Conradie (2001). Präteritumschwund und Diskursgrammatik: Präteritumschwund in gesamteuropäischen Bezügen: Areale Ausbreitung, heterogene Entstehung, Parsing sowie diskursgrammatische Grundlagen und Zusammenhänge. Amsterdam/Philadelphia: John Benjamins.
  2. Ariel, Mira (1990). Accessing noun-phrase antecedents. London/New York: Routledge.
  3. Bannard, Colin & Chris Callison-Burch (2005). Paraphrasing with bilingual parallel corpora. In: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, 597-604.
  4. Bergen, Geertje van & Peter de Swart (2010). Scrambling in spoken Dutch: Definiteness versus weight as determinants of word order variation. Corpus Linguistics and Linguistic Theory 6, 267-295.
  5. Bergen, Geertje van (2011). Who's first and what's next: Animacy and word order variation in Dutch language production. Doctoral dissertation, Radboud Universiteit Nijmegen.
  6. Beveren, Amélie Van, Timothy Colleman & Gert De Sutter (2018). De om-alternantie: Een verkennende casestudy. Handelingen van de Koninklijke Zuid-Nederlandse Maatschappij voor Taal- en Letterkunde en Geschiedenis LXXI, 191-222.
  7. Bouma, Gerlof & Helen de Hoop (2008). Unscrambled pronouns in Dutch. Linguistic Inquiry 39, 669-677.
  8. Bouma, Gosse (2017). Om-omission. In: Martijn Wieling, Martin Kroon, Gertjan van Noord & Gosse Bouma (eds.), From semantics to dialectometry: Festschrift in honor of John Nerbonne. London: College Publications, 65-74.
  9. Broekhuis, Hans & Marcel den Dikken (2012). Syntax of Dutch: Nouns and noun phrases, vol. 2. Amsterdam: Amsterdam University Press.
  10. Burrows, John (2002). ‘Delta’: A measure of stylistic difference and a guide to likely authorship. Literary and Linguistic Computing 17(3), 267-287.
  11. Callison-Burch, Chris (2007). Paraphrasing and translation. Doctoral dissertation, University of Edinburgh.
  12. Colleman, Timothy (2010). Lectal variation in constructional semantics: ‘Benefactive’ ditransitives in Dutch. In: Dirk Geeraerts, Gitte Kristiansen & Yves Peirsman (eds.), Advances in Cognitive Sociolinguistics. Berlin/New York: De Gruyter Mouton, 191-221.
  13. Daems, Frans (1981). Een eigen syntactische norm in Vlaanderen? Volgordevariatie in werkwoordsgroepen. In: Handelingen van het zesendertigste Nederlandse Filologencongres, 119-126.
  14. Díaz Cintas, Jorge & Alice Remael (2014). Audiovisual translation: Subtitling. London/New York: Routledge.
  15. Eder, Maciej, Jan Rybicki & Mike Kestemont (2016). Stylometry with R: A package for computational text analysis. R Journal 8(1), 107-121.
  16. Fehringer, Carol (2017). Internal constraints on the use of gaan versus zullen as future markers in spoken Dutch. Nederlandse Taalkunde 22(3), 359-387.
  17. Geeraerts, Dirk, Stefan Grondelaers & Dirk Speelman (1999). Convergentie en divergentie in de Nederlandse woordenschat: Een onderzoek naar kleding- en voetbaltermen. Amsterdam: Meertens.
  18. Grondelaers, Stefan (2009). Woordvolgorde in presentatieve zinnen en de theoretische basis van multifactoriële grammatica. Nederlandse Taalkunde 14, 282-299.
  19. Grondelaers, Stefan, Dirk Speelman & An Carbonez (2001). Regionale variatie in de postverbale distributie van presentatief er. Neerlandistiek.nl 01.04. <www.neerlandistiek.nl>
  20. Grondelaers, Stefan, Dirk Speelman & Dirk Geeraerts (2002). Regressing on er: Statistical analysis of texts and language variation. In: Annie Morin & Pascale Sébillot (eds.), 6ièmes journées internationales d’analyse statistique des données textuelles. Rennes: Institut National de Recherche en Informatique et en Automatique, 335-346.
  21. Grondelaers, Stefan, Dirk Speelman & Dirk Geeraerts (2008). National variation in the use of er ‘there’: Regional and diachronic constraints on cognitive explanations. In: Gitte Kristiansen & René Dirven (eds.), Cognitive Sociolinguistics: Language variation, cultural models, social systems. Berlin/New York: De Gruyter Mouton, 153-204.
  22. Grondelaers, Stefan, Katrien Deygers, Hilde Van Aken, Vicky Van Den Heede & Dirk Speelman (2000). DigiTaal: Het CONDIV-corpus geschreven Nederlands. Nederlandse Taalkunde 5(4), 356-363.
  23. Grondelaers, Stefan, Paul van Gent & Roeland van Hout (in press). On the inevitability of social meaning and ideology in accounts of syntactic change: Evidence from pronoun competition in Netherlandic Dutch. In: Tanya Christensen & Torben Juel Jensen (eds.), Explanations in sociosyntax: Dialogue across paradigms. Amsterdam/Philadelphia: John Benjamins.
  24. Gyselinck, Emmeline & Timothy Colleman (2016). Je dood vervelen of je te pletter amuseren? Het intensiverende gebruik van de pseudo-reflexieve resultatiefconstructie in hedendaags Belgisch en Nederlands Nederlands. Handelingen van de Koninklijke Zuid-Nederlandse Maatschappij voor Taal- en Letterkunde en Geschiedenis LXX, 103-136.
  25. Haeseryn, Walter (1990). Syntactische normen in het Nederlands: Een empirisch onderzoek naar volgordevariatie in de werkwoordelijke eindgroep. Doctoral dissertation, Katholieke Universiteit Nijmegen.
  26. Haeseryn, Walter (1996). Grammaticale verschillen tussen het Nederlands in België en het Nederlands in Nederland: Een poging tot inventarisatie. In: Roeland van Hout & Joep Kruijsen (eds.), Taalvariaties: Toonzettingen en modulaties op een thema. Dordrecht: Foris Publications, 109-126.
  27. Haeseryn, Walter, Kirsten Romijn, Guido Geerts, Jaap de Rooij & Maarten van den Toorn (1997). Algemene Nederlandse Spraakkunst. Second, completely revised edition. Groningen/Deurne: Martinus Nijhoff/Wolters Plantyn.
  28. Haver, Jozef Van (1989). Noorderman & Zuiderman: Het taalverdriet van Vlaanderen. Tielt: Lannoo.
  29. Hearne, Mary & Andy Way (2011). Statistical machine translation: A guide for linguists and translators. Language and Linguistics Compass 5(5), 205-226.
  30. Hoppenbrouwers, Cor (1991). Nederlanders over het Nederlands van Vlamingen. In: Marysa Demoor (ed.), De kracht van het woord: 100 jaar Germaanse filologie aan de RUG (1890-1990). Ghent: Studia Germanica Gandensia, 13-38.
  31. Hosmer, David W. & Stanley Lemeshow (2000). Applied logistic regression. Second edition. New York: Wiley & Sons.
  32. Jansen, Frank (1987). Omtrent de om-trend. Spektator 17, 83-98.
  33. Keuleers, Emmanuel, Marc Brysbaert & Boris New (2010). SUBTLEX-NL: A new frequency measure for Dutch words based on film subtitles. Behavior Research Methods 42, 643-650.
  34. Kirsner, Robert S. (1979). The problem of presentative sentences in Modern Dutch. Amsterdam: North-Holland Publishing Company.
  35. Koehn, Philipp, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin & Evan Herbst (2007). Moses: Open source toolkit for statistical machine translation. In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics companion volume: Proceedings of the demo and poster sessions, 177-180.
  36. Krawczak, Karolina (2018). Between grammar and semantics: A multivariate account of complement alternation in complex causal adpositions. Paper presented at the workshop Quantitative approaches to constructional variation: Corpus-driven studies of alternations at the 11th International Conference of the Spanish Cognitive Linguistics Association (AELCO), 17-19 October 2018.
  37. Labov, William (1993). The unobservability of structure and its linguistic consequences. Paper presented at NWAV 22, University of Ottawa.
  38. Lavandera, Beatriz (1978). Where does the sociolinguistic variable stop? Language in Society 7(2), 171-182.
  39. Levon, Erez & Isabelle Buchstaller (2015). Perception, cognition and linguistic structure: The effect of linguistic modularity and cognitive style on sociolinguistic processing. Language Variation and Change 27(3), 319-348.
  40. Levshina, Natalia (2017). Online film subtitles as a corpus: An n-gram approach. Corpora 12(3), 311-338.
  41. Levshina, Natalia, Dirk Geeraerts & Dirk Speelman (2013). Towards a 3D-grammar: Interaction of linguistic and extralinguistic factors in the use of Dutch causative constructions. Journal of Pragmatics 52, 34-48.
  42. Lison, Pierre & Jörg Tiedemann (2016). OpenSubtitles2016: Extracting large parallel corpora from movie and TV subtitles. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation, 923-929.
  43. Meyerhoff, Miriam & James A. Walker (2013). An existential problem: The sociolinguistic monitor and variation in existential constructions on Bequia (St. Vincent and the Grenadines). Language in Society 42, 407-428.
  44. Mondorf, Britta (2002). Gender differences in English syntax. Journal of English Linguistics 30, 158-180.
  45. New, Boris, Marc Brysbaert, Jean Veronis & Christophe Pallier (2007). The use of film subtitles to estimate word frequencies. Applied Psycholinguistics 28, 661-677.
  46. Och, Franz J. & Hermann Ney (2003). A systematic comparison of various statistical alignment methods. Computational Linguistics 29(1), 19-51.
  47. Oostdijk, Nelleke (2002). The design of the Spoken Dutch Corpus. In: Pam Peters, Peter Collins & Adam S. Cohen (eds.), New frontiers of corpus research. Amsterdam/New York: Rodopi, 105-112.
  48. Oostdijk, Nelleke, Martin Reynaert, Véronique Hoste & Ineke Schuurman (2013). The construction of a 500-million-word reference corpus of contemporary written Dutch. In: Peter Spyns & Jan Odijk (eds.), Essential speech and language technology for Dutch. Heidelberg: Springer, 219-247.
  49. Pijpops, Dirk & Freek Van de Velde (2018). A multivariate analysis of the partitive genitive in Dutch: Bringing quantitative data into a theoretical discussion. Corpus Linguistics and Linguistic Theory 14(1), 99-131.
  50. Pijpops, Dirk (2019). How, why and where does argument structure vary? A usage-based investigation into the Dutch transitive–prepositional alternation. Doctoral dissertation, Katholieke Universiteit Leuven.
  51. Poplack, Shana (2015). Pursuing symmetry by eradicating variability. Paper presented at NWAV 44, University of Toronto.
  52. Prieels, Lynn & Gert De Sutter (2018). A mixed-method approach to the use of Colloquial Belgian Dutch in intralingual subtitling on Flemish television: Further evidence for the gradual acceptance of tussentaal. Taal en Tongval 70(2), 211-256.
  53. Romaine, Suzanne (1984). On the problem of syntactic variation and pragmatic meaning in sociolinguistic theory. Folia Linguistica 18, 409-439.
  54. Spärck Jones, Karen (2007). Computational linguistics: What about the linguistics? Computational Linguistics 33(3), 437-441.
  55. Speelman, Dirk & Dirk Geeraerts (2009). Causes for causatives: The case of Dutch doen and laten. In: Ted Sanders & Eve Sweetser (eds.), Causal categories in discourse and cognition. Berlin/New York: De Gruyter Mouton, 173-204.
  56. Speelman, Dirk, Stefan Grondelaers & Dirk Geeraerts (2003). Profile-based linguistic uniformity as a generic method for comparing language varieties. Computers and the Humanities 37, 317-337.
  57. Sutter, Gert De, Dirk Speelman & Dirk Geeraerts (2005). Regionale en stilistische effecten op de woordvolgorde in werkwoordelijke eindgroepen. Nederlandse Taalkunde 10, 97-128.
  58. Taeldeman, Johan (1992). Welk Nederlands voor Vlamingen? Nederlands van Nu 40(2), 33-51.
  59. Tiedemann, Jörg (2012). Parallel data, tools and interfaces in OPUS. In: Proceedings of the 8th International Conference on Language Resources and Evaluation, 2214-2218.
  60. Tummers, Jose (2005). Het naakt(e) adjectief. Kwantitatief-empirisch onderzoek naar de adjectivische buigingsalternantie bij neutra. Doctoral dissertation, Katholieke Universiteit Leuven.
  61. Velde, Marc van de (1983). De volgorde binnen de drieledige werkwoordgroep. In: Bruno Callebaut (ed.), Linguïstische en socio-culturele aspecten van het taalonderwijs. Ghent: Faculteit Letteren en Wijsbegeerte, 139-148.
  62. Vogels, Jorrig & Geertje van Bergen (2017). Where to place inaccessible subjects in Dutch: The role of definiteness and animacy. Corpus Linguistics and Linguistic Theory 13(2), 369-398.