Big Pimpin’. Een big data-benadering van de verspreiding van het leenwoord pimpen in het Nederlands

Dirk Pijpops; Stefano De Pascale; Freek Van de Velde; Eline Zenner

doi:10.5117/TET2023.1.005.PIJP

ISSN: 0039-8691
E-ISSN: 2215-1214

Authors: Dirk Pijpops¹, Stefano De Pascale², Freek Van de Velde³ & Eline Zenner⁴

¹ Lilith, Faculté de Philosophie et Lettres, Université de Liège
² FWO Vlaanderen, Vrije Universiteit Brussel, Brussels, and Quantitative Lexicology and Variational Linguistics (QLVL), KU Leuven
³ Quantitative Lexicology and Variational Linguistics (QLVL), KU Leuven
⁴ Quantitative Lexicology and Variational Linguistics (QLVL), KU Leuven
Publisher: Amsterdam University Press
Source: Taal en Tongval, Volume 75, Issue 1, Sep 2023, p. 73–113
DOI: https://doi.org/10.5117/TET2023.1.005.PIJP
Language: Dutch
Published online: 01 Sep 2023

Abstract

This article illustrates some of the opportunities and challenges of a big data approach to linguistic research. As a case study, we investigate the diffusion of the loan verb pimpen ‘to fancify’ in Dutch on the basis of Twitter data. First, we focus on the derivations of the verb (e.g. terugpimpen ‘to pimp back’, herpimpen ‘to repimp’) and plot the diversity of these forms over time, using the Chao-Wang-Jost estimator of Shannon entropy. We follow this up with an alternation study that uses multinomial regression to compare pimpen not only with its ‘native’ alternative opleuken, but also with its most frequent derivation, oppimpen. We find that pimpen’s early expansion in Dutch proceeded at breakneck speed, resulting, for example, in a plethora of derivations that have so far gone undetected, but that its current momentum seems to be waning.
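
To make the diversity measure in the abstract more concrete, the following minimal sketch computes the plain plug-in (maximum-likelihood) Shannon entropy of a set of derivation counts. It is not the Chao-Wang-Jost estimator used in the article: the plug-in value tends to underestimate entropy when many forms are rare or unseen, which is the kind of bias such estimators are designed to correct. The function name and the token counts are hypothetical and serve only as an illustration.

```python
import math


def plugin_shannon_entropy(counts):
    """Plug-in (maximum-likelihood) Shannon entropy in nats.

    `counts` is an iterable of non-negative token counts, one per form.
    This is NOT the Chao-Wang-Jost estimator; it is the naive baseline.
    """
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts)


# Hypothetical token counts for three derivations named in the abstract
# (the numbers are invented for illustration, not taken from the article).
derivation_counts = {"oppimpen": 6, "herpimpen": 3, "terugpimpen": 1}

entropy = plugin_shannon_entropy(derivation_counts.values())
print(f"Plug-in Shannon entropy: {entropy:.3f} nats")
```

Computing this value per time slice and plotting it would give a rough picture of how the diversity of derived forms changes over time, under the stated caveat about bias in small samples.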

© Dirk Pijpops, Stefano De Pascale, Freek Van de Velde & Eline Zenner. This is an open access article distributed under the terms of the CC BY-NC-ND 4.0 license. https://creativecommons.org/licenses/by-nc-nd/4.0/




Keyword(s): alternation; big data; entropy; multinomial regression; Twitter
