Taal en Tongval - Volume 75, Issue 1, 2023
- Articles
Sound Change Estimation in Netherlandic Regional Languages: Reducing Inter-Transcriber Variability in Dialect Corpora
Authors: Raoul Sergio Samuel Jan Buurke & Martijn Wieling
Abstract: Large phonetic corpora are frequently used to investigate language variation and change in dialects, but these corpora are often constructed by many researchers in a collaborative effort. This typically results in inter-transcriber inconsistencies that may affect the reliability of analyses based on these data. The problem is exacerbated when multiple phonetic corpora are compared to investigate real-time dialect change. In this study, we therefore propose a method to automatically and iteratively merge the phonetic symbols used in the transcriptions, yielding a more coarse-grained but more comparable phonetic transcription. We evaluate our approach on two large phonetic corpora of Netherlandic dialects in an attempt to estimate sound change in the area during the 20th century. The results are discussed in the context of the available literature on dialect change in the Netherlandic area.
The validity of mixed-effects regression for analysing linguistic distance matrices: a simulation study
Authors: John L.A. Huisman & Roeland van Hout
Abstract: Recent work in dialectometry has proposed the use of linear mixed-effects regression (LMER) for analysing full distance matrices. While the outcomes are promising, further work is needed to confirm that they are valid, given that this method is not yet established for the analysis of distance matrices. The current contribution provides a supporting framework for this approach by testing its validity on a series of simulated datasets. We analysed the generated data using LMER and compared its performance to that of the well-established multiple regression on distance matrices (MRM) approach. We find that the LMER results are on par with, and sometimes even exceed, those obtained from MRM. The ability to include random effects makes LMER a more powerful tool than MRM for examining a linguistic area as a whole, with all pairwise comparisons included. This makes it an ideal candidate for the big data analyses that are becoming more prevalent with the ongoing digitisation of large dialect databases.
- Articles
Big Pimpin’. Een big data-benadering van de verspreiding van het leenwoord pimpen in het Nederlands [Big Pimpin’: A big data approach to the spread of the loanword pimpen in Dutch]
Authors: Dirk Pijpops, Stefano De Pascale, Freek Van de Velde & Eline Zenner
Abstract: This article illustrates some of the opportunities and challenges of pursuing a big data approach in linguistic research. To do so, we investigate the diffusion of the loan verb pimpen ‘to fancify’ in Dutch on the basis of Twitter data. First, we focus on the derivations of the verb (e.g. terugpimpen ‘to pimp back’, herpimpen ‘to repimp’) and plot the diversity of these forms through time, using the Chao-Wang-Jost estimator of Shannon entropy. We follow this up with an alternation study, using multinomial regression, that compares pimpen not only to its ‘native’ alternative opleuken but also to its most frequent derivation, oppimpen. We find that, while pimpen’s early expansion in Dutch proceeded at breakneck speed, resulting, for example, in a plethora of derivations that had so far gone undetected, its current momentum seems to be waning.
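The Chao-Wang-Jost estimator named in the abstract above corrects the naive (plug-in) Shannon entropy for types, here derivations, that are too rare to have been observed in the sample. The following is a minimal Python sketch of the estimator as published by Chao, Wang and Jost (2013). It is offered for illustration only and is not the authors' code. The function name `chao_wang_jost_entropy` is hypothetical, and the `f2 = 0` branch follows our reading of the published formula. Because it evaluates (1 - A)^(1 - n) directly, the sketch assumes moderate sample sizes: for very large samples this term can overflow in floating point.

```python
import math
from collections import Counter


def chao_wang_jost_entropy(counts):
    """Estimate Shannon entropy (in nats) from observed type frequencies.

    `counts` holds the token frequency of each distinct type observed in
    the sample (e.g. each derivation of a verb). Zero counts are ignored.
    Sketch only: direct transcription, intended for moderate sample sizes.
    """
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    if n == 0:
        raise ValueError("sample contains no tokens")
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]  # number of singleton and doubleton types

    # Part 1: sum of (X_i / n) * (1/X_i + ... + 1/(n-1));
    # the inner sum is empty when a single type accounts for all n tokens.
    h = sum((x / n) * sum(1.0 / k for k in range(x, n)) for x in counts)

    # Part 2: correction for unseen types, estimated from singletons and
    # doubletons. A == 1 means no correction (no singletons, or exactly
    # one singleton and no doubletons).
    if f1 > 0:
        if f2 > 0:
            a = 2 * f2 / ((n - 1) * f1 + 2 * f2)
        else:
            a = 2 / ((n - 1) * (f1 - 1) + 2)
        if a < 1:
            tail = sum((1 - a) ** r / r for r in range(1, n))
            h += (f1 / n) * (1 - a) ** (1 - n) * (-math.log(a) - tail)
    return h
```

As a usage example, `chao_wang_jost_entropy([5, 3, 1, 1, 1])` would return the estimated entropy for a hypothetical sample of five derivation types with those token counts. Computing this value per time slice would give a diversity curve of the kind the abstract describes.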
- Articles
Historical Corpus of Dutch: A new multi-genre corpus of Early and Late Modern Dutch
Abstract: In this contribution, we present the Historical Corpus of Dutch (HCD), a new multi-genre, diachronic corpus of Early and Late Modern Dutch (ca. 1550–1850). It consists of a digitised collection of handwritten administrative texts (e.g. town council meeting reports), handwritten ego-documents (e.g. diaries and travelogues), and printed pamphlets (e.g. of a political or religious nature). The corpus is also balanced between northern and southern material, with data from the provinces of Holland and Zeeland for the North, and from Flanders and Brabant for the South. After discussing its structure and composition, we illustrate the value of the new corpus with a number of smaller case studies. Based on our experiences with the corpus, we conclude with a plea for historical corpus building to focus less on the quantity of data (‘big data’) and more on data quality.