Volume 5, Issue 2
  • E-ISSN: 2665-9085

Abstract

The social science toolkit for computational text analysis is still very much in the making. We know surprisingly little about how to produce valid insights from large amounts of multilingual texts for comparative social science research. In this paper, we test several recent innovations from deep transfer learning to help advance the computational toolkit for social science research in multilingual settings. We investigate the extent to which the 'prior language and task knowledge' stored in the parameters of modern language models is useful for enabling multilingual research; the extent to which these algorithms can be fruitfully combined with machine translation; and whether these methods are not only accurate but also practical and valid in multilingual settings – three essential conditions for lowering the language barrier in practice. We use two datasets with texts in 12 languages from 27 countries for our investigation. Our analysis shows that, based on these innovations, supervised machine learning can produce substantively meaningful outputs. Our BERT-NLI model trained on only 674 or 1674 texts in only one or two languages can validly predict political party families' stances towards immigration in eight other languages and ten other countries.
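The NLI-based classification approach mentioned in the abstract can be sketched as follows: each candidate class is phrased as a natural-language hypothesis, and the text is assigned to the class whose hypothesis the model finds most entailed. This is a minimal illustration only; the `score_entailment` function below is a keyword-overlap placeholder standing in for a real BERT-NLI entailment model, and the label names and hypothesis wordings are hypothetical, not taken from the paper.

```python
# Sketch of NLI-style text classification: each class label becomes a
# natural-language hypothesis, and the text is assigned to the class whose
# hypothesis receives the highest entailment score.

def score_entailment(premise: str, hypothesis: str) -> float:
    """Placeholder entailment scorer based on word overlap.

    A real BERT-NLI model would instead return P(entailment) for the
    (premise, hypothesis) pair.
    """
    premise_words = set(premise.lower().split())
    hypothesis_words = set(hypothesis.lower().split())
    return len(premise_words & hypothesis_words) / max(len(hypothesis_words), 1)


def classify_nli(text: str, label_hypotheses: dict) -> str:
    """Return the label whose hypothesis is most entailed by the text."""
    return max(label_hypotheses,
               key=lambda label: score_entailment(text, label_hypotheses[label]))


# Hypothetical hypotheses for party stances towards immigration:
hypotheses = {
    "sceptical": "The party is sceptical of immigration",
    "supportive": "The party is supportive of immigration",
}
print(classify_nli("This party is sceptical towards immigration", hypotheses))
```

Because the hypotheses are plain sentences, the same classifier can be applied to new languages or new label sets without retraining the underlying model, which is what makes the NLI formulation attractive for multilingual comparative research.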

DOI: 10.5117/CCR2023.2.7.LAUR
2023-01-01
2024-12-11
  • Article Type: Research Article
Keyword(s): computational social sciences; machine learning; multilingualism; text-as-data