Machine Translation for Accessible Multi-Language Text Analysis

Edward Chew; Mahasweta Chakraborti; William Weisman; Seth Frey

doi:10.5117/CCR2025.1.5.CHEW

E-ISSN: 2665-9085

Authors: Edward Chew¹, Mahasweta Chakraborti², William Weisman³ & Seth Frey⁴

¹ Department of Communication, University of California Davis, Davis, CA
² Department of Communication, University of California Davis, Davis, CA
³ Department of Communication, University of California Davis, Davis, CA
⁴ Department of Communication, University of California Davis, Davis, CA
Publisher: Amsterdam University Press
Source: Computational Communication Research, Volume 7, Issue 1, Jan 2025, p. 1
DOI: https://doi.org/10.5117/CCR2025.1.5.CHEW
Language: English

Abstract

English is the international standard of social research, but scholars are increasingly conscious of their responsibility to meet the need for scholarly insight into communication processes globally. This tension is as true in computational methods as in any other area, with revolutionary advances in the tools for English language texts leaving most other languages far behind. In this paper, we aim to leverage those very advances to demonstrate that multi-language analysis is currently accessible to all computational scholars. We show that English-trained measures computed after translation to English have adequate-to-excellent accuracy compared to source-language measures computed on original texts. We show this for three major analytics (sentiment analysis, topic analysis, and word embeddings) over 16 languages, including Spanish, Chinese, Hindi, and Arabic. We validate this claim by comparing predictions on original language tweets and their back-translations: double translations from their source language to English and back to the source language. Our results suggest that Google Translate, a simple and widely accessible tool, effectively preserves semantic content across languages and methods. Modern machine translation can thus help computational scholars make more inclusive and general claims about human communication.
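The back-translation check described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`back_translate`, `agreement_rate`), the word-lookup translator, and the keyword classifier are all hypothetical stand-ins chosen so the example runs without any external service. In practice the `translate` callable would wrap a real machine translation system and `classify` would be a source-language model.

```python
from typing import Callable, Sequence

# Assumed interfaces (hypothetical, for illustration only):
# a translator maps (text, source_lang, target_lang) -> text,
# a classifier maps text -> label.
Translator = Callable[[str, str, str], str]
Classifier = Callable[[str], str]


def back_translate(text: str, lang: str, translate: Translator,
                   pivot: str = "en") -> str:
    """Translate text into the pivot language, then back into its source language."""
    pivoted = translate(text, lang, pivot)
    return translate(pivoted, pivot, lang)


def agreement_rate(texts: Sequence[str], lang: str,
                   translate: Translator, classify: Classifier) -> float:
    """Share of texts whose predicted label is unchanged after back-translation."""
    if not texts:
        raise ValueError("texts must be non-empty")
    matches = sum(
        classify(t) == classify(back_translate(t, lang, translate))
        for t in texts
    )
    return matches / len(texts)


# Toy stand-ins so the sketch runs offline; a real study would replace both.
ES_EN = {"bueno": "good", "malo": "bad", "dia": "day"}
EN_ES = {en: es for es, en in ES_EN.items()}


def toy_translate(text: str, src: str, tgt: str) -> str:
    """Word-by-word lookup between Spanish and English; unknown words pass through."""
    table = ES_EN if (src, tgt) == ("es", "en") else EN_ES
    return " ".join(table.get(word, word) for word in text.split())


def toy_classify(text: str) -> str:
    """Keyword sentiment rule on Spanish text."""
    return "positive" if "bueno" in text.split() else "negative"


texts = ["bueno dia", "malo dia"]
rate = agreement_rate(texts, "es", toy_translate, toy_classify)
print(f"label agreement after back-translation: {rate:.2f}")
```

A high agreement rate indicates that the round trip through English preserved whatever the classifier relies on. Passing the translator and classifier as arguments keeps the check independent of any particular translation service or model.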


2025-01-01

2025-06-02


Article Type: Research Article

Keyword(s): back translation; computational text analysis; multi-lingual text analysis; natural language processing; sentiment analysis; topic modelling; word embedding

