Word Embedding Enrichment for Dictionary Construction: An Example of Incivility in Cantonese | Amsterdam University Press Journals Online
2023
Volume 5, Issue 1
  • E-ISSN: 2665-9085

Abstract

Dictionary-based methods remain valuable for measuring concepts in text, even though supervised machine learning has become widespread in recent communication research. The present study proposes a semi-automatic, easily implemented method for building and enriching dictionaries with word embeddings. As an example, we create a Cantonese dictionary of political incivility that covers vulgarity and name-calling. The study shows that dictionary-based classification outperforms supervised machine learning methods, including deep neural network models. Furthermore, a small number of randomly selected seed words can generate a highly accurate dictionary. However, as we demonstrate in a population-based survey experiment, the uncivil content detected is only weakly correlated with perceptions of incivility. We discuss the strengths and limitations of dictionary-based methods.
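The general idea of embedding-based dictionary enrichment can be illustrated with a minimal sketch. The abstract does not specify the authors' exact procedure, so everything below is an assumption for illustration: embeddings are given as a plain dict mapping word to vector, each seed word proposes its nearest neighbours by cosine similarity (the hypothetical parameters `top_k` and `min_sim` control how many and how similar), a human reviewer accepts or rejects the candidates (the "semi-automatic" step), and a text is flagged as uncivil if any token matches the dictionary. The toy two-dimensional vectors and placeholder words are invented, not data from the study.

```python
import math
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def expand_dictionary(
    seeds: Set[str],
    embeddings: Dict[str, Vector],
    top_k: int = 2,         # hypothetical: max candidates proposed per seed
    min_sim: float = 0.7,   # hypothetical: minimum cosine similarity to propose
) -> Dict[str, List[str]]:
    """For each seed, propose up to top_k nearest non-seed words with similarity >= min_sim.

    Seeds missing from the embedding vocabulary are skipped. The result is a
    candidate list meant for manual review, not a final dictionary.
    """
    candidates: Dict[str, List[str]] = {}
    for seed in sorted(seeds):
        if seed not in embeddings:
            continue
        scored = [
            (cosine(embeddings[seed], vec), word)
            for word, vec in embeddings.items()
            if word not in seeds
        ]
        # Highest similarity first; ties broken alphabetically for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        candidates[seed] = [w for s, w in scored[:top_k] if s >= min_sim]
    return candidates


def is_uncivil(tokens: List[str], dictionary: Set[str]) -> bool:
    """Assumed decision rule: flag a text if any of its tokens is in the dictionary."""
    return any(tok in dictionary for tok in tokens)


# Toy usage with invented 2-D vectors and placeholder words.
embeddings = {
    "insult_a": [1.0, 0.0],
    "insult_b": [0.9, 0.1],
    "insult_c": [0.8, 0.3],
    "neutral_x": [0.0, 1.0],
    "neutral_y": [0.1, 0.9],
}
seeds = {"insult_a"}
candidates = expand_dictionary(seeds, embeddings, top_k=2, min_sim=0.7)
print(candidates)  # {'insult_a': ['insult_b', 'insult_c']}

# Assume the reviewer accepts every proposed candidate.
dictionary = seeds | set(candidates["insult_a"])
print(is_uncivil(["you", "insult_c"], dictionary))  # True
print(is_uncivil(["neutral_x", "neutral_y"], dictionary))  # False
```

Keeping a human review step between proposing and accepting candidates is what makes this semi-automatic rather than fully automatic: nearest neighbours in embedding space are often related but not always uncivil, so the threshold only narrows the list a coder has to inspect.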


DOI: 10.5117/CCR2023.1.10.LIAN
2023-01-01
2024-04-28
  • Article Type: Research Article
Keyword(s): Cantonese; dictionary construction; machine learning; political incivility; swearing