The Sentiment is in the Details: A Language-agnostic Approach to Dictionary Expansion and Sentence-level Sentiment Analysis in News Media

Erik de Vries

doi:10.5117/CCR2022.2.003.VRIE

E-ISSN: 2665-9085

By Erik de Vries¹

¹ Department of Media and Social Sciences, University of Stavanger
Publisher: Amsterdam University Press
Source: Computational Communication Research, Volume 4, Issue 2, Oct 2022, p. 424 - 462
DOI: https://doi.org/10.5117/CCR2022.2.003.VRIE
Language: English
Published online: 01 Oct 2022

Abstract

Determining the sentiment in the individual sentences of a newspaper article in an automated fashion is a major challenge. Manually created sentiment dictionaries often fail to meet the required standards. And while computer-generated dictionaries show promise, they are often limited by the availability of suitable linguistic resources. I propose and test a novel, language-agnostic and resource-efficient way of constructing sentiment dictionaries, based on word embedding models. The dictionaries are constructed and evaluated based on four corpora containing two decades of Danish, Dutch (Flanders and the Netherlands), English, and Norwegian newspaper articles, which are cleaned and parsed using Natural Language Processing. Concurrent validity is evaluated using a dataset of human-coded newspaper sentences, and compared to the performance of the Polyglot sentiment dictionaries. Predictive validity is tested through two long-standing hypotheses on the negativity bias in political news. Results show that both the concurrent validity and the predictive validity are good. The dictionaries outperform their Polyglot counterparts, and are able to correctly detect a negativity bias, which is stronger for tabloids. The method is resource-efficient in terms of manual labor when compared to manually constructed dictionaries, and requires a limited amount of computational power.
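
The abstract does not spell out the expansion procedure, so the following is only a generic sketch of what embedding-based dictionary expansion and sentence-level scoring can look like, not the article's actual method. The toy vectors, the seed words, the similarity threshold, and the function names (`cosine`, `expand`, `sentence_score`) are all assumptions made for illustration. A real application would use embeddings trained on the news corpora.

```python
import math

# Hypothetical 3-dimensional "embeddings"; the values are made up for illustration.
EMBEDDINGS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "excellent": [0.85, 0.15, 0.05],
    "bad": [-0.9, 0.1, 0.0],
    "terrible": [-0.8, 0.2, 0.1],
    "awful": [-0.85, 0.1, 0.05],
    "table": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand(seeds, embeddings, threshold=0.9):
    """Return the seeds plus every word whose similarity to any seed meets the threshold.

    The threshold of 0.9 is an arbitrary assumption, not a value from the article.
    """
    known_seeds = [s for s in seeds if s in embeddings]
    expanded = set(seeds)
    for word, vector in embeddings.items():
        if word in expanded:
            continue
        if any(cosine(vector, embeddings[s]) >= threshold for s in known_seeds):
            expanded.add(word)
    return expanded


def sentence_score(tokens, positive, negative):
    """Count of positive dictionary hits minus count of negative hits in one sentence."""
    pos_hits = sum(1 for t in tokens if t in positive)
    neg_hits = sum(1 for t in tokens if t in negative)
    return pos_hits - neg_hits


positive = expand({"good"}, EMBEDDINGS)
negative = expand({"bad"}, EMBEDDINGS)

print(sorted(positive))
print(sorted(negative))
print(sentence_score("the plan is terrible".split(), positive, negative))
```

In this sketch a single seed word per polarity pulls in its nearest neighbours, while an unrelated word such as "table" stays out of both dictionaries. How seeds are chosen and how candidates are filtered are the decisions that would determine dictionary quality in practice.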


