2004
Volume 2, Issue 1
  • ISSN: 2665-9085
  • E-ISSN: 2665-9085

Abstract

Impoliteness and incivility in online discussions have recently been discussed as relevant issues in communication science. However, automatically detecting these concepts with computational methods is challenging. In our study, we build and compare supervised classification models to predict impoliteness and incivility in online discussions on German media outlets on Facebook. Using a sample of 10,000 hand-coded user comments and a theory-grounded coding scheme, we develop classifiers on different feature sets including unigram and n-gram distributions as well as various dictionary-based features. Our findings show that impoliteness and incivility can be measured to a certain extent on the word level of a comment, but the models suffer from high misclassification rates, even if lexical resources are included. This is mainly because the classifiers cannot reveal subtle forms of incivility and because comment authors often use predictive words of incivility or impoliteness in non-offensive ways or in different contexts. Still, when applying the classifiers to a comparable set of comments, we find that the machine-coded categories and the hand-coded categories reveal similar patterns regarding the distribution of and the user reactions to uncivil/impolite comments. The findings of our study therefore provide new insights into the supervised machine learning approach to the detection of different forms of offensive language.
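To make the word-level approach described in the abstract more concrete, the following is a minimal sketch of a supervised classifier that combines unigram and bigram counts with one dictionary-based feature. It is an illustration only, not the authors' pipeline: the toy English comments, the labels, the mini-lexicon `LEXICON`, the `__LEXICON_HIT__` marker, and the choice of a multinomial Naive Bayes model with add-one smoothing are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data (comment, label); not taken from the study's corpus.
TRAIN = [
    ("you are an idiot and a liar", "impolite"),
    ("what a stupid idiot comment", "impolite"),
    ("shut up you stupid troll", "impolite"),
    ("thanks for the interesting article", "civil"),
    ("i agree with this thoughtful comment", "civil"),
    ("good point thanks for sharing", "civil"),
]

# Hypothetical mini-lexicon standing in for a dictionary-based resource.
LEXICON = {"idiot", "stupid", "liar", "troll"}


def features(text, n_max=2):
    """Return all 1..n_max-grams plus one marker token per lexicon hit."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    feats.extend("__LEXICON_HIT__" for t in tokens if t in LEXICON)
    return feats


def train(examples):
    """Collect per-class document and feature counts for Naive Bayes."""
    class_docs = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_docs[label] += 1
        for f in features(text):
            feat_counts[label][f] += 1
            vocab.add(f)
    return class_docs, feat_counts, vocab


def predict(model, text):
    """Return the class with the highest smoothed log-probability."""
    class_docs, feat_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)
        denom = sum(feat_counts[label].values()) + len(vocab)
        for f in features(text):
            if f in vocab:  # features unseen in training are ignored
                score += math.log((feat_counts[label][f] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict(model, "you stupid liar"))
print(predict(model, "thanks for the good article"))
print(predict(model, "that movie was stupid fun thanks"))
```

On this toy data the third comment is labelled impolite even though it is harmless, because "stupid" and the lexicon hit outweigh the rest of the comment. This mirrors, on a very small scale, the abstract's point that predictive words used in non-offensive ways lead to misclassification.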

/content/journals/10.5117/CCR2020.1.005.KATH
2020-02-01
2021-11-30
