The 4CAT Capture and Analysis Toolkit: A Modular Tool for Transparent and Traceable Social Media Research | Amsterdam University Press Journals Online
2022
Volume 4, Issue 2
  • E-ISSN: 2665-9085

Abstract

This paper introduces the 4CAT Capture and Analysis Toolkit (4CAT), an open-source Web-based research tool. 4CAT can capture data from a variety of online sources (including Twitter, Telegram, Reddit, 4chan, 8kun, BitChute, Douban and Parler) and analyze them through analytical processors. 4CAT seeks to make robust data capture and analysis available to researchers not familiar with computer programming, without ‘black-boxing’ the implemented research methods. Before outlining the practical use of 4CAT, we discuss three ‘affordances’ that inform its design: modularity, transparency, and traceability. 4CAT is modular because new data sources and analytical processors can be easily added and changed; transparent because it aims to render legible its inner workings; and traceable because of automatic and shareable documentation of intermediate analysis steps. We then show how 4CAT operationalizes these features through a description of its general setup and a short walkthrough. Finally, we discuss how 4CAT strives for an ‘ethics by design’ development philosophy that enables ethically sound data-driven research. 4CAT is then positioned as both an answer to and a further call for ‘tool criticism’ in computational social research.

/content/journals/10.5117/CCR2022.2.007.HAGE
2022-10-01
2024-03-03
