Statistical Power in Content Analysis Designs: How Effect Size, Sample Size and Coding Accuracy Jointly Affect Hypothesis Testing – A Monte Carlo Simulation Approach.

Stefan Geiß

doi:10.5117/CCR2021.1.003.GEIS

E-ISSN: 2665-9085

Publisher: Amsterdam University Press
Source: Computational Communication Research, Volume 3, Issue 1, Mar 2021, p. 61 - 89
DOI: https://doi.org/10.5117/CCR2021.1.003.GEIS
Language: English
Published online: 01 Mar 2021

Abstract

This study uses Monte Carlo simulation to estimate the minimum level of intercoder reliability that content analysis data need for testing correlational hypotheses, depending on sample size, effect size and coder behavior under uncertainty. The resulting procedure is analogous to power calculations for experimental designs. In the most common sample size/effect size settings, the rule of thumb that chance-adjusted agreement should be ≥.80 or ≥.667 is consistent with the simulation results and yields acceptable α and β error rates. However, the simulation permits precise power calculations that take the specifics of each study’s context into account, moving beyond one-size-fits-all recommendations. Studies with small samples and/or small expected effect sizes may need coder agreement above .800 to test a hypothesis with sufficient statistical power. In studies with large samples and/or large expected effect sizes, coder agreement below .667 may suffice. Such calculations can help both in evaluating and in designing studies. Particularly in pre-registered research, larger samples may be used to compensate for small expected effect sizes and/or borderline coding reliability (e.g. when constructs are hard to measure). I supply equations, easy-to-use tables and R functions to facilitate use of this framework, along with example code as an online appendix.
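
The logic described in the abstract can be illustrated with a minimal Python sketch. It is not the author's R code from the online appendix, and it makes simplifying assumptions that the abstract does not state: both variables are binary and balanced, both are coded with the same accuracy, an uncertain coder guesses by coin flip, and the hypothesis is tested with a chi-square test at α = .05. The names `code_unit`, `is_significant` and `simulate_power` are hypothetical and introduced only for this illustration.

```python
import random

CHI2_CRIT_05 = 3.841  # critical chi-square value for 1 df at alpha = .05


def code_unit(true_value, accuracy, rng):
    """Return the true value with probability `accuracy`, else a coin-flip guess.

    Assumption: this is one simple model of coder behavior under uncertainty.
    """
    if rng.random() < accuracy:
        return true_value
    return rng.random() < 0.5


def is_significant(table, n):
    """Chi-square test of independence on a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # a constant variable cannot show an association
        return False
    chi2 = n * (a * d - b * c) ** 2 / denom
    return chi2 > CHI2_CRIT_05


def simulate_power(n, phi, accuracy, reps=1000, seed=1):
    """Share of simulated studies that reject the null hypothesis.

    n        -- number of coded units per study
    phi      -- true correlation between the two binary variables
    accuracy -- probability that a coder records the true value
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        table = [[0, 0], [0, 0]]
        for _ in range(n):
            x = rng.random() < 0.5
            # y matches x with probability (1 + phi) / 2, which yields
            # a correlation of phi for balanced binary variables.
            y = x if rng.random() < (1 + phi) / 2 else not x
            x_obs = code_unit(x, accuracy, rng)
            y_obs = code_unit(y, accuracy, rng)
            table[x_obs][y_obs] += 1  # booleans index as 0 / 1
        hits += is_significant(table, n)
    return hits / reps


if __name__ == "__main__":
    for accuracy in (1.0, 0.8, 0.6):
        power = simulate_power(n=200, phi=0.3, accuracy=accuracy)
        print(f"accuracy={accuracy:.1f}  estimated power={power:.2f}")
```

Calling `simulate_power` with `phi=0` estimates the α error rate instead of power. Raising `n` while holding `accuracy` fixed shows, under these assumptions, how a larger sample can offset less reliable coding.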




Article Type: Research Article

Keyword(s): Content analysis; Effect size; Hypothesis testing; Intercoder agreement; Intercoder reliability; Monte Carlo simulation; Power analysis; Sample size
