Quantifying Interrater Agreement and Reliability Between Thoracic Pathologists: Paradoxical Behavior of Cohen’s Kappa in the Presence of a High Prevalence of the Histopathologic Feature in Lung Cancer

JTO Clinical and Research Reports (2024)

Abstract
Introduction: Cohen’s kappa is often used to quantify the agreement between two pathologists. Nevertheless, a high prevalence of the feature of interest can lead to seemingly paradoxical results, such as low Cohen’s kappa values despite high “observed agreement.” Here, we investigate Cohen’s kappa using data from histologic subtyping assessment of lung adenocarcinomas and introduce alternative measures that can overcome this “kappa paradox.”

Methods: A total of 50 frozen sections from stage I lung adenocarcinomas less than or equal to 3 cm in size were independently reviewed by two pathologists to determine the absence or presence of five histologic patterns (lepidic, papillary, acinar, micropapillary, solid). For each pattern, observed agreement (proportion of cases with concordant “absent” or “present” ratings) and Cohen’s kappa were calculated, along with Gwet’s AC1.

Results: The prevalence of any amount of the histologic patterns ranged from 42% (solid) to 97% (acinar). On the basis of Cohen’s kappa, there was substantial agreement for four of the five patterns (lepidic, 0.65; papillary, 0.67; micropapillary, 0.64; solid, 0.61). Acinar had the lowest Cohen’s kappa (0.43, moderate agreement), despite having the highest observed agreement (88%). In contrast, Gwet’s AC1 values were close to or higher than Cohen’s kappa across patterns (lepidic, 0.64; papillary, 0.69; micropapillary, 0.71; solid, 0.73; acinar, 0.85). The proportion of positive versus negative agreement was 93% versus 50% for acinar.

Conclusions: Given the dependence of Cohen’s kappa on feature prevalence, interrater agreement studies should include complementary indices such as Gwet’s AC1 and proportions of specific agreement, especially in settings with a high prevalence of the feature of interest.
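
The indices named in the abstract can be computed from a single 2x2 table of two raters' "present"/"absent" calls. The following is a minimal sketch using the standard two-rater, two-category formulas for Cohen's kappa, Gwet's AC1, and the proportions of specific agreement. The function name `agreement_indices` and the cell counts are illustrative assumptions: the abstract does not report the underlying table, and the counts below were chosen only because they are consistent with the summary values it reports for the acinar pattern.

```python
def agreement_indices(a, b, c, d):
    """Agreement indices for two raters and a binary feature.

    Hypothetical helper, not taken from the paper.
    a: both raters say "present"
    b: rater 1 "present", rater 2 "absent"
    c: rater 1 "absent",  rater 2 "present"
    d: both raters say "absent"
    """
    n = a + b + c + d
    po = (a + d) / n                    # observed agreement
    p1 = (a + b) / n                    # rater 1 "present" rate
    p2 = (a + c) / n                    # rater 2 "present" rate

    # Cohen's kappa: chance agreement from the product of each rater's marginals.
    pe_kappa = p1 * p2 + (1 - p1) * (1 - p2)
    kappa = (po - pe_kappa) / (1 - pe_kappa)

    # Gwet's AC1: chance agreement from the mean "present" rate of both raters.
    pi = (p1 + p2) / 2
    pe_ac1 = 2 * pi * (1 - pi)
    ac1 = (po - pe_ac1) / (1 - pe_ac1)

    # Proportions of specific (positive and negative) agreement.
    p_pos = 2 * a / (2 * a + b + c)
    p_neg = 2 * d / (2 * d + b + c)

    return {"po": po, "kappa": kappa, "ac1": ac1, "p_pos": p_pos, "p_neg": p_neg}


# Illustrative counts (an assumption, not the study's data): 50 cases,
# 41 rated "present" by both, 3 rated "absent" by both, 6 discordant.
result = agreement_indices(a=41, b=3, c=3, d=3)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

With most cases rated "present" by both raters, the chance-agreement term in Cohen's kappa is large, so kappa comes out low even though observed agreement is high. The AC1 chance term shrinks as prevalence approaches 0 or 1, and the specific-agreement proportions show that the disagreement is concentrated in the rare "absent" calls. The sketch does not handle degenerate tables, for example when every case falls in one cell and the kappa denominator is zero.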
Keywords
Interobserver coefficient, Reproducibility, Predominant histologic subtypes, Diagnostic accuracy, Performance metrics, Sensitivity and specificity