Learning to Defer in Content Moderation: The Human-AI Interplay
CoRR (2024)
Abstract
Successful content moderation in online platforms relies on a human-AI
collaboration approach. A typical heuristic estimates the expected harmfulness
of a post and uses fixed thresholds to decide whether to remove it and whether
to send it for human review. This disregards the prediction uncertainty, the
time-varying element of human review capacity and post arrivals, and the
selective sampling in the dataset (humans only review posts filtered by the
admission algorithm).
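
To make the baseline concrete, here is a minimal Python sketch of such a fixed-threshold heuristic. The two thresholds, the callable harm_model, and the Decision fields are illustrative assumptions for this sketch, not notation from the paper.

```python
# A minimal sketch of the fixed-threshold heuristic described above.
# The thresholds and the callable harm_model (returning an estimated
# probability of harm) are illustrative assumptions.

from dataclasses import dataclass

REMOVE_THRESHOLD = 0.9   # assumed: take down posts scored almost surely harmful
REVIEW_THRESHOLD = 0.4   # assumed: admit borderline posts for human review

@dataclass
class Decision:
    remove: bool           # take the post down immediately
    send_to_review: bool   # enqueue the post for a human reviewer

def threshold_policy(features, harm_model) -> Decision:
    """Score a post and apply fixed cutoffs, ignoring prediction uncertainty,
    current review capacity, and how the reviewed training data was sampled."""
    p_harm = harm_model(features)  # estimated expected harmfulness in [0, 1]
    return Decision(
        remove=p_harm >= REMOVE_THRESHOLD,
        send_to_review=REVIEW_THRESHOLD <= p_harm < REMOVE_THRESHOLD,
    )
```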
In this paper, we introduce a model to capture the human-AI interplay in
content moderation. The algorithm observes contextual information for incoming
posts, makes classification and admission decisions, and schedules posts for
human review. Only admitted posts receive human reviews on their harmfulness.
These reviews help educate the machine-learning algorithms but are delayed due
to congestion in the human review system. The classical learning-theoretic way
to capture this human-AI interplay is via the framework of learning to defer,
where the algorithm has the option to defer a classification task to humans for
a fixed cost and immediately receive feedback. Our model contributes to this
literature by introducing congestion in the human review system. Moreover,
unlike work on online learning with delayed feedback where the delay in the
feedback is exogenous to the algorithm's decisions, the delay in our model is
endogenous to both the admission and the scheduling decisions.
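
The sketch below illustrates one decision round of this interplay under simplifying assumptions: a single reviewer, priority-queue scheduling, and hypothetical names (post, classifier, admit_rule, priority_rule). The key point it mirrors is that the human label arrives only when a review finishes, so the feedback delay depends on the learner's own admission and scheduling choices.

```python
# Rough sketch of one decision round in the model above (names and the
# single-reviewer priority queue are illustrative assumptions).

import heapq
import itertools

review_queue = []               # entries: (priority, tie_breaker, admit_time, post)
_tie_breaker = itertools.count()

def step(t, post, classifier, admit_rule, priority_rule, reviewer_free):
    # 1. Classification decision for the incoming post (keep up vs. take down).
    predicted_harmful = classifier.predict(post.context)

    # 2. Admission decision: only admitted posts ever receive a human label.
    if admit_rule(post.context, queue_length=len(review_queue)):
        heapq.heappush(review_queue,
                       (priority_rule(post), next(_tie_breaker), t, post))

    # 3. Scheduling decision: a free reviewer serves the highest-priority post.
    feedback = None
    if reviewer_free and review_queue:
        _, _, admit_time, reviewed = heapq.heappop(review_queue)
        human_label = reviewed.true_harmfulness   # revealed only after review
        delay = t - admit_time                    # grows with queue congestion
        classifier.update(reviewed.context, human_label)
        feedback = (reviewed, human_label, delay)

    return predicted_harmful, feedback
```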
We propose a near-optimal learning algorithm that carefully balances the
classification loss from a selectively sampled dataset, the idiosyncratic loss
of non-reviewed posts, and the delay loss of having congestion in the human
review system. To the best of our knowledge, this is the first result for
online learning in contextual queueing systems and hence our analytical
framework may be of independent interest.
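
As a reading aid only, the stub below names the three competing loss components the abstract refers to; the paper's actual objective, definitions, and weighting are not reproduced here.

```python
# Illustrative labeling of the three loss terms traded off by the algorithm;
# the paper's precise definitions and weights differ.

def total_loss(classification_loss, idiosyncratic_loss, delay_loss):
    """classification_loss: errors of a model trained only on the selectively
    sampled (human-reviewed) posts; idiosyncratic_loss: cost incurred on posts
    that are never admitted for review; delay_loss: harm accumulated while
    admitted posts wait in the congested review queue."""
    return classification_loss + idiosyncratic_loss + delay_loss
```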