Annotation Sensitivity: Training Data Collection Methods Affect Model Performance
CoRR(2023)
摘要
When training data are collected from human annotators, the design of the
annotation instrument, the instructions given to annotators, the
characteristics of the annotators, and their interactions can impact training
data. This study demonstrates that design choices made when creating an
annotation instrument also impact the models trained on the resulting
annotations. We introduce the term annotation sensitivity to refer to the
impact of annotation data collection methods on the annotations themselves and
on downstream model performance and predictions. We collect annotations of hate
speech and offensive language in five experimental conditions of an annotation
instrument, randomly assigning annotators to conditions. We then fine-tune BERT
models on each of the five resulting datasets and evaluate model performance on
a holdout portion of each condition. We find considerable differences between
the conditions for 1) the share of hate speech/offensive language annotations,
2) model performance, 3) model predictions, and 4) model learning curves. Our
results emphasize the crucial role played by the annotation instrument which
has received little attention in the machine learning literature. We call for
additional research into how and why the instrument impacts the annotations to
inform the development of best practices in instrument design.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要