The Science of Data Collection: Insights from Surveys can Improve Machine Learning Models
arXiv (2024)
Abstract
Whether future AI models make the world safer or less safe for humans rests
in part on our ability to efficiently collect accurate data from people about
what they want the models to do. However, collecting high-quality data is
difficult, and most AI/ML researchers are not trained in data collection
methods. The growing emphasis on data-centric AI highlights the potential of
data to enhance model performance. It also reveals an opportunity to gain
insights from survey methodology, the science of collecting high-quality survey
data.
In this position paper, we summarize lessons from the survey methodology
literature and discuss how they can improve the quality of training and
feedback data, which in turn improves model performance. Based on the cognitive
response process model, we formulate specific hypotheses about the aspects of
label collection that may impact training data quality. We also suggest
ideas for collaborative research on how possible biases in data collection can
be mitigated, making models more accurate and human-centric.