The Science of Data Collection: Insights from Surveys can Improve Machine Learning Models

arxiv(2024)

引用 0|浏览2
暂无评分
摘要
Whether future AI models make the world safer or less safe for humans rests in part on our ability to efficiently collect accurate data from people about what they want the models to do. However, collecting high quality data is difficult, and most AI/ML researchers are not trained in data collection methods. The growing emphasis on data-centric AI highlights the potential of data to enhance model performance. It also reveals an opportunity to gain insights from survey methodology, the science of collecting high-quality survey data. In this position paper, we summarize lessons from the survey methodology literature and discuss how they can improve the quality of training and feedback data, which in turn improve model performance. Based on the cognitive response process model, we formulate specific hypotheses about the aspects of label collection that may impact training data quality. We also suggest collaborative research ideas into how possible biases in data collection can be mitigated, making models more accurate and human-centric.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要