Large Language Models for Data Annotation: A Survey
CoRR (2024)
Abstract
Data annotation is the labeling or tagging of raw data with relevant
information, essential for improving the efficacy of machine learning models.
The process, however, is labor-intensive and expensive. The emergence of
advanced Large Language Models (LLMs), exemplified by GPT-4, presents an
unprecedented opportunity to revolutionize and automate the intricate process
of data annotation. While existing surveys have extensively covered LLM
architecture, training, and general applications, this paper uniquely focuses
on their specific utility for data annotation. This survey contributes to three
core aspects: LLM-Based Data Annotation, Assessing LLM-generated Annotations,
and Learning with LLM-generated annotations. Furthermore, the paper includes an
in-depth taxonomy of methodologies employing LLMs for data annotation, a
comprehensive review of learning strategies for models incorporating
LLM-generated annotations, and a detailed discussion on primary challenges and
limitations associated with using LLMs for data annotation. As a key guide,
this survey aims to direct researchers and practitioners in exploring the
potential of the latest LLMs for data annotation, fostering future advancements
in this critical domain. We provide a comprehensive papers list at
.