Leveraging Large Language Models for NLG Evaluation: A Survey
CoRR (2024)
Abstract
In the rapidly evolving domain of Natural Language Generation (NLG)
evaluation, introducing Large Language Models (LLMs) has opened new avenues for
assessing generated content quality, e.g., coherence, creativity, and context
relevance. This survey aims to provide a thorough overview of leveraging LLMs
for NLG evaluation, a burgeoning area that lacks a systematic analysis. We
propose a coherent taxonomy for organizing existing LLM-based evaluation
metrics, offering a structured framework to understand and compare these
methods. Our detailed exploration includes critically assessing various
LLM-based methodologies, as well as comparing their strengths and limitations
in evaluating NLG outputs. By discussing unresolved challenges, including bias,
robustness, domain-specificity, and unified evaluation, this survey seeks to
offer insights to researchers and advocate for fairer and more advanced NLG
evaluation techniques.