Towards Large Language Model driven Reference-less Translation Evaluation for English and Indian Languages
arXiv (2024)
Abstract
With the primary focus on evaluating the effectiveness of large language
models for automatic reference-less translation assessment, this work presents
our experiments on mimicking human direct assessment to evaluate the quality of
translations in English and Indian languages. We constructed a translation
evaluation task where we performed zero-shot learning, in-context
example-driven learning, and fine-tuning of large language models to provide a
score out of 100, where 100 represents a perfect translation and 1 represents a
poor translation. We compared the performance of our trained systems with
existing methods such as COMET, BERTScore, and LaBSE, and found that the
LLM-based evaluator (LLaMA-2-13B) achieves a comparable or higher overall
correlation with human judgments for the considered Indian language pairs.
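The sketch below is not the authors' released code; it is a minimal illustration of the two steps the abstract describes: prompting an LLM for a reference-less, direct-assessment-style score between 1 and 100, and checking segment-level correlation against human judgments. The checkpoint name, prompt wording, and helper functions are assumptions for illustration.

```python
# Hypothetical sketch of zero-shot LLM-based translation scoring (not the paper's code).
# Assumes a LLaMA-2-13B chat checkpoint via the Hugging Face `transformers`
# text-generation pipeline and a list of human direct-assessment scores.
import re
from transformers import pipeline
from scipy.stats import pearsonr, kendalltau

PROMPT = (
    "Score the following translation from {src_lang} to {tgt_lang} on a scale "
    "of 1 to 100, where 100 means a perfect translation and 1 means a poor "
    "translation. Reply with the score only.\n"
    "Source: {source}\nTranslation: {translation}\nScore:"
)

# Assumed model identifier; any instruction-tuned causal LM could be substituted.
generator = pipeline("text-generation", model="meta-llama/Llama-2-13b-chat-hf")

def score_translation(source, translation, src_lang="English", tgt_lang="Hindi"):
    """Ask the LLM for a 1-100 quality score and parse the first number it emits."""
    prompt = PROMPT.format(src_lang=src_lang, tgt_lang=tgt_lang,
                           source=source, translation=translation)
    out = generator(prompt, max_new_tokens=8, return_full_text=False)[0]["generated_text"]
    match = re.search(r"\d+(\.\d+)?", out)
    return float(match.group()) if match else 0.0

def correlation_with_humans(sources, translations, human_scores):
    """Segment-level Pearson and Kendall correlation between LLM and human scores."""
    llm_scores = [score_translation(s, t) for s, t in zip(sources, translations)]
    return pearsonr(llm_scores, human_scores)[0], kendalltau(llm_scores, human_scores)[0]
```

In-context (few-shot) variants of this setup would prepend scored example pairs to the prompt, and the fine-tuned variant would train the model on such prompt-score pairs; the correlation check stays the same.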