Fusion-Eval: Integrating Assistant Evaluators with LLMs
CoRR (2023)
Abstract
Evaluating natural language systems poses significant challenges,
particularly in the realms of natural language understanding and high-level
reasoning. In this paper, we introduce 'Fusion-Eval', an innovative approach
that leverages Large Language Models (LLMs) to integrate insights from various
assistant evaluators. The LLM is given the example to evaluate along with
scores from the assistant evaluators. Each of these evaluators specializes in
assessing distinct aspects of responses. Fusion-Eval achieves a 0.962
system-level Kendall-Tau correlation with humans on SummEval and a 0.744
turn-level Spearman correlation on TopicalChat, which is significantly higher
than baseline methods. These results highlight Fusion-Eval's significant
potential in the realm of natural language system evaluation.
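
As a rough illustration of the setup the abstract describes, the sketch below assembles an evaluation prompt from a candidate response and per-aspect scores produced by assistant evaluators, then asks an LLM for a fused overall score. This is a minimal sketch under stated assumptions, not the paper's implementation: the call_llm function is a hypothetical stand-in for whatever LLM client is available, and the aspect names in the usage example are illustrative, not necessarily those used in the paper.

    # Minimal sketch of the Fusion-Eval idea: give an LLM the example under
    # evaluation together with per-aspect scores from assistant evaluators,
    # and ask it to produce a fused overall score.

    def call_llm(prompt: str) -> str:
        """Hypothetical LLM call; replace with a real client/API of your choice."""
        raise NotImplementedError

    def fusion_eval(source: str, response: str,
                    assistant_scores: dict[str, float]) -> float:
        # Render each assistant evaluator's score as one line of the prompt.
        score_lines = "\n".join(
            f"- {aspect}: {score:.2f}"
            for aspect, score in assistant_scores.items()
        )
        prompt = (
            "You are evaluating a system response.\n\n"
            f"Source:\n{source}\n\n"
            f"Response:\n{response}\n\n"
            "Scores from specialized assistant evaluators:\n"
            f"{score_lines}\n\n"
            "Considering the response and the assistant scores, output a "
            "single overall quality score between 1 and 5."
        )
        return float(call_llm(prompt))

    # Example usage with illustrative aspect scores (hypothetical values):
    # fusion_eval(article, summary,
    #             {"coherence": 4.1, "consistency": 3.7, "fluency": 4.5})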