Inter-rater Agreement Measures, and the Refinement of Metrics in the PLATO MT Evaluation Paradigm.

MT Summit (2005)

Abstract
The PLATO machine translation (MT) evaluation (MTE) research program has as its goal the systematic development of a predictive relationship between discrete, well-defined MTE metrics and the specific information processing tasks that can be reliably performed with MT output. Its core consists of traditional measures of quality, informed by the International Standards for Language Engineering (ISLE), namely clarity, coherence, morphology, syntax, general and domain-specific lexical robustness, and named-entity translation, as well as a DARPA-inspired measure of adequacy. For robust validation, which is indispensable for refining tests and guidelines, we measure inter-rater reliability on the assessments. Here we report our results, focusing on the PLATO Clarity and Coherence assessments, and we discuss our method for iteratively refining both the linguistic metrics and the guidelines for applying them within the PLATO evaluation paradigm. Finally, we discuss reasons why kappa might not be the best measure of inter-rater agreement for our purposes, and suggest directions for future investigation.
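For reference only: the abstract's closing point concerns kappa as an inter-rater agreement statistic (which the paper itself questions as the best fit for its purposes). A minimal sketch of Cohen's kappa for two raters assigning categorical scores is given below; the function name and the sample Clarity ratings are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not from the paper): Cohen's kappa for two raters who each
# assign a categorical score (e.g., a Clarity level) to the same MT segments.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Observed agreement corrected for agreement expected by chance."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed proportion of exact agreements.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label distribution.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    p_e = sum((counts_a[label] / n) * (counts_b[label] / n) for label in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical Clarity ratings on a 3-point scale for six segments.
print(cohens_kappa([3, 2, 3, 1, 2, 3], [3, 2, 2, 1, 2, 3]))
```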
Keywords
computational linguistics,automation,machine translation