XMU Evaluation Report for CCMT2020

Guocheng Zhang, Yingmin Wang, Enjun Zhong, Qiuyi Jiang, Fang Jiang, Dong Zhang, Hongkang Zhu, Yidong Chen, Xiaodong Shi

semanticscholar (2020)

Abstract
This paper describes XMU's participation in the Chinese-English parallel corpus filtering task of the 16th China Conference on Machine Translation (CCMT 2020). We apply a strict rule-based method to filter out noisy sentence pairs, and we design five heuristic methods that measure, each from a different perspective, how parallel a noisy sentence pair is. Of these, the token-based Levenshtein distance and the Mahalanobis distance computed with a bilingual pre-trained model perform particularly well at selecting high-quality parallel data. Finally, we combine the best-performing methods by weighted fusion in two ways: additive and multiplicative. Our submitted system ranked second overall.
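The abstract names the scoring methods but gives no formulas, so the following is only a minimal sketch of the general techniques under stated assumptions, not the authors' implementation. Assumptions: (1) the token-based Levenshtein distance is computed between two token sequences, for example a machine translation of the Chinese side and the English side; the abstract does not say which sequences are compared. (2) For the Mahalanobis distance, each pair is represented by a fixed-size feature vector derived from a bilingual pre-trained model, and the mean and covariance are estimated from a set of trusted clean pairs; both the features and the reference set are hypothetical. (3) Multiplicative fusion is written here as a weighted geometric product, and the weights are placeholders. All function names (`token_levenshtein`, `fit_gaussian`, `fuse_multiplicative`, etc.) are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def token_levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance counted over whole tokens (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr[j] = min(prev[j] + 1,         # delete tok_a
                          curr[j - 1] + 1,     # insert tok_b
                          prev[j - 1] + cost)  # substitute or keep
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Normalize the distance to [0, 1]; 1.0 means identical token sequences."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - token_levenshtein(a, b) / longest


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (pure Python)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_gaussian(vectors: List[List[float]]) -> Tuple[List[float], Matrix]:
    """Estimate mean and inverse sample covariance from trusted clean pairs."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(dim)]
    cov = [[sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in vectors) / (n - 1)
            for j in range(dim)] for i in range(dim)]
    return mean, invert(cov)


def mahalanobis(x: List[float], mean: List[float], inv_cov: Matrix) -> float:
    """sqrt((x - mean)^T S^-1 (x - mean)); lower means closer to clean data."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    q = sum(d[i] * inv_cov[i][j] * d[j]
            for i in range(len(d)) for j in range(len(d)))
    return math.sqrt(max(q, 0.0))


def fuse_additive(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-method scores."""
    return sum(w * scores[name] for name, w in weights.items())


def fuse_multiplicative(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted product: each score is raised to its weight, then multiplied."""
    result = 1.0
    for name, w in weights.items():
        result *= scores[name] ** w
    return result
```

For example, `levenshtein_similarity("the cat sat on the mat".split(), "the cat sat on a mat".split())` involves one substitution over six tokens, giving 1 - 1/6. Note that the Mahalanobis value is a distance, so before fusion it would need to be turned into a "higher is more parallel" score on a comparable scale (for instance `math.exp(-d)`, again an assumption). As for the two fusion modes, a weighted sum lets a strong score on one method offset a weak one, whereas a weighted product pulls the fused score toward zero whenever any single method scores near zero.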