DetTrans: A Lightweight Framework to Detect and Translate Noisy Inputs Simultaneously

IEEE ACM Trans. Audio Speech Lang. Process. (2023)

Abstract
Neural machine translation (NMT) systems trained on clean data usually suffer performance degradation when translating noisy inputs. Existing work typically improves the robustness of NMT through data augmentation, in which synthetic noisy data are mixed with the original clean data, either to train the NMT model with the standard NMT loss alone or to tune auxiliary tasks in a multi-task learning setup. Typical auxiliary tasks include detecting and correcting noise and exploiting noisy outputs for contrastive learning. These two kinds of auxiliary task are generally designed independently, and the modules for detecting and correcting noise are heavyweight. In this article, we propose a new framework, DetTransNet (Detector-Translator Network), which detects the positions of noise in the input and translates the input simultaneously. The newly introduced noise detector module is essentially a lightweight binary classifier built on the final encoder layer of the original Transformer translation model; it identifies the input positions that potentially contain noise. The module has very few parameters. To help the model capture the relationship between clean instances and their noisy counterparts, an extra loss is introduced to enhance the interaction between clean and noisy data. In this way, noise detection and contrastive learning are combined. Because the model can identify and locate noise, a heuristic method is proposed to correct the detected noise and thereby achieve better translations. Experiments show that DetTransNet is robust to four types of noise (deletion, insertion, swapping, keyboard) and obtains a substantial improvement of up to 1.6 BLEU points across different datasets.
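To make the data-augmentation setup concrete, the following is a minimal sketch of how synthetic noisy inputs with per-token noise labels could be generated for the four noise types named in the abstract. It is an illustration under stated assumptions, not the authors' implementation: the function names `corrupt_word` and `make_noisy_example`, the partial keyboard-neighbour map, the word-level tokenization, and the corruption probability `prob` are all hypothetical. The binary labels are the kind of target a position-level noise detector could be trained on.

```python
import random

# Hypothetical, partial QWERTY neighbour map, for illustration only.
KEYBOARD_NEIGHBOURS = {
    "a": "qwsz", "e": "wrsd", "i": "uojk", "o": "ipkl",
    "n": "bhjm", "s": "awedxz", "t": "rfgy", "r": "edft",
}
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def corrupt_word(word, kind, rng):
    """Apply one character-level noise of the given kind to a word."""
    if len(word) < 2:
        return word  # too short to corrupt safely; leave unchanged
    if kind == "deletion":
        i = rng.randrange(len(word))
        return word[:i] + word[i + 1:]
    if kind == "insertion":
        i = rng.randrange(len(word))
        return word[:i] + rng.choice(ALPHABET) + word[i:]
    if kind == "swapping":
        # Swap two adjacent characters.
        i = rng.randrange(len(word) - 1)
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if kind == "keyboard":
        # Replace a character with a neighbouring key, if the map covers it.
        candidates = [j for j, c in enumerate(word) if c in KEYBOARD_NEIGHBOURS]
        if not candidates:
            return word
        j = rng.choice(candidates)
        return word[:j] + rng.choice(KEYBOARD_NEIGHBOURS[word[j]]) + word[j + 1:]
    raise ValueError(f"unknown noise kind: {kind}")


def make_noisy_example(tokens, kind, prob=0.3, seed=0):
    """Return (noisy_tokens, labels); labels[i] is 1 where token i was changed."""
    rng = random.Random(seed)
    noisy, labels = [], []
    for tok in tokens:
        new_tok = corrupt_word(tok, kind, rng) if rng.random() < prob else tok
        noisy.append(new_tok)
        labels.append(int(new_tok != tok))
    return noisy, labels


if __name__ == "__main__":
    clean = "the translation system reads noisy input".split()
    for kind in ("deletion", "insertion", "swapping", "keyboard"):
        print(kind, make_noisy_example(clean, kind, prob=0.5, seed=1))
```

In this sketch a label is set by comparing the corrupted token with the original, so a swap of two identical characters (which leaves the word unchanged) is labelled clean. Pairing each clean sentence with its noisy counterpart in this way also gives the clean/noisy pairs that a contrastive-style loss would need.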
Keywords
noisy inputs,lightweight framework