Searching Better Architectures for Neural Machine Translation

IEEE/ACM Transactions on Audio, Speech, and Language Processing (2020)

Cited by 17 | Views 115
Abstract
Neural architecture search (NAS) has played an important role in the evolution of neural architectures. However, little attention has been paid to improving neural machine translation (NMT) through NAS approaches. In this work, we propose a gradient-based NAS algorithm for NMT that automatically discovers architectures with better performance. Compared with previous NAS work, we jointly search the network operations (e.g., LSTM, CNN, self-attention) as well as the dropout rates to ensure better results. We show that, with reasonable resources, it is possible to discover novel neural network architectures for NMT that achieve consistently better performance than the Transformer [1], the state-of-the-art NMT model, across different tasks. On the WMT'14 English-to-German, IWSLT'14 German-to-English, and WMT'18 Finnish-to-English translation tasks, our discovered architectures obtain 30.1, 36.1, and 26.4 BLEU, respectively, which are substantial improvements over the Transformer baselines. We also empirically verify that a model discovered on one task can be transferred to other tasks.
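The abstract does not give implementation details, so the following is only a minimal, hypothetical sketch of how a gradient-based search could jointly relax the choice of operation and the choice of dropout rate with softmax-weighted mixtures, in the spirit of DARTS-style continuous relaxation. The names MixedOp, CANDIDATE_OPS, and DROPOUT_RATES, and the cheap stand-in operations, are illustrative assumptions and not the authors' method or released code.

```python
# Illustrative sketch only: softmax-relaxed joint search over operations
# and dropout rates. The operation pool and dropout grid are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

DROPOUT_RATES = [0.0, 0.1, 0.2, 0.3]  # hypothetical candidate dropout rates


def CANDIDATE_OPS(dim):
    """Hypothetical operation pool; the paper's pool includes LSTM, CNN, and
    self-attention, but cheap stand-ins keep this sketch short and runnable."""
    return nn.ModuleList([
        nn.Identity(),
        nn.Linear(dim, dim),
        nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),
    ])


class MixedOp(nn.Module):
    """Softmax-weighted mixture over candidate operations and dropout rates,
    so both architecture choices receive gradients during the search."""

    def __init__(self, dim):
        super().__init__()
        self.ops = CANDIDATE_OPS(dim)
        self.alpha_op = nn.Parameter(torch.zeros(len(self.ops)))         # operation weights
        self.alpha_drop = nn.Parameter(torch.zeros(len(DROPOUT_RATES)))  # dropout weights

    def forward(self, x):
        w_op = F.softmax(self.alpha_op, dim=-1)
        w_dp = F.softmax(self.alpha_drop, dim=-1)
        # Mixture over candidate operations.
        out = sum(w * op(x) for w, op in zip(w_op, self.ops))
        # Expected output under the relaxed dropout distribution.
        out = sum(w * F.dropout(out, p=p, training=self.training)
                  for w, p in zip(w_dp, DROPOUT_RATES))
        return out


# Usage: alternate gradient steps on network weights (training split) and on
# the alpha parameters (validation split), then discretize each choice by argmax.
layer = MixedOp(dim=16)
x = torch.randn(4, 16)
print(layer(x).shape)  # torch.Size([4, 16])
```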
Keywords
Computer architecture, Decoding, Task analysis, Speech processing, Network architecture, Neural networks, Optimization, Neural architecture search (NAS), neural machine translation