Vector Quantization Knowledge Transfer for End-to-End Text Image Machine Translation

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2024)

Abstract
End-to-end text image machine translation (TIMT) aims to translate source-language text embedded in images into the target language without explicitly recognizing the intermediate text. However, data scarcity in the end-to-end TIMT task limits translation performance. Existing research alleviates this limitation by aligning continuous features from the related tasks of text image recognition (TIR) or machine translation (MT), but alignment in a continuous vector space is extremely difficult and inevitably introduces fitting errors that cause significant performance degradation. To better align TIMT features with MT semantic features, we propose a novel Vector Quantization Knowledge Transfer (VQKT) method that employs a trainable codebook to quantize continuous features into a discrete space. The quantization distribution of the MT features serves as the teacher distribution, guiding the TIMT model to generate similar discrete codes. Through alignment and knowledge transfer based on probability distributions, the TIMT model can better imitate the feature representation of the MT teacher model and generate high-quality target-language translations. Extensive experiments demonstrate that VQKT significantly outperforms existing end-to-end TIMT methods.
Keywords
Text image machine translation, vector quantization, quantization distribution, knowledge transfer
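
The abstract describes quantizing continuous features with a shared codebook and using the MT quantization distribution as a teacher for the TIMT model. The snippet below is only a minimal sketch of that general idea, not the paper's implementation. Everything specific in it is an assumption made for illustration: the distribution is taken to be a softmax over negative squared distances to the codebook entries, the transfer loss is taken to be a KL divergence, the MT and TIMT feature sequences are assumed to have the same length, and the names `quantization_distribution`, `kl_divergence`, and `temperature` are hypothetical.

```python
import numpy as np


def quantization_distribution(features, codebook, temperature=1.0):
    """Soft assignment of each feature vector to the codebook entries.

    Assumption: probabilities come from a softmax over negative squared
    Euclidean distances; the abstract does not specify the exact form.
    """
    # Pairwise squared distances, shape (n_features, n_codes).
    diff = features[:, None, :] - codebook[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    logits = -dist / temperature
    # Subtract the row maximum for numerical stability.
    logits = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)


def kl_divergence(teacher, student, eps=1e-12):
    """Mean KL(teacher || student) over positions (assumed transfer loss)."""
    per_position = np.sum(
        teacher * (np.log(teacher + eps) - np.log(student + eps)), axis=-1
    )
    return float(np.mean(per_position))


# Toy data standing in for real encoder outputs (hypothetical shapes).
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))       # 8 codes of dimension 4
mt_features = rng.normal(size=(5, 4))    # teacher (MT) features
timt_features = mt_features + 0.1 * rng.normal(size=(5, 4))  # student (TIMT)

teacher = quantization_distribution(mt_features, codebook)
student = quantization_distribution(timt_features, codebook)

# Loss that would push the student's code distribution toward the teacher's.
loss = kl_divergence(teacher, student)

# Hard discrete codes: index of the most probable codebook entry.
codes = teacher.argmax(axis=-1)
```

In a real training setup the codebook and both encoders would be trainable and the loss would be minimized by gradient descent; this NumPy version only shows how the two distributions and the divergence between them could be computed.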