Heterogeneous attention based transformer for sign language translation

Hao Zhang, Yixiang Sun, Zenghui Liu, Qiyuan Liu, Xiyao Liu, Ming Jiang, Gerald Schaefer, Hui Fang

Applied Soft Computing (2023)

Abstract
Sign language translation (SLT) has attracted significant interest from both research and industry, as it enables more convenient communication with the deaf-mute community. While recent transformer-based models have improved sign translation performance, it remains under-explored how to design an efficient transformer-based deep network architecture that effectively extracts joint visual-text features by exploiting multi-level spatial and temporal contextual information. In this paper, we propose the heterogeneous attention-based transformer (HAT), a novel SLT model that generates attention from diverse spatial and temporal contextual levels. Specifically, the proposed light dual-stream sparse attention-based module yields more effective visual-text representations than conventional transformers. Extensive experiments demonstrate that HAT achieves state-of-the-art performance on the challenging PHOENIX2014T benchmark dataset, with a BLEU-4 score of 25.33 on the test set.
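The abstract names a light dual-stream sparse attention module but gives no implementation details. The PyTorch sketch below is purely illustrative: the top-k sparsification rule, head count, pooling, and linear fusion step are assumptions made for the example, not the paper's actual design.

```python
# Illustrative sketch of a dual-stream sparse attention block.
# NOTE: all architectural choices here (top-k masking, mean pooling,
# linear fusion) are assumptions; the paper's abstract does not
# specify how its dual-stream sparse attention module is built.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseAttention(nn.Module):
    """Multi-head self-attention keeping only the top-k scores per query."""

    def __init__(self, dim: int, heads: int = 8, top_k: int = 16):
        super().__init__()
        self.heads, self.top_k = heads, top_k
        self.scale = (dim // heads) ** -0.5
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape each to (batch, heads, seq, head_dim)
        q, k, v = (t.view(b, n, self.heads, -1).transpose(1, 2) for t in (q, k, v))
        scores = (q @ k.transpose(-2, -1)) * self.scale
        # sparsify: mask out everything below each query's k-th best score
        k_eff = min(self.top_k, n)
        thresh = scores.topk(k_eff, dim=-1).values[..., -1:]
        scores = scores.masked_fill(scores < thresh, float("-inf"))
        out = F.softmax(scores, dim=-1) @ v
        return self.proj(out.transpose(1, 2).reshape(b, n, d))


class DualStreamBlock(nn.Module):
    """Two parallel sparse-attention streams (visual and textual) fused
    into a joint representation; the fusion scheme is an assumption."""

    def __init__(self, dim: int):
        super().__init__()
        self.visual_attn = SparseAttention(dim)
        self.text_attn = SparseAttention(dim)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, visual: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        v = self.visual_attn(visual)
        t = self.text_attn(text)
        # pool each stream over its sequence axis, then fuse
        joint = torch.cat([v.mean(dim=1), t.mean(dim=1)], dim=-1)
        return self.fuse(joint)


if __name__ == "__main__":
    block = DualStreamBlock(dim=256)
    video_feats = torch.randn(2, 64, 256)  # (batch, frames, dim)
    text_feats = torch.randn(2, 20, 256)   # (batch, tokens, dim)
    print(block(video_feats, text_feats).shape)  # torch.Size([2, 256])
```

The top-k mask keeps attention cost and noise down by zeroing weak query-key pairs, which is one common way to realise "sparse attention"; the actual mechanism in HAT may differ.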
Keywords
Sign language translation, Transformer, Attention-based models, Dual-sparse attention, PHOENIX2014T