A multimodal attention fusion network with a dynamic vocabulary for TextVQA
Pattern Recognition (2022)
Abstract
• A novel encoder-decoder method for TextVQA is proposed.
• The proposed method uses multimodal features to improve model accuracy.
• An attention map loss is used to address the dynamic vocabulary problem.
• The method achieved first place in the ICDAR 2019 ST-VQA challenge.
Keywords
Dynamic vocabulary, Attention map, Multimodal fusion, ST-VQA