A multimodal attention fusion network with a dynamic vocabulary for TextVQA

Pattern Recognition (2022)

Citations: 12 | Views: 45
Abstract
• A novel encoder-decoder method for TextVQA is proposed.
• The proposed method uses multimodal features to improve model accuracy.
• An attention map loss is used to address the dynamic vocabulary problem.
• The method achieved first place in the ICDAR ST-VQA 2019 challenge.
Keywords
Dynamic vocabulary, Attention map, Multimodal fusion, ST-VQA