A Feature Sparse Co-Attention Network for Visual internet of things (VIoT) sensing

Computers and Electrical Engineering (2022)

Abstract
Visual internet of things (VIoT) is a cross-modal learning task that requires a simultaneous understanding of image and question text information. The attention mechanism simulates human vision by retaining the important part of the information and discarding the unimportant part. However, because both important and unimportant features participate in the weighted summation, no information is truly retained or discarded. Furthermore, in many VIoT solutions, the attention mechanism does not align the key regions of the image with the key words of the question, and so fails to improve VIoT performance. To solve these problems, this paper proposes an attention mechanism that sparsifies features, so that attention truly retains some features and discards others. In addition, a feature sparse co-attention network (FSCAN) is constructed to align the key regions of vision and text. The network is composed of an image self-attention unit (ISAU), a question self-attention unit (QSAU), and a guided attention unit (GAU). Each self-attention unit has a feature sparsification function. These units are cascaded in depth to form a hierarchical structure that, as a whole, realizes co-attention. Several experiments conducted on the VQA-v2 dataset show that the proposed method outperforms the latest methods.
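The difference between ordinary attention and feature sparse attention can be illustrated with a minimal sketch. The abstract does not say how the features are sparsified, so the top-k selection used here is an assumption chosen for illustration, not the paper's method. The function names `dense_attention` and `topk_sparse_attention` and the parameter `k` are likewise hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_scores(query, keys):
    """Scaled dot-product score between one query and every key."""
    scale = math.sqrt(len(query))
    return [sum(q * x for q, x in zip(query, key)) / scale for key in keys]


def weighted_sum(weights, values):
    """Combine value vectors using the given attention weights."""
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def dense_attention(query, keys, values):
    """Ordinary attention: every feature gets a non-zero weight."""
    weights = softmax(scaled_scores(query, keys))
    return weighted_sum(weights, values), weights


def topk_sparse_attention(query, keys, values, k):
    """Assumed sparsification: keep the k highest-scoring features,
    renormalize over them, and give all other features weight 0."""
    scores = scaled_scores(query, keys)
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of keys")
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    kept_weights = softmax([scores[i] for i in keep])
    weights = [0.0] * len(scores)
    for index, weight in zip(keep, kept_weights):
        weights[index] = weight
    return weighted_sum(weights, values), weights


if __name__ == "__main__":
    query = [1.0, 0.0]
    keys = [[4.0, 0.0], [3.0, 0.0], [-5.0, 0.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(dense_attention(query, keys, values)[1])
    print(topk_sparse_attention(query, keys, values, k=2)[1])
```

In the dense version the third feature still contributes a small amount to the output, which is the problem the abstract describes. In the sparse version its weight is exactly zero, so it is discarded rather than merely down-weighted.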
Keywords
Visual internet of things (VIoT), Feature sparse co-attention network (FSCAN), Bi-LSTM, Image self-attention unit (ISAU), Question self-attention unit (QSAU), Guided attention unit (GAU)