Exploring attention on faces: similarities between humans and Transformers

2022 18th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2022

Abstract
Attention in Machine Learning allows a model to selectively up-weight informative parts of an input relative to others. The Vision Transformer (ViT) is based entirely on attention. ViTs have shown state-of-the-art performance in multiple fields, including person re-identification, presentation attack detection, and object recognition. Several works have shown that embedding human attention into a Machine Learning pipeline can improve performance or compensate for a lack of data. However, the correlation between computer vision models and human attention has not yet been investigated. In this paper, we explore the intersection of human and Transformer attention. To this end, we collect a new dataset of human fixations, the University of Sassari Face Fixation Dataset (Uniss-FFD), and show through a quantitative analysis that correlations exist between these two modalities. The dataset described in this paper is available at https://github.com/CVLab-Uniss/Uniss-FFD.
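The attention mechanism the abstract refers to, which ViTs apply over image patches, can be illustrated with scaled dot-product attention. The sketch below is a minimal NumPy illustration, not code from the paper; the function names, shapes, and token count are assumptions for the example.

```python
# Minimal sketch of scaled dot-product attention, the building block of
# the Vision Transformer. Names and shapes are illustrative only.
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (num_tokens, d) arrays. Each row of the returned weight
    # matrix sums to 1, so informative tokens are up-weighted relative
    # to others, as described in the abstract.
    d = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d), axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))  # e.g. 4 patch embeddings of dim 8
out, w = scaled_dot_product_attention(tokens, tokens, tokens)
print(w.shape, np.allclose(w.sum(axis=-1), 1.0))  # (4, 4) True
```

In a ViT these attention weights over patches are what can be compared, patch by patch, against maps built from human fixation data such as Uniss-FFD.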