Saliency-Guided Attention Network for Image-Sentence Matching
International Conference on Computer Vision, pp. 5754-5763, 2019.
This paper studies the task of matching images and sentences, where learning appropriate representations across the multi-modal data appears to be the main challenge. Unlike previous approaches that predominantly deploy symmetrical architectures to represent both modalities, we propose the Saliency-guided Attention Network (SAN) that asymmetrica...