Multi-granularity Separation Network for Text-Based Person Retrieval with Bidirectional Refinement Regularization

ICMR '23: Proceedings of the 2023 ACM International Conference on Multimedia Retrieval (2023)

Abstract
Text-based person retrieval is a fundamental computer vision task that aims to retrieve the most relevant pedestrian image from a set of candidates given a textual description. This cross-modal retrieval task is challenging because it requires selecting distinguishing clues and aligning them across modalities. To achieve cross-modal alignment, most previous works focus on different inter-modal constraints while overlooking the influence of intra-modal noise, which yields sub-optimal retrieval results in certain cases. To address this, we propose a novel framework termed Multi-granularity Separation Network with Bidirectional Refinement Regularization (MSN-BRR). The framework consists of two components: (1) a Multi-granularity Separation Network, which extracts multi-grained discriminative textual and visual representations at local and global semantic levels; and (2) Bidirectional Refinement Regularization, which alleviates the influence of intra-modal noise and facilitates proper alignment between the visual and textual representations. Extensive experiments on two widely used benchmarks, CUHK-PEDES and ICFG-PEDES, show that our MSN-BRR method outperforms current state-of-the-art methods.
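To make the retrieval setting concrete, the following is a minimal sketch of the generic ranking step in text-based person retrieval: a text query embedding is compared against every candidate image embedding, and candidates are sorted by similarity. It does not implement MSN-BRR. The use of cosine similarity, the function names, and the toy two-dimensional embeddings are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_gallery(text_vec, image_vecs):
    """Return candidate indices ordered from most to least similar to the query."""
    scores = [cosine(text_vec, img) for img in image_vecs]
    return sorted(range(len(image_vecs)), key=lambda i: scores[i], reverse=True)


# Hypothetical embeddings: in a real system these would come from
# trained text and image encoders, not hand-written values.
gallery = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
queries = [[0.9, 0.1], [0.1, 0.9]]

rankings = [rank_gallery(q, gallery) for q in queries]
for q, r in zip(queries, rankings):
    print(q, "->", r)
```

In this sketch, retrieval quality depends entirely on how well the two encoders place matching descriptions and images close together, which is the alignment problem the abstract describes.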
Keywords
Refinement Regularization, Text-Based, Multi-granularity