TriReID: Towards Multi-Modal Person Re-Identification via Descriptive Fusion Model

International Conference on Multimedia Retrieval (ICMR)(2022)

Abstract
Cross-modal person re-identification (ReID) aims to retrieve a person in one modality using a query from another single modality, as in text-based and sketch-based ReID tasks. However, since these modalities describe a person from different aspects, combining them can exploit complementary information and improve identification performance. Therefore, to explore how to comprehensively leverage multi-modal information, we introduce a novel multi-modal person re-identification task that uses both a textual description and a sketch as the descriptive query to retrieve the desired images. Jointly understanding the textual and visual descriptions to retrieve a person from the database is better aligned with real-world scenarios, yet this setting remains promising and seldom considered. In addition, building on an existing sketch-based ReID dataset, we construct a new dataset, TriReID, in a semi-automated way to support this challenging task. In particular, we train an image captioning model under an active learning paradigm to generate sentences suitable for ReID, with customized quality scores at three levels. Moreover, we propose a novel framework, the Descriptive Fusion Model (DFM), to address the multi-modal ReID problem. Specifically, we first develop a flexible descriptive embedding function to fuse the text and sketch modalities; the fused descriptive semantic feature is then jointly optimized under a generative adversarial paradigm to mitigate the cross-modal semantic gap. Extensive experiments on the TriReID dataset demonstrate the effectiveness and rationality of the proposed solution.