Multi-scale Multi-modal Dictionary BERT For Effective Text-image Retrieval in Multimedia Advertising

Conference on Information and Knowledge Management (2022)

Abstract
Visual content in multimedia advertising effectively attracts customers' attention. Search-based multimedia advertising is a cross-modal retrieval problem, and the modal gap between texts and images/videos makes cross-modal image/video retrieval challenging. Recently, multi-modal dictionary BERT has bridged this modal gap by unifying images/videos and texts from different modalities through a multi-modal dictionary. In this work, we improve the multi-modal dictionary BERT by developing a multi-scale multi-modal dictionary and propose a Multi-scale Multi-modal Dictionary BERT (M^2D-BERT). The multi-scale dictionary partitions the feature space at different levels and is effective in describing both the fine-level and the coarse-level relevance between texts and images. Meanwhile, we constrain the code-words in dictionaries from different scales to be orthogonal to each other, which ensures that the multiple dictionaries are complementary. Moreover, we adopt a two-level residual quantization to enhance the capacity of each multi-modal dictionary. Systematic experiments conducted on large-scale cross-modal retrieval datasets demonstrate the excellent performance of our M^2D-BERT.
Keywords
multimedia advertising, retrieval, multi-scale, multi-modal, text-image
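The abstract names two mechanisms without giving formulas: a two-level residual quantization inside each dictionary, and an orthogonality constraint between code-words of dictionaries at different scales. The following is a minimal illustrative sketch of both ideas, not the authors' implementation. The hard nearest-neighbor assignment, the squared-inner-product penalty, the toy dictionaries, and all function names are assumptions made for illustration only.

```python
# Illustrative sketch only. Dictionary contents, sizes, and all names
# below are hypothetical; the paper's actual formulation may differ.

def nearest(codebook, vec):
    """Index of the code-word closest to vec (squared Euclidean distance)."""
    def dist(cw):
        return sum((a - b) ** 2 for a, b in zip(cw, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def residual_quantize(vec, level1, level2):
    """Two-level residual quantization.

    Quantize vec with level1, then quantize the leftover residual with
    level2. The approximation is the sum of the two chosen code-words.
    """
    i = nearest(level1, vec)
    residual = [a - b for a, b in zip(vec, level1[i])]
    j = nearest(level2, residual)
    approx = [a + b for a, b in zip(level1[i], level2[j])]
    return (i, j), approx

def orthogonality_penalty(dict_a, dict_b):
    """Sum of squared inner products between code-words of two dictionaries.

    The value is zero exactly when every code-word in dict_a is
    orthogonal to every code-word in dict_b.
    """
    return sum(
        sum(x * y for x, y in zip(a, b)) ** 2
        for a in dict_a
        for b in dict_b
    )

# Toy dictionaries (assumed values): one scale with two quantization
# levels, plus a second scale living in an orthogonal subspace.
coarse_level1 = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
coarse_level2 = [[0.25, 0.0, 0.0, 0.0], [0.0, 0.25, 0.0, 0.0]]
fine_level1 = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

codes, approx = residual_quantize([1.2, 0.1, 0.0, 0.0],
                                  coarse_level1, coarse_level2)
cross_scale_penalty = orthogonality_penalty(coarse_level1, fine_level1)
```

In this toy setup the second level refines the first-level code-word toward the input, and the cross-scale penalty is zero because the two dictionaries occupy orthogonal subspaces. In a trained model, a penalty of this kind would typically be added to the loss as a soft constraint, which is again an assumption rather than something the abstract states.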