Language-Agnostic Visual-Semantic Embeddings

2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019)

Citations: 56 | Views: 37
Abstract
This paper proposes a framework for training language-invariant cross-modal retrieval models. We also introduce a novel character-based word-embedding approach, allowing the model to project similar words across languages into the same word-embedding space. In addition, by performing cross-modal retrieval at the character level, the storage requirements for the text encoder decrease substantially, allowing for lighter and more scalable retrieval architectures. The proposed language-invariant textual encoder based on characters is virtually unaffected in terms of storage requirements when novel languages are added to the system. Our contributions include new methods for building character-level word embeddings, an improved loss function, and a novel cross-language alignment module that not only makes the architecture language-invariant, but also delivers better predictive performance. We show that our models outperform the current state of the art in both single- and multi-language scenarios. This work can be seen as the basis of a new path in retrieval research, now allowing for the effective use of captions in multiple-language scenarios. Code is available at https://github.com/jwehrmann/lavse.
Keywords
language-agnostic visual-semantic embeddings, language-invariant cross-modal retrieval, character-based word embeddings, character-level text encoder, language-invariant architecture, cross-language alignment module, multiple-language retrieval