Start From Scratch: Towards Automatically Identifying, Modeling, And Naming Visual Attributes

MM(2014)

Abstract
Higher-level semantics such as visual attributes are crucial for fundamental multimedia applications. We present a novel attribute discovery approach that can automatically identify, model, and name attributes from an arbitrary set of image and text pairs that can be easily gathered on the Web. Unlike conventional attribute discovery methods, our approach does not rely on any pre-defined vocabularies or human labeling. We are therefore able to build a large visual knowledge base without any human effort. The discovery is based on a novel deep architecture, named Independent Component Multimodal Autoencoder (ICMAE), that can continually learn shared higher-level representations across the visual and textual modalities. With the help of the resulting representations, which encode strong visual and semantic evidence, we propose to (a) identify attributes and their corresponding high-quality training images, (b) iteratively model them with maximum compactness and comprehensiveness, and (c) name the attribute models with human-understandable words. To date, the proposed system has discovered 1,898 attributes over 1.3 million pairs of image and text. Extensive experiments on various real-world multimedia datasets demonstrate the quality and effectiveness of the discovered attributes, which benefit multimedia applications such as image annotation and retrieval when compared with state-of-the-art approaches.
Keywords
attribute discovery, deep learning, multimodal analysis