Multimodal Data Mining in a Multimedia Database Based on Structured Max Margin Learning.
TKDD(2016)
摘要
Mining knowledge from a multimedia database has received increasing attentions recently since huge repositories are made available by the development of the Internet. In this article, we exploit the relations among different modalities in a multimedia database and present a framework for general multimodal data mining problem where image annotation and image retrieval are considered as the special cases. Specifically, the multimodal data mining problem can be formulated as a structured prediction problem where we learn the mapping from an input to the structured and interdependent output variables. In addition, in order to reduce the demanding computation, we propose a new max margin structure learning approach called Enhanced Max Margin Learning (EMML) framework, which is much more efficient with a much faster convergence rate than the existing max margin learning methods, as verified through empirical evaluations. Furthermore, we apply EMML framework to develop an effective and efficient solution to the multimodal data mining problem that is highly scalable in the sense that the query response time is independent of the database scale. The EMML framework allows an efficient multimodal data mining query in a very large scale multimedia database, and excels many existing multimodal data mining methods in the literature that do not scale up at all. The performance comparison with a state-of-the-art multimodal data mining method is reported for the real-world image databases.
更多查看译文
关键词
Algorithms,Experimentation,Multimodal data mining,image annotation,image retrieval,max margin
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络