Online multimodal matrix factorization for human action video indexing

Content-Based Multimedia Indexing (2014)

Abstract
This paper addresses the problem of searching for videos containing instances of specific human actions. The proposed strategy builds a multimodal latent space representation where both visual content and annotations are simultaneously mapped. The hypothesis behind the method is that such a latent space yields better results when built from multiple data modalities. The semantic embedding is learned using matrix factorization through stochastic gradient descent, which makes it suitable to deal with large-scale collections. The method is evaluated on a large-scale human action video dataset with three modalities corresponding to action labels, action attributes and visual features. The evaluation is based on a query-by-example strategy, where a sample video is used as input to the system. A retrieved video is considered relevant if it contains an instance of the same human action present in the query. Experimental results show that the learned multimodal latent semantic representation produces improved performance when compared with an exclusively visual representation.
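As a rough illustration of the kind of training loop the abstract describes — not the paper's exact objective or notation — the sketch below factorizes two modalities (visual features and annotations, here synthetic toy data) into a shared latent space with per-sample stochastic gradient descent. All variable names and the squared-error-plus-L2 objective are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n videos with a visual modality and an annotation modality,
# generated from a common latent structure so a shared embedding exists.
n, d_vis, d_ann, k = 200, 50, 20, 10
latent_true = rng.normal(size=(n, k))
X_vis = latent_true @ rng.normal(size=(k, d_vis))   # visual features
X_ann = latent_true @ rng.normal(size=(k, d_ann))   # labels/attributes

# Factors: U maps each video into the shared latent space;
# P_vis / P_ann project latent vectors back to each modality.
U = rng.normal(scale=0.1, size=(n, k))
P_vis = rng.normal(scale=0.1, size=(k, d_vis))
P_ann = rng.normal(scale=0.1, size=(k, d_ann))

def total_loss():
    return (np.linalg.norm(U @ P_vis - X_vis) ** 2
            + np.linalg.norm(U @ P_ann - X_ann) ** 2)

init_loss = total_loss()
lr, lam = 0.01, 1e-3
for epoch in range(50):
    for i in rng.permutation(n):          # one stochastic step per video
        u = U[i]
        err_v = u @ P_vis - X_vis[i]      # per-modality reconstruction errors
        err_a = u @ P_ann - X_ann[i]
        grad_u = err_v @ P_vis.T + err_a @ P_ann.T + lam * u
        P_vis -= lr * np.outer(u, err_v)
        P_ann -= lr * np.outer(u, err_a)
        U[i] = u - lr * grad_u

final_loss = total_loss()
```

Because each update touches only one video's row of `U` plus the two projection matrices, the per-step cost is independent of the collection size, which is what makes this style of factorization suitable for large-scale indexing; at query time a new video would be embedded by optimizing its latent vector against the visual projection only.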
Keywords
gradient methods, image representation, indexing, learning (artificial intelligence), matrix decomposition, query processing, video signal processing, action attributes, action labels, annotations, data modalities, human action video dataset, human action video indexing, latent space, matrix factorization, multimodal latent semantic representation, multimodal latent space representation, online multimodal matrix factorization, query-by-example strategy, semantic embedding, stochastic gradient descent, visual content, visual features, visual representation, human actions, information retrieval, multimodal data, video processing