A Hierarchical CNN-RNN Approach for Visual Emotion Classification

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) (2020)

Abstract
Visual emotion classification predicts the emotional reactions of people to given visual content. Psychological studies show that human emotions are affected by various visual stimuli ranging from low level to high level, including contrast, color, texture, scene, object, and association, among others. Traditional approaches treated different levels of stimuli as independent components and failed to fuse the different stimuli effectively. This article proposes a hierarchical convolutional neural network (CNN)-recurrent neural network (RNN) approach that predicts emotion from the fused stimuli by exploiting the dependency among different-level features. First, we introduce a dual CNN to extract different levels of visual stimuli, where two related loss functions are designed to learn the stimulus representations under a multi-task learning structure. Further, to model the dependency between the low- and high-level stimuli, a stacked bi-directional RNN is proposed to fuse the features learned by the dual CNN. Comparison experiments on one large-scale and three small-scale datasets show that the proposed approach brings significant improvement. Ablation experiments demonstrate the effectiveness of the different modules of our model.
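
The abstract describes a two-stage architecture: a dual CNN extracts low- and high-level stimulus features under a multi-task loss, and a stacked bi-directional RNN fuses those features to predict emotion. The PyTorch sketch below is only a minimal illustration of that idea; the backbones, layer sizes, auxiliary task heads, and number of emotion classes are assumptions for illustration, not the authors' actual implementation.

```python
# Hypothetical sketch of the hierarchical CNN-RNN idea from the abstract:
# a dual CNN yields low- and high-level stimulus features under two
# auxiliary losses (multi-task learning), and a stacked bidirectional RNN
# fuses the level-wise features for emotion classification.
import torch
import torch.nn as nn

class DualCNN(nn.Module):
    """Two CNN branches: one for low-level stimuli (e.g., color/texture),
    one for high-level stimuli (e.g., scene/object). Sizes are illustrative."""
    def __init__(self, feat_dim=256, num_aux=8):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(128, feat_dim),
            )
        self.low_branch = branch()
        self.high_branch = branch()
        # Auxiliary heads driving the two task-specific losses.
        self.low_head = nn.Linear(feat_dim, num_aux)
        self.high_head = nn.Linear(feat_dim, num_aux)

    def forward(self, x):
        low = self.low_branch(x)
        high = self.high_branch(x)
        return low, high, self.low_head(low), self.high_head(high)

class HierarchicalCNNRNN(nn.Module):
    """Fuses the level-wise features with a stacked bidirectional RNN."""
    def __init__(self, feat_dim=256, hidden=128, num_emotions=8):
        super().__init__()
        self.cnn = DualCNN(feat_dim)
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_emotions)

    def forward(self, x):
        low, high, low_logits, high_logits = self.cnn(x)
        # Treat the low-to-high features as a short sequence so the RNN can
        # model the dependency between stimulus levels.
        seq = torch.stack([low, high], dim=1)         # (B, 2, feat_dim)
        out, _ = self.rnn(seq)                        # (B, 2, 2*hidden)
        emotion_logits = self.classifier(out[:, -1])  # last time step
        return emotion_logits, low_logits, high_logits

if __name__ == "__main__":
    model = HierarchicalCNNRNN()
    images = torch.randn(4, 3, 224, 224)
    emotion_logits, low_logits, high_logits = model(images)
    print(emotion_logits.shape)  # torch.Size([4, 8])
```

In training, the emotion classification loss would be combined with the two auxiliary losses from the dual CNN heads; the weighting between them is left unspecified here.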
Keywords
Visual emotion recognition, feature fusing, hierarchical CNN-RNN, multi-task learning, stacked bi-directional RNN