Hybrid CNN-transformer based meta-learning approach for personalized image aesthetics assessment

Journal of Visual Communication and Image Representation (2024)

Abstract
Personalized Image Aesthetics Assessment (PIAA) is highly subjective, as aesthetic preferences vary greatly from person to person. Traditional generic models struggle to capture each individual's unique preferences, and PIAA must often work with only a limited number of samples per user. Furthermore, it requires a holistic consideration of diverse visual features in images, including both local and global ones. To address these challenges, we propose an innovative network that combines Transformers and Convolutional Neural Networks (CNNs) with meta-learning for PIAA (TCML-PIAA). Firstly, we leverage both Vision Transformer (ViT) blocks and CNNs to capture long-range and short-range dependencies, mining richer and more heterogeneous aesthetic attributes from the two branches. Secondly, to fuse these distinct features effectively, we introduce an Aesthetic Feature Interaction Module (AFIM) that integrates the aesthetic features extracted from the CNN and ViT branches, enabling the interaction and fusion of aesthetic information across the two streams. We also incorporate a Channel-Spatial Attention Module (CSAM), embedded within both the CNN branch and the AFIM, to enhance the perception of different image regions and further explore aesthetic cues. Experimental results demonstrate that TCML-PIAA outperforms existing state-of-the-art methods on benchmark databases.
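The page does not include any code. As a rough, minimal sketch only, the snippet below illustrates the two architectural ideas the abstract describes: a CBAM-style channel-spatial attention block (a stand-in for the CSAM) and a simple concatenation-based fusion of CNN and ViT feature maps (a stand-in for the AFIM). All class names, feature dimensions, and the pooling and projection choices are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only -- not the TCML-PIAA implementation.
# Assumes a CBAM-style channel-spatial attention (CSAM stand-in) and a simple
# cross-branch fusion of CNN and ViT feature maps (AFIM stand-in); all names,
# shapes, and hyperparameters are hypothetical.
import torch
import torch.nn as nn


class ChannelSpatialAttention(nn.Module):
    """Channel gating followed by spatial gating (CBAM-style)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Channel attention from average- and max-pooled descriptors.
        avg = self.channel_mlp(x.mean(dim=(2, 3)))
        mx = self.channel_mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention from channel-wise average and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial_conv(s))


class FeatureInteraction(nn.Module):
    """Toy fusion of CNN and ViT feature maps followed by a scoring head."""

    def __init__(self, cnn_dim: int, vit_dim: int, out_dim: int = 256):
        super().__init__()
        self.project = nn.Conv2d(cnn_dim + vit_dim, out_dim, kernel_size=1)
        self.attend = ChannelSpatialAttention(out_dim)
        self.head = nn.Linear(out_dim, 1)  # scalar aesthetic score

    def forward(self, cnn_feat: torch.Tensor, vit_feat: torch.Tensor) -> torch.Tensor:
        # Resample the ViT feature map to the CNN spatial resolution.
        vit_feat = nn.functional.interpolate(
            vit_feat, size=cnn_feat.shape[-2:], mode="bilinear", align_corners=False)
        fused = self.attend(self.project(torch.cat([cnn_feat, vit_feat], dim=1)))
        return self.head(fused.mean(dim=(2, 3)))  # global average pool -> score


if __name__ == "__main__":
    cnn_feat = torch.randn(2, 512, 14, 14)   # e.g. a CNN stage output
    vit_feat = torch.randn(2, 768, 7, 7)     # e.g. reshaped ViT patch tokens
    print(FeatureInteraction(512, 768)(cnn_feat, vit_feat).shape)  # torch.Size([2, 1])
```

The abstract also does not specify which meta-learning algorithm is used. The sketch below only shows the generic few-shot pattern of adapting a generic aesthetic model to one user's small support set (a MAML-style inner loop) before scoring that user's query images; the function name and hyperparameters are hypothetical.

```python
# Illustrative per-user few-shot adaptation; not the paper's training procedure.
import copy
import torch


def adapt_to_user(model, support_imgs, support_scores, inner_lr=1e-3, steps=5):
    """Clone the generic model and fine-tune it on one user's support set."""
    user_model = copy.deepcopy(model)
    optim = torch.optim.SGD(user_model.parameters(), lr=inner_lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(steps):
        optim.zero_grad()
        loss = loss_fn(user_model(support_imgs).squeeze(-1), support_scores)
        loss.backward()
        optim.step()
    return user_model  # used to predict scores on the user's query images


if __name__ == "__main__":
    base = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 1))
    imgs, scores = torch.randn(10, 3, 32, 32), torch.rand(10)
    user_model = adapt_to_user(base, imgs, scores)
```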
Keywords
Meta-Learning, Personalized image aesthetics assessment