Multi-modal Emotion Recognition Utilizing Korean-English Vision and Language Information Alignment Pre-trained Model.

2023 14th International Conference on Information and Communication Technology Convergence (ICTC)(2023)

Abstract
Emotions in humans find expression through a variety of modalities. In this paper, we build a new dataset that combines existing vision and language datasets and use this newly formed multi-modal dataset for the task of multi-modal emotion recognition in empathetic conversation. We utilize a vision-language pre-trained model, VL-KE-T5, to build a model that can process image and text information simultaneously. Comparative experiments show that the proposed model outperforms models that handle image and text separately in emotion recognition.
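The abstract describes a joint model that processes image and text simultaneously, compared against models handling each modality separately. The following is a minimal, hypothetical sketch of such a late-fusion emotion classifier; the encoders are random stand-ins, and VL-KE-T5's actual API is not shown here.

```python
# Hypothetical late-fusion multimodal emotion classifier (sketch only).
# The encoders below are stand-ins, NOT the real VL-KE-T5 interface.
import numpy as np

rng = np.random.default_rng(0)
EMOTIONS = ["joy", "sadness", "anger", "fear", "surprise", "neutral"]
DIM = 64  # assumed embedding size

def encode_image(image) -> np.ndarray:
    # Stand-in for a vision encoder: returns a fixed-size embedding.
    return rng.standard_normal(DIM)

def encode_text(text: str) -> np.ndarray:
    # Stand-in for a text encoder.
    return rng.standard_normal(DIM)

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()

# Joint head: concatenate both embeddings and apply a linear classifier.
W = rng.standard_normal((len(EMOTIONS), 2 * DIM)) * 0.1

def classify(image, text):
    fused = np.concatenate([encode_image(image), encode_text(text)])
    probs = softmax(W @ fused)
    return EMOTIONS[int(np.argmax(probs))], probs

label, probs = classify(None, "I finally passed the exam!")
```

The design choice the comparison in the paper hinges on: the concatenated (jointly processed) representation lets the classifier weigh cross-modal evidence, whereas separate unimodal classifiers cannot.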
Keywords
emotion recognition,multi-modal information,VL-KE-T5