ScSer: Supervised Contrastive Learning for Speech Emotion Recognition using Transformers

Varun Sai Alaparthi, Tejeswara Reddy Pasam, Deepak Abhiram Inagandla, Jay Prakash, Pramod Kumar Singh

2022 15th International Conference on Human System Interaction (HSI) (2022)

Cited by 4 | Viewed 10
Abstract
Emotion recognition from speech is a challenging task and an active area of research in effective Human-Computer Interaction (HCI). Although many deep learning and machine learning approaches have been proposed to tackle the problem, they fall short both in accuracy and in learning robust representations that are agnostic to changes in voice. Additionally, there is a lack of sufficient labelled speech data for larger models. To overcome these issues, we propose supervised contrastive learning with transformers for the task of speech emotion recognition (ScSer) and evaluate it on several standard datasets. Further, we experiment with the supervised contrastive setting using different augmentations from the WavAugment library as well as some custom augmentations. Finally, we propose a custom augmentation, random cyclic shift, with which ScSer outperforms other competitive methods and produces a state-of-the-art accuracy of 96% on the RAVDESS dataset with 7600 samples (Big-RAVDESS), a 2-4% boost over other wav2vec-based methods.
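The abstract does not describe the random cyclic shift augmentation in detail; a minimal sketch of what such a waveform augmentation typically looks like (the function name and parameters below are illustrative, not taken from the paper) might be:

```python
import numpy as np

def random_cyclic_shift(waveform, rng=None):
    """Cyclically shift a 1-D audio signal by a random offset.

    Samples pushed off one end wrap around to the other, so the
    signal's length and sample values are preserved while its
    temporal alignment changes.
    """
    rng = np.random.default_rng() if rng is None else rng
    offset = int(rng.integers(0, len(waveform)))
    return np.roll(waveform, offset)

# Example: augment a dummy 1-second signal at 16 kHz
signal = np.sin(2 * np.pi * 440 * np.linspace(0.0, 1.0, 16000))
augmented = random_cyclic_shift(signal, rng=np.random.default_rng(0))
```

Because the shift is a pure rotation, the augmented view keeps the same label and overall content, which is the property a supervised contrastive setup relies on when treating augmented copies as positives.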
Keywords
Supervised contrastive learning, Augmentations, Transformers, Wav2Vec 2.0, HuBERT, RAVDESS, CNN, Speech Emotion Recognition (SER)