Lasting emotions - An investigation of short- and long-term affective content remanence in speech


Speech emotion recognition (SER) is a promising ongoing research area with important applications for forensics and law enforcement operations, among others. Previous work has proposed integrating SER systems to assist in surveillance tasks, emergency services, police investigations, and other operations, especially to anticipate and prevent potential criminal acts or to counter terrorist activities. One challenge in these tasks is discerning patterns in the temporal evolution of the affective content that would indicate suspicious behavior and warrant further inquiry. In this work, we investigate these patterns and show that 1) if a human interaction is emotionally triggering for the subject, then their affective response will not decay instantly, but over a longer time period, and subsequent emotionally neutral interactions will still be accompanied by an aroused negative affective state (emotional remanence); and 2) if an emotionally charged event is forthcoming for the subject, then as the event draws closer, the subject will experience higher intensity emotions and will exhibit a correspondingly increased affective response. To provide a reasonable partial proxy for the high-stakes conditions and triggers expected in real-life scenarios, we have developed a speech dataset comprising 270 recordings of 18 students who are behind on their university exams and about to attempt them for the second or third time; thus, the upcoming exams and the potential consequences of failing them represent the emotionally charged event. Human evaluators labeled the recordings in terms of the identified emotional classes (grouped into negative emotional classes and the neutral state) and of arousal-valence affect space values.
Analyzing the annotations made by the evaluators, we show that the subjects' affective response is significantly higher as the emotionally charged event approaches, and that emotional remanence can be observed even 15 minutes after the initial interaction, or even after 30 minutes when under the added influence of the event's imminence. Arousal increases (higher intensity affective response) as the event draws closer, while valence decreases (more negative affective response), again supporting the second hypothesis and suggesting that such patterns would be relevant for the targeted applications. We propose and implement a SER system using artificial neural networks (ANNs) based on multilayer perceptron (MLP) models. Trained in a speaker-independent manner, it obtains good performance (up to 72.7% accuracy) and yields classification and regression results consistent with those given by human evaluation. This supports the feasibility and usefulness of using machine learning (ML) systems to monitor affective responses, to automatically detect the patterns associated with the behaviors relevant for forensic and law enforcement applications, and to facilitate intervention and prevention.
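As an illustration of what speaker-independent training implies for data splitting, the following is a minimal sketch, not the authors' procedure. It assumes a leave-one-speaker-out scheme and an even split of 15 recordings per speaker (270 / 18); neither assumption is stated in the abstract. The function name `leave_one_speaker_out` and the speaker labels are hypothetical.

```python
from collections import defaultdict


def leave_one_speaker_out(speaker_ids):
    """Yield (held_out_speaker, train_indices, test_indices), one fold per speaker.

    No recording from the held-out speaker appears in the training indices,
    which is the property that makes an evaluation speaker-independent.
    """
    by_speaker = defaultdict(list)
    for idx, spk in enumerate(speaker_ids):
        by_speaker[spk].append(idx)

    for held_out in sorted(by_speaker):
        test_idx = list(by_speaker[held_out])
        train_idx = sorted(
            i
            for spk, idxs in by_speaker.items()
            if spk != held_out
            for i in idxs
        )
        yield held_out, train_idx, test_idx


# Assumed layout (hypothetical): 18 speakers x 15 recordings = 270 recordings.
speaker_ids = [f"spk{s:02d}" for s in range(18) for _ in range(15)]
folds = list(leave_one_speaker_out(speaker_ids))
```

Each fold would then train a model on `train_idx` and evaluate it on `test_idx`. scikit-learn's `LeaveOneGroupOut` offers the same kind of split when that library is available.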
Speech emotion remanence, speech emotion recognition, machine learning, multilayer perceptrons, law enforcement