1.
Mihalache, Serban; Burileanu, Dragos; Franti, Eduard; Dascalu, Monica; Bratan, Costin-Andrei
Lasting emotions - An investigation of short- and long-term affective content remanence in speech Journal Article
In: ROMANIAN JOURNAL OF INFORMATION SCIENCE AND TECHNOLOGY, vol. 25, no. 1, pp. 20-35, 2022, ISSN: 1453-8245.
Tags: Speech emotion remanence; speech emotion recognition; machine learning; multilayer perceptrons; law enforcement
@article{WOS:000775912300002,
title = {Lasting emotions - An investigation of short- and long-term affective
content remanence in speech},
author = {Serban Mihalache and Dragos Burileanu and Eduard Franti and Monica Dascalu and Costin-Andrei Bratan},
issn = {1453-8245},
year = {2022},
date = {2022-01-01},
journal = {ROMANIAN JOURNAL OF INFORMATION SCIENCE AND TECHNOLOGY},
volume = {25},
number = {1},
pages = {20--35},
publisher = {EDITURA ACAD ROMANE},
address = {CALEA 13 SEPTEMBRIE NR 13, SECTOR 5, BUCURESTI 050711, ROMANIA},
abstract = {Speech emotion recognition (SER) is a promising ongoing research area
with important applications for forensics and law enforcement
operations, among others. Approaches have been previously proposed to
integrate SER systems to assist in surveillance tasks, emergency
services, police investigations, or other operations, especially in the
attempt to anticipate and prevent potential criminal acts or even to
counter terrorist activities. One of the challenges presented by these
tasks consists of discerning patterns in the temporal evolution of the
affective content that would indicate suspicious behavior and warrant
further inquiry. In this work, we gain insight into these patterns and
prove that 1) if a human interaction is emotionally triggering for the
subject, then their affective response will not decay instantly, but
over a longer time period, and subsequent emotionally neutral
interactions will still be accompanied by an aroused negative affective
state (emotional remanence); and 2) if an emotionally charged event is
forthcoming for the subject, as the event draws closer, the subject will
experience higher intensity emotions and will exhibit a correspondingly
increased affective response. In order to provide a reasonable partial
proxy for the high-stakes conditions and triggers expected in real-life
scenarios, we have developed a speech dataset comprising 270 recordings
of 18 students behind on their university exams and about to attempt
them for the second or third time; thus, the upcoming exams and the
potential consequences of failing them represent the emotionally charged
event. Human evaluators labeled the recordings in terms of the
identified emotional classes (grouped into negative emotional classes
and the neutral state) and of arousal-valence affect space values.
Analyzing the annotations made by the evaluators, we prove that the
subjects' affective response is significantly higher as the emotionally
charged event approaches, and emotional remanence can be observed even
15 minutes after the initial interaction, or even after 30 minutes when
under the added influence of the event's imminence. We show that the
arousal increases (higher intensity affective response) as the event
draws closer, while the valence decreases (more negative affective
response), again supporting the second hypothesis, and suggesting that
such patterns would be relevant for the targeted applications. We
propose and implement a SER system using artificial neural networks
(ANNs) based on multilayer perceptron (MLP) models, obtaining good
performance (up to 72.7\% accuracy) when training in a
speaker-independent manner, and yielding classification and regression
results consistent with those given by human evaluation, supporting the
possibility and usefulness of using machine learning (ML) systems to
monitor affective responses in order to automatically detect the
patterns associated with the behaviors relevant for forensic and law
enforcement applications and to facilitate intervention and prevention.},
keywords = {Speech emotion remanence; speech emotion recognition; machine learning; multilayer perceptrons; law enforcement},
pubstate = {published},
tppubtype = {article}
}
Speech emotion recognition (SER) is a promising ongoing research area
with important applications for forensics and law enforcement
operations, among others. Approaches have been previously proposed to
integrate SER systems to assist in surveillance tasks, emergency
services, police investigations, or other operations, especially in the
attempt to anticipate and prevent potential criminal acts or even to
counter terrorist activities. One of the challenges presented by these
tasks consists of discerning patterns in the temporal evolution of the
affective content that would indicate suspicious behavior and warrant
further inquiry. In this work, we gain insight into these patterns and
prove that 1) if a human interaction is emotionally triggering for the
subject, then their affective response will not decay instantly, but
over a longer time period, and subsequent emotionally neutral
interactions will still be accompanied by an aroused negative affective
state (emotional remanence); and 2) if an emotionally charged event is
forthcoming for the subject, as the event draws closer, the subject will
experience higher intensity emotions and will exhibit a correspondingly
increased affective response. In order to provide a reasonable partial
proxy for the high-stakes conditions and triggers expected in real-life
scenarios, we have developed a speech dataset comprising 270 recordings
of 18 students behind on their university exams and about to attempt
them for the second or third time; thus, the upcoming exams and the
potential consequences of failing them represent the emotionally charged
event. Human evaluators labeled the recordings in terms of the
identified emotional classes (grouped into negative emotional classes
and the neutral state) and of arousal-valence affect space values.
Analyzing the annotations made by the evaluators, we prove that the
subjects' affective response is significantly higher as the emotionally
charged event approaches, and emotional remanence can be observed even
15 minutes after the initial interaction, or even after 30 minutes when
under the added influence of the event's imminence. We show that the
arousal increases (higher intensity affective response) as the event
draws closer, while the valence decreases (more negative affective
response), again supporting the second hypothesis, and suggesting that
such patterns would be relevant for the targeted applications. We
propose and implement a SER system using artificial neural networks
(ANNs) based on multilayer perceptron (MLP) models, obtaining good
performance (up to 72.7% accuracy) when training in a
speaker-independent manner, and yielding classification and regression
results consistent with those given by human evaluation, supporting the
possibility and usefulness of using machine learning (ML) systems to
monitor affective responses in order to automatically detect the
patterns associated with the behaviors relevant for forensic and law
enforcement applications and to facilitate intervention and prevention.