Facial expressions and speech convey emotional information about the user through multiple communication channels. In this paper, a novel multimodal emotion recognition system based on visual and auditory information processing is proposed. The proposed approach is used in real affective human-robot communication to estimate five different emotional states (happiness, anger, fear, sadness and neutral), and it consists of two subsystems with similar structure. The first subsystem achieves robust facial feature extraction by applying consecutive filters to the edge image and using a Dynamic Bayesian Classifier. A similar classifier is used in the second subsystem, where the input is a set of speech descriptors such as speech rate, energy and pitch. Both subsystems are finally combined in real time. The results of this multimodal approach show the robustness and accuracy of the methodology with respect to single-modality emotion recognition systems.
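The abstract does not state how the two subsystems are combined. As an illustration only, the sketch below assumes that each classifier outputs a probability for each of the five emotional states and that the two outputs are fused by a normalized product (a common late-fusion rule that treats the modalities as conditionally independent). The names `EMOTIONS`, `fuse_posteriors` and `predict` are hypothetical and do not come from the paper.

```python
# Hypothetical late-fusion sketch; the fusion rule is an assumption,
# not the method described in the paper.
EMOTIONS = ("happiness", "anger", "fear", "sadness", "neutral")


def fuse_posteriors(face, speech):
    """Combine two per-emotion probability dicts by a normalized product."""
    joint = {e: face[e] * speech[e] for e in EMOTIONS}
    total = sum(joint.values())
    if total == 0.0:
        # The two modalities fully disagree: fall back to a uniform distribution.
        return {e: 1.0 / len(EMOTIONS) for e in EMOTIONS}
    return {e: p / total for e, p in joint.items()}


def predict(face, speech):
    """Return the emotion with the highest fused probability."""
    fused = fuse_posteriors(face, speech)
    return max(fused, key=fused.get)


# Made-up example inputs, not data from the paper.
face_probs = {"happiness": 0.4, "anger": 0.3, "fear": 0.1,
              "sadness": 0.1, "neutral": 0.1}
speech_probs = {"happiness": 0.2, "anger": 0.5, "fear": 0.1,
                "sadness": 0.1, "neutral": 0.1}
label = predict(face_probs, speech_probs)
```

With these made-up inputs the facial channel alone favors happiness, while the fused estimate favors anger because the speech channel supports it strongly. This is the kind of disagreement between channels that a multimodal system is meant to resolve.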
|Title of host publication||Proceedings of the Workshop on Multimodal and Semantics for Robotics Systems|
|Number of pages||9|
|ISBN (Electronic)||1613-0073, 2015|
|Publication status||Published - Jun 2015|
|Event||MuSRobS 2015 - Hamburg, Germany|
Duration: 1 Oct 2015 → 1 Oct 2015
|Period||1/10/15 → 1/10/15|