Abstract
Human emotion recognition, which encompasses both verbal and non-verbal signals such as body language and facial expressions, remains a complex challenge for computing systems. This recognition can be derived from various sources, including audio, text, and physiological responses, making multimodal approaches particularly effective. The importance of this task has grown significantly due to its potential to improve human-computer interaction, providing better feedback and usability in applications such as social media, education, robotics, marketing, and entertainment. Despite this potential, emotional expression is a heterogeneous phenomenon influenced by factors such as age, gender, sociocultural origin, and mental health. Our study addresses these complexities and presents our findings from the recent EmoSpeech competition. Our system achieved an F1 score of 0.7256 and a precision of 0.7043, with a precision of 0.7013, on the validation task. In multimodal Task 1, our CogniCIC team ranked second with an official F1 score of 0.657527, and in Task 2 it achieved an F1 score of 0.712259. These results underline the effectiveness of our approach to multimodal emotion recognition and its potential for practical applications.
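For reference, the F1 and precision figures reported above are standard classification metrics. The following is a minimal sketch of how such scores could be computed with scikit-learn; the emotion labels, the predictions, and the use of macro averaging are illustrative assumptions, not details taken from the competition setup.

    # Hedged sketch: computing F1 and precision for an emotion classifier.
    # The labels below are placeholders, not the actual EmoSpeech data.
    from sklearn.metrics import f1_score, precision_score

    y_true = ["joy", "anger", "neutral", "joy", "sadness"]    # gold emotion labels (illustrative)
    y_pred = ["joy", "neutral", "neutral", "joy", "sadness"]  # system predictions (illustrative)

    # Macro averaging weights every emotion class equally, a common choice
    # for imbalanced emotion datasets (assumed here, not stated in the paper).
    print("F1 (macro):", f1_score(y_true, y_pred, average="macro"))
    print("Precision (macro):", precision_score(y_true, y_pred, average="macro", zero_division=0))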