Explainable Artificial Intelligence Techniques for Speech Emotion Recognition: A Focus on XAI Models

Authors

  • Michael Norval, University of South Africa, Johannesburg
  • Zenghui Wang, University of South Africa, Johannesburg

DOI:

https://doi.org/10.4114/intartif.vol28iss76pp85-123

Keywords:

Artificial Intelligence, Speech Emotion Recognition, Shapley Additive Explanations, Local Interpretable Model-agnostic Explanations

Abstract

This study employs Explainable Artificial Intelligence (XAI) techniques, namely SHAP and LIME, together with XGBoost, to interpret speech emotion recognition (SER) models. Unlike previous work that focuses on generic datasets, this research integrates these tools to explore the emotional nuances of an Afrikaans speech corpus. The complexity of deep SER architectures poses significant challenges for model interpretability. This paper aims to bridge gaps in existing SER systems by integrating advanced XAI techniques. The objective is to develop an ensemble stacking model that combines CNN, CLSTM, and XGBoost, augmented by SHAP and LIME, to enhance the interpretability, accuracy, and adaptability of SER systems, particularly for underrepresented languages such as Afrikaans. Our methodology uses XAI methods to explain the decision-making processes of the CNN and CLSTM models, enhancing trust, diagnostic insight, and theoretical understanding. We train the models on a comprehensive dataset of emotional speech samples. Post-training, we apply SHAP and LIME to these models to generate explanations for their predictions, focusing on feature importance and the models’ decision logic.
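As a minimal illustration of this post-training explanation step, the sketch below applies SHAP to a toy XGBoost emotion classifier. The synthetic MFCC-style features, the two-class setup, and all variable names are illustrative assumptions, not the paper’s actual pipeline or data:

    # Hedged sketch: SHAP feature attribution for a toy SER classifier.
    # Synthetic data; the two-class setup (neutral vs angry) is an
    # assumption made to keep the attribution arrays simple.
    import numpy as np
    import shap
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 13))      # stand-in MFCC statistics per utterance
    y = rng.integers(0, 2, size=200)    # toy labels: 0 = neutral, 1 = angry
    feature_names = [f"mfcc_{i}" for i in range(13)]

    model = xgb.XGBClassifier(n_estimators=50, max_depth=3)
    model.fit(X, y)

    # TreeExplainer yields exact SHAP values for tree ensembles, attributing
    # each prediction to the individual acoustic features.
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X[:20])   # one (20, 13) attribution matrix

    # Rank features by mean absolute attribution across the 20 utterances.
    importance = np.abs(shap_values).mean(axis=0)
    for name, score in sorted(zip(feature_names, importance), key=lambda t: -t[1])[:5]:
        print(f"{name}: {score:.3f}")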
By comparing the explanations generated by SHAP and LIME, we assess the efficacy of each method in providing meaningful insight into the models’ operation. A comparative study of the SER models demonstrates their capability to discern complex emotional states through diverse analytical approaches, from spatial feature extraction to temporal dynamics. Our results show that XAI techniques improve the interpretability of complex SER models; this added transparency builds end-user trust and yields diagnostic insight into model behaviour. The study underscores the importance of explainability when deploying AI in emotionally sensitive applications, paving the way for more accountable and user-centric SER systems.
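For the LIME side of this comparison, a matching sketch (continuing from the toy model and features above; class names and parameters are again illustrative) shows how a single utterance’s prediction is explained with a local linear surrogate:

    # Hedged sketch: a local LIME explanation for one utterance, reusing
    # X, feature_names, and model from the SHAP sketch above.
    from lime.lime_tabular import LimeTabularExplainer

    lime_explainer = LimeTabularExplainer(
        X,
        feature_names=feature_names,
        class_names=["neutral", "angry"],
        mode="classification",
    )

    # LIME perturbs the instance and fits a local linear model to the
    # classifier's responses, so its importances are per-prediction
    # (local) rather than global like the SHAP ranking above.
    explanation = lime_explainer.explain_instance(
        X[0], model.predict_proba, num_features=5
    )
    print(explanation.as_list())    # top 5 locally important features

Unlike SHAP’s game-theoretic attributions, LIME’s output depends on random perturbation sampling, which is one reason comparing the two methods is informative.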

Published

2025-06-17

How to Cite

Norval, M., & Wang, Z. (2025). Explainable Artificial Intelligence Techniques for Speech Emotion Recognition: A Focus on XAI Models. Inteligencia Artificial, 28(76), 85–123. https://doi.org/10.4114/intartif.vol28iss76pp85-123