Sparse Temporal Aware Capsule Network for Robust Speech Emotion Recognition

Published in Engineering Applications of Artificial Intelligence, 2025

📄 Journal Article 📅 January 2025 🏛 Engineering Applications of Artificial Intelligence

📝 Abstract

This paper introduces a novel Sparse Temporal-Aware Capsule Network (STACN) architecture designed to enhance the accuracy and reliability of speech emotion recognition systems. By incorporating temporal awareness and sparsity constraints into the capsule routing mechanism, STACN addresses key limitations of standard capsule networks in modeling the dynamic, time-varying nature of emotional speech. The sparsity regularization focuses the network's representational capacity on the most emotionally salient time-frequency regions, improving both performance and interpretability across diverse acoustic conditions.
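To make the idea of a sparsity constraint inside capsule routing concrete, here is a minimal pure-Python sketch. It is not the STACN implementation from the paper. It assumes standard routing-by-agreement (softmax coupling coefficients and a squash nonlinearity) and adds one hypothetical sparsity rule: after the first dense pass, each output (emotion) capsule keeps only its `keep_frames` most strongly coupled time frames. The names `sparse_routing`, `u_hat`, and `keep_frames`, as well as the top-k rule itself, are illustrative assumptions rather than details taken from the article.

```python
import math


def squash(s):
    """Capsule squashing nonlinearity: keeps direction, maps the norm into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0 for _ in s]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_routing(u_hat, num_iters=3, keep_frames=2):
    """Routing-by-agreement with a hypothetical top-k sparsity step over time.

    u_hat[t][j] is the prediction vector that time frame t casts for output
    capsule j. Returns the output capsule vectors and the final couplings.
    """
    T, J, D = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * J for _ in range(T)]  # routing logits, one row per time frame
    for it in range(num_iters):
        # Each frame distributes its vote across the output capsules.
        c = [softmax(row) for row in b]
        # Assumed sparsity rule: skip the first pass (all couplings are tied),
        # then zero every frame outside the top `keep_frames` per capsule.
        if it > 0:
            for j in range(J):
                ranked = sorted(range(T), key=lambda t: c[t][j], reverse=True)
                for t in ranked[keep_frames:]:
                    c[t][j] = 0.0
        # Weighted sum of predictions, then squash, gives each output capsule.
        v = []
        for j in range(J):
            s = [sum(c[t][j] * u_hat[t][j][d] for t in range(T)) for d in range(D)]
            v.append(squash(s))
        # Agreement update: raise the logit where prediction and output align.
        for t in range(T):
            for j in range(J):
                b[t][j] += sum(p * q for p, q in zip(u_hat[t][j], v[j]))
    return v, c


# Toy input: 4 time frames, 2 output capsules, 2-dimensional vectors.
u_hat = [
    [[1.0, 0.0], [0.0, 0.1]],
    [[0.9, 0.1], [0.1, 0.0]],
    [[0.0, 0.2], [0.0, 1.0]],
    [[0.1, 0.0], [0.1, 0.9]],
]
v, c = sparse_routing(u_hat, num_iters=3, keep_frames=2)
for j, column in enumerate(zip(*c)):
    print(f"capsule {j}: coupled frames = {[t for t, w in enumerate(column) if w > 0]}")
```

In this toy setting, the nonzero entries of `c` show which frames each output capsule attends to, which is one simple way such a constraint could support interpretability. A learned or regularization-based sparsity term would be an equally plausible reading of the abstract.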

📋 BibTeX Citation

@article{zhang2025stacn,
  title     = {Sparse Temporal Aware Capsule Network for Robust Speech Emotion Recognition},
  author    = {Zhang, H. and Huang, H. and Zhao, Puyang and Yu, Z.},
  journal   = {Engineering Applications of Artificial Intelligence},
  year      = {2025},
  month     = jan,
  doi       = {10.1016/j.engappai.2025.110060},
  url       = {https://doi.org/10.1016/j.engappai.2025.110060},
  publisher = {Elsevier}
}

Recommended citation: Zhang, H., Huang, H., Zhao, P., & Yu, Z. (2025). Sparse Temporal Aware Capsule Network for Robust Speech Emotion Recognition. Engineering Applications of Artificial Intelligence. https://doi.org/10.1016/j.engappai.2025.110060