Reproducible and generalizable speech emotion recognition via an Intelligent Fusion Network

Published in Biomedical Signal Processing and Control, 2025

📄 Journal Article 📅 May 2025 🏛 Biomedical Signal Processing and Control Vol. 109, p. 107996 ✉️ Co-author

📝 Abstract

We propose the Intelligent Fusion Network (IFN), a novel architecture combining dual attention, feature refinement, and multiplicative fusion to enhance speech emotion recognition (SER) performance and reproducibility. Extensive experiments across six benchmark datasets demonstrate IFN's superior accuracy and generalizability, establishing it as a reliable and effective solution for advancing human-computer interaction.

🔀 Dual Attention: Jointly attends to temporal and spectral dimensions for richer emotional feature extraction.
✨ Feature Refinement: An iterative refinement module reduces noise and sharpens emotionally discriminative representations.
✖️ Multiplicative Fusion: A non-linear fusion strategy captures complex interactions between heterogeneous feature streams.
📊 6 Benchmarks: Validated across six diverse datasets, demonstrating broad generalizability and reproducibility.
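
To make the dual attention and multiplicative fusion ideas above concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the IFN implementation from the paper: the function names (`dual_attention`, `multiplicative_fusion`), the mean-energy attention scores, the `tanh` projections, and all shapes are hypothetical choices made for this example.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dual_attention(spec):
    """Reweight a (time, freq) feature map along both axes.

    Assumption: attention scores are simple mean energies; a trained
    model would learn these scores instead.
    """
    t_weights = softmax(spec.mean(axis=1), axis=0)  # one weight per frame
    f_weights = softmax(spec.mean(axis=0), axis=0)  # one weight per frequency bin
    return spec * t_weights[:, None] * f_weights[None, :]


def multiplicative_fusion(a, b, w_a, w_b):
    """Project two feature vectors to a shared size, then fuse them
    with an element-wise product (hypothetical projection matrices)."""
    return np.tanh(a @ w_a) * np.tanh(b @ w_b)


rng = np.random.default_rng(0)
spec = rng.standard_normal((50, 40))      # 50 frames, 40 frequency bins
attended = dual_attention(spec)           # same shape as the input

stream_a = attended.mean(axis=0)          # (40,) pooled spectral stream
stream_b = rng.standard_normal(16)        # (16,) stand-in for a second feature stream
w_a = rng.standard_normal((40, 8))
w_b = rng.standard_normal((16, 8))
fused = multiplicative_fusion(stream_a, stream_b, w_a, w_b)  # (8,)
```

The element-wise product lets each fused dimension depend on both streams at once, which is the kind of interaction a plain concatenation or sum does not express directly.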

📋 BibTeX Citation

@article{zhang2025ifn,
  title     = {Reproducible and generalizable speech emotion 
               recognition via an Intelligent Fusion Network},
  author    = {Zhang, H. and Zhao, Puyang and Tang, G. 
               and Li, Z. and Yuan, Z.},
  journal   = {Biomedical Signal Processing and Control},
  volume    = {109},
  pages     = {107996},
  year      = {2025},
  month     = {may},
  publisher = {Elsevier},
  url       = {https://www.sciencedirect.com/science/article/abs/pii/S1746809425005075}
}

Recommended citation: Zhang, H., Zhao, P., Tang, G., Li, Z., & Yuan, Z. (2025). Reproducible and generalizable speech emotion recognition via an Intelligent Fusion Network. Biomedical Signal Processing and Control, 109, 107996.