Reproducible and generalizable speech emotion recognition via an Intelligent Fusion Network
Published in Biomedical Signal Processing and Control, 2025
We propose the Intelligent Fusion Network (IFN), an architecture that combines dual attention, feature refinement, and multiplicative fusion to improve both the performance and the reproducibility of speech emotion recognition (SER). Experiments on six benchmark datasets show that IFN achieves superior accuracy and generalizes across datasets, making it a reliable and effective option for human-computer interaction.
Recommended citation: Zhang, H., Zhao, P., Tang, G., Li, Z., & Yuan, Z. (2025). Reproducible and generalizable speech emotion recognition via an Intelligent Fusion Network. Biomedical Signal Processing and Control, 109, 107996. https://www.sciencedirect.com/science/article/abs/pii/S1746809425005075
