MULTI-MODAL EMOTION RECOGNITION MODEL USING GENERATIVE ADVERSARIAL NETWORKS (GANs) FOR AUGMENTING FACIAL EXPRESSIONS AND PHYSIOLOGICAL SIGNALS
Abstract
Emotion recognition is a critical area of research with applications in healthcare, human-computer interaction (HCI), security, and entertainment. This study addressed the limitations of single-modal emotion recognition systems by developing a multi-modal model that integrates facial expressions and physiological signals, enhanced by Generative Adversarial Networks (GANs). The aim was to improve the accuracy, reliability, and robustness of emotion detection, particularly for underrepresented emotions. The study used the FER-2013 dataset for facial expressions and the DEAP dataset for physiological signals. GANs were employed to augment the datasets, address class imbalance, and enhance feature diversity. A hybrid multi-modal model was developed, combining Convolutional Neural Networks (CNNs) for facial expression recognition with Long Short-Term Memory (LSTM) networks for physiological signal analysis. Hybrid fusion was used to integrate features at multiple levels, exploiting the complementary strengths of each modality. The results demonstrate significant improvements in emotion recognition. Without GAN augmentation, the CNN and LSTM models achieved accuracies of 62% and 76%, respectively, while the hybrid model outperformed both, reaching 90% across all metrics. With GAN-augmented datasets, the CNN and LSTM models improved to 81% and 86%, respectively, and the hybrid multi-modal model achieved state-of-the-art performance with 93% accuracy and an F1-score of 92%. These findings underscore the efficacy of GANs in enhancing data diversity and the advantages of multi-modal integration for robust emotion recognition. The study contributes a GAN-augmented hybrid multi-modal framework, advancing methodologies in emotion recognition. Recommendations for future work include addressing ethical considerations in emotion recognition systems.
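To make the fusion step concrete, the following is a minimal sketch of a feature-level hybrid model of the kind described above: a CNN branch for 48x48 grayscale FER-2013 face images and an LSTM branch for multichannel DEAP physiological windows, with the two feature vectors concatenated before classification. All layer sizes, the input shapes, and the seven-class output are illustrative assumptions rather than the authors' reported configuration, and the paper's hybrid fusion integrates features at multiple levels, not only the single concatenation shown here.

```python
import torch
import torch.nn as nn

class HybridEmotionNet(nn.Module):
    """CNN branch for face images plus LSTM branch for physiological
    sequences, fused at the feature level before classification.
    All hyperparameters below are illustrative assumptions."""

    def __init__(self, n_classes=7, phys_channels=40, lstm_hidden=128):
        super().__init__()
        # CNN branch: 48x48 grayscale faces (FER-2013 image format).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 12 * 12, 128), nn.ReLU(),
        )
        # LSTM branch: multichannel physiological time series
        # (DEAP records 40 channels: 32 EEG + 8 peripheral).
        self.lstm = nn.LSTM(phys_channels, lstm_hidden, batch_first=True)
        # Fusion head: classify the concatenated feature vectors.
        self.classifier = nn.Sequential(
            nn.Linear(128 + lstm_hidden, 128), nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(128, n_classes),
        )

    def forward(self, face, phys):
        face_feat = self.cnn(face)            # (B, 128)
        _, (h_n, _) = self.lstm(phys)         # h_n: (1, B, lstm_hidden)
        phys_feat = h_n[-1]                   # last hidden state, (B, lstm_hidden)
        fused = torch.cat([face_feat, phys_feat], dim=1)
        return self.classifier(fused)         # emotion logits

# Example: a batch of 8 faces (1x48x48) and 8 physiological windows
# (128 time steps x 40 channels) produces 7-class logits.
model = HybridEmotionNet()
logits = model(torch.randn(8, 1, 48, 48), torch.randn(8, 128, 40))
print(logits.shape)  # torch.Size([8, 7])
```

In a pipeline like the one the study describes, GAN-generated face images and physiological windows for minority emotion classes would be appended to the training set before fitting such a model, which is how the augmentation addresses class imbalance.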