MULTI-MODAL EMOTION RECOGNITION MODEL USING GENERATIVE ADVERSARIAL NETWORKS (GANs) FOR AUGMENTING FACIAL EXPRESSIONS AND PHYSIOLOGICAL SIGNALS
Abstract
Emotion recognition is a critical area of research with applications in healthcare, human-computer interaction (HCI), security, and entertainment. This study addressed the limitations of single-modal emotion recognition systems by developing a multi-modal model that integrates facial expressions and physiological signals, enhanced by Generative Adversarial Networks (GANs). The aim was to improve the accuracy, reliability, and robustness of emotion detection, particularly for underrepresented emotions. The study used the FER-2013 dataset for facial expressions and the DEAP dataset for physiological signals. GANs were employed to augment the datasets, address class imbalance, and enhance feature diversity. A hybrid multi-modal model was developed, combining Convolutional Neural Networks (CNNs) for facial expression recognition with Long Short-Term Memory (LSTM) networks for physiological signal analysis. Hybrid fusion was used to integrate features at multiple levels, exploiting the complementary strengths of each modality. The results demonstrate significant improvements in emotion recognition. Without GAN augmentation, the CNN and LSTM models achieved accuracies of 62% and 76%, respectively, while the hybrid model outperformed both, reaching 90% across all metrics. With GAN-augmented datasets, the CNN and LSTM models improved to 81% and 86%, respectively, and the hybrid multi-modal model achieved state-of-the-art performance with 93% accuracy and an F1-score of 92%. These findings underscore the efficacy of GANs in enhancing data diversity and the advantages of multi-modal integration for robust emotion recognition. The study contributes a GAN-augmented hybrid multi-modal framework, advancing methodologies in emotion recognition. Recommendations for future work include addressing ethical considerations in emotion recognition systems.
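To make the fusion step concrete, the following is a minimal sketch of a feature-level hybrid model of the kind described above: a CNN branch for 48x48 grayscale FER-2013 face images and an LSTM branch for multichannel DEAP physiological windows, with the two feature vectors concatenated before classification. All layer sizes, the input shapes, and the seven-class output are illustrative assumptions rather than the authors' reported configuration, and the paper's hybrid fusion integrates features at multiple levels, not only the single concatenation shown here.

```python
import torch
import torch.nn as nn

class HybridEmotionNet(nn.Module):
    """CNN branch for face images plus LSTM branch for physiological
    sequences, fused at the feature level before classification.
    All hyperparameters below are illustrative assumptions."""

    def __init__(self, n_classes=7, phys_channels=40, lstm_hidden=128):
        super().__init__()
        # CNN branch: 48x48 grayscale faces (FER-2013 image format).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 12 * 12, 128), nn.ReLU(),
        )
        # LSTM branch: multichannel physiological time series
        # (DEAP records 40 channels: 32 EEG + 8 peripheral).
        self.lstm = nn.LSTM(phys_channels, lstm_hidden, batch_first=True)
        # Fusion head: classify the concatenated feature vectors.
        self.classifier = nn.Sequential(
            nn.Linear(128 + lstm_hidden, 128), nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(128, n_classes),
        )

    def forward(self, face, phys):
        face_feat = self.cnn(face)            # (B, 128)
        _, (h_n, _) = self.lstm(phys)         # h_n: (1, B, lstm_hidden)
        phys_feat = h_n[-1]                   # last hidden state, (B, lstm_hidden)
        fused = torch.cat([face_feat, phys_feat], dim=1)
        return self.classifier(fused)         # emotion logits

# Example: a batch of 8 faces (1x48x48) and 8 physiological windows
# (128 time steps x 40 channels) produces 7-class logits.
model = HybridEmotionNet()
logits = model(torch.randn(8, 1, 48, 48), torch.randn(8, 128, 40))
print(logits.shape)  # torch.Size([8, 7])
```

In a pipeline like the one the study describes, GAN-generated face images and physiological windows for minority emotion classes would be appended to the training set before fitting such a model, which is how the augmentation addresses class imbalance.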