BAYESIAN-OPTIMIZED ENSEMBLE SUPPORT VECTOR MACHINE MODEL FOR PHISHING EMAIL DETECTION
DOI:
https://doi.org/10.33003/fjs-2025-0912-4356
Keywords:
Phishing Detection, Ensemble Learning, Bayesian Optimization
Abstract
With the rapid growth of email use, phishing and malware attacks have become more frequent and sophisticated, often slipping past traditional defenses such as blacklists and rule-based filters. Existing detection models, including SVM, XGBoost, and CNN, have improved accuracy but still depend heavily on manually crafted features and struggle to adapt to new or evolving attack patterns, which creates the need for a more flexible detection approach that can learn and adapt to emerging email threats. This study develops an ensemble phishing email detection model that combines SVM and XGBoost, optimizes it using Bayesian tuning, and evaluates it using accuracy, precision, recall, F1-score, and ROC-AUC. Several SVM variants (Baseline, Grid Search, SGD, and Bayesian-optimized) were developed and tested. A well-known Kaggle phishing dataset was used for fair comparison. After cleaning and feature reduction, the dataset went from 10,000 emails with 1,250 features to 9,872 emails with 500 features. The Baseline SVM reached an accuracy of 0.9287, Grid Search SVM improved to 0.96, and SGD SVM dropped slightly to 0.92. The Bayesian SVM performed best among the SVM variants at 0.9667, showing greater stability and generalization. The Bayesian-optimized Hybrid Ensemble SVM–XGBoost achieved 0.992 accuracy and 0.9992 ROC-AUC, confirming its reliability and effectiveness in phishing detection. Stacking substantially enhanced model stability, generalization, and real-time reliability.
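As an illustration of the five evaluation metrics named in the abstract, the following is a minimal sketch in plain Python of how accuracy, precision, recall, F1-score, and ROC-AUC can be computed for a binary phishing classifier. It is not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are assumptions made for the example, and the data is unrelated to the Kaggle dataset used in the study.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives (1 = phishing, 0 = legitimate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1-score as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): true labels and classifier scores for 8 emails.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

# Assumed decision threshold of 0.5 to turn scores into hard predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics)
print("roc_auc:", auc)
```

The threshold-based metrics depend on the chosen cut-off, whereas ROC-AUC is computed from the raw scores and summarizes ranking quality across all thresholds, which is one reason the two are usually reported together.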
License
Copyright (c) 2025 Igba Aji L., Idris Ismaila, Subairu Sikiru. O., Noel Moses. D., Ahmed Sulemam

This work is licensed under a Creative Commons Attribution 4.0 International License.