EXPLAINABLE MACHINE LEARNING FOR STUDENT ACADEMIC PERFORMANCE PREDICTION IN DATA-CONSTRAINED EDUCATIONAL SETTINGS
DOI:
https://doi.org/10.33003/fjs-2026-1006-4633
Keywords:
Student Performance Prediction, Machine Learning, Fairness Auditing, Interpretability Stability, Dataset Constraint
Abstract
This research examines how limited dataset size affects machine learning models that predict student academic performance in schools. Models generalize more readily when the dataset is large, but educational data mining typically works with limited data, which makes generalization, interpretability, and fairness difficult to achieve. To address the data scarcity problem, stratified subsampling was applied at sizes of 50, 100, 150, 200, and 300. An oversampling strategy was likewise used to balance under-represented classes and reduce bias. SHAP and permutation importance were used to interpret the models, while Spearman rank correlation and Jaccard similarity were used to measure explanation stability. A fairness audit was also carried out to identify how socio-demographic factors other than gender affect academic performance. The dataset consists of the mathematics scores of Portuguese students. Preprocessing comprised standard scaling, ordinal encoding, one-hot encoding, and definition of a binary target attribute (pass or fail). Logistic Regression, Random Forest, and XGBoost models were trained on the dataset and evaluated using accuracy, F1-score, AUC, and Brier score. The results show that Random Forest and XGBoost outperformed Logistic Regression in accuracy, robustness, and calibration, even on small datasets.
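The two explanation-stability measures named in the abstract can be illustrated with a minimal sketch. It assumes that stability is assessed by comparing two feature-importance vectors (for example, SHAP or permutation importances obtained from two subsamples); the feature names, the importance values, and the choice of k = 3 are hypothetical and are not taken from the study.

```python
def ranks(values):
    """Rank values so the largest gets rank 1; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(a, b):
    """Spearman rank correlation, computed as the Pearson correlation of the ranks.

    Assumes neither vector is constant (otherwise the denominator is zero).
    """
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


def jaccard_top_k(a, b, names, k):
    """Jaccard similarity between the top-k feature sets of two importance vectors."""
    top_a = {names[i] for i in sorted(range(len(a)), key=lambda i: -a[i])[:k]}
    top_b = {names[i] for i in sorted(range(len(b)), key=lambda i: -b[i])[:k]}
    return len(top_a & top_b) / len(top_a | top_b)


# Hypothetical feature names and importance scores from two subsamples.
feature_names = ["feature_1", "feature_2", "feature_3", "feature_4", "feature_5"]
importance_run_a = [0.40, 0.25, 0.15, 0.12, 0.08]
importance_run_b = [0.35, 0.10, 0.30, 0.20, 0.05]

rho = spearman(importance_run_a, importance_run_b)
overlap = jaccard_top_k(importance_run_a, importance_run_b, feature_names, k=3)
print(f"Spearman rank correlation: {rho:.2f}")
print(f"Jaccard similarity (top 3): {overlap:.2f}")
```

Spearman correlation compares the full ranking of features, whereas the top-k Jaccard similarity only asks whether the same features appear among the most important ones, so the two measures can disagree when the instability is concentrated in low-ranked features.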
Copyright (c) 2026 Godwin Otu, Frederick Okonkwo, Lucky Okonkwo, Opeyemi Adekogba, Joshua Anche

This work is licensed under a Creative Commons Attribution 4.0 International License.