AN ENHANCED CLASSIFICATION AND REGRESSION TREE ALGORITHM USING GINI EXPONENTIAL

  • Safinatu Bello, Kaduna State University, Kaduna
  • Ahmad Abubakar Aliyu
  • Muhammad Aminu Ahmad
  • Adamu Abdullahi
  • Sa’adatu Abdulkadir
  • Abubakar Muazu Ahmed
  • Suleiman Dauda
Keywords: Gini index, Information gain, Decision Tree, Classification, Regression Tree

Abstract

Decision tree algorithms, particularly Classification and Regression Trees (CART), are widely used in machine learning for their simplicity, interpretability, and ability to handle both categorical and numerical data. However, traditional decision trees often encounter limitations on complex, high-dimensional, or imbalanced datasets, as conventional impurity measures such as the Gini Index and Information Gain may fail to capture subtle variations in the data. This study enhances the traditional CART model by introducing the Gini Exponential Criterion, which incorporates an exponential weighting factor into the split point calculation. This approach amplifies the influence of highly discriminative features, yielding more refined splits and improved decision boundaries. The enhanced CART model was evaluated on two benchmark datasets, wine quality and hypothyroid, with preprocessing steps including feature scaling and SMOTE for class imbalance, and hyperparameter tuning via Bayesian Optimization. On the wine quality dataset, the enhanced model improved accuracy from 57% (traditional CART) to 86%; on the hypothyroid dataset, it achieved an accuracy of 98%. These results highlight the model's ability to handle complex and imbalanced data effectively. Feature importance analysis and decision tree visualization further demonstrated the model's interpretability. The study concludes that the Gini Exponential Criterion significantly improves CART's performance, offering better generalization and clearer decision boundaries. This advancement is particularly valuable for applications requiring precise and interpretable predictions, such as healthcare diagnostics and quality assessment. Future work could explore integrating this criterion into ensemble methods and...
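The abstract does not give the exact formula for the Gini Exponential Criterion, so the sketch below is only one plausible reading of "an exponential weighting factor in the split point calculation": the ordinary Gini impurity reduction of a candidate split is passed through an exponential, which grows faster than linearly and therefore amplifies highly discriminative splits relative to weak ones. The function names and the `alpha` parameter are hypothetical, not taken from the paper.

```python
import numpy as np

def gini(labels):
    """Standard Gini impurity: 1 - sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_exponential_gain(y_parent, y_left, y_right, alpha=1.0):
    """Hypothetical Gini Exponential score for a candidate split.

    The plain impurity reduction (parent Gini minus the size-weighted
    child Gini) is mapped through exp(alpha * gain) - 1, so strong
    splits are amplified more than proportionally while a useless
    split (zero gain) still scores zero.
    """
    n = len(y_parent)
    weighted_child = (len(y_left) / n) * gini(y_left) \
                   + (len(y_right) / n) * gini(y_right)
    gain = gini(y_parent) - weighted_child
    return np.expm1(alpha * gain)
```

For a perfectly separating split of a balanced binary node, the plain gain is 0.5 while this score is expm1(0.5) ≈ 0.649, so the gap between strong and weak splits widens, which matches the abstract's claim that the criterion amplifies highly discriminative features.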

Published
2025-03-31
How to Cite
Bello, S., Aliyu, A. A., Ahmad, M. A., Abdullahi, A., Abdulkadir, S., Ahmed, A. M., & Dauda, S. (2025). AN ENHANCED CLASSIFICATION AND REGRESSION TREE ALGORITHM USING GINI EXPONENTIAL. FUDMA JOURNAL OF SCIENCES, 9(3), 259 - 267. https://doi.org/10.33003/fjs-2025-0903-3321