AN ENHANCED CLASSIFICATION AND REGRESSION TREE ALGORITHM USING GINI EXPONENTIAL
Abstract
Decision tree algorithms, particularly Classification and Regression Trees (CART), are widely used in machine learning for their simplicity, interpretability, and ability to handle both categorical and numerical data. However, traditional decision trees often encounter limitations on complex, high-dimensional, or imbalanced datasets, as conventional impurity measures such as the Gini Index and Information Gain may fail to capture subtle variations in the data. This study enhances the traditional CART model by introducing the Gini Exponential Criterion, which incorporates an exponential weighting factor into the split-point calculation. This approach amplifies the influence of highly discriminative features, yielding more refined splits and improved decision boundaries. The enhanced CART model was evaluated on two benchmark datasets, the wine quality dataset and the hypothyroid dataset, with preprocessing steps including feature scaling and SMOTE for class imbalance, and hyperparameter tuning via Bayesian Optimization. On the wine quality dataset, the enhanced model improved accuracy from 57% (traditional CART) to 86%, while on the hypothyroid dataset it achieved an accuracy of 98%. These results highlight the model's ability to handle complex and imbalanced data effectively. Feature importance analysis and decision tree visualization further demonstrated the model's interpretability. The study concludes that the Gini Exponential Criterion significantly improves CART's performance, offering better generalization and clearer decision boundaries. This advancement is particularly valuable for applications requiring precise and interpretable predictions, such as healthcare diagnostics and quality assessment. Future work could explore integrating this criterion into ensemble methods and...
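The abstract does not give the closed form of the Gini Exponential Criterion, only that an exponential weighting factor enters the split-point calculation. The Python sketch below illustrates one plausible reading: the standard Gini impurity of each candidate child node is passed through an exponential weighting before the size-weighted split impurity is computed, so that differences between nearly pure and highly mixed nodes are amplified. The function names, the weighting form exp(alpha * G) - 1, and the alpha parameter are illustrative assumptions for this sketch, not the authors' published formulation.

import numpy as np

def gini(y):
    """Standard Gini impurity of a 1-D array of class labels."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_exponential(y, alpha=2.0):
    """Hypothetical 'Gini Exponential' impurity: the ordinary Gini value
    passed through an exponential weighting so that impure nodes are
    penalised more sharply than nearly pure ones (assumed form)."""
    return np.exp(alpha * gini(y)) - 1.0

def split_impurity(y_left, y_right, criterion=gini_exponential):
    """Size-weighted impurity of a candidate split; a CART-style learner
    picks the feature/threshold pair that minimises this value."""
    n = len(y_left) + len(y_right)
    return (len(y_left) / n) * criterion(y_left) + (len(y_right) / n) * criterion(y_right)

# Toy comparison: a fairly clean split versus a mixed split, scored with
# plain Gini and with the assumed exponential variant.
clean = (np.array([0, 0, 0, 0, 1]), np.array([1, 1, 1, 1, 0]))
mixed = (np.array([0, 1, 0, 1, 0]), np.array([1, 0, 1, 0, 1]))
for name, (yl, yr) in [("clean", clean), ("mixed", mixed)]:
    print(name, round(split_impurity(yl, yr, gini), 3), round(split_impurity(yl, yr), 3))

Under this reading, the preprocessing described in the abstract (feature scaling, SMOTE oversampling, Bayesian hyperparameter search) would sit in front of the tree learner, for example via scikit-learn scalers, imbalanced-learn's SMOTE, and a Bayesian optimisation library; those steps are independent of the splitting criterion itself.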