AN IMPROVED HEART DISEASE PREDICTION USING INFORMATION GAIN-BASED FEATURE SELECTION
DOI:
https://doi.org/10.33003/fjs-2025-0912-4267
Keywords:
Heart Disease Prediction, Cardiovascular Disease, Information Gain, Feature Selection, Machine Learning, Clinical Data Analysis, Kaggle Dataset
Abstract
Heart disease remains one of the leading causes of death worldwide. Early and accurate prediction of heart disease risk is therefore essential for guiding timely clinical intervention and reducing the burden on healthcare systems. However, predictive models often perform poorly when medical datasets contain redundant or irrelevant features. This study addresses that problem by applying Information Gain-based feature selection to improve the reliability of heart disease prediction. The study used the Kaggle Heart Disease Dataset, which consists of demographic and clinical attributes including age, sex, chest pain type, resting blood pressure, cholesterol level, exercise-induced angina, and ST-slope characteristics. Information Gain, an entropy-based ranking criterion, was used to identify and retain the most informative features and to discard less relevant ones. By reducing dimensionality, the approach improved both model interpretability and computational efficiency. In the experimental evaluation, models trained on the Information Gain-selected features achieved higher accuracy and generalized better than models trained on the full feature set. The feature selection process also highlighted the clinical risk factors most strongly associated with heart disease, such as chest pain type, ST-slope, and exercise-induced angina. The results indicate that Information Gain-based feature selection significantly improves predictive performance and provides valuable insight into the attributes most indicative of heart disease risk. This approach contributes to the development of lightweight, interpretable, and effective predictive systems that can support clinical decision-making and early diagnosis.
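As an illustration of the entropy-based ranking criterion described in the abstract, the following is a minimal Python sketch of Information Gain for discrete features, IG(Y; X) = H(Y) - H(Y | X). It is not the authors' implementation. The function names (`entropy`, `information_gain`, `rank_features`), the column names, and the eight-row toy data are all hypothetical and chosen only to show the mechanics. Continuous attributes such as age or cholesterol would need to be discretized first, a step this sketch does not cover.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H(Y), in bits, of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum over v of P(X = v) * H(Y | X = v), for discrete X."""
    total = len(labels)
    # Group the class labels by the value the feature takes on each row.
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return (feature name, information gain) pairs, highest gain first."""
    scores = {name: information_gain(values, labels) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data (NOT taken from the Kaggle dataset): 8 patients,
# label 1 = heart disease, 0 = no heart disease.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "sex": [1, 0, 1, 0, 1, 0, 1, 0],
    "exercise_angina": [1, 1, 1, 0, 0, 0, 0, 0],
    "st_slope": ["flat", "flat", "flat", "flat", "up", "up", "up", "up"],
}

ranking = rank_features(columns, labels)
for name, gain in ranking:
    print(f"{name}: {gain:.3f} bits")

# Keep the top-k features; k is an assumed tuning choice, not a value from the study.
k = 2
selected = [name for name, _ in ranking[:k]]
print("selected:", selected)
```

In this toy example the feature that perfectly separates the two classes receives the maximum gain of one bit, while a feature that is independent of the label receives zero. A model would then be trained only on the `selected` columns.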
License
Copyright (c) 2025 Umar Murtala Mani, Zahraddeen Sufyanu, Usman Mahmud, Usman Umar, Surayya Tajoudden Bashir

This work is licensed under a Creative Commons Attribution 4.0 International License.