THE EFFECT OF DATASETS ON BREAST CANCER DETECTION MODELS
Abstract
Datasets are a major requirement in the development of breast cancer classification/detection models using machine learning algorithms. Such models can provide an effective, accurate, and less expensive method of diagnosis and reduce loss of life. However, applying the same machine learning algorithms to different datasets yields different results. This research developed several machine learning models for breast cancer classification/detection using Random Forest, Support Vector Machine, K-Nearest Neighbors, Gaussian Naïve Bayes, Perceptron, and Logistic Regression. Three widely used test datasets were employed: Wisconsin Breast Cancer (WBC) Original, Wisconsin Diagnostic Breast Cancer (WDBC), and Wisconsin Prognostic Breast Cancer (WPBC). The results show that the choice of dataset affects the performance of machine learning classifiers, and that the classifiers themselves perform differently on any given breast cancer dataset.
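The comparison described in the abstract could be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the abstract does not state the preprocessing, hyperparameters, or evaluation protocol, so the feature scaling, the scikit-learn default settings, and the 5-fold cross-validation used here are all assumptions. Only WDBC is shown, because it is the dataset bundled with scikit-learn; WBC (Original) and WPBC would have to be loaded separately and passed to the same hypothetical helper, `compare_classifiers`.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_classifiers(X, y, cv=5):
    """Return the mean cross-validated accuracy of six classifiers.

    Hypothetical helper: the scaling steps, default hyperparameters, and
    k-fold protocol are assumptions, not taken from the paper.
    """
    models = {
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
        # Distance- and margin-based models are scaled first (an assumption).
        "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
        "K-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "Gaussian Naive Bayes": GaussianNB(),
        "Perceptron": make_pipeline(StandardScaler(), Perceptron(random_state=0)),
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


# WDBC is the breast cancer dataset shipped with scikit-learn.
X, y = load_breast_cancer(return_X_y=True)
scores = compare_classifiers(X, y)

# Print the classifiers from highest to lowest mean accuracy.
for name, accuracy in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name:<24s} {accuracy:.3f}")
```

Running the same helper on each of the three Wisconsin datasets and placing the resulting scores side by side would give the kind of per-dataset, per-classifier comparison the abstract reports.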
Copyright (c) 2020 FUDMA JOURNAL OF SCIENCES
This work is licensed under a Creative Commons Attribution 4.0 International License.