Comparison of Machine Learning Models for Breast Cancer Diagnosis Classification

https://doi.org/10.47194/ijgor.v6i4.431

Authors

Keywords:

Breast cancer, classification, machine learning.

Abstract

Breast cancer remains one of the most pressing global public health challenges, with approximately 2.3 million women diagnosed worldwide in 2022 and around 670,000 deaths attributed to the disease. Although machine learning algorithms are widely applied to breast cancer classification, findings vary considerably across studies, and there is still no consistent conclusion on which algorithm performs best for breast cancer diagnosis. This study analyzes and compares the performance of four machine learning algorithms, namely Logistic Regression, Support Vector Machine (SVM), Random Forest, and K-Nearest Neighbors (KNN), in predicting breast cancer. The dataset was the Breast Cancer Wisconsin (Diagnostic) Data Set obtained from Kaggle, which contains morphological characteristics of tumor cells. Data preprocessing involved cleaning, label encoding, feature normalization using StandardScaler, and an 80:20 train-test split. Model performance was evaluated using the confusion matrix, precision, recall, F1-score, accuracy, and ROC-AUC. All four models performed well, with overall accuracy ranging from 95.61% to 97.37%. SVM was the most accurate model (97.37%) and achieved perfect recall (1.00) for the Benign class. Logistic Regression had the highest ROC-AUC value (0.9960), indicating excellent discriminative ability. Random Forest and KNN performed slightly worse, particularly in detecting Malignant cases, with a recall of 0.90. These findings confirm that machine learning can serve as an effective tool to support breast cancer diagnosis, with algorithm selection depending on data characteristics and clinical priorities.
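The evaluation metrics named in the abstract can be illustrated with a short Python sketch. Everything specific here is an assumption for illustration only: the dataset size of 569 samples (the usual size of the Wisconsin Diagnostic data), the rounding of the 20% test split up to 114 samples, and above all the confusion-matrix counts, which are hypothetical values chosen to be consistent with the reported SVM accuracy and Benign recall. They are not the counts reported by the study, and the `Confusion` class is a helper invented for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts with one class treated as positive."""
    tp: int  # positive cases predicted positive
    fn: int  # positive cases predicted negative
    fp: int  # negative cases predicted positive
    tn: int  # negative cases predicted negative

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)

    def swapped(self) -> "Confusion":
        """Same predictions, scored with the other class as positive."""
        return Confusion(tp=self.tn, fn=self.fp, fp=self.fn, tn=self.tp)


# Assumption: 569 samples in total, with the 20% test split rounded up.
N_SAMPLES = 569
TEST_SIZE = math.ceil(0.20 * N_SAMPLES)

# Hypothetical counts with Malignant as the positive class:
# 43 malignant and 71 benign test cases, 3 malignant cases missed.
malignant = Confusion(tp=40, fn=3, fp=0, tn=71)
benign = malignant.swapped()

if __name__ == "__main__":
    print(f"test size:        {TEST_SIZE}")
    print(f"accuracy:         {malignant.accuracy():.4f}")
    print(f"malignant recall: {malignant.recall():.4f}")
    print(f"malignant F1:     {malignant.f1():.4f}")
    print(f"benign precision: {benign.precision():.4f}")
    print(f"benign recall:    {benign.recall():.4f}")
```

Under these assumptions, 111 correct predictions out of 114 give an accuracy of about 97.37%, and 109 out of 114 give about 95.61%, which matches the range stated in the abstract. The sketch also shows why a perfect Benign recall does not imply a perfect Malignant recall: every error in this hypothetical matrix is a malignant case classified as benign, which is the error type that matters most clinically.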

Published

2025-11-28