PREDICTING OF BREAST CANCER RISK USING MACHINE LEARNING WITH FEATURE SELECTION THROUGH XGBOOST

Cahya Mutiara Al Azhar, Pujiono Pujiono

Abstract: Breast cancer is the leading cause of cancer death among women worldwide, a burden made worse by late detection. This study proposes a breast cancer risk prediction framework that combines XGBoost with SelectKBest feature selection. The aim is to improve the accuracy and efficiency of early detection through a pipeline of exploratory data analysis, data encoding, SMOTE to address class imbalance, and feature selection (k=29). The XGBoost model achieved 98.1% accuracy, 98.1% recall, a 98.1% F1-score, and 98.2% precision on the test data, which highlights the importance of feature selection. These results are promising for patient prioritization (triage): the model could help medical personnel identify high-risk patients for further examination and so allocate resources more efficiently. The findings support the use of SelectKBest and pave the way for a machine learning-based clinical decision support system for early breast cancer detection workflows.
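To make the class-balancing step concrete, the following is a minimal sketch of SMOTE-style oversampling using only the Python standard library. It is an illustration of the general technique, not the authors' implementation: the function name `smote_oversample`, the neighbour count `k`, the seed, and the toy data are all assumptions, since the abstract does not report these details.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Hypothetical helper for illustration; k and seed are assumed values.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data (not from the study): 6 majority and 3 minority samples.
majority = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.4, 0.1]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3]]

# Generate just enough synthetic samples to match the majority class size.
new_points = smote_oversample(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Each synthetic point lies on the line segment between a minority sample and one of its nearest minority neighbours, so the new samples stay inside the region already occupied by the minority class. In practice, oversampling of this kind is usually applied to the training split only, so that the test data contains no synthetic samples.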

           
Keywords: breast cancer; feature selection; machine learning; risk prediction; XGBoost.
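The accuracy, precision, recall, and F1-score reported in the abstract can all be derived from a confusion matrix. The sketch below computes support-weighted averages of the per-class metrics; this is an assumption, because the abstract does not state which averaging scheme was used. The function name `weighted_metrics` and the counts are hypothetical and are not the study's results.

```python
def weighted_metrics(confusion):
    """Return accuracy and support-weighted precision, recall, and F1.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(n_classes)) / total
    precision = recall = f1 = 0.0
    for c in range(n_classes):
        tp = confusion[c][c]
        support = sum(confusion[c])                   # true members of class c
        predicted = sum(row[c] for row in confusion)  # samples predicted as c
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total                      # class share of the test set
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return accuracy, precision, recall, f1


# Hypothetical counts for illustration only, NOT the study's confusion matrix.
confusion = [[70, 2],
             [1, 41]]

accuracy, precision, recall, f1 = weighted_metrics(confusion)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Under support weighting, recall always equals accuracy, because each class's recall is multiplied by its share of the test set and the true positives simply sum to the number of correct predictions. The identical 98.1% accuracy and recall in the abstract are consistent with this kind of averaging, although other schemes can produce the same coincidence.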

 

 


 







DOI: https://doi.org/10.33330/jurteksi.v11i2.3661



Lembaga Penelitian dan Pengabdian Kepada Masyarakat (LPPM) Universitas Royal

Copyright © LPPM UNIVERSITAS ROYAL

 

Creative Commons License
This work is distributed under a Creative Commons Attribution-ShareAlike 4.0 International License.