PREDICTION OF STROKE USING LOGISTIC REGRESSION WITH A MACHINE LEARNING APPROACH

  • Ishiqa Rana Aphrodita, Universitas Amikom Yogyakarta
  • Ika Nur Fajri, Universitas Amikom Yogyakarta
  • Agung Nugroho, Universitas Amikom Yogyakarta
Keywords: Stroke, Logistic Regression, Machine Learning, Streamlit, Prediction

Abstract

Stroke is one of the leading causes of death and disability worldwide, including in Indonesia. As digital technology advances, Machine Learning is increasingly used in the health sector, including for predicting stroke. This study implements the Logistic Regression algorithm to predict the likelihood that a person will have a stroke, using the Brain Stroke dataset. The research process comprises data preprocessing (missing-value handling, normalization, and label encoding), splitting the data into 80% training data and 20% test data, and model training. The model was then evaluated using accuracy, precision, recall, F1-score, and ROC-AUC, together with a confusion matrix. Logistic Regression classified stroke cases with an accuracy of 82.4%, precision of 80.1%, recall of 78.6%, F1-score of 79.3%, and a ROC-AUC of 0.87. The model was then integrated into a Streamlit application so that it can be used interactively to predict stroke risk for new data. These results indicate that combining Machine Learning with a web-based application has the potential to support early detection of stroke risk.
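
The workflow summarized in the abstract (missing-value handling, label encoding, normalization, an 80/20 split, Logistic Regression training, and evaluation) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the records are synthetic, the column names (age, glucose, smoking status), the mean imputation, the min-max scaling, the learning rate, and the epoch count are all assumptions chosen for illustration, and the numbers it prints are unrelated to the results reported in the study.

```python
import math
import random


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def label_encode(values):
    """Map each distinct category to an integer code (sorted for determinism)."""
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


def min_max(column):
    """Rescale a numeric column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in column]


def predict_proba(w, b, x):
    """Logistic (sigmoid) output; z is clamped to avoid overflow in exp."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))


def train(X, y, lr=0.5, epochs=300):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def evaluate(y_true, y_pred):
    """Confusion matrix [[TN, FP], [FN, TP]] and the metrics derived from it."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"confusion": [[tn, fp], [fn, tp]], "accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Probability that a random positive scores above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic, hypothetical records: (age, glucose or None, smoking status, label).
random.seed(0)
rows = []
for _ in range(200):
    age = random.uniform(20, 90)
    glucose = random.uniform(60, 250) if random.random() > 0.1 else None
    smoking = random.choice(["never", "formerly", "smokes"])
    risk = (age - 20) / 70 + random.gauss(0, 0.2)
    rows.append((age, glucose, smoking, 1 if risk > 0.5 else 0))

# Preprocessing: imputation, label encoding, normalization.
X = [list(f) for f in zip(
    min_max([r[0] for r in rows]),
    min_max(impute_mean([r[1] for r in rows])),
    min_max(label_encode([r[2] for r in rows])),
)]
y = [r[3] for r in rows]

# 80% training data, 20% test data.
indices = list(range(len(X)))
random.shuffle(indices)
cut = len(indices) * 8 // 10
train_idx, test_idx = indices[:cut], indices[cut:]

w, b = train([X[i] for i in train_idx], [y[i] for i in train_idx])
y_test = [y[i] for i in test_idx]
scores = [predict_proba(w, b, X[i]) for i in test_idx]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = evaluate(y_test, y_pred)
auc = roc_auc(y_test, scores)
print(metrics, round(auc, 3))
```

The sketch follows the order given in the abstract, preprocessing before splitting. In practice, imputation and scaling statistics are usually computed on the training split only so that no information from the test data leaks into the model, and a library implementation such as scikit-learn's `LogisticRegression` with `train_test_split` would normally replace the hand-written training loop.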

           
Keywords: logistic regression; machine learning; prediction; streamlit; stroke.

 

 

Abstrak: Stroke adalah salah satu penyebab utama kematian dan kecacatan di berbagai belahan dunia, termasuk di Indonesia. Seiring perkembangan teknologi digital, penggunaan Machine Learning dalam bidang kesehatan semakin berkembang, salah satunya dalam upaya memprediksi terjadinya penyakit stroke. Penelitian ini bertujuan untuk mengimplementasikan algoritma Logistic Regression dalam memprediksi kemungkinan seseorang mengalami stroke berdasarkan data dari dataset Brain Stroke. Proses penelitian meliputi preprocessing data (penanganan missing value, normalisasi, dan label encoding), membagi data menjadi 80% data latih dan 20% data uji, serta pelatihan model. Model kemudian dievaluasi menggunakan beberapa ukuran seperti akurasi, precision, recall, F1-score, dan ROC-AUC, serta confusion matrix. Hasil penelitian menunjukkan bahwa Logistic Regression mampu memberikan hasil klasifikasi penyakit stroke dengan akurasi sebesar 82,4%, precision 80,1%, recall 78,6%, F1-score 79,3%, dan nilai ROC-AUC sebesar 0,87. Kemudian, model tersebut diintegrasikan ke dalam aplikasi yang menggunakan Streamlit, sehingga dapat digunakan secara interaktif untuk memprediksi risiko stroke pada data baru. Hasil penelitian ini menunjukkan bahwa kombinasi Machine Learning dan aplikasi berbasis web berpotensi mendukung upaya deteksi dini risiko stroke.

 

Kata kunci: logistic regression; machine learning; prediksi; streamlit; stroke.

Published
2025-09-30
How to Cite
Rana Aphrodita, I., Nur Fajri, I., & Nugroho, A. (2025). PREDICTION OF STROKE USING LOGISTIC REGRESSION WITH A MACHINE LEARNING APPROACH. JURTEKSI (Jurnal Teknologi dan Sistem Informasi), 11(4), 755–762. https://doi.org/10.33330/jurteksi.v11i4.4161