NAÏVE BAYES-BASED STUDENT ACHIEVEMENT PREDICTION SYSTEM
Abstract
Abstract: SMP Muhammadiyah 5 Samarinda still relies on manual evaluation and limited data analysis tools to predict student academic achievement. This study aims to develop a system for predicting the learning achievement of students at SMP Muhammadiyah 5 Samarinda using the Naive Bayes classification method. The dataset consists of 192 student exam-score records covering academic scores, attendance, parents’ education and income, and living conditions as independent variables, while the dependent variable is the achievement label (achieved or not achieved). The preprocessing stage includes label normalization, feature selection, and median imputation to handle missing data. The dataset was divided into 75% training data and 25% test data. The model was implemented as a pipeline consisting of a median imputer and a Gaussian Naive Bayes classifier. The evaluation results showed that the model achieved an accuracy of 79.2%, with perfect recall (1.00) for the high-achieving class and lower recall (0.64) for the low-achieving class. This indicates that the model is effective at identifying high-achieving students, though less so at identifying low-achieving ones. The trained model was then integrated into a Flask-based web application that enables online predictions through a simple form interface and supports contextual interpretation. The system is expected to support educational decision-making by helping teachers identify students’ achievement levels early and design more targeted learning interventions.
Keywords: academic performance; educational data mining; naive bayes; prediction system; student achievement
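The workflow summarized in the abstract (median imputation, a 75%/25% split, Gaussian Naive Bayes, and per-class recall) can be illustrated with a minimal, dependency-free sketch. It is not the study's implementation: the two features, the synthetic records, the 5% missing-value rate, and every function name below are assumptions made purely for illustration, and a library pipeline would normally replace the hand-written steps.

```python
import math
import random
from statistics import mean, median, pvariance


def fit_medians(rows):
    """Per-column median, ignoring missing values (None)."""
    return [median(v for v in col if v is not None) for col in zip(*rows)]


def impute(rows, medians):
    """Replace each missing value with the median of its column."""
    return [[m if v is None else v for v, m in zip(row, medians)] for row in rows]


def fit_gnb(X, y, var_floor=1e-9):
    """Estimate a log prior and per-feature mean/variance for each class."""
    model = {}
    for label in sorted(set(y)):
        members = [row for row, target in zip(X, y) if target == label]
        cols = list(zip(*members))
        model[label] = {
            "log_prior": math.log(len(members) / len(X)),
            "means": [mean(c) for c in cols],
            # Small floor avoids division by zero for constant features.
            "vars": [pvariance(c) + var_floor for c in cols],
        }
    return model


def predict(model, row):
    """Return the class with the highest Gaussian log-posterior."""
    def log_posterior(params):
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, params["means"], params["vars"])
        )
        return params["log_prior"] + log_likelihood
    return max(model, key=lambda label: log_posterior(model[label]))


def recall(y_true, y_pred, label):
    """Fraction of true members of `label` that were predicted as `label`."""
    total = sum(1 for t in y_true if t == label)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    return hits / total if total else float("nan")


# Synthetic stand-in for 192 records (hypothetical, NOT the school's data).
# Features: exam score and attendance; label 1 = achieved, 0 = not achieved.
rng = random.Random(0)
records = []
for i in range(192):
    label = i % 2
    score = rng.gauss(85, 5) if label else rng.gauss(60, 5)
    attendance = rng.gauss(95, 3) if label else rng.gauss(75, 5)
    # Assumed 5% chance that any single value is missing.
    features = [v if rng.random() > 0.05 else None for v in (score, attendance)]
    records.append((features, label))

# 75% training data, 25% test data.
rng.shuffle(records)
split = int(0.75 * len(records))
train, test = records[:split], records[split:]

# Medians are learned from the training data only, then reused on the test data.
medians = fit_medians([f for f, _ in train])
train_X = impute([f for f, _ in train], medians)
train_y = [label for _, label in train]
test_X = impute([f for f, _ in test], medians)
test_y = [label for _, label in test]

model = fit_gnb(train_X, train_y)
pred_y = [predict(model, row) for row in test_X]
accuracy = sum(t == p for t, p in zip(test_y, pred_y)) / len(test_y)

print("accuracy:", round(accuracy, 3))
print("recall (achieved):", round(recall(test_y, pred_y, 1), 3))
print("recall (not achieved):", round(recall(test_y, pred_y, 0), 3))
```

Fitting the medians on the training portion alone keeps information from the test records out of the model, which is the main reason to bundle the imputer and the classifier into a single pipeline. The numbers printed here come from the synthetic data and have no bearing on the reported 79.2% accuracy.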