COMPARISON OF SVM, RF, AND BERT FOR PUBLIC SENTIMENT ANALYSIS OF THE MBG PROGRAM ON X
Abstract
Abstract: Makan Bergizi Gratis (MBG, Free Nutritious Meals) is a strategic program of the Prabowo-Gibran administration and has become a widely discussed public issue. Sentiment analysis is needed to better understand public perception of the program. This study compares the performance of three machine learning algorithms, SVM, RF, and BERT, combined with data preprocessing, in analyzing public sentiment toward the MBG program on the social media platform X. The dataset consists of 39,858 tweets retained from 42,465 successfully crawled tweets. The research method covers data collection, data preprocessing (cleaning, case folding, word normalization, stopword removal, and stemming), feature extraction, model training (fine-tuning), handling class imbalance with SMOTE, and evaluation using accuracy, precision, recall, and F1-score. Without SMOTE, BERT performs best with 89% accuracy, followed by SVM with 87% and RF with 78.4%. With SMOTE, SVM performs best with 92.94%, followed by BERT with 88.3% and RF with 86.59%. These results indicate that SVM is the best algorithm when class imbalance is minimal, that is, after the classes are balanced with SMOTE. BERT is the best algorithm without SMOTE, and its accuracy remains nearly unchanged after SMOTE (89% versus 88.3%). Because BERT is more effective at capturing the nuances of social media language, it is the most recommended model for MBG sentiment analysis.
Keywords: sentiment analysis; machine learning; SVM; RF; BERT
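The two steps that drive the reported numbers, SMOTE oversampling and the evaluation metrics, can be illustrated with a minimal sketch. It is not the implementation used in this study, which the abstract does not specify. In practice a library such as imbalanced-learn (`imblearn.over_sampling.SMOTE`) would normally be used. The function names `smote_oversample` and `binary_metrics`, the toy feature vectors, and the labels below are hypothetical and chosen only for illustration. The metrics are computed for a single class; how the study averaged them across sentiment classes is not stated.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, SMOTE style: pick a minority sample,
    pick one of its k nearest minority neighbours, and interpolate."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` among the other minority samples.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def binary_metrics(y_true, y_pred, positive):
    """Accuracy over all samples, plus precision, recall and F1 for the
    class named by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy data: three minority-class feature vectors (for example,
# tiny TF-IDF vectors) and five labelled predictions.
minority_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_oversample(minority_vectors, n_new=4)

y_true = ["neg", "neg", "pos", "pos", "pos"]
y_pred = ["neg", "pos", "pos", "pos", "neg"]
scores = binary_metrics(y_true, y_pred, positive="neg")
```

Each synthetic point lies on a line segment between two existing minority samples, so oversampling fills in the minority region rather than duplicating tweets. This is one plausible reason why the feature-based models (SVM and RF) change more after SMOTE than BERT does, although the abstract itself does not make that claim.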