SENTIMENT ANALYSIS USING MACHINE LEARNING FOR DIGITAL SERVICE DEVELOPMENT

Rugaiyah Balqis; Jahda Rusti Putri; Mira Afrina; Ali Ibrahim; Fathoni Fathoni

doi:10.33330/jurteksi.v12i2.4476

Rugaiyah Balqis, Sriwijaya University
Jahda Rusti Putri, Sriwijaya University
Mira Afrina, Sriwijaya University
Ali Ibrahim, Sriwijaya University
Fathoni Fathoni, Sriwijaya University

Keywords: sentiment analysis; machine learning; SMOTE; TF-IDF; text classification

Abstract

The rapid growth of e-commerce mobile applications has generated large volumes of user reviews, making manual sentiment analysis increasingly impractical. This study compares the effectiveness of three machine learning algorithms, Support Vector Machine (SVM), Random Forest, and Naive Bayes, for automated sentiment classification of Indonesian-language mobile application reviews. A dataset of 3,000 user reviews of the RupaRupa application was collected from the Google Play Store and preprocessed through normalization, tokenization, stopword removal, and stemming. Term Frequency-Inverse Document Frequency (TF-IDF) vectorization was applied for feature extraction, and the Synthetic Minority Over-sampling Technique (SMOTE) was used to address class imbalance across three sentiment categories: positive, negative, and neutral. The results show that SVM achieved the highest accuracy, 90.02%, while Random Forest obtained the best F1-score, 88.08%, when sufficient training data were available. Naive Bayes showed relatively stable performance across varying training data sizes. Furthermore, TF-IDF keyword analysis revealed that negative reviews were primarily associated with delivery issues, technical problems, and pricing concerns. These findings demonstrate the effectiveness of machine learning approaches for sentiment classification and provide practical insights for improving mobile application services.
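
The two feature-preparation steps named in the abstract, TF-IDF weighting and SMOTE-style oversampling, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the toy reviews, the function names `tfidf_vectors` and `smote_like`, the smoothed IDF variant, and the two-class setup are all assumptions made for illustration, and a real pipeline would more likely rely on an established library.

```python
import math
import random
from collections import Counter

# Hypothetical, already-preprocessed toy reviews (not data from the study).
reviews = [
    (["pengiriman", "lambat", "kecewa"], "negative"),
    (["aplikasi", "error", "kecewa"], "negative"),
    (["barang", "bagus", "puas"], "positive"),
    (["harga", "murah", "puas"], "positive"),
    (["pengiriman", "cepat", "puas"], "positive"),
    (["aplikasi", "bagus", "puas"], "positive"),
]


def tfidf_vectors(docs):
    """Return (vocab, vectors) with weight tf * (ln((1 + N) / (1 + df)) + 1).

    Assumption: this smoothed IDF is one common variant, chosen for the sketch.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[t] / len(doc) * idf[t] for t in vocab])
    return vocab, vectors


def smote_like(minority, n_new, k=1, seed=0):
    """Create n_new synthetic points, each interpolated between a randomly
    chosen minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


docs = [tokens for tokens, _ in reviews]
labels = [label for _, label in reviews]
vocab, X = tfidf_vectors(docs)

# Oversample the smallest class until it matches the largest one.
class_counts = Counter(labels)
minority_label = min(class_counts, key=class_counts.get)
majority_size = max(class_counts.values())
minority = [x for x, y in zip(X, labels) if y == minority_label]

new_points = smote_like(minority, majority_size - len(minority))
X_balanced = X + new_points
y_balanced = labels + [minority_label] * len(new_points)
print(Counter(y_balanced))
```

In this sketch the balanced feature matrix `X_balanced` and labels `y_balanced` would then be passed to a classifier such as SVM, Random Forest, or Naive Bayes. Oversampling is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.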
