COMPARISON OF SUPPORT VECTOR MACHINE AND NAIVE BAYES CLASSIFICATION ALGORITHMS ON IMBALANCED DATA

Chika Enggar Puspita; Oktariani Nurul Pratiwi; Edi Sutoyo

doi:10.33330/jurteksi.v8i1.1185

Chika Enggar Puspita, Telkom University
Oktariani Nurul Pratiwi, Telkom University
Edi Sutoyo, Telkom University

Abstract

Abstract: Question classification is a computer science task that analyzes questions and assigns each one a label from a set of existing categories. Questions can be drawn from many different materials or topics. This study therefore builds a classification system for Data Warehouse and Business Intelligence quiz questions, grouping them into four topics: Data Warehouse, Business Intelligence, Data Analytics, and Performance Measurement. The problem is approached with machine learning by comparing two algorithms, Naïve Bayes and Support Vector Machine, each evaluated with cross-validation before and after applying the Synthetic Minority Over-sampling Technique (SMOTE). Applying SMOTE improved accuracy for both algorithms. Under cross-validation, Naïve Bayes reached an accuracy of 82.02% before SMOTE and 94.79% after SMOTE, while Support Vector Machine reached 81.39% before SMOTE and 96.52% after SMOTE.

Keywords: Cross-Validation; Machine Learning; Naive Bayes; Support Vector Machine; Question Classification
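
The SMOTE step named in the abstract can be illustrated with a minimal, from-scratch sketch. This is not the authors' implementation, which the abstract does not describe. It assumes questions have already been turned into numeric feature vectors; the 2-D points, the "DW" and "BI" labels, and the function names `smote` and `balance` are hypothetical and chosen only for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    For each synthetic point: pick a minority sample, pick one of its k nearest
    minority neighbours, and take a random point on the segment between them.
    """
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        extra = smote(rows, target - len(rows), k=k, seed=seed)
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out


if __name__ == "__main__":
    # Hypothetical toy data: four "DW" questions and two "BI" questions.
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [5.0, 5.0], [5.2, 5.1]]
    y = ["DW", "DW", "DW", "DW", "BI", "BI"]
    X_bal, y_bal = balance(X, y)
    print({label: y_bal.count(label) for label in set(y_bal)})
```

When oversampling is combined with cross-validation, a common precaution is to apply it only to the training portion of each fold, so that synthetic points derived from a test sample do not leak into training. Whether the study followed this order is not stated in the abstract.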

Author Biography

Chika Enggar Puspita, Telkom University

Undergraduate (S1) student in Information Systems, Telkom University
