COMPARISON OF SUPPORT VECTOR MACHINE AND NAIVE BAYES CLASSIFICATION ALGORITHMS ON IMBALANCED DATA

Chika Enggar Puspita, Oktariani Nurul Pratiwi, Edi Sutoyo

Abstract


Abstract: Question classification is a computer science task that analyzes questions and labels each one according to a set of existing categories. Questions can be collected from many different materials or topics. This study therefore builds a classification system for Data Warehouse and Business Intelligence quiz questions, grouping them into four topics: Data Warehouse, Business Intelligence, Data Analytics, and Performance Measurement. The problem is approached with machine learning by comparing two algorithms, Naïve Bayes and Support Vector Machine, combined with the Synthetic Minority Over-sampling Technique (SMOTE) and evaluated with cross-validation. Under cross-validation, Naïve Bayes reached an accuracy of 82.02% before SMOTE and 94.79% after SMOTE, while Support Vector Machine reached 81.39% before SMOTE and 96.52% after SMOTE. SMOTE thus raised accuracy for both algorithms, and Support Vector Machine with SMOTE gave the highest accuracy overall.


Keywords: Cross-Validation; Machine Learning; Naive Bayes; Support Vector Machine; Question Classification
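The oversampling step named in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. The function name `smote`, its parameters, and the toy 2-D data are assumptions made for illustration; the study's own implementation and feature representation are not given on this page.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric feature vectors from one class.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical): 6 majority points and 3 minority points in 2-D.
majority = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.4, 0.1]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3]]
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # both classes now have 6 samples
```

Because every synthetic point lies between two real minority samples, the minority class grows without simply duplicating existing rows. For text data such as quiz questions, the vectors would be numeric features extracted from the text rather than raw coordinates.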

 

 

Abstract: Question classification is a computer science task that aims to analyze questions and assign a label to each one based on existing categories. Questions can be gathered from many different materials or topics. This study therefore sets out to build a classification system for Data Warehouse and Business Intelligence quiz questions, which can be grouped into the topics Data Warehouse, Business Intelligence, Data Analytics, and Performance Measurement. The problem is addressed with a machine learning approach. The study compares two machine learning algorithms, Naïve Bayes and Support Vector Machine, using SMOTE and cross-validation. With cross-validation, Naïve Bayes achieved an accuracy of 82.02% before SMOTE and 94.79% after SMOTE, while Support Vector Machine achieved 81.39% before SMOTE and 96.52% after SMOTE.

 

Keywords: Question Classification; Machine Learning; Naive Bayes; Support Vector Machine; Cross-Validation
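The cross-validation evaluation mentioned in the abstract can be sketched as a stratified k-fold loop, in which every fold keeps roughly the same class proportions as the full data set. The helper names (`stratified_kfold`, `cross_val_accuracy`, `majority_baseline`), the labels "DW" and "BI", and the toy data are assumptions for illustration only; the placeholder model stands in for Naïve Bayes or Support Vector Machine, and the number of folds used in the study is not stated here.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with class proportions kept per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx


def cross_val_accuracy(X, y, fit_predict, k=5, seed=0):
    """Average accuracy of fit_predict(X_train, y_train, X_test) over k folds."""
    scores = []
    for train_idx, test_idx in stratified_kfold(y, k, seed):
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        # Any oversampling would be applied to X_train / y_train at this point,
        # so that synthetic samples never reach the held-out fold.
        predicted = fit_predict(X_train, y_train, [X[i] for i in test_idx])
        hits = sum(p == y[i] for p, i in zip(predicted, test_idx))
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


def majority_baseline(X_train, y_train, X_test):
    """Placeholder model: always predicts the most frequent training label."""
    most_common = max(set(y_train), key=y_train.count)
    return [most_common] * len(X_test)


# Hypothetical imbalanced labels: 16 questions of class "DW", 4 of class "BI".
y = ["DW"] * 16 + ["BI"] * 4
X = [[float(i)] for i in range(len(y))]
mean_acc = cross_val_accuracy(X, y, majority_baseline, k=4)
print(round(mean_acc, 2))  # 0.8
```

In this toy setup a model that only ever predicts the majority class still scores 0.8, which shows why accuracy on imbalanced data is best read alongside the class distribution. Oversampling inside each training fold, as the comment in `cross_val_accuracy` indicates, is one common design choice for keeping the held-out fold free of synthetic samples.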




DOI: https://doi.org/10.33330/jurteksi.v8i1.1185



Lembaga Penelitian dan Pengabdian Kepada Masyarakat (LPPM) Universitas Royal

Copyright © LPPM UNIVERSITAS ROYAL

 

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.