RFE, BOXCOX, AND PCA COMPARISON FOR MULTICLASS CLASSIFICATION SUPPORT VECTOR MACHINE OPTIMIZATION

Abstract

Abstract: Multiclass classification based on support vector machines (SVMs) is widely used. This study seeks to optimize SVM by comparing two feature extraction methods, Principal Component Analysis (PCA) and the Box-Cox transformation, with one feature elimination method, Recursive Feature Elimination (RFE). The dataset is the multiclass dry bean data from the UCI repository and contains 13,611 rows and 17 variables. The dry bean classes are Barbunya, Bombay, Cal, Dermas, Horoz, Seker, and Sira. The dataset was tested using the SVM Linear kernel and the SVM Radial Basis kernel. According to the results, the scale-center-BoxCox-SVM Radial combination achieves the highest accuracy, 93.16 percent, and the shortest processing time, 6.10 minutes. Its accuracy by bean class is 96.00 percent, 100 percent, 96.71 percent, 95.16 percent, 97.60 percent, 97.74 percent, and 91.95 percent, respectively. RFE-SVM Radial reaches an accuracy of 91.18 percent with a processing time of 6.55 minutes. Compared with the other methods, BoxCox gives better prediction accuracy while requiring less training time.
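The comparison described in the abstract can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the data is a synthetic seven-class stand-in generated with `make_classification` (16 features are assumed, that is, 17 variables minus the class label), the number of retained components or features (8) is arbitrary, and `PowerTransformer(method="box-cox", standardize=True)` is used to approximate the scale-center-BoxCox step. Accuracies from this sketch are not comparable to the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler
from sklearn.svm import SVC, LinearSVC

# Synthetic stand-in for the dry bean data (assumption): 7 classes, 16 features.
X, y = make_classification(
    n_samples=1400, n_features=16, n_informative=10, n_redundant=4,
    n_classes=7, n_clusters_per_class=1, random_state=0,
)
# Box-Cox requires strictly positive inputs, so shift every feature above zero.
X = X - X.min(axis=0) + 1.0

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

pipelines = {
    # Box-Cox transform, then center and scale, then an RBF-kernel SVM.
    "boxcox_svm_rbf": make_pipeline(
        PowerTransformer(method="box-cox", standardize=True),
        SVC(kernel="rbf"),
    ),
    # Standardize, project onto 8 principal components, then an RBF-kernel SVM.
    "pca_svm_rbf": make_pipeline(
        StandardScaler(),
        PCA(n_components=8),
        SVC(kernel="rbf"),
    ),
    # Standardize, keep 8 features ranked by a linear SVM, then an RBF-kernel SVM.
    "rfe_svm_rbf": make_pipeline(
        StandardScaler(),
        RFE(LinearSVC(dual=False, max_iter=5000), n_features_to_select=8),
        SVC(kernel="rbf"),
    ),
}

scores = {}
for name, pipe in pipelines.items():
    pipe.fit(X_train, y_train)
    scores[name] = pipe.score(X_test, y_test)
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

RFE needs an estimator that exposes feature weights, so the sketch ranks features with a linear SVM and only then fits the radial-basis SVM on the selected subset; whether the study did the same is not stated in the abstract.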

           
Keywords: Bean, PCA, BoxCox, SVM, RFE

 

 

Abstract (translated from the Indonesian): Multiclass classification using SVM has been widely used. This study tests the feature extraction methods Principal Component Analysis and Box-Cox Transformation and the feature elimination method Recursive Feature Elimination in order to optimize SVM. The dataset comes from the UCI repository's multiclass dry bean data, with 13,611 rows and 17 variables. The dry bean classes are Barbunya, Bombay, Cal, Dermas, Horoz, Seker, and Sira. The dataset was tested using the SVM Linear and SVM Radial Basis kernels. The results show that the feature extraction combination scale-center-BoxCox-SVM Radial has the best accuracy, 93.16%, with a processing time of 6.10 minutes. Classification accuracy by bean class is 96.00%, 100%, 96.71%, 95.16%, 97.60%, 97.74%, and 91.95%, respectively. RFE-SVM Radial gives an accuracy of only 91.18% with a processing time of 6.55 minutes. Compared with the other methods, BoxCox gives better predictions but does not speed up training time.

 

Keywords (translated from the Indonesian): BoxCox; Bean; PCA; RFE; SVM

Published
2022-04-22
How to Cite
Wardhana, I., Isnaini, V. A., & Wirman, R. P. (2022). RFE, BOXCOX, AND PCA COMPARISON FOR MULTICLASS CLASSIFICATION SUPPORT VECTOR MACHINE OPTIMIZATION. JURTEKSI (Jurnal Teknologi Dan Sistem Informasi), 8(2), 231-238. https://doi.org/10.33330/jurteksi.v8i2.1378
Section
Articles