ANALYSIS OF SVM AND NAIVE BAYES ALGORITHMS IN THE CLASSIFICATION OF BAD LOANS IN SAVINGS AND LOAN COOPERATIVES

Non-performing loans (NPL) are a risk that credit unions must face, and to avoid them, prospective debtors need to be surveyed. Using historical loan data, support vector machine (SVM) and Naïve Bayes can be used as classification methods to support decisions about NPL. We use a data set of 61 records and process it with the Orange 3.30 application to compare SVM with linear (SVM-L), polynomial (SVM-P), RBF (SVM-R), and sigmoid (SVM-S) kernels against Naïve Bayes. We use cross-validation with various numbers of folds to measure the classification results and a confusion matrix to evaluate the classification of the training data. Naïve Bayes scores the highest in accuracy, and SVM-R scores the highest in F1, precision, and recall. SVM-P scores the lowest in accuracy, F1, precision, and recall. Naïve Bayes scores the highest in the proportion of predicted for the true negative class and in the proportion of actual for the true positive class. SVM-S scores the highest in the proportion of predicted for the true positive class and in the proportion of actual for the true negative class. SVM-P scores the lowest in both the proportion of predicted and the proportion of actual.


INTRODUCTION
A cooperative is a legal and business entity established by individuals or by cooperative legal entities. Its aim is to meet its members' economic, social, and cultural aspirations and needs, in accordance with the principles and spirit of cooperatives [1].
The savings and loan cooperative (Koperasi Simpan Pinjam, KSP) is a type of cooperative that is profit-oriented and bears its own profit and loss. It provides financial services to its members, especially savings and loans, an economic activity commonly carried out by Indonesians [2].
Bad credit is one of the risks most often encountered by savings and loan cooperatives. Bad loans occur when a cooperative, as the creditor, has difficulty collecting loan installments from its members for one reason or another. To avoid this, cooperatives are required to survey prospective debtors against the criteria of character, capacity, capital, collateral, and economic conditions (the "5 Cs" of credit), so that the principle of sound lending can be fulfilled [3].
One technique that can be used to determine whether a prospective debtor is eligible for a loan is data mining [4]. Among data mining methods, the support vector machine (SVM) can be used to build a bad credit classification model [5]. As a nonparametric method, SVM can capture linear patterns in the data, and through kernel functions also nonlinear ones. It limits overfitting by maximizing the classification margin, which tends to produce good performance [6].
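As background, the standard soft-margin SVM classifies a record $\mathbf{x}$ with the decision function below. This is the textbook form, not necessarily the exact formulation used in the cited studies. In it:

- $\alpha_i$ are multipliers learned during training.
- $y_i \in \{-1, +1\}$ are the labels of the $n$ training records $\mathbf{x}_i$.
- $b$ is the bias.
- $C$ is the penalty parameter.
- $K$ is the kernel function.

$$
f(\mathbf{x}) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \right), \qquad 0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0
$$

Only training records with $\alpha_i > 0$ (the support vectors) contribute to the sum. With the linear kernel $K(\mathbf{x}_i, \mathbf{x}) = \mathbf{x}_i^{\top}\mathbf{x}$ the decision boundary is a hyperplane, while nonlinear kernels bend it. A smaller $C$ tolerates more margin violations, which is how the margin trades training fit against overfitting.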
A study applying SVM to creditworthiness assessment reports an accuracy of 90.42% in classifying customer data of the Daruzzakah Rensing Multipurpose Cooperative [7].
A study applying SVM to predicting the value of approved working capital loans reports an average accuracy of 69% on data from a commercial bank [8]. A study applying SVM to credit risk prediction reports an average accuracy of 80.95% on 63 loan applications at a bank in Palu City [9].
A study applying SVM to the classification of credit loan approvals reports an average accuracy of 94.29% on the LendingClub loan dataset [10]. A study applying SVM to the prediction of credit payment failure reports an average accuracy of 82.06% in classifying customer data of Equity Bank in Nairobi [11].
Besides SVM, another method often used in research on bad credit classification because of its high accuracy is Naive Bayes [12]. This method applies Bayes' theorem in four steps [13]:

1. It estimates the prior probability of each class.
2. It estimates the likelihood of the observed feature values given each class.
3. It divides the product of prior and likelihood by the evidence (the overall probability of the observed feature values) to obtain the posterior probability of each class.
4. It assigns the class with the highest posterior probability.

A study applying Naive Bayes to the selection of bank credit debtors reports an accuracy of 95% in classifying 500 loan records at a bank in Bandung [14]. A study applying Naive Bayes to credit risk prediction reports an accuracy of 84% on data from the Jakarta Teacher Family Cooperative [15].
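The posterior computation described above can be sketched for categorical features in plain Python. This is a minimal illustration under stated assumptions, not the implementation used by Orange or by the cited works:

- Add-one (Laplace) smoothing of the likelihoods is a choice made here.
- The function names are hypothetical.
- The two toy features (`permanent_employee`, `loans_elsewhere`) and their records are invented for illustration.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and, per class, how often each feature value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(lambda: defaultdict(Counter))  # [class][feature][value]
    feature_values = defaultdict(set)  # distinct values seen for each feature
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[label][j][value] += 1
            feature_values[j].add(value)
    return class_counts, value_counts, feature_values

def posterior(model, row):
    """Return P(class | row) for every class via Bayes' theorem."""
    class_counts, value_counts, feature_values = model
    n = sum(class_counts.values())
    joint = {}
    for label, count in class_counts.items():
        p = count / n  # prior P(class)
        for j, value in enumerate(row):
            # likelihood P(value | class), add-one smoothed and multiplied in
            # under the naive conditional-independence assumption
            p *= (value_counts[label][j][value] + 1) / (count + len(feature_values[j]))
        joint[label] = p  # prior x likelihood
    evidence = sum(joint.values())  # P(row), the same for every class
    return {label: p / evidence for label, p in joint.items()}

def predict(model, row):
    """Assign the class with the highest posterior probability."""
    post = posterior(model, row)
    return max(post, key=post.get)

# Hypothetical toy records: (permanent_employee, loans_elsewhere)
EXAMPLE_ROWS = [("yes", "no"), ("yes", "no"), ("no", "yes"), ("no", "yes"), ("yes", "yes")]
EXAMPLE_LABELS = ["Not Bad Credit", "Not Bad Credit", "Bad Credit", "Bad Credit", "Not Bad Credit"]

if __name__ == "__main__":
    model = train_naive_bayes(EXAMPLE_ROWS, EXAMPLE_LABELS)
    print(posterior(model, ("no", "yes")))
    print(predict(model, ("no", "yes")))
```

For the record `("no", "yes")`, the prior times likelihood is 0.4 × 0.75 × 0.75 = 0.225 for "Bad Credit" and 0.6 × 0.2 × 0.4 = 0.048 for "Not Bad Credit". Dividing by the evidence 0.273 gives posteriors of about 0.82 and 0.18, so the record is assigned to "Bad Credit".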
A study applying Naive Bayes to non-performing loan prediction reports an accuracy of 68.73% in predicting the risk of non-performing loans from the level of collateral provided, the loan amount, and the loan interest rate [16]. A study applying Naive Bayes to predicting the risk of car ownership credit reports an average accuracy of 78.47% on 560 customer records from a car leasing company in Cikupa, Tangerang [17].
A study comparing SVM and Naive Bayes reports that SVM achieves higher accuracy (85.62%) than Naive Bayes (83.24%) in predicting the quality of credit applications, using 193 records of customer loan history at a savings and loan cooperative [18].
Another comparison reports that SVM achieves higher accuracy (89.86%) than Naive Bayes (77.29%) in classifying customer financing approvals, using 869 customer financing records from a sharia cooperative [19]. This study compares SVM with Naive Bayes in classifying bad credit customers in terms of training time, testing time, accuracy, precision, and recall. Using a data set from the Mutiara Sejahtera cooperative, consisting of its lending history and member data, we analyze which method classifies bad loans better.

METHOD
The data set consists of records of Mutiara Sejahtera cooperative members, 61 records in total. Each record has 5 features, including permanent-employee status, length of membership, number of loans, and loans from other lenders, and 1 target class, namely bad loans. Table 1 shows 10 sample records from the data set.
The data set is processed using the Orange 3.30 application with the workflow shown in Figure 1. The widgets used are File, Data Sampler, SVM-L learner, SVM-P learner, SVM-R learner, SVM-S learner, Naive Bayes learner, Test and Score, and Confusion Matrix. The File widget is used to open the cooperative data file, set column F as the target class, and remove the unneeded No and Name columns. The Data Sampler widget randomly splits the data set into training data (80% of the records) and test data (20% of the records).
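The preprocessing and sampling steps performed by the File and Data Sampler widgets can be approximated outside Orange. This plain-Python sketch makes several assumptions:

- Each record is a dict keyed by column name.
- Only the column names `No`, `Name`, and `F` come from the text.
- The function names, the fixed seed, and the rounding rule for the split size are hypothetical choices; Orange's own sampling procedure is not reproduced.

```python
import random

def prepare(records, target="F", drop=("No", "Name")):
    """Separate each record (a dict) into features and target label,
    discarding identifier columns such as No and Name."""
    features, labels = [], []
    for record in records:
        features.append({k: v for k, v in record.items() if k != target and k not in drop})
        labels.append(record[target])
    return features, labels

def train_test_split(features, labels, train_fraction=0.8, seed=0):
    """Shuffle record indices once, then cut them into training and test parts."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_train = round(train_fraction * len(indices))
    train_idx, test_idx = indices[:n_train], indices[n_train:]
    return ([features[i] for i in train_idx], [labels[i] for i in train_idx],
            [features[i] for i in test_idx], [labels[i] for i in test_idx])
```

With 61 records and `train_fraction=0.8`, this rounding yields 49 training and 12 test records. Every record lands in exactly one of the two parts.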
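The SVM-L learner widget is used to process data with a linear SVM kernel, which can be calculated by equation (1) [20]:

$$K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^{\top}\mathbf{x}_j \tag{1}$$

The SVM-P learner widget is used to process data with a polynomial SVM kernel, which can be calculated by equation (2) [21]:

$$K(\mathbf{x}_i, \mathbf{x}_j) = \left(\gamma\,\mathbf{x}_i^{\top}\mathbf{x}_j + r\right)^{d} \tag{2}$$

The SVM-R learner widget is used to process data with an RBF SVM kernel, which can be calculated by equation (3):

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\gamma\,\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2}\right) \tag{3}$$

The SVM-S learner widget is used to process data with a sigmoid SVM kernel, which can be calculated by equation (4):

$$K(\mathbf{x}_i, \mathbf{x}_j) = \tanh\left(\gamma\,\mathbf{x}_i^{\top}\mathbf{x}_j + r\right) \tag{4}$$

Here $\mathbf{x}_i$ and $\mathbf{x}_j$ are the feature vectors of two records, $\gamma > 0$ is the kernel scale parameter, $r$ is a constant offset, and $d$ is the polynomial degree. The Naive Bayes learner widget is used to process data with the Naive Bayes method.

The Test and Score widget is used to evaluate the classification using cross-validation with 2, 3, 5, 10, and 20 folds. Cross-validation yields the classification accuracy, F1, precision, and recall of each method, which are calculated using equations (5) to (8) [24]:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$

$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{6}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{7}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{8}$$

Where:
TP (true positive): customers with bad loans who are classified as bad loans.
TN (true negative): customers whose loans are not bad who are classified as not bad.
FP (false positive): customers whose loans are not bad who are classified as bad loans.
FN (false negative): customers with bad loans who are classified as not bad.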
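The four kernel functions and the four evaluation metrics can be written as small plain-Python functions, which makes their arithmetic easy to check. This is a sketch only, with these assumptions:

- The default values of `gamma`, `r`, and `d` are arbitrary illustrative choices, not Orange's defaults.
- The metrics treat "Bad Credit" as the positive class.
- Orange can also report these scores averaged over both classes, which may differ from the single-positive-class values computed here.

```python
import math

def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    return dot(u, v)                                    # <u, v>

def polynomial_kernel(u, v, gamma=1.0, r=1.0, d=3):
    return (gamma * dot(u, v) + r) ** d                 # (gamma <u, v> + r)^d

def rbf_kernel(u, v, gamma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)          # exp(-gamma ||u - v||^2)

def sigmoid_kernel(u, v, gamma=1.0, r=0.0):
    return math.tanh(gamma * dot(u, v) + r)             # tanh(gamma <u, v> + r)

def classification_metrics(tp, tn, fp, fn):
    """Accuracy, F1, precision and recall with "Bad Credit" as the positive class."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "f1": f1, "precision": precision, "recall": recall}
```

As a worked example with hypothetical counts, `classification_metrics(tp=8, tn=40, fp=2, fn=1)` gives:

- accuracy 48/51 ≈ 0.941
- precision 8/10 = 0.8
- recall 8/9 ≈ 0.889
- F1 16/19 ≈ 0.842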
The Confusion Matrix widget is used to generate the proportion of predicted and the proportion of actual for each learner widget. These proportions are used to evaluate each method's predictions on the training data.
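The two normalizations can be sketched as follows. The sketch assumes Orange's convention:

- "Proportion of predicted" divides each predicted-class column by its total, so its diagonal is the per-class precision.
- "Proportion of actual" divides each actual-class row by its total, so its diagonal is the per-class recall.

The function names and the small set of labels used below are hypothetical.

```python
def confusion_matrix(actual, predicted, classes):
    """counts[i][j] = number of records of actual class i predicted as class j."""
    position = {label: k for k, label in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        counts[position[a]][position[p]] += 1
    return counts

def proportion_of_actual(counts):
    """Divide each row (actual class) by its total, so every row sums to 1."""
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]

def proportion_of_predicted(counts):
    """Divide each column (predicted class) by its total, so every column sums to 1."""
    column_totals = [sum(column) for column in zip(*counts)]
    return [[c / t if t else 0.0 for c, t in zip(row, column_totals)] for row in counts]

CLASSES = ["Bad Credit", "Not Bad Credit"]
```

Consider 4 actual "Bad Credit" records (3 predicted correctly) and 6 actual "Not Bad Credit" records (5 predicted correctly). The counts are `[[3, 1], [1, 5]]`. The proportion of actual for "Bad Credit" is 3/4 = 0.75, and the proportion of predicted for "Not Bad Credit" is 5/6 ≈ 0.833.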

RESULT AND DISCUSSION
Using cross-validation with 2, 3, 5, 10, and 20 folds, we obtained the accuracy, F1, precision, and recall of each method, as shown in Table 2.
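The cross-validation loop, and the averaging of its results over the five fold settings, can be sketched in plain Python. The following are assumptions of this sketch rather than Orange's exact procedure:

- Folds are formed by shuffling and dealing records round-robin, without the class stratification that Orange can apply.
- Accuracy is pooled over all held-out predictions.
- The `majority_class` learner is a hypothetical stand-in for the SVM and Naive Bayes learners.

```python
import random
from collections import Counter
from statistics import mean

def k_fold_indices(n, k, seed=0):
    """Shuffle record indices 0..n-1 and deal them into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[f::k] for f in range(k)]

def cross_validated_accuracy(X, y, k, fit_predict, seed=0):
    """Hold out each fold once while the other k-1 folds train the model;
    accuracy is pooled over all held-out predictions."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predictions = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                                  [X[i] for i in test_idx])
        correct += sum(p == y[i] for p, i in zip(predictions, test_idx))
    return correct / len(X)

def average_over_fold_settings(X, y, fit_predict, fold_settings=(2, 3, 5, 10, 20)):
    """Sum the cross-validated accuracy over the fold settings and divide by their count."""
    return mean(cross_validated_accuracy(X, y, k, fit_predict) for k in fold_settings)

def majority_class(X_train, y_train, X_test):
    """Hypothetical stand-in learner: always predicts the most frequent training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)
```

Any learner with the signature `fit_predict(X_train, y_train, X_test)` can be passed in. Repeating the call per method and per metric fills a table such as Table 2.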
The average accuracy, F1, precision, and recall of each method, reported in Table 3, are calculated from Table 2. The values obtained from cross-validation with 2, 3, 5, 10, and 20 folds are summed and divided by 5. Table 3 compares these averages across methods:

- For accuracy, the highest value is achieved by Naive Bayes (0.955102041) and the lowest by the polynomial-kernel SVM (0.893877551).
- For F1, the highest value is achieved by the RBF-kernel SVM (0.95514294) and the lowest by the polynomial-kernel SVM (0.893884313).

Table 4 and Table 5 show the confusion matrix results of each method in terms of the proportion of predicted and the proportion of actual. Table 6 compares the classification accuracy on the training data across methods.
Table 6 shows that, for the proportion of predicted, Naive Bayes achieves the highest value for the "Not Bad Credit" target class (96.28%) and the polynomial-kernel SVM the lowest (91.06%). For the "Bad Credit" target class, the sigmoid-kernel SVM achieves the highest value (95.44%).
For the proportion of actual, the sigmoid-kernel SVM achieves the highest value for the "Not Bad Credit" target class (96.3%) and the polynomial-kernel SVM the lowest (89.64%). For the "Bad Credit" target class, Naive Bayes achieves the highest value (95.5%) and the polynomial-kernel SVM the lowest (89.1%).

CONCLUSION
The classification of bad loans at the Mutiara Sejahtera cooperative using the SVM and Naive Bayes methods indicates that Naive Bayes performs better overall than SVM, achieving the highest average accuracy. However, the RBF-kernel SVM achieves the highest average F1, precision, and recall. All four SVM kernels reach average accuracy, F1, precision, and recall close to or above 90%, so they perform well and can still serve as alternative models for similar problems, even though their accuracy is below that of Naive Bayes. Overall, the cross-validation results show that both SVM and Naive Bayes perform very well in classifying data into the "Bad Credit" and "Not Bad Credit" categories.