DETECTION OF CHILDREN'S NUTRITIONAL STATUS USING MACHINE LEARNING WITH LOGISTIC REGRESSION ALGORITHM

Children's nutritional problems are an important concern for parents, who must attend to their children's growth and development, and especially to their health and well-being. According to the Indonesian Nutrition Status Survey (SSGI) of the Ministry of Health, children in Indonesia face four nutritional problems: stunting, wasting, underweight, and overweight. This research addresses how to predict signs of a decline in a child's nutritional status using a machine learning algorithm. A prediction model was designed using logistic regression in a Python IDE to predict whether or not a child shows a decline in nutritional status. The dataset comes from the Bengkayang Community Health Center and consists of 657 pediatric patient records. The dataset is divided into 7 features (independent variables) and 1 target (dependent variable). Test results show perfect performance, with precision, recall, F1-score, and accuracy all at 100%. The ROC (Receiver Operating Characteristic) curve, which plots the TP (True Positive) value on the Y axis against the FP (False Positive) value on the X axis, shows a very high value close to 1, a sign that the model has become overfit. It is recommended that, when preparing the training dataset, measurements be made on the training data only and the number of features be reduced through feature selection to increase the accuracy of the model.


INTRODUCTION
Children are the nation's assets and the future generation that will continue the progress and development of the nation [1]. The state's hopes and attention are therefore focused on the growth and development of children. Article 28B paragraph 2 of the 1945 Constitution reads: "Every child has the right to survival, growth, and development and the right to protection from violence and discrimination." As they grow, children must be prosperous, nurtured, protected, and guided with love and affection, especially in their family environment [2].
In Indonesia, the prevalence of child nutrition problems, which begin with weight loss, is still quite high. According to the Indonesian Nutrition Status Survey (SSGI) of the Ministry of Health, there are four child nutrition problems in Indonesia: stunting, wasting, underweight, and overweight. Stunting is a nutritional problem of particular concern to the government and society because its prevalence is still quite high; in 2022 it was estimated at 21.6%. Another nutritional problem is wasting, or thinness. According to the SSGI 2022, the prevalence of wasting in Indonesia increased by 0.6 percentage points, from 7.1% to 7.7% last year [3].
Child nutrition is an important concern for parents, who must attend to their children's growth and development, and especially to the health and welfare of children as the successors of the family. Parents need to provide healthy, balanced nutrition and encourage a healthy lifestyle and a positive attitude. Nutritional problems in children can lead to various conditions such as growth failure, undernutrition and malnutrition, premature birth, low birth weight, cow's milk protein allergies, and congenital metabolic disorders. They also have a major impact on the number of stunting cases. One of the government's current focuses is stunting prevention.
This effort aims to help Indonesian children grow and develop healthily, with the emotional, social, and physical readiness to learn, innovate, and compete at the global level. It requires attention from stakeholders to monitor toddler growth and posyandu (integrated health post) activities, increase access to medical services for sick children under five, improve the quality of household drinking water, correct defecation behavior, and provide social assistance (BANSOS), in order to create healthy child development for the next generation of a healthy nation.
Previous research has applied Artificial Intelligence-based information technology, specifically machine learning, to the early detection of diabetes. The experimental results show that models based on hyperparameter tuning can improve performance, reaching 82% accuracy, 81% precision, 79% recall, and 80% F1-score [4].
Other work analyzed machine learning models for human activity recognition, with results evaluated in terms of accuracy and precision efficiency [5]. A hybrid machine learning approach applied to heart disease prediction models produced an accuracy of 84.48%, an increase of 1.32% [6]. A comparison of the Support Vector Machine (SVM) and Artificial Neural Network (ANN) methods for toddler nutrition classification, a case study at the Salissingan Health Center, used the analysis results to determine the best method [7].
Stunting in toddlers has also been predicted by applying the k-Nearest Neighbors classification algorithm to the parameters in the dataset. The resulting system shows an accuracy of 97% in confusion matrix testing, using a data partition of 90% training data and 10% testing data with k = 5 nearest neighbors [8].
This study differs from previous research by addressing how to detect signs of a decline in children's nutritional status using machine learning algorithms. A prediction model is designed using logistic regression in a Python IDE to detect symptoms of nutritional status decline by predicting, from the initial data provided, whether a child is indicated to be malnourished or not. Experiments were conducted using a dataset from the Bengkayang puskesmas (community health center), with measurements of independent and dependent variables [9][10][11].

METHOD
This research uses the logistic regression method. The dataset obtained is divided according to the criteria; the data are then split for testing, the logistic regression method is applied for modeling, and the performance of the model is measured. The research flow is shown in Image 1.

Image 1. Research flow

Logistic Regression
Logistic regression is used to predict binary categories (0 or 1), where the prediction has only two possible outcomes, such as yes/no, win/lose, or happy/unhappy. These predictions are made from one or more features (independent variables) that act as predictors of the target (dependent variable). Each feature is given its own weight [12][13][14][15].

Image 2. Feature and weight simulation to generate predictions

Logistic regression is a statistical method used in machine learning. It relies on the logistic function, also called the sigmoid function, whose output is an S-shaped curve with values between 0 and 1. The logistic regression model outputs a probability: if the probability is greater than 0.5, the input is categorized into class 1; otherwise, if the probability is lower than 0.5, the input is categorized into class 0. Logistic regression requires one or more features as predictors. If there are n features (denoted x), producing an output requires n + 1 coefficients (denoted b), as follows:

Output = b0 + b1X1 + b2X2 + ... + bnXn

This method finds the best values for the coefficients b0, b1, b2, ..., bn based on calculations using the available training dataset.
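The relationship between the weighted sum, the sigmoid function, and the class decision can be illustrated with a minimal Python sketch. The coefficient values, the feature values, and the helper names sigmoid and predict are hypothetical and chosen only for illustration; the sketch assumes the usual convention that a probability of at least 0.5 maps to class 1.

```python
import math


def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(features, weights, intercept, threshold=0.5):
    """Return (probability, predicted class) for a single sample."""
    # Weighted sum: b0 + b1*x1 + ... + bn*xn
    z = intercept + sum(b * x for b, x in zip(weights, features))
    probability = sigmoid(z)
    # Assumed convention: probability >= threshold means class 1
    return probability, int(probability >= threshold)


# Hypothetical coefficients for two features (not fitted to any real data)
weights = [0.8, -1.2]
intercept = 0.1

prob, label = predict([2.0, 0.5], weights, intercept)
print(f"probability={prob:.3f}, class={label}")
```

In practice the coefficients are not chosen by hand; they are estimated from the training dataset by the fitting procedure.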

Child Nutrition Status Dataset
The dataset for predicting the

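Table 1. Confusion matrix
                  Predicted positive   Predicted negative
Actual positive   TP                   FN
Actual negative   FP                   TN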
In Table 1, TP is the number of positive data with positive prediction results. FN is the number of positive data with negative prediction results. FP is the number of negative data with positive prediction results. TN is the number of negative data with negative prediction results. Several measures are calculated from the confusion matrix:
1. Precision is the proportion of true positives among all positive prediction results: Precision = TP / (TP + FP).
2. Recall is the proportion of true positives among all data that are actually positive: Recall = TP / (TP + FN).
3. F1 Score is the harmonic mean of precision and recall: F1 = (2 * Recall * Precision) / (Recall + Precision).
4. Accuracy is the proportion of all data that the model classifies correctly: Accuracy = (TP + TN) / (TP + FP + FN + TN).
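The four measures can be computed directly from the confusion matrix counts. The following is a minimal sketch; the function name classification_metrics and the example counts are hypothetical and are not taken from the study's results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1 score, and accuracy from confusion matrix counts."""
    # Guard against division by zero when a denominator is empty
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * recall * precision / (recall + precision)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts, for illustration only
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

When a model makes no errors (FP = 0 and FN = 0), all four measures equal 1.0, which corresponds to the 100% values reported in this study.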

RESULTS AND DISCUSSION
Logistic regression was used to predict child malnutrition status with 657 records available. The variables used in prediction are listed in Table 2. The following is general information about the dataset. At this stage, machine learning programming is implemented in the Python language using several libraries, including pandas. The result of the data import is shown in Image 4. Data exploration is used to determine the dimensions of the dataset.

Training dataset preparation
The dataset is loaded into the prediction model for child nutrition data in order to identify which patients are at risk of deteriorating nutritional status. The data are divided into dependent and independent variables. Dependent variable: target. Independent variables: jk (sex), age, BW (body weight), TB (height), BW/U (weight-for-age), TB/U (height-for-age), and status.

Image 3. General dataset information
Next, the data are checked for empty values. The image shows 0 for every column, meaning that there are no null values. Because no records contain blanks, it is easier to create an accurate model. This amount of data is more than half of the dataset, so the dataset appears good enough and no data needs to be cleaned or changed.
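A check of this kind can be sketched with pandas as follows. The small DataFrame is a hypothetical stand-in with made-up values; only some of the column names follow the variables listed in this paper, and the actual health center data are not reproduced here.

```python
import pandas as pd

# Hypothetical stand-in for the health center data (values are made up)
df = pd.DataFrame(
    {
        "jk": [0, 1, 1, 0],
        "age": [24, 36, 18, 48],
        "bb": [10.5, 13.2, 9.1, 15.0],
        "tb": [82.0, 92.5, 76.3, 101.0],
        "target": [0, 0, 1, 0],
    }
)

# Dimensions of the dataset as (rows, columns)
print(df.shape)

# Number of null values per column; all zeros means nothing is missing
null_counts = df.isnull().sum()
print(null_counts)

# Single flag summarizing whether any column has missing values
has_missing = bool(null_counts.any())
print("Missing values present:", has_missing)
```

If any count were non-zero, the affected rows would need to be cleaned or imputed before modeling.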

Split the dataset into training and test
Next, 80% of the dataset is used as the training dataset and the remaining 20% as the test dataset. The training dataset is used to build the model, and the test dataset is then used to test and evaluate it. The resulting value is very high and close to 1, which is a sign that the model is overfit.
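The split, fitting, and evaluation steps can be sketched with scikit-learn as shown below. This is a minimal illustration under stated assumptions, not the study's actual code: the data are synthetic (657 random samples with 7 features), and the random_state, stratify, and max_iter settings are choices made for this sketch that the paper does not specify. The scores it prints therefore do not reproduce the reported results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 657 samples, 7 features (not the real dataset)
rng = np.random.default_rng(0)
n_samples = 657
X = rng.normal(size=(n_samples, 7))
# Synthetic binary target that depends loosely on the first two features
logits = 1.5 * X[:, 0] - 1.0 * X[:, 1] - 1.4
y = (rng.random(n_samples) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# 80% training data, 20% test data, chosen randomly
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the logistic regression model on the training data only
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Precision, recall, F1-score, and accuracy on the held-out test data
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))

# AUC on training and test data, using the predicted probability of class 1
train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"train AUC={train_auc:.3f}, test AUC={test_auc:.3f}")
```

Comparing the training and test scores is one common way to look for overfitting: a model that scores much higher on the training data than on the test data is likely memorizing rather than generalizing. Scores of exactly 1.0 on both sets may also warrant checking whether one of the features directly encodes the target.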

CONCLUSION
This research implements the logistic regression algorithm, which applies the logistic function to produce a binary output (zero or one) as the classification decision. The prediction model was designed in a Python IDE to detect the nutritional status of children, and its output is a prediction of whether or not a child shows a decline in nutritional status based on the initial data given. Experiments were conducted using a dataset from the Bengkayang health center consisting of 657 pediatric patient records, each of which has 7 features (independent variables) and 1 target (dependent variable). The test shows perfect performance, with precision, recall, F1-score, and accuracy all at 100%. The ROC (Receiver Operating Characteristic) curve, which plots the TP (True Positive) value on the Y axis against the FP (False Positive) value on the X axis, shows a very high value close to 1, which is a sign that the model is overfit.
We recommend that, in preparing training datasets, measurements be made with the training data only, without the test data, and that the number of features be reduced, because not all features in the dataset are useful for model building. Feature selection should then be carried out to increase the accuracy of the model.

Image 4. Check for null data
Next, we look at how many children have a decreased nutritional status (column "target" = 1). The statistics show that 135 children have a decrease in nutritional status.
Image 5. Statistical data on decreased nutritional status
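Counting the children in each class of the "target" column can be sketched with pandas as follows. The DataFrame here is a tiny hypothetical example with made-up values, so the counts it prints are not the study's figures.

```python
import pandas as pd

# Hypothetical target column (1 = decreased nutritional status, 0 = no decrease)
df = pd.DataFrame({"target": [0, 1, 0, 0, 1, 0, 1, 0]})

# Number of rows in each class
counts = df["target"].value_counts()
print(counts)

# Number and share of children with a decreased nutritional status
n_declined = int((df["target"] == 1).sum())
share_declined = n_declined / len(df)
print(f"{n_declined} of {len(df)} children ({share_declined:.1%})")
```

The class share is worth inspecting before modeling, because a strongly imbalanced target can make accuracy alone a misleading measure.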

Image 6. Training data sets
The uppercase letter X represents all features, and the lowercase letter y represents the target. In the training dataset, the features are stored in a DataFrame named X_train and the target is stored in y_train. In the test dataset, the features are stored in a DataFrame named X_test and the target is stored in y_test. For training, we use all existing features (jk, age, bb, tb, bb/u, tb/u, and status). The training dataset and test dataset should be chosen randomly. The train_test_split() function is called with the test dataset size as input, for example 0.2 for 20%.

Modelling
At this stage we use scikit-learn to create a Logistic Regression model.

Image 13. Calculation results
The measurement results show a value of 1.0, which is a very high value. Next we use the ROC (Receiver Operating Characteristic) curve, which plots the TP (True Positive) value on the Y axis against the FP (False Positive) value on the X axis. The AUC (Area Under Curve) is the area under this curve and indicates how good or poor a model is. An AUC close to 1 indicates a near-perfect model, while 0.5 indicates a poor model.

Image 14. ROC curve visualization
This study shows a perfect score, as seen in the scikit-learn results obtained with the roc_auc_score() function, which calculates the AUC.

Image 15. scikit-learn results

Anak akibat Penyakit," 2019.
[2] B. Soediono, "INFO DATIN KEMENKES RI Kondisi Pencapaian Program Kesehatan Anak Indonesia," .

Description: Variables used to label the dataset