ESTIMATION OF JAVA GRDP IN REGENCY/CITY LEVEL: SATELLITE IMAGERY AND MACHINE LEARNING APPROACHES

Abstract: Gross Regional Domestic Product (GRDP) is one of the most important socio-economic indicators. Estimating GRDP by integrating satellite imagery with official statistics can provide valuable information for a more comprehensive understanding of current economic conditions and regional differences. This research estimates GRDP values for 2022 using 2019 to 2021 data on two aspects, agriculture and non-agriculture. The soil adjusted vegetation index (SAVI), enhanced vegetation index (EVI), and land cover (LC) represent the agriculture aspect, while nighttime light (NTL), the human settlement index (HSI), land area, and population per regency/city represent the non-agriculture aspect. GRDP estimates are produced with a machine learning approach using the support vector machine (SVM) and random forest (RF) methods. A correlation test on each variable shows that only land area does not have a significant correlation with GRDP. The RF model is chosen as the best model, with MSE, RMSE, R², and MAE values of 0.2549, 0.5049, 0.7727, and 0.2543, respectively. The estimated values for several regencies/cities are fairly close, and some very close, to the official statistics.


INTRODUCTION
The success of economic development in a region can be seen from its economic growth [1], [2]. One of the indicators that can be used is Gross Domestic Product (GDP). GDP is a standard measure of the added value created through the production of goods and services in a certain area over a certain time period. GDP can be calculated at the national level or at the regional level (Gross Regional Domestic Product, or GRDP). The higher the GDP of an area, the better its economic performance [3].

Image 1. Total GRDP Value by Island in Indonesia in 2019 (billion rupiah). Source: BPS Indonesia [4]
In Indonesia, for example, when viewed by island, the total GRDP value of Java Island in 2019 was the highest among all islands. Java Island, which consists of 119 regencies and cities, has a GRDP value that continues to show an increasing trend. With around 56% (154 million people) of Indonesia's total population [5], Java Island is where various economic activities are concentrated. Economic growth in Java also tends to be positive and stable, reaching 3.66% in 2021 [4].
Currently, GDP calculation in many countries still relies on traditional methods. The data come from censuses or surveys. However, this approach requires large costs or resources and takes quite a long time. Therefore, alternative data sources can be used to calculate, or at least estimate, GDP and GRDP values efficiently and effectively.
The availability of big data today can give a more comprehensive understanding of many aspects. For instance, satellite imagery or geospatial data allows spatial analysis to show patterns of economic growth, land use, infrastructure, and social change in various regions. With this source of data, it is possible to analyze and monitor economic activities such as agriculture, fishing, and infrastructure development in certain areas.
Satellite imagery has been widely used to predict GDP values. One of the most frequently used indices is nighttime light (NTL) intensity, which reflects the economic prosperity of a country or region. A number of studies have shown that NTL intensity correlates strongly with economic indicators. Research conducted by [6] in China found that NTL from NPP-VIIRS imagery has a correlation of 0.8544 with regency-level GDP. Research by [7] found that high-resolution VIIRS light data predicts metropolitan statistical area (MSA) GDP better than state GDP, indicating that light data may be more closely related to the urban sector than to the rural sector. However, many other studies have shown that NTL alone is not enough to measure GDP accurately at the regional level. Its limitations include the inability to explain areas without observable night light and to distinguish certain types of economic activity, such as agriculture and forestry, whose growth is not always associated with an increase in night light.
Research by [8] stated that NTL imagery is spatially coarse and affected by blooming (overglow) and saturation, which can result in a loss of detail and of accurate representation in the image. Each pixel covers a large area, so small objects or features cannot be identified in detail. Thus, overestimated values in bright areas can widen the distribution of GRDP estimation errors.
Several studies therefore combine NTL with other data to estimate GDP. Research by [9] presented a model for estimating agricultural and non-agricultural economic growth at the national and subnational levels using land cover and NTL data. Research by [10] conducted a regression analysis between GDP and built-up area, the normalized difference built-up index (NDBI), and the normalized difference vegetation index (NDVI) extracted from Landsat satellite imagery.
Unfortunately, research that estimates GRDP using satellite imagery is still rarely carried out in Indonesia. Thus, the main objective of this study is to estimate GRDP values in Java Island, Indonesia, using the variables from previous studies together with other supporting data from both satellite imagery and official statistics. GRDP is estimated at the regency/city level for 2022 through a machine learning approach using data from 2019 to 2021, since many regencies and cities in Java Island experienced a decrease in GRDP as a result of the Covid-19 pandemic [11].

METHOD
Research Locus
Image 2. Java Island Territory
Java Island has varied characteristics and represents Indonesia's social, cultural, and economic diversity fairly well. Java Island also plays a significant role in Indonesia's economic growth since it is home to most of the provinces with high GRDP in Indonesia. The high population concentration in this area indicates that various economic activities are located here. On the other hand, Java Island also includes rural areas with a rich variety of agriculture, such as East Java, which has many coffee, tea, and rubber plantations. Research in this area can provide insight into sustainable agriculture, natural resource management, and the challenges and opportunities in Indonesia's agricultural sector.

Data, Data Sources, and Tools
Research by [12] stated that the areas most illuminated by night light have a fairly high correlation with GDP. However, according to BPS data, the agricultural sector, which is generally located in areas with low light intensity, is one of the highest contributors to GDP in Indonesia [4]. This motivates combining agriculture and non-agriculture data to estimate GRDP values in Java Island. The data summary and sources in this study are presented in Table 1.
This study includes 8 variables to estimate the GRDP value of regencies/cities in Java Island. The variables related to the agriculture aspect are the Soil Adjusted Vegetation Index (SAVI), Enhanced Vegetation Index (EVI), and land cover (LC). These variables are collected from satellite imagery using Google Earth Engine, a free cloud-based geospatial data processing and analysis platform that allows users to visualize and analyze satellite imagery for various purposes. The remaining variables, except GRDP itself, relate to the non-agriculture aspect.
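To make the two vegetation indices concrete, the following is a minimal sketch of their commonly used textbook definitions, applied to a single pixel of surface reflectance. The soil adjustment factor of 0.5 and the EVI coefficients (2.5, 6, 7.5, 1) are the standard defaults and are assumptions here, since the exact settings used in this study are not stated. The function names and the example reflectance values are hypothetical.

```python
def savi(nir: float, red: float, soil_factor: float = 0.5) -> float:
    """Soil Adjusted Vegetation Index for one pixel.

    soil_factor (L) = 0.5 is the commonly used default; it is an
    assumption here, not a value taken from the study.
    """
    return (nir - red) / (nir + red + soil_factor) * (1.0 + soil_factor)


def evi(nir: float, red: float, blue: float) -> float:
    """Enhanced Vegetation Index for one pixel.

    Uses the standard coefficients G=2.5, C1=6, C2=7.5, L=1
    (assumed, not confirmed by the study).
    """
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


# Hypothetical surface reflectance values for a vegetated pixel.
pixel = {"nir": 0.45, "red": 0.10, "blue": 0.05}

print("SAVI:", round(savi(pixel["nir"], pixel["red"]), 4))
print("EVI:", round(evi(pixel["nir"], pixel["red"], pixel["blue"]), 4))
```

In practice, per-pixel values such as these would be aggregated (for example, averaged) within each regency/city boundary to give one value per region and year.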
All variables are collected for the period 2019 to 2022. The data are processed in Microsoft Excel to obtain numerical data (tabular results extracted from the imagery) and in the geospatial software QGIS for map visualization. The tabular data are then used to construct the machine learning models in the R language on Google Colaboratory.

Image 3. Research Flowchart
This study uses a quantitative research method, which acquires knowledge or solves problems using numerical data. The research timeframe of 2019 to 2022 was selected to cover the periods before, during, and after the Covid-19 pandemic. GRDP values at the regency/city level are estimated using a machine learning approach with the Support Vector Machine (SVM) and Random Forest (RF) methods. This approach was previously used by [17] to estimate micro-regional GRDP, where the RF method produced the best model. In machine learning, the data must be separated into two subsets, training data and test data. In this study, data from 2019 to 2021 are used as training data, while data from 2022 are used as test data.
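The year-based split described above can be sketched as follows. This is only an illustration in Python (the study itself reports using R); the row layout, the column names `regency`, `year`, `ntl`, and `grdp`, and the placeholder values are all hypothetical.

```python
from typing import Dict, List, Tuple, Union

Row = Dict[str, Union[str, int, float]]


def split_by_year(
    rows: List[Row],
    train_years: Tuple[int, ...] = (2019, 2020, 2021),
    test_year: int = 2022,
) -> Tuple[List[Row], List[Row]]:
    """Split regency-year rows into training and test sets by year."""
    train = [row for row in rows if row["year"] in train_years]
    test = [row for row in rows if row["year"] == test_year]
    return train, test


# Hypothetical panel: two regencies observed in each of four years.
# The zeros are placeholders, not real data.
panel: List[Row] = [
    {"regency": name, "year": year, "ntl": 0.0, "grdp": 0.0}
    for name in ("Regency A", "Regency B")
    for year in (2019, 2020, 2021, 2022)
]

train_rows, test_rows = split_by_year(panel)
print(len(train_rows), len(test_rows))  # 6 training rows, 2 test rows
```

Splitting by year rather than at random keeps every 2022 observation out of training, so the evaluation reflects estimation for a year the models have not seen.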
Model evaluation is performed to obtain the best GRDP estimation model. The evaluation uses several indicators, namely MSE, RMSE, R², and MAE. The formulas for these indicators are described as follows.
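The standard definitions of these four indicators are sketched below. The notation is an assumption introduced here: y_i is the actual GRDP value of observation i (on the scale used for modeling), \hat{y}_i is the corresponding estimate, \bar{y} is the mean of the actual values, and n is the number of observations.

```latex
\begin{align}
\mathrm{MSE}  &= \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \\
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}}
               = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \\
R^2           &= 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
                          {\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \\
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|
\end{align}
```

Lower MSE, RMSE, and MAE indicate smaller estimation errors, while an R² closer to 1 indicates that the model explains more of the variation in GRDP.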

Correlation Test on Variables
Before the variables are used to build the model, each variable is tested against the dependent variable (GRDP) to check its correlation. Only significant variables are used in the model. Based on the results in Table 2, land area (AREA) does not have a significant correlation with GRDP at a significance level of 5%. Therefore, all variables except AREA are used to build the model [17].
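A screening step of this kind can be sketched as follows. The specific test used in the study is not stated, so this sketch assumes a Pearson correlation with a two-sided p-value from Fisher's z approximation, which is one common choice and not necessarily the authors' procedure. The function names and the small example series are hypothetical.

```python
import math
from statistics import NormalDist, mean
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: correlation = 0 (needs n > 3)."""
    if abs(r) >= 1.0:
        return 0.0  # perfect correlation; atanh would be undefined
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def is_significant(x: Sequence[float], y: Sequence[float], alpha: float = 0.05) -> bool:
    """True if the correlation between x and y is significant at level alpha."""
    return fisher_z_p_value(pearson_r(x, y), len(x)) < alpha


# Hypothetical values for ten regions (not data from the study).
grdp = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
ntl = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
area = [5.0, 1.0, 4.0, 2.0, 3.0, 5.0, 1.0, 4.0, 2.0, 3.0]

for name, series in (("NTL", ntl), ("AREA", area)):
    keep = is_significant(series, grdp)
    print(f"{name}: r={pearson_r(series, grdp):.3f}, keep={keep}")
```

In this made-up example the NTL series is retained and the AREA series is dropped, mirroring the kind of outcome reported in Table 2.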

Results of Estimating GRDP at the Regency/City Level in 2022
After the best model is selected based on the evaluation indicators, GRDP values are estimated at the regency/city level for 2022. The estimates have been inverse-transformed (the data were transformed during modeling) to obtain values on the original scale that can be compared with the official statistical data. The following image presents a graph of the GRDP estimates in some regencies compared to the official GRDP statistics produced by BPS. Some regions have very close estimates, such as Jombang, Banjarnegara, Probolinggo, and Cimahi. The model is also used to estimate GRDP values from 2019 to 2021. The results are quite satisfactory because several regencies/cities, such as Garut and Probolinggo, have estimated values close to the actual values.
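The back-transformation and comparison step can be illustrated with a small sketch. The transformation applied during modeling is not specified in this study, so a natural logarithm is assumed here purely for illustration. The region names, official values, and model outputs below are hypothetical.

```python
import math
from typing import Dict


def back_transform(log_value: float) -> float:
    """Invert an assumed natural-log transform to return to the original scale."""
    return math.exp(log_value)


def percentage_error(estimate: float, official: float) -> float:
    """Absolute difference between estimate and official value, in percent."""
    return abs(estimate - official) / official * 100.0


# Hypothetical official GRDP values (billion rupiah) and model outputs
# on the assumed log scale. None of these numbers come from the study.
official_grdp: Dict[str, float] = {"Regency A": 30000.0, "Regency B": 12000.0}
log_estimates: Dict[str, float] = {"Regency A": 10.35, "Regency B": 9.30}

for name, log_value in log_estimates.items():
    estimate = back_transform(log_value)
    error = percentage_error(estimate, official_grdp[name])
    print(f"{name}: estimate={estimate:.0f}, "
          f"official={official_grdp[name]:.0f}, error={error:.1f}%")
```

Reporting the error on the original scale makes it easier to judge which regions are "very close" to the official figures than comparing values on the transformed scale.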

CONCLUSION
This research succeeded in creating a model to estimate GRDP values at the regency/city level in 2022. Through the machine learning approach, the best model chosen is the Random Forest model, which has the highest R² and the lowest MSE, RMSE, and MAE. Some estimated values are fairly close, and some very close, to the actual GRDP values in the official statistics. Satellite imagery can provide a wide range of information through a variety of indices. Since GRDP is complex and covers many sectors, future research can add more variables to the GRDP estimation model. The same applies to the methods used. In this way, GRDP estimation can become more accurate.

Table 1. Data Summary and Data Sources