GOLD PRICE FORECASTING USING MULTIPLE LINEAR REGRESSION METHOD

Price forecasting is part of economic decision making. Forecasting the daily rise and fall of gold prices can help investors decide when to buy or sell the commodity. The price of gold depends on many factors, such as the prices of other precious metals, the price of crude oil, stock market performance, and currency exchange rates. This study discusses gold price forecasting using the multiple linear regression method. The results indicate that the best model is obtained with a 70%:30% split between training and testing data, with a MAPE of 4.7%. Based on these results, it can be concluded that multiple linear regression produces a fairly good model for gold price forecasting. In addition, the correlation analysis shows that the price of other precious metals is strongly related to the price of gold; in this case, the silver price has a correlation value of 0.87.


INTRODUCTION
Forecasting is a method for estimating future values or directions based on historical data. It is also an important data science task for many activities in an organization. For example, organizations in all industry sectors should engage in capacity planning, to allocate scarce resources efficiently, and in goal setting, to measure performance relative to baselines [7].
Gold is one of the most recognized precious metals in the world, and many people use it as an investment asset. Arguably no asset reflects the transformation of financial markets over the last few decades more accurately than gold [1]. The price and production behavior of gold differ from those of most other mineral commodities. In the 2008 financial crisis, the price of gold increased by 6% while the prices of many major minerals fell and equities fell by around 40%. The unique and diverse drivers of gold demand and supply are not highly correlated with changes in other financial assets [6]. Nevertheless, the price of gold depends on many factors, such as the prices of other precious metals, crude oil prices, stock market performance, bond prices, and currency exchange rates.
Time series forecasting works with a single variable recorded over multiple time periods [2]. The gold price fluctuates over time, and several other variables also affect it. Therefore, this study applies multiple linear regression to generate forecasts. Previous studies have applied multiple linear regression to forecast corn [11] and house prices [10].

METHOD
The research was carried out in several stages: literature study, data collection, dataset analysis, and model building. Each stage is explained in detail below.
To evaluate the accuracy of the forecasts, the forecasting error is calculated. The Mean Squared Error (MSE) is the average of the squared differences between the actual values and the values produced by an estimation model. A lower MSE indicates a better forecasting model. The general MSE formula is given in equation (1).
$$MSE = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2 \qquad (1)$$
where $N$ is the amount of data, $y_i$ is the actual index value in period $i$, and $\hat{y}_i$ is the forecast value in period $i$.
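As a minimal illustration of equation (1), the sketch below computes the MSE, and the RMSE as its square root, in plain Python. The function names `mse` and `rmse` are hypothetical helpers for illustration rather than code from the study, and the example numbers are made up, not taken from the gold dataset.

```python
import math
from typing import Sequence


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error, equation (1): the average of the squared errors."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error: the square root of the MSE, in the data's units."""
    return math.sqrt(mse(actual, predicted))


if __name__ == "__main__":
    # Illustrative values only: errors are 2, -1 and 1.
    actual = [120.0, 122.5, 125.0]
    predicted = [118.0, 123.5, 124.0]
    print(mse(actual, predicted))  # prints 2.0
```

Because the errors are squared, a single large miss weighs more in the MSE than several small ones.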
In addition to the MSE (whose square root is the root mean square error, RMSE), the researchers also use the Mean Absolute Percentage Error (MAPE). MAPE is a measure of relative accuracy that expresses the forecast deviation as a percentage of the actual values. The general MAPE formula is given in equation (2).
$$MAPE = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right| \qquad (2)$$
where $n$ is the amount of data, $y_t$ is the actual index value in period $t$, and $\hat{y}_t$ is the forecast value in period $t$.
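A small sketch of equation (2), returning the MAPE in percent. The helper name `mape` is hypothetical, and the example values are illustrative rather than results from the study.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, equation (2), returned in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    if any(y == 0 for y in actual):
        # MAPE divides by each actual value, so it is undefined if any y_t is 0.
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((y - y_hat) / y) for y, y_hat in zip(actual, predicted))
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Each forecast is off by 5% of the actual value, so the MAPE is about 5%.
    print(mape([100.0, 200.0], [95.0, 210.0]))
```

Since each error is scaled by its actual value, MAPE is unit-free, which makes it convenient for comparing the three train/test scenarios.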
Regression analysis is used to determine the relationship between two or more variables that have a causal relationship, and to make predictions based on that relationship [9]. Multiple linear regression extends simple linear regression to include more than one explanatory variable. In both cases the term 'linear' is used because the response variable is assumed to be a linear combination of the explanatory variables. The equation for multiple linear regression has the same form as the equation for simple linear regression but has more terms [8], as shown in equation (3).
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon \qquad (3)$$
Here $\beta_0$ is the constant (intercept), i.e. the predicted value of $y$ when all $k$ explanatory variables are 0; each explanatory variable $x_j$ has its own coefficient $\beta_j$, and $\varepsilon$ is the error term.

To evaluate the resulting regression model, the coefficient of determination $R^2$ is measured. $R^2$ is the proportion of the variance in the dependent variable that can be predicted from the independent variables, as given in equation (4).
$$R^2 = 1 - \frac{SSR}{SST} \qquad (4)$$
where $SSR$ is the sum of squared residuals and $SST$ is the total sum of squares. A low $R^2$ indicates that the model explains little of the variation in the dependent variable, which usually points to a poor fit, although not in every case.
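To make equations (3) and (4) concrete, here is a sketch that estimates the coefficients by ordinary least squares, solving the normal equations with Gaussian elimination, and then computes R2 as 1 - SSR/SST. The helper names `fit_ols`, `predict`, and `r_squared` are hypothetical; the study does not state which software or solver it used. For real data, library routines such as `numpy.linalg.lstsq` are numerically more robust than forming the normal equations directly.

```python
from typing import List, Sequence


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Estimate [b0, b1, ..., bk] in y = b0 + b1*x1 + ... + bk*xk (equation (3))
    by solving the normal equations (A^T A) b = A^T y, where A = [1 | X]."""
    A = [[1.0, *row] for row in X]  # prepend a column of ones for the intercept b0
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = []
    for i in range(p):
        row = [sum(a[i] * a[j] for a in A) for j in range(p)]
        row.append(sum(a[i] * t for a, t in zip(A, y)))
        M.append(row)
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0.0:
            raise ValueError("exactly singular system: predictors may be collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution yields the coefficient vector.
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        s = sum(M[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (M[i][p] - s) / M[i][i]
    return beta


def predict(beta: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """Apply equation (3) without the error term to each row of X."""
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in X]


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Equation (4): R^2 = 1 - SSR / SST."""
    mean_y = sum(actual) / len(actual)
    ssr = sum((y - f) ** 2 for y, f in zip(actual, predicted))  # residual sum of squares
    sst = sum((y - mean_y) ** 2 for y in actual)                # total sum of squares
    return 1.0 - ssr / sst
```

For example, with data generated exactly from y = 1 + 2x1 + 3x2, `fit_ols` recovers coefficients close to [1, 2, 3] and `r_squared` returns a value close to 1.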
Investment is one of the determining factors in the rate of economic growth of a country. Investment is the mobilization of resources to create or increase production capacity or income in the future [3]. Its main purpose is to replace the part of the capital stock that has worn out and to add to the existing capital stock. Investment can also be described as placing money or goods with the aim of obtaining a profit over a certain period [5]. Investments carry uncertainty or risk, so investors cannot predict with certainty the profits or losses they will obtain.
One popular form of investment is gold. Gold is a dense, soft, shiny metal and is considered the most malleable of all metals. Gold has several advantages: it does not easily change color or corrode, it does not fade even after long storage, and it is attractive to own [4].

Data Collection
At this stage, gold price data were collected: a dataset called Gold Price Data was obtained from Kaggle (2018). The dataset consists of 2290 rows and 6 columns, one per variable: Date, the date on which the prices were recorded; SPX, the S&P 500 capitalization index of 500 companies; SLV, the price of silver; USO, the price of oil; EUR/USD, the euro/US dollar exchange rate; and GLD, the price of gold. GLD is the variable to be predicted.
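A sketch of loading the six columns described above with Python's standard `csv` module. It assumes the CSV header uses exactly the column names listed in the text (Date, SPX, GLD, USO, SLV, EUR/USD); the function name `load_gold_data` and the constants are hypothetical, and columns are matched by name, so their order in the file does not matter.

```python
import csv
from typing import Iterable, List, Tuple

FEATURES = ["SPX", "USO", "SLV", "EUR/USD"]  # independent variables (x)
TARGET = "GLD"                               # dependent variable (y): gold price


def load_gold_data(lines: Iterable[str]) -> Tuple[List[List[float]], List[float], List[str]]:
    """Parse CSV lines into feature rows, gold prices, and date strings."""
    X: List[List[float]] = []
    y: List[float] = []
    dates: List[str] = []
    for row in csv.DictReader(lines):
        dates.append(row["Date"])
        X.append([float(row[name]) for name in FEATURES])
        y.append(float(row[TARGET]))
    return X, y, dates
```

With a local copy of the file (its name depends on how it was downloaded), open it with `newline=""` inside a `with` block and pass the file object to `load_gold_data`; the text above reports 2290 rows.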

Data Analysis
In the next stage, pre-processing is carried out, starting with displaying the raw dataset as a table, shown in Image 1. Image 1 shows that the dataset consists of 2290 rows and 6 columns. The dependent variable is GLD (gold price), and there are 4 independent variables: SPX, USO, SLV, and EUR/USD. After identifying the variables to be used, the gold price variable is plotted to examine its shape over time.

Image 2. GLD value
Image 2 shows a trend in the gold price series. Having examined the form of the data, we next determine the relationship between each pair of variables using a correlation map.

Image 3. Correlation Map
Image 3 is the correlation map of the gold price dataset. This study focuses only on the correlations with the gold price. The correlation between the gold price and SPX is 0.049, with USO -0.19, with SLV 0.87, and with EUR/USD -0.024. These values show that the gold price is most strongly related to the price of another precious metal: SLV (silver price) has a correlation close to 1, whereas the other variables are only weakly correlated with GLD.
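The values above can be computed as correlation coefficients between GLD and each other column. The sketch assumes the map shows Pearson coefficients (the usual default, e.g. in pandas `DataFrame.corr`), which the text does not state; the helper names are hypothetical.

```python
import math
from typing import Dict, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_a * var_b)


def correlations_with_target(columns: Dict[str, Sequence[float]],
                             target: str = "GLD") -> Dict[str, float]:
    """Correlate every column except the target with the target column."""
    return {name: pearson(values, columns[target])
            for name, values in columns.items() if name != target}
```

Passing a dictionary with the GLD, SPX, USO, SLV, and EUR/USD columns returns one coefficient per independent variable, which corresponds to the GLD row of the correlation map.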

Modelling
At this stage, the model is built using multiple linear regression. First, the x and y variables are created: x contains the independent variables SPX, USO, SLV, and EUR/USD, and y contains the dependent variable GLD. Next, the data are divided into training and testing sets under three scenarios: 80% training and 20% testing, 70% training and 30% testing, and 60% training and 40% testing.
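The three split scenarios can be sketched as follows. The text does not say whether rows were shuffled before splitting, so this hypothetical `train_test_split` helper defaults to a chronological split (no shuffling), which avoids training on data later than the test period, and offers a seeded shuffle as an option.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
SCENARIOS = [0.8, 0.7, 0.6]  # training fractions of the three scenarios


def train_test_split(X: Sequence[T], y: Sequence[float], train_frac: float,
                     shuffle: bool = False, seed: int = 0
                     ) -> Tuple[List[T], List[float], List[T], List[float]]:
    """Split (X, y) into training and testing parts by the given fraction."""
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be strictly between 0 and 1")
    idx = list(range(len(y)))
    if shuffle:
        random.Random(seed).shuffle(idx)  # reproducible random order
    cut = int(round(len(idx) * train_frac))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])
```

With the 2290 rows reported in the Data Collection stage, the three scenarios give 1832/458, 1603/687, and 1374/916 training/testing rows respectively.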

CONCLUSION
From the evaluation of the three scenarios, the best result was obtained with the 70% training and 30% testing split, which had the smallest MAPE value of 4.8%. This error percentage is small, indicating a low gap between the actual and predicted values. In addition, the high R2 value (87%) indicates that the independent variables, in particular the silver price, explain a large share of the variation in the dependent variable (gold price).