ARIMA MODELLING OF ECONOMIC VARIABLES IN THE COVID-19 ERA: A STUDY OF THE CONSUMER PRICE INDEX

How to cite this paper: Bunjaku, M., Bajrami, R., & Jusufi, G. (2023). ARIMA modelling of economic variables in the COVID-19 era: A study of the consumer price index


INTRODUCTION
COVID-19 is considered a public health threat by every country in the world. According to official statistics, 15% of COVID-19 cases are people with chronic diseases, who suffer more severely, face a higher risk, and may die (Zhang et al., 2020). Elderly people and those with medical problems may become seriously ill (Holingue et al., 2020). The disease first appeared in Wuhan, China, in December 2019 and has since spread to every country (Adenomon et al., 2020). Igwe (2020) asserts that the economies of all countries are facing collapse because the spread of the pandemic forced people to isolate themselves and halted economic activity. The pandemic created instability that affected the financial and economic system of every country (Adenomon & Maijamaa, 2020; Feinstein et al., 2020). Also, according to , it has increased economic crime, which has harmed the business performance of companies. Weak business performance, in turn, increases tax evasion. Likewise, the labor market (Behluli et al., 2022) and the organizational structures of various enterprises (Islami, 2021) have undergone drastic changes.
Officially, the first case of COVID-19 in Kosovo was identified on March 13, 2020 (Sopi, 2020), and further cases followed. The citizens of Kosovo were then placed under quarantine, which harmed Kosovo's fragile economy. Many small and medium-sized businesses closed because of heavy losses caused by the isolation measures (Berman et al., 2020; Qorraj & Jusufi, 2021). The impact of the pandemic on each country's economic and financial system must therefore be contained; otherwise, the economies of most countries, especially developing and underdeveloped ones, will be destroyed (Vardari, 2022; Mumbu & Hugo, 2020).
Kosovo's economy, which is at a low stage of development, was hit hard by the spread of COVID-19 (Qorraj & Jusufi, 2019; Jusufi & Ukaj, 2021). Consumer prices were strongly affected, and the inflation rate in the Kosovar economy reached 20% (Ziberi et al., 2021). The aim of this research is therefore a mathematical analysis of this phenomenon in the economy of Kosovo. The research problem is highly relevant for underdeveloped economies such as Kosovo's, where consumption accounts for a large share of gross domestic product (GDP).
Regarding the literature gap, there is very little evidence on this issue. The theoretical and conceptual framework is based on the research of Sahai et al. (2020) and Ribeiro et al. (2020), and the mathematical model is built following their models. The study's relevance and significance lie in the fact that it is among the first works to use this model to address a problem in the field of microeconomics that Kosovar authors had not addressed at all. The aim of this paper is to analyze the consumer price index (CPI) over the period in which the research was carried out.
The research methodology is based on the autoregressive integrated moving average (ARIMA) model. The main contribution is a set of results that can be used in formulating the business strategies of Kosovar enterprises and corporations. Very few mathematical models have predicted the effects of economic crises on the consumption, production, and distribution of goods in the Western Balkan countries, among which Kosovo in particular is experiencing high inflation as a result of COVID-19.
The structure of this paper is as follows. The introduction is presented in Section 1. Section 2 reviews the relevant literature. Section 3 analyses the materials and methods that have been used to conduct empirical research. Section 4 presents the results and discussion, and finally, the conclusion is presented in Section 5.

LITERATURE REVIEW
Few sources apply the ARIMA model to forecasting the spread of global pandemics. Sahai et al. (2020) used time series data from the five countries most affected by COVID-19 to forecast the epidemic's spread. Daily time series of total infected cases in the United States, Brazil, India, Russia, and Spain, from 15 February to 30 June 2020, were collected from an online database. The Hannan-Rissanen approach (Poskitt, 1987) was used to estimate the ARIMA model specifications. The study produced interesting and useful results regarding the effects of the pandemic on economic processes. In short, the ARIMA model was effective in forecasting the spread of the pandemic and its effects in these countries, giving fairly accurate figures for each month.
ARIMA models were used to make out-of-sample forecasts for 77 days. The mean absolute deviation (MAD) and mean absolute percentage error (MAPE) were used to evaluate the forecasts of the virus's spread against actual data for the first 18 days of July. The forecast accuracy was quite high. Some countries, such as the United States, Brazil, Russia, Spain, and India, based their COVID-19 protection policies on the results of this ARIMA model. The model's results indicated that India and Brazil would have between 1.38 million and 2.47 million infections by 31 July, while the United States would have up to 4.29 million.
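As a rough illustration of the two error measures named above, the sketch below computes MAD and MAPE for paired actual and forecast values in plain Python. It assumes that MAD means the mean absolute deviation of the forecast errors (the quantity often also called MAE) and that MAPE is reported in percent; the function names and the three data points are hypothetical, not taken from Sahai et al. (2020).

```python
def mad(actual, forecast):
    """Mean absolute deviation of forecast errors: mean of |a_t - f_t|."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error in percent: 100 * mean of |a_t - f_t| / |a_t|."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical actual and forecast case counts (illustration only)
actual = [100.0, 120.0, 150.0]
forecast = [110.0, 115.0, 150.0]
print(mad(actual, forecast))             # 5.0
print(round(mape(actual, forecast), 2))  # 4.72
```

MAD is expressed in the units of the data, whereas MAPE is scale-free, which makes MAPE the more convenient of the two for comparing countries with very different case counts.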
With no viable treatment available at the time, such forecasts can help countries prepare to tackle the pandemic by expanding their healthcare capacity. Anne (2020) used a time series model, which was simulated and investigated, to forecast the short-term transmission of the exponentially rising COVID-19 case series. The ARIMA model forecasts the number of cumulative cases over time, and its specification was verified using Akaike information criterion (AIC) statistics.
Ribeiro et al. (2020) evaluated time series forecasts of confirmed COVID-19 cases one, three, and six days ahead in ten Latin American countries with a high daily incidence. The models used were ARIMA, cubist regression (CUBIST), random forest (RF), ridge regression (RIDGE), support vector regression (SVR), and stacking-ensemble learning. The CUBIST, RF, RIDGE, and SVR models serve as the base learners of the stacking ensemble, while a Gaussian process (GP) is used as the meta-learner.
The effectiveness of the models was evaluated using the improvement index, the mean absolute error (MAE), and the symmetric mean absolute percentage error (sMAPE). The results were satisfactory: about 80% of the predictions regarding the spread of the COVID-19 pandemic were correct.
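The two criteria named above can be sketched as follows. sMAPE has several variants in the literature; the sketch assumes the common one with denominator (|a_t| + |f_t|) / 2, which may differ in detail from the definition used by Ribeiro et al. (2020). Function names and data are hypothetical.

```python
def mae(actual, forecast):
    """Mean absolute error: mean of |a_t - f_t|."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def smape(actual, forecast):
    """Symmetric MAPE in percent, using the (|a| + |f|) / 2 denominator.

    Assumption: this is one common sMAPE variant, not necessarily the one
    used in the cited study. A term where both values are zero counts as
    zero error.
    """
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        total += 0.0 if denom == 0 else abs(f - a) / denom
    return 100.0 * total / len(actual)


# Hypothetical values (illustration only)
print(mae([100.0, 200.0], [110.0, 180.0]))               # 15.0
print(round(smape([100.0, 200.0], [110.0, 180.0]), 2))  # 10.03
```

Unlike MAPE, this sMAPE variant gives the same value if actual and forecast are swapped, and it is bounded between 0% and 200%.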
In most cases, SVR and stacking-ensemble learning outperform the comparison models on these criteria. The models provide reasonably accurate forecasts, with errors ranging from 0.87% to 3.51%, from 1.02% to 5.63%, and from 0.95% to 6.90% for one, three, and six days ahead, respectively. Ranked from best to worst in accuracy across all scenarios, the models are SVR, stacking-ensemble learning, ARIMA, RIDGE, CUBIST, and RF. According to Osmani and Jusufi (2022), such models are widely used in decision-making and in predicting business risk; they should also be used to predict the number of COVID-19 infections in all countries.
Ceylan (2020) developed ARIMA models to forecast the epidemiological trajectory of COVID-19 incidence in the most affected European countries. COVID-19 prevalence data from 21 February 2020 to 15 April 2020 were collected from the World Health Organization website, and several ARIMA models with different parameters were developed. The best models for Italy, Spain, and France were ARIMA (0, 2, 1), ARIMA (1, 2, 0), and ARIMA (0, 2, 1), with the lowest MAPE values of 4.7520, 5.8486, and 5.6335, respectively. The observed numbers of infected individuals agreed closely with the predictions of the models developed for each country.
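To make the selected orders concrete, the two specifications can be written out in lag-operator form, $\phi(L)(1-L)^d y_t = \theta(L)\varepsilon_t$, where $L y_t = y_{t-1}$. A constant term is omitted for simplicity (an assumption; the fitted models may include one). With $d = 2$, each model describes the second difference $\Delta^2 y_t = y_t - 2y_{t-1} + y_{t-2}$ of the series:

$$\text{ARIMA}(0,2,1):\quad (1-L)^2 y_t = (1+\theta_1 L)\,\varepsilon_t \;\Longleftrightarrow\; y_t - 2y_{t-1} + y_{t-2} = \varepsilon_t + \theta_1 \varepsilon_{t-1}$$

$$\text{ARIMA}(1,2,0):\quad (1-\phi_1 L)(1-L)^2 y_t = \varepsilon_t \;\Longleftrightarrow\; \Delta^2 y_t = \phi_1\, \Delta^2 y_{t-1} + \varepsilon_t$$

In words, the first model treats the change in the period-to-period increase of cases as a one-period moving average of shocks, while the second treats it as a first-order autoregression.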
The study demonstrates that ARIMA models can be used to forecast the future prevalence of COVID-19. Its findings shed light on the outbreak's trends and provide an estimate of the epidemiological stage in these countries. Furthermore, forecasts of COVID-19 prevalence in Italy, Spain, and France can help other countries prevent the disease and design their pandemic strategies. Alabdulrazzaq et al. (2021) examined how accurate the predictions of a best-fit ARIMA model were compared with the actual values observed after the forecast period had elapsed. Using Kuwait as a case study, they tested the accuracy of an ARIMA model over a reasonably long period. They first tuned the model's parameters for the best fit by examining autocorrelation function and partial autocorrelation function plots, together with various accuracy measures. Their results were encouraging.
They used the best-fit model to forecast confirmed and recovered COVID-19 cases during Kuwait's progressive prevention strategy. The results show that, despite the dynamic nature of the disease and the Kuwaiti government's continual adjustments, the actual values for most of the period analyzed fell within the 95% confidence interval of the chosen ARIMA model's predictions. Pearson's correlation coefficient between the forecast points and the actual recorded data was 0.996, indicating that the two series are very closely related. The authors conclude that the accuracy of their ARIMA model's predictions is suitable and satisfactory. Xu et al. (2021) used data envelopment analysis (DEA) combined with four machine learning (ML) approaches to investigate the efficiency and performance of the United States' response to COVID-19. First, DEA was used to estimate the efficiency of the fifty US states based on four key inputs: public funding, number of individuals tested, number of health care personnel, and number of hospital beds. The number of individuals who recovered from COVID-19 was treated as the desirable outcome, and the number of confirmed cases as the undesirable outcome.
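The Pearson correlation reported by Alabdulrazzaq et al. (2021) can be computed from paired forecasts and observations as in the plain-Python sketch below; the function name and the four data points are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: cov(x, y) / (sd(x) * sd(y))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical forecasts that are consistently double the actual values
forecast = [2.0, 4.0, 6.0, 8.0]
actual = [1.0, 2.0, 3.0, 4.0]
print(pearson_r(forecast, actual))  # 1.0
```

The example shows why a correlation close to 1 is best read together with an error measure such as MAPE: forecasts that are twice the observed values are perfectly correlated with them, yet far from accurate.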
In the second stage, COVID-19 response performance was predicted from fifteen environmental characteristics, covering social distancing, health policy, and socioeconomic metrics, using classification and regression trees (CART), boosted trees (BT), random forests (RF), and logistic regression (LR). According to the findings, 23 states were efficient, with an average efficiency score of 0.97. The BT and RF models produced the best predictions, and CART outperformed LR. The most influential determinants of efficiency were urbanity, physical inactivity, the number of tests per population, population density, and total hospital beds per population. Ardabili et al. (2020) compared machine learning and soft computing models for predicting the COVID-19 outbreak as an alternative to the susceptible-infectious-removed (SIR) and susceptible-exposed-infected-removed (SEIR) models. Of the wide range of machine learning models studied, two gave promising results that could be used in future predictions of the spread of infectious diseases and pandemics worldwide.
The COVID-19 pandemic is so complex that, despite the studies carried out since 2019, a detailed and complete understanding of the pandemic and of the virus that causes it has not been achieved. On this basis, Ardabili et al. propose machine learning as a useful technique for modelling pandemic outbreaks, and their work serves as a preliminary benchmark for exploring the potential of ML in future research in this important area. This suggests that combining machine learning with SEIR models can bring innovation to the methods used to predict outbreaks of pandemics and viruses worldwide.
Pinter et al. (2020) proposed a hybrid ML approach to predicting COVID-19 and demonstrated its usefulness with data from Hungary. To forecast the time series of infected individuals and the death rate, they used two hybrid ML methods: the adaptive network-based fuzzy inference system (ANFIS) and the multi-layered perceptron-imperialist competitive algorithm (MLP-ICA). Their projections anticipated that by late May the outbreak and total mortality would decrease substantially. Validation over nine days yielded encouraging results, confirming the model's accuracy, which is expected to hold as long as there is no severe disruption. The work provides an early benchmark illustrating the potential of machine learning for further research.
Chakraborty and Ghosh (2020) made short-term predictions of future COVID-19 cases in several countries and assessed the risk of death from COVID-19 across countries. They identified various demographic characteristics of these countries and, through these findings, some characteristics of the disease. To address these issues, they proposed a hybrid approach combining an autoregressive integrated moving average model with a wavelet-based forecasting model. This hybrid model produces ten-day-ahead forecasts of cases for countries such as the United Kingdom, India, South Korea, and France.
Forecasts of future outbreaks in various countries will be important for allocating healthcare resources appropriately and will serve as an early-warning system for government authorities. For the second challenge, the authors used an optimal regression tree technique to identify crucial causal factors that substantially affect case fatality rates in various countries. This data-driven analysis is expected to give in-depth insights into early risk assessment for fifty severely affected countries. Bajrami et al. (2021) analyzed the survival of individuals infected with COVID-19 in Kosovo. Some covariates, such as diabetes and hospital beds per thousand people, are statistically significant, whereas others, such as total cases, the stringency index, GDP per capita, age of the respondent, and handwashing facilities, are not. The significant variables thus indicate the risk that COVID-19 poses to Kosovar patients. This research is among the first to address such a sensitive topic for Kosovo.
Similar results were achieved by Jusufi et al. (2020), who studied the impact of product innovations on Kosovar exports under the conditions of the COVID-19 pandemic. Qorraj et al. (2022) also assert that, besides affecting consumer prices and economic indicators, the pandemic has affected the higher education system, changing working methods, academic freedom, and so on. In summary, the sources reviewed indicate that the ARIMA model has been the most widely used model for analyzing economic processes affected by the COVID-19 pandemic, and that its results have shown good consistency and reliability.

RESEARCH METHODOLOGY
The data used for the analysis were accessed online (https://github.com/owid/covid-19-data). The data are openly licensed and may be reused by researchers. They are complete, reflect the actual situation caused by COVID-19, and are continuously updated with the number of infections, deaths, tests, and so on. The ARIMA models were built following the methodology set out in the classic work of Box and Jenkins (1994, as cited in MathWorks, n.d.). An ARIMA (p, d, q) model combines an autoregressive (AR) part of order p, differencing of order d (the "integrated" part), and a moving-average (MA) part of order q.
Moving average (MA) and autoregressive (AR) models are combined to create ARMA models. The stochastic process can be written as:
$$y_t = c + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}$$
This stochastic process is called an ARMA (p, q) process or model. The sequence $\{\varepsilon_t\}$ is a white-noise process with mean zero and variance $\sigma^2$.
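A short simulation can make the ARMA(p, q) equation concrete. The sketch below generates a sample path of $y_t = c + \sum_i \phi_i y_{t-i} + \varepsilon_t + \sum_j \theta_j \varepsilon_{t-j}$ with Gaussian white noise, using only the Python standard library. The function name, the zero pre-sample values, and the burn-in length are assumptions made for the illustration.

```python
import random


def simulate_arma(phi, theta, n, c=0.0, sigma=1.0, burn_in=200, seed=None):
    """Simulate n observations of an ARMA(p, q) process.

    y_t = c + sum_i phi[i] * y_{t-1-i} + eps_t + sum_j theta[j] * eps_{t-1-j},
    with eps_t ~ N(0, sigma^2). Pre-sample values of y and eps are taken as
    zero, and the first `burn_in` draws are discarded so that this arbitrary
    start has little influence (an assumption of this sketch).
    """
    rng = random.Random(seed)
    y, eps = [], []
    for t in range(n + burn_in):
        e = rng.gauss(0.0, sigma)
        ar = sum(phi[i] * y[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(len(theta)) if t - 1 - j >= 0)
        y.append(c + ar + e + ma)
        eps.append(e)
    return y[burn_in:]


# Hypothetical ARMA(1, 1) with c = 1 and phi_1 = 0.5, whose mean is c / (1 - phi_1) = 2
path = simulate_arma([0.5], [0.3], n=5000, c=1.0, seed=42)
print(round(sum(path) / len(path), 1))
```

For a stationary process, the mean is $c / (1 - \phi_1 - \dots - \phi_p)$, so the printed sample mean should be close to 2.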
Models in this class are particularly useful because they give a parsimonious representation of processes that would otherwise require a high-order AR (p) or MA (q) model.
This equation can also be written using the lag operator $L$ (where $L y_t = y_{t-1}$):
$$\phi(L)\, y_t = c + \theta(L)\, \varepsilon_t$$
where $\phi(L)$ and $\theta(L)$ are polynomials of orders p and q, respectively, defined as:
$$\phi(L) = 1 - \phi_1 L - \dots - \phi_p L^p, \qquad \theta(L) = 1 + \theta_1 L + \dots + \theta_q L^q$$
For stationarity, the roots of $\phi(z) = 0$ must lie outside the unit circle. Likewise, for invertibility of the MA component, the roots of $\theta(z) = 0$ must lie outside the unit circle. Together, these are the "stability" conditions of the AR and MA parts.
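The root conditions can be checked without computing the roots explicitly. The sketch below uses the step-down (inverse Durbin-Levinson) recursion, which converts AR coefficients into the partial autocorrelations they imply; the polynomial $1 - \phi_1 z - \dots - \phi_p z^p$ has all its roots outside the unit circle exactly when every such partial autocorrelation is strictly less than 1 in absolute value. Invertibility of $1 + \theta_1 z + \dots + \theta_q z^q$ is checked by applying the same test to $-\theta_1, \dots, -\theta_q$. The function names are hypothetical, and statistical packages commonly perform an equivalent check by computing the polynomial roots.

```python
def ar_is_stationary(phi):
    """True if all roots of 1 - phi[0] z - ... - phi[p-1] z^p lie outside the unit circle.

    Step-down recursion: kappa_k is the last coefficient a_k of the order-k
    polynomial, and a_j <- (a_j + kappa_k * a_{k-j}) / (1 - kappa_k^2) gives
    the order-(k-1) coefficients. Stationary iff |kappa_k| < 1 for every k.
    """
    a = [float(v) for v in phi]
    for k in range(len(a), 0, -1):
        kappa = a[k - 1]
        if abs(kappa) >= 1.0:
            return False
        denom = 1.0 - kappa * kappa
        a = [(a[j] + kappa * a[k - 2 - j]) / denom for j in range(k - 1)]
    return True


def ma_is_invertible(theta):
    """True if all roots of 1 + theta[0] z + ... + theta[q-1] z^q lie outside the unit circle."""
    return ar_is_stationary([-v for v in theta])


print(ar_is_stationary([0.5, 0.3]))   # True
print(ar_is_stationary([1.5, -0.5]))  # False: 1 - 1.5z + 0.5z^2 = (1 - z)(1 - 0.5z) has a unit root
print(ma_is_invertible([0.5]))        # True
print(ma_is_invertible([2.0]))        # False
```

The second example is the kind of non-stationary AR polynomial that differencing removes: factoring out $(1 - L)$ leaves the stationary AR(1) factor $1 - 0.5L$.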
Building this model involves three steps: identifying an appropriate specification, estimating its parameters, and diagnostic checking of the fitted model. In the identification phase, the orders p, d, and q of the ARIMA model are chosen, which determines the parameters to be estimated; this choice is central to the model's effectiveness. The ARMA part of the Box-Jenkins (1994, as cited in MathWorks, n.d.) approach applies only to stationary time series, so the first step in building a Box-Jenkins model is to determine whether the series is stationary and, if it is not, to transform it into a stationary series. According to Gujarati and Porter (2009), working with stationary data is important because a model estimated on stationary data has stable properties over time, which provides a sound basis for forecasting various processes and phenomena.
Once stationarity has been achieved, the orders p and q of the autoregressive and moving-average terms must be determined. This is done with autocorrelation function (ACF) and partial autocorrelation function (PACF) plots. The ARIMA models were estimated with a software package, and the candidate models were compared using the Akaike information criterion (AIC) and the Bayesian information criterion (BIC).
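The sample ACF and PACF behind such plots can be computed directly. The sketch below uses the usual ACF estimator (autocovariances divided by n) and derives the PACF from it with the Durbin-Levinson recursion. The function names and the five-point series are hypothetical; bars outside roughly $\pm 1.96/\sqrt{n}$ are conventionally read as significant.

```python
def acf(x, nlags):
    """Sample autocorrelations r_0..r_nlags (autocovariances divided by n)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d) / n
    if c0 == 0:
        raise ValueError("ACF is undefined for a constant series")
    return [sum(d[t] * d[t - k] for t in range(k, n)) / n / c0 for k in range(nlags + 1)]


def pacf(x, nlags):
    """Sample partial autocorrelations for lags 1..nlags via Durbin-Levinson."""
    r = acf(x, nlags)
    prev = []  # coefficients phi_{k-1,1..k-1} of the previous AR fit
    out = []
    for k in range(1, nlags + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical trending series
print([round(v, 2) for v in acf(x, 2)])   # [1.0, 0.4, -0.1]
print([round(v, 2) for v in pacf(x, 2)])  # [0.4, -0.31]
```

As a rule of thumb, a PACF that cuts off after lag p suggests an AR(p) term, and an ACF that cuts off after lag q suggests an MA(q) term.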
According to Biswas and Bhattacharyya (2013), the model with the lowest AIC, BIC, and Tobin's Q statistics and the highest R-squared is the most suitable for forecasting the problem studied. However, according to Gujarati and Porter (2009), a model is unacceptable if the p-value associated with the Q-statistic is small, because this indicates that the residuals are still autocorrelated. In that case, the procedure should be repeated until a satisfactory forecasting model is found.
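A widely used Q-statistic is the Ljung-Box statistic $Q = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$, computed on the model residuals and compared with a chi-square distribution with $h - p - q$ degrees of freedom. The sketch below implements it in standard-library Python, with an exact chi-square tail probability for integer degrees of freedom. The function names, the lag count h, and the choice of Ljung-Box (rather than, for example, the Box-Pierce variant) are assumptions, since the text does not say which Q-statistic was used.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1 (closed form)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) e^(-x/2) * sum_i x^(i-1) / (1*3*...*(2i-1))
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def ljung_box(residuals, h, fitdf=0):
    """Ljung-Box Q over lags 1..h and its p-value with h - fitdf degrees of freedom.

    For residuals of an ARMA(p, q) fit, fitdf = p + q is the usual choice.
    """
    n = len(residuals)
    if not 0 < h < n or h - fitdf < 1:
        raise ValueError("need 0 < h < n and h - fitdf >= 1")
    mean = sum(residuals) / n
    d = [e - mean for e in residuals]
    c0 = sum(v * v for v in d)
    q = 0.0
    for k in range(1, h + 1):
        r_k = sum(d[t] * d[t - k] for t in range(k, n)) / c0
        q += r_k * r_k / (n - k)
    q *= n * (n + 2)
    return q, chi2_sf(q, h - fitdf)


# Hypothetical residuals that alternate in sign, i.e. strongly autocorrelated
q, p = ljung_box([1.0, -1.0] * 50, h=10, fitdf=5)
print(p < 0.05)  # True: such a model would be rejected
```

A small p-value rejects the hypothesis that the residuals are white noise, which is the situation in which the text recommends repeating the identification and estimation steps.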
The ARIMA modelling process starts by defining the time-series variable, which must be stationary (after differencing, if necessary). Stationarity means that the values of the variable fluctuate around a constant mean with a constant variance over the period studied. The series must be stationary because the ARMA part of the model is built on this assumption. To build an ARIMA (p, d, q) model, where d is the order of differencing, the series must be differenced d times; this produces a stationary series, to which an ARMA (p, q) model is then fitted.
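Differencing d times, that is, applying $(1-L)^d$, can be sketched in a few lines. The example also compares the standard deviation of the series at d = 0, 1, 2, a common rule of thumb for spotting over-differencing: the order with the smallest standard deviation is a reasonable starting point, and an increase at the next order suggests too much differencing. The function names and the six data points are hypothetical, and this heuristic complements rather than replaces a formal unit-root test.

```python
import statistics


def difference(x, d=1):
    """Apply (1 - L)^d: replace the series by its successive differences, d times."""
    y = list(x)
    for _ in range(d):
        y = [b - a for a, b in zip(y, y[1:])]
    return y


def std_by_order(x, max_d=2):
    """Sample standard deviation of the d-times differenced series for d = 0..max_d."""
    return {d: statistics.stdev(difference(x, d)) for d in range(max_d + 1)}


# Hypothetical index values with a steady upward trend (illustration only)
series = [100.0, 101.0, 103.0, 104.0, 106.0, 107.0]
sds = std_by_order(series)
print({d: round(s, 2) for d, s in sds.items()})  # {0: 2.74, 1: 0.55, 2: 1.15}
print(min(sds, key=sds.get))                     # 1
```

Here, first differencing removes the trend and sharply reduces the standard deviation, while a second difference raises it again, the over-differencing symptom described in this section.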
If differencing is required, care is needed, because excessive differencing can increase the standard deviation of the data: instead of decreasing, as it should, the standard deviation rises. The best strategy is to start with the lowest order, d = 1, differencing the data and testing the result for a unit root. In this research, a first-differenced time series was therefore used. As alternatives, different types of regression and statistical forecasting models could also be used to research this issue.
Figure 1, generated from the processed results, shows that the monthly CPI has an upward trend, indicating that the series is non-stationary. The autocorrelation plots in Figures 2a and 2b show significant spikes at lags 1 to 5, declining from lag to lag with a slight break at lag 5; lag 5 indicates an element of non-stationarity. The partial autocorrelation shows a significant spike at lag 1 and drops off after lag 3. Experiments with different models showed that the ARIMA (3, 1, 2) model yields the minimum AIC among the candidate models for the monthly CPI, as the statistics in Table 1 show.
Forecasting is the main purpose of building any time series model; if a model does not achieve this goal, its statistics have no practical value. Many forecasts have been made of the number of COVID-19 cases and of deaths from the virus. The six-month forecast produced by the selected model is shown in Table 2.
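Once the coefficients are estimated, point forecasts from an ARIMA(p, 1, q) model follow from a simple recursion: forecast the next first difference from past differences and past residuals (setting future shocks to their expected value of zero), then add it to the last observed level. The sketch below assumes the coefficients are already known and reconstructs the in-sample residuals conditionally, with pre-sample values set to zero; it is not the estimation routine of any particular package, the function names are hypothetical, and the usage line uses hypothetical coefficients rather than those of the fitted ARIMA (3, 1, 2) model.

```python
def _predict_diff(w, eps, t, phi, theta, c):
    """One-step prediction of the differenced series at time t from earlier values."""
    ar = sum(phi[i] * w[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
    ma = sum(theta[j] * eps[t - 1 - j] for j in range(len(theta)) if t - 1 - j >= 0)
    return c + ar + ma


def forecast_arima_p1q(y, phi, theta, c=0.0, steps=6):
    """Point forecasts of the level y for an ARIMA(p, 1, q) with given coefficients."""
    w = [b - a for a, b in zip(y, y[1:])]  # first differences (1 - L) y_t
    eps = []
    for t in range(len(w)):  # in-sample residuals, zero pre-sample values
        eps.append(w[t] - _predict_diff(w, eps, t, phi, theta, c))
    level, out = y[-1], []
    for _ in range(steps):
        w_hat = _predict_diff(w, eps, len(w), phi, theta, c)
        w.append(w_hat)    # the forecast difference feeds later steps
        eps.append(0.0)    # future shocks have zero expectation
        level += w_hat     # undo the differencing
        out.append(level)
    return out


# Hypothetical series and coefficients (illustration only)
print(forecast_arima_p1q([0.0, 1.0, 3.0], phi=[0.5], theta=[], steps=3))  # [4.0, 4.5, 4.75]
```

The MA terms affect only the first q forecast steps; beyond that, the forecasts follow the AR recursion on the differences, which is why long-horizon ARIMA forecasts tend to settle into a smooth path.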
These forecasts help clarify changes in many economic variables. The model is therefore particularly useful for explaining economic phenomena linked to the pandemic, which affected not only human health but also the global economy, in particular the economies of the Western Balkan countries.

CONCLUSION
The results confirm that the monthly CPI series is non-stationary. The ARIMA methodology developed by Box and Jenkins (1994, as cited in MathWorks, n.d.) was therefore used to analyze the monthly CPI of the Republic of Kosovo. This research aimed to forecast the monthly CPI over a six-month period.
Among the models developed, ARIMA (3, 1, 2) was selected on the basis of the minimum corrected AIC (AICc), the estimation of the necessary parameters, and a series of diagnostic tests. It is therefore the best of the candidate models for modelling the monthly CPI over the six-month period.
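For reference, the corrected AIC adds a small-sample penalty to the AIC. With maximized log-likelihood $\ln \hat{L}$, $k$ estimated parameters, and $n$ observations, $\mathrm{AIC} = -2\ln\hat{L} + 2k$, $\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}$, and $\mathrm{BIC} = -2\ln\hat{L} + k\ln n$. The sketch below computes all three and picks the candidate with the lowest value. The function names, candidate orders, log-likelihoods, and parameter counts are hypothetical, not the values behind Table 1.

```python
import math


def information_criteria(loglik, k, n):
    """AIC, AICc and BIC from a maximized log-likelihood with k parameters and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = -2.0 * loglik + 2.0 * k
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)
    bic = -2.0 * loglik + k * math.log(n)
    return {"aic": aic, "aicc": aicc, "bic": bic}


def select_order(candidates, n, criterion="aicc"):
    """Return the (p, d, q) key whose (loglik, k) pair gives the lowest criterion value."""
    return min(candidates, key=lambda order: information_criteria(*candidates[order], n)[criterion])


# Hypothetical fits on the same differenced series (same d), n = 60 observations
candidates = {(1, 1, 0): (-100.0, 3), (2, 1, 1): (-96.0, 5)}
print(select_order(candidates, n=60, criterion="aicc"))  # (2, 1, 1)
print(select_order(candidates, n=60, criterion="bic"))   # (1, 1, 0)
```

The two criteria can disagree, as they do here, because the BIC penalty $k \ln n$ exceeds the AIC penalty $2k$ once $n \ge 8$. Likelihoods are only comparable across models fitted with the same order of differencing.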
The main contribution of this research lies in these models, which can be used in formulating the business strategies of Kosovar enterprises and corporations. This matters because very few mathematical models have predicted the effects of economic crises on the consumption, production, and distribution of goods in the Western Balkan countries, among which Kosovo in particular is experiencing high inflation as a result of COVID-19.
Future business policies and strategies of Kosovar corporations should be oriented toward, and reformulated on the basis of, mathematical models that predict changes in consumption, consumer prices, and so on. Theory and literature from Western Balkan authors are not enough; empirical and mathematical evidence should be used in researching the impact of economic crises and pandemics on economic systems and business strategies. This research is therefore significant because the topic has not previously been addressed by Kosovar authors, and it can serve as a reference point for future studies, although its scope is limited to the case of Kosovo.
In the academic world of the Western Balkan countries, there is a lack of papers with a fully empirical, research-based character. Many of them reflect theoretical considerations more than empirical research based on mathematical models and methods. This paper can therefore serve as a reference point for future studies that investigate the consequences of the pandemic in the field of microeconomics using mathematical models. Only results derived from mathematical models can be taken as a solid basis for designing strategies in different business areas. Despite its limitations, this paper is therefore relevant for microeconomics researchers and for those who use econometric methods to obtain reliable and relevant results.