3.1 Benchmarks
We employ two benchmarks in this study: a simple forecast based on the historical average during the training period (a random walk with drift) and an AR(4) model. We choose these two benchmarks primarily because they are easy to implement and often serve as the starting point for time series analyses. Moreover, Favero and Marcellino (2005) show that simple approaches can be just as effective as, if not superior to, complex models for forecasting fiscal variables, especially at short horizons and when the sample size is relatively small.
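For concreteness, the sketch below shows one way the two benchmarks could be implemented in Python. The series `y_train`, the horizon `h`, the use of `statsmodels`, and the reading of the first benchmark as a last-value-plus-average-change (drift) forecast are illustrative assumptions rather than details taken from the study.

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

# Illustrative only: y_train is a hypothetical 1-D array holding the fiscal
# series over the training window, h the forecast horizon in periods.
def drift_forecast(y_train, h):
    """Random-walk-with-drift benchmark: last observation plus h times the
    average historical change estimated over the training period."""
    drift = np.mean(np.diff(y_train))
    return y_train[-1] + drift * np.arange(1, h + 1)

def ar4_forecast(y_train, h):
    """AR(4) benchmark fitted by OLS with a constant term."""
    model = AutoReg(y_train, lags=4, trend="c").fit()
    return model.forecast(steps=h)
```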
3.2.1 Regularization methods
LASSO, Ridge Regression, and Elastic Net
Regularization methods aim to reduce the effective dimensionality of a model and mitigate the risk of overfitting. In time series forecasting, they are typically employed to extract valuable signals from a large set of potential predictors. LASSO (Tibshirani, 1996) and Ridge Regression (Hoerl and Kennard, 1970) are two such techniques. Both work by adding a penalty term to the cost function of the linear regression model, which restricts the size of the coefficients and keeps the model from becoming overly complex.
The major difference between LASSO and Ridge Regression is the type of penalty used to shrink the regression coefficients. LASSO’s penalty forces some coefficients to be exactly zero, yielding a model that selects only a subset of the most important features for predicting the target variable, whereas Ridge Regression shrinks the magnitude of all coefficients toward zero but does not set any of them exactly to zero. The penalty terms of LASSO and Ridge Regression are also referred to as the L1 and L2 penalties, respectively.
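To make the two penalties explicit, the objective functions can be written as below; the notation (λ for the tuning parameter, p predictors, n observations) is ours and is not taken from the original text.

```latex
% Illustrative objective functions; \lambda \ge 0 is the tuning parameter.
\hat{\beta}^{\text{LASSO}} = \arg\min_{\beta}\;
  \frac{1}{2n}\sum_{t=1}^{n}\big(y_t - x_t'\beta\big)^2
  + \lambda \sum_{j=1}^{p}\lvert\beta_j\rvert        % L1 penalty

\hat{\beta}^{\text{Ridge}} = \arg\min_{\beta}\;
  \frac{1}{2n}\sum_{t=1}^{n}\big(y_t - x_t'\beta\big)^2
  + \lambda \sum_{j=1}^{p}\beta_j^2                   % L2 penalty
```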
Another regularization method covered in this study is the Elastic Net (Zou and Hastie, 2005), which was developed with the aim of overcoming the limitations of LASSO and Ridge Regression. Similar to LASSO and Ridge Regression, Elastic Net adds a penalty term to the objective function of a linear regression model. The Elastic Net penalty term combines L1 and L2 regularization, enabling it to strike a balance between the feature selection capabilities of LASSO and the stability of Ridge Regression.
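In the same notation as above (again ours, not the paper’s), the Elastic Net objective can be sketched as follows, with α governing the mix between the two penalties.

```latex
% Illustrative Elastic Net objective; \alpha \in [0,1] sets the mix of the
% L1 and L2 penalties and \lambda \ge 0 the overall penalty strength.
\hat{\beta}^{\text{EN}} = \arg\min_{\beta}\;
  \frac{1}{2n}\sum_{t=1}^{n}\big(y_t - x_t'\beta\big)^2
  + \lambda\Big[\alpha\sum_{j=1}^{p}\lvert\beta_j\rvert
  + (1-\alpha)\sum_{j=1}^{p}\beta_j^2\Big]
```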
3.2.2 Ensemble methods
Ensemble machine learning methods are commonly used in forecasting. These methods combine multiple models to improve accuracy. The basic idea behind them is to leverage the strengths of different models and minimise their weaknesses. The ensemble methods covered in this study include Random Forest, Gradient Boosting, and Extreme Gradient Boosting (XGBoost).
Random forest
The core element of the random forest method (Breiman, 2001) is the decision tree. A decision tree builds a prediction model by recursively splitting the data into subsets based on a set of rules, and the resulting tree structure consists of connected nodes representing decision points. A random forest combines multiple decision trees: it draws bootstrap samples of the observations (sampling with replacement) and considers random subsets of the features at each split, producing multiple sub-samples of the original data. These sub-samples are used to train individual decision trees whose predictions are eventually combined to produce the final forecast.
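A minimal sketch of this procedure using scikit-learn is shown below; the predictor matrix `X_train`, target `y_train`, test set `X_test`, and all hyperparameter values are placeholders, not those used in the study.

```python
from sklearn.ensemble import RandomForestRegressor

# Illustrative only: X_train holds the candidate predictors, y_train the
# target series; hyperparameter values are placeholders.
rf = RandomForestRegressor(
    n_estimators=500,      # number of decision trees in the forest
    bootstrap=True,        # each tree sees a bootstrap sample (with replacement)
    max_features="sqrt",   # random subset of predictors considered at each split
    random_state=0,
)
rf.fit(X_train, y_train)
y_hat = rf.predict(X_test)  # final forecast = average over the individual trees
```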
Gradient Boosting
Gradient boosting (Friedman, 2001) is also a tree-based method. Unlike random forest, in which the trees are grown independently, each tree in gradient boosting is conditional on the trees built before it. More specifically, gradient boosting aims to minimise the difference between the predicted and actual values of the target variable (a loss function) by iteratively adding decision trees to the model. At each iteration, a new decision tree is trained on the errors of the current model, which allows the model to focus on the areas where it performed poorly in the previous iteration.
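The sequential fitting can be sketched with scikit-learn as below; again, the data objects and hyperparameter values are illustrative assumptions rather than the study’s settings.

```python
from sklearn.ensemble import GradientBoostingRegressor

# Illustrative only: hyperparameter values are placeholders.
gbm = GradientBoostingRegressor(
    n_estimators=300,     # trees are added sequentially
    learning_rate=0.05,   # shrinks each tree's contribution
    max_depth=3,          # shallow trees fitted to the current errors
    loss="squared_error", # loss function being minimised
    random_state=0,
)
gbm.fit(X_train, y_train)   # each new tree targets the errors of the fit so far
y_hat = gbm.predict(X_test)
```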
Extreme Gradient Boosting
Extreme Gradient Boosting (XGBoost) is a variant of gradient boosting that includes additional features and techniques to improve performance, such as regularization of the objective function, early stopping, and subsampling. These features help to control overfitting and improve the accuracy and speed of the model (Chen and Guestrin, 2016). XGBoost is also often less sensitive to the choice of hyperparameters, which makes the model easier to tune.
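A sketch of these features using the `xgboost` Python package is given below; the validation split used for early stopping and all hyperparameter values are assumptions for illustration only.

```python
import xgboost as xgb

# Illustrative only: hyperparameter values and the validation split are
# placeholders, not the study's settings.
model = xgb.XGBRegressor(
    n_estimators=1000,
    learning_rate=0.05,
    subsample=0.8,            # row subsampling per tree
    colsample_bytree=0.8,     # feature subsampling per tree
    reg_lambda=1.0,           # L2 regularization of the objective
    reg_alpha=0.1,            # L1 regularization of the objective
    early_stopping_rounds=50, # in older xgboost versions this goes in fit()
)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)
y_hat = model.predict(X_test)
```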
3.2.3 Neural network
Multilayer perceptron
The multilayer perceptron (MLP) is a type of artificial neural network consisting of multiple layers of interconnected nodes (neurons). The MLP is a feedforward neural network, which means that information flows in only one direction, from the input layer, through the hidden layers, to the output layer. The key feature of the MLP is its ability to learn complex non-linear relationships between the input and output variables (Rosenblatt, 1961; Rumelhart, Hinton, and Williams, 1985).
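A minimal scikit-learn sketch of such a network is shown below; the architecture (two hidden layers), activation function, and other hyperparameters are illustrative assumptions, not the configuration used in the study.

```python
from sklearn.neural_network import MLPRegressor

# Illustrative only: architecture and hyperparameters are placeholders.
mlp = MLPRegressor(
    hidden_layer_sizes=(32, 16),  # two hidden layers of interconnected neurons
    activation="relu",            # non-linear activation in the hidden layers
    solver="adam",
    max_iter=2000,
    random_state=0,
)
mlp.fit(X_train, y_train)   # information flows input -> hidden layers -> output
y_hat = mlp.predict(X_test)
```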