1. Introduction

Consequently, economists have an interest in enhancing the precision of government revenue predictions. Contributing to this area, this study attempts to use machine learning to forecast tax revenue in Victoria, Australia.

The use of machine learning in time series forecasting dates back to Hu’s (1964) work on weather forecasting. With recent technological advancements, there has been a significant rise in the number of studies using machine learning, prompting us to examine its effectiveness in predicting government revenue. Machine learning has three primary advantages in forecasting. First, it is effective in extracting useful signals from a large set of information. Second, it can uncover both linear and non-linear relationships in the data. Third, it is data-driven, which means it does not require users to provide a model specification.

As some previous studies suggest that machine learning methods are effective in forecasting labour market movements (e.g. Gogas et al. 2022; Kreiner and Duca 2020) and housing market movements (e.g. Milunovich 2020), our study focuses on the predictability of payroll tax and land transfer duty revenue, which are highly correlated with these two markets. We also focus on these taxes because of their significant size. Our baseline results use quarterly taxation revenue data from the Victorian Department of Treasury and Finance (DTF), covering the period from June 1992 to December 2019. In some tests, we extend the sample period to September 2022 to cover the unusual period of the COVID-19 pandemic.

Our study employs nine forecasting methods¹, including autoregressive econometric models, regularized machine learning methods, ensemble machine learning methods, and the multilayer perceptron (MLP) neural network. In our baseline test, we include 23 features, including macroeconomic indicators provided by the Australian Bureau of Statistics (ABS) and Victorian property market indices provided by CoreLogic. We use a fixed window of the first 71 quarters of data to train our models and test forecast performance in the last 40 quarters of data (equivalent to 10 years). Consistent with common practice, we tune the hyperparameters of our machine learning models using K-fold cross-validation, with K set to 5.
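The estimation design described above can be illustrated with a minimal sketch. It assumes synthetic placeholder data in place of the DTF, ABS and CoreLogic series, uses a closed-form ridge regression as a stand-in for the regularized methods, and assumes contiguous (unshuffled) folds; the names ridge_fit, cv_rmse and lambda_grid are hypothetical and do not reflect the code used in this study.

```python
import numpy as np

# Sample design taken from the text: 71 training quarters, 40 test quarters,
# 23 features, and 5-fold cross-validation.
N_TRAIN, N_TEST, N_FEATURES, K = 71, 40, 23, 5


def ridge_fit(X, y, lam):
    """Closed-form ridge on standardised features; the intercept is unpenalised."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant columns
    Z = (X - mu) / sd
    y_mean = y.mean()
    beta = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ (y - y_mean))
    return mu, sd, y_mean, beta


def ridge_predict(model, X):
    mu, sd, y_mean, beta = model
    return y_mean + ((X - mu) / sd) @ beta


def cv_rmse(X, y, lam, k=K):
    """Mean validation RMSE across k contiguous folds (fold layout is an assumption)."""
    all_idx = np.arange(len(y))
    scores = []
    for val_idx in np.array_split(all_idx, k):
        train_idx = np.setdiff1d(all_idx, val_idx)
        model = ridge_fit(X[train_idx], y[train_idx], lam)
        err = y[val_idx] - ridge_predict(model, X[val_idx])
        scores.append(np.sqrt(np.mean(err ** 2)))
    return float(np.mean(scores))


# Synthetic placeholder data (NOT the study's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(N_TRAIN + N_TEST, N_FEATURES))
y = X[:, :3] @ np.array([1.0, -0.5, 0.25]) + rng.normal(scale=0.5, size=N_TRAIN + N_TEST)

# Fixed window: the first 71 quarters train the model, the last 40 evaluate it.
X_train, y_train = X[:N_TRAIN], y[:N_TRAIN]
X_test, y_test = X[N_TRAIN:], y[N_TRAIN:]

# Tune the penalty on the training window only (hypothetical grid).
lambda_grid = [0.01, 0.1, 1.0, 10.0, 100.0]
cv_scores = {lam: cv_rmse(X_train, y_train, lam) for lam in lambda_grid}
best_lam = min(cv_scores, key=cv_scores.get)

# Refit on the full training window and score on the held-out test window.
final_model = ridge_fit(X_train, y_train, best_lam)
test_err = y_test - ridge_predict(final_model, X_test)
test_rmse = float(np.sqrt(np.mean(test_err ** 2)))
print(best_lam, round(test_rmse, 3))
```

The test window plays no part in choosing the penalty, so the reported test error remains an out-of-sample measure.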

Our main finding is that machine learning algorithms do not outperform simple autoregressive models for payroll tax forecasting, but might be useful in land transfer duty forecasting due to their ability to reduce data dimensionality and identify useful signals from a large set of features. More specifically, we find that, of the models tested, an AR(4) model performs best for payroll tax revenue forecasting, which implies that the autoregressive structure of payroll tax is the most important consideration when forecasting it.
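For readers less familiar with the benchmark, an AR(4) model regresses the current quarter on a constant and the previous four quarters. The sketch below assumes ordinary least squares estimation, a simulated series in place of actual revenue data, and one-step-ahead forecasts that hold the coefficients fixed while using realised lags. The function names fit_ar and forecast_one_step are hypothetical, and the estimation details of this study may differ.

```python
import numpy as np


def fit_ar(y, p=4):
    """OLS fit of y_t = c + phi_1*y_{t-1} + ... + phi_p*y_{t-p} + e_t."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Column j holds lag j+1 of each target observation y[p:], i.e. y[t-j-1].
    lags = np.column_stack([y[p - j - 1 : n - j - 1] for j in range(p)])
    X = np.column_stack([np.ones(n - p), lags])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef  # [c, phi_1, ..., phi_p]


def forecast_one_step(coef, history):
    """One-step-ahead forecast from the last p observations in `history`."""
    p = len(coef) - 1
    recent = np.asarray(history[-p:], dtype=float)[::-1]  # y_{t-1}, ..., y_{t-p}
    return float(coef[0] + coef[1:] @ recent)


# Simulated quarterly series (NOT actual revenue data): 111 observations,
# matching the 71 + 40 quarter split described in the text.
rng = np.random.default_rng(1)
n = 111
y = np.zeros(n)
for t in range(4, n):
    y[t] = 1.0 + 0.5 * y[t - 1] + 0.3 * y[t - 4] + rng.normal(scale=0.5)

# Estimate once on the first 71 quarters, then forecast each of the last 40.
coef = fit_ar(y[:71], p=4)
forecasts = np.array([forecast_one_step(coef, y[:t]) for t in range(71, n)])
rmse = float(np.sqrt(np.mean((y[71:] - forecasts) ** 2)))
print(np.round(coef, 3), round(rmse, 3))
```

The fourth lag lets the model capture a same-quarter-last-year pattern in quarterly data without requiring any external features.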

Next, we find that Ridge Regression, which specialises in identifying useful signals from a large number of features, is the best method for land transfer duty forecasts, producing errors 25 per cent lower than the benchmark AR(4) model. However, this finding should be interpreted with caution, as the difference between the best machine learning method and the benchmarks is only marginally statistically significant at the 10 per cent level. Overall, the results suggest that the usefulness of machine learning methods depends on the characteristics of the tax line. The value of machine learning methods is higher when applied to tax lines that have higher volatility and are more sensitive to fluctuations in economic conditions.

Furthermore, we conduct three additional tests to evaluate the performance of machine learning algorithms under different scenarios. We focus on:

examining whether machine learning algorithms demonstrate improved performance when incorporating property market conditions from various Australian cities in addition to Melbourne
investigating the performance of machine learning algorithms under a data-rich environment, when the number of features increases to 166
evaluating the performance of machine learning algorithms in forecasting during the COVID period with heightened uncertainty.

Our results suggest that including property market indices from other large cities and country-level macroeconomic statistics does not significantly improve machine learning algorithm performance. In fact, increasing the feature set’s dimensionality can lead to a decrease in algorithm performance, as it becomes harder for the algorithms to extract valuable information.

On the other hand, we find that machine learning algorithms can be valuable in the presence of the extreme observations of the COVID-19 pandemic period. Interestingly, their usefulness takes a different form from that in the baseline results. In our baseline results, regularized machine learning methods that focus on signal identification prove beneficial during normal periods. During abnormal periods like the COVID-19 crisis, machine learning models that explore the non-linear relationship between the target variable and features, such as tree-based methods and neural networks, perform better.

In summary, our study contributes to the literature comparing simple and sophisticated methods in forecasting fiscal variables (e.g. Feenberg et al. 1989; Gentry 1989; Favero and Marcellino 2005; Carriero, Mumtaz, and Theophilopoulou 2015). Our finding is consistent with a recent study (Chung, Williams, and Do 2022), which shows that machine learning is not helpful in forecasting most types of government revenue in the United States, except for land transfer duty. Furthermore, we also contribute to the emerging literature on the use of machine learning approaches in economic forecasts (e.g. Gu, Kelly, and Xiu 2020; Medeiros, Vasconcelos, Veiga, and Zilberman 2021; Babii, Ghysels, and Striaukas 2022; Milunovich 2020).

The rest of the paper is organized as follows. Section 2 discusses the background of revenue forecasting and the taxation structure in Victoria. Section 3 outlines the forecasting methods covered in this study. Section 4 discusses the detailed setting of this study. Section 5 presents the main baseline results and Section 6 shows additional scenario analyses. Section 7 concludes.

Footnotes

¹ The forecasting models covered in this paper may not necessarily reflect the actual models used by the DTF in its official revenue forecasting process.

Updated 11 October 2024