Chapter 16 — Evaluating Forecasts
In the previous chapter, we learned how forecasts are generated from time series models.
But an important question remains: how good are those forecasts?
Forecasting is not only about producing predictions.
It is also about evaluating forecast quality.
A model that fits historical data very well may still forecast poorly out of sample.
This chapter introduces the most commonly used measures of forecast accuracy and forecast quality.
We will use a real example based on Thai inflation forecasts to illustrate the ideas.
Learning Objectives¶
By the end of this chapter, you should be able to:
compute forecast errors
distinguish bias from variability
interpret MSE, RMSE, MAE, and MAPE
understand the tradeoff between different forecast evaluation measures
interpret Theil’s $U_1$ and $U_2$ statistics
understand forecast error decomposition
compare competing forecasting models
16.1 Forecast Errors¶
Forecast evaluation begins with the forecast error.
Suppose:
actual value: $y_t$
forecast value: $\hat{y}_t$
The forecast error is:

$$e_t = y_t - \hat{y}_t$$

If:
$e_t > 0$: the model underpredicted
$e_t < 0$: the model overpredicted
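For example, if actual inflation is $y_t = 2.0$ and the forecast was $\hat{y}_t = 1.8$, then $e_t = 2.0 - 1.8 = +0.2$: the model underpredicted inflation by 0.2 percentage points.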
Good evaluation distinguishes:
systematic mistakes (bias)
random fluctuations (variance)
failure to track movements (covariance)
16.2 A Thai Inflation Forecast Example¶
Suppose we compare two competing forecasts of Thailand’s year-on-year CPI inflation.
The table below shows:
actual inflation
Forecast 1, generated by an AR(1) model
Forecast 2, generated by a random walk model
The forecast errors $e_t = y_t - \hat{y}_t$ follow directly from these columns.
| Date | Actual Inflation | Forecast 1 | Forecast 2 |
|---|---|---|---|
| Jan 2014 | 1.93 | 1.84 | 1.67 |
| Feb 2014 | 1.96 | 1.92 | 1.93 |
| Mar 2014 | 2.11 | 1.96 | 1.96 |
| Apr 2014 | 2.45 | 2.31 | 2.11 |
| May 2014 | 2.62 | 3.61 | 2.45 |
| Jun 2014 | 2.35 | 2.45 | 2.62 |
| Jul 2014 | 2.16 | 2.01 | 2.35 |
| Aug 2014 | 2.09 | 1.99 | 2.16 |
| Sep 2014 | 1.75 | 1.81 | 2.09 |
| Oct 2014 | 1.48 | 1.65 | 1.75 |
| Nov 2014 | 1.26 | 1.21 | 1.48 |
| Dec 2014 | 0.60 | 1.13 | 1.26 |
We now ask: which forecast is more accurate?
For data and computations in Excel see LINK.
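The table can also be set up directly in Python for the computations that follow; a minimal sketch using pandas, with the numbers copied from the table above:

```python
import pandas as pd

# Monthly Thai CPI inflation data from the table above (percent, year-on-year)
dates = pd.period_range("2014-01", "2014-12", freq="M")
df = pd.DataFrame({
    "Actual":     [1.93, 1.96, 2.11, 2.45, 2.62, 2.35, 2.16, 2.09, 1.75, 1.48, 1.26, 0.60],
    "Forecast 1": [1.84, 1.92, 1.96, 2.31, 3.61, 2.45, 2.01, 1.99, 1.81, 1.65, 1.21, 1.13],
    "Forecast 2": [1.67, 1.93, 1.96, 2.11, 2.45, 2.62, 2.35, 2.16, 2.09, 1.75, 1.48, 1.26],
}, index=dates)

# Forecast errors e_t = y_t - yhat_t for each competing forecast
df["e1"] = df["Actual"] - df["Forecast 1"]
df["e2"] = df["Actual"] - df["Forecast 2"]
print(df.round(2))
```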
16.3 Mean Forecast Error (Bias)¶
The simplest measure is the average forecast error:

$$\text{Bias} = \frac{1}{T}\sum_{t=1}^{T}\left(y_t - \hat{y}_t\right)$$

or equivalently:

$$\text{Bias} = \bar{e} = \frac{1}{T}\sum_{t=1}^{T} e_t$$

positive bias: forecasts tend to be too low
negative bias: forecasts tend to be too high
For the Thai inflation example:
| Measure | Forecast 1 | Forecast 2 |
|---|---|---|
| Bias | -0.094 | -0.089 |

Both forecasts slightly overpredict inflation on average: the bias is negative, so the forecasts tend to sit above the actual values.
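As a check, the Forecast 1 errors implied by the table sum to $-1.13$ over the 12 months, so

$$\text{Bias}_1 = \frac{-1.13}{12} \approx -0.094.$$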
16.4 Mean Squared Error (MSE)¶
One problem with simple bias is that positive and negative errors can cancel out.
To avoid this, we square the errors:

$$\text{MSE} = \frac{1}{T}\sum_{t=1}^{T} e_t^2$$

Large mistakes therefore matter disproportionately.
Thai Inflation Example¶
| Measure | Forecast 1 | Forecast 2 |
|---|---|---|
| MSE | 0.116 | 0.084 |
Forecast 2 has the smaller MSE.
So Forecast 2 performs better according to this criterion.
16.5 Root Mean Squared Error (RMSE)¶
MSE is useful mathematically, but its units are squared.
To restore the original units, we take the square root:

$$\text{RMSE} = \sqrt{\text{MSE}} = \sqrt{\frac{1}{T}\sum_{t=1}^{T} e_t^2}$$
For inflation forecasting, RMSE is measured in percentage points of inflation.
Thai Inflation Example¶
| Measure | Forecast 1 | Forecast 2 |
|---|---|---|
| RMSE | 0.341 | 0.290 |
Again, Forecast 2 performs better.
16.6 Why RMSE Is Popular¶
RMSE is one of the most widely used forecast evaluation measures.
Why?
Because:
it penalizes large errors
it is easy to interpret
it preserves original units
Examples include:
inflation forecasting by central banks
electricity demand forecasting
financial risk forecasting
16.7 Mean Absolute Error (MAE)¶
Instead of squaring errors, we can use absolute values:

$$\text{MAE} = \frac{1}{T}\sum_{t=1}^{T} \left|e_t\right|$$
Unlike RMSE:
MAE penalizes errors linearly
extreme errors matter less
Thai Inflation Example¶
| Measure | Forecast 1 | Forecast 2 |
|---|---|---|
| MAE | 0.214 | 0.247 |
Now Forecast 1 performs better.
This illustrates an important idea: different accuracy measures can rank the same forecasts differently.
16.8 RMSE vs MAE¶
RMSE and MAE emphasize different aspects of forecast performance.
| Measure | Sensitive to Large Errors? |
|---|---|
| MAE | Less sensitive |
| RMSE | More sensitive |
RMSE is more sensitive to large errors because squaring amplifies extreme values.
Intuition¶
Suppose one model makes:
many moderate errors
but no extremely large mistakes
Another model makes:
mostly small errors
but occasionally a very large mistake
RMSE may strongly penalize the second model.
MAE may prefer it.
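A small numerical sketch makes the contrast concrete; the two error vectors below are hypothetical, chosen only to illustrate the point:

```python
import numpy as np

# Model A: many moderate errors, none extreme
e_a = np.array([0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5])
# Model B: mostly small errors, one very large mistake
e_b = np.array([0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -2.0])

for name, e in [("A", e_a), ("B", e_b)]:
    rmse = np.sqrt(np.mean(e**2))
    mae = np.mean(np.abs(e))
    print(f"Model {name}: RMSE = {rmse:.3f}, MAE = {mae:.3f}")
```

Here RMSE prefers Model A (0.500 vs 0.713) while MAE prefers Model B (0.338 vs 0.500): the single large mistake dominates the squared errors but not the absolute ones.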
16.9 Percentage Errors¶
Sometimes variables have different scales.
In such cases, percentage-based measures may be useful.
The percentage forecast error is:

$$p_t = 100 \times \frac{y_t - \hat{y}_t}{y_t}$$
16.10 Mean Absolute Percentage Error (MAPE)¶
A common percentage-based measure is:

$$\text{MAPE} = \frac{100}{T}\sum_{t=1}^{T}\left|\frac{y_t - \hat{y}_t}{y_t}\right|$$
Thai Inflation Example¶
| Measure | Forecast 1 | Forecast 2 |
|---|---|---|
| MAPE (%) | 14.96 | 19.11 |
Forecast 1 performs better according to MAPE.
16.11 Problems with MAPE¶
MAPE has some important weaknesses.
Problem 1: Division by Small Numbers¶
If actual values are close to zero, the percentage error $\left|e_t / y_t\right|$ can become extremely large.
Problem 2: Undefined for Zero Values¶
If $y_t = 0$ for any observation, then MAPE is undefined.
Problem 3: Asymmetry¶
MAPE can penalize overpredictions and underpredictions differently.
These difficulties are especially relevant for variables that are often close to zero, such as:
inflation
growth rates
financial returns
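A brief sketch of Problem 1, with hypothetical values in which the actual series dips toward zero:

```python
import numpy as np

actual   = np.array([2.00, 1.00, 0.05])   # last observation is close to zero
forecast = np.array([1.80, 1.10, 0.25])   # absolute errors: 0.2, 0.1, 0.2

pct_errors = 100 * np.abs((actual - forecast) / actual)
print(pct_errors)         # [ 10.  10. 400.]
print(pct_errors.mean())  # MAPE = 140.0, dominated by the near-zero observation
```

All three absolute errors are of similar size, yet the near-zero observation contributes a 400% error and drags MAPE up to 140%.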
16.12 Relationship Between MSE, RMSE, and Bias¶
Recall:

$$\text{MSE} = \frac{1}{T}\sum_{t=1}^{T} e_t^2$$

MSE can be decomposed into:

$$\text{MSE} = \sigma_e^2 + \text{Bias}^2$$

or approximately:

$$\text{RMSE} \approx \sigma_e \quad \text{when the bias is small}$$

where:
$\sigma_e$ = standard deviation of the forecast errors
Bias = mean forecast error
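The identity is easy to verify numerically; a minimal sketch using the Forecast 1 errors from the table (with the population variance, `ddof=0`, the identity holds exactly):

```python
import numpy as np

# Forecast 1 errors (actual minus forecast) from the table above
e1 = np.array([0.09, 0.04, 0.15, 0.14, -0.99, -0.10,
               0.15, 0.10, -0.06, -0.17, 0.05, -0.53])

mse = np.mean(e1**2)
bias = np.mean(e1)
var = np.var(e1)               # population variance (ddof=0)

print(round(mse, 3), round(var + bias**2, 3))  # both print 0.116
```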
16.13 Theil’s $U_1$ Statistic¶
Theil’s $U_1$ is a normalized measure of forecast accuracy.
One common version is:

$$U_1 = \frac{\sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(y_t - \hat{y}_t\right)^2}}{\sqrt{\frac{1}{T}\sum_{t=1}^{T} y_t^2} + \sqrt{\frac{1}{T}\sum_{t=1}^{T} \hat{y}_t^2}}$$

Properties¶
$U_1$ lies between 0 and 1
$U_1 = 0$ indicates a perfect forecast
values closer to 0 indicate more accurate forecasts
Thai Inflation Example¶
| Measure | Forecast 1 | Forecast 2 |
|---|---|---|
| Theil’s $U_1$ | 0.084 | 0.071 |
Forecast 2 performs slightly better.
16.14 Theil’s $U_2$ Statistic¶
Theil’s $U_2$ compares a forecast against a benchmark forecast.
Usually the benchmark is a naive “no change” forecast:

$$\hat{y}_{t+1}^{\text{naive}} = y_t$$
For example:
tomorrow equals today
next month equals this month
Definition¶
A common form measures the forecast errors relative to the errors of the naive forecast:

$$U_2 = \sqrt{\frac{\sum_{t=1}^{T-1}\left(\frac{\hat{y}_{t+1} - y_{t+1}}{y_t}\right)^2}{\sum_{t=1}^{T-1}\left(\frac{y_{t+1} - y_t}{y_t}\right)^2}}$$

Intuitively, $U_2$ is the RMSE of the model expressed relative to the RMSE of the naive forecast.
Interpretation¶
| Value | Interpretation |
|---|---|
| $U_2 < 1$ | model beats the naive forecast |
| $U_2 = 1$ | same as the naive forecast |
| $U_2 > 1$ | worse than the naive forecast |
Thai Inflation Example¶
| Measure | Forecast 1 | Forecast 2 |
|---|---|---|
| Theil’s $U_2$ | 0.94 | 1.00 |
Forecast 1 slightly outperforms the naive benchmark.
Forecast 2 performs roughly the same as the naive forecast.
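Both Theil statistics are easy to compute; a minimal sketch using the table data, where the naive forecast for each month is simply the previous month’s actual value. Because the tabulated data are rounded to two decimals, the computed $U_2$ (about 0.96) differs slightly from the 0.94 reported above.

```python
import numpy as np

actual = np.array([1.93, 1.96, 2.11, 2.45, 2.62, 2.35,
                   2.16, 2.09, 1.75, 1.48, 1.26, 0.60])
f1 = np.array([1.84, 1.92, 1.96, 2.31, 3.61, 2.45,
               2.01, 1.99, 1.81, 1.65, 1.21, 1.13])

# Theil's U1: RMSE normalized by the scale of both series
rmse = np.sqrt(np.mean((actual - f1) ** 2))
u1 = rmse / (np.sqrt(np.mean(actual**2)) + np.sqrt(np.mean(f1**2)))

# Theil's U2: model errors relative to the naive "no change" forecast,
# both expressed as proportions of the previous actual value
num = np.sum(((f1[1:] - actual[1:]) / actual[:-1]) ** 2)
den = np.sum(((actual[1:] - actual[:-1]) / actual[:-1]) ** 2)
u2 = np.sqrt(num / den)

print(f"U1 = {u1:.3f}, U2 = {u2:.2f}")
```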
16.15 Forecast Error Decomposition¶
Theil also proposed decomposing forecast errors into components.
A common decomposition separates MSE into three proportions that sum to one:
bias proportion
variance proportion
covariance proportion
Bias Proportion¶
Measures systematic differences between forecast mean and actual mean.
Variance Proportion¶
Measures differences in variability.
Covariance Proportion¶
Measures unsystematic error.
Thai Inflation Example¶
Forecast 1¶
| Component | Value |
|---|---|
| Bias proportion | 0.076 |
| Variance proportion | 0.040 |
| Covariance proportion | 0.885 |
Most errors are unsystematic.
This is generally a good sign.
Forecast 2¶
| Component | Value |
|---|---|
| Bias proportion | 0.094 |
| Variance proportion | 0.300 |
| Covariance proportion | 0.624 |
Forecast 2 has a larger variance component.
This suggests that the model may not adequately capture changes in volatility or variability.
16.16 Understanding the Sources of Forecast Error¶
An alternative way to understand forecast errors is to decompose MSE into three components:

$$\text{MSE} = \left(\bar{\hat{y}} - \bar{y}\right)^2 + \left(\sigma_{\hat{y}} - \sigma_y\right)^2 + 2\left(1 - \rho\right)\sigma_y \sigma_{\hat{y}}$$

where:
$\sigma_y$ is the standard deviation of the actual series
$\sigma_{\hat{y}}$ is the standard deviation of the forecast series
$\rho$ is the correlation between actual and forecast values
and $\bar{y}$, $\bar{\hat{y}}$ are the corresponding sample means.
Bias Component¶
The first term, $\left(\bar{\hat{y}} - \bar{y}\right)^2$,
Variability Component¶
The second term, $\left(\sigma_{\hat{y}} - \sigma_y\right)^2$,
A forecast may correctly predict the average level while still failing to match the fluctuations of the series.
Covariance Component¶
The final term, $2\left(1 - \rho\right)\sigma_y \sigma_{\hat{y}}$,
If the correlation between forecasts and actual values is high, this component becomes small.
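A minimal sketch computing the three components and their proportions for Forecast 1; with population standard deviations (`ddof=0`) the components add up to the MSE exactly. The computed proportions come out close to those reported in Section 16.15, with small differences due to rounding in the tabulated data.

```python
import numpy as np

actual = np.array([1.93, 1.96, 2.11, 2.45, 2.62, 2.35,
                   2.16, 2.09, 1.75, 1.48, 1.26, 0.60])
f1 = np.array([1.84, 1.92, 1.96, 2.31, 3.61, 2.45,
               2.01, 1.99, 1.81, 1.65, 1.21, 1.13])

mse = np.mean((f1 - actual) ** 2)
s_y, s_f = np.std(actual), np.std(f1)      # population std (ddof=0)
rho = np.corrcoef(actual, f1)[0, 1]        # correlation of actual and forecast

bias_comp = (f1.mean() - actual.mean()) ** 2
var_comp = (s_f - s_y) ** 2
cov_comp = 2 * (1 - rho) * s_y * s_f

print(np.isclose(bias_comp + var_comp + cov_comp, mse))   # True
for name, c in [("Bias", bias_comp), ("Variance", var_comp), ("Covariance", cov_comp)]:
    print(f"{name} proportion: {c / mse:.3f}")
```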
16.17 Choosing Between Forecasts¶
Which forecast is “best”?
There is no universal answer.
Different measures emphasize different aspects of performance.
| Criterion | Forecast 1 Better? | Forecast 2 Better? |
|---|---|---|
| MSE |  | ✓ |
| RMSE |  | ✓ |
| MAE | ✓ |  |
| MAPE | ✓ |  |
| Theil’s $U_1$ |  | ✓ |
| Theil’s $U_2$ | ✓ |  |
16.18 Why Forecast Evaluation Matters¶
Forecast evaluation is central in:
monetary policy
financial trading
inventory management
energy demand forecasting
macroeconomic planning
A model that performs well historically may fail during:
crises
structural breaks
regime changes
periods of unusual volatility
16.19 Python Example: Comparing Forecast Accuracy¶
```python
import numpy as np
import pandas as pd

# Last 10 months of the sample (March-December 2014)
actual = np.array([2.11, 2.45, 2.62, 2.35, 2.16, 2.09, 1.75, 1.48, 1.26, 0.60])
f1 = np.array([1.96, 2.31, 3.61, 2.45, 2.01, 1.99, 1.81, 1.65, 1.21, 1.13])
f2 = np.array([1.96, 2.11, 2.45, 2.62, 2.35, 2.16, 2.09, 1.75, 1.48, 1.26])

# Forecast errors: e_t = actual - forecast
e1 = actual - f1
e2 = actual - f2

def forecast_stats(errors, actual):
    """Compute the accuracy measures introduced in this chapter."""
    mse = np.mean(errors**2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(errors))
    mape = np.mean(np.abs(errors / actual)) * 100
    bias = np.mean(errors)
    return pd.Series({
        "Bias": bias,
        "MSE": mse,
        "RMSE": rmse,
        "MAE": mae,
        "MAPE": mape
    })

results = pd.DataFrame({
    "Forecast 1": forecast_stats(e1, actual),
    "Forecast 2": forecast_stats(e2, actual)
})
print(results.round(3))
```

Output:

```
      Forecast 1  Forecast 2
Bias      -0.126      -0.136
MSE        0.138       0.095
RMSE       0.372       0.309
MAE        0.244       0.268
MAPE      17.381      21.624
```

16.20 Gretl Example: Forecast Evaluation¶
Gretl provides several forecast evaluation tools.
After generating forecasts, open:
Model window → Analysis → Forecast evaluation
(the exact menu path may vary depending on the Gretl version).
Gretl may report:
MSE
RMSE
MAE
MAPE
Theil statistics
[Gretl Screenshot Placeholder: Forecast evaluation statistics]

Comparing Competing Models¶
You can compare:
AR models
ARIMA models
VAR forecasts
naive forecasts
using the same forecast sample.
16.21 Common Mistakes¶
Judging a model by in-sample fit alone: good historical fit does not guarantee good out-of-sample forecasts.
Relying on a single accuracy measure: different measures can rank the same forecasts differently.
Using MAPE when the actual series is close to (or equal to) zero.
Ignoring the sign of the bias: consistently positive or negative errors signal a systematic problem.
Failing to compare against a naive benchmark such as the random walk.
16.22 Looking Ahead¶
So far, we have focused mainly on forecasting individual time series.
In the next part of the book, we move to relationships between time series.
We will study:
spurious regression
dynamic relationships
Granger causality
cointegration
error correction models
Key Takeaways¶
The forecast error is $e_t = y_t - \hat{y}_t$; its average (the bias) measures systematic over- or underprediction.
MSE and RMSE penalize large errors heavily; MAE penalizes all errors linearly.
MAPE is scale-free but unreliable when actual values are near zero.
Theil’s $U_1$ is a normalized accuracy measure; Theil’s $U_2$ compares a forecast against a naive benchmark.
MSE can be decomposed into bias, variance, and covariance components, which help diagnose the source of forecast errors.
No single measure is universally best: the appropriate criterion depends on the costs of different kinds of errors.
Concept Check¶
Basic¶
What is a forecast error?
What does it mean if a forecast error is positive?
What is bias in forecasting?
Intuition¶
Why can a model with zero bias still perform poorly?
Why do we square forecast errors in MSE?
Why might large forecast errors be particularly important in practice?
Measures¶
What is the difference between:
MSE
RMSE
MAE
Why is RMSE often preferred to MSE?
What does MAE measure differently from RMSE?
Percentage Errors¶
What is MAPE?
Why might MAPE be misleading when values are close to zero?
Theil Measures¶
What is the intuition behind Theil’s $U_1$?
What does Theil’s $U_2$ compare?
Challenge¶
Why might different evaluation measures rank forecasts differently?
Interpretation & Practice¶
A model has:
low bias
very high RMSE
What does this imply?
A model has:
low MAE
high RMSE
What does this suggest about the distribution of errors?
Two models produce similar RMSE, but one has lower MAE.
What might this indicate?
A model performs worse than a naive forecast.
What does this imply?
Forecast errors are consistently positive.
What does this indicate about the model?
Thai Inflation Example¶
In the example:
Forecast 2 performs better under RMSE
Forecast 1 performs better under MAE
Why might this happen?
Forecast 2 has higher variance proportion.
What does this suggest?
Challenge¶
A model has excellent in-sample fit but poor out-of-sample performance.
What might be happening?
Numerical Practice¶
Forecast Errors¶
Suppose:
actual:
forecast:
Compute the forecast error.
Bias¶
Suppose forecast errors are:
Compute the bias.
MSE and RMSE¶
Using the same errors:
Compute MSE
Compute RMSE
MAE¶
Compute MAE for the same data.
Interpretation¶
Compare RMSE and MAE.
Which is larger?
Why?
MAPE¶
Suppose:
actual:
forecast:
Compute percentage error.
Suppose $y_t$ is close to zero.
Why might MAPE become unreliable?
Model Comparison¶
Suppose:
| Model | RMSE | MAE |
|---|---|---|
| A | 2.5 | 2.0 |
| B | 2.0 | 2.2 |
Which is better under RMSE?
Which is better under MAE?
Theil’s U2¶
Suppose:
Model RMSE = 1.5
Naive RMSE = 2.0
Compute $U_2$ (as the ratio of the model RMSE to the naive RMSE) and interpret the result.
Challenge¶
Suppose a model minimizes MSE but performs poorly in practice.
Why might this happen?
You are forecasting inflation.
Model A minimizes RMSE
Model B minimizes MAE
Which would you choose if:
large mistakes are very costly?
average accuracy matters more?