Chapter 19 — Predictive Causality (Granger Causality)

In the previous chapter, we introduced dynamic regression models and emphasized the importance of lagged relationships.

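We now ask a natural question: do past values of one variable help predict another, beyond what the second variable's own history already tells us?

This idea is formalized through Granger causality.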

Despite the name, Granger causality is not about deep philosophical or structural causality. It is about predictive content.

Learning Objectives¶

By the end of this chapter, you should be able to:

explain the idea of Granger causality
distinguish predictive causality from true causality
understand restricted and unrestricted models
interpret the F-test for Granger causality
choose lag length carefully
implement Granger causality tests in Gretl
understand common limitations and pitfalls

19.1 What Is Granger Causality?¶

Suppose we want to know whether $x_t$ helps predict $y_t$ .

We say that $x_t$ Granger-causes $y_t$ if past values of $x_t$ help forecast $y_t$ after controlling for past values of $y_t$ .

19.2 Prediction, Not True Causation¶

Granger causality is about forecasting.

It asks:

Do past values of $x_t$ improve prediction of $y_t$ ?

It does not automatically answer:

Does $x_t$ structurally or economically cause $y_t$ ?

Examples¶

We might ask:

Do interest rates help predict inflation?
Does income help predict consumption?
Does money supply help predict output?
Do exchange rates help predict exports?

In each case, the question is predictive: does the past of the first variable improve forecasts of the second?

19.3 Restricted and Unrestricted Models¶

To test whether $x_t$ Granger-causes $y_t$ , we compare two models.

Restricted Model¶

The restricted model uses only past values of $y_t$ :

$$y_t = \alpha + \sum_{i=1}^{p}\phi_i y_{t-i} + u_t$$

This model assumes that past values of $x_t$ do not help predict $y_t$ .

Unrestricted Model¶

The unrestricted model includes both past values of $y_t$ and past values of $x_t$ :

$$y_t = \alpha + \sum_{i=1}^{p}\phi_i y_{t-i} + \sum_{j=1}^{q}\beta_j x_{t-j} + u_t$$
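To make the two specifications concrete, here is a minimal Python sketch of how the regressors for each model could be assembled from two series. The function name `lagged_design` and the toy data are hypothetical and chosen only for illustration. Note that both models are built on the same observations, starting at the first period for which every required lag exists.

```python
def lagged_design(y, x, p, q):
    """Build rows for the restricted and unrestricted regressions.

    Returns (targets, restricted_rows, unrestricted_rows). Each row starts
    with 1.0 for the intercept alpha.
    """
    start = max(p, q)  # first t for which every required lag exists
    targets, restricted, unrestricted = [], [], []
    for t in range(start, len(y)):
        own_lags = [y[t - i] for i in range(1, p + 1)]    # y_{t-1}, ..., y_{t-p}
        other_lags = [x[t - j] for j in range(1, q + 1)]  # x_{t-1}, ..., x_{t-q}
        targets.append(y[t])
        restricted.append([1.0] + own_lags)
        unrestricted.append([1.0] + own_lags + other_lags)
    return targets, restricted, unrestricted


# Toy data, purely illustrative.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [10.0, 20.0, 30.0, 40.0, 50.0]
targets, restricted, unrestricted = lagged_design(y, x, p=2, q=1)
print(targets)          # [3.0, 4.0, 5.0]
print(restricted[0])    # [1.0, 2.0, 1.0]
print(unrestricted[0])  # [1.0, 2.0, 1.0, 20.0]
```

The unrestricted rows are the restricted rows with the lags of $x_t$ appended, which is exactly what makes the two models nested.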

19.4 Hypothesis Testing Framework¶

The null hypothesis is that lagged values of $x_t$ do not help predict $y_t$ :

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_q = 0$$

The alternative is:

$$H_1: \text{at least one } \beta_j \neq 0$$

19.5 The F-Test Intuition¶

The Granger causality test compares:

the restricted model
the unrestricted model

If the unrestricted model fits much better, then lagged values of $x_t$ add useful predictive information.

The F-statistic is:

$$F = \frac{(SSR_R - SSR_U)/q}{SSR_U/(T-k)}$$

where:

$SSR_R$ is the sum of squared residuals from the restricted model
$SSR_U$ is the sum of squared residuals from the unrestricted model
$q$ is the number of restrictions
$T$ is the number of observations
$k$ is the number of estimated parameters in the unrestricted model, including the intercept
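The formula translates directly into a few lines of Python. This is a minimal sketch: the function name `f_statistic` and the input values are hypothetical and are not taken from any dataset in this chapter.

```python
def f_statistic(ssr_r, ssr_u, q, T, k):
    """F-statistic comparing a restricted and an unrestricted model.

    ssr_r, ssr_u: sums of squared residuals (restricted, unrestricted)
    q: number of restrictions, T: observations, k: parameters (unrestricted)
    """
    if q <= 0 or T <= k or ssr_u <= 0:
        raise ValueError("need q > 0, T > k and a positive unrestricted SSR")
    return ((ssr_r - ssr_u) / q) / (ssr_u / (T - k))


# Hypothetical inputs, for illustration only.
f = f_statistic(ssr_r=150.0, ssr_u=100.0, q=3, T=64, k=4)
print(round(f, 2))  # 10.0
```

The numerator is the improvement in fit per restriction and the denominator is the unrestricted residual variance, so a large value means the lags of $x_t$ reduce the residuals by more than noise alone would explain.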

19.6 Possible Outcomes¶

Granger causality can run in one direction, both directions, or neither direction.

Result	Interpretation
$x \rightarrow y$	$x$ helps predict $y$
$y \rightarrow x$	$y$ helps predict $x$
both directions	feedback relationship
neither direction	no predictive relationship
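The one-directional case can be illustrated with simulated data. The sketch below is an assumption-laden toy, not the chapter's Gretl workflow: it generates a series $y_t$ that depends on lagged $x_t$ while $x_t$ depends only on its own past, then computes the F-statistic in both directions. The helper names `ols_ssr` and `granger_f`, the coefficients, and the seed are all hypothetical.

```python
import random


def ols_ssr(rows, targets):
    """Sum of squared residuals from OLS, solved via the normal equations."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * w for v, w in zip(a[row], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))


def granger_f(effect, cause, lags):
    """F-statistic for H0: lags of `cause` do not help predict `effect`."""
    targets, restricted, unrestricted = [], [], []
    for t in range(lags, len(effect)):
        own = [effect[t - i] for i in range(1, lags + 1)]
        other = [cause[t - j] for j in range(1, lags + 1)]
        targets.append(effect[t])
        restricted.append([1.0] + own)
        unrestricted.append([1.0] + own + other)
    ssr_r = ols_ssr(restricted, targets)
    ssr_u = ols_ssr(unrestricted, targets)
    T, k = len(targets), 1 + 2 * lags
    return ((ssr_r - ssr_u) / lags) / (ssr_u / (T - k))


random.seed(0)
x, y = [0.0], [0.0]
for _ in range(300):
    # y depends on its own lag and on lagged x; x depends only on its own lag.
    y.append(0.5 * y[-1] + 0.8 * x[-1] + random.gauss(0, 1))
    x.append(0.5 * x[-1] + random.gauss(0, 1))

print("x -> y:", granger_f(y, x, lags=2))
print("y -> x:", granger_f(x, y, lags=2))
```

By construction the first statistic should be far larger than the second, which corresponds to the $x \rightarrow y$ row of the table. In a single simulated sample the second statistic can still exceed a critical value by chance, so it should be read as one draw rather than a guarantee.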

19.7 Lag Length Matters¶

Lag selection is crucial.

Too few lags may omit important dynamics.

Too many lags use up degrees of freedom and reduce the power of the test.

In practice, lag length can be guided by:

AIC
BIC
HQC
residual diagnostics
economic reasoning

19.8 Stationarity Matters¶

Granger causality tests are usually applied to stationary series.

If variables are $I(1)$ , common options include:

difference the variables
test for cointegration
use an error correction framework if cointegration exists
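The first option, differencing, is simple enough to sketch directly. The example below simulates a random walk, which is $I(1)$, and takes first differences. The seed and sample size are arbitrary assumptions made for illustration.

```python
import random

random.seed(1)
shocks = [random.gauss(0, 1) for _ in range(200)]

# Random walk: y_t = y_{t-1} + e_t, an I(1) series.
level, walk = 0.0, []
for e in shocks:
    level += e
    walk.append(level)

# First difference: dy_t = y_t - y_{t-1}, which recovers the stationary shocks.
diff = [b - a for a, b in zip(walk, walk[1:])]

print(len(walk), len(diff))  # 200 199
```

Differencing costs one observation and turns the trending level series back into the underlying stationary shocks, which is the form the test is designed for.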

19.9 Economic Interpretation¶

Even when Granger causality is detected, we should interpret the result carefully.

It does not necessarily imply:

policy effectiveness
a structural mechanism
true economic causality

19.10 Gretl Example: Interest Rates and Inflation¶

We now implement a Granger causality test in Gretl using the jgm-data dataset. This follows the example in your notes.

We ask:

Do short-term interest rates help predict CPI inflation?

Step 1: Load the Data¶

File → Open data → Sample file...

Select jgm-data from the GRETL database.

We use:

pi_c     inflation rate based on the CPI
r_s      short-term interest rate

19.11 Restricted Model¶

The restricted model predicts inflation using only its own past values.

In your notes, lag selection suggests an AR(1) model for pi_c.

Model → Univariate time series → ARIMA lag selection

Dependent variable:

pi_c

Try a maximum AR lag of 5.

Example output suggests that one lag is sufficient.

Then estimate:

Model → Univariate time series → ARIMA

with AR order 1.

Restricted Model Output¶

Model 2: ARMA, using observations 1952-1994 (T = 43)
Estimated using AS 197 (exact ML)
Dependent variable: pi_c

             coefficient   std. error     z       p-value 
  --------------------------------------------------------
  const       3.53530      1.64385       2.151   0.0315    **
  phi_1       0.873605     0.0693047    12.61    1.97e-036 ***

Mean dependent var   4.246077   S.D. dependent var   3.195937
Mean of innovations  0.034572   S.D. of innovations  1.532847
R-squared            0.764867   Adjusted R-squared   0.764867
Log-likelihood      −80.10105   Akaike criterion     166.2021
Schwarz criterion    171.4857   Hannan-Quinn         168.1505

Save the error sum of squares:

Save → Error sum of squares

In the example:

SSR_R = 101.0337

19.12 Unrestricted Model¶

The unrestricted model includes lagged interest rates.

We use two lags of r_s, based on lag selection.

Lag Selection¶

Model → Multivariate time series → VAR lag selection

Use:

dependent variable: pi_c
predictor: r_s
maximum lag: 5

Example output:

VAR system, maximum lag order 5

lags        loglik    p(LR)       AIC          BIC          HQC

   1     -68.39186             3.757466     3.886749     3.803464 
   2     -65.81966  0.02332    3.674719*    3.847096*    3.736049*
   3     -65.81635  0.93515    3.727176     3.942648     3.803839 
   4     -65.76655  0.75230    3.777187     4.035753     3.869182 
   5     -65.76616  0.97785    3.829798     4.131458     3.937126

The information criteria suggest two lags: AIC, BIC, and HQC are all minimized at lag 2 (marked with an asterisk).
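The criteria in the table can be reproduced from the reported log-likelihoods. The sketch below assumes the per-observation scaling $(-2\ell + \text{penalty})/T$, a sample size of $T = 38$, and $k = \text{lag} + 2$ estimated parameters. These three assumptions are not stated in the output; they were inferred because they match the printed figures.

```python
import math

T = 38  # assumed sample size, inferred from the printed criteria
loglik = {1: -68.39186, 2: -65.81966, 3: -65.81635, 4: -65.76655, 5: -65.76616}


def criteria(ll, k, T):
    """Per-observation AIC, BIC and HQC for log-likelihood ll and k parameters."""
    aic = (-2 * ll + 2 * k) / T
    bic = (-2 * ll + k * math.log(T)) / T
    hqc = (-2 * ll + 2 * k * math.log(math.log(T))) / T
    return aic, bic, hqc


for lag, ll in loglik.items():
    k = lag + 2  # assumed parameter count, inferred from the table
    aic, bic, hqc = criteria(ll, k, T)
    print(lag, round(aic, 6), round(bic, 6), round(hqc, 6))

# Lag with the smallest AIC.
best = min(loglik, key=lambda lag: criteria(loglik[lag], lag + 2, T)[0])
print("AIC-minimizing lag:", best)
```

All three criteria trade off fit (the log-likelihood) against a penalty that grows with the number of parameters. BIC penalizes extra lags most heavily, so it tends to choose shorter lag lengths than AIC when the two disagree.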

Estimate the Unrestricted Model¶

Model → Univariate time series → ARIMA

Use:

dependent variable: pi_c
AR order: 1
regressors: r_s(-1) r_s(-2)

Use the lags... icon in the model box to create lagged regressors.

Output¶

Model 3: ARMAX, using observations 1954-1994 (T = 41)
Estimated using AS 197 (exact ML)
Dependent variable: pi_c

             coefficient   std. error     z       p-value 
  --------------------------------------------------------
  const        2.48133     1.73771       1.428   0.1533   
  phi_1        0.886579    0.0736422    12.04    2.22e-033 ***
  r_s_1        0.342550    0.111572      3.070   0.0021    ***
  r_s_2       −0.197852    0.109574     −1.806   0.0710    *

Mean dependent var   4.415833   S.D. dependent var   3.153664
Mean of innovations  0.076028   S.D. of innovations  1.288293
R-squared            0.829694   Adjusted R-squared   0.820731
Log-likelihood      −69.33345   Akaike criterion     148.6669
Schwarz criterion    157.2348   Hannan-Quinn         151.7868

Save the unrestricted error sum of squares.

In the example:

SSR_U = 68.0476

19.13 Manual F-Test¶

We now compute:

$$F = \frac{(SSR_R - SSR_U)/q}{SSR_U/(T-k)}$$

Using:

$SSR_R = 101.0337$
$SSR_U = 68.0476$
$q = 2$
$T = 41$
$k = 4$

we get:

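$$F = \frac{(101.0337 - 68.0476)/2}{68.0476/(41-4)} = 8.97$$

With 2 and 37 degrees of freedom, the 5% critical value is roughly 3.25, so we reject the null that lagged r_s does not help predict pi_c.

One caveat: the restricted model above was estimated on 43 observations (1952-1994) and the unrestricted model on 41 (1954-1994). Strictly, both sums of squared residuals should come from the same sample, so this calculation is an approximation.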
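The arithmetic can be checked in Python, together with an approximate p-value. The sketch relies on the fact that with exactly two numerator degrees of freedom the F distribution has a closed-form tail probability, $P(F > f) = (1 + 2f/d)^{-d/2}$ with $d = T - k$. The helper name `f_sf_two_restrictions` is hypothetical, and the shortcut does not apply when $q \neq 2$.

```python
def f_sf_two_restrictions(f, df2):
    """P(F > f) for an F(2, df2) distribution (closed form valid only for q = 2)."""
    return (1 + 2 * f / df2) ** (-df2 / 2)


# Values reported in the chapter's example.
ssr_r, ssr_u = 101.0337, 68.0476
q, T, k = 2, 41, 4

f = ((ssr_r - ssr_u) / q) / (ssr_u / (T - k))
p = f_sf_two_restrictions(f, T - k)

print(round(f, 2))  # 8.97
print(p < 0.01)     # True
```

For a general number of restrictions, a statistics library's F distribution would be used in place of the closed form.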

19.14 VAR Alternative in Gretl¶

A more direct way is to estimate a VAR and use GRETL’s built-in tests.

Model → Multivariate time series → Vector Autoregression

Choose:

lag order: 2
endogenous variables: pi_c and r_s

Command¶

var 2 pi_c r_s

Example output:

VAR system, lag order 2
OLS estimates, observations 1954-1994 (T = 41)

Equation 1: pi_c

             coefficient   std. error   t-ratio   p-value 
  --------------------------------------------------------
  const       1.15247       0.435623     2.646    0.0120   **
  pi_c_1      0.900227      0.152258     5.913    9.10e-07 ***
  pi_c_2      0.0613022     0.160992     0.3808   0.7056  
  r_s_1       0.255695      0.134001     1.908    0.0644   *
  r_s_2      −0.401963      0.123225    −3.262    0.0024   ***

F-tests of zero restrictions:

All lags of pi_c             F(2, 36) =   43.758 [0.0000]
All lags of r_s              F(2, 36) =   5.6225 [0.0075]
All vars, lag 2              F(2, 36) =   6.1167 [0.0052]

The line:

All lags of r_s              F(2, 36) =   5.6225 [0.0075]

tests whether lagged values of r_s help predict pi_c.

Since the p-value (0.0075) is below 0.01, we reject the null at the 1% level and conclude that r_s Granger-causes pi_c.

19.15 Common Mistakes¶

19.16 Looking Ahead¶

Granger causality focuses on predictive relationships, usually in stationary data.

However, many economic time series are nonstationary.

In the next chapter, we introduce cointegration, which allows us to study meaningful long-run relationships between nonstationary variables.

Key Takeaways¶

Concept Check¶

Basic¶

What is Granger causality?
What does it mean for $x_t$ to Granger-cause $y_t$ ?
What is the difference between predictive causality and true causality?

Intuition¶

Why can a variable help predict another without truly causing it?
Why is Granger causality fundamentally a forecasting concept?
Why is it important to include lagged values of $y_t$ in the test?

Models¶

What is the difference between:
- restricted model
- unrestricted model
What is the null hypothesis in a Granger causality test?
What does it mean to reject the null hypothesis?

Testing¶

What does the F-test compare?
What does a large F-statistic indicate?

Lag Length¶

Why does lag length matter in Granger causality testing?
What happens if too few lags are used?
What happens if too many lags are used?

Stationarity¶

Why must variables be stationary before applying Granger causality tests?

Challenge¶

Can Granger causality exist in both directions?

What does this imply?

Interpretation & Practice¶

A test finds that $x_t$ Granger-causes $y_t$ .
- What does this mean?
- What does it NOT mean?
A test fails to reject the null.
- What conclusion can you draw?
- What can you NOT conclude?
Both $x_t$ and $y_t$ Granger-cause each other.
- What type of relationship might this indicate?
A model includes too few lags.
- How might this affect the test?
A model includes too many lags.
- What problem might arise?

Stationarity Interpretation¶

You run a Granger test on nonstationary variables in levels.
- What is the risk?
After differencing, the Granger result disappears.
- What does this suggest?

Economic Interpretation¶

Interest rates Granger-cause inflation.
- Why should we interpret this result cautiously?

Challenge¶

A third variable affects both $x_t$ and $y_t$ .
- How could this affect Granger causality results?

Numerical Practice¶

Understanding the Test¶

Suppose:

$SSR_R = 120$
$SSR_U = 80$
$q = 2$
$T = 50$
$k = 4$

Compute the F-statistic.

Interpretation¶

The F-statistic is large and statistically significant.

What is your decision?
What does it imply?

Model Comparison¶

Suppose:

restricted model fits poorly
unrestricted model fits much better

What does this suggest?

Lag Structure¶

Suppose only one lag is included, but the true relationship requires two lags.

What happens to the test?

Stationarity¶

Suppose both variables are random walks.

Why might the test be misleading?

VAR Output Interpretation¶

Suppose Gretl reports:

All lags of x: F(2,36) = 5.2 [0.01]

What does this mean?
What is your conclusion?

Challenge¶

Suppose:

$x_t$ does not Granger-cause $y_t$
but $y_t$ Granger-causes $x_t$
How would you interpret this?

Suppose:

results change dramatically when lag length changes
What does this suggest?

Appendix 19A — Testing Linear Restrictions in Regression¶

This appendix explains the general idea behind tests such as the Granger causality test.

A.1 The Basic Idea¶

Suppose we estimate:

$$y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + \cdots + \beta_m x_{mt} + u_t$$

We may want to test whether some variables matter jointly.

A.2 A Joint Hypothesis¶

For example:

$$H_0: \beta_2 = \beta_3 = 0$$

This means that $x_{2t}$ and $x_{3t}$ have no effect on $y_t$ after controlling for the other variables.

A.3 Restricted and Unrestricted Models¶

The unrestricted model includes all variables:

$$y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t$$

The restricted model imposes the null hypothesis:

$$y_t = \alpha + \beta_1 x_{1t} + u_t$$

A.4 Comparing the Models¶

We compare the sum of squared residuals:

$SSR_U$ from the unrestricted model
$SSR_R$ from the restricted model

If the restricted model fits much worse, then the excluded variables are probably important.

A.5 The F-Test¶

The test statistic is:

$$F = \frac{(SSR_R - SSR_U)/q}{SSR_U/(T-k)}$$

where:

$q$ is the number of restrictions
$T$ is the number of observations
$k$ is the number of estimated parameters in the unrestricted model, including the intercept

A.6 Connection to Granger Causality¶

In Granger causality testing, the null is:

$$H_0: \text{lagged values of } x_t \text{ do not help predict } y_t$$

This is simply a joint test that several lag coefficients are equal to zero.

A.7 Intuition in Words¶

The logic is simple:

if excluded variables do not matter, removing them should not change the model much
if excluded variables do matter, removing them worsens the fit
