
Chapter 20 — Cointegration and Long-Run Relationships

In Chapter 17, we saw that regressions involving nonstationary variables can produce spurious results.

In Chapter 18, we introduced dynamic models such as ARDL, which capture short-run dependence and adjustment dynamics.

An important question now arises: can nonstationary variables have a meaningful, non-spurious long-run relationship?

The answer is yes.

This idea is called cointegration.


Learning Objectives

By the end of this chapter, you should be able to:


20.1 Motivation: Spurious vs Meaningful Relationships

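Recall from Chapter 17 that if we regress two unrelated nonstationary variables on each other,

$$y_t = \alpha + \beta x_t + e_t$$

we may obtain a high $R^2$, apparently significant $t$-statistics and a very low Durbin-Watson statistic, even though the variables have nothing to do with each other.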
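These symptoms can be reproduced with a small simulation. The sketch below is plain Python (standard library only), not Gretl. It regresses one simulated random walk on another, independent one and reports the slope, $R^2$ and Durbin-Watson statistic. The seed, the sample size and the helper names are illustrative assumptions. With independent random walks the Durbin-Watson statistic is typically far below 2, and $R^2$ can be sizeable even though no relationship exists.

```python
import random

def random_walk(n, rng):
    """Cumulative sum of n independent N(0, 1) shocks."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def ols_simple(y, x):
    """OLS of y on a constant and x; returns (alpha, beta, residuals)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    alpha = my - beta * mx
    resid = [b - alpha - beta * a for a, b in zip(x, y)]
    return alpha, beta, resid

def r_squared(y, resid):
    my = sum(y) / len(y)
    return 1.0 - sum(e * e for e in resid) / sum((b - my) ** 2 for b in y)

def durbin_watson(resid):
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(e * e for e in resid)

rng = random.Random(17)   # fixed seed: reproducible, but the seed itself is arbitrary
T = 500
x = random_walk(T, rng)   # two independent random walks:
y = random_walk(T, rng)   # by construction y has nothing to do with x
alpha, beta, resid = ols_simple(y, x)
print(f"beta = {beta:.3f}  R^2 = {r_squared(y, resid):.3f}  DW = {durbin_watson(resid):.3f}")
```

Re-running with different seeds gives very different slopes. That instability is itself a warning sign: the estimate does not settle down as it would for a genuine relationship.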

However, some nonstationary variables genuinely move together because they are linked by economic forces.

Examples include:


20.2 What Is Cointegration?

Suppose $x_t$ and $y_t$ are both nonstationary and integrated of order one, but a particular linear combination of them is stationary.

Then the variables are cointegrated.

Formally:

$$x_t \sim I(1), \qquad y_t \sim I(1)$$

but:

$$e_t = y_t - \beta x_t \sim I(0)$$
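A simulation makes the definition concrete. In this illustrative sketch (plain Python; the value $\beta = 0.5$, the seed and the sample size are assumptions), $x_t$ is a random walk and $y_t = \beta x_t + u_t$, where $u_t$ is stationary noise. Both series wander. The combination $e_t = y_t - \beta x_t$, however, is just $u_t$, so its sample variance stays near 1, while that of $x_t$ is typically much larger.

```python
import random
import statistics

rng = random.Random(3)
T = 1000
beta = 0.5                                     # illustrative cointegrating coefficient

x, level = [], 0.0
for _ in range(T):
    level += rng.gauss(0.0, 1.0)               # x_t is a random walk, hence I(1)
    x.append(level)

u = [rng.gauss(0.0, 1.0) for _ in range(T)]    # stationary I(0) noise
y = [beta * xt + ut for xt, ut in zip(x, u)]   # y_t inherits the stochastic trend of x_t

e = [yt - beta * xt for xt, yt in zip(x, y)]   # the linear combination y_t - beta * x_t

print("sample variance of x:", round(statistics.pvariance(x), 2))
print("sample variance of y:", round(statistics.pvariance(y), 2))
print("sample variance of e:", round(statistics.pvariance(e), 2))
```

In practice $\beta$ is unknown and must be estimated. Estimating it and then testing the resulting $e_t$ is exactly what the Engle–Granger procedure does.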

20.3 Intuition: Long-Run Equilibrium

Even if two variables individually behave like random walks, their difference may remain stable.

Cointegration means that while variables may drift over time, they do not drift arbitrarily far apart.

This suggests that some equilibrium force ties them together.

Examples:


20.4 Spurious Regression vs Cointegration

This distinction is fundamental.

| Case | Residuals | Interpretation |
|---|---|---|
| Spurious regression | Nonstationary | No meaningful relationship |
| Cointegration | Stationary | Long-run equilibrium exists |

If residuals are stationary, the regression may be meaningful despite nonstationarity in the original variables.
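One simple way to see the difference between the two rows of the table is to compare the first-order autocorrelation of the OLS residuals. In the sketch below (plain Python, simulated data; the coefficients 1.0 and 0.8 and the seed are illustrative), residuals from a spurious regression typically have autocorrelation close to 1, the signature of a unit root. Residuals from a cointegrating regression have autocorrelation near 0. This is only a descriptive check; the formal test is part of the Engle–Granger procedure.

```python
import random

def ols_resid(y, x):
    """Residuals from OLS of y on a constant and x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    alpha = my - beta * mx
    return [b - alpha - beta * a for a, b in zip(x, y)]

def lag1_autocorr(e):
    """First-order sample autocorrelation; values near 1 suggest a unit root."""
    m = sum(e) / len(e)
    d = [v - m for v in e]
    return sum(d[t] * d[t - 1] for t in range(1, len(d))) / sum(v * v for v in d)

def random_walk(n, rng):
    path, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

rng = random.Random(42)
T = 500
x = random_walk(T, rng)
y_spurious = random_walk(T, rng)                               # unrelated to x
y_coint = [1.0 + 0.8 * xt + rng.gauss(0.0, 1.0) for xt in x]   # tied to x

rho_spurious = lag1_autocorr(ols_resid(y_spurious, x))
rho_coint = lag1_autocorr(ols_resid(y_coint, x))
print(f"residual autocorrelation  spurious: {rho_spurious:.3f}  cointegrated: {rho_coint:.3f}")
```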


20.5 The Engle–Granger Two-Step Procedure

We now describe the classic Engle–Granger procedure for testing cointegration.

We use quarterly GDP data from Gretl.

Step 1: Load the Data

File → Open data → Sample file...

Select:

gdp

from the POE 4th ed. database.

The dataset contains:

usa     real GDP of USA
aus     real GDP of Australia

Step 2: Estimate the Long-Run Relationship

Estimate:

$$aus_t = \alpha + \beta \, usa_t + e_t$$

Gretl Command

ols aus const usa

Output

Model 1: OLS, using observations 1970:1-2000:4 (T = 124)
Dependent variable: aus

             coefficient   std. error   t-ratio    p-value 
  ---------------------------------------------------------
  const       −1.07237     0.403225      −2.659   0.0089    ***
  usa          1.00099     0.00610028   164.1     5.85e-145 ***

Mean dependent var   62.72528
R-squared            0.995489
Durbin-Watson        0.272654
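The sketch below shows what `ols aus const usa` computes, written as the same kind of regression in plain Python. It uses synthetic series generated with a true intercept of -1 and a true slope of 1; the seed, drift and noise level are assumptions. It does not use the POE gdp file, so its numbers will not match the Gretl output above. The standard errors use the conventional homoskedastic formulas, and each t-ratio is the coefficient divided by its standard error, as in Gretl's table.

```python
import math
import random

def ols_with_se(y, x):
    """OLS of y on a constant and x, with conventional standard errors."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    alpha = my - beta * mx
    resid = [b - alpha - beta * a for a, b in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)              # error variance estimate
    se_beta = math.sqrt(s2 / sxx)
    se_alpha = math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    return {"const": (alpha, se_alpha), "x": (beta, se_beta), "resid": resid}

# Synthetic stand-in for the gdp data (NOT the POE file): T = 124 quarters.
rng = random.Random(1970)
T = 124
usa_like, level = [], 40.0
for _ in range(T):
    level += 0.3 + rng.gauss(0.0, 1.0)                    # random walk with drift
    usa_like.append(level)
aus_like = [-1.0 + 1.0 * u + rng.gauss(0.0, 1.0) for u in usa_like]

fit = ols_with_se(aus_like, usa_like)
for name in ("const", "x"):
    coef, se = fit[name]
    print(f"{name:>5}  coef={coef:9.4f}  se={se:8.4f}  t={coef / se:8.2f}")
```

With nonstationary series, these t-ratios, like those in the Gretl output, are not reliable evidence of a relationship. That is why the residuals must be tested.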

Step 3: Extract the Residuals

Save the residuals:

series uhat = $uhat
[GRETL Screenshot Placeholder: Residual series]
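Gretl's `$uhat` accessor returns the residuals of the last estimated model. The sketch below (plain Python, simulated data, hypothetical helper names) builds the same kind of residual series. It checks two properties that OLS with a constant guarantees by construction: the residuals sum to zero and are orthogonal to the regressor. It also counts sign changes, an informal version of eyeballing the residual plot. Stationary residuals cross zero often, while residuals from a spurious regression tend to wander and cross it rarely.

```python
import random

def ols_uhat(y, x):
    """Residuals from OLS of y on a constant and x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    alpha = my - beta * mx
    return [b - alpha - beta * a for a, b in zip(x, y)]

def zero_crossings(e):
    """Number of times the series changes sign."""
    return sum(1 for t in range(1, len(e)) if e[t - 1] * e[t] < 0)

def random_walk(n, rng):
    path, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

rng = random.Random(7)
T = 500
x = random_walk(T, rng)
uhat_coint = ols_uhat([2.0 + 0.5 * a + rng.gauss(0.0, 1.0) for a in x], x)  # cointegrated pair
uhat_spur = ols_uhat(random_walk(T, rng), x)                                # unrelated pair

print("sum of uhat:", sum(uhat_coint))                                 # zero up to rounding
print("sum of x*uhat:", sum(a * e for a, e in zip(x, uhat_coint)))     # also zero up to rounding
print("zero crossings, cointegrated:", zero_crossings(uhat_coint))
print("zero crossings, spurious:", zero_crossings(uhat_spur))
```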

Step 4: Test Residual Stationarity

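We now test whether the residuals are stationary. This is the crucial step.

The ADF test below is a convenient first check. Because uhat is built from estimated coefficients, however, the usual Dickey-Fuller critical values and p-values are not strictly valid here. Residual-based tests require Engle–Granger critical values, which are more negative. Gretl's coint command (for example, coint 1 aus usa) runs the full two-step procedure and reports p-values appropriate to the residual-based test.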

Select uhat.

Then:

Variable → Unit root tests → Augmented Dickey-Fuller

Gretl Command

adf 1 uhat

Example Output

Augmented Dickey-Fuller test for uhat
unit-root null hypothesis: a = 1

test statistic: tau_c(1) = -3.03875
asymptotic p-value 0.03145
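The regression behind `adf 1 uhat` can be sketched directly. With a constant and one lagged difference, the test regresses $\Delta \hat{e}_t$ on a constant, $\hat{e}_{t-1}$ and $\Delta \hat{e}_{t-1}$. The test statistic is the $t$-ratio on $\hat{e}_{t-1}$, and a strongly negative value points toward stationarity. The plain-Python sketch below solves the normal equations by Gauss-Jordan elimination; the function names are hypothetical. It computes the $t$-ratio only, with no p-value, and Gretl's exact sample handling may differ.

```python
import math
import random

def solve_ols(X, y):
    """OLS via the normal equations; returns (coefficients, standard errors)."""
    k, n = len(X[0]), len(X)
    # Augment X'X with the identity and run Gauss-Jordan to obtain (X'X)^-1.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))   # partial pivoting
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    inv = [row[k:] for row in A]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[r] - sum(X[r][i] * b[i] for i in range(k)) for r in range(n)]
    s2 = sum(e * e for e in resid) / (n - k)
    return b, [math.sqrt(s2 * inv[i][i]) for i in range(k)]

def adf_tau(e):
    """ADF(1) with constant: regress de_t on [1, e_{t-1}, de_{t-1}]; return t on e_{t-1}."""
    d = [e[t] - e[t - 1] for t in range(1, len(e))]      # d[i] holds e[i+1] - e[i]
    X = [[1.0, e[t - 1], d[t - 2]] for t in range(2, len(e))]
    y = [d[t - 1] for t in range(2, len(e))]
    b, se = solve_ols(X, y)
    return b[1] / se[1]

rng = random.Random(2024)
stationary = [rng.gauss(0.0, 1.0) for _ in range(300)]
walk, level = [], 0.0
for _ in range(300):
    level += rng.gauss(0.0, 1.0)
    walk.append(level)
print("tau, stationary series:", round(adf_tau(stationary), 2))
print("tau, random walk:", round(adf_tau(walk), 2))
```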

20.6 Hypotheses

We test:

$$H_0: \text{Residuals contain a unit root}$$

against:

$$H_1: \text{Residuals are stationary}$$

20.7 Interpretation

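If the residuals are stationary, we reject the null hypothesis of a unit root and conclude that the variables are cointegrated. The levels regression then describes a long-run equilibrium relationship rather than a spurious one. If the null cannot be rejected, the levels regression may be spurious.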
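The decision itself is a left-tailed comparison. The sketch below applies it to the statistic from Step 4, using two approximate asymptotic 5% critical values for a test regression with a constant. The first, about -2.86, is for an ordinary Dickey-Fuller test. The second, about -3.34, is for the Engle–Granger test on residuals from a two-variable cointegrating regression. These values are quoted from MacKinnon's published tables for illustration and should be checked against a table or Gretl's `coint` output before use. The comparison shows why the choice matters: the same statistic can reject against the Dickey-Fuller value but not against the Engle–Granger one.

```python
# Approximate asymptotic 5% critical values (test regression with a constant),
# quoted for illustration from MacKinnon's tables; verify before relying on them.
DF_CRIT_5 = -2.86   # ordinary Dickey-Fuller test on an observed series
EG_CRIT_5 = -3.34   # Engle-Granger test on residuals from a 2-variable regression

def reject_unit_root(tau, critical_value):
    """Left-tailed test: reject H0 (unit root) when tau lies below the critical value."""
    return tau < critical_value

tau = -3.03875      # statistic reported by Gretl in Step 4
print("reject using DF critical value:", reject_unit_root(tau, DF_CRIT_5))
print("reject using EG critical value:", reject_unit_root(tau, EG_CRIT_5))
```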


20.8 Why Residual Stationarity Matters

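Suppose

$$e_t = y_t - \beta x_t$$

is stationary. Then, although $x_t$ and $y_t$ are each $I(1)$ and can wander without limit, their deviations from equilibrium remain bounded: $e_t$ keeps returning toward its mean instead of drifting away.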


20.9 Important Caveats

The Engle–Granger procedure has several limitations.

In multivariate systems, more advanced methods may be preferable.


20.10 Cointegration and Dynamic Models

Cointegration naturally connects to the ARDL framework from Chapter 18.

Recall the ARDL model:

$$y_t = \alpha + \phi y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + u_t$$

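This model contains both short-run dynamics, such as the immediate response $\beta_0$ of $y_t$ to $x_t$, and the information needed to recover the long-run relationship between the levels of $y_t$ and $x_t$.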
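To obtain the long-run relationship implied by the model, set $y_t = y_{t-1} = y^*$, $x_t = x_{t-1} = x^*$ and $u_t = 0$. Provided $|\phi| < 1$, this gives $y^* = \frac{\alpha}{1-\phi} + \frac{\beta_0 + \beta_1}{1-\phi} x^*$. The sketch below (plain Python; the coefficient values are hypothetical) computes the long-run intercept and multiplier. It then confirms that iterating the ARDL equation with $x$ held fixed converges to that steady state.

```python
def long_run(alpha, phi, beta0, beta1):
    """Long-run intercept and multiplier implied by the ARDL(1,1) model.

    Requires |phi| < 1; otherwise there is no stable steady state.
    """
    if abs(phi) >= 1:
        raise ValueError("no stable steady state unless |phi| < 1")
    return alpha / (1 - phi), (beta0 + beta1) / (1 - phi)

# Hypothetical coefficients, chosen only for illustration.
alpha, phi, beta0, beta1 = 1.0, 0.5, 0.3, 0.2
intercept, multiplier = long_run(alpha, phi, beta0, beta1)

# Hold x fixed and iterate the ARDL equation without shocks.
x_star, y = 10.0, 0.0
for _ in range(100):
    y = alpha + phi * y + beta0 * x_star + beta1 * x_star
print("long-run intercept:", intercept, " multiplier:", multiplier)
print("iterated y:", y, " implied steady state:", intercept + multiplier * x_star)
```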


20.11 Cointegration via ARDL (Bounds Testing)

The ARDL bounds approach provides an alternative to Engle–Granger.

An important advantage is flexibility: the test can be applied whether the regressors are $I(0)$, $I(1)$, or a mixture of the two, as long as none are $I(2)$.


20.12 From ARDL to ECM

Consider:

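$$y_t = \alpha + \phi y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + u_t$$

Subtracting $y_{t-1}$ from both sides and writing $\beta_0 x_t = \beta_0 \Delta x_t + \beta_0 x_{t-1}$, this can be rewritten as

$$\Delta y_t = \alpha + \gamma \Delta x_t + \lambda_1 y_{t-1} + \lambda_2 x_{t-1} + u_t$$

where $\gamma = \beta_0$, $\lambda_1 = \phi - 1$ and $\lambda_2 = \beta_0 + \beta_1$.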
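The rewrite is an algebraic identity, so both forms must produce exactly the same changes in $y_t$. The sketch below checks this numerically in plain Python. It generates data from the levels form with hypothetical coefficients, then recomputes each $\Delta y_t$ from the error-correction form using $\gamma = \beta_0$, $\lambda_1 = \phi - 1$ and $\lambda_2 = \beta_0 + \beta_1$. The intercept $\alpha$ carries over unchanged.

```python
import random

rng = random.Random(5)
alpha, phi, beta0, beta1 = 0.4, 0.7, 0.5, 0.1          # hypothetical ARDL(1,1) coefficients
gamma, lam1, lam2 = beta0, phi - 1.0, beta0 + beta1    # implied error-correction coefficients

T = 50
x = [rng.gauss(0.0, 1.0) for _ in range(T)]
u = [rng.gauss(0.0, 1.0) for _ in range(T)]
y = [0.0] * T
for t in range(1, T):
    # The levels (ARDL) form generates the data.
    y[t] = alpha + phi * y[t - 1] + beta0 * x[t] + beta1 * x[t - 1] + u[t]

max_gap = 0.0
for t in range(1, T):
    dy = y[t] - y[t - 1]
    dx = x[t] - x[t - 1]
    # The error-correction form must reproduce the same change in y.
    dy_ecm = alpha + gamma * dx + lam1 * y[t - 1] + lam2 * x[t - 1] + u[t]
    max_gap = max(max_gap, abs(dy - dy_ecm))
print("largest discrepancy between the two forms:", max_gap)
```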

20.13 The Bounds Test

We test:

$$H_0: \lambda_1 = \lambda_2 = 0$$

against:

$$H_1: \text{At least one coefficient is nonzero}$$

Decision Rule

| F-statistic | Conclusion |
|---|---|
| Below the lower bound | No cointegration |
| Above the upper bound | Cointegration |
| Between the bounds | Inconclusive |
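The three-way rule translates directly into code. In the sketch below the bounds are placeholders (hypothetical numbers). Actual lower and upper critical values come from published tables and depend on the number of regressors, the deterministic terms and the significance level.

```python
def bounds_decision(f_stat, lower, upper):
    """Three-way decision for the ARDL bounds F-test."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if f_stat < lower:
        return "no cointegration"
    if f_stat > upper:
        return "cointegration"
    return "inconclusive"

# Hypothetical bounds, for illustration only.
LOWER, UPPER = 3.0, 4.0
for f in (2.1, 3.5, 6.8):
    print(f, bounds_decision(f, LOWER, UPPER))
```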

20.14 Implementing ARDL Bounds Testing in GRETL

Step 1: Estimate an ARDL Model

Model → Time series → ARDL

Choose:

[GRETL Screenshot Placeholder: ARDL specification window]

Step 2: Perform Bounds Test

From the ARDL output window:

[GRETL Screenshot Placeholder: Bounds test output]

Example Command

ardl 2 2 y x

Then:

ecm

20.15 Comparing Engle–Granger and ARDL

| Feature | Engle–Granger | ARDL bounds |
|---|---|---|
| Requires all variables $I(1)$ | Yes | No |
| Residual-based | Yes | No |
| Dynamic model based | No | Yes |
| Allows mixed $I(0)$/$I(1)$ variables | No | Yes |

20.16 Common Mistakes


20.17 Looking Ahead

Cointegration tells us that a long-run equilibrium relationship exists.

But how do variables adjust when they deviate from equilibrium?

This leads naturally to the Error Correction Model (ECM).

Key Takeaways

Concept Check

Basic

  1. What is cointegration?

  2. What does it mean for two variables to be $I(1)$?

  3. What does it mean for a linear combination of variables to be $I(0)$?


Intuition

  1. Why can two nonstationary variables still have a meaningful relationship?

  2. What is meant by a long-run equilibrium?

  3. Explain the “rubber band” analogy for cointegration.


Spurious vs Cointegration

  1. What distinguishes a spurious regression from a cointegrated relationship?

  2. Why is a high $R^2$ not sufficient evidence of cointegration?

  3. What role do residuals play in diagnosing cointegration?


Engle–Granger Procedure

  1. What are the two steps in the Engle–Granger method?

  2. What is the null hypothesis in the residual-based test?

  3. What does it mean to reject the null hypothesis?


ARDL and Bounds Testing

  1. How does the ARDL bounds approach differ from Engle–Granger?

  2. What is the key hypothesis tested in the bounds test?


Challenge

  1. Can cointegration exist if one variable is $I(0)$ and the other is $I(1)$?


Interpretation & Practice

  1. A regression between two variables produces:

  1. Residuals from a regression are stationary.

    • What does this suggest?

  2. Two variables are both $I(1)$, but their difference is stationary.

    • What does this imply?

  3. ADF test on residuals gives p-value = 0.02.

    • What is your conclusion?

  4. ADF test on residuals gives p-value = 0.60.

    • What is your conclusion?


ARDL Interpretation

  1. In an ARDL model, lagged level terms are jointly significant.

    • What does this imply?

  2. Bounds test F-statistic is above the upper bound.

    • What is your conclusion?


Economic Interpretation

  1. Consumption and income are cointegrated.

    • What does this imply about their relationship?


Challenge

  1. A regression is significant in levels but insignificant in differences.

    • What might this suggest?


Numerical Practice

Residual-Based Logic

  1. Suppose:



ADF Interpretation Table

  1. Consider:

| Series | ADF p-value |
|---|---|
| $x_t$ | 0.85 |
| $y_t$ | 0.78 |
| Residuals | 0.03 |


  1. Now consider:

| Series | ADF p-value |
|---|---|
| $x_t$ | 0.90 |
| $y_t$ | 0.88 |
| Residuals | 0.72 |


Engle–Granger

  1. Explain why testing residuals is central to the Engle–Granger procedure.


Bounds Test

  1. Suppose:



Interpretation

  1. Suppose cointegration exists.


Challenge

  1. Suppose two variables are cointegrated.

  1. You regress:

You find:



Appendix 20A — The ARDL Bounds Test (Conceptual Overview)

This appendix provides a simplified explanation of the ARDL bounds approach.


A.1 Starting Point

Consider:

$$\Delta y_t = \gamma \Delta x_t + \lambda_1 y_{t-1} + \lambda_2 x_{t-1} + u_t$$

A.2 The Key Question

Do the lagged level terms matter?

That is:

$$H_0: \lambda_1 = \lambda_2 = 0$$
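The F-statistic for this hypothesis compares the residual sum of squares with and without the lagged levels: $F = \frac{(RSS_r - RSS_u)/q}{RSS_u/(n-k)}$. Here $q = 2$ is the number of restrictions and $k$ is the number of regressors in the unrestricted equation. The plain-Python sketch below computes it on simulated data generated with $\lambda_1 = -0.5$, $\lambda_2 = 0.5$ and $\gamma = 0.5$. These are hypothetical values, and the equation has no intercept, matching the equation above. The sketch produces only the statistic. Whether it signals cointegration still depends on the bounds, which are not the usual F critical values.

```python
import random

def ols_rss(X, y):
    """Residual sum of squares from OLS of y on the columns of X (Gauss-Jordan solve)."""
    k, n = len(X[0]), len(X)
    # Augmented system [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))   # partial pivoting
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    b = [A[i][k] for i in range(k)]
    return sum((y[r] - sum(X[r][i] * b[i] for i in range(k))) ** 2 for r in range(n))

rng = random.Random(11)
T = 400
x, y = [0.0], [0.0]
for t in range(1, T):
    x.append(x[-1] + rng.gauss(0.0, 1.0))                 # x_t is a random walk
    dx = x[t] - x[t - 1]
    # Hypothetical error-correction DGP: dy_t = 0.5*dx_t - 0.5*y_{t-1} + 0.5*x_{t-1} + u_t
    y.append(y[-1] + 0.5 * dx - 0.5 * y[t - 1] + 0.5 * x[t - 1] + rng.gauss(0.0, 1.0))

rows = range(1, T)
dy = [y[t] - y[t - 1] for t in rows]
X_u = [[x[t] - x[t - 1], y[t - 1], x[t - 1]] for t in rows]   # unrestricted equation
X_r = [[x[t] - x[t - 1]] for t in rows]                       # lambda1 = lambda2 = 0 imposed

rss_u, rss_r = ols_rss(X_u, dy), ols_rss(X_r, dy)
q, n, k = 2, len(dy), 3          # restrictions, observations, unrestricted regressors
F = ((rss_r - rss_u) / q) / (rss_u / (n - k))
print("F statistic:", round(F, 2))
```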

A.3 Interpretation

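If both coefficients are zero, the lagged levels carry no information about $\Delta y_t$, so there is no long-run relationship between the levels of $y_t$ and $x_t$.

If at least one coefficient is nonzero, the lagged levels help explain $\Delta y_t$, which is evidence of a long-run (cointegrating) relationship.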


A.4 Why Two Critical Values?

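The asymptotic distribution of the F-statistic depends on whether the variables are $I(0)$ or $I(1)$, which is often not known in advance.

The bounds approach therefore provides two sets of critical values: a lower bound, which assumes all regressors are $I(0)$, and an upper bound, which assumes all regressors are $I(1)$.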


A.5 Decision Rule