
Chapter 14 — ARIMA Models

In the previous chapters, we studied AR, MA, and ARMA models. These models assume that the underlying process is stationary.

However, many economic and financial time series are not stationary. Examples include stock prices, exchange rates, and macroeconomic aggregates such as GDP. These series often exhibit persistent trends and autocorrelations that decay very slowly.

The solution is to transform the data, typically by differencing, before modeling.


Learning Objectives

By the end of this chapter, you should be able to:

  • explain why many economic and financial time series are nonstationary

  • define integrated processes and the differencing operator

  • write down the ARIMA($p,d,q$) model and interpret its components

  • apply the Box–Jenkins methodology: identification, estimation, diagnostic checking, and forecasting

  • compare candidate models using AIC and BIC


14.1 Why ARIMA Models?

Many real-world time series exhibit strong persistence and nonstationarity.

For example, stock prices wander without reverting to a fixed mean, and many macroeconomic aggregates trend upward over time.

Applying stationary ARMA models directly to such data may produce misleading results.


14.2 Differencing Revisited

Recall the first difference operator:

$$\Delta x_t = x_t - x_{t-1}$$

Differencing removes persistent stochastic trends.

Example: Random Walk

Suppose:

$$x_t = x_{t-1} + w_t$$

Then:

$$\Delta x_t = w_t$$

which is white noise.
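This can be checked numerically; a minimal sketch (the seed and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=1000)   # white-noise shocks
x = np.cumsum(w)            # random walk: x_t = x_{t-1} + w_t
dx = np.diff(x)             # first difference

# Differencing the random walk recovers the white-noise shocks exactly
print(np.allclose(dx, w[1:]))   # True
```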


14.3 Integrated Processes

A series is integrated of order $d$, written $I(d)$, if it must be differenced $d$ times to become stationary: an $I(0)$ series is already stationary, while a random walk is $I(1)$.

First Difference

$$\Delta x_t = x_t - x_{t-1}$$

Second Difference

$$\Delta^2 x_t = \Delta(\Delta x_t)$$

or:

$$\Delta^2 x_t = x_t - 2x_{t-1} + x_{t-2}$$
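The equivalence of the two forms is easy to verify numerically; a quick sketch with an arbitrary series:

```python
import numpy as np

x = np.array([2., 5., 9., 14., 20.])

d1 = np.diff(x)        # first differences: [3. 4. 5. 6.]
d2 = np.diff(x, n=2)   # second differences: [1. 1. 1.]

# Matches the expanded form x_t - 2*x_{t-1} + x_{t-2}
expanded = x[2:] - 2 * x[1:-1] + x[:-2]
print(np.allclose(d2, expanded))  # True
```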

14.4 The ARIMA Model

General Form

After differencing $d$ times, the ARIMA($p,d,q$) model can be written as:

$$\phi(B)(1-B)^d x_t = \theta(B) w_t$$

where:

  • $\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ is the AR polynomial of order $p$

  • $\theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q$ is the MA polynomial of order $q$

  • $B$ is the backshift operator, $Bx_t = x_{t-1}$

  • $w_t$ is white noise


14.5 Understanding the Components

AR Component

Captures persistence through past values.

I Component

Captures nonstationarity through differencing.

MA Component

Captures temporary propagation of shocks.


14.6 Example: ARIMA(0,1,0)

Consider:

$$(1-B)x_t = w_t$$

or:

$$x_t - x_{t-1} = w_t$$

Thus:

$$x_t = x_{t-1} + w_t$$

which is a random walk.

14.7 Example: ARIMA(1,1,0)

Suppose:

$$(1-\phi B)(1-B)x_t = w_t$$

or equivalently:

$$\Delta x_t = \phi \,\Delta x_{t-1} + w_t$$

14.8 Example: ARIMA(0,1,1)

Suppose:

$$(1-B)x_t = (1+\theta B)w_t$$

Then:

$$\Delta x_t = w_t + \theta w_{t-1}$$
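A simulation sketch of this process (the value of $\theta$, seed, and sample size are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta = 2000, 0.5

w = rng.normal(size=n)
# MA(1) differences: dx_t = w_t + theta * w_{t-1}  (w_{-1} taken as 0)
dx = w + theta * np.concatenate(([0.0], w[:-1]))
x = np.cumsum(dx)   # integrate to obtain the ARIMA(0,1,1) level series

# Lag-1 autocorrelation of dx should be near theta / (1 + theta**2) = 0.4
r1 = np.corrcoef(dx[1:], dx[:-1])[0, 1]
print(round(r1, 2))
```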

14.9 Simulating an ARIMA Process

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(123)

n = 400
phi = 0.7

w = np.random.normal(size=n)

# Differenced series follows an AR(1): dx_t = phi * dx_{t-1} + w_t
dx = np.zeros(n)
for t in range(1, n):
    dx[t] = phi * dx[t-1] + w[t]

# Integrate (cumulative sum) to recover the ARIMA(1,1,0) level series
x = np.cumsum(dx)

fig, ax = plt.subplots(2, 1, figsize=(10, 6))

ax[0].plot(dx)
ax[0].set_title("Differenced Series")

ax[1].plot(x)
ax[1].set_title("Integrated Series")

plt.tight_layout()

plt.savefig("figs/ch14/ARIMA.png", dpi=300, bbox_inches="tight")
plt.close()   # replace with plt.show()

14.10 The Box–Jenkins Methodology

Classical ARIMA modeling follows the Box–Jenkins approach.

Step 1 — Identification

Choose the differencing order $d$ (from time-series plots and unit-root tests) and tentative values of $p$ and $q$ (from the ACF and PACF of the differenced series).

Step 2 — Estimation

Estimate candidate ARIMA models.

Step 3 — Diagnostic Checking

Check whether residuals resemble white noise.

Step 4 — Forecasting

Generate forecasts and evaluate performance.


14.11 Identifying Differencing Order

A series may require differencing if:

  • it exhibits a persistent trend or wanders without a fixed mean

  • its ACF decays very slowly

  • a unit-root test fails to reject nonstationarity


14.12 Under-Differencing vs Over-Differencing

Under-Differencing

The series retains a unit root: the ACF still decays very slowly, and ARMA estimates and forecasts can be misleading.

Over-Differencing

Differencing an already-stationary series introduces an artificial MA component with strong negative autocorrelation at lag 1 and inflates the variance.


14.13 ACF and PACF in ARIMA Modeling

After differencing, identification proceeds as for ARMA models: the ACF and PACF of the differenced series suggest tentative values of $p$ and $q$.

Typical Patterns

| Model | ACF | PACF |
|-------|-----|------|
| AR($p$) | tails off | cuts off |
| MA($q$) | cuts off | tails off |
| ARMA($p,q$) | tails off | tails off |

14.14 Estimation in Gretl

Model → Time Series → ARIMA

Typical Workflow

  1. plot the series

  2. test for unit roots

  3. difference if needed

  4. inspect ACF/PACF

  5. estimate candidate models

  6. compare AIC/BIC

  7. check residuals

[GRETL Screenshot Placeholder: ARIMA estimation dialog]
[GRETL Screenshot Placeholder: ARIMA output]

14.15 Residual Diagnostics

Residuals should resemble white noise.

Residual ACF

import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf

# Fit an ARIMA(1,1,0) to the series x simulated in Section 14.9
model = sm.tsa.ARIMA(x, order=(1,1,0))
res = model.fit()

plot_acf(res.resid, lags=20)

plt.savefig("figs/ch14/acf.png", dpi=300, bbox_inches="tight")
plt.close()   # replace with plt.show()

Ljung–Box Test

from statsmodels.stats.diagnostic import acorr_ljungbox

acorr_ljungbox(res.resid, lags=[10,20], return_df=True)
| lag | lb_stat   | lb_pvalue |
|-----|-----------|-----------|
| 10  | 4.729113  | 0.908524  |
| 20  | 25.169131 | 0.195037  |


14.16 Information Criteria

Model selection often uses:

Akaike Information Criterion

$$\mathrm{AIC} = -2\log(\hat L) + 2k$$

Bayesian Information Criterion

$$\mathrm{BIC} = -2\log(\hat L) + k\log n$$

where $\hat L$ is the maximized likelihood, $k$ the number of estimated parameters, and $n$ the sample size.


14.17 Forecasting with ARIMA Models

Once estimated, ARIMA models can generate forecasts.

Multi-Step Forecasts

Forecasts are generated recursively: unknown future values are replaced by their own forecasts, and future shocks are set to their expected value of zero. For models with $d \ge 1$, forecast intervals widen as the horizon grows.


14.18 ARIMA Models in Economics and Finance

ARIMA models are widely used for:

  • forecasting macroeconomic series such as GDP and inflation

  • modeling asset prices and returns

  • producing simple benchmark forecasts against which richer models are judged


14.19 Common Mistakes

  • applying stationary ARMA models directly to nonstationary data

  • over-differencing, which induces spurious negative MA autocorrelation

  • adding lags to improve in-sample fit at the cost of overfitting

  • ignoring residual diagnostics such as the Ljung–Box test


14.20 Looking Ahead

In this chapter, we extended ARMA models to handle nonstationary series through differencing.

We now move to forecasting and forecast evaluation, where we study how multi-step forecasts are constructed and how their accuracy is assessed.

Key Takeaways

  • Many economic and financial series are nonstationary; differencing can make them stationary.

  • An ARIMA($p,d,q$) model is an ARMA($p,q$) model applied to the $d$-times differenced series.

  • The Box–Jenkins cycle: identification, estimation, diagnostic checking, forecasting.

  • AIC and BIC trade off fit against complexity; BIC penalizes complexity more heavily.

Concept Check

Basic

  1. What is an ARIMA model?

  2. What does the “I” in ARIMA represent?

  3. What does differencing do to a time series?


Intuition

  1. Why are many economic time series nonstationary?

  2. Why is it problematic to apply ARMA models to nonstationary data?

  3. What is the idea behind transforming data before modeling?


Intermediate

  1. What does it mean for a series to be:

    • $I(0)$

    • $I(1)$

  2. What is the difference between first and second differencing?

  3. Why is most real-world data $I(1)$ rather than $I(2)$?


ARIMA Structure

  1. What do $p$, $d$, and $q$ represent in ARIMA($p,d,q$)?

  2. What happens after differencing is applied?


Challenge

  1. Suppose a series becomes stationary after differencing once.

    • What is its order of integration?


Interpretation & Practice

  1. A time series shows a strong upward trend.

    • What transformation might be needed?

  2. After differencing, the series fluctuates around zero.

    • What does this suggest?

  3. A series exhibits very slow ACF decay.

    • What does this indicate?

  4. After differencing, ACF shows AR-type behavior.

    • What does this suggest?

  5. A series still appears nonstationary after differencing once.

    • What might you do next?


Finance Interpretation

  1. Stock prices are nonstationary.

    • Why are returns preferred for modeling?

  2. A return series appears stationary.

    • Why is this useful?


Challenge

  1. A model fits well but uses $d=2$.

    • Why might this be problematic?


Model Selection (AIC & BIC)

  1. Suppose you estimate two ARIMA models:

| Model | AIC | BIC |
|-------|-----|-----|
| ARIMA(1,1,1) | 520 | 540 |
| ARIMA(2,1,2) | 510 | 560 |

    • Which model does each criterion favor, and why do they disagree?


  1. Suppose you estimate:

| Model | AIC | BIC |
|-------|-----|-----|
| ARIMA(1,1,0) | 600 | 610 |
| ARIMA(3,1,2) | 590 | 640 |

    • Which model would you choose, and why?


  1. Explain the intuition behind:

$$\mathrm{AIC} = -2\log L + 2k$$

$$\mathrm{BIC} = -2\log L + k \log n$$


  1. Why does BIC typically select simpler models than AIC?


Interpretation

  1. A model has very low AIC but performs poorly out-of-sample.

    • What might explain this?


Challenge

  1. Suppose you keep adding lags to improve fit.

    • What happens to AIC, to BIC, and to out-of-sample accuracy?


Numerical Practice

Differencing

  1. Given:

$$x_t = 100,\ 105,\ 111,\ 118$$

  1. Compute second differences.


Identification

  1. Suppose:


  1. Suppose after differencing:


Model Structure

  1. Interpret:

ARIMA(1,1,1)

Diagnostics

  1. Residuals still show autocorrelation.

    • What does this imply about the model specification?


Challenge

  1. Suppose you over-difference a series.

    • What signature would you expect in the ACF of the differenced series?


Appendix 14A — Understanding Differencing and Integration

A.1 First Difference

$$\Delta x_t = x_t - x_{t-1}$$

This removes stochastic trends (unit roots) as well as deterministic linear trends.


A.2 Random Walk Example

$$x_t = x_{t-1} + w_t$$

Then:

$$\Delta x_t = w_t$$

A.3 Second Difference

$$\Delta^2 x_t = x_t - 2x_{t-1} + x_{t-2}$$

Used for stronger nonstationarity.


A.4 Why Differencing Works

Nonstationary series accumulate shocks:

$$x_t = \sum_{s=1}^{t} w_s$$

Differencing removes this accumulation:

$$\Delta x_t = w_t$$

A.5 Practical Interpretation