
Chapter 12 — Moving Average Models

In the previous chapter, we studied autoregressive (AR) models, where the current value depends on past values of the series itself.

We now turn to another important class of time series models: moving average (MA) models, in which the current value is driven by current and past random shocks rather than by past values of the series itself.

This distinction is extremely important: AR models remember past *values*, while MA models remember past *shocks*.

Together, AR and MA models form the foundation of Box–Jenkins time series analysis.


Learning Objectives

By the end of this chapter, you should be able to:

- distinguish moving average *smoothing* from MA($q$) stochastic models;
- write down the MA(1) and general MA($q$) models;
- derive the mean, variance, and autocorrelation function of an MA(1) process;
- identify MA processes from ACF and PACF patterns;
- explain invertibility and why it matters;
- estimate and diagnose MA models in Python.


12.1 Moving Average Smoothing vs MA(q) Models

Earlier, we studied moving average smoothing:

$$\tilde{x}_t = \frac{1}{3}(x_{t-1}+x_t+x_{t+1})$$

This is a filtering method.

In MA models, the current value is driven by the current shock and a finite number of past shocks — the randomness itself, not past observations.
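The contrast can be made concrete in a few lines of NumPy (the series and the parameter value below are purely illustrative): smoothing is a filter applied to data we already observe, while an MA(1) model builds the series out of the shocks themselves.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=200)                    # white-noise shocks

# Moving-average *smoothing*: a filter applied to an observed series
x = np.cumsum(w)                            # some observed series (a random walk here)
smoothed = np.convolve(x, np.ones(3) / 3, mode="valid")

# MA(1) *model*: the data-generating process itself is built from the shocks
theta = 0.5                                 # illustrative parameter
ma1 = w[1:] + theta * w[:-1]

print(smoothed.shape, ma1.shape)
```

The first operation transforms existing data; the second generates new data from noise.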


12.2 The Basic Idea

MA models capture how shocks affect the system for several periods.

Examples:

- a surprise earnings announcement that moves a stock's return for a day or two, after which the effect fades;
- a supply disruption whose effect on prices disappears after a few periods.


12.3 The MA(1) Model

$$x_t = w_t + \theta w_{t-1}$$

where:

- $w_t \sim wn(0, \sigma_w^2)$ is white noise;
- $\theta$ is the moving average parameter.


12.4 Intuition Behind MA(1)

Unlike AR models:

- the series does not feed back on its own past values;
- a shock $w_t$ affects only $x_t$ and $x_{t+1}$, and then vanishes completely;
- the process therefore has *finite* memory.


12.5 Simulating MA(1)

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(123)

n = 400
w = np.random.normal(size=n)

theta_values = [0.3, 0.8, -0.8]

fig, ax = plt.subplots(3, 1, figsize=(10, 8))

for i, theta in enumerate(theta_values):
    x = np.zeros(n)
    x[0] = w[0]                      # no shock exists before t = 0
    for t in range(1, n):
        x[t] = w[t] + theta * w[t-1]

    ax[i].plot(x)
    ax[i].set_title(rf"MA(1), $\theta={theta}$")

plt.tight_layout()
plt.savefig("figs/ch12/ma1.png", dpi=300, bbox_inches="tight")
plt.close()   # replace with plt.show()
```
*Figure: simulated MA(1) series for $\theta = 0.3,\ 0.8,\ -0.8$.*

12.6 Mean

$$E[x_t] = 0$$

(assuming zero-mean shocks)


12.7 Variance

$$Var(x_t) = (1+\theta^2)\sigma_w^2$$
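As a quick sanity check (the values $\theta = 0.8$ and $\sigma_w = 1$ are illustrative), the sample variance of a long simulated MA(1) series should be close to $(1+\theta^2)\sigma_w^2 = 1.64$:

```python
import numpy as np

rng = np.random.default_rng(42)
theta, sigma_w = 0.8, 1.0
w = rng.normal(scale=sigma_w, size=200_000)
x = w[1:] + theta * w[:-1]                  # simulated MA(1)

empirical = x.var()
theoretical = (1 + theta**2) * sigma_w**2   # ≈ 1.64
print(round(empirical, 2), round(theoretical, 2))
```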

12.8 Autocovariance

$$\gamma(1) = \theta \sigma_w^2$$

$$\gamma(h) = 0 \quad \text{for } h>1$$

12.9 Autocorrelation Function (ACF)

$$\rho(1) = \frac{\theta}{1+\theta^2}$$

$$\rho(h) = 0 \quad \text{for } h>1$$

Simulation

```python
from statsmodels.graphics.tsaplots import plot_acf

# x is the last series simulated above (theta = -0.8)
plot_acf(x, lags=20)

plt.savefig("figs/ch12/acf.png", dpi=300, bbox_inches="tight")
plt.close()   # replace with plt.show()
```
*Figure: sample ACF of the simulated MA(1) series.*

12.10 Partial Autocorrelation (PACF)

```python
from statsmodels.graphics.tsaplots import plot_pacf

# x is the last series simulated above (theta = -0.8)
plot_pacf(x, lags=20)

plt.savefig("figs/ch12/pacf.png", dpi=300, bbox_inches="tight")
plt.close()   # replace with plt.show()
```
*Figure: sample PACF of the simulated MA(1) series.*

12.11 MA(2)

$$x_t = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2}$$

12.12 General MA(q)

$$x_t = w_t + \theta_1 w_{t-1} + \cdots + \theta_q w_{t-q}$$

12.13 Invertibility

An MA model is invertible when it can be rewritten as a convergent infinite AR process; for MA(1) this requires $|\theta| < 1$. Invertibility is what allows the unobserved shocks to be recovered from the observed data (see Appendix 12A for the derivation).

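To see invertibility in action, we can recover the shocks recursively via $w_t = x_t - \theta w_{t-1}$. Even starting from a wrong initial guess, the recovery error dies out geometrically when $|\theta| < 1$ (a sketch with an illustrative $\theta = 0.5$):

```python
import numpy as np

rng = np.random.default_rng(3)
theta = 0.5                                 # invertible: |theta| < 1
w = rng.normal(size=5_000)

# simulate MA(1)
x = np.empty_like(w)
x[0] = w[0]
x[1:] = w[1:] + theta * w[:-1]

# invert: w_t = x_t - theta * w_{t-1}, starting from the (wrong) guess w_0 = 0
w_hat = np.empty_like(x)
w_hat[0] = 0.0
for t in range(1, len(x)):
    w_hat[t] = x[t] - theta * w_hat[t - 1]

err = np.abs(w_hat - w)
print(err[0], err[2000])                    # initial error decays like |theta|^t
```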

12.14 AR vs MA

| Feature    | AR          | MA          |
|------------|-------------|-------------|
| Depends on | past values | past shocks |
| Memory     | infinite    | finite      |
| ACF        | tails off   | cuts off    |
| PACF       | cuts off    | tails off   |

12.15 Estimation

```python
import statsmodels.api as sm

# MA(1) corresponds to ARIMA order (p, d, q) = (0, 0, 1)
model = sm.tsa.ARIMA(x, order=(0, 0, 1))
res = model.fit()

print(res.summary())
```
```
                               SARIMAX Results                                
==============================================================================
Dep. Variable:                      y   No. Observations:                  400
Model:                 ARIMA(0, 0, 1)   Log Likelihood                -565.364
Date:                Mon, 04 May 2026   AIC                           1136.727
Time:                        21:54:51   BIC                           1148.702
Sample:                             0   HQIC                          1141.469
                                - 400                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0099      0.010     -0.954      0.340      -0.030       0.010
ma.L1         -0.7942      0.032    -24.900      0.000      -0.857      -0.732
sigma2         0.9865      0.070     14.130      0.000       0.850       1.123
===================================================================================
Ljung-Box (L1) (Q):                   0.06   Jarque-Bera (JB):                 0.29
Prob(Q):                              0.80   Prob(JB):                         0.86
Heteroskedasticity (H):               0.69   Skew:                             0.07
Prob(H) (two-sided):                  0.03   Kurtosis:                         3.00
===================================================================================
```

12.16 Diagnostics

```python
from statsmodels.stats.diagnostic import acorr_ljungbox

acorr_ljungbox(res.resid, lags=[10, 20], return_df=True)
```
| lag | lb_stat   | lb_pvalue |
|-----|-----------|-----------|
| 10  | 4.539711  | 0.919734  |
| 20  | 23.589777 | 0.260771  |

12.17 Common Mistakes

- Confusing moving average *smoothing* (a filter applied to data) with MA($q$) *models* (a stochastic data-generating process).
- Ignoring invertibility: with $|\theta| \ge 1$, the shocks cannot be recovered from the data.
- Expecting the sample ACF to cut off exactly at lag $q$; in finite samples it only drops to within sampling error of zero.


12.18 Looking Ahead

Next, we combine AR and MA models into ARMA models.


Key Takeaways

- An MA($q$) process is driven by the current shock and the $q$ most recent shocks.
- Shocks have finite memory: their effect disappears after $q$ periods.
- The ACF of an MA($q$) process cuts off after lag $q$, while the PACF tails off.
- Invertibility ($|\theta| < 1$ for MA(1)) lets the MA process be rewritten as an infinite AR process.

Concept Check

Basic

  1. What is a moving average (MA) model?

  2. What is the key difference between:

    • moving average smoothing

    • MA($q$) stochastic models


Intuition

  1. In an MA(1) model, how long does a shock affect the system?

  2. What does it mean for an MA process to have finite memory?

  3. How does the parameter $\theta$ affect the behavior of the series?


Intermediate

  1. Why do MA models depend on past shocks rather than past values?

  2. What happens to the effect of a shock after $q$ periods in an MA($q$) model?

  3. What is invertibility in MA models?


ACF & PACF

  1. What pattern does the ACF of an MA(1) process exhibit?

  2. What pattern does the PACF of an MA(1) process exhibit?

  3. Why does the ACF “cut off” for MA models?


Finance Insight

  1. Give an example of a real-world situation where shocks have temporary effects.

  2. Why might MA models be useful for modeling short-term market reactions?


Challenge

  1. Suppose $\theta$ is very large (in absolute value).


Interpretation & Practice

  1. A time series reacts strongly to shocks but quickly stabilizes.

    • What type of model might describe this?

    • Why?


  2. ACF shows:


  3. PACF shows gradual decay.

    • What type of model might this indicate?


  4. ACF does NOT cut off sharply.

    • Why might this happen in practice even if the true model is MA(1)?


  5. A series shows:


Finance Interpretation

  1. A stock reacts to news for one or two days and then stabilizes.

    • What type of model could describe this?

    • Why?


  2. A volatility series shows long-lasting effects of shocks.

    • Would MA models be sufficient?

    • Why or why not?


Challenge

  1. A model fits the ACF pattern well but produces poor forecasts.

    • What might be missing?

    • Why might combining AR and MA be useful?


Numerical Practice

MA(1) Construction

  1. Consider:

$$x_t = w_t + 0.5 w_{t-1}$$

with shocks:

$$w_t = 2,\; -1,\; 3$$


Comparing Effects

  1. Repeat with:

$$x_t = w_t + 0.9 w_{t-1}$$

Finite Memory

  1. Suppose a shock of +5 occurs at time $t$ in an MA(1).


ACF Interpretation

  1. Suppose you observe:



Model Identification

  1. You observe:



Estimation Output

  1. Suppose an estimated MA(1) model is:

$$x_t = w_t + 0.7 w_{t-1}$$


Diagnostics

  1. After estimating an MA model, residuals show:



Challenge

  1. Suppose $\theta = 1.2$.


  2. Suppose you incorrectly use an MA(1) model when the true process is AR(1).


Appendix 12A — Mathematical Details of MA Models

This appendix provides additional insight into the structure of moving average (MA) models.

The goal is to explain why the key results hold, not just state them.


A.1 The MA(1) Process

Consider:

$$x_t = w_t + \theta w_{t-1}$$

where $w_t \sim wn(0, \sigma^2)$.


A.2 Mean

Taking expectations:

$$E[x_t] = E[w_t] + \theta E[w_{t-1}] = 0$$

So:

$$E[x_t] = 0$$

A.3 Variance

$$Var(x_t) = Var(w_t + \theta w_{t-1})$$

Because shocks are independent:

$$Var(x_t) = Var(w_t) + \theta^2 Var(w_{t-1}) = (1 + \theta^2)\sigma^2$$

A.4 Autocovariance

We compute:

$$\gamma(h) = Cov(x_t, x_{t-h})$$

Lag 1

$$x_t = w_t + \theta w_{t-1}$$

$$x_{t-1} = w_{t-1} + \theta w_{t-2}$$

Only the shared shock $w_{t-1}$ contributes:

$$\gamma(1) = Cov(\theta w_{t-1}, w_{t-1}) = \theta \sigma^2$$

Lag 2

$$x_t = w_t + \theta w_{t-1}$$

$$x_{t-2} = w_{t-2} + \theta w_{t-3}$$

No overlapping shocks:

$$\gamma(2) = 0$$

General Result

$$\gamma(h) = \begin{cases} (1+\theta^2)\sigma^2 & h = 0 \\ \theta \sigma^2 & h = 1 \\ 0 & h > 1 \end{cases}$$
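The case analysis above can be checked empirically (illustrative values $\theta = 0.6$, $\sigma = 1$, so in theory $\gamma(0) = 1.36$, $\gamma(1) = 0.6$, $\gamma(2) = 0$):

```python
import numpy as np

rng = np.random.default_rng(5)
theta, sigma = 0.6, 1.0
w = rng.normal(scale=sigma, size=200_000)
x = w[1:] + theta * w[:-1]                  # simulated MA(1)

xc = x - x.mean()

def gamma(h):
    """Sample autocovariance at lag h."""
    return np.mean(xc[h:] * xc[: len(xc) - h])

print(round(gamma(0), 2), round(gamma(1), 2), round(gamma(2), 2))
```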

A.5 Autocorrelation Function (ACF)

$$\rho(h) = \frac{\gamma(h)}{\gamma(0)}$$

So:

$$\rho(1) = \frac{\theta}{1+\theta^2}$$

$$\rho(h) = 0 \quad \text{for } h>1$$

A.6 Why the ACF Cuts Off

This is the defining feature of MA models.

At lag 1: $x_t$ and $x_{t-1}$ share the shock $w_{t-1}$, so they are correlated.

At lag 2: $x_t$ and $x_{t-2}$ share no shocks, so the correlation is exactly zero.


A.7 Invertibility (Deeper Insight)

Consider:

$$x_t = w_t + \theta w_{t-1}$$

We can rearrange:

$$w_t = x_t - \theta w_{t-1}$$

Substituting repeatedly:

$$w_t = x_t - \theta x_{t-1} + \theta^2 x_{t-2} - \cdots$$

Thus:

$$x_t = \theta x_{t-1} - \theta^2 x_{t-2} + \theta^3 x_{t-3} - \cdots + w_t$$

This expresses the MA process as an infinite AR process.


Condition

For convergence:

$$|\theta| < 1$$
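When $|\theta| < 1$, a truncated version of the infinite AR representation reproduces the shock almost exactly; a numeric sketch (illustrative $\theta = 0.5$, truncation at $J = 40$ terms):

```python
import numpy as np

rng = np.random.default_rng(11)
theta = 0.5
w = rng.normal(size=2_000)
x = np.empty_like(w)
x[0] = w[0]
x[1:] = w[1:] + theta * w[:-1]              # simulated MA(1)

# w_t = sum_{j>=0} (-theta)^j x_{t-j}, truncated at J terms
t, J = 1_000, 40
w_approx = sum((-theta) ** j * x[t - j] for j in range(J + 1))
print(abs(w_approx - w[t]))                 # tiny: truncation error ~ |theta|^J
```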

A.8 Why Invertibility Matters

Without invertibility:

- the shocks cannot be uniquely recovered from the observed data;
- two parameter values, $\theta$ and $1/\theta$, imply exactly the same ACF, since $\rho(1) = \theta/(1+\theta^2)$ is unchanged by $\theta \mapsto 1/\theta$.


A.9 General MA(q)

$$x_t = w_t + \theta_1 w_{t-1} + \cdots + \theta_q w_{t-q}$$

Key properties:

- $E[x_t] = 0$;
- $Var(x_t) = (1 + \theta_1^2 + \cdots + \theta_q^2)\sigma^2$;
- the ACF cuts off after lag $q$, while the PACF tails off.