Chapter 3 — A Quick Review of Probability and Statistics

Time series analysis combines:

Before studying dynamic models, we briefly review several statistical ideas that will appear repeatedly throughout the book.

This chapter is intentionally intuitive and applications-oriented.

The goal is not to provide a complete statistics course.

Instead, the goal is to build enough intuition to understand uncertainty in time series analysis.

We will focus on ideas that are especially important in economics, finance, and forecasting.


Learning Objectives

By the end of this chapter, you should be able to:


3.1 Why Probability Matters

Economic and financial systems are uncertain.

Examples include:

We therefore cannot predict outcomes with certainty.

Instead, we reason probabilistically.


3.2 Random Variables

A random variable is a variable whose future value is uncertain.

Examples include:

We often denote a random variable by:

$$X$$

Possible outcomes may include:

$$x_1, x_2, x_3, \dots$$

Example: Stock Returns

Suppose tomorrow’s stock return could be:

| Outcome | Probability |
| --- | --- |
| +2% | 0.4 |
| 0% | 0.3 |
| -3% | 0.3 |

The future return is uncertain.

Probability describes how likely different outcomes are.


3.3 Populations and Samples

In statistics, we often distinguish between:

Population

A population represents the entire set of possible observations.

Example:

Sample

A sample is the observed subset of data.

Example:


3.4 Mean and Expected Value

The mean measures the center of a distribution.

For observations:

$$x_1, x_2, \dots, x_n$$

the sample mean is:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Interpretation

The mean measures the average level of the data.

Examples:

Expected Value

For random variables, the theoretical average is called the expected value.

$$E[X]$$
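
As a minimal sketch, the expected value of the stock-return example from Section 3.2 can be computed as a probability-weighted average of the outcomes. The variable names below are illustrative and not part of the original text.

```python
# Outcomes (in percent) and probabilities from the stock-return table in Section 3.2.
outcomes = [2.0, 0.0, -3.0]
probs = [0.4, 0.3, 0.3]

# Probabilities of a discrete distribution must sum to one.
assert abs(sum(probs) - 1.0) < 1e-12

# E[X] = sum over outcomes of (value * probability).
expected_return = sum(x * p for x, p in zip(outcomes, probs))

print(f"E[X] = {expected_return:.2f}%")
```

The result is about -0.1%: the larger chance of a gain is slightly outweighed by the size of the possible loss.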

3.5 Variance and Volatility

The mean alone is not enough.

Two variables may have the same average but very different uncertainty.

Variance measures dispersion around the mean.

Variance

The sample variance is:

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

The standard deviation is:

$$s = \sqrt{s^2}$$
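
The following sketch applies the sample mean, variance, and standard deviation formulas to a small set of returns. The data are made up for illustration, and the comparison with NumPy's built-in functions is only a sanity check.

```python
import numpy as np

# Hypothetical daily returns in percent, made up for illustration only.
returns = np.array([0.5, -1.2, 0.8, 2.1, -0.4, -1.8, 1.0])

n = returns.size
mean = returns.sum() / n                        # sample mean
s2 = ((returns - mean) ** 2).sum() / (n - 1)    # sample variance with the n - 1 divisor
s = np.sqrt(s2)                                 # sample standard deviation

# NumPy divides by n unless ddof=1 is passed; ddof=1 matches the formula above.
assert np.isclose(s2, returns.var(ddof=1))
assert np.isclose(s, returns.std(ddof=1))

print(f"mean = {mean:.3f}, variance = {s2:.3f}, std dev = {s:.3f}")
```

Note that `np.var` and `np.std` use a divisor of n by default, so `ddof=1` is needed to reproduce the sample formulas.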

Financial Interpretation

In finance, volatility is often measured using the standard deviation of returns.

Higher volatility means:


3.6 Probability Distributions

A probability distribution describes how likely different outcomes are.

Different variables may follow different distributions.

Examples

| Variable | Typical Distribution Shape |
| --- | --- |
| heights | roughly normal |
| income | right-skewed |
| stock returns | fat-tailed |
| waiting times | asymmetric |

3.7 The Normal Distribution

The normal distribution is one of the most important probability distributions in statistics.

It is symmetric and bell-shaped.

$$X \sim N(\mu, \sigma^2)$$

means:

Properties of the Normal Distribution

The normal distribution is:

The 68–95–99.7 Rule

For a normal distribution:

| Range | Approximate Probability |
| --- | --- |
| within 1 standard deviation | 68% |
| within 2 standard deviations | 95% |
| within 3 standard deviations | 99.7% |
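
The rule can be checked numerically. The sketch below uses the error function from Python's standard library to compute the probability that a standard normal variable falls within k standard deviations of its mean. The helper name `prob_within` is illustrative.

```python
import math

def prob_within(k: float) -> float:
    """P(|Z| <= k) for a standard normal Z, using the error function."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

The printed probabilities are approximately 0.6827, 0.9545, and 0.9973, which round to the 68%, 95%, and 99.7% of the rule.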

3.8 Why Finance Often Violates Normality

Financial returns often display:

Large events occur more often than predicted by the normal distribution.

This phenomenon is called:

fat tails

Example

Stock market crashes are much more common than a simple normal model would predict.

This observation motivates later volatility models such as ARCH and GARCH.
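
To put a rough number on this, the sketch below computes how often a normal model would expect a daily drop of 3, 4, or 5 standard deviations. The figure of 252 trading days per year is a common convention assumed here, and the function name is illustrative.

```python
import math

def normal_left_tail(z: float) -> float:
    """P(Z <= z) for a standard normal Z."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

TRADING_DAYS_PER_YEAR = 252  # common convention, assumed here

for k in (3, 4, 5):
    p = normal_left_tail(-k)
    years = 1.0 / (p * TRADING_DAYS_PER_YEAR)
    print(f"{k}-sigma daily drop: p = {p:.2e}, about once every {years:,.0f} years")
```

Under these assumptions, a 5-standard-deviation daily loss would be expected only once in many thousands of years, which is hard to reconcile with the crashes seen in market history.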


3.9 The t Distribution

The t distribution resembles the normal distribution but has fatter tails.

It is especially useful when:

Intuition

Compared with the normal distribution:


3.10 The F Distribution

The F distribution commonly appears when comparing:

For now, the important point is conceptual:

We will encounter F-tests later in regression and VAR models.


3.11 Sampling Uncertainty

Suppose we estimate average stock returns using historical data.

If we choose a different sample period, the estimated mean may differ.

This is called sampling uncertainty.

Example

Average returns estimated from:

may differ substantially.
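
A small simulation can make this concrete. The sketch below assumes, purely for illustration, that daily returns are independent normal draws with a known true mean, then compares the sample mean across ten non-overlapping one-year periods. Real returns are not this simple.

```python
import numpy as np

rng = np.random.default_rng(123)

# Assumed data-generating process: i.i.d. normal daily returns (percent)
# with true mean 0.05 and standard deviation 1.
true_mean, true_sd = 0.05, 1.0
returns = rng.normal(loc=true_mean, scale=true_sd, size=2520)  # roughly ten years

# Split the history into ten non-overlapping "sample periods" of 252 days.
period_means = returns.reshape(10, 252).mean(axis=1)

for i, m in enumerate(period_means, start=1):
    print(f"period {i:2d}: sample mean = {m:+.3f}")

print(f"spread of period means (std dev): {period_means.std(ddof=1):.3f}")
print(f"theoretical standard error:       {true_sd / np.sqrt(252):.3f}")
```

Even though every period shares the same true mean, the estimated means differ from period to period, and some may even have the wrong sign.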


3.12 Statistical Inference

Statistical inference uses sample data to make statements about broader populations.

Examples include:

Inference therefore combines:


3.13 Hypothesis Testing

A hypothesis test evaluates whether the data support a particular claim.

Example

Suppose we test:

$$H_0: \mu = 0$$

against:

$$H_1: \mu \neq 0$$

where:

Financial Interpretation

For example:

$$H_0: \text{average return} = 0$$

asks whether average returns differ significantly from zero.


3.14 Test Statistic

A test statistic summarizes how far the data deviate from the null hypothesis.

For example, a t-statistic often takes the form:

$$t = \frac{\text{observed} - \text{expected value}}{\text{standard error}}$$

Large absolute values suggest stronger evidence against the null hypothesis.
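
As a sketch, the t-statistic for a sample mean can be computed as follows. Here "observed" is the sample mean, "expected value" is the mean under the null hypothesis (`mu0`), and the standard error is the sample standard deviation divided by the square root of the sample size. The function name and the data are illustrative.

```python
import numpy as np

def t_statistic(sample, mu0=0.0):
    """t = (sample mean - mu0) / (s / sqrt(n)), with s the sample standard deviation."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    standard_error = x.std(ddof=1) / np.sqrt(n)
    return (x.mean() - mu0) / standard_error

# Hypothetical daily returns in percent, for illustration only.
returns = [0.4, -0.2, 0.9, 0.1, -0.5, 0.7, 0.3, -0.1]
print(f"t = {t_statistic(returns, mu0=0.0):.3f}")
```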


3.15 p-Values

A p-value measures how surprising the observed data would be if the null hypothesis were true.

Small p-values imply stronger evidence against the null hypothesis.
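
The sketch below converts a test statistic into a two-sided p-value. It assumes a standard normal reference distribution, which is a large-sample approximation; with small samples a t distribution would usually be used instead. The function name is illustrative.

```python
import math

def two_sided_p_value_normal(t_stat: float) -> float:
    """Two-sided p-value P(|Z| >= |t|) under a standard normal reference distribution."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))

for t_stat in (0.5, 1.96, 3.0):
    p = two_sided_p_value_normal(t_stat)
    print(f"t = {t_stat:4.2f} -> approximate p-value = {p:.4f}")
```

A statistic of 1.96 gives a p-value of about 0.05 under this approximation, which is why that value appears so often as a cutoff.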

Common Rule

| p-value | Interpretation |
| --- | --- |
| small | evidence against $H_0$ |
| large | insufficient evidence against $H_0$ |

This is a very common misunderstanding.


3.16 Confidence Intervals

A confidence interval provides a plausible range for an unknown parameter.

For example:

Average inflation is estimated to be 2.5% ± 0.5%.

This communicates uncertainty more clearly than a single number.
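
One common way to build such an interval for a mean is the sample mean plus or minus a multiple of its standard error. The sketch below assumes a normal approximation with multiplier 1.96 for an approximate 95% interval; with a sample this small, a t-based multiplier would give a somewhat wider interval. The data and function name are hypothetical.

```python
import numpy as np

def mean_confidence_interval(sample, z=1.96):
    """Approximate 95% interval: sample mean +/- z * standard error (normal approximation)."""
    x = np.asarray(sample, dtype=float)
    se = x.std(ddof=1) / np.sqrt(x.size)
    return x.mean() - z * se, x.mean() + z * se

# Hypothetical annual inflation rates in percent, for illustration only.
inflation = [2.1, 3.0, 2.4, 1.8, 2.9, 2.6, 2.2, 3.1, 2.5, 2.4]
low, high = mean_confidence_interval(inflation)
print(f"mean = {np.mean(inflation):.2f}%, approx. 95% CI = [{low:.2f}%, {high:.2f}%]")
```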


3.17 Correlation

Correlation measures how strongly two variables move together in a linear fashion.

The correlation coefficient is often denoted:

$$\rho$$

or:

$$r$$

Interpretation

| Correlation | Meaning |
| --- | --- |
| +1 | perfect positive relationship |
| 0 | no linear relationship |
| -1 | perfect negative relationship |

Financial Examples

Correlations matter greatly in finance because diversification depends on how assets move together.

Two variables may move together without one causing the other.
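
The simulation below sketches this last point. It assumes two hypothetical asset returns that are each driven by a shared "market" factor plus independent noise. Neither asset influences the other, yet their sample correlation is clearly positive.

```python
import numpy as np

rng = np.random.default_rng(123)

# Assumed setup: two asset returns driven by a shared "market" factor plus noise.
market = rng.normal(size=5000)
asset_a = 0.8 * market + 0.6 * rng.normal(size=5000)
asset_b = 0.8 * market + 0.6 * rng.normal(size=5000)

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r.
r = np.corrcoef(asset_a, asset_b)[0, 1]
print(f"sample correlation between A and B: {r:.3f}")
```

With these weights the theoretical correlation is 0.64, and it comes entirely from the common factor.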


3.18 Randomness vs Predictability

A central challenge in time series analysis is distinguishing:

Some movements may be predictable.

Others may simply reflect randomness.

This question appears repeatedly throughout time series analysis.


3.19 Simulating Random Data in Python

We now simulate random observations from a normal distribution.

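import os

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(123)  # fix the seed so the figure is reproducible

x = np.random.normal(size=1000)  # 1,000 draws from N(0, 1)

plt.hist(x, bins=30)
plt.title("Simulated Normal Data")

os.makedirs("figs/ch3", exist_ok=True)  # savefig fails if the folder is missing
plt.savefig("figs/ch3/normal.png", dpi=300, bbox_inches="tight")
plt.close()   # replace with plt.show() to display the figure interactively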
[Figure: Simulated Normal Data]

This is an important lesson in time series analysis.


3.20 Simulating Fat Tails

We now compare the normal distribution with a t distribution.

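import os

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(123)  # fix the seed so the figure is reproducible

# 10,000 draws each from N(0, 1) and from a t distribution with 3 degrees of freedom
normal = np.random.normal(size=10000)
t_dist = np.random.standard_t(df=3, size=10000)

# density=True rescales each histogram to unit area so the two are comparable
plt.hist(normal, bins=60, alpha=0.5, density=True, label="Normal")
plt.hist(t_dist, bins=60, alpha=0.5, density=True, label="t(3)")

plt.legend()
plt.title("Normal vs Fat-Tailed t Distribution")

os.makedirs("figs/ch3", exist_ok=True)  # savefig fails if the folder is missing
plt.savefig("figs/ch3/fattail.png", dpi=300, bbox_inches="tight")
plt.close()   # replace with plt.show() to display the figure interactively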
[Figure: Normal vs Fat-Tailed t Distribution]

This helps explain why financial returns often show more extreme outcomes than standard textbook models based on the normal distribution predict.
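
The difference in the tails can also be counted directly. The sketch below draws from the same two distributions and reports the share of draws that land more than 4 units from zero. As in the simulation above, the t(3) draws are not rescaled to unit variance, and the threshold of 4 is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(123)
n = 100_000

normal = rng.normal(size=n)
t3 = rng.standard_t(df=3, size=n)

# Share of draws more than 4 units from zero in each sample.
threshold = 4.0
normal_share = np.mean(np.abs(normal) > threshold)
t3_share = np.mean(np.abs(t3) > threshold)

print(f"normal: {normal_share:.5f} of draws beyond +/-{threshold}")
print(f"t(3):   {t3_share:.5f} of draws beyond +/-{threshold}")
```

Such extreme draws are rare in the normal sample but far more frequent in the t(3) sample.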


3.21 Gretl Example: Descriptive Statistics

Gretl can quickly compute descriptive statistics.


Step 1: Open Data

Load a dataset.


Step 2: Summary Statistics

Menu:

View → Summary statistics

Gretl reports:


[GRETL Screenshot Placeholder: Summary statistics window]


3.22 Common Mistakes


3.23 Looking Ahead

This chapter reviewed several statistical ideas that will appear repeatedly throughout the book.

We now begin studying how to visualize and interpret time series data directly.

The next chapter introduces:

Key Takeaways

Concept Check

Basic

  1. What is a probability distribution?

  2. What does the mean of a distribution represent?

  3. What does the standard deviation measure?


Intuition

  1. Why is randomness unavoidable in economic and financial data?

  2. What does it mean for an event to be “unlikely”?

  3. Why do financial returns often exhibit more extreme outcomes than the normal distribution predicts?


Intermediate

  1. What is the 68–95–99.7 rule?

  2. Why is the normal distribution often used as a benchmark in finance?

  3. What is a p-value?

  4. Why does a small p-value provide evidence against a null hypothesis?


Challenge

  1. A model assumes returns are normally distributed.


Interpretation & Practice

  1. A histogram of returns appears roughly bell-shaped but shows a few very large spikes.

    • What does this suggest about the distribution?

    • Why might the normal model be misleading here?


  2. Suppose returns are centered around zero but fluctuate widely.

    • What does this imply about risk?

    • Why might investors still be concerned?


  3. A risk model predicts that extreme losses are “very unlikely.”

    • What assumption is likely being made?

    • Why might this be dangerous in practice?


  4. A hypothesis test produces:

    p-value = 0.04

    • What is the decision at the 5% level?

    • What does this imply about the null hypothesis?


  5. A hypothesis test produces:

    p-value = 0.25

    • What does this imply?

    • Why does this NOT prove the null hypothesis is true?


Challenge

  1. Suppose a financial model consistently underpredicts large losses.

    • What feature of the data is the model likely missing?

    • Why is this especially problematic in finance?


Numerical Practice

Basic

  1. Suppose a variable has:

    mean = 100
    standard deviation = 15

    • Using the 68–95–99.7 rule, what range contains about 95% of observations?


  2. Using the same distribution:

    • Is a value of 130 likely or unlikely?

    • Roughly what percentage of observations exceed 130?


Intermediate

  3. Suppose returns follow a normal distribution with:

    mean = 0
    standard deviation = 2

    • What range contains about 68% of returns?

    • What range contains about 95% of returns?


  4. Two assets have:

    | Asset | Mean Return | Standard Deviation |
    | --- | --- | --- |
    | A | 2% | 5% |
    | B | 3% | 5% |

    • Which asset would a risk-neutral investor prefer?

    • Why?


  5. Two assets have:

    | Asset | Mean Return | Standard Deviation |
    | --- | --- | --- |
    | C | 3% | 4% |
    | D | 3% | 8% |

    • Which asset is riskier?

    • Why might an investor prefer Asset C?


Hypothesis Testing

6. Testing Average Return

A trader claims that a strategy generates a positive average daily return.

You collect a sample of 25 daily returns with:


$$t = \frac{\bar{r} - 0}{s / \sqrt{n}}$$

7. Testing for Zero Mean

An analyst believes that a stock’s average return is zero.

You observe:



8. Volatility Change (Interpretation Focus)

Suppose two periods of returns have:

| Period | Mean Return | Standard Deviation |
| --- | --- | --- |
| A | 0.5% | 1% |
| B | 0.5% | 3% |


Challenge
  1. A risk model assumes returns are normally distributed with: