Lesson 3 — Probability Foundations in Context


1.0 Probability as a Language for Uncertainty

Probability helps us reason about uncertainty.

Examples:

One useful interpretation is the frequentist interpretation:

If we repeat a process many times, the probability of an outcome is the long-run proportion of repetitions in which that outcome occurs. Simulation allows us to approximate these long-run proportions computationally.
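
To make the long-run idea concrete, here is a minimal sketch that tracks the proportion of heads as the number of simulated fair coin tosses grows. The seed, the number of tosses, and the checkpoints are arbitrary choices made for illustration, not part of the lesson.

```python
import numpy as np

# Fixed seed so the sketch is reproducible (the value itself is arbitrary)
rng = np.random.default_rng(1001)

# Simulate 100,000 fair coin tosses: 1 = heads, 0 = tails
tosses = rng.integers(0, 2, size=100_000)

# Proportion of heads among the first n tosses, for increasing n
for n in [10, 100, 1_000, 10_000, 100_000]:
    print(n, tosses[:n].mean())

# Proportion over the full run; this should sit close to 0.5
final_proportion = tosses.mean()
```

The early proportions can wander quite far from 0.5, while the later ones settle near it, which is the behaviour the frequentist interpretation describes.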


2.0 Human Heights and the Normal Distribution

Suppose human heights are approximately Normally distributed with:

$$\mu = 170, \quad \sigma = 5$$

where:

- $\mu$ is the mean height in cm,
- $\sigma$ is the standard deviation in cm.

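This means (by the empirical rule):

- about 68% of heights fall between 165 and 175 cm (within one standard deviation of the mean),
- about 95% fall between 160 and 180 cm (within two standard deviations),
- about 99.7% fall between 155 and 185 cm (within three standard deviations).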


2.1 AI Prompt for Python Code

Suppose we want to simulate human heights using Python.

One possible prompt is:

Write Python code to:
- simulate 400 human heights,
- assuming heights follow a normal distribution with:
    mean = 170 cm
    standard deviation = 5 cm

Then:
1. print a few simulated heights,
2. plot a histogram using relative frequencies,
3. compute the proportion of heights between 170 and 175 cm.

2.2 AI-Generated Python Code

import numpy as np
import matplotlib.pyplot as plt

# Set seed for reproducibility
np.random.seed(1001)

# Simulate heights
heights = np.random.normal(
    loc=170,
    scale=5,
    size=400
)

# Show first 20 heights
print(heights[:20])

print("...")

# Show last 20 heights
print(heights[-20:])

2.3 Histogram of Simulated Heights

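plt.hist(
    heights,
    bins=20,
    # Weight each observation by 1/n so bar heights are relative frequencies
    # (density=True would plot a density, which depends on the bin width)
    weights=np.ones_like(heights) / len(heights),
    edgecolor='black'
)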

plt.xlabel("Height (cm)")
plt.ylabel("Relative Frequency")
plt.title("Simulated Human Heights")

plt.show()

2.4 Estimating a Probability from Simulation

We can estimate probabilities using simulated data.

For example:

What proportion of individuals have heights between 170 and 175 cm?

prop = np.mean(
    (heights >= 170) &
    (heights <= 175)
)

print("Proportion between 170 and 175 cm:", prop)

Does this probability make sense? Think about the empirical rule.
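
One way to check the simulated proportion is to compute the exact probability under the assumed Normal model. The sketch below uses `statistics.NormalDist` from the Python standard library. The variable names are illustrative, and the lesson itself does not prescribe this method.

```python
from statistics import NormalDist

# Normal model for heights, using the lesson's parameters
heights_model = NormalDist(mu=170, sigma=5)

# P(170 <= X <= 175) is the difference of two cumulative probabilities
exact = heights_model.cdf(175) - heights_model.cdf(170)

print(round(exact, 4))  # 0.3413
```

Since 175 cm is one standard deviation above the mean, this is half of the roughly 68% that the empirical rule places within one standard deviation. A simulated proportion from 400 draws should land near this value, but not exactly on it.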


3.0 Reflection

Think about the following questions:

  1. Why are most observations close to the mean?

  2. Why are extremely high or low heights relatively rare?

  3. What is the proportion between 165 and 175 cm for the above heights example? Verify this with Python code.

  4. What happens if the standard deviation increases?

  5. What happens if the sample size increases?

  6. What did simulation help you understand about probability?

Hint: Try changing

    scale=10,
    size=1000
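
As a starting point for these experiments, the following sketch wraps the simulation in a small helper so that `scale` and `size` can be varied easily. The helper name `proportion_between` and its default seed are hypothetical, introduced here only for illustration.

```python
import numpy as np

def proportion_between(low, high, scale, size, seed=1001):
    """Simulate Normal(170, scale) heights and return the share in [low, high].

    Hypothetical helper for experimentation; not part of the lesson code.
    """
    rng = np.random.default_rng(seed)
    heights = rng.normal(loc=170, scale=scale, size=size)
    return np.mean((heights >= low) & (heights <= high))

# Same interval, two different standard deviations
narrow = proportion_between(170, 175, scale=5, size=1000)
wide = proportion_between(170, 175, scale=10, size=1000)

print("scale=5: ", narrow)
print("scale=10:", wide)
```

Try other values of `size` as well, and compare how much the result changes from one seed to another when the sample is small versus large.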

The distribution above is an example of a Normal distribution.


4.0 The Normal Distribution

The Normal distribution is one of the most important probability distributions in statistics and data science.

It is:

Many real-world quantities are approximately Normally distributed, including:

The probability density function (PDF) is:

$$f(x)=\frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)$$

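where:

- $\mu$ is the mean, which sets the center of the distribution,
- $\sigma$ is the standard deviation, which sets the spread.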
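
The density formula can be evaluated directly. The sketch below implements it term by term and compares the result with `statistics.NormalDist.pdf` from the standard library. The function name `normal_pdf` is an illustrative choice, and the parameters are those of the heights example.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Evaluate the Normal density formula directly."""
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Density at the mean and one standard deviation above it
peak = normal_pdf(170, mu=170, sigma=5)
one_sd = normal_pdf(175, mu=170, sigma=5)

print(peak, one_sd)
print(NormalDist(170, 5).pdf(170))  # library value for comparison
```

The density is highest at the mean and falls off symmetrically on either side. Note that these values are densities, not probabilities: probabilities come from areas under the curve.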


4.1 The Standard Normal Distribution

The standard normal distribution is a special case with:

$$\mu = 0, \quad \sigma = 1$$

The Normal distribution appears frequently in:

The standard normal distribution is useful because many probability calculations can be converted into this standardized form.
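
The usual conversion is the z-score, $z = (x - \mu)/\sigma$, which measures how many standard deviations a value lies from the mean. As a minimal sketch using the heights example, the code below computes the same probability twice: once through z-scores and the standard normal, and once with the original parameters. The helper name `z_score` is illustrative.

```python
from statistics import NormalDist

mu, sigma = 170, 5
standard_normal = NormalDist()  # defaults to mu=0, sigma=1

def z_score(x):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

# P(170 <= X <= 175) computed through the standard normal
via_z = standard_normal.cdf(z_score(175)) - standard_normal.cdf(z_score(170))

# The same probability computed with the original parameters
model = NormalDist(mu, sigma)
direct = model.cdf(175) - model.cdf(170)

print(z_score(170), z_score(175))
print(via_z, direct)
```

Both routes give the same answer, which is why tables and functions for the standard normal are enough to handle any Normal distribution.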


5.0 Coin Tossing and Bernoulli Trials

Suppose:

Each coin toss has:

This type of process is called a Bernoulli trial.


5.1 AI Prompt for Python Code

Write Python code to:
- simulate 100 fair coin tosses,
- display the number of heads and tails,
- and plot a histogram using relative frequencies.

5.2 AI-Generated Python Code

# Simulate 100 fair coin tosses (0 = tails, 1 = heads)
tosses = np.random.choice([0, 1], size=100)

# Count outcomes
n_heads = int(tosses.sum())
print("Heads:", n_heads)
print("Tails:", len(tosses) - n_heads)

5.3 Histogram of Coin Tosses

plt.hist(
    tosses,
    bins=[-0.5,0.5,1.5],
    density=True,
    edgecolor='black'
)

plt.xticks([0,1], ['Tails','Heads'])

plt.xlabel("Outcome")
plt.ylabel("Relative Frequency")
plt.title("Histogram of Coin Tosses")

plt.show()

5.4 AI Prompt

Explain the Binomial distribution in simple language suitable for beginner students.

Then provide:
1. a short intuitive explanation,
2. the Binomial probability mass function in LaTeX,
3. explanations of:
   - n
   - k
   - p
4. one real-world example.

5.5 The Binomial Distribution

The Binomial distribution models the number of successes obtained from repeated independent trials.

Examples include:

Suppose:

The Binomial distribution tells us:

“What is the probability of getting exactly $k$ successes out of $n$ trials?”

The probability mass function is:

$$P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}$$

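where:

- $n$ is the number of trials,
- $k$ is the number of successes,
- $p$ is the probability of success on each trial.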
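
The probability mass function translates almost directly into code. The sketch below evaluates it for 10 fair coin tosses, an example chosen here for illustration. The function name `binomial_pmf` is hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 5 heads in 10 fair tosses
p_five_heads = binomial_pmf(5, n=10, p=0.5)
print(p_five_heads)  # 0.24609375

# The probabilities over all possible k must add up to 1
total = sum(binomial_pmf(k, n=10, p=0.5) for k in range(11))
print(total)
```

Even the most likely outcome, 5 heads, occurs in only about a quarter of such experiments. That is a useful reminder that "expected" does not mean "guaranteed".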


5.6 Example: Marketing Conversion

Suppose:

The Binomial distribution can model:


6.0 Other Useful Probability Distributions

6.1 Exponential Distribution

The Exponential distribution models waiting times between random events.

Examples include:

The probability density function is:

$$f(x)=\lambda e^{-\lambda x}, \quad x \geq 0$$

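where:

- $\lambda > 0$ is the rate (the average number of events per unit of time), so the mean waiting time is $1/\lambda$.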

Example prompt:

Write Python code to generate and plot an Exponential distribution with different parameters.
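
A prompt like this might produce something along the following lines. This is a minimal sketch of the simulation part only, with plotting left out. The rates, sample size, and seed are arbitrary choices. Note that NumPy parameterises the Exponential distribution by `scale`, which is $1/\lambda$, not by $\lambda$ itself.

```python
import numpy as np

rng = np.random.default_rng(1001)  # arbitrary seed for reproducibility

sample_means = {}
for lam in [0.5, 1.0, 2.0]:
    # NumPy expects scale = 1 / lambda, not lambda
    waits = rng.exponential(scale=1 / lam, size=10_000)
    sample_means[lam] = waits.mean()
    print(f"lambda={lam}: sample mean={waits.mean():.3f}, theory={1 / lam:.3f}")
```

Passing `lam` directly as `scale` is an easy mistake to miss in generated code, so comparing the sample mean with $1/\lambda$ is a quick sanity check.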

6.2 The Poisson Distribution

The Poisson distribution models the number of events occurring within a fixed interval.

Examples include:

The probability mass function is:

$$P(X=k)=\frac{e^{-\lambda}\lambda^k}{k!}$$

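where:

- $\lambda$ is the average number of events in the interval,
- $k$ is the number of events observed ($k = 0, 1, 2, \dots$).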

Example prompt:

Write Python code to simulate a Poisson distribution with different values of lambda and plot histograms.
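
As a sketch of what such code could look like, the example below simulates Poisson counts for a few values of $\lambda$ and summarises them numerically, leaving the histograms as an exercise. The values of $\lambda$, the sample size, and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1001)  # arbitrary seed for reproducibility

summary = {}
for lam in [1, 4, 10]:
    counts = rng.poisson(lam=lam, size=10_000)
    # For a Poisson distribution, the mean and the variance both equal lambda
    summary[lam] = (counts.mean(), counts.var())
    print(f"lambda={lam}: mean={counts.mean():.2f}, variance={counts.var():.2f}")
```

Seeing the sample mean and the sample variance both land near $\lambda$ is a simple check that the simulated data behave like Poisson counts.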

7.0 Reflection on AI Workflow

AI can help us:

However, humans remain responsible for:


8.0 Homework

  1. Go through the lesson again, running the code in Google Colab.

  2. Make sure you fully understand the empirical rule.

  3. Explore the Exponential distribution.

  4. Explore the Poisson distribution.

  5. Save useful prompts into:

prompts/useful_prompts.md

  6. Write a short reflection in:

personal_log.md

9.0 Resources

Ch6 Normal Distribution

Optional

Ch8 Binomial and Poisson Distribution

Ch9 Uniform and Exponential Probability Distribution