Lesson 6 — Correlation, Simple & Multiple Regression (interpretation for business/economics)

Why this matters

Regression is one of the most widely used tools in business and economics because it helps quantify relationships:

But regression is easy to misuse—especially when we confuse association with causation.


Where regression sits in the AI/ML/DS map


The regression mindset: from question to model

Today’s running example (new dataset)

We use divorce_raw.csv, filtered to 2019, with d_rate as the outcome and income_pc as the main predictor.


Step 0: Start with a picture (scatter plot)

Before any equation, draw the relationship.


Step 1: Correlation (a warm-up)

Correlation is a standardized measure of linear association.
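As a minimal sketch of what "standardized" means here, the snippet below computes the Pearson correlation by hand and compares it with NumPy's built-in. The numbers are made up for illustration and are not from the course dataset.

```python
import numpy as np

# Illustrative values only (not from the course dataset).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Pearson r: the co-movement of x and y divided by the product of their spreads.
x_dev = x - x.mean()
y_dev = y - y.mean()
r_manual = (x_dev * y_dev).sum() / np.sqrt((x_dev**2).sum() * (y_dev**2).sum())

# NumPy's correlation matrix holds the same number off the diagonal.
r_numpy = np.corrcoef(x, y)[0, 1]

# Standardized: changing the units of x (e.g. dollars to thousands) leaves r unchanged.
r_rescaled = np.corrcoef(x / 1000, y)[0, 1]

print(round(r_manual, 4), round(r_numpy, 4), round(r_rescaled, 4))
```

Because r does not depend on units, it says how tightly the points follow a line but not how steep that line is; the slope is what regression adds.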


Step 2: Simple regression (one predictor)

A simple regression estimates a line:

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

Our example (simple regression)

Interpretation template:

“A one-unit increase in income_pc (i.e., 1,000 dollars of GDP per capita) is associated with a $\beta_1$ change in d_rate, on average, in this sample.”
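To make the template concrete, here is a small sketch that estimates the intercept and slope by ordinary least squares and fills in the sentence. The income_pc and d_rate values are hypothetical placeholders, not the course data, and the closed-form formulas are one of several equivalent ways to get the fit.

```python
import numpy as np

# Hypothetical values for illustration (income_pc in thousands of dollars).
income_pc = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
d_rate = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# OLS slope: co-variation of x and y divided by the variation in x.
x_dev = income_pc - income_pc.mean()
b1 = (x_dev * (d_rate - d_rate.mean())).sum() / (x_dev**2).sum()

# OLS intercept: chosen so the line passes through the point of means.
b0 = d_rate.mean() - b1 * income_pc.mean()

# np.polyfit with degree 1 returns [slope, intercept] and should agree.
slope_check, intercept_check = np.polyfit(income_pc, d_rate, 1)

print(
    f"A one-unit increase in income_pc is associated with a {b1:.2f} "
    "change in d_rate, on average, in this sample."
)
```

Note that the sentence says "associated with", not "causes": the code only summarizes a pattern in the sample.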


Step 3: Multiple regression (adding a control)

Multiple regression adds additional predictors:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$$

Now $\beta_1$ is interpreted as the association between $Y$ and $X_1$, holding the other included variables constant.
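The sketch below illustrates how the coefficient on $X_1$ can change once $X_2$ enters the model. The data are simulated under assumed coefficients (0.5 on x1, 2.0 on x2), and the helper `ols` is a hypothetical convenience function written for this example, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated data (assumption): x1 and x2 are correlated, and both affect y.
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 2.0 * x2 + rng.normal(size=n)

def ols(y, *columns):
    """Least-squares coefficients; the first entry is the intercept."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

simple = ols(y, x1)        # y ~ x1
multiple = ols(y, x1, x2)  # y ~ x1 + x2

print("slope on x1, simple regression:  ", round(simple[1], 2))
print("slope on x1, multiple regression:", round(multiple[1], 2))
```

In the simple regression the slope on x1 also absorbs part of the x2 relationship, so it sits well above 0.5; with x2 included, the slope on x1 moves back toward the value used in the simulation.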

Why add controls?

To reduce confounding (informally):

Today we add gdp_gr as a control, giving the model d_rate ~ income_pc + gdp_gr.


How to read regression output (what we focus on)

  1. Coefficients

  2. Uncertainty (standard errors / confidence intervals)

  3. Fit (R-squared)
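The three items above can be computed by hand for a small example, which may help when reading the same numbers in software output. This is a sketch with hypothetical data; the ±2 standard error interval is a rough rule of thumb, and statistical packages use a t critical value instead, which gives wider intervals in small samples.

```python
import numpy as np

# Hypothetical values for illustration (not the course data).
income_pc = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
d_rate = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Design matrix: a column of ones (intercept) plus the predictor.
X = np.column_stack([np.ones_like(income_pc), income_pc])
n, k = X.shape

# 1. Coefficients
coef, *_ = np.linalg.lstsq(X, d_rate, rcond=None)
resid = d_rate - X @ coef

# 2. Uncertainty: residual variance scaled by (X'X)^-1 gives the standard errors.
sigma2 = (resid @ resid) / (n - k)
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
ci_low, ci_high = coef - 2 * se, coef + 2 * se  # rough interval, see note above

# 3. Fit: share of the variation in d_rate captured by the fitted line.
r2 = 1 - (resid @ resid) / ((d_rate - d_rate.mean()) ** 2).sum()

print("slope:", round(coef[1], 3), "SE:", round(se[1], 3))
print("rough interval for slope:", round(ci_low[1], 3), "to", round(ci_high[1], 3))
print("R-squared:", round(r2, 2))
```

A high R-squared says the line fits the sample well; it does not by itself say the slope is precisely estimated or causally meaningful.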


Regression and causality (a careful note)

Regression describes a relationship. Causal interpretation requires stronger assumptions/design.
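A short simulation can show why the distinction matters. In the sketch below, which uses made-up data, a third variable z drives both x and y, and x has no effect on y by construction; the regression of y on x still reports a clearly nonzero slope.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Simulated data (assumption): z is a common cause of x and y.
z = rng.normal(size=n)
x = z + rng.normal(size=n)  # x depends on z only
y = z + rng.normal(size=n)  # y depends on z only; x has no effect on y

# Slope from regressing y on x (np.polyfit returns [slope, intercept]).
slope_xy = np.polyfit(x, y, 1)[0]

print("slope of y on x:", round(slope_xy, 2))
```

The slope is a correct description of the association in the data. Reading it as "raising x raises y" would be wrong here, and nothing in the regression output alone would flag that.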

Common threats:


Mini-lab (Google Colab)

In-class checkpoints

  1. Load divorce_raw.csv and filter to year = 2019.

  2. Create:

    • income_pc = gdp_pc / 1000

    • d_rate = divorce_rate / marriage_rate * 1000

  3. Make a scatter plot: d_rate vs income_pc.

  4. Compute correlation between d_rate and income_pc.

  5. Fit simple regression: d_rate ~ income_pc and interpret $\beta_1$.

  6. Fit multiple regression: d_rate ~ income_pc + gdp_gr and interpret how $\beta_1$ changes (if it does).

  7. Make one diagnostic plot (residuals vs fitted OR histogram of residuals).

  8. Write a short “manager memo” (5–7 lines): headline + evidence + caveat.
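The data steps in the checkpoints could be sketched as follows. Since divorce_raw.csv is not reproduced here, the snippet builds a tiny made-up table with the column names the checkpoints imply (year, gdp_pc, gdp_gr, divorce_rate, marriage_rate); both the values and the `ols` helper are assumptions for illustration. In Colab you would replace the table with `pd.read_csv("divorce_raw.csv")`.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for divorce_raw.csv (values are not real data).
raw = pd.DataFrame({
    "year": [2018, 2019, 2019, 2019, 2019, 2019, 2019],
    "gdp_pc": [9000.0, 10000.0, 20000.0, 30000.0, 40000.0, 50000.0, 60000.0],
    "gdp_gr": [3.0, 4.0, 3.5, 2.0, 2.5, 1.0, 1.5],
    "divorce_rate": [1.0, 1.2, 1.8, 2.0, 1.9, 2.4, 2.2],
    "marriage_rate": [6.0, 6.0, 6.0, 5.0, 5.0, 5.0, 4.0],
})

# Checkpoints 1-2: filter to 2019 and create the two variables.
df = raw[raw["year"] == 2019].copy()
df["income_pc"] = df["gdp_pc"] / 1000
df["d_rate"] = df["divorce_rate"] / df["marriage_rate"] * 1000

# Checkpoint 4: correlation between d_rate and income_pc.
r = df["d_rate"].corr(df["income_pc"])

def ols(y, X):
    """Hypothetical helper: least-squares coefficients (intercept first) and residuals."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef

# Checkpoints 5-6: simple and multiple regression.
y = df["d_rate"].to_numpy()
simple, resid_simple = ols(y, df[["income_pc"]].to_numpy())
multiple, resid_multiple = ols(y, df[["income_pc", "gdp_gr"]].to_numpy())

print("correlation:", round(r, 3))
print("slope on income_pc (simple):  ", round(simple[1], 3))
print("slope on income_pc (multiple):", round(multiple[1], 3))
```

The plotting steps (checkpoints 3 and 7) are left out of this sketch; the residual arrays returned by `ols` are what a residuals-vs-fitted plot or residual histogram would use.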

Submission (after class)


AI check (responsible use)