
Lesson 6 — Simple & Multiple Regression (interpretation for business/economics)

Why this matters (motivation)

Regression is one of the most widely used tools in business and economics because it helps quantify relationships:

But regression is also easy to misuse, especially when people confuse association with causation.


The regression mindset: from question to model

Today’s scope

We focus on:

We will not claim causality unless the design supports it.


Simple regression (one predictor)

Example

$Y$ = monthly sales
$X$ = marketing spend
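In equation form, the simple regression model relating these two variables is usually written as

$$
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i ,
\qquad
\hat\beta_1 = \frac{\sum_i (X_i - \bar X)(Y_i - \bar Y)}{\sum_i (X_i - \bar X)^2},
\qquad
\hat\beta_0 = \bar Y - \hat\beta_1 \bar X ,
$$

where $i$ indexes observations (e.g., months), $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon_i$ collects everything else that moves sales. The hatted quantities are the ordinary least squares estimates.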

Interpretation: if $\beta_1 = 2.5$, then an extra unit of marketing spend is associated with +2.5 units of sales (on average), in this dataset.
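A minimal numpy sketch of fitting a simple regression and reading the slope in units. The data are synthetic (generated with a true slope of 2.5), and the function name `fit_simple_ols` and the spend units are hypothetical choices for illustration.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Return (intercept, slope) from an ordinary least squares fit of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Closed form: slope = sum of cross-deviations / sum of squared x-deviations
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Synthetic (made-up) data: sales generated with a true slope of 2.5 plus noise
rng = np.random.default_rng(0)
marketing = rng.uniform(10, 50, size=200)   # hypothetical spend, in thousands of dollars
sales = 100 + 2.5 * marketing + rng.normal(0, 5, size=200)

b0, b1 = fit_simple_ols(marketing, sales)
print(f"intercept = {b0:.1f}, slope = {b1:.2f}")
```

If spend is measured in thousands of dollars (an assumption of this example), the estimated slope reads as "each extra thousand dollars of marketing is associated with about 2.5 more units of sales, on average, in this data." Changing the units of $X$ rescales $\beta_1$, which is why the interpretation must always state them.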


Multiple regression (controls)

Why add controls?

To reduce confounding (informally):

Example:
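One way to write this model, treating season as a set of indicator (dummy) variables with one season left out as the baseline (variable names are illustrative):

$$
\text{Sales}_i = \beta_0 + \beta_1\,\text{Marketing}_i + \beta_2\,\text{Price}_i + \sum_{s \ne \text{base}} \gamma_s\,\text{Season}_{s,i} + \varepsilon_i
$$

Each $\gamma_s$ measures how average sales in season $s$ differ from the baseline season, holding marketing and price fixed.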

Now $\beta_1$ is the association between sales and marketing, after accounting for season and price.
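A small simulation can show why controls matter. The data-generating process below is an assumption chosen for illustration: a hypothetical `high_season` indicator raises both marketing spend and sales, so a simple regression of sales on marketing credits part of the seasonal boost to marketing. The true marketing coefficient used to generate the data is 1.0.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients for y on the columns of X (an intercept column is added)."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [intercept, slope_1, slope_2, ...]

rng = np.random.default_rng(1)
n = 5000
# Hypothetical confounded setup: busy season (1) pushes up both spend and sales
high_season = rng.integers(0, 2, size=n)
price = rng.normal(20, 2, size=n)
marketing = 30 + 15 * high_season + rng.normal(0, 5, size=n)
sales = 50 + 1.0 * marketing - 3.0 * price + 40 * high_season + rng.normal(0, 10, size=n)

simple = ols(marketing, sales)
multiple = ols(np.column_stack([marketing, price, high_season]), sales)
print(f"marketing coef, simple:   {simple[1]:.2f}")
print(f"marketing coef, multiple: {multiple[1]:.2f}")
```

In this simulation the simple-regression coefficient lands well above 1.0, while the multiple regression recovers it. With real data the true value is unknown, so the case for including a control has to come from business reasoning about what drives both variables.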


How to read regression output (what we focus on)

1) Coefficients

2) Uncertainty (standard errors / confidence intervals)

We care about whether the estimate is precise enough to be actionable.
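To see where standard errors and confidence intervals come from, here is a minimal numpy sketch using the classical (homoskedastic) OLS formula $\widehat{\mathrm{Var}}(\hat\beta) = \hat\sigma^2 (X^\top X)^{-1}$. The data are synthetic, and the interval uses the normal critical value 1.96 as an approximation; regression software uses the t distribution, which gives slightly wider intervals in small samples.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients and classical standard errors.

    X should NOT include an intercept column; one is added here.
    """
    X = np.column_stack([np.ones(len(y)), X])
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - k)          # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # estimated covariance of the coefficients
    return coef, np.sqrt(np.diag(cov))

# Synthetic data with a noisy relationship
rng = np.random.default_rng(2)
marketing = rng.uniform(10, 50, size=100)
sales = 100 + 2.5 * marketing + rng.normal(0, 20, size=100)

coef, se = ols_with_se(marketing, sales)
lo, hi = coef[1] - 1.96 * se[1], coef[1] + 1.96 * se[1]   # approximate 95% CI
print(f"slope = {coef[1]:.2f}, SE = {se[1]:.2f}, approx 95% CI = [{lo:.2f}, {hi:.2f}]")
```

The business question is whether every value in the interval leads to the same decision. If the low end would make extra marketing unprofitable and the high end very profitable, the estimate is not yet precise enough to act on.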

3) Fit (R-squared)
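R-squared is $1 - \text{SSR}/\text{SST}$: the share of the variation in $Y$ around its mean that the model accounts for. A minimal numpy sketch on synthetic data shows one reason a high value is not proof of a good model: in OLS with an intercept, adding predictors never lowers R-squared, even when (as here, by construction) the added columns are pure noise.

```python
import numpy as np

def r_squared(X, y):
    """R^2 = 1 - SSR/SST for an OLS fit of y on X (intercept added)."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ssr = resid @ resid                    # sum of squared residuals
    sst = np.sum((y - y.mean()) ** 2)      # total sum of squares around the mean
    return 1 - ssr / sst

rng = np.random.default_rng(3)
n = 60
marketing = rng.uniform(10, 50, size=n)
sales = 100 + 2.5 * marketing + rng.normal(0, 20, size=n)
noise = rng.normal(size=(n, 5))   # five predictors unrelated to sales

r2_base = r_squared(marketing, sales)
r2_noise = r_squared(np.column_stack([marketing, noise]), sales)
print(f"R^2, marketing only:         {r2_base:.3f}")
print(f"R^2, plus 5 noise columns:   {r2_noise:.3f}")
```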


Regression and causality (a careful note)

Regression describes a relationship. Causal interpretation requires stronger assumptions/design.

Common threats:


Mini case: sales, marketing, and seasonality

Question: Is marketing spend associated with sales after accounting for seasonality and price?

Steps:

  1. EDA reminder: plot sales over time and by category/region

  2. Fit simple regression: sales ~ marketing

  3. Fit multiple regression: sales ~ marketing + price + season + category/region dummies (as appropriate)

  4. Compare coefficient on marketing and discuss how the interpretation changes

  5. Write a short manager memo with one caveat
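Steps 2 to 4 can be sketched in numpy as below. Everything here is an assumption for illustration: the monthly data are simulated (not real company data), the season effects are invented, and `make_dummies` is a hypothetical helper that builds one indicator column per season except a baseline. If `statsmodels` is available in Colab, the same models can be written with its formula interface, for example `smf.ols("sales ~ marketing + price + C(season)", data=df).fit()`, where `C()` marks a categorical variable and `df` is a hypothetical DataFrame.

```python
import numpy as np

def make_dummies(labels, baseline):
    """Return (indicator matrix, level names), omitting the baseline level."""
    labels = np.asarray(labels)
    levels = [lvl for lvl in sorted(set(labels.tolist())) if lvl != baseline]
    cols = [(labels == lvl).astype(float) for lvl in levels]
    return np.column_stack(cols), levels

def ols(X, y):
    """OLS coefficients with an intercept prepended."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated monthly data (illustration only)
rng = np.random.default_rng(4)
n = 240
season = rng.choice(["winter", "spring", "summer", "fall"], size=n)
season_boost = {"winter": 0.0, "spring": 10.0, "summer": 30.0, "fall": 15.0}
boost = np.array([season_boost[s] for s in season])
price = rng.normal(20, 2, size=n)
marketing = 30 + 0.5 * boost + rng.normal(0, 5, size=n)   # spend rises in busy seasons
sales = 80 + 1.2 * marketing - 2.0 * price + boost + rng.normal(0, 8, size=n)

# Step 2: sales ~ marketing
b_simple = ols(marketing, sales)
# Step 3: sales ~ marketing + price + season dummies (winter is the baseline)
season_d, season_levels = make_dummies(season, baseline="winter")
b_multi = ols(np.column_stack([marketing, price, season_d]), sales)

# Step 4: compare the marketing coefficient across the two models
print(f"marketing coef, simple:   {b_simple[1]:.2f}")
print(f"marketing coef, multiple: {b_multi[1]:.2f}")
print("season effects vs winter:",
      {lvl: round(float(c), 1) for lvl, c in zip(season_levels, b_multi[3:])})
```

Leaving one season out avoids perfect collinearity with the intercept; the remaining season coefficients are read relative to that baseline.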


Mini-lab (Google Colab)

In-class checkpoints

  1. Fit a simple regression and interpret $\beta_1$ in units.

  2. Fit a multiple regression that adds at least 2 controls.

  3. Compare the coefficient on your main variable (e.g., marketing) across the two models:

    • Did it change sign or magnitude?

    • Why might that happen?

  4. Create one diagnostic plot (residuals vs fitted OR histogram of residuals).

  5. Write a short “manager memo” (5–7 lines) in the notebook.
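A residuals-vs-fitted plot is worth making because some problems are invisible in summary numbers. The numpy sketch below assumes synthetic data with diminishing returns to spend, a curve that a straight line cannot capture. With an intercept, OLS residuals always average to zero and are uncorrelated with the fitted values, so those checks pass even though the model is misspecified; the curvature only shows up as a pattern, which the last correlation summarizes.

```python
import numpy as np

def fitted_and_residuals(x, y):
    """Fit y = b0 + b1*x by OLS and return (fitted values, residuals)."""
    X = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    return fitted, y - fitted

# Synthetic data with a curved (diminishing-returns) relationship
rng = np.random.default_rng(5)
spend = rng.uniform(0, 100, size=300)
sales = 50 + 4 * spend - 0.03 * spend**2 + rng.normal(0, 5, size=300)

fitted, resid = fitted_and_residuals(spend, sales)

# These hold for any OLS fit with an intercept, so they cannot reveal the problem
print(f"mean residual:                  {resid.mean():.3f}")
print(f"corr(resid, fitted):            {np.corrcoef(resid, fitted)[0, 1]:.3f}")
# Residuals are positive mid-range and negative at the extremes (a hump in the plot)
centered_sq = (fitted - fitted.mean()) ** 2
print(f"corr(resid, centered fitted^2): {np.corrcoef(resid, centered_sq)[0, 1]:.3f}")

# In Colab, the plot itself (assuming matplotlib is installed):
#   import matplotlib.pyplot as plt
#   plt.scatter(fitted, resid); plt.axhline(0); plt.xlabel("fitted"); plt.ylabel("residual")
```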

Submission (after class)


AI check (responsible use for regression)

Good prompt examples

Bad prompt example


Review questions (quiz / reflection)

  1. What changes about the interpretation of $\beta_1$ when you move from simple to multiple regression?

  2. Give one example of an omitted variable that could bias your regression.

  3. Why is “high R²” not the same as “good model”?