Lesson 6 — Simple & Multiple Regression (interpretation for business/economics)
Why this matters (motivation)
Regression is one of the most widely used tools in business and economics because it helps quantify relationships:
How is sales associated with marketing spend?
Are higher prices associated with lower demand?
Which factors are correlated with churn or customer satisfaction?
But regression is also easy to misuse, especially when people confuse association with causation.
The regression mindset: from question to model
Today’s scope
We focus on:
fitting regressions,
interpreting outputs,
and communicating limitations.
We will not claim causality unless the design supports it.
Simple regression (one predictor)
Example
Model: $Y = \beta_0 + \beta_1 X + \varepsilon$
$Y$ = monthly sales
$X$ = marketing spend
Interpretation: if $\beta_1 = 2.5$, then an extra unit of spend is associated with +2.5 units of sales (on average), in this dataset.
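A minimal sketch of a simple regression of sales on marketing spend, assuming `statsmodels` and `pandas` are available (as they are in Google Colab). The data are simulated with a true slope of 2.5, so the column names, sample size, and noise level are illustrative assumptions, not course data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data (an assumption for illustration, not course data).
rng = np.random.default_rng(0)
n = 200
marketing = rng.uniform(10, 50, size=n)        # spend, arbitrary units
noise = rng.normal(0, 5, size=n)
sales = 20 + 2.5 * marketing + noise           # true slope set to 2.5

df = pd.DataFrame({"sales": sales, "marketing": marketing})

# Fit sales ~ marketing by ordinary least squares (intercept added by the formula).
simple_fit = smf.ols("sales ~ marketing", data=df).fit()
slope = simple_fit.params["marketing"]
print(f"Estimated slope on marketing: {slope:.2f}")
```

Read the estimated slope in the units of the data: each extra unit of spend goes with roughly that many extra units of sales, on average, in this sample.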
Multiple regression (controls)
Why add controls?
To reduce confounding (informally):
If both marketing spend and season affect sales, a simple regression might mix the two effects.
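Example:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$
$Y$ = sales
$X_1$ = marketing spend
$X_2$ = season indicator (e.g., Q4 = 1)
$X_3$ = price
Now $\beta_1$ is the association between sales and marketing spend, after accounting for season and price.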
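To see why controls can matter, here is a hedged sketch on simulated data in which Q4 raises both marketing spend and sales. The variable names and all numeric values are assumptions chosen for illustration; the true marketing coefficient is set to 2.5.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data (illustrative assumption): Q4 raises BOTH spend and sales,
# so season confounds the simple regression.
rng = np.random.default_rng(1)
n = 400
season = rng.integers(0, 2, size=n)                  # 1 = Q4, 0 = other quarters
price = rng.uniform(8, 12, size=n)
marketing = 20 + 15 * season + rng.normal(0, 5, size=n)
sales = (50 + 2.5 * marketing + 30 * season - 4 * price
         + rng.normal(0, 5, size=n))

df = pd.DataFrame({"sales": sales, "marketing": marketing,
                   "season": season, "price": price})

simple_fit = smf.ols("sales ~ marketing", data=df).fit()
multi_fit = smf.ols("sales ~ marketing + season + price", data=df).fit()

b_simple = simple_fit.params["marketing"]
b_multi = multi_fit.params["marketing"]
print(f"simple regression slope:   {b_simple:.2f}")
print(f"multiple regression slope: {b_multi:.2f}")
```

In this setup the simple regression slope is inflated because it absorbs part of the Q4 effect, while the multiple regression slope lands near the value used to generate the data.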
How to read regression output (what we focus on)
1) Coefficients
sign: positive/negative association
magnitude: “how much change in $Y$ per unit of $X$”
units matter (e.g., dollars vs thousands)
2) Uncertainty (standard errors / confidence intervals)
We care about whether the estimate is precise enough to be actionable.
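One way to make "precise enough" concrete is to look at the 95% confidence interval for the coefficient. The sketch below uses a hypothetical helper, `slope_interval`, on simulated data to compare a small and a large sample; the helper name and all numbers are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def slope_interval(n, seed=0):
    """Fit sales ~ marketing on n simulated rows; return (estimate, low, high)."""
    rng = np.random.default_rng(seed)
    marketing = rng.uniform(10, 50, size=n)
    sales = 20 + 2.5 * marketing + rng.normal(0, 20, size=n)
    df = pd.DataFrame({"sales": sales, "marketing": marketing})
    fit = smf.ols("sales ~ marketing", data=df).fit()
    # conf_int returns one row per coefficient with lower and upper bounds.
    low, high = fit.conf_int(alpha=0.05).loc["marketing"]
    return fit.params["marketing"], low, high

for n in (20, 2000):
    est, low, high = slope_interval(n)
    print(f"n={n:>4}: slope={est:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

A wide interval means the data are compatible with quite different slopes, which matters if a budget decision depends on the size of the effect.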
3) Fit (R-squared)
R² tells you how much of the variation in $Y$ the model explains in-sample.
A high R² does not guarantee good decision-making, and a low R² does not mean the model is useless.
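The sketch below computes R² by hand as 1 minus the ratio of residual to total sum of squares, and checks it against the value `statsmodels` reports. The data are simulated with a lot of noise (an assumption for illustration), so R² is low even though the slope is estimated clearly.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data (illustrative assumption): real slope, very noisy outcome.
rng = np.random.default_rng(2)
n = 1000
marketing = rng.uniform(10, 50, size=n)
sales = 20 + 2.5 * marketing + rng.normal(0, 60, size=n)
df = pd.DataFrame({"sales": sales, "marketing": marketing})

fit = smf.ols("sales ~ marketing", data=df).fit()

# R^2 = 1 - SS_residual / SS_total, computed in-sample.
ss_res = np.sum(fit.resid ** 2)
ss_tot = np.sum((df["sales"] - df["sales"].mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

low, high = fit.conf_int().loc["marketing"]
print(f"R^2 (manual): {r2_manual:.3f}, R^2 (statsmodels): {fit.rsquared:.3f}")
print(f"95% CI for the marketing slope: ({low:.2f}, {high:.2f})")
```

Here most of the variation in sales is unexplained, yet the interval for the marketing slope still excludes zero, so the model can be informative about that one relationship.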
Regression and causality (a careful note)
Regression describes a relationship. Causal interpretation requires stronger assumptions/design.
Common threats:
Omitted variables: a third factor affects both X and Y
Reverse causality: Y influences X (e.g., firms spend more on marketing when sales rise)
Measurement error: X or Y is noisy or mismeasured
Selection bias: data is not representative
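As one illustration of these threats, the sketch below simulates measurement error in X. Under the assumption of classical (independent, mean-zero) noise in recorded spend, the estimated slope is pulled toward zero. All names and numbers are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data (illustrative assumption): sales depend on TRUE spend,
# but the analyst only sees a noisy recording of it.
rng = np.random.default_rng(3)
n = 2000
true_spend = rng.normal(30, 10, size=n)
sales = 20 + 2.5 * true_spend + rng.normal(0, 5, size=n)
recorded_spend = true_spend + rng.normal(0, 10, size=n)   # measurement noise

df = pd.DataFrame({"sales": sales, "true_spend": true_spend,
                   "recorded_spend": recorded_spend})

fit_true = smf.ols("sales ~ true_spend", data=df).fit()
fit_noisy = smf.ols("sales ~ recorded_spend", data=df).fit()

b_true = fit_true.params["true_spend"]
b_noisy = fit_noisy.params["recorded_spend"]
print(f"slope with true spend:     {b_true:.2f}")
print(f"slope with recorded spend: {b_noisy:.2f}")
```

The regression runs without complaint in both cases; nothing in the output flags that the second slope understates the relationship used to generate the data.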
Mini case: sales, marketing, and seasonality
Question: Is marketing spend associated with sales after accounting for seasonality and price?
Steps:
EDA reminder: plot sales over time and by category/region
Fit simple regression: sales ~ marketing
Fit multiple regression: sales ~ marketing + price + season + category/region dummies (as appropriate)
Compare coefficient on marketing and discuss how the interpretation changes
Write a short manager memo with one caveat
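The fitting and comparison steps above can be sketched as follows. The dataset is simulated and the column names (`sales`, `marketing`, `price`, `season`, `region`) are assumptions; with real data, replace the simulation with your own DataFrame. `C(region)` asks the formula interface to create dummy variables for a categorical column.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the case data (all values are illustrative assumptions).
rng = np.random.default_rng(4)
n = 600
region = rng.choice(["East", "North", "West"], size=n)
season = rng.integers(0, 2, size=n)                  # 1 = Q4
price = rng.uniform(8, 12, size=n)
region_effect = pd.Series(region).map(
    {"East": 0.0, "North": 10.0, "West": -5.0}).to_numpy()
marketing = 20 + 15 * season + rng.normal(0, 5, size=n)
sales = (50 + 2.5 * marketing + 30 * season - 4 * price
         + region_effect + rng.normal(0, 5, size=n))

df = pd.DataFrame({"sales": sales, "marketing": marketing, "price": price,
                   "season": season, "region": region})

simple_fit = smf.ols("sales ~ marketing", data=df).fit()
full_fit = smf.ols("sales ~ marketing + price + season + C(region)",
                   data=df).fit()

# Side-by-side view of the marketing coefficient in the two models.
comparison = pd.DataFrame(
    {"coef": [simple_fit.params["marketing"], full_fit.params["marketing"]],
     "std_err": [simple_fit.bse["marketing"], full_fit.bse["marketing"]],
     "r_squared": [simple_fit.rsquared, full_fit.rsquared]},
    index=["simple", "multiple"])
print(comparison.round(3))
```

A table like this gives the memo its key sentence: how the marketing coefficient changes once price, season, and region are accounted for, and how precisely it is estimated.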
Mini-lab (Google Colab)
In-class checkpoints
Fit a simple regression and interpret the slope coefficient in the units of your data.
Fit a multiple regression that adds at least 2 controls.
Compare the coefficient on your main variable (e.g., marketing) across the two models:
Did it change sign or magnitude?
Why might that happen?
Create one diagnostic plot (residuals vs fitted OR histogram of residuals).
Write a short “manager memo” (5–7 lines) in the notebook.
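For the diagnostic-plot checkpoint, a residuals-vs-fitted plot could look like the sketch below. It assumes `matplotlib` is installed and uses simulated data with hypothetical column names; swap in your own fitted model.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

# Simulated data (illustrative assumption, not course data).
rng = np.random.default_rng(5)
n = 300
marketing = rng.uniform(10, 50, size=n)
price = rng.uniform(8, 12, size=n)
sales = 50 + 2.5 * marketing - 4 * price + rng.normal(0, 5, size=n)
df = pd.DataFrame({"sales": sales, "marketing": marketing, "price": price})

fit = smf.ols("sales ~ marketing + price", data=df).fit()

# Residuals (actual minus fitted) against fitted values.
fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(fit.fittedvalues, fit.resid, alpha=0.5)
ax.axhline(0, color="black", linewidth=1)
ax.set_xlabel("Fitted sales")
ax.set_ylabel("Residual (actual - fitted)")
ax.set_title("Residuals vs fitted")
fig.tight_layout()
```

In a notebook the figure renders at the end of the cell; in a script, call `plt.show()`. A shapeless cloud around zero is the reassuring pattern, while a curve or funnel shape suggests a missing nonlinear term or non-constant variance.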
Submission (after class)
Share the Colab link (view permission) or export to PDF.
Include a short memo + one caveat as Markdown.
AI check (responsible use for regression)
Good prompt examples
“Generate statsmodels code to run OLS of sales on marketing, price, and a season dummy, with a clear summary output.”
“How should I interpret the coefficient on marketing when price and season are included?”
“Suggest one diagnostic plot and explain what it checks.”
Bad prompt example
“Write my conclusion that marketing causes sales to rise” (without a causal design)
Review questions (quiz / reflection)
What changes about the interpretation of $\beta_1$ (the coefficient on your main predictor) when you move from simple to multiple regression?
Give one example of an omitted variable that could bias your regression.
Why is “high R²” not the same as “good model”?