Lesson 9 — Decision Trees + Text-as-Data (NLP) + Ethical Implications
(model reasoning, evaluation, bias risks, and careful interpretation)
Why this matters (motivation)¶
Decision trees and text analytics are both popular in real-world analytics:
Trees provide “explainable” rules used in credit, marketing, and operations.
Text analytics summarizes customer feedback, reviews, and news—often the fastest signal available.
But both can go wrong:
Trees can overfit and encode proxy bias.
Sentiment tools can misread context, sarcasm, and domain-specific language.
Part A — Decision trees (intuition-first)¶
What is a decision tree?¶
A decision tree is a flowchart-like model that repeatedly splits the data into smaller groups to make predictions.
Trees are supervised learning¶
Classification tree: predicts a category (e.g., churn yes/no)
Regression tree: predicts a number (e.g., spending next month)
We focus mainly on classification today, because it fits many business decisions.
How trees choose splits (high-level)¶
Trees try many candidate split points and choose the one that improves “purity” of outcomes.
For classification: reduce impurity (e.g., Gini / entropy)
For regression: reduce within-leaf error (e.g., squared error)
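The impurity idea above can be made concrete with a small sketch. This is a minimal pure-Python example (toy labels, not course data) showing how a tree compares the Gini impurity before and after one candidate split:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

# Toy churn labels before a split, and the two groups one candidate split creates.
parent = ["churn", "churn", "stay", "stay", "stay", "stay"]
left   = ["churn", "churn", "stay"]   # e.g., tenure < 12 months
right  = ["stay", "stay", "stay"]     # e.g., tenure >= 12 months

# Weighted impurity after the split; the tree picks the split that lowers this most.
n = len(parent)
after = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
print(round(gini(parent), 3), round(after, 3))  # impurity drops: 0.444 -> 0.222
```

In practice the tree evaluates many candidate thresholds per feature and keeps the one with the largest impurity reduction.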
Overfitting (why trees are fragile)¶
Trees can keep splitting until they effectively memorize the training data.
Symptoms:
high training accuracy
much lower test accuracy
many small leaves and complex rules
Controls (practical knobs):
max_depth
min_samples_leaf
min_samples_split
pruning (conceptual)
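These knobs map directly onto scikit-learn parameters. A minimal sketch (synthetic data as a stand-in for a churn dataset) comparing an unrestricted tree against a constrained one:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a churn dataset (assumption: no real data here).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for name, params in [("deep", {"max_depth": None}),
                     ("shallow", {"max_depth": 3, "min_samples_leaf": 20})]:
    tree = DecisionTreeClassifier(random_state=0, **params).fit(X_tr, y_tr)
    results[name] = (tree.score(X_tr, y_tr), tree.score(X_te, y_te))
    print(name, "train/test accuracy:", results[name])
```

Expect the unrestricted tree to reach perfect training accuracy (memorization) with a larger train/test gap than the constrained tree.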
Evaluation: what we emphasize in this course¶
Confusion matrix and error costs¶
For binary classification (e.g., churn yes/no):
True positive (TP): correctly flagged churners
False positive (FP): flagged but not a churner
False negative (FN): missed churner
True negative (TN): correctly not flagged
In business, FP vs FN have different costs:
FP → wasted retention offer / unnecessary friction
FN → lost customer / lost revenue
Minimal metric set (recommended)¶
Accuracy (as a baseline only)
Precision and recall (or F1)
ROC-AUC (optional if you introduce it later)
We will keep this light and interpretable.
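The confusion-matrix quantities and the metric set above can be computed in a few lines. A sketch with hypothetical labels (illustrative values, not real results):

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels: 1 = churn, 0 = stay (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP", tp, "FP", fp, "FN", fn, "TN", tn)        # TP 2 FP 1 FN 2 TN 5
print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP) = 2/3
print("recall:", recall_score(y_true, y_pred))        # TP / (TP + FN) = 1/2
```

Here recall is low (half the churners are missed), which matters precisely when FN is the costly error.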
Interpretation: reading tree rules as business logic¶
One advantage of trees is the ability to explain decision paths:
“If A and B then predicted churn”
“If C then predicted no churn”
But interpretation must be cautious:
a rule can reflect confounding, not causality
a rule can be unstable if it depends on small leaves
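Decision paths like the ones above can be read directly off a fitted tree. A sketch with a tiny hypothetical dataset (feature names and values invented for illustration):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy churn data: [tenure_months, monthly_spend]; 1 = churn (illustrative only).
X = [[2, 80], [4, 90], [6, 85], [30, 40], [36, 45], [48, 30]]
y = [1, 1, 1, 0, 0, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
rules = export_text(tree, feature_names=["tenure_months", "monthly_spend"])
print(rules)  # indented "If feature <= threshold ... class: ..." paths
```

Each printed path is one "If ... then ..." rule; before presenting one, check how many training samples actually reach that leaf.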
Part B — Text as data (NLP) (practical and cautious)¶
Why text matters in business/economics¶
Text appears everywhere:
customer reviews and open-ended survey responses
call-center notes, chat logs
news articles and corporate reports
policy documents
Text is rich but messy. We need a simple workflow to extract signals.
Today’s focus: sentiment analysis (simple signal, not truth)¶
Sentiment tools label text as positive/negative/neutral.
Useful for:
quick monitoring and coarse trends
comparing product categories (with validation)
identifying highly negative feedback to review manually
Weak for:
sarcasm (“great, it broke again”)
domain-specific language (“volatile” can be neutral in finance)
mixed sentiment in one text
multilingual or non-native writing (common in international business settings)
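The sarcasm failure above is easy to reproduce. This is a deliberately minimal lexicon-based scorer (a sketch; real tools such as VADER use far richer lexicons plus negation and punctuation rules), shown only to make the failure mode concrete:

```python
# Tiny word-score lexicon (invented for illustration).
LEXICON = {"great": 1, "love": 1, "good": 1, "broke": -1, "terrible": -1}

def sentiment(text):
    """Sum word scores; unknown words count as neutral."""
    words = text.lower().replace(",", "").replace(".", "").split()
    return sum(LEXICON.get(w, 0) for w in words)

print(sentiment("love this product, good value"))  # 2: read as positive
print(sentiment("terrible quality"))               # -1: read as negative
print(sentiment("great, it broke again"))          # 0: sarcasm nets out to neutral
```

The sarcastic review scores as neutral because "great" (+1) cancels "broke" (-1): the tool counts words, it does not read context.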
Part C — Ethical implications (trees + text)¶
Ethical risk is not an “extra topic”—it is embedded in the pipeline.
Where bias enters (structured data)¶
Target label (what is “good/bad”?)
Features (proxy variables for sensitive attributes)
Sampling (who is represented?)
Measurement (errors differ across groups)
Objective (optimizing for accuracy alone can harm minority groups)
Examples:
credit: zip code can proxy socio-economic status
hiring: “career gap” can proxy caregiving responsibilities
customer scoring: region/language can proxy demographics
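One quick diagnostic for proxy risk is to cross-tabulate a candidate feature against a sensitive attribute. A sketch with hypothetical records (data invented for illustration):

```python
import pandas as pd

# Hypothetical records: does 'region' proxy the sensitive 'group'?
df = pd.DataFrame({
    "region": ["north", "north", "north", "south", "south", "south"],
    "group":  ["A", "A", "A", "B", "B", "A"],
})

# Row-normalized crosstab: values near 0 or 1 mean the feature carries
# sensitive-group information even if 'group' is excluded from the model.
ct = pd.crosstab(df["region"], df["group"], normalize="index")
print(ct)
```

This does not prove or disprove unfairness on its own, but a strongly lopsided table flags a feature that deserves scrutiny and monitoring.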
Where bias enters (text)¶
training data biases (what language/register is “normal”?)
stereotypes encoded in word usage
sentiment misreads certain dialects or non-native writing
moderation/selection bias (who leaves reviews?)
A simple “responsible presentation” checklist (course version)¶
When presenting model results, include:
What the model is for (decision context)
Performance summary (and what errors mean)
At least one limitation (data, measurement, generalization)
At least one fairness/ethics caveat (proxy risk, group disparity risk, monitoring need)
A statement about human oversight (not fully automated decisions in high-stakes contexts)
Mini case 1: churn prediction with a tree (rules + evaluation)¶
Question: “Can we identify at-risk customers using simple, explainable rules?”
Workflow:
Choose outcome (churn yes/no)
Fit baseline tree
Evaluate on test set and interpret errors
Extract 2–3 decision rules and assess plausibility
Identify proxy risk and propose monitoring
Mini case 2: review sentiment (text signal + error analysis)¶
Question: “What complaints are most negative, and what are we missing?”
Workflow:
Compute sentiment scores for reviews
Compare distributions by product category
Inspect examples:
top 5 most negative
5 that look wrongly labeled
Write one limitation and propose a validation step
Mini-lab (Google Colab)¶
In-class checkpoints (Decision tree)¶
Choose a classification outcome (e.g., churn yes/no) and 5–10 predictors.
Split data into train/test (and optionally a validation split).
Train a baseline decision tree.
Control complexity:
try at least two settings (e.g., max_depth or min_samples_leaf)
compare train vs test results
Report:
confusion matrix
precision/recall (or F1)
Interpret:
extract at least two decision paths or rules
explain them in plain language (“If … then …”)
Short reflection:
what kind of mistake is more costly here (FP or FN)? why?
In-class checkpoints (Text-as-data / sentiment)¶
Load a small dataset of text reviews (provided).
Preprocess text (light):
lowercase
remove obvious punctuation
Compute sentiment:
score each text
compare average sentiment by product/category
Error analysis:
inspect at least 5 examples where the sentiment seems wrong
write a short explanation (sarcasm? domain terms? mixed sentiment?)
Optional extension:
create a simple “negative review” flag and count by category/time.
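The optional flag-and-count step can be sketched in a few pandas lines (hypothetical scores and an arbitrary threshold, both of which you should justify with your own data):

```python
import pandas as pd

# Hypothetical sentiment scores in [-1, 1] (illustrative data only).
reviews = pd.DataFrame({
    "category": ["shoes", "shoes", "bags", "bags", "bags"],
    "score":    [0.6, -0.8, -0.5, 0.2, -0.9],
})

# Flag strongly negative reviews; the -0.3 cutoff is a modeling choice.
reviews["negative"] = reviews["score"] < -0.3
counts = reviews.groupby("category")["negative"].sum()
print(counts)  # negative-review count per category
```

Counting flags by category (or over time) turns noisy per-review scores into a coarse monitoring signal, but validate the threshold against hand-labeled examples first.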
In-class checkpoints (Ethics)¶
Identify at least one potential proxy feature risk in your tree model.
Identify at least one bias risk in your text workflow (language, sampling, domain mismatch).
Write a short “responsible use note”:
how you would monitor model performance and harms in practice.
Submission (after class)¶
Colab link (view permission) or PDF export.
Include:
metrics + 2 rules + 1 limitation (tree)
1 visualization + 5 inspected examples + 1 limitation (sentiment)
a brief ethics/fairness note
prompt/workflow log if AI tools were used
AI check (responsible use)¶
Good prompt examples
“Write sklearn code to train a decision tree with max_depth=3 and print a confusion matrix.”
“How do I interpret precision and recall in a churn context with costly false negatives?”
“Suggest a checklist for validating sentiment analysis outputs on product reviews.”
“List potential proxy variables for sensitive attributes in churn/credit contexts.”
Bad prompt examples
“Prove my model is fair.”
“Write a persuasive story that the model should be deployed immediately.”
“Summarize the reviews and tell me what customers ‘really think’ without showing examples.”
Review questions (quiz / reflection)¶
Why do decision trees often overfit, and what are two ways to reduce overfitting?
Why can accuracy be misleading in imbalanced classification problems?
Give two reasons sentiment analysis might fail on business/econ text.
Name one point in the pipeline where bias can enter, and one monitoring step you would propose.