Lesson 7 — Data Visualization for Communication

(chart choice, storytelling, interactive plots, and honest interpretation)

Why this matters

In business and economics, decisions are often made from:

charts in slide decks,
dashboards,
short memos with figures,
interactive visuals that allow users to explore patterns.

Even a correct model can be ignored if the story is unclear.
And a misleading chart can create confidence in the wrong conclusion.

Today’s running example: Gapminder-style visualization

Today we use three Gapminder data files:

lex.csv — life expectancy
gdp_pcap.csv — GDP per capita / income per person
pop.csv — population

These files are in wide format: one row per country, with years stored as columns such as 1800, 1801, 1802, and so on.

We will reshape them into long format, merge them into a country-year panel, and create a Gapminder-style visualizer.

The visualization mindset: “What question does this answer?”

A practical checklist before plotting

Unit of observation: country-year? customer? firm? transaction?
Aggregation rule: sum, mean, median, rate, ratio?
Comparability: are groups measured in the same way?
Missingness: are you hiding missing values?
Scale: should the axis be raw or log?
Uncertainty: do you need sample sizes, bands, caveats, or confidence intervals?
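Several of these checks can be run in a few lines before any chart is drawn. The sketch below is only an illustration on a made-up toy panel: the column names (country, year, lex) and all values are assumptions, not the lab data.

```python
import pandas as pd

# Toy country-year panel (made-up numbers, for illustration only).
panel = pd.DataFrame({
    "country": ["A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2000, 2001, 2001],  # B-2001 appears twice on purpose
    "lex": [70.1, 70.4, 55.0, None, 55.9],
})

# Unit of observation: each (country, year) pair should appear exactly once.
dup_rows = panel[panel.duplicated(["country", "year"], keep=False)]

# Missingness: share of missing values in each column.
missing_share = panel.isna().mean()

# Sample size behind each group, to report alongside any average.
n_per_country = panel.groupby("country")["lex"].count()

print(len(dup_rows), "rows share a country-year key")
print(missing_share)
print(n_per_country)
```

If the duplicate check returns any rows, the unit of observation is not what you think it is, and sums or means computed from the table may double count.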

Chart chooser: matching chart to purpose

Today’s chart types

In the Gapminder lab, we will create:

Static scatter plot
- Question: Are richer countries associated with longer life expectancy?
- Example: life expectancy vs log GDP per capita in one year.
Animated bubble chart
- Question: How have income, health, and population changed over time?
- Example: Gapminder-style animation from 1800 onward.
Line charts for selected countries
- Question: How did Japan, Thailand, and the United States evolve over time?
- Example: life expectancy and GDP per capita over time.

The Gapminder-style chart

A Gapminder-style chart typically uses:

x-axis: GDP per capita / income per person
y-axis: life expectancy
bubble size: population
animation: year
hover label: country

This is powerful because it combines:

relationship: income and life expectancy,
trend: changes over time,
comparison: countries moving differently,
scale: population size.

Why use a log scale for GDP per capita?

GDP per capita is highly skewed. A few countries have very high income levels, while many others are clustered at lower income levels.

If we use raw GDP per capita on the x-axis, many countries may be squeezed into the left side of the graph.

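A log scale spreads out the low-income countries and compresses the high-income ones, so equal distances on the axis correspond to equal proportional changes in income (each doubling takes the same width). This makes the overall pattern easier to see.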
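To make the squeezing effect concrete, here is a minimal sketch with three hypothetical income levels. The helper axis_position is an illustrative name, not part of any library; it rescales values to their relative position along an axis.

```python
import numpy as np

# Three hypothetical GDP per capita levels, each 10 times the previous one.
gdp = np.array([500, 5_000, 50_000])

def axis_position(values):
    """Rescale values to 0..1, i.e. their relative position along an axis."""
    values = np.asarray(values, dtype=float)
    return (values - values.min()) / (values.max() - values.min())

raw_pos = axis_position(gdp)            # positions on a raw (linear) axis
log_pos = axis_position(np.log10(gdp))  # positions on a log axis

print(raw_pos.round(3))
print(log_pos.round(3))
```

On the raw axis the middle country sits about 9% of the way along, close to the poorest one. On the log axis it sits exactly halfway, because each tenfold increase takes the same width.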

Honest charts: common ways visualizations mislead

1. Misleading axes

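A chart can exaggerate or hide differences depending on axis choices. For example, starting a bar chart's y-axis well above zero makes small gaps look large, while a very wide axis range can flatten a real trend.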
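A quick way to see how much an axis choice distorts a comparison is to compute the ratio of the drawn bar heights. This is a small sketch with made-up values; visual_ratio is a hypothetical helper written for this example.

```python
def visual_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar heights for values a and b when the axis starts at baseline."""
    return (b - baseline) / (a - baseline)

# Two hypothetical life expectancies, two years apart.
lex_a, lex_b = 70.0, 72.0

honest = visual_ratio(lex_a, lex_b, baseline=0.0)      # axis starts at 0
truncated = visual_ratio(lex_a, lex_b, baseline=69.0)  # axis starts at 69

print(f"axis from 0:  second bar is {honest:.2f}x as tall")
print(f"axis from 69: second bar is {truncated:.2f}x as tall")
```

With the axis starting at zero, the second bar is about 3% taller. With the axis starting at 69, it is drawn three times as tall, although the data are identical.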

2. Aggregation traps

Averages can hide important variation.

For example:

global life expectancy may rise,
but some countries may stagnate or fall behind.
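The example above can be reproduced with a toy table. The three countries and their values are invented for illustration; the point is only that a rising average and a falling country can coexist.

```python
import pandas as pd

# Made-up life expectancy for three hypothetical countries in two years.
df = pd.DataFrame({
    "country": ["A", "B", "C"] * 2,
    "year": [2000] * 3 + [2020] * 3,
    "lex": [60.0, 70.0, 50.0, 68.0, 78.0, 48.0],
})

# The aggregate view: simple average across countries, by year.
mean_by_year = df.groupby("year")["lex"].mean()

# The disaggregated view: change for each country between the two years.
wide = df.pivot(index="country", columns="year", values="lex")
change = wide[2020] - wide[2000]
fell = change[change < 0].index.tolist()

print(mean_by_year)
print("Countries where life expectancy fell:", fell)
```

The average rises between the two years, yet country C moves in the opposite direction. Plotting only the average would hide that.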

3. Cherry-picking time windows

Trends can look very different depending on start and end dates.

A country’s GDP per capita may look impressive from 2000 onward, but less so if we start from 1980.
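One way to check sensitivity to the window is to compute the average annual growth rate from several start years. The numbers below are invented, and annual_growth is a hypothetical helper that applies the standard compound growth formula.

```python
def annual_growth(series, start, end):
    """Compound annual growth rate between two years of a {year: value} mapping."""
    n_years = end - start
    return (series[end] / series[start]) ** (1 / n_years) - 1

# Made-up GDP per capita: a slump before 2000, then a recovery.
gdp = {1980: 10_000, 2000: 8_000, 2020: 16_000}

g_from_2000 = annual_growth(gdp, 2000, 2020)
g_from_1980 = annual_growth(gdp, 1980, 2020)

print(f"2000-2020: {g_from_2000:.1%} per year")
print(f"1980-2020: {g_from_1980:.1%} per year")
```

Starting in 2000, growth averages about 3.5% per year. Starting in 1980, the same end point implies only about 1.2% per year, because part of the later growth is recovery from the earlier decline.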

4. Too much clutter

A chart with too many lines or labels can become unreadable.

Good options:

show 3–5 selected countries,
use interaction,
create separate panels,
or use a table only when exact values matter.

5. Causality by implication

A chart can quietly suggest a causal story even when it only shows association.

For example, a scatter plot of GDP per capita and life expectancy does not prove that income alone causes longer life. Health systems, education, public policy, conflict, and many other factors matter.

Building a data story

Caption template

A useful chart caption includes:

What: what the chart shows
Pattern: what stands out
Meaning: why it matters
Caveat: what we should be careful about

Example:

“Countries with higher GDP per capita tend to have higher life expectancy, especially after 1950. However, the relationship is not perfectly linear, and the chart shows association rather than causation.”

Mini-lab (Google Colab)

In-class checkpoints

A. Load and inspect

1. Load the three source files:
   - lex.csv
   - gdp_pcap.csv
   - pop.csv
2. Run basic checks:
   - shape,
   - column names,
   - missingness,
   - year range.
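The basic checks can be wrapped in one small function and applied to each file. This is a sketch, not the notebook's code: inspect_wide is a hypothetical helper, the in-memory CSV stands in for lex.csv with made-up values, and the geo and name identifier columns are assumed from checkpoint B.

```python
import io
import pandas as pd

# Stand-in for lex.csv: two countries, three years (made-up values).
# In the lab you would call pd.read_csv("lex.csv") instead.
fake_csv = io.StringIO(
    "geo,name,1800,1801,1802\n"
    "jpn,Japan,36.4,36.4,\n"
    "tha,Thailand,30.4,30.4,30.4\n"
)
lex_wide = pd.read_csv(fake_csv)

def inspect_wide(df, id_cols=("geo", "name")):
    """Summarize a wide table: shape, id columns, missing cells, year range."""
    year_cols = [c for c in df.columns if c not in id_cols]
    years = [int(c) for c in year_cols]  # year columns are read as strings
    return {
        "shape": df.shape,
        "id_columns": [c for c in df.columns if c in id_cols],
        "n_missing": int(df[year_cols].isna().sum().sum()),
        "year_range": (min(years), max(years)),
    }

summary = inspect_wide(lex_wide)
print(summary)
```

Running the same function on all three files makes it easy to spot a file whose year range or number of rows differs from the others before merging.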

B. Reshape and merge

3. Convert each dataset from wide to long format.
4. Keep both:
   - geo = country code,
   - name = country name.
5. Merge into one country-year panel.
6. Rename name to country.
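The reshape and merge steps could look roughly like the sketch below. It uses tiny made-up wide tables in place of the real files; to_long and the value column names lex, gdp_pcap, and pop are illustrative choices, and merging on both geo and name assumes the names are spelled identically in all three files.

```python
from functools import reduce
import pandas as pd

def to_long(wide, value_name):
    """Melt a wide table (one column per year) into geo, name, year, value."""
    long = wide.melt(id_vars=["geo", "name"], var_name="year", value_name=value_name)
    long["year"] = long["year"].astype(int)  # year columns arrive as strings
    return long

# Made-up stand-ins for lex.csv, gdp_pcap.csv and pop.csv.
ids = {"geo": ["jpn", "tha"], "name": ["Japan", "Thailand"]}
lex_wide = pd.DataFrame({**ids, "1800": [36.4, 30.4], "1801": [36.4, 30.4]})
gdp_wide = pd.DataFrame({**ids, "1800": [1000, 900], "1801": [1010, 905]})
pop_wide = pd.DataFrame({**ids, "1800": [30_000_000, 4_000_000],
                         "1801": [30_100_000, 4_010_000]})

long_tables = [
    to_long(lex_wide, "lex"),
    to_long(gdp_wide, "gdp_pcap"),
    to_long(pop_wide, "pop"),
]

# Merge pairwise on the country-year key; validate raises if keys repeat.
panel = reduce(
    lambda left, right: left.merge(
        right, on=["geo", "name", "year"], how="outer", validate="one_to_one"
    ),
    long_tables,
).rename(columns={"name": "country"})

panel = panel.sort_values(["country", "year"]).reset_index(drop=True)
print(panel)
```

An outer merge keeps country-years that are missing from one of the files, so gaps stay visible as missing values. An inner merge would drop those rows silently.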

C. Static visualization

7. Choose one year, such as 2019.
8. Create a scatter plot:
   - x-axis: log GDP per capita,
   - y-axis: life expectancy.
9. Write a short interpretation of the pattern.
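A static scatter plot for one year might be sketched as follows with matplotlib. The toy panel, its values, and the column names are assumptions for illustration; the plotting library used in the lab notebook may differ.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in Colab to see the plot inline
import matplotlib.pyplot as plt
import pandas as pd

# Toy panel with made-up values for three hypothetical countries.
panel = pd.DataFrame({
    "country": ["A", "B", "C", "A"],
    "year": [2019, 2019, 2019, 2018],
    "gdp_pcap": [1_200, 9_500, 48_000, 1_150],
    "lex": [61.0, 74.5, 82.3, 60.7],
})

# Keep one year and drop rows that cannot be plotted.
one_year = panel[panel["year"] == 2019].dropna(subset=["gdp_pcap", "lex"])

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(one_year["gdp_pcap"], one_year["lex"], alpha=0.7)
ax.set_xscale("log")  # log axis instead of transforming the data
ax.set_xlabel("GDP per capita (log scale)")
ax.set_ylabel("Life expectancy (years)")
ax.set_title("Life expectancy vs GDP per capita, 2019")
fig.tight_layout()
```

Setting the axis to log scale, rather than plotting a log-transformed column, keeps the tick labels in the original units, which is easier for readers to interpret.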

D. Gapminder-style visualizer

10. Create an animated bubble chart:
    - x-axis: GDP per capita,
    - y-axis: life expectancy,
    - bubble size: population,
    - animation: year,
    - hover label: country.
11. Use a log scale for GDP per capita.
12. Set a sensible y-axis range, such as 0 to 100 years.

E. Story charts

13. Choose 3–5 countries.
14. Produce:
    - life expectancy over time,
    - GDP per capita over time.
15. Write a mini data story:
    - one headline,
    - 2–3 evidence bullets,
    - one caveat.
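The two story charts can share one figure with side-by-side panels. The sketch below uses matplotlib and a made-up panel with placeholder country names; in the lab you would filter the real panel to your chosen countries instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in Colab to see the plot inline
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values for three hypothetical countries.
panel = pd.DataFrame({
    "country": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
    "year": [1950, 1985, 2020] * 3,
    "lex": [60, 72, 81, 45, 60, 74, 40, 50, 58],
    "gdp_pcap": [3_000, 15_000, 40_000, 900, 3_000, 16_000, 700, 900, 1_500],
})

countries = ["A", "B", "C"]
sub = panel[panel["country"].isin(countries)]

fig, (ax_lex, ax_gdp) = plt.subplots(1, 2, figsize=(10, 4))
for country, grp in sub.groupby("country"):
    grp = grp.sort_values("year")  # lines must be drawn in time order
    ax_lex.plot(grp["year"], grp["lex"], label=country)
    ax_gdp.plot(grp["year"], grp["gdp_pcap"], label=country)

ax_lex.set_title("Life expectancy over time")
ax_lex.set_ylabel("Years")
ax_gdp.set_title("GDP per capita over time")
ax_gdp.set_yscale("log")  # equal vertical distances mean equal growth rates
ax_gdp.legend(title="Country")
fig.tight_layout()
```

Keeping the selection to a handful of countries lets each line stay readable without a crowded legend.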

Useful Python patterns

Python patterns you will reuse

Useful snippets from today’s notebook:

# Find country names
country_names = sorted(panel["country"].dropna().unique())

# Search for country names
panel[panel["country"].str.contains("United", case=False, na=False)]["country"].drop_duplicates()

# Select countries
countries = ["Japan", "Thailand", "United States of America"]
sub = panel[panel["country"].isin(countries)].copy()

# Filter years
panel_1800_2025 = panel[(panel["year"] >= 1800) & (panel["year"] <= 2025)].copy()