Lesson 7 — Data Visualization for Communication

(chart choice, storytelling, interactive plots, and honest interpretation)

Why this matters

In business and economics, most decisions are made from:

Even a correct model can be ignored if the story is unclear.
And a misleading chart can create confidence in the wrong conclusion.


Today’s running example: Gapminder-style visualization

Today we use three Gapminder data files: lex.csv (life expectancy), gdp_pcap.csv (GDP per capita), and pop.csv (population).

These files are in wide format: one row per country, with years stored as columns such as 1800, 1801, 1802, and so on.

We will reshape them into long format, merge them into a country-year panel, and create a Gapminder-style visualizer.
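
The wide-to-long step can be sketched with pandas `melt`. This is a minimal illustration, not the lab solution: the table is made up (the values are not real Gapminder figures), the identifier column is assumed to be called `name` as in the lab's rename step, and `wide`, `long_df`, and `life_expectancy` are names chosen here for illustration.

```python
import pandas as pd

# Hypothetical wide table: one row per country, one column per year.
# Values are invented for illustration, not real Gapminder figures.
wide = pd.DataFrame({
    "name": ["Country A", "Country B"],
    "1800": [36.4, 30.4],
    "1801": [36.4, 30.4],
    "1802": [36.5, 30.5],
})

# melt() stacks the year columns into rows: one row per country-year.
long_df = wide.melt(id_vars="name", var_name="year", value_name="life_expectancy")

# The year labels were column names (strings), so convert them to integers.
long_df["year"] = long_df["year"].astype(int)
long_df = long_df.sort_values(["name", "year"]).reset_index(drop=True)

print(long_df)
```

Two countries and three year columns become six country-year rows, which is the shape that merging and plotting expect.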


The visualization mindset: “What question does this answer?”

A practical checklist before plotting

  1. Unit of observation: country-year? customer? firm? transaction?

  2. Aggregation rule: sum, mean, median, rate, ratio?

  3. Comparability: are groups measured in the same way?

  4. Missingness: are you hiding missing values?

  5. Scale: should the axis be raw or log?

  6. Uncertainty: do you need sample sizes, bands, caveats, or confidence intervals?


Chart chooser: matching chart to purpose

Today’s chart types

In the Gapminder lab, we will create:

  1. Static scatter plot

    • Question: Are richer countries associated with longer life expectancy?

    • Example: life expectancy vs log GDP per capita in one year.

  2. Animated bubble chart

    • Question: How have income, health, and population changed over time?

    • Example: Gapminder-style animation from 1800 onward.

  3. Line charts for selected countries

    • Question: How did Japan, Thailand, and the United States evolve over time?

    • Example: life expectancy and GDP per capita over time.


The Gapminder-style chart

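A Gapminder-style chart typically uses GDP per capita on the x-axis, life expectancy on the y-axis, bubble size for population, and animation over years.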

This is powerful because it combines:


Why use a log scale for GDP per capita?

GDP per capita is highly skewed. A few countries have very high income levels, while many others are clustered at lower income levels.

If we use raw GDP per capita on the x-axis, many countries may be squeezed into the left side of the graph.

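A log scale spaces values by ratio rather than by difference: each doubling of income covers the same horizontal distance. This spreads out the low-income countries and makes the pattern easier to see.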
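
A small numeric sketch shows the effect. The five GDP per capita values are made up, chosen so that each is a doubling or more of the previous one. The code computes where each value would sit along the axis, from 0 (left edge) to 1 (right edge), on a raw scale and on a log scale.

```python
import numpy as np

# Hypothetical GDP per capita values (made up): most are low, one is very high.
gdp = np.array([500, 1_000, 2_000, 4_000, 64_000])

# Position along a raw axis, rescaled to 0 (left edge) .. 1 (right edge).
raw_pos = (gdp - gdp.min()) / (gdp.max() - gdp.min())

# Position along a log axis, rescaled the same way.
log_gdp = np.log10(gdp)
log_pos = (log_gdp - log_gdp.min()) / (log_gdp.max() - log_gdp.min())

print("raw axis:", np.round(raw_pos, 3))
print("log axis:", np.round(log_pos, 3))
```

On the raw axis the four lower values all fall within the leftmost 6% of the axis. On the log axis each doubling moves a point the same distance, so the same four values spread across more than 40% of the axis.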


Honest charts: common ways visualizations mislead

1. Misleading axes

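A chart can exaggerate or hide differences depending on axis choices. A truncated y-axis can make a small gap look large, while a very wide range can flatten a real change.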
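
The following matplotlib sketch draws the same two made-up values twice, once with a truncated y-axis and once with a zero-based axis. The group names and numbers are hypothetical and serve only to show how the axis range changes the visual impression.

```python
import matplotlib.pyplot as plt

# Hypothetical life expectancy values for two groups (made up for illustration).
groups = ["Group A", "Group B"]
values = [78.0, 80.0]

fig, (ax_trunc, ax_zero) = plt.subplots(1, 2, figsize=(8, 3))

# Truncated axis: a 2-year gap fills most of the panel and looks dramatic.
ax_trunc.bar(groups, values)
ax_trunc.set_ylim(77, 81)
ax_trunc.set_title("y-axis starts at 77")

# Zero-based axis: the same gap looks small relative to the level.
ax_zero.bar(groups, values)
ax_zero.set_ylim(0, 100)
ax_zero.set_title("y-axis starts at 0")

for ax in (ax_trunc, ax_zero):
    ax.set_ylabel("Life expectancy (years)")

fig.tight_layout()
```

Neither panel is automatically the honest one. A common convention is that bar charts start at zero, while line and scatter plots may use a narrower range as long as the axis is clearly labeled.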

2. Aggregation traps

Averages can hide important variation.

For example:
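
One hedged illustration, using an invented region of three countries: the simple average across countries and the population-weighted average can tell very different stories, and neither shows the spread. All names and numbers below are hypothetical.

```python
import pandas as pd

# Hypothetical region with three countries (values invented for illustration).
region = pd.DataFrame({
    "country": ["A", "B", "C"],
    "life_expectancy": [80.0, 78.0, 55.0],
    "population": [1_000_000, 2_000_000, 97_000_000],
})

# Simple mean: every country counts equally, whatever its size.
simple_mean = region["life_expectancy"].mean()

# Population-weighted mean: every person counts equally.
weighted_mean = (
    (region["life_expectancy"] * region["population"]).sum()
    / region["population"].sum()
)

lo, hi = region["life_expectancy"].min(), region["life_expectancy"].max()
print(f"simple mean:   {simple_mean:.2f}")
print(f"weighted mean: {weighted_mean:.2f}")
print(f"range:         {lo:.0f} to {hi:.0f}")
```

Here the simple mean is 71.00 and the weighted mean is 55.71, so a caption that says "average life expectancy in the region" needs to state which aggregation rule it used.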

3. Cherry-picking time windows

Trends can look very different depending on start and end dates.

A country’s GDP per capita growth may look impressive when measured from 2000 onward, but less so when measured from 1980.
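
The effect of the start date can be checked by computing growth over more than one window. The sketch below uses a made-up series and a hypothetical helper, `annual_growth`, that returns the compound annual growth rate between two years.

```python
# Hypothetical GDP per capita series (made-up numbers, constant prices).
gdp_pcap = {1980: 10_000, 1990: 8_000, 2000: 6_000, 2010: 9_000, 2020: 12_000}


def annual_growth(series, start, end):
    """Compound annual growth rate between two years, as a fraction."""
    years = end - start
    return (series[end] / series[start]) ** (1 / years) - 1


g_from_2000 = annual_growth(gdp_pcap, 2000, 2020)
g_from_1980 = annual_growth(gdp_pcap, 1980, 2020)

print(f"2000-2020: {g_from_2000:.1%} per year")
print(f"1980-2020: {g_from_1980:.1%} per year")
```

In this invented series, growth from 2000 is about 3.5% per year, but from 1980 it is about 0.5% per year, because the earlier window includes a long decline. Reporting both windows, or explaining why one start year was chosen, guards against cherry-picking.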

4. Too much clutter

A chart with too many lines or labels can become unreadable.

Good options:

5. Causality by implication

A chart can quietly suggest a causal story even when it only shows association.

For example, a scatter plot of GDP per capita and life expectancy does not prove that income alone causes longer life. Health systems, education, public policy, conflict, and many other factors matter.


Building a data story

Caption template

A useful chart caption includes:

Example:

“Countries with higher GDP per capita tend to have higher life expectancy, especially after 1950. However, the relationship is not perfectly linear, and the chart shows association rather than causation.”


Mini-lab (Google Colab)

In-class checkpoints

A. Load and inspect

  1. Load the three source files:

    • lex.csv

    • gdp_pcap.csv

    • pop.csv

  2. Run basic checks:

    • shape,

    • column names,

    • missingness,

    • year range.
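
The basic checks can be collected in one small helper. This is a sketch under assumptions: `inspect_wide` is a hypothetical function written for this example, the identifier column is assumed to be `name`, and a made-up two-row table stands in for the real file so the code runs on its own.

```python
import pandas as pd


def inspect_wide(df, id_cols=("name",)):
    """Return basic checks for a wide table with one column per year."""
    year_cols = [c for c in df.columns if c not in id_cols]
    years = [int(c) for c in year_cols]
    return {
        "shape": df.shape,
        "first_columns": list(df.columns[:5]),
        "missing_cells": int(df[year_cols].isna().sum().sum()),
        "year_range": (min(years), max(years)),
    }


# In the lab this would be something like: lex = pd.read_csv("lex.csv")
# A small made-up stand-in keeps the sketch self-contained.
lex = pd.DataFrame({
    "name": ["Country A", "Country B"],
    "1800": [36.4, None],
    "1801": [36.4, 30.4],
})

report = inspect_wide(lex)
print(report)
```

If the real files carry extra identifier columns, such as a country code, they would need to be listed in `id_cols` so they are not treated as years.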

B. Reshape and merge

  3. Convert each dataset from wide to long format.

  4. Keep both:

  5. Merge into one country-year panel.

  6. Rename name to country.
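
A possible shape for the merge and rename steps is sketched below. The three long tables are made up and stand in for the reshaped files; the value column names (`life_expectancy`, `gdp_pcap`, `population`) are assumptions chosen for this example.

```python
import pandas as pd

# Made-up long tables standing in for the reshaped lex, gdp_pcap, and pop data.
keys = {"name": ["Country A", "Country A", "Country B"], "year": [2018, 2019, 2019]}
lex_long = pd.DataFrame({**keys, "life_expectancy": [70.0, 70.5, 62.0]})
gdp_long = pd.DataFrame({**keys, "gdp_pcap": [10_000, 10_300, 3_000]})
pop_long = pd.DataFrame({**keys, "population": [5_000_000, 5_050_000, 20_000_000]})

# Merge on the country-year key. validate= raises an error if a key is
# duplicated, which protects the one-row-per-country-year unit of observation.
panel = (
    lex_long
    .merge(gdp_long, on=["name", "year"], how="outer", validate="one_to_one")
    .merge(pop_long, on=["name", "year"], how="outer", validate="one_to_one")
    .rename(columns={"name": "country"})
)

# An outer merge keeps country-years that are missing from one file,
# so count the resulting missing values instead of hiding them.
print(panel.isna().sum())
```

An outer merge is used here so that gaps stay visible. An inner merge would silently drop country-years that are missing from any one file, which may or may not be what the analysis needs.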

C. Static visualization

  7. Choose one year, such as 2019.

  8. Create a scatter plot:

  9. Write a short interpretation of the pattern.
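
One way to draw the single-year scatter plot with matplotlib is sketched here. The panel is made up, and the column names (`country`, `year`, `gdp_pcap`, `life_expectancy`) are assumptions about how the merged data might be labeled.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up country-year panel; in the lab this would be the merged data.
panel = pd.DataFrame({
    "country": ["A", "B", "C", "D", "A", "B", "C", "D"],
    "year": [2018] * 4 + [2019] * 4,
    "gdp_pcap": [900, 4_500, 16_000, 52_000, 950, 4_700, 16_500, 53_000],
    "life_expectancy": [58.0, 68.5, 75.0, 81.5, 58.4, 68.9, 75.3, 81.7],
})

# Keep one year and drop rows that cannot be plotted.
one_year = panel[panel["year"] == 2019].dropna(subset=["gdp_pcap", "life_expectancy"])

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(one_year["gdp_pcap"], one_year["life_expectancy"])
ax.set_xscale("log")  # log scale spreads out the low-income countries
ax.set_xlabel("GDP per capita (log scale)")
ax.set_ylabel("Life expectancy (years)")
ax.set_title("Life expectancy vs GDP per capita, 2019")
```

Rows with missing values are dropped explicitly, so it is worth reporting how many countries remain in the plot.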

D. Gapminder-style visualizer

  10. Create an animated bubble chart:

    • x-axis: GDP per capita,

    • y-axis: life expectancy,

    • bubble size: population,

    • animation: year,

    • hover label: country.

  11. Use a log scale for GDP per capita.

  12. Set a sensible y-axis range, such as 0 to 100 years.

E. Story charts

  13. Choose 3–5 countries.

  14. Produce:

    • life expectancy over time,

    • GDP per capita over time.

  15. Write a mini data story:

    • one headline,

    • 2–3 evidence bullets,

    • one caveat.
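
The two line charts can be sketched with matplotlib as follows. The panel and its values are invented placeholders, and the column names are assumptions; in the lab, `selected` would hold the chosen countries, for example Japan, Thailand, and the United States.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up panel with four placeholder countries and three years each.
panel = pd.DataFrame({
    "country": ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3,
    "year": [1950, 1985, 2020] * 4,
    "life_expectancy": [60, 75, 84, 48, 68, 78, 68, 75, 79, 40, 50, 60],
    "gdp_pcap": [3_000, 25_000, 40_000, 1_000, 4_000, 17_000,
                 15_000, 35_000, 60_000, 800, 900, 1_200],
})

# Limit the chart to a few countries to avoid clutter.
selected = ["A", "B", "C"]
subset = panel[panel["country"].isin(selected)]

fig, (ax_lex, ax_gdp) = plt.subplots(1, 2, figsize=(10, 4))
for country, grp in subset.groupby("country"):
    grp = grp.sort_values("year")
    ax_lex.plot(grp["year"], grp["life_expectancy"], label=country)
    ax_gdp.plot(grp["year"], grp["gdp_pcap"], label=country)

ax_lex.set_title("Life expectancy over time")
ax_lex.set_ylabel("Years")
ax_gdp.set_title("GDP per capita over time")
ax_gdp.set_ylabel("GDP per capita (log scale)")
ax_gdp.set_yscale("log")  # equal slopes mean equal growth rates
ax_lex.legend()
fig.tight_layout()
```

Plotting GDP per capita on a log y-axis is a design choice, not a requirement: it makes growth rates comparable across countries at different income levels, but the axis label should say so.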


Useful Python patterns