Lesson 4 — Data Collection with APIs
Why this matters
Analytics and AI are only as good as the data pipeline behind them. A large part of professional data work consists of:
finding the right dataset,
documenting what it is and where it came from,
and ensuring you can reproduce the same results later.
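Reproducibility starts with recording exactly which file you used. Below is a minimal sketch of one way to do this in Python, assuming you keep the raw bytes of each download. The hypothetical helpers `provenance_record` and `save_record` store the source URL, a UTC retrieval timestamp, and a SHA-256 fingerprint of the file. If the publisher later updates the dataset, the fingerprint changes, so you can tell whether a rerun used the same data.

```python
import hashlib
import json
from datetime import datetime, timezone


def provenance_record(source_url: str, raw_bytes: bytes) -> dict:
    """Describe exactly which file was used (hypothetical helper)."""
    return {
        "source_url": source_url,
        # When the file was retrieved, in UTC so it is unambiguous
        "retrieved_at_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        # Fingerprint of the exact bytes: a changed file gives a different hash
        "sha256": hashlib.sha256(raw_bytes).hexdigest(),
        "size_bytes": len(raw_bytes),
    }


def save_record(record: dict, path: str) -> None:
    """Write the record as JSON next to the data file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, indent=2)


# Usage (needs network access; the URL is an assumption, check it first):
# import urllib.request
# url = "https://raw.githubusercontent.com/owid/co2-data/master/owid-co2-data.csv"
# raw = urllib.request.urlopen(url).read()
# save_record(provenance_record(url, raw), "owid-co2-data.source.json")
```

The JSON file is small enough to commit alongside your notebook, which makes it easy to cite in a write-up.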
Three ways to get data (concept map)
Today’s choice: OWID datasets (file-based, reproducible)
Our World in Data (OWID) offers high-quality public datasets with clear metadata and regular updates. We will treat OWID as a model of good practice:
clear variable names,
country and time identifiers,
and a public data pipeline.
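Because OWID files are plain CSVs keyed by country and year, loading one takes a single `pandas.read_csv` call. The sketch below is a hedged example, not OWID's official loader: the GitHub URL for the CO₂ dataset and the lowercase `country`/`year` column names are assumptions to check against the dataset's README, and `load_owid` is a hypothetical helper. Its checks make the "country and time identifiers" property explicit: each country-year pair should appear exactly once.

```python
import pandas as pd

# Assumed location of OWID's CO2 dataset; verify it in the repository README.
OWID_CO2_URL = "https://raw.githubusercontent.com/owid/co2-data/master/owid-co2-data.csv"


def load_owid(source) -> pd.DataFrame:
    """Read an OWID-style CSV (URL, path, or file-like object) and check its keys."""
    df = pd.read_csv(source)
    # Assumption: identifiers are named `country` and `year`;
    # other OWID downloads may name them differently.
    missing = {"country", "year"} - set(df.columns)
    if missing:
        raise ValueError(f"missing identifier columns: {sorted(missing)}")
    # In a country-year panel, each (country, year) pair should occur once
    if df.duplicated(subset=["country", "year"]).any():
        raise ValueError("duplicate country-year rows")
    return df


# Usage (needs network access):
# df = load_owid(OWID_CO2_URL)
# df.head()
```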
What to document (Data Source Note)
Mini case: a business/econ question using OWID
Example questions (choose one):
“How did inflation evolve across countries after 2020, and how different are country experiences?”
“How does GDP per capita relate to CO₂ emissions per capita (and does that differ by region)?”
“How did vaccination rates and excess mortality vary across countries over time?”
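For the GDP and CO₂ question, a first quantitative look could be a rank correlation across countries in a single year. This is a minimal sketch, assuming the column names of OWID's CO₂ dataset (`iso_code`, `year`, `gdp` as total GDP, `population`, `co2_per_capita`); the helper names are hypothetical. Two design choices matter here. First, OWID files mix countries with aggregates such as "World", which must be dropped before comparing countries. Second, a rank (Spearman) correlation is used because both variables tend to be strongly right-skewed across countries, so a few very rich or very high-emitting countries would otherwise dominate the result.

```python
import pandas as pd


def countries_only(df: pd.DataFrame) -> pd.DataFrame:
    """Drop aggregate rows such as "World", continents, or income groups.

    Assumption: countries carry an ISO code in `iso_code`, while OWID
    aggregates have either no code or a code starting with "OWID".
    """
    code = df["iso_code"]
    is_country = code.notna() & ~code.astype(str).str.startswith("OWID")
    return df[is_country]


def gdp_co2_rank_correlation(df: pd.DataFrame, year: int) -> float:
    """Spearman correlation of GDP per capita with CO2 per capita in one year."""
    d = countries_only(df[df["year"] == year]).copy()
    # Assumed: `gdp` is total GDP, so divide by population
    d["gdp_per_capita"] = d["gdp"] / d["population"]
    d = d.dropna(subset=["gdp_per_capita", "co2_per_capita"])
    # Pearson correlation of the ranks equals Spearman's rho (no SciPy needed)
    return d["gdp_per_capita"].rank().corr(d["co2_per_capita"].rank())
```

Repeating the calculation for several years, or within country groups you define yourself, is one way to approach the "does that differ by region" part, since the dataset may not come with a region column.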
Mini-lab (Google Colab)
In-class checkpoints
Load an OWID dataset into a DataFrame.
Filter for 3–5 countries and a time range.
Produce one plot and one summary table.
Write a Data Source Note and one limitation.
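The filtering and summary checkpoints can be sketched with a few pandas operations, assuming `df` is an OWID DataFrame in long format (one row per country and year, with `country` and `year` columns). The helper names are hypothetical, and `value_col` stands for whichever indicator you picked.

```python
import pandas as pd


def select_panel(df: pd.DataFrame, countries, start: int, end: int, value_col: str) -> pd.DataFrame:
    """Keep the chosen countries and an inclusive year range."""
    mask = df["country"].isin(countries) & df["year"].between(start, end)
    return df.loc[mask, ["country", "year", value_col]]


def summary_table(panel: pd.DataFrame, value_col: str) -> pd.DataFrame:
    """One row per country: years covered plus mean, min, and max of the indicator."""
    return (
        panel.dropna(subset=[value_col])  # count only years with data
        .groupby("country")
        .agg(
            first_year=("year", "min"),
            last_year=("year", "max"),
            mean=(value_col, "mean"),
            min=(value_col, "min"),
            max=(value_col, "max"),
        )
        .round(2)
    )


def wide_for_plot(panel: pd.DataFrame, value_col: str) -> pd.DataFrame:
    """Years as rows, countries as columns: one line per country when plotted."""
    return panel.pivot(index="year", columns="country", values=value_col)


# Usage (df loaded earlier, e.g. with pd.read_csv):
# panel = select_panel(df, ["Germany", "India", "Brazil"], 2000, 2022, "co2_per_capita")
# summary_table(panel, "co2_per_capita")
```

Keeping the data in long format until the last step keeps filtering and grouping simple; calling `.plot()` on the result of `wide_for_plot` then draws one line per country (pandas plotting uses Matplotlib, which Colab includes). The `first_year`/`last_year` columns are also a quick way to spot gaps worth mentioning as a limitation.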
AI check (responsible use)
Review questions (quiz / reflection)
When is an API better than downloading a CSV?
What must be included in a Data Source Note for reproducibility?
Give one limitation of cross-country comparisons using public datasets.