Lesson 2 — Data Literacy & Exploratory Data Analysis (EDA)

1.0 Why this matters (motivation)

Most mistakes in analytics happen before modeling:

EDA is your “first conversation” with the data.


2.0 The EDA mindset: what questions are we asking?

2.1 A simple EDA checklist (what we do every time)

  1. Preview: rows, columns, data types

  2. Validate: missing values, duplicates, impossible values

  3. Summarize: distributions (center + spread)

  4. Compare: by group (category, time, segment)

  5. Visualize: choose the right chart for the question

  6. Write: 3–5 plain-language insights + 1 caveat

This EDA checklist forms part of a reproducible analytical workflow that we will use repeatedly throughout the course.
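
A minimal sketch of checklist steps 1 to 4 in pandas, which is one common choice in Colab. The DataFrame and its column names (`order_id`, `category`, `price`) are hypothetical and exist only to make each step concrete; steps 5 and 6 (visualize and write) are left out here.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4, 5],
    "category": ["A", "B", "B", "A", "B", "A"],
    "price": [10.0, 25.0, 25.0, None, -5.0, 12.0],
})

# 1. Preview: rows, columns, data types
print(df.shape)
print(df.dtypes)
print(df.head())

# 2. Validate: missing values, duplicates, impossible values
missing_per_column = df.isna().sum()
n_duplicates = int(df.duplicated().sum())        # fully repeated rows
n_negative_prices = int((df["price"] < 0).sum()) # a price below zero is impossible here

# 3. Summarize: center and spread of a numeric column
price_summary = df["price"].describe()

# 4. Compare: the same kind of summary, split by group
median_by_category = df.groupby("category")["price"].median()

print(missing_per_column, n_duplicates, n_negative_prices, sep="\n")
print(price_summary)
print(median_by_category)
```

In this toy data the validation step flags one missing price, one duplicated row, and one negative price. Each of these would need a decision (drop, fix, or keep and note as a caveat) before any summary is trusted.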


3.0 Data literacy essentials (types and pitfalls)

3.1 Data types that matter

3.2 Missing values: what they might mean

Missingness can be:


4.0 Descriptive statistics that students actually use

4.1 Core summary numbers
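
As an illustration, the sketch below computes the four summary numbers that the review questions refer to (mean, SD, median, IQR) using Python's standard `statistics` module. The values are made up, and the last one is a deliberate outlier so that the effect on each number is visible.

```python
import statistics

# Hypothetical values; 90 is a deliberate outlier.
values = [12, 15, 14, 13, 16, 15, 14, 90]

mean = statistics.mean(values)      # 23.625, pulled upward by the outlier
median = statistics.median(values)  # 14.5, barely affected by the outlier
sd = statistics.stdev(values)       # sample standard deviation

# quantiles(n=4) returns the three quartile cut points (Q1, Q2, Q3).
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1                       # 1.5, the spread of the middle half

print(f"mean={mean}, sd={sd:.2f}")
print(f"median={median}, IQR={iqr}")
```

The mean and SD move a lot because of one extreme value, while the median and IQR stay close to where most of the data sits.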

4.2 What to report (rule of thumb)


5.0 Visualization as a decision tool (not decoration)

5.1 Quick chart chooser (practical)
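
One possible way to make the chooser concrete is to pair each kind of question with a chart type that appears elsewhere in this lesson: trend with a line chart, distribution with a histogram, group comparison with a bar chart. The sketch below assumes matplotlib and uses invented numbers; it is not the course's own chooser.

```python
import matplotlib.pyplot as plt

# Hypothetical numbers for illustration only.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
monthly_sales = [100, 110, 105, 120, 135, 150]
order_values = [12, 15, 14, 13, 16, 15, 14, 90, 18, 17, 13, 15]
sales_by_category = {"A": 320, "B": 250, "C": 150}

fig, (ax_trend, ax_dist, ax_group) = plt.subplots(1, 3, figsize=(12, 3.5))

# Question: how does a value change over time? -> line chart
ax_trend.plot(months, monthly_sales, marker="o")
ax_trend.set_title("Trend: line chart")

# Question: how is one numeric variable spread out? -> histogram
ax_dist.hist(order_values, bins=10)
ax_dist.set_title("Distribution: histogram")

# Question: how do groups compare? -> bar chart
ax_group.bar(list(sales_by_category.keys()), list(sales_by_category.values()))
ax_group.set_title("Group comparison: bar chart")

fig.tight_layout()
# In Colab or Jupyter the figure renders when the cell finishes;
# in a plain script, call plt.show() here.
```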


6.0 Mini case: “Sales and marketing” dataset (EDA story)

Question: “Which product categories grew, and is growth associated with marketing spend?”

EDA steps:

  1. Check data types (date as datetime, spend as numeric)

  2. Summarize missing values (overall + by category)

  3. Plot distribution of sales (is it skewed?)

  4. Plot sales over time (overall + by category)

  5. Scatter: sales vs marketing (colored by category)

  6. Write 3 insights + 1 caveat
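
The steps above can be sketched in pandas as follows. The data, the column names (`date`, `category`, `sales`, `marketing`), and the `"n/a"` entry are all hypothetical, not the course dataset. The scatter plot from step 5 is replaced by a correlation coefficient as a rough numeric companion, and step 6 (writing) is left to the analyst.

```python
import pandas as pd

# Hypothetical data; names and values are assumptions, not the course dataset.
raw = pd.DataFrame({
    "date": ["2024-01-15", "2024-01-20", "2024-02-10",
             "2024-02-18", "2024-03-05", "2024-03-22"],
    "category": ["A", "B", "A", "B", "A", "B"],
    "sales": [100.0, 80.0, 120.0, 78.0, 150.0, 75.0],
    "marketing": ["10", "8", "14", "n/a", "20", "7"],  # stored as text
})

# 1. Check data types: parse dates, coerce spend to numeric
df = raw.copy()
df["date"] = pd.to_datetime(df["date"])
df["marketing"] = pd.to_numeric(df["marketing"], errors="coerce")  # "n/a" becomes NaN

# 2. Missing values: overall and by category
missing_overall = df.isna().sum()
missing_by_category = df["marketing"].isna().groupby(df["category"]).sum()

# 3. Distribution of sales: positive skew suggests a long right tail
sales_skew = df["sales"].skew()

# 4. Sales over time by category (monthly totals), then first-to-last growth
monthly = (
    df.assign(month=df["date"].dt.to_period("M"))
      .pivot_table(index="month", columns="category", values="sales", aggfunc="sum")
)
growth = monthly.iloc[-1] / monthly.iloc[0] - 1

# 5. Association between sales and marketing spend (rows with NaN are skipped)
corr = df["sales"].corr(df["marketing"])

print(df.dtypes, missing_overall, missing_by_category, sep="\n\n")
print(monthly)
print(growth)
print(f"skew={sales_skew:.2f}, corr={corr:.2f}")
```

In this invented data, category A grows while category B shrinks, and sales and marketing spend are strongly correlated. That supports "associated with" in the question, not "caused by", which is the kind of point the caveat in step 6 is meant to capture.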


7.0 Mini-lab (Google Colab)

In-class tasks (checkpoints)

  1. Print data types and basic shape

  2. Create a missingness summary table

  3. Produce: (i) histogram/boxplot, (ii) line chart, (iii) bar chart by category

  4. Submit 3 insights + 1 caveat in markdown inside the notebook
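
For checkpoints 1 and 2, a starting point might look like the sketch below. The small DataFrame is a hypothetical stand-in for the lab dataset, and the column names (`region`, `units`, `revenue`) are invented; in the notebook you would load the provided data instead.

```python
import pandas as pd

# Hypothetical stand-in for the lab dataset; in the notebook, load the
# provided file instead (for example with pd.read_csv).
df = pd.DataFrame({
    "region": ["North", "South", None, "East", "West"],
    "units": [5, None, 7, None, 3],
    "revenue": [50.0, 42.0, 70.0, 38.0, 30.0],
})

# Checkpoint 1: basic shape and data types
print(df.shape)
print(df.dtypes)

# Checkpoint 2: missingness summary table, one row per column
missing_table = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "n_missing": df.isna().sum(),
    "pct_missing": (df.isna().mean() * 100).round(1),
}).sort_values("pct_missing", ascending=False)

print(missing_table)
```

Sorting by `pct_missing` puts the most incomplete columns first, which makes it easier to decide where a caveat is needed.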

Submission


8.0 AI check (responsible use for EDA)

Good prompt examples

Bad prompt example

AI can help generate EDA ideas and code scaffolding, but interpretation and verification remain human responsibilities.


9.0 Review questions (quiz / reflection)

  1. When would you report median and IQR instead of mean and SD?

  2. Give one example of systematic missingness in a business dataset.

  3. Which chart would you choose to show: (i) trend, (ii) distribution, (iii) group comparison—and why?

Please submit your reflection prompts for Lesson 1 and Lesson 2 before the next class: https://docs.google.com/forms/d/e/1FAIpQLSfsq2ln4ru9G1Mtt8Huezzl52RbNw0SFnXyeMj9g2wuudt8UQ/viewform?usp=publish-editor

In the next lesson, we will begin organizing prompts, reflections, and workflows into a lightweight “Research OS” for managing analytical projects.