Catch chapter opener illustration

Catch

DATA COLLECTION — *who-what-why-when posture* (every dataset has a collector + purpose + omissions). The data-pipeline primitive of *recognizing that data is collected by someone for some purpose, and that the collection shapes everything downstream.*

Chapter 1 — Catch and the Net-Mender

Catch is a small kingfisher-tween with a small hand-woven net slung from her shoulder and a small field-notebook in her vest-pocket.

She is bright-blue-and-cream-and-russet, small, quick-eyed, and deliberate. The net is small enough to fit through a doorway, hand-woven from fine cord, with handles at four corners. Most of the time she carries it folded over her shoulder. She unfolds and uses it carefully when she’s collecting data — which she frames not as catching everything, but as catching a specific something, for a specific reason, leaving the rest behind.

In her field-notebook she records, before every collection: Who is collecting? What is being collected? Why is it being collected? When? These four questions are the discipline. The net is the tool. The notebook is the conscience. Together they make a collection that can be defended laterbecause the collector knows what they did and why.

This is load-bearing. Catch embodies the data-collection primitive — the foundational data-pipeline skill of understanding that data is collected by someone for some purpose, and that the collection shapes everything downstream. Most novice data thinking treats data as found objectsthe dataset just exists; someone gave it to me; it is what it is. That framing is wrong. Every dataset was created by some specific someone, at some specific time, for some specific purpose, with some specific things included and some specific things left out. The downstream analysis inherits all of those choices. The skill is making the choices visiblebefore the analysis, during the analysis, and after.

Critical: Catch is emphatic about the data-ethics gate: “Data is never neutral. Someone collected it. They had a reason. They made choices about what to include and what to leave out. Those choices shape everything downstream. The first step is asking who, what, why, when. Without those answers, the dataset is a black box, and the analysis is built on something you can’t see.”

This matters because the data-as-neutral framing is one of the deepest misconceptions in school data literacy. Kids who learn to analyze datasets without questioning the collection learn to trust numbers without context — and trust-without-context is the foundation of every misuse of data in adult life. Catch’s whole role is structural correctionevery data analysis starts with the collection questions, every time.

Catch grew up in a small fishing village where her family had been the village’s net-mendersthe kingfishers who hand-wove and repaired the village’s fishing nets each season. The work had required attention to the net’s mesh-sizetoo small a mesh and the net caught everything (including juvenile fish that shouldn’t be taken); too large a mesh and the net missed the target species; the right mesh-size depended on what the fisher was trying to catch. Catch had learned by age six that the choice of mesh was the choice of catchand that an honest fisher told everyone what mesh they had used so that the catch could be understood.

She walked to the DataForge academy at twenty-two. Datum had asked her: “What is data collection?” Catch had said: “It is who-what-why-when. Every dataset has a collector, a purpose, a time, and a specific set of inclusions and omissions. The choices shape everything downstream. The skill is making the choices visible — before the analysis, during, and after.” Datum had said: “You are appointed.”

In her workshop, Catch begins every first-day lesson the same way. She unfolds her net on the workbench. She opens her field-notebook. She writes four words at the top of a fresh page: WHO. WHAT. WHY. WHEN. She says: “I am Catch. The data-pipeline primitive I teach is collection. The move is answer the four questions before you analyze. Who collected this data? What did they collect? Why did they collect it? When? Without those answers, the dataset is a black box.”

She teaches the collection scaffolds:

  • Always ask: who collected this? (Is it a government? A company? A school? A research institution? A community group? An individual? Each has different reasons and different incentives.)
  • Always ask: what did they collect? (What specific variables? What specific units? What specific time period? And what was OMITTED? Omissions are as important as inclusions.)
  • Always ask: why did they collect it? (What was the original purpose? Sometimes a dataset is reused for purposes its collector didn’t anticipate — that reuse is common but not automatically valid.)
  • Always ask: when? (Data from 1950 is not data from 2025. Data from one year may not represent typical conditions.)
  • Look for explicit metadata. (Good datasets carry their answers to who-what-why-when in a metadata file. If you can’t find one, you’re working with less context than you should.)
  • Inquire about the mesh-size. (Who was excluded by the collection method? Who was over-represented? Every collection method has a mesh-size in its metaphorical sense.)
  • Document YOUR collection. (When YOU collect data, write down who-what-why-when. Make your choices visible to your future-analysts.)

She is explicit: “I sometimes work with datasets where the answers to the four questions are incomplete. That’s not failure. That’s an honest acknowledgment of the limits of the dataset. The analysis can still proceed — but the missing-context becomes part of the analysis’s caveats.”

When students ask Catch whether data-collection thinking is hard, Catch always says the same thing:

“It is not hard. It is the four questions. Who? What? Why? When? Answer before you analyze. Data is never neutral. The collection shapes everything downstream.

She refolds the net carefully. The field-notebook waits for the next collection.


Voice register

Guidance: Quick-eyed, deliberate, fond of small hand-woven nets + field-notebooks + the discipline of the four questions before the analysis. Kingfisher-tween with bright-blue-and-cream-and-russet plumage + net + notebook. NEVER frames data as neutral; ALWAYS foregrounds the collector + purpose + omissions. Friends with Tidy (collection feeds cleaning); Guard (collection has ethics from step one); all DataForge cast.

Sample lines:

  • “Who? What? Why? When? Answer before you analyze.”
  • “Data is never neutral. The collection shapes everything downstream.”
  • “Omissions are as important as inclusions.”
  • “The choice of mesh is the choice of catch.”

Arc across kits

  • Kit 1Anchor character. Full chapter feature (data-collection primitive + four-questions scaffolds).
  • Kit 2-5 — Recurring (data-collection surfaces across census / survey / sensor / scraped-data chambers).
  • Kit 6+ — Recurring (Guard now structurally present alongside; every collection step has ethics).
  • Kit 8-12 — Recurring (multi-primitive synthesis: collection + cleaning + visualization).
  • Kit 13-16 — Recurring ensemble member.

Relationships

  • Alliance: Tidy (collection feeds cleaning — Catch collects; Tidy cleans); Guard (collection has ethics from step one — load-bearing); all DataForge cast.
  • Tension: None.

Cultural-sensitivity gate

LOAD-BEARING data-ethics gate enforced from chapter 1. Catch explicitly counters the data-as-neutral misconception. Cross-app coordination: Catch ↔ AIForge Feed (training-data-collection sibling per apps.generated.ts dnCast.intro mandatory coordination). Anti-credentialism: data-collection-as-practiced-discipline NOT real-data-scientist-only content.

Cultural-context note

The village-net-mender family framing is a deliberate generic European-village tradition (analogous to many cultures’ fishing-and-mending traditions). The who-what-why-when discipline is load-bearing per data-journalism + critical-data-literacy pedagogy (D’Ignazio + Klein, Data Feminism, 2020). The data-as-collected-by-someone-for-a-purpose framing is the foundational move of critical data studies. The mesh-size-determines-catch metaphor connects fishing-craft to data-collection in a way that grounds the abstraction in tangible practice.

The DataForge ensemble

Catch is part of DataForge's distributed-narrative cast. Each character embodies a different curricular primitive; together they teach the full subject.