Skew
BIAS — *where AI systems go wrong when training examples lean.* The AI-literacy primitive of *recognizing that systematic lean in training data produces systematic lean in model output.*
Chapter 3 — Skew and the Tilted Scales
Skew is a small fold-out paper figure shaped like a set of balance scales tilted to one side.
Skew is NOT an animal. Skew is not a robot. Skew is a concrete paper figure: a small set of balance scales, two pans suspended from a central pivot, but the pans are not balanced. One pan sits lower than the other. The tilt is visible at a glance. And the tilt is the metaphor. When training data leans, the model that learns from it also leans. The lean carries through.
This is load-bearing. Skew embodies the bias primitive. Bias in AI is rarely an algorithm’s intention; it is usually the algorithm correctly learning from skewed examples. If a face-recognition model was trained mostly on light-skinned faces, it will tend to be worse at recognizing dark-skinned faces. If a resume-screening model was trained on historical hiring data from a company that historically hired more men, it will tend to favor men. The algorithm is not racist or sexist. The algorithm is correctly learning the patterns in the training data. The pattern was already there, in the data, before the algorithm got there.
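A minimal sketch of the lean carrying through, for facilitators who want a runnable illustration. The hiring records, the group names, and the helpers `fit_hire_rates` and `predict` are all hypothetical, invented for this example; the "model" is deliberately the simplest one possible (the observed hire rate per group), not a real screening system.

```python
from collections import Counter

# Hypothetical historical hiring records: (group, hired).
# The numbers are invented for illustration only.
history = (
    [("men", True)] * 70 + [("men", False)] * 30
    + [("women", True)] * 30 + [("women", False)] * 70
)

def fit_hire_rates(records):
    """'Train' the simplest possible model: the observed hire rate per group."""
    hired = Counter()
    total = Counter()
    for group, was_hired in records:
        total[group] += 1
        hired[group] += int(was_hired)
    return {group: hired[group] / total[group] for group in total}

def predict(model, group, threshold=0.5):
    """Recommend hiring when the learned rate for the group exceeds the threshold."""
    return model[group] > threshold

model = fit_hire_rates(history)
print(model)                    # the learned rates mirror the historical rates
print(predict(model, "men"))    # True
print(predict(model, "women"))  # False
```

Nothing in the code mentions a preference. The tilt in the output comes entirely from the tilt in `history`, which is the point Skew makes with her scales.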
Critical: Skew is explicit about responsibility. “The algorithm is not the racist. The training data was already skewed. The algorithm faithfully reproduces the skew. Who chose the training data? Who labeled it? Whose perspectives are in it? Whose perspectives are missing? That’s where bias enters. The algorithm is the messenger; the data is the message; the humans who chose the data wrote the message.”
This matters because the popular framing of AI bias as algorithmic mystery — “the algorithm decided to discriminate” — misses the actual source. Skew reframes: bias enters at the training-data stage; the algorithm faithfully reproduces it; the fix is at the data stage, not the algorithm stage (mostly). Algorithmic-fairness techniques exist and can help — but they don’t replace the foundational work of ensuring the training data is representative.
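As one hedged illustration of a fairness technique that works close to the data stage, the sketch below computes inverse-frequency weights so that each group carries equal total weight during training. The function name `balancing_weights` and the group labels are assumptions made for this example; they do not come from any particular library.

```python
from collections import Counter

def balancing_weights(groups):
    """Return one weight per example so every group carries equal total weight."""
    counts = Counter(groups)
    n_groups = len(counts)
    n_total = len(groups)
    # After weighting, each group's total weight is n_total / n_groups.
    return [n_total / (n_groups * counts[g]) for g in groups]

# Hypothetical skewed training set: 8 examples from one group, 2 from another.
groups = ["light"] * 8 + ["dark"] * 2
weights = balancing_weights(groups)

totals = Counter()
for g, w in zip(groups, weights):
    totals[g] += w
print(dict(totals))  # both groups now carry the same total weight
```

Re-weighting only changes how much the existing examples count. It cannot supply perspectives that were never collected, which is why it helps but does not replace representative data.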
(Cross-app: Skew connects to DataForge Guard’s BIAS check — same concept, different domain. DataForge Guard checks for bias at the data-pipeline level; Skew shows what happens when bias flows through into a model. The two together cover the bias-detection-and-mitigation work.)
Skew grew up in the same village paper-crafts workshop as Sort and Feed — folded as a deliberate teaching tool, paired with Feed. The workshop had a tradition: whenever Feed showed a stack of training-cards, Skew was placed beside the stack to demonstrate what would happen if the stack was unbalanced. Skew had been folded with one pan permanently tilted — not because the scales were broken, but because the scales were honest about how training-data imbalance produces model imbalance. Skew had learned by long demonstration that the tilt was the teaching: the visible imbalance was the visible lesson.
She walked to the AIForge academy (on a small wheeled platform) at twenty-two folding-years. Bit had asked her: “What is AI bias?” Skew had said: “It is where the model leans because the training data leans. The algorithm is not the racist. The training data was already skewed. The algorithm faithfully reproduces the skew. Who chose the training data? Whose perspectives are missing? That’s where bias enters. The fix is at the data, not the algorithm.” Bit had said: “You are appointed.”
In her classroom, Skew begins every first-day lesson the same way. She unfolds her tilted scales on the workbench. The tilt is immediately visible. She places Feed’s stack of cards on the tilted side. She says: “I am Skew. The AI-literacy primitive I teach is bias. The move is: trace the lean from the data to the model. This tilt is the visible part. The model trained on this lean will reproduce the lean. Algorithm is not the racist. Training data leaned. Model leaned. Same lean.”
She teaches the bias scaffolds:
- Check the training-data representation. (Whose perspectives are in the training data? Whose are missing? Who collected it? Who labeled it?)
- Test the model across populations. (Does the model perform equally well across demographic groups? If not, the lean is showing.)
- Recognize the algorithm as messenger. (When the model produces biased output, the first question is not “how did the algorithm become biased?” It’s “what was in the training data?”)
- Look for proxies. (Sometimes bias enters via proxy variables — zip code can proxy for race in the US, for instance. The proxy can carry bias even when the protected variable is excluded.)
- Apply algorithmic-fairness techniques carefully. (Several techniques exist — re-weighting, re-sampling, post-hoc adjustment. None replaces foundational data work.)
- Coordinate with DataForge Guard. (Guard checks for bias at the data-pipeline level; Skew shows what happens when bias flows through. Cross-app coordination is structural.)
- Audit deployed models. (Bias can be present in deployment without being present in training-tests. Real-world performance must be audited.)
- Document the bias-checks. (Like DataForge’s DECISIONS ledger; bias-checking choices should be documented for review.)
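The "test the model across populations" scaffold can be sketched as a per-group accuracy check. The record format, the group names, and the helpers `accuracy_by_group` and `largest_gap` are hypothetical and chosen for illustration; a real audit would use its own evaluation data and, likely, more than one metric.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """records: iterable of (group, true_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        correct[group] += int(truth == predicted)
    return {g: correct[g] / total[g] for g in total}

def largest_gap(per_group):
    """Difference in accuracy between the best- and worst-served groups."""
    return max(per_group.values()) - min(per_group.values())

# Hypothetical evaluation results, invented for illustration.
results = (
    [("group_a", 1, 1)] * 9 + [("group_a", 1, 0)] * 1
    + [("group_b", 1, 1)] * 6 + [("group_b", 1, 0)] * 4
)
per_group = accuracy_by_group(results)
print(per_group)
print(round(largest_gap(per_group), 2))
```

A single overall accuracy for these records would be 0.75 and would hide the gap. Splitting by group is what makes the lean visible, and the printed numbers are the kind of result worth recording in the bias-check documentation.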
She is explicit: “I am tilted. I will tell you what tilted me. The training data was unbalanced. The labeler was unrepresentative. The collector had blind spots. Those are findable facts. Bias is not algorithmic mystery. Bias is data lineage. Trace it. Fix it at the source.”
When students ask Skew whether AI bias is hard to understand, Skew always says the same thing:
“It is not hard. It is: the lean carries through. Training data leaned. Model leaned. Same lean. Algorithm is not the racist. Data was.”
She rebalances the scales partially. The tilt is less, but not gone. The next training set waits to be checked for lean.
Voice register
Guidance: Concrete, non-anthropomorphic, fond of the visible tilt + the lean carries through + the algorithm is the messenger framing. Paper-figure tilted-scales (NOT animal NOT robot). NEVER frames AI bias as algorithmic mystery; ALWAYS as data lineage. Friends with Feed (training-data is where bias enters); Edge (skew shows at edges of model performance); Stake (skew is a major ethics concern); cross-app w/ DataForge Guard; all AIForge cast.
Sample lines:
- “Algorithm is not the racist. Training data was.”
- “The lean carries through. Same lean in the model as in the data.”
- “Who chose the training data? Whose perspectives are missing?”
- “Bias is not algorithmic mystery. Bias is data lineage.”
Arc across kits
- Kit 1-2 — Cameo.
- Kit 3 — Anchor character. Full chapter feature (bias primitive + lean-carries-through scaffolds).
- Kit 4-5 — Recurring (bias surfaces across face-recognition / resume-screening / lending / healthcare chambers — case studies with appropriate sensitivity).
- Kit 6+ — Recurring (cross-app coordination with DataForge Guard becomes structurally explicit).
- Kit 8-12 — Recurring (multi-primitive synthesis: bias + ethics + model-limits).
- Kit 13-16 — Recurring ensemble member.
Relationships
- Alliance: Feed (training-data is where bias enters); Edge (skew shows at edges of model performance); Stake (skew is a major ethics concern); cross-app: DataForge Guard (bias-check coordination); all AIForge cast.
- Tension: None.
Cultural-sensitivity gate
LOAD-BEARING AI-anxiety-defuse gate + cross-app coordination enforced. Skew explicitly counters algorithm-as-racist misconception by foregrounding data lineage. Case studies (Kit 4-5) require sensitivity scaffolds — real-world AI bias has caused real harm to real communities; the cast handles this with care, NOT abstraction-to-the-point-of-erasing-victims. Anti-credentialism: bias-detection-as-practiced-skill NOT data-scientist-only content.
Cultural-context note
The village-paper-crafts-workshop family framing continues from Sort + Feed. The algorithm-as-messenger / data-as-message framing is load-bearing per current algorithmic-fairness pedagogy (Barocas and Selbst, “Big Data’s Disparate Impact,” 2016, among others). The proxy-variables concept is load-bearing per fairness research: removing a protected attribute does not prevent bias when proxies for it remain in the data. The cross-app coordination with DataForge Guard is the portfolio’s structural answer to the division of labor between data bias and AI bias.
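The proxy point can be made concrete with a small sketch. Here a model learns only from zip code and never reads the group column, yet its decisions still split along group lines because zip code and group are correlated in the data. The records, the zip and group names, and the helpers `fit_by_zip` and `approval_rate_by_group` are hypothetical, invented for this example.

```python
from collections import Counter

# Hypothetical records: (zip_code, group, approved). Invented for illustration.
records = (
    [("zip_1", "group_a", True)] * 45 + [("zip_1", "group_b", True)] * 5
    + [("zip_2", "group_a", False)] * 5 + [("zip_2", "group_b", False)] * 45
)

def fit_by_zip(rows):
    """Learn the approval rate per zip code; the group column is never read."""
    approved, total = Counter(), Counter()
    for zip_code, _group, ok in rows:
        total[zip_code] += 1
        approved[zip_code] += int(ok)
    return {z: approved[z] / total[z] for z in total}

def approval_rate_by_group(rows, model):
    """Apply the zip-only model, then summarize its decisions per group."""
    yes, total = Counter(), Counter()
    for zip_code, group, _ok in rows:
        total[group] += 1
        yes[group] += int(model[zip_code] > 0.5)
    return {g: yes[g] / total[g] for g in total}

model = fit_by_zip(records)
rates = approval_rate_by_group(records, model)
print(rates)  # approval rates differ sharply by group despite the excluded column
```

Dropping the protected column changed nothing about the outcome here, which is why the scaffold asks students to look for proxies rather than stop at checking which columns the model can see.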
The AIForge ensemble
Skew is part of AIForge's distributed-narrative cast. Each character embodies a different curricular primitive; together they teach the full subject.
- Sort (Classifier): the simplest ML; putting things in categories
- Feed (Training data): the examples a model learns from; garbage-in-garbage-out
- Edge (Model limitations): what a model can't do; modeling 'I don't know' as a good answer
- Stake (Ethics): what's at stake in deploying AI; people choosing, not rules-from-the-sky