Marcus sets the brief on Monday morning. Off-the-shelf is over — can Sarah train NorthStar's own model on NorthStar's own data? She opens northstar_churn.csv: 10,000 customers, 11 features, one target. By Friday she will hold three new tools: a clean preprocessing pipeline, cross-validated training, and a threshold chosen by the business, not the math.
Real CSVs are messy. Missing cells, mixed scales, free-text categories. Click the four buttons below to walk a raw NorthStar row through the preprocessing it needs before any model sees it.
| tier | avg_spend | tenure_m | reviews | churned |
|---|---|---|---|---|
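To make the preprocessing concrete, here is a minimal plain-Python sketch of what median imputation, standard scaling and one-hot encoding do to the columns in the header above. The rows and the tier labels are invented for illustration and are not taken from northstar_churn.csv; the helper names (`clean_tier`, `fit_stats`, `transform`) are hypothetical.

```python
from statistics import mean, median, pstdev

# Hypothetical raw rows (made-up values, NOT real NorthStar data).
# None marks a missing cell; tier is a messy free-text category.
raw_rows = [
    {"tier": "gold",   "avg_spend": 120.0, "tenure_m": 24, "reviews": 5},
    {"tier": "Silver", "avg_spend": None,  "tenure_m": 6,  "reviews": 1},
    {"tier": " gold ", "avg_spend": 80.0,  "tenure_m": 12, "reviews": 3},
    {"tier": "bronze", "avg_spend": 40.0,  "tenure_m": 2,  "reviews": 0},
]
NUMERIC = ["avg_spend", "tenure_m", "reviews"]


def clean_tier(value):
    """Normalise a free-text category: trim whitespace, lower-case."""
    return value.strip().lower()


def fit_stats(rows):
    """Learn the median (for imputing), then mean and std (for scaling)."""
    stats = {}
    for col in NUMERIC:
        present = [r[col] for r in rows if r[col] is not None]
        med = median(present)
        filled = [med if r[col] is None else r[col] for r in rows]
        # Population std (ddof=0), which is also what StandardScaler uses.
        stats[col] = {"median": med, "mean": mean(filled), "std": pstdev(filled)}
    return stats


def transform(row, stats, categories):
    """Impute, scale and one-hot encode a single row."""
    out = {}
    for col in NUMERIC:
        s = stats[col]
        value = s["median"] if row[col] is None else row[col]
        out[col] = (value - s["mean"]) / s["std"]
    tier = clean_tier(row["tier"])
    for cat in categories:
        out[f"tier_{cat}"] = 1 if tier == cat else 0
    return out


stats = fit_stats(raw_rows)
categories = sorted({clean_tier(r["tier"]) for r in raw_rows})
prepared = transform(raw_rows[1], stats, categories)
print(prepared)
```

The second row starts with a missing avg_spend and a capitalised tier. After the three steps it carries the median spend (scaled to 0.0 on this toy data) and a single 1 in the tier_silver column.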
Leakage is the most common silent bug in a supervised week. A mean and standard deviation that include test rows are pretending to know the future. Two orderings, two outcomes.
Scale, then split: the scaler used test-row statistics. At deployment, future test rows do not exist yet. The evaluation has quietly cheated, so production accuracy will be lower than the notebook says.
Split, then scale: the test set is treated as the future. Better still, wrap everything in a Pipeline so this discipline is enforced automatically, including across cross-validation folds.
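# The pattern every supervised model in M3 starts with
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# numeric_cols, categorical_cols, X_train and y_train are assumed to be defined earlier
prep = ColumnTransformer([
    # numeric columns: fill gaps with the median, then standardise
    ("num", Pipeline([("imp", SimpleImputer(strategy="median")),
                      ("sc", StandardScaler())]), numeric_cols),
    # categorical columns: one-hot encode, ignoring categories unseen in training
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])
model = Pipeline([("prep", prep), ("clf", LogisticRegression())])
model.fit(X_train, y_train)  # prep learns its statistics from the training rows only
# Passed to cross-validation, the whole pipeline (prep included) is re-fit on each fold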
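The two orderings can also be seen with nothing but arithmetic. The sketch below uses made-up avg_spend values and two hypothetical helpers (`fit_scaler`, `scale`); it only illustrates how the learned mean and standard deviation shift when test rows are included.

```python
from statistics import mean, pstdev

# Made-up avg_spend values for illustration; the last two rows play the test set.
spend = [40.0, 60.0, 80.0, 100.0, 120.0, 400.0, 500.0]
train, test = spend[:5], spend[5:]


def fit_scaler(values):
    """Return the (mean, std) a standard scaler would learn from `values`."""
    return mean(values), pstdev(values)


def scale(values, fitted):
    """Apply previously learned statistics to new values."""
    mu, sigma = fitted
    return [(v - mu) / sigma for v in values]


# Ordering 1 (leaky): fit on ALL rows, then split.
leaky = fit_scaler(spend)

# Ordering 2 (clean): split first, fit on the training rows only.
clean = fit_scaler(train)

print("leaky mean/std:", leaky)
print("clean mean/std:", clean)
print("test rows, leaky scaling:", scale(test, leaky))
print("test rows, clean scaling:", scale(test, clean))
```

Under the clean ordering the two large test values look extreme, which is what the model would really face in production. Under the leaky ordering the scaler has already absorbed them, so they look far more ordinary than they should.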
A single 80/20 split changes its mind every time you change the random seed. Cross-validation averages across folds so the number on your slide is signal, not coincidence.
Same model, same data, different random_state. Which one do you put on the slide?
A 4.9pp swing between seeds means the test set is too small to trust a single number. One run is a coin flip dressed up as evaluation.
Each row gets to be in validation exactly once. Preprocessing is re-fit per fold — never trained on its own validator.
Tighter spread, more honest. The mean is the headline; the spread tells you how seriously to take it. Pair it with the L02 confidence interval.
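The mechanics above can be sketched in plain Python to make them visible. In practice scikit-learn's `KFold` and `cross_val_score` do this work. Everything here is an assumption for illustration: the data is synthetic, the single feature stands in for tenure_m, and the classifier is a toy nearest-class-mean rule rather than the course's model.

```python
import random
from statistics import mean, pstdev, stdev


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices once and deal them into k folds.

    Each row lands in exactly one fold, so it is validated exactly once.
    """
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def fit(xs, ys):
    """Fit the scaler AND a nearest-class-mean classifier on training rows only."""
    mu, sigma = mean(xs), pstdev(xs)
    scaled = [(x - mu) / sigma for x in xs]
    centre = {label: mean(s for s, y in zip(scaled, ys) if y == label)
              for label in (0, 1)}
    return mu, sigma, centre


def predict(model, x):
    """Scale with the training fold's statistics, then pick the nearer class mean."""
    mu, sigma, centre = model
    s = (x - mu) / sigma
    return min(centre, key=lambda label: abs(s - centre[label]))


def cross_validate(xs, ys, k=5):
    """Return one accuracy score per fold; preprocessing is re-fit every time."""
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        hits = sum(predict(model, xs[i]) == ys[i] for i in fold)
        scores.append(hits / len(fold))
    return scores


# Synthetic data (an assumption): roughly 12% churners, who tend to have shorter tenure.
rng = random.Random(42)
ys = [1 if rng.random() < 0.12 else 0 for _ in range(200)]
xs = [rng.gauss(6 if y else 24, 8) for y in ys]

scores = cross_validate(xs, ys, k=5)
print(f"accuracy: {mean(scores):.3f} +/- {stdev(scores):.3f} across {len(scores)} folds")
```

The mean is the figure for the slide, and the standard deviation across folds is the spread that tells you how seriously to take it.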
On NorthStar's 12% churn rate, a model that predicts "no churn" for everyone scores 88% accuracy — and is useless. The confusion matrix tells the truth. Drag the threshold and watch precision and recall trade places.
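The accuracy trap and the precision/recall trade can be checked by hand. The sketch below counts a confusion matrix for 100 customers at the 12% churn rate from the text. The model scores are made up, and `confusion` and `report` are hypothetical helper names.

```python
def confusion(y_true, scores, threshold):
    """Count TP, FP, FN, TN when every score >= threshold is flagged as churn."""
    tp = fp = fn = tn = 0
    for actual, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and actual:
            tp += 1
        elif flagged:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def report(y_true, scores, threshold):
    """Accuracy, precision and recall at one threshold.

    Precision is reported as 0.0 when nothing is flagged (a convention;
    strictly it is undefined in that case).
    """
    tp, fp, fn, tn = confusion(y_true, scores, threshold)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# 100 customers: 12 churners followed by 88 non-churners.
y_true = [1] * 12 + [0] * 88

# A "model" that never predicts churn: every score is 0.
baseline = report(y_true, [0.0] * 100, threshold=0.5)
print("always 'no churn':", baseline)

# Made-up scores: churners mostly score high, a few non-churners do too.
scores = ([0.9] * 4 + [0.6] * 4 + [0.4] * 4            # the 12 churners
          + [0.8] * 1 + [0.55] * 3 + [0.35] * 8 + [0.1] * 76)  # the 88 others
for t in (0.3, 0.5, 0.7):
    print(t, report(y_true, scores, t))
```

The baseline reaches 88% accuracy with zero recall. On the made-up scores, raising the threshold from 0.3 to 0.7 lifts precision while recall falls, which is the trade the slider shows.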
Each tick is one customer. Vermillion = will actually churn. Teal = will not. The vertical line is the threshold.
The retention team can call 80 customers a week. A missed churner costs £240 of lifetime value; a wasted call costs £15 of analyst time. The right threshold is the one that fits both.
Capacity caps flag volume. Cost ratio drives where the boundary should sit: the cheaper mistake gets more headroom. When a missed churner costs far more than a wasted call, the threshold moves down; when wasted calls are the expensive error, it moves up.
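One way to turn capacity and cost into a number is sketched below. The £240, £15 and 80-calls figures come from the text; everything else is an assumption. The break-even formula assumes calibrated churn probabilities and a simple cost model in which correct decisions cost nothing. The validation scores are synthetic, and the function names are hypothetical.

```python
import random

COST_MISSED_CHURNER = 240.0  # GBP, false negative (from the text)
COST_WASTED_CALL = 15.0      # GBP, false positive (from the text)
WEEKLY_CAPACITY = 80         # calls per week (from the text)


def break_even_threshold(cost_fp, cost_fn):
    """Churn probability above which flagging is cheaper in expectation.

    Flag when (1 - p) * cost_fp < p * cost_fn, i.e. p > cost_fp / (cost_fp + cost_fn).
    Assumes calibrated probabilities and zero cost for correct decisions.
    """
    return cost_fp / (cost_fp + cost_fn)


def capacity_threshold(scores, capacity):
    """Lowest threshold at which at most `capacity` scores are >= threshold."""
    ranked = sorted(scores, reverse=True)
    if len(ranked) <= capacity:
        return 0.0  # everyone fits under the cap
    cutoff = ranked[capacity - 1]
    if ranked[capacity] == cutoff:
        # A tie straddles the cap: step up to the next distinct score.
        higher = [s for s in ranked if s > cutoff]
        return min(higher) if higher else float("inf")
    return cutoff


def choose_threshold(scores, capacity, cost_fp, cost_fn):
    """Cost says how low the threshold may go; capacity says how low it can go."""
    return max(break_even_threshold(cost_fp, cost_fn),
               capacity_threshold(scores, capacity))


# Synthetic validation-set scores (an assumption, not NorthStar data).
rng = random.Random(7)
validation_scores = [rng.random() ** 2 for _ in range(1000)]

threshold = choose_threshold(validation_scores, WEEKLY_CAPACITY,
                             COST_WASTED_CALL, COST_MISSED_CHURNER)
flagged = sum(s >= threshold for s in validation_scores)
print(f"break-even: {break_even_threshold(COST_WASTED_CALL, COST_MISSED_CHURNER):.3f}")
print(f"chosen threshold: {threshold:.3f}, flags {flagged} customers")
```

With a 16:1 cost ratio the break-even point sits far below 0.5, so on its own the cost argument says to flag generously, and the weekly call capacity becomes the binding constraint. The scores here are deliberately labelled as validation scores: the threshold is picked on them, and the held-out test set is only used afterwards to report metrics.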
Skip any of these and you'll either drown the team in low-value flags or quietly burn your held-out evaluation.
How many flagged cases can the team actually act on per week? The threshold's job is to fit inside that capacity — not to maximise a metric.
✕ "F1 is highest at threshold 0.32."
✓ "Retention can call 80/week. Threshold 0.62 flags 78."
A missed churner and a wasted call are rarely equally expensive. The cheaper error gets more headroom; the costly one drives the threshold.
✕ "Default 0.5 — balanced, right?"
✓ "FN = £240, FP = £15. Lower the threshold."
If you tuned the threshold to maximise F1 on the held-out test set, you've used that data to make a decision — the evaluation is no longer clean.
✕ "Searched 0.0–1.0 on test — best F1 wins."
✓ "Pick from capacity + cost. Then report metrics."
Try each mentally first. Click to flip the card. Numbering matches lesson.md.
Pipelines, train / validate splits, precision / recall / F1, threshold choice — the rest of the course swaps the algorithm but keeps the discipline. Hover any tile for "how L03 shows up here".