Priya is reading Sarah's slide deck. The model says 60% of reviews are positive — a clean number on a clean chart. She squints. But are the positive ones actually positive? How do we know we can trust that number? Sarah doesn't have an answer. This lesson is how she gets one: three lenses for honest numbers — distributions, confidence intervals, hypothesis tests.
A mean is a one-line story written by the data. Sometimes the data has a different story to tell. Toggle the shape below — watch the mean (red) and the median (teal) part ways.
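To see the mean and the median part ways on paper, here is a minimal sketch. The order amounts are invented for illustration and are not NorthStar data.

```python
from statistics import mean, median

# Hypothetical order amounts in pounds: nine small orders plus one
# large corporate order that forms the long tail.
orders = [20, 25, 30, 35, 40, 45, 50, 60, 75, 900]

avg = mean(orders)    # pulled upward by the single large order
mid = median(orders)  # middle of the sorted values, barely moved by the tail

print(f"mean = £{avg:.2f}, median = £{mid:.2f}")
```

One large order is enough to put the mean at roughly three times the median, which is the gap the chart is meant to make visible.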
The z-score, z = (value − μ) / σ, where μ (mu) is the population mean and σ (sigma) is the population standard deviation, normalises "is this unusual?" across any scale. The working rule: |z| ≥ 3 marks an outlier worth checking.
In a normal distribution, only about 0.27% of observations sit further than 3σ from the mean — roughly one day per year of business. Use the bands below as a guide.
| z-score band | How to read it |
| --- | --- |
| \|z\| < 1 | Within 1σ: about 68% of values live here. Entirely normal. |
| 1 ≤ \|z\| < 2 | Mild deviation: worth a glance, not an alert. |
| 2 ≤ \|z\| < 3 | Suspicious: only ~4.3% of values land here. Investigate the source. |
| \|z\| ≥ 3 | Outlier: about 0.27% of values. Worth checking immediately. |
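The bands can be sketched in a few lines of Python. The function names `z_score`, `band`, and `share_beyond` are illustrative, the daily-orders figures are hypothetical, and the percentages assume the data really is normal.

```python
from statistics import NormalDist

def z_score(value, mu, sigma):
    """Distance of value from the mean, measured in standard deviations."""
    return (value - mu) / sigma

def band(z):
    """Map a z-score onto the four bands used in this lesson."""
    size = abs(z)
    if size < 1:
        return "normal"
    if size < 2:
        return "mild"
    if size < 3:
        return "suspicious"
    return "outlier"

def share_beyond(k):
    """Fraction of a normal distribution lying more than k sigma from the mean."""
    return 2 * (1 - NormalDist().cdf(k))

# Hypothetical example: 640 orders on a day when the mean is 500 and sigma is 40.
z = z_score(640, mu=500, sigma=40)
print(z, band(z))

# Share of values beyond 3 sigma, and the expected number of such days in a year.
print(f"{share_beyond(3):.4%}", round(365 * share_beyond(3), 2))
```

If the underlying data is skewed, these shares no longer hold, so the bands are a starting point for investigation rather than a verdict.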
Even when the raw data is messy — skewed, bimodal, or irregular — the distribution of sample means converges to a bell curve as sample size grows. That is why statistics works on real-world data. Draw samples below and watch it happen.
Why does this matter? Once sample means follow a normal distribution, you can calculate standard errors, build confidence intervals, and run hypothesis tests — even when the raw data is far from normal. This is the theoretical foundation that makes the tools in Sections IV and V work.
NorthStar order amounts: most under £100, a long tail to £10,000+. Nothing remotely bell-shaped. We cannot apply normal-distribution formulas to this directly.
After ~30 draws a bell begins to appear. After ~100, it is unmistakable. The central limit theorem (CLT) is not a metaphor: you are watching it. Once the sample means are approximately normal, we can apply standard error and CI formulas to them.
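The same experiment can be run offline. This is a rough simulation, assuming an exponential distribution with mean £100 as a stand-in for skewed order amounts. It is not NorthStar data, and the sample size of 100 and the seed are arbitrary choices.

```python
import random
from statistics import mean, pstdev

rng = random.Random(42)  # fixed seed so the run is repeatable

def skewness(xs):
    """Average cubed z-score: near 0 for a symmetric bell, large for a long tail."""
    m, s = mean(xs), pstdev(xs)
    return mean([((x - m) / s) ** 3 for x in xs])

def draw_order():
    """One hypothetical order amount from a right-skewed population (mean 100)."""
    return rng.expovariate(1 / 100)

# 2000 individual orders: heavily skewed.
raw = [draw_order() for _ in range(2000)]

# 2000 sample means, each computed from 100 orders: far closer to a bell.
sample_means = [mean([draw_order() for _ in range(100)]) for _ in range(2000)]

raw_skew = skewness(raw)
means_skew = skewness(sample_means)
spread = pstdev(sample_means)  # theory predicts about 100 / sqrt(100) = 10

print(f"skew of raw orders:   {raw_skew:.2f}")
print(f"skew of sample means: {means_skew:.2f}")
print(f"spread of sample means: {spread:.2f}")
```

The skewness of the sample means should land much closer to zero than that of the raw draws, and their spread should sit near σ/√n, which is the standard error the later sections rely on.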
"84% accuracy" is half a fact. "84% (95% CI: 78–89%) on 200 hand-labelled reviews" is the honest version. Slide the sample size below — reducing uncertainty has a price, and that price is roughly: quadruple n to halve the width.
Why? Each additional data point gives the estimate a little more stability. With more observations, the random variation in any one sample matters less, so the range of plausible values narrows. Formally, the margin of error shrinks with the square root of n — which is why doubling n only shrinks the interval by about 30%, not 50%.
Reported accuracy is fixed at 84% for this demo — only sample size changes. The math: half-width ≈ 1.96 × √(p(1−p)/n).
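The formula above is easy to check by hand. The sketch below uses the normal-approximation interval exactly as written; the function name `half_width` is illustrative, and other interval methods (Wilson, for example) give slightly different endpoints.

```python
import math

def half_width(p, n, z=1.96):
    """Half-width of a 95% normal-approximation CI for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

p = 0.84  # reported accuracy, fixed as in the demo

for n in (50, 100, 200, 400, 800):
    hw = half_width(p, n)
    print(f"n = {n:4d}: {p:.0%} ± {hw:.1%}")

# Quadrupling n halves the width; doubling n shrinks it by about 29%.
ratio_4x = half_width(p, 800) / half_width(p, 200)
ratio_2x = half_width(p, 400) / half_width(p, 200)
print(f"4x the data: width × {ratio_4x:.2f}; 2x the data: width × {ratio_2x:.2f}")
```

At n = 200 the half-width comes out at roughly five percentage points, which is the scale of the interval quoted in the lesson.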
The 95% is about the method, not this one interval. If you computed CIs this way 100 times, ~95 of them would contain the true value. You cannot say this specific interval has a 95% probability of containing the true value: once computed, it either contains it or it does not.
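"About the method" can be made concrete with a simulation. This sketch assumes a true accuracy of 84% that we would never know in practice, repeats the 200-review labelling exercise 1,000 times, and counts how often the interval captures the truth. All constants and the seed are illustrative.

```python
import math
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

TRUE_P = 0.84    # assumed true accuracy (unknown in real life)
N = 200          # reviews labelled per experiment
REPEATS = 1000   # number of repeated experiments
Z = 1.96         # multiplier for a 95% interval

def interval(successes, n):
    """Normal-approximation 95% CI for a proportion."""
    p_hat = successes / n
    hw = Z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - hw, p_hat + hw

hits = 0
for _ in range(REPEATS):
    # Simulate one experiment: each review is classified correctly with prob TRUE_P.
    correct = sum(rng.random() < TRUE_P for _ in range(N))
    lo, hi = interval(correct, N)
    if lo <= TRUE_P <= hi:
        hits += 1

coverage = hits / REPEATS
print(f"{coverage:.1%} of intervals contained the true value")
```

The coverage should land close to 95%, though the normal approximation tends to run slightly under that for proportions far from 50%. Each individual interval either contains 0.84 or it does not.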
A p-value answers "could chance produce this difference?" More precisely, it is the probability of seeing a difference at least this large if H₀ were true. But statistical significance is not practical significance. Slide the two dimensions and read the quadrant. The coupon experiment: H₀ = no effect on 30-day repurchase rate.
With a large enough sample, a 0.01% difference can become "significant." Always report effect size alongside p, and let the business decide whether the effect is worth shipping.
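To watch sample size manufacture significance, here is a sketch of a pooled two-proportion z-test. The repurchase counts are invented for illustration, and the helper name `two_proportion_test` is hypothetical rather than part of the lesson's tooling.

```python
import math
from statistics import NormalDist

def two_proportion_test(x_a, n_a, x_b, n_b):
    """Return (effect, p_value) for H0: both groups share one repurchase rate.

    effect is the difference in rates (B minus A); p_value is two-sided.
    """
    p_a, p_b = x_a / n_a, x_b / n_b
    pooled = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, p_value

# Same hypothetical rates (30.0% vs 30.1%), two very different sample sizes.
effect_small, p_small = two_proportion_test(1_500, 5_000, 1_505, 5_000)
effect_large, p_large = two_proportion_test(1_500_000, 5_000_000, 1_505_000, 5_000_000)

print(f"n = 5,000 per group:     effect = {effect_small:+.2%}, p = {p_small:.3f}")
print(f"n = 5,000,000 per group: effect = {effect_large:+.2%}, p = {p_large:.5f}")
```

The effect is the same 0.1 percentage points in both runs. Only the p-value changes, which is why the effect size has to travel with it.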
Skip any one of these and you have confident-sounding nonsense — the most expensive kind of mistake in business.
If the distribution is skewed, the mean misleads. Report the median, segment the data, or be explicit about which summary you chose — and why.
✕ "Average spend is £85." ✓ "Median spend is £42 (mean £85 is pulled up by a corporate-account long tail)."
Every number from a sample needs a confidence interval. The point estimate alone is dishonest by omission.
✕ "Model accuracy is 84%." ✓ "84% (95% CI: 78–89%) on 200 hand-labelled reviews."
When you compare two groups, you owe the reader both a p-value and an effect size. Either alone misleads.
✕ "p = 0.001, ship it." ✓ "p = 0.001, effect = +2.3pp conversion. Worth the dev cost? Yes."
Try each one mentally first. Click to flip and read the sample answer. Numbering matches lesson.md.
Whenever you report a model number, claim a difference, or detect an anomaly — for the rest of the course — you are using L02 tools. Hover any tile for the one-line "how L02 shows up here".