Statistics and Probability for Machine Learning
The mathematical language behind uncertainty, inference, and evaluation.
Probability foundations
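Probability theory imposes consistency constraints on how uncertainty is quantified: probabilities are non-negative, sum to one over all outcomes, and combine according to the sum and product rules. Random variables, conditional distributions, and expectations are the primitives used by almost every ML model.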
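To make these primitives concrete, here is a minimal sketch that derives a marginal, a conditional distribution, and a conditional expectation from a small joint table. The joint probabilities and the function names are illustrative assumptions, not taken from any dataset or library.

```python
from fractions import Fraction

# Hypothetical joint distribution P(X=x, Y=y) over two binary variables.
joint = {
    (0, 0): Fraction(3, 10),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(2, 10),
    (1, 1): Fraction(4, 10),
}


def marginal_x(joint, x):
    """P(X=x), obtained by summing the joint over y (sum rule)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


def conditional_y_given_x(joint, x):
    """P(Y=y | X=x) = P(X=x, Y=y) / P(X=x) (product rule, rearranged)."""
    px = marginal_x(joint, x)
    return {y: p / px for (xi, y), p in joint.items() if xi == x}


def expectation(dist):
    """E[V] for a distribution given as {value: probability}."""
    return sum(value * p for value, p in dist.items())


p_y_given_x1 = conditional_y_given_x(joint, 1)
e_y_given_x1 = expectation(p_y_given_x1)

# Law of total expectation: E[Y] = sum over x of P(X=x) * E[Y | X=x].
e_y = sum(
    marginal_x(joint, x) * expectation(conditional_y_given_x(joint, x))
    for x in (0, 1)
)
```

Exact fractions are used instead of floats so that the consistency checks (probabilities summing to one, the law of total expectation) hold without rounding error.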
Key statistical perspective
We observe a finite sample from an unknown population and use it to infer latent structure. An estimator maps data to a parameter estimate, and its quality is studied through bias, variance, consistency, and sample complexity.
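Bias can be seen directly by simulation. The sketch below compares two estimators of a Gaussian variance: one divides the sum of squared deviations by n, the other by n - 1. The sample size, true standard deviation, trial count, and seed are arbitrary choices made for illustration.

```python
import random


def variance_estimates(sample):
    """Return (divide-by-n, divide-by-(n-1)) variance estimates for one sample."""
    n = len(sample)
    mean = sum(sample) / n
    squared_dev = sum((x - mean) ** 2 for x in sample)
    return squared_dev / n, squared_dev / (n - 1)


def simulate(true_sd=2.0, n=5, trials=20000, seed=0):
    """Average both estimators over many independent samples of size n."""
    rng = random.Random(seed)
    total_biased = 0.0
    total_unbiased = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, true_sd) for _ in range(n)]
        biased, unbiased = variance_estimates(sample)
        total_biased += biased
        total_unbiased += unbiased
    return total_biased / trials, total_unbiased / trials


mean_biased, mean_unbiased = simulate()
true_variance = 2.0 ** 2
```

With these settings the divide-by-n average should settle near (n - 1) / n times the true variance, while the divide-by-(n - 1) average should settle near the true variance itself. Increasing n shrinks the gap, which is why the biased estimator is still consistent.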
Concepts every practitioner uses
- Law of large numbers and concentration bounds.
- Maximum likelihood and maximum a posteriori estimation.
- Confidence intervals and hypothesis testing.
- Covariance, correlation, and conditional independence.
- Frequentist vs Bayesian interpretations of uncertainty.
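As one illustration of the list above, the sketch below contrasts maximum likelihood with maximum a posteriori estimation for a coin's heads probability. It assumes a Bernoulli likelihood and a Beta(alpha, beta) prior; the function names and the default prior are illustrative choices.

```python
def bernoulli_mle(successes, trials):
    """Maximum likelihood estimate of p: the observed success frequency."""
    return successes / trials


def bernoulli_map(successes, trials, alpha=2.0, beta=2.0):
    """MAP estimate of p under a Beta(alpha, beta) prior.

    The posterior is Beta(alpha + successes, beta + trials - successes);
    its mode is the expression below, valid when both posterior
    parameters exceed 1.
    """
    return (successes + alpha - 1) / (trials + alpha + beta - 2)


# 7 heads in 10 flips: the prior pulls the MAP estimate toward 0.5.
mle = bernoulli_mle(7, 10)
map_estimate = bernoulli_map(7, 10)

# A flat Beta(1, 1) prior makes the MAP estimate coincide with the MLE.
map_flat = bernoulli_map(7, 10, alpha=1.0, beta=1.0)
```

The prior acts like pseudo-observations added to the data, so its influence fades as the number of real trials grows.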
ML evaluation through statistics
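Metrics are random variables too, because a score computed on a finite test set varies with the sample that was drawn. Proper validation therefore requires uncertainty-aware comparisons: confidence intervals for accuracy and AUC, bootstrap resampling to estimate how much a metric varies, and significance tests when comparing models.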
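A percentile bootstrap is one simple way to attach an interval to a test-set metric. The sketch below resamples per-example scores with replacement and reads off empirical quantiles of the resampled means. The labels and both models' predictions are synthetic, and the function name, resample count, and seed are assumptions chosen for illustration.

```python
import random


def bootstrap_mean_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of per-example scores."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Draw n examples with replacement and recompute the mean.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Synthetic test set: 200 binary labels and two hypothetical models.
y_true = [i % 2 for i in range(200)]
pred_a = [1 - y if i % 20 < 3 else y for i, y in enumerate(y_true)]
pred_b = [1 - y if i % 5 == 0 else y for i, y in enumerate(y_true)]

# Accuracy is the mean of per-example correctness indicators.
correct_a = [int(p == y) for p, y in zip(pred_a, y_true)]
correct_b = [int(p == y) for p, y in zip(pred_b, y_true)]
acc_a = sum(correct_a) / len(correct_a)
acc_b = sum(correct_b) / len(correct_b)

acc_ci_a = bootstrap_mean_ci(correct_a)

# Paired comparison: resample the per-example differences so both
# models are always scored on the same resampled examples.
paired_diff = [a - b for a, b in zip(correct_a, correct_b)]
diff_ci = bootstrap_mean_ci(paired_diff)
```

If the interval for the paired difference contains zero, the observed accuracy gap is within the range that test-set sampling noise alone could plausibly produce. Resampling differences rather than each model separately keeps the comparison paired, which usually gives a tighter interval when the models' errors are correlated.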
Takeaway: Statistics turns model performance from anecdote into evidence.