ML 101
A broad introduction to machine learning fundamentals. Covers the ML landscape, gradient descent, the bias-variance tradeoff, linear regression, logistic regression, decision trees, ensembles, SVMs, neural networks, clustering, dimensionality reduction, and model evaluation metrics.
Reading time
~90 min
Structure
Single track
The ML Landscape
What machine learning actually is, and how its three paradigms are fundamentally different.
What is Machine Learning?
Traditional programming is rules + data → output. You encode the logic explicitly. Machine learning flips this: you give it data + desired outputs → it infers the rules. The "rules" are the parameters of a mathematical model, and "learning" is optimizing those parameters to minimize error.
Arthur Samuel's classic definition: "Field of study that gives computers the ability to learn without being explicitly programmed." Tom Mitchell made this precise: a program learns from experience E with respect to task T and performance measure P if its performance on T, as measured by P, improves with experience E.
- Features (X): Input variables — the raw data your model sees.
- Labels / Targets (y): What you're trying to predict (in supervised learning).
- Model: A mathematical function f(X; θ) parameterized by θ that maps X → ŷ.
- Training: Adjusting θ to minimize the gap between ŷ and y on a training set.
- Inference: Running the trained model on new, unseen data.
- Generalization: How well the model performs on data it hasn't seen — the real goal.
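The terms above can be made concrete with a toy example. The sketch below (synthetic data, plain NumPy, a hypothetical "true rule" of y = 3x + 1) fits the model f(X; θ) = wX + b by gradient descent — the mechanics of gradient descent are covered later; here it just plays the role of "training."

```python
import numpy as np

# Synthetic data: the hidden "true rule" is y = 3x + 1, plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=100)           # features
y = 3 * X + 1 + rng.normal(0, 0.1, 100)    # labels

# Model: f(X; theta) = w*X + b, with parameters theta = (w, b).
w, b = 0.0, 0.0

# Training: adjust theta to shrink the gap between predictions and labels.
lr = 0.1
for _ in range(500):
    y_hat = w * X + b                      # predictions on the training set
    grad_w = 2 * np.mean((y_hat - y) * X)  # gradient of mean squared error
    grad_b = 2 * np.mean(y_hat - y)
    w -= lr * grad_w
    b -= lr * grad_b

# Inference: apply the trained model to new, unseen inputs.
X_new = np.array([0.5, -0.2])
predictions = w * X_new + b
```

After training, `w` and `b` land close to the true 3 and 1 — and generalization is exactly the question of whether those predictions on `X_new` stay accurate.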
Three Paradigms
Supervised Learning
Learn a mapping X→y from labeled examples. Error signal is explicit — you know the right answer.
Unsupervised Learning
Find structure in unlabeled data. No ground truth — the model discovers patterns, clusters, or representations.
Reinforcement Learning
An agent takes actions in an environment to maximize cumulative reward. No labeled data — learns by trial and error.
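To contrast supervised learning's explicit error signal with unsupervised learning's lack of one: the sketch below runs a few iterations of k-means (an unsupervised algorithm covered later) on unlabeled 1-D data. The group means (0 and 5) and cluster count are assumptions of this toy setup — the algorithm never sees labels, yet recovers the structure.

```python
import numpy as np

# Unlabeled 1-D data drawn from two groups; the model is never told which is which.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0, 0.5, 50), rng.normal(5, 0.5, 50)])

# A few iterations of k-means with k=2: no ground truth, just structure discovery.
centers = np.array([data.min(), data.max()])   # crude initialization
for _ in range(10):
    # Assign each point to its nearest center, then move centers to cluster means.
    assign = np.abs(data[:, None] - centers[None, :]).argmin(axis=1)
    centers = np.array([data[assign == k].mean() for k in range(2)])
```

The recovered `centers` sit near the true group means even though no label was ever provided — that absence of an explicit error signal is the defining trait of the unsupervised paradigm.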
The Data Split — Always
- Training set (~70%): Model sees and learns from this. Parameters are optimized here.
- Validation set (~15%): Hyperparameter tuning and model selection. Model never trains on this, but you peek at it to make decisions — so it's "contaminated" for final evaluation.
- Test set (~15%): Touched exactly ONCE at the very end. This is your unbiased estimate of real-world performance.
- Data leakage: If test-set information bleeds into training (e.g., scaling using the full dataset's mean), your evaluation is optimistically biased. Fit scalers/preprocessors on train, apply to val/test.
- i.i.d. assumption: Most ML theory assumes data is independently and identically distributed. Time-series, geo-clustered, or grouped data violates this — use temporal/grouped splits instead of random splits.
- More data beats better algorithms more often than you'd think. Before tuning models, verify your data pipeline is clean.
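One way to implement the split and the leakage-safe scaling described above, sketched in NumPy (the dataset, sizes, and 70/15/15 proportions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(10, 3, size=(1000, 4))   # placeholder dataset: 1000 rows, 4 features

# Shuffle once, then carve out ~70% / 15% / 15% splits.
idx = rng.permutation(len(X))
n_train, n_val = 700, 150
train = X[idx[:n_train]]
val = X[idx[n_train:n_train + n_val]]
test = X[idx[n_train + n_val:]]

# Fit the scaler on TRAIN ONLY, then apply the same statistics everywhere.
# Computing mu/sigma on the full dataset would leak val/test information into training.
mu = train.mean(axis=0)
sigma = train.std(axis=0)
train_s = (train - mu) / sigma
val_s = (val - mu) / sigma    # transformed, never refit
test_s = (test - mu) / sigma  # transformed, never refit
```

Note that `val_s` and `test_s` will not have exactly zero mean and unit variance — that slight mismatch is correct, because their statistics were never allowed to influence the preprocessing. For time-series or grouped data, replace the random permutation with a temporal or group-aware split, per the i.i.d. caveat above.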