
ML Algorithms 101

An exhaustive algorithm reference spanning classical ML, gradient boosting, time series, deep learning, and AutoML. Part 1 covers sklearn foundations (Ridge, Lasso, Trees, SVMs, KNN). Part 2 covers advanced algorithms (XGBoost, LightGBM, CatBoost, ARIMA, Prophet, Anomaly Detection). Part 3 covers deep learning (PyTorch, Keras, CNNs, RNNs, Transformers, RL, AutoML).

Reading time: ~180 min

Structure: 3 parts, 21 chapters

Part 1, Chapter 01 / Linear Models (1 of 21)

Ridge & Lasso Regression

L2 and L1 regularization: when and why to use them. The difference between shrinking and zeroing.

Ridge Regression (L2 Regularization)

Ridge smoothly shrinks coefficients toward zero without eliminating them. Use when you suspect all features are relevant but some are noisy.

Scikit-Learn Implementation
  • from sklearn.linear_model import Ridge
  • Key parameter: alpha (λ). Larger → more shrinkage. Start with alpha=1.0.
  • Usage: Ridge(alpha=1.0).fit(X_train, y_train)
  • Get coefficients: .coef_
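Putting those bullets together, a minimal sketch of a Ridge fit (the synthetic data via `make_regression` is an illustrative assumption, not from the text):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (illustrative assumption)
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardize first: the L2 penalty treats all coefficients equally
scaler = StandardScaler().fit(X_train)
ridge = Ridge(alpha=1.0).fit(scaler.transform(X_train), y_train)

print(ridge.coef_)  # shrunken toward zero, but all non-zero
print(ridge.score(scaler.transform(X_test), y_test))  # R^2 on held-out data
```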
Ridge vs Linear Regression
  • Linear Reg (OLS): minimize MSE only. No penalty for large coefficients.
  • Ridge: minimize MSE + alpha·(sum of squared coefficients).
  • Ridge trades a small increase in bias for a large decrease in variance — especially useful with multicollinearity.
  • All coefficients remain non-zero (unless alpha → ∞).
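The multicollinearity point above can be seen directly with two nearly duplicate features (the data-generating setup here is an illustrative assumption):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x = rng.normal(size=200)
# Two almost-identical (collinear) copies of the same feature
X = np.column_stack([x, x + 1e-3 * rng.normal(size=200)])
y = 3 * x + rng.normal(size=200)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# OLS splits the weight wildly between the near-duplicates (high variance);
# Ridge shares it, keeping both coefficients moderate while their sum stays near 3.
print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```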

Lasso Regression (L1 Regularization)

Lasso produces sparse solutions — many coefficients become exactly zero. Automatic feature selection. Use when you suspect most features are irrelevant.

Scikit-Learn Implementation
  • from sklearn.linear_model import Lasso
  • Key parameter: alpha (λ). Larger → more coefficients driven to zero.
  • Usage: Lasso(alpha=0.1).fit(X_train, y_train)
  • alpha values are not comparable between Ridge and Lasso: the same value produces different amounts of shrinkage under L1 vs L2, so tune each separately.
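A sketch of the sparsity described above, on synthetic data where most features are irrelevant (the 50-feature setup is an illustrative assumption):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# 50 features, only 5 actually informative (illustrative setup)
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=5.0, random_state=0)
X = StandardScaler().fit_transform(X)

lasso = Lasso(alpha=1.0).fit(X, y)
n_zero = int(np.sum(lasso.coef_ == 0))
print(f"{n_zero} of 50 coefficients are exactly zero")
```

Inspecting which coefficients survive is the "automatic feature selection" the text refers to.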

Ridge vs Lasso — When to Use Each

  • Many correlated features: Ridge ✓ keeps all, shrinks smoothly; Lasso ⚠ arbitrarily picks one.
  • Suspected irrelevant features: Ridge doesn't eliminate them; Lasso ✓ zeros them out.
  • Interpretability priority: Ridge hard (many small coefs); Lasso ✓ easier (many zeros).
  • High-dim, few samples: Ridge ✓ stable; Lasso ✓ does feature selection.
Critical Gotchas
  • Scale your features: Ridge/Lasso penalties treat all coefficients equally. Unscaled features with large ranges get unfairly penalized. Always standardize before fitting.
  • Lasso arbitrariness: With correlated features, Lasso arbitrarily selects one and zeros the rest. Not reproducible across runs/seeds. If you need stability, use Elastic Net.
  • alpha = 0 means no regularization — equivalent to OLS.
  • Tuning alpha: Use cross-validation. sklearn's RidgeCV and LassoCV do this automatically.
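The gotchas above combine into one standard recipe: scale inside a pipeline (so each CV fold is scaled on its own training split), and let the CV estimators pick alpha. The alpha grid below is an assumption for illustration:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=300, n_features=20, n_informative=8,
                       noise=10.0, random_state=0)

# Scaling lives inside the pipeline, so cross-validation never leaks test-fold statistics
ridge = make_pipeline(StandardScaler(), RidgeCV(alphas=np.logspace(-3, 3, 13)))
lasso = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))

ridge.fit(X, y)
lasso.fit(X, y)
print("ridge alpha:", ridge[-1].alpha_)
print("lasso alpha:", lasso[-1].alpha_)
```

ElasticNetCV follows the same pattern if Lasso's arbitrary selection among correlated features is a concern.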

Parameter Summary

Key Hyperparameters
  • alpha: Regularization strength. Higher = more penalty. The same value acts differently in Ridge vs Lasso, so find the optimum for each with CV.
  • solver: Ridge accepts a solver parameter; 'auto' picks the best method for your data. Lasso has no solver option; it always uses coordinate descent.
  • fit_intercept: Whether to fit a constant term (default True). Usually keep True.