
ML Algorithms 101

An exhaustive algorithm reference spanning classical ML, gradient boosting, time series, deep learning, and AutoML. Part 1 covers sklearn foundations (Ridge, Lasso, Trees, SVMs, KNN). Part 2 covers advanced algorithms (XGBoost, LightGBM, CatBoost, ARIMA, Prophet, Anomaly Detection). Part 3 covers deep learning (PyTorch, Keras, CNNs, RNNs, Transformers, RL, AutoML).

Reading time: ~180 min

Structure: 3 parts, 21 chapters

Part 1, Chapter 01 / Linear Models (1 of 21)

Ridge & Lasso Regression

L2 and L1 regularization: when and why to use them. The difference between shrinking and zeroing.

Ridge Regression (L2 Regularization)

Ridge smoothly shrinks coefficients toward zero without eliminating them. Use when you suspect all features are relevant but some are noisy.

Scikit-Learn Implementation
  • from sklearn.linear_model import Ridge
  • Key parameter: alpha (λ). Larger → more shrinkage. Start with alpha=1.0.
  • Usage: Ridge(alpha=1.0).fit(X_train, y_train)
  • Get coefficients: .coef_
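Putting those bullets together, a minimal sketch of a Ridge fit (the synthetic data via `make_regression` is an illustrative assumption, not from the text):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (illustrative assumption)
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardize first: the L2 penalty treats all coefficients equally
scaler = StandardScaler().fit(X_train)
ridge = Ridge(alpha=1.0).fit(scaler.transform(X_train), y_train)

print(ridge.coef_)  # shrunken toward zero, but all non-zero
print(ridge.score(scaler.transform(X_test), y_test))  # R^2 on held-out data
```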
Ridge vs Linear Regression
  • Linear Reg (OLS): minimize MSE only. No penalty for large coefficients.
  • Ridge: minimize MSE + alpha·(sum of squared coefficients).
  • Ridge trades a small increase in bias for a large decrease in variance — especially useful with multicollinearity.
  • All coefficients remain non-zero (unless alpha → ∞).
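The multicollinearity point above can be seen directly with two nearly duplicate features (the data-generating setup here is an illustrative assumption):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x = rng.normal(size=200)
# Two almost-identical (collinear) copies of the same feature
X = np.column_stack([x, x + 1e-3 * rng.normal(size=200)])
y = 3 * x + rng.normal(size=200)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# OLS splits the weight wildly between the near-duplicates (high variance);
# Ridge shares it, keeping both coefficients moderate while their sum stays near 3.
print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```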

Lasso Regression (L1 Regularization)

Lasso produces sparse solutions — many coefficients become exactly zero. Automatic feature selection. Use when you suspect most features are irrelevant.

Scikit-Learn Implementation
  • from sklearn.linear_model import Lasso
  • Key parameter: alpha (λ). Larger → more coefficients driven to zero.
  • Usage: Lasso(alpha=0.1).fit(X_train, y_train)
  • alpha values are not comparable between Ridge and Lasso: the same value produces different amounts of shrinkage under L1 vs L2, so tune each separately.
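A sketch of the sparsity described above, on synthetic data where most features are irrelevant (the 50-feature setup is an illustrative assumption):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# 50 features, only 5 actually informative (illustrative setup)
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=5.0, random_state=0)
X = StandardScaler().fit_transform(X)

lasso = Lasso(alpha=1.0).fit(X, y)
n_zero = int(np.sum(lasso.coef_ == 0))
print(f"{n_zero} of 50 coefficients are exactly zero")
```

Inspecting which coefficients survive is the "automatic feature selection" the text refers to.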

Ridge vs Lasso — When to Use Each

  • Many correlated features: Ridge ✓ keeps all, shrinks smoothly; Lasso ⚠ arbitrarily picks one.
  • Suspected irrelevant features: Ridge doesn't eliminate them; Lasso ✓ zeros them out.
  • Interpretability priority: Ridge hard (many small coefs); Lasso ✓ easier (many zeros).
  • High-dim, few samples: Ridge ✓ stable; Lasso ✓ does feature selection.
Critical Gotchas
  • Scale your features: Ridge/Lasso penalties treat all coefficients equally. Unscaled features with large ranges get unfairly penalized. Always standardize before fitting.
  • Lasso arbitrariness: With correlated features, Lasso arbitrarily selects one and zeros the rest. Not reproducible across runs/seeds. If you need stability, use Elastic Net.
  • alpha = 0 means no regularization — equivalent to OLS.
  • Tuning alpha: Use cross-validation. sklearn's RidgeCV and LassoCV do this automatically.
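The gotchas above combine into one standard recipe: scale inside a pipeline (so each CV fold is scaled on its own training split), and let the CV estimators pick alpha. The alpha grid below is an assumption for illustration:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=300, n_features=20, n_informative=8,
                       noise=10.0, random_state=0)

# Scaling lives inside the pipeline, so cross-validation never leaks test-fold statistics
ridge = make_pipeline(StandardScaler(), RidgeCV(alphas=np.logspace(-3, 3, 13)))
lasso = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))

ridge.fit(X, y)
lasso.fit(X, y)
print("ridge alpha:", ridge[-1].alpha_)
print("lasso alpha:", lasso[-1].alpha_)
```

ElasticNetCV follows the same pattern if Lasso's arbitrary selection among correlated features is a concern.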

Parameter Summary

Key Hyperparameters
  • alpha: Regularization strength. Higher = more penalty. The same value acts differently in Ridge vs Lasso, so find the optimum for each with CV.
  • solver: Ridge accepts a solver parameter; 'auto' picks the best method for your data. Lasso has no solver option; it always uses coordinate descent.
  • fit_intercept: Whether to fit a constant term (default True). Usually keep True.