Overfitting vs Underfitting & Regularization (L1/L2)

Diagnose bias–variance problems from train and test scores, and control model complexity with L1/L2 regularization and early stopping.
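As a rough sketch of what "L1/L2 regularization" means here, both variants add a penalty on the coefficients to the training loss. The symbols are assumptions for illustration, not notation from the lesson: $\mathcal{L}(w)$ is the unregularized training loss, $w_j$ are the model coefficients, and $\lambda \ge 0$ is the penalty strength.

```latex
\begin{aligned}
\text{L2 penalty:}\quad & J(w) = \mathcal{L}(w) + \lambda \sum_j w_j^2 \\
\text{L1 penalty:}\quad & J(w) = \mathcal{L}(w) + \lambda \sum_j \lvert w_j \rvert
\end{aligned}
```

In scikit-learn's LogisticRegression the parameter C plays the role of an inverse penalty strength, so a smaller C corresponds to a larger $\lambda$.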

Symptoms

Overfit: high training score but a much lower test score (high variance)
Underfit: low scores on both training and test data (high bias)
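One way to check for these symptoms is to compare training and test accuracy for the same model. The sketch below is illustrative only: the decision trees and the helper name train_test_gap are assumptions chosen to make the gap easy to see, not part of the lesson's own code.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)


def train_test_gap(model):
    """Fit the model and return (train accuracy, test accuracy)."""
    model.fit(X_tr, y_tr)
    return model.score(X_tr, y_tr), model.score(X_te, y_te)


# Unlimited depth: the tree is free to memorize the training set.
deep_train, deep_test = train_test_gap(DecisionTreeClassifier(random_state=0))

# Depth 1 (a single split): a deliberately constrained model.
stump_train, stump_test = train_test_gap(
    DecisionTreeClassifier(max_depth=1, random_state=0)
)

print(f"deep tree: train={deep_train:.3f} test={deep_test:.3f}")
print(f"stump:     train={stump_train:.3f} test={stump_test:.3f}")
```

A large gap between the two numbers for the deep tree points toward overfitting. If both numbers are low for the constrained model, that points toward underfitting. How large a gap matters depends on the task.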

Fixes

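Simplify the model or collect more data (for overfitting)
Regularize with an L1 or L2 penalty (for overfitting)
Increase model capacity or weaken regularization (for underfitting)
Cross-validate hyperparameters such as the regularization strength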
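Cross-validating the regularization strength could look like the following minimal sketch. The grid of C values, the 5-fold setting, and the StandardScaler step are assumptions made for illustration. The lesson's own code does not scale the features.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Scaling lives inside the pipeline so each CV fold is scaled on its own
# training portion only.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))

# Illustrative grid: smaller C means stronger regularization.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_tr, y_tr)

best_C = search.best_params_["logisticregression__C"]
print("best C:", best_C)
print("mean CV accuracy:", round(search.best_score_, 3))
print("held-out test accuracy:", round(search.score(X_te, y_te), 3))
```

The test split is touched only once, after the search has picked C, so the final score is not used to tune the model.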

Python (L2 & L1 Logistic Regression)

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=42, stratify=y)

# L2 (default): smaller C = stronger regularization
clf_l2 = LogisticRegression(penalty="l2", C=0.5, max_iter=5000).fit(X_tr, y_tr)

# L1 (sparse coefficients): needs a solver that supports it, such as liblinear or saga
clf_l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.5, max_iter=5000).fit(X_tr, y_tr)

print("L2:\n", classification_report(y_te, clf_l2.predict(X_te)))
print("L1:\n", classification_report(y_te, clf_l1.predict(X_te)))
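To see what the two penalties do to the coefficients, one option is to count how many are non-zero after fitting. This is a sketch that repeats the setup above so it runs on its own. The variable names nonzero_l2 and nonzero_l1 are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Same two models as in the lesson code.
clf_l2 = LogisticRegression(penalty="l2", C=0.5, max_iter=5000).fit(X_tr, y_tr)
clf_l1 = LogisticRegression(
    penalty="l1", solver="liblinear", C=0.5, max_iter=5000
).fit(X_tr, y_tr)

n_features = X_tr.shape[1]
nonzero_l2 = int(np.count_nonzero(clf_l2.coef_))
nonzero_l1 = int(np.count_nonzero(clf_l1.coef_))

print(f"L2 non-zero coefficients: {nonzero_l2} of {n_features}")
print(f"L1 non-zero coefficients: {nonzero_l1} of {n_features}")
```

L2 shrinks coefficients toward zero but typically leaves them non-zero, while L1 can set some of them exactly to zero. The features with zero coefficients do not contribute to that model's predictions.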

Self-check: When would you prefer L1 over L2? What does coefficient sparsity buy you?

Machine Learning with Python: From Basics to Capstone
