Logistic Regression in Machine Learning – Theory & Python Implementation

🧠 Logistic Regression – Theory & Practice

Learn how Logistic Regression turns inputs into class probabilities for binary classification (e.g., Spam/Not-Spam), then evaluate it with a Confusion Matrix and a Classification Report.

What You’ll Learn

When to use Logistic Regression
Sigmoid function & decision boundary
Build a model with Scikit-learn
Evaluate with precision, recall, F1

Use Cases

Email spam detection
Disease diagnosis (0/1)
Customer churn prediction
Credit default risk

Sigmoid Function

Maps any real value to a probability in [0,1].

        σ(z) = 1 / (1 + e−z)
      

Decision Boundary

If P(y=1|x) ≥ 0.5 ⇒ class 1; else class 0 (threshold can be tuned).

When To Use

Binary outcomes, linearly separable (or near-linear) in feature space. Fast, interpretable baseline.

Python Implementation (Scikit-learn)

# Step 1: Imports
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, confusion_matrix

# Step 2: Data
data = load_breast_cancer()
X, y = data.data, data.target

# Step 3: Split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Step 4: Train
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

# Step 5: Evaluate
y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

Tip: Use stratify=y to keep class balance in train/test splits.

✅ Advantages

Simple, fast, interpretable
Outputs calibrated probabilities
Strong baseline for many tasks

⚠️ Limitations

Assumes linear decision boundary (in log-odds)
Sensitive to outliers & multicollinearity
Underfits complex non-linear patterns

📝 Self-Check

Why do we use the sigmoid function in Logistic Regression?
What does a 0.5 threshold mean for classification?
Which metrics from the classification report would you monitor first and why?

Next: K-Nearest Neighbors (KNN)

Machine Learning with Python: From Basics to Capstone

Curriculum