Learn a lightweight, surprisingly strong baseline for text and tabular classification, built on Bayes’ Theorem with the “naive” assumption of feature independence.
P(y|x) ∝ P(x|y) · P(y). Naive Bayes assumes features are conditionally independent given the class, so the likelihood factorizes as P(x|y) = ∏ᵢ P(xᵢ|y) — one simple per-feature estimate instead of an intractable joint distribution.
# Step 1: Imports
from sklearn.datasets import fetch_20newsgroups
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report, confusion_matrix
# Step 2: Load two newsgroups for a binary classification demo
categories = ['sci.space', 'rec.sport.baseball']
newsgroups = fetch_20newsgroups(
    subset='all',
    categories=categories,
    remove=('headers', 'footers', 'quotes'),  # strip metadata that leaks the label
)
texts, labels = newsgroups.data, newsgroups.target
# Step 3: Model = TF-IDF features feeding a Multinomial Naive Bayes classifier
vectorizer = TfidfVectorizer(min_df=2, ngram_range=(1, 2))
classifier = MultinomialNB(alpha=1.0)  # alpha=1.0 is Laplace smoothing
model = Pipeline([
    ("tfidf", vectorizer),
    ("nb", classifier),
])
# Step 4: Hold out 20% for evaluation; stratify preserves class proportions
train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels,
)
model.fit(train_texts, train_labels)
# Step 5: Confusion matrix plus per-class precision/recall/F1
predictions = model.predict(test_texts)
print(confusion_matrix(test_labels, predictions))
print(classification_report(test_labels, predictions, target_names=categories))
Tip: Tune alpha (the Laplace/Lidstone smoothing strength) and ngram_range to boost performance; keep stratify=y so the train/test split preserves each class's proportion of examples.
