Lesson 2: Encoding Categorical Variables

🔠 Lesson 2: Encoding Categorical Variables in Python


Machine learning models need **numbers**, not text! This lesson teaches you how to convert text-based categorical features into numerical form using Label Encoding and One-Hot Encoding.

🔍 Why Encoding?

Most ML algorithms operate on numeric arrays and can’t process text directly. For example, a model can’t use the strings “Male” and “Female” as feature values.

We must **convert text into numbers** without changing its meaning.

1️⃣ Label Encoding

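from sklearn.preprocessing import LabelEncoder

# Assumes df is an existing pandas DataFrame with a 'Gender' column
le = LabelEncoder()
# fit_transform learns the sorted unique categories and maps each one to an integer
df['Gender_encoded'] = le.fit_transform(df['Gender'])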

Converts categories to integers, assigned in sorted (alphabetical) order: Female = 0, Male = 1
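To check which integer each category received, here is a minimal sketch on a made-up “Gender” column (the sample data is an assumption for illustration only). It reads the mapping from the encoder's `classes_` attribute and converts the codes back to text with `inverse_transform`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"Gender": ["Male", "Female", "Female", "Male"]})

le = LabelEncoder()
df["Gender_encoded"] = le.fit_transform(df["Gender"])

# classes_ holds the sorted categories; a category's position is its integer code
mapping = {label: int(code) for code, label in enumerate(le.classes_)}
print(mapping)
print(df)

# inverse_transform recovers the original text labels from the codes
original = le.inverse_transform(df["Gender_encoded"])
```

Because the codes follow sorted order, adding a new category later can shift the existing codes, so it is safer to fit the encoder once and reuse it.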

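Use when: The values have a natural order (e.g., Low, Medium, High), so integer codes are meaningful. Note that LabelEncoder assigns integers alphabetically (High = 0, Low = 1, Medium = 2), and scikit-learn intends it for target labels; to keep the real order in a feature column, specify the order explicitly, for example with OrdinalEncoder.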
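For ordered values such as Low, Medium, High, keep in mind that LabelEncoder sorts categories alphabetically, which would put High before Low. One possible way to keep the intended order is scikit-learn's `OrdinalEncoder` with an explicit `categories` list. The sketch below assumes a hypothetical “Size” column and made-up data.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"Size": ["Low", "High", "Medium", "Low"]})

# Spell out the intended order; otherwise categories are sorted alphabetically
encoder = OrdinalEncoder(categories=[["Low", "Medium", "High"]])

# OrdinalEncoder expects 2D input, hence the double brackets
encoded = encoder.fit_transform(df[["Size"]])
df["Size_encoded"] = encoded.ravel().astype(int)
print(df)
```

With this order, Low maps to 0, Medium to 1, and High to 2, so larger codes really do mean "more".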

📦 One-Hot Encoding

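# prefix='City' names the new columns City_Delhi, City_Mumbai, ...; without it they are just Delhi, Mumbai, ...
pd.get_dummies(df['City'], prefix='City')

This creates one indicator column per category: City_Delhi, City_Mumbai, City_Kolkata etc. Each row is marked only in the column that matches its city.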

Use when: The values have no inherent order (e.g., cities, names, product categories).
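As a rough illustration of one-hot encoding inside a full table, the sketch below uses a made-up DataFrame with a hypothetical “Age” column next to “City”. Passing `columns=["City"]` encodes only that column and leaves the rest untouched; `dtype=int` is chosen here so the indicators show as 0/1.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "Age": [25, 32, 41, 29],
    "City": ["Delhi", "Mumbai", "Kolkata", "Delhi"],
})

# columns=["City"] encodes only that column and prefixes the new names with "City_"
encoded = pd.get_dummies(df, columns=["City"], dtype=int)
print(encoded.columns.tolist())
print(encoded)
```

The number of new columns equals the number of distinct cities, and each row has exactly one 1 across the City_ columns.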

🧠 Try This:

Use LabelEncoder on a “Gender” column
Use get_dummies() to encode the “City” column
Compare how many new columns are created with one-hot encoding

⏭️ Next Lesson: Feature Scaling – Standardization & Normalization

Machine Learning with Python: From Basics to Capstone
