Machine learning models need **numbers**, not text! This lesson teaches you how to convert text-based categorical features into numerical form using Label Encoding and One-Hot Encoding.
Most ML algorithms can’t handle non-numeric features directly. For example, a model can’t interpret values such as “Male” or “Female”.
We must **convert text into numbers** without changing its meaning.
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df['Gender_encoded'] = le.fit_transform(df['Gender'])
Converts categories to integers, assigned in sorted (alphabetical) order of the labels: Female = 0, Male = 1
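Use when: The values have some order (e.g., Low, Medium, High). Be aware that LabelEncoder assigns codes alphabetically (High = 0, Low = 1, Medium = 2), so check that the codes match the intended order, or map the order explicitly.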
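A minimal sketch of label encoding on a toy DataFrame. The data and the `Size` column are made up for illustration and are not from the lesson. It shows how to inspect which integer each label received, and one possible way to encode ordered categories with an explicit mapping (a plain pandas `map`, used here as an assumption about what suits ordered data, not as part of `LabelEncoder`).

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Size": ["Low", "High", "Medium", "Low"],
})

le = LabelEncoder()
df["Gender_encoded"] = le.fit_transform(df["Gender"])

# classes_ lists the labels in the order of their integer codes,
# so position 0 is the label encoded as 0, position 1 as 1, and so on.
gender_classes = list(le.classes_)

# inverse_transform maps the integer codes back to the original labels.
decoded = list(le.inverse_transform(df["Gender_encoded"]))

# For ordered categories, an explicit mapping keeps the intended order
# instead of relying on alphabetical sorting.
size_order = {"Low": 0, "Medium": 1, "High": 2}
df["Size_encoded"] = df["Size"].map(size_order)

print(df)
```

Checking `le.classes_` after fitting is a quick way to confirm which label maps to which number before the encoded column is passed to a model.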
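pd.get_dummies(df['City'], prefix='City')
This creates a separate column for each category: City_Delhi, City_Mumbai, City_Kolkata, etc. Without the prefix argument, the columns would be named Delhi, Mumbai, Kolkata.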
Use when: No order between values (e.g., cities, names, categories).
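A short sketch of one-hot encoding with `pd.get_dummies` on a toy `City` column. The sample data is invented for illustration. Passing the whole DataFrame with `columns=["City"]` replaces the text column with indicator columns and uses the column name as the prefix; `dtype=int` is chosen here so the indicators are 0/1 rather than booleans.

```python
import pandas as pd

# Hypothetical toy data; values are illustrative only.
df = pd.DataFrame({"City": ["Delhi", "Mumbai", "Kolkata", "Delhi"]})

# Encode a single Series; prefix="City" yields names such as City_Delhi.
dummies = pd.get_dummies(df["City"], prefix="City", dtype=int)

# Encode within the DataFrame: the original "City" column is replaced
# by one indicator column per category, prefixed with the column name.
encoded = pd.get_dummies(df, columns=["City"], dtype=int)

print(encoded)
```

Each row has exactly one indicator set to 1, which is why this encoding does not imply any order between the cities.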
Apply LabelEncoder on a “Gender” column
Use get_dummies() to encode the “City” column
⏭️ Next Lesson: Feature Scaling – Standardization & Normalization