Comparing categorical data is an essential step in understanding the relationships between the categories in a dataset. In Python, Pandas supports the comparison operators (==, !=, >, >=, <, and <=) on categorical data. These comparisons can be made in three main scenarios:
- Equality comparisons (== and !=) between categorical data and a list-like object (list, Series, array, and so on) of the same length.
- All comparisons (==, !=, >, >=, <, and <=) between two categorical Series that are ordered and share the same categories.
- All comparisons between categorical data and a scalar value.
It is important to note that any non-equality comparison between categorical data with different categories, or between a categorical Series and a list-like object, will raise a TypeError. This is because a custom category ordering could be interpreted in two ways: one that takes the ordering into account and one that does not.
In this tutorial by Vista Academy, we will learn how to compare categorical data in the Python Pandas library using the comparison operators ==, !=, >, >=, <, and <=.
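As a quick illustration of the note above, the following minimal sketch compares an ordered categorical Series with a plain NumPy array. The names `base`, `equal_mask`, and `value_mask` are chosen here for illustration only. Equality is allowed, an ordering comparison raises a TypeError, and converting the Series with `np.asarray` compares the raw values while ignoring the category order.

```python
import numpy as np
import pandas as pd

# Ordered categorical Series with the custom order 3 < 2 < 1
s = pd.Series([1, 2, 3]).astype(pd.CategoricalDtype([3, 2, 1], ordered=True))
base = np.array([2, 2, 2])

# Equality against a list-like object of the same length is allowed
equal_mask = (s == base).tolist()

# An ordering comparison against a plain array is rejected
try:
    s > base
    raised = False
except TypeError as e:
    raised = True
    print("TypeError:", e)

# Comparing the underlying values ignores the category order entirely
value_mask = (np.asarray(s) > base).tolist()
print(equal_mask, value_mask)
```

Note that the `np.asarray` route answers a different question: it compares the numbers themselves, not their positions in the category order.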
In Pandas, categorical data can be compared for equality with a variety of list-like objects, such as lists, NumPy arrays, or Series, provided they have the same length as the categorical data.
The following example demonstrates equality and inequality comparisons between a categorical Series and another categorical Series, and between a categorical Series and a NumPy array.
import pandas as pd
from pandas.api.types import CategoricalDtype
import numpy as np
# Creating a categorical Series
s = pd.Series([1, 2, 1, 1, 2, 3, 1, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))
# Creating another categorical Series for comparison
s2 = pd.Series([2, 2, 2, 1, 1, 3, 3, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))
# Equality comparison
print("Equality comparison (s == s2):")
print(s == s2)
print("\nInequality comparison (s != s2):")
print(s != s2)
# Equality comparison with a NumPy array
print("\nEquality comparison with NumPy array:")
print(s == np.array([1, 2, 3, 1, 2, 3, 2, 1]))
Equality comparison (s == s2):
0 False
1 True
2 False
3 True
4 False
5 True
6 False
7 True
dtype: bool
Inequality comparison (s != s2):
0 True
1 False
2 True
3 False
4 True
5 False
6 True
7 False
dtype: bool
Equality comparison with NumPy array:
0 True
1 True
2 False
3 True
4 True
5 True
6 False
7 False
dtype: bool
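The same-length requirement mentioned earlier can be checked with a small sketch. It assumes an unordered categorical Series and plain Python lists; the variable names `same_length` and `length_error` are illustrative.

```python
import pandas as pd

# Unordered categorical Series; equality checks do not need an order
s = pd.Series([1, 2, 3], dtype="category")

# Same length: element-wise equality against a plain Python list
same_length = (s == [1, 5, 3]).tolist()
print(same_length)

# Different length: pandas refuses to compare
try:
    s == [1, 2]
    length_error = False
except ValueError as e:
    length_error = True
    print("ValueError:", e)
```

The list may contain values that are not categories (such as 5 here); those positions simply compare as not equal.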
Pandas allows you to perform ordering comparisons (>, >=, <, and <=) between ordered categorical data that share the same categories.
This example demonstrates how to perform non-equality comparisons (>, >=, <, and <=) on ordered categorical data.
import pandas as pd
from pandas.api.types import CategoricalDtype
# Creating a categorical Series
s = pd.Series([1, 2, 1, 1, 2, 3, 1, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))
# Creating another categorical Series for comparison
s2 = pd.Series([2, 2, 2, 1, 1, 3, 3, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))
# Greater than comparison
print("Greater than comparison:\n",s > s2)
# Less than comparison
print("\nLess than comparison:\n",s < s2)
# Greater than or equal to comparison
print("\nGreater than or equal to comparison:\n",s >= s2)
# Less than or equal to comparison
print("\nLess than or equal to comparison:\n",s <= s2)
Greater than comparison:
0 True
1 False
2 True
3 False
4 False
5 False
6 True
7 False
dtype: bool
Less than comparison:
0 False
1 False
2 False
3 False
4 True
5 False
6 False
7 False
dtype: bool
Greater than or equal to comparison:
0 True
1 True
2 True
3 True
4 False
5 True
6 True
7 True
dtype: bool
Less than or equal to comparison:
0 False
1 True
2 False
3 True
4 True
5 True
6 False
7 True
dtype: bool
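The ordering operators depend on the categorical data being ordered. The sketch below, which uses illustrative Series named `u` and `v`, shows that an unordered categorical rejects `>` with a TypeError, and that marking both Series as ordered with `.cat.as_ordered()` makes the comparison possible. It assumes both Series end up with the same inferred categories, [1, 2, 3].

```python
import pandas as pd

# Two unordered categorical Series with the same inferred categories [1, 2, 3]
u = pd.Series([1, 2, 3, 1], dtype="category")
v = pd.Series([2, 2, 1, 3], dtype="category")

# Ordering comparisons are not defined for unordered categoricals
try:
    u > v
    unordered_error = False
except TypeError as e:
    unordered_error = True
    print("TypeError:", e)

# Marking both as ordered (1 < 2 < 3) enables the comparison
ordered_result = (u.cat.as_ordered() > v.cat.as_ordered()).tolist()
print(ordered_result)
```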
Categorical data can also be compared to scalar values using all comparison operators (==, !=, >, >=, <, and <=). The categorical values are compared to the scalar based on the order of their categories.
The following example demonstrates how the categorical data can be compared to a scalar value.
import pandas as pd
# Creating a categorical Series
s = pd.Series([1, 2, 3]).astype(pd.CategoricalDtype([3, 2, 1], ordered=True))
# Compare to a scalar
print("Comparing categorical data to a scalar:")
print(s > 2)
Comparing categorical data to a scalar:
0 True
1 False
2 False
dtype: bool
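The result above may look surprising, since 1 is reported as greater than 2. A rough way to see why is to look at the position of each value in the category order through `.cat.codes`. The sketch below also shows what happens, as far as the current pandas behaviour goes, when the scalar is not one of the categories: equality returns False everywhere, while an ordering comparison raises a TypeError. The variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3]).astype(pd.CategoricalDtype([3, 2, 1], ordered=True))

# Position of each value within the category order [3, 2, 1]
codes = s.cat.codes.tolist()

# The scalar 2 sits at position 1, so "s > 2" behaves like "code > 1"
by_code = [c > 1 for c in codes]
by_operator = (s > 2).tolist()
print(codes, by_code, by_operator)

# A scalar that is not a category: equality is False for every element
eq_missing = (s == 5).tolist()

# An ordering comparison with such a scalar is rejected
try:
    s > 5
    missing_error = False
except TypeError as e:
    missing_error = True
    print("TypeError:", e)
```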
Comparing two categorical Series that have different categories or a different category order raises a TypeError.
The following example demonstrates how to handle the TypeError raised when comparing two categorical Series whose categories are ordered differently.
import pandas as pd
from pandas.api.types import CategoricalDtype
# Creating a categorical Series
s = pd.Series([1, 2, 1, 1, 2, 3, 1, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))
# Creating another categorical Series for comparison
s3 = pd.Series([2, 2, 2, 1, 1, 3, 1, 2]).astype(CategoricalDtype(ordered=True))
try:
    print("Attempting to compare differently ordered two Series objects:")
    print(s > s3)
except TypeError as e:
    print("TypeError:", str(e))
Attempting to compare differently ordered two Series objects:
TypeError: Categoricals can only be compared if 'categories' are the same.
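One possible way to avoid this error is to bring both Series to the same categorical dtype before comparing them. The sketch below recasts `s3` to the dtype of `s` using `astype`; the name `s3_aligned` is introduced here for illustration. Whether the order of `s` is the right one to adopt depends on your data.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series([1, 2, 1, 1, 2, 3, 1, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))
s3 = pd.Series([2, 2, 2, 1, 1, 3, 1, 2]).astype(CategoricalDtype(ordered=True))

# Recast s3 so that it shares the categories and order of s (3 < 2 < 1)
s3_aligned = s3.astype(s.dtype)

# Both Series now have identical categories, so the comparison succeeds
result = (s > s3_aligned).tolist()
print(result)
```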
