Filtering arrays in NumPy allows you to select and work with subsets of data based on specific conditions. This process is useful for extracting relevant data, performing conditional operations, and analyzing subsets of data.
We can perform filtering in NumPy by creating a Boolean array (mask) where each element indicates whether the corresponding element in the original array meets the specified condition. This mask is then used to index the original array, extracting the elements that satisfy the condition.
NumPy provides various ways to filter arrays through Boolean indexing and conditional operations.
Boolean indexing allows you to filter array elements based on conditions. By applying a condition to an array, you obtain a Boolean array that you can use to index the original array.
In the following example, we filter the elements greater than 10 from the given array −
import numpy as np
# Creating an array
array = np.array([1, 5, 8, 12, 20, 3])
# Define the condition
condition = array > 10
# Apply the condition to filter the array
filtered_array = array[condition]
print("Original Array:", array)
print("Filtered Array (elements > 10):", filtered_array)
Output:
Following is the output obtained −
Original Array: [ 1  5  8 12 20  3]
Filtered Array (elements > 10): [12 20]
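The same Boolean mask can do more than extract values. The sketch below reuses the array from the example above to count the matches, invert the condition, and assign through the mask. The variable names (mask, count, not_greater, capped) are illustrative choices, not part of the original example.

```python
import numpy as np

array = np.array([1, 5, 8, 12, 20, 3])

# Boolean mask: True where the element is greater than 10
mask = array > 10

# True counts as 1, so summing the mask counts the matching elements
count = int(mask.sum())

# ~ inverts the mask, keeping the elements that fail the condition
not_greater = array[~mask]

# Assign through the mask on a copy so the original array is untouched
capped = array.copy()
capped[mask] = 10

print("Count of elements > 10:", count)
print("Elements <= 10:", not_greater)
print("Array with values capped at 10:", capped)
```

Indexing with a mask returns a copy of the selected elements, while assigning through a mask modifies the array in place, which is why the sketch works on a copy.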
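Filtering with multiple conditions allows you to select elements from a NumPy array that meet more than one criterion simultaneously. This is achieved by combining Boolean conditions with the element-wise logical operators & (and), | (or), and ~ (not). Each condition must be wrapped in parentheses, because these operators bind more tightly than comparisons such as > and <.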
The resulting Boolean array, representing the combined conditions, is then used to index the original array, extracting the elements that satisfy all specified criteria.
In this example, we are filtering elements within a range using multiple conditions −
import numpy as np
# Creating an array
array = np.array([1, 5, 8, 12, 20, 3])
# Define multiple conditions
condition = (array > 5) & (array < 15)
# Apply the conditions to filter the array
filtered_array = array[condition]
print("Original Array:", array)
print("Filtered Array (5 < elements < 15):", filtered_array)
Output:
This will produce the following result −
Original Array: [ 1  5  8 12 20  3]
Filtered Array (5 < elements < 15): [ 8 12]
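The range filter above uses only the & operator. As a complementary sketch on the same array, the block below combines conditions with | and ~ and shows np.logical_and as the function form of &. The variable names are illustrative.

```python
import numpy as np

array = np.array([1, 5, 8, 12, 20, 3])

# OR: keep elements that are below 5 or above 15
outside = array[(array < 5) | (array > 15)]

# NOT: negate the combined range condition to keep everything outside 5 < x < 15
not_between = array[~((array > 5) & (array < 15))]

# np.logical_and is the function equivalent of the & operator on Boolean arrays
between = array[np.logical_and(array > 5, array < 15)]

print("Below 5 or above 15:", outside)
print("Not between 5 and 15:", not_between)
print("Between 5 and 15:", between)
```

Python's own and, or, and not keywords do not work element-wise here; using them on arrays with more than one element raises a ValueError about an ambiguous truth value.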
When filtering with functions, you generally define a function that takes array elements as input and returns a Boolean value (True or False) indicating whether each element should be included in the result.
This function is then applied to the array, and the resulting Boolean array is used to index and filter the original data.
In the example below, we are using the where() function to filter elements in NumPy −
import numpy as np
# Creating an array
array = np.array([1, 5, 8, 12, 20, 3])
# Define the condition
condition = array > 10
# Filter elements
filtered_indices = np.where(condition)
filtered_array = array[filtered_indices]
print("Original Array:", array)
print("Filtered Array (elements > 10) using np.where:", filtered_array)
Output:
When called with only a condition, np.where() returns a tuple of index arrays (one per dimension) marking where the condition is True. These indices are then used to extract the filtered elements, as shown in the output below −
Original Array: [ 1  5  8 12 20  3]
Filtered Array (elements > 10) using np.where: [12 20]
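np.where() also has a three-argument form that selects between two values for each element instead of returning indices. The sketch below, with illustrative variable names, shows both forms on the same array.

```python
import numpy as np

array = np.array([1, 5, 8, 12, 20, 3])

# One-argument form: a tuple holding one index array per dimension
indices = np.where(array > 10)
positions = indices[0]

# Three-argument form: take the element where the condition holds, 0 elsewhere.
# The result keeps the shape of the original array.
replaced = np.where(array > 10, array, 0)

print("Positions of elements > 10:", positions)
print("Elements <= 10 replaced by 0:", replaced)
```

The three-argument form is useful when the positions of the elements matter, since Boolean indexing alone discards the elements that fail the condition.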
Let us go through an example where we use a custom function to filter an array based on a specific criterion −
import numpy as np
# Create a NumPy array
array = np.array([10, 15, 20, 25, 30, 35])
# Define a custom function for filtering
def is_prime(num):
    """Return True if num is a prime number, False otherwise."""
    if num <= 1:
        return False
    for i in range(2, int(np.sqrt(num)) + 1):
        if num % i == 0:
            return False
    return True
# Apply the function to each element of the array
mask = np.array([is_prime(x) for x in array])
# Use the mask to filter the array
filtered_array = array[mask]
print("Original Array:", array)
print("Mask (prime numbers):", mask)
print("Filtered Array (prime numbers):", filtered_array)
Output:
The output obtained is as shown below −
Original Array: [10 15 20 25 30 35]
Mask (prime numbers): [False False False False False False]
Filtered Array (prime numbers): []
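Since none of the values in the array above is prime, the mask is all False and the result is empty. As a sketch with illustrative data that does contain primes, the block below applies the same is_prime() function through np.vectorize, which is one alternative to the list comprehension. The array values and the names values, prime_mask, and primes are assumptions made for this illustration.

```python
import numpy as np

def is_prime(num):
    """Return True if num is a prime number, False otherwise."""
    if num <= 1:
        return False
    for i in range(2, int(np.sqrt(num)) + 1):
        if num % i == 0:
            return False
    return True

# Illustrative data containing both primes and non-primes
values = np.array([2, 9, 11, 15, 17, 21])

# np.vectorize wraps the scalar function so it can be called on a whole array.
# It is a convenience wrapper and still loops in Python, so it is not faster
# than the list comprehension.
prime_mask = np.vectorize(is_prime, otypes=[bool])(values)

# Use the mask to keep only the prime numbers
primes = values[prime_mask]

print("Mask (prime numbers):", prime_mask)
print("Filtered Array (prime numbers):", primes)
```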
In multi-dimensional arrays, filtering can be done using Boolean indexing, similar to one-dimensional arrays. However, you need to ensure that the filtering condition matches the dimension of the array it is applied to.
The general approach is to build a Boolean condition from a slice of the array, such as a single column, and then use that condition to index the array along the matching axis.
Consider a 2D array where we want to filter out rows based on a condition applied to elements in a specific column −
import numpy as np
# Create a 2D NumPy array
array = np.array([[10, 20, 30],
                  [15, 25, 35],
                  [20, 30, 40]])
# Define a condition for filtering
# Select rows where the value in the second column is greater than 25
condition = array[:, 1] > 25
# Use the condition to filter the array
filtered_array = array[condition]
print("Original Array:\n", array)
print("Condition (values in second column > 25):", condition)
print("Filtered Array:\n", filtered_array)
Output:
After executing the above code, we get the following output −
Original Array:
[[10 20 30]
[15 25 35]
[20 30 40]]
Condition (values in second column > 25): [False False True]
Filtered Array:
[[20 30 40]]
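The row filter above is only one way to apply a condition to a 2D array. The sketch below, using the same array and illustrative variable names, shows three other common patterns: an element-wise mask, a row filter built with any(), and a column filter.

```python
import numpy as np

array = np.array([[10, 20, 30],
                  [15, 25, 35],
                  [20, 30, 40]])

# Element-wise mask with the same shape as the array:
# the matching values come back as a flattened 1D array
large_values = array[array > 25]

# Row filter built from a per-row reduction:
# keep rows that contain at least one value above 35
rows_with_large = array[(array > 35).any(axis=1)]

# Column filter: a mask over the columns, applied on the second axis.
# Here we keep the columns whose first-row value is at least 20.
selected_columns = array[:, array[0] >= 20]

print("Values > 25:", large_values)
print("Rows with a value > 35:\n", rows_with_large)
print("Columns with first-row value >= 20:\n", selected_columns)
```

A mask with the full shape of the array selects individual elements, so the 2D structure is lost. A one-dimensional mask whose length matches an axis keeps whole rows or columns.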
