Performance optimization with arrays involves improving the efficiency of operations on arrays, such as reducing computation time and memory usage.
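The techniques covered here are vectorized operations, appropriate data types, built-in functions in place of explicit loops, broadcasting, in-place operations, memory views, and strides.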
Vectorized operations perform a computation on entire arrays or matrices in a single step, without explicit Python loops.
NumPy runs the element-by-element work in compiled code rather than in the Python interpreter, which makes these operations much faster than an equivalent loop on large arrays.
In the following example, we are performing vectorized addition of two large arrays, “a” and “b”, using NumPy’s array operations. This operation calculates the element-wise sum of the arrays and stores the result in a new array “c” −
import numpy as np
# Create two large arrays
a = np.random.rand(1000000)
b = np.random.rand(1000000)
# Vectorized addition
c = a + b
print(c)
Output:
Following is the output obtained (the values differ on every run because the arrays are random) −
[0.91662816 0.65486861 1.60409272 … 0.95122935 1.12795861 0.15812103]
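To make the speed difference concrete, the sketch below compares an explicit Python loop with the vectorized form on the same data. The array size, the seed, and the names c_loop and c_vec are illustrative choices that do not come from the example above. The measured times depend on the machine, so none are quoted here.

```python
import time
import numpy as np

rng = np.random.default_rng(0)  # seeded so the input is repeatable
a = rng.random(100_000)
b = rng.random(100_000)

# Explicit Python loop: one interpreted iteration per element
start = time.perf_counter()
c_loop = np.empty_like(a)
for i in range(a.size):
    c_loop[i] = a[i] + b[i]
loop_seconds = time.perf_counter() - start

# Vectorized addition: the loop runs inside NumPy's compiled code
start = time.perf_counter()
c_vec = a + b
vec_seconds = time.perf_counter() - start

print("results match:", np.array_equal(c_loop, c_vec))
print(f"loop: {loop_seconds:.4f} s, vectorized: {vec_seconds:.6f} s")
```

Both versions produce identical values; only the place where the loop executes changes.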
In NumPy, a data type (or dtype) defines the kind of elements that an array holds and how much space is required to store each element.
Choosing the appropriate data type for your arrays is important for optimizing performance and memory usage.
For example, np.float32 uses 4 bytes per element where np.float64 uses 8, so switching halves the memory an array needs. The trade-off is precision: float32 carries roughly 7 significant decimal digits, compared with 15 to 16 for float64. The saving matters most when working with large datasets.
In this example, we are demonstrating the usage of precision change by creating an array with double precision (64-bit) floating-point numbers and then converting it to single precision (32-bit) using the astype() method −
import numpy as np
# Create an array with double precision (64-bit)
arr_double = np.array([1.0, 2.0, 3.0], dtype=np.float64)
# Print the original double precision array
print("Original double precision array:")
print(arr_double)
print("Data type:", arr_double.dtype)
# Convert to single precision (32-bit)
arr_single = arr_double.astype(np.float32)
# Print the converted single precision array
print("\nConverted single precision array:")
print(arr_single)
print("Data type:", arr_single.dtype)
Output:
This will produce the following result −
Original double precision array:
[1. 2. 3.]
Data type: float64
Converted single precision array:
[1. 2. 3.]
Data type: float32
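The three-element array above is too small to show the memory effect. The following sketch, which assumes an arbitrary array of one million elements, reads the itemsize and nbytes attributes before and after the conversion and also illustrates the precision cost.

```python
import numpy as np

n = 1_000_000  # illustrative size, not taken from the example above
arr64 = np.ones(n, dtype=np.float64)
arr32 = arr64.astype(np.float32)

print(arr64.itemsize, arr64.nbytes)  # 8 bytes per element
print(arr32.itemsize, arr32.nbytes)  # 4 bytes per element

# Precision trade-off: float32 stores 0.1 less accurately than float64
x64 = np.float64(0.1)
x32 = np.float32(0.1)
error32 = abs(float(x32) - 0.1)
error64 = abs(float(x64) - 0.1)
print("float32 error is larger:", error32 > error64)
```

Note that astype() returns a new array, so both arrays exist in memory until the float64 one is released.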
One of the primary advantages of NumPy is the ability to avoid explicit loops by using built-in functions and array operations. This approach is often referred to as vectorization.
NumPy functions operate on entire arrays at once, which is more concise than a loop and usually faster, because the iteration happens in compiled code.
In the example below, we calculate the mean of the array elements using the np.mean() function, without using any explicit loops −
import numpy as np
# Create an array
arr = np.array([1, 2, 3, 4, 5])
# Calculate the mean of array elements
mean = np.mean(arr)
print("mean:", mean)
Output:
Following is the output of the above code −
mean: 3.0
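For comparison, here is a sketch of the loop that np.mean() replaces, followed by a few other built-in reductions. The 2D array named matrix and the use of the axis argument are additions for illustration.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Loop-based mean: accumulate in Python, then divide
total = 0
for value in arr:
    total += value
loop_mean = total / len(arr)

# Built-in reductions do the same work without a Python-level loop
print(loop_mean == np.mean(arr))
print(arr.sum(), arr.min(), arr.max())

# The axis argument reduces along one dimension of a 2D array
matrix = np.array([[1, 2, 3], [4, 5, 6]])
col_means = matrix.mean(axis=0)  # mean of each column
row_means = matrix.mean(axis=1)  # mean of each row
print(col_means, row_means)
```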
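Broadcasting refers to the ability to perform element-wise operations on arrays with different shapes. NumPy compares the shapes starting from the last dimension: two dimensions are compatible when they are equal or when one of them is 1, and a missing leading dimension is treated as 1. A dimension of size 1 is then stretched to match the other array without copying data.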
In the following example, we are broadcasting “array_1d” to match the shape of “array_2d”, allowing element-wise addition −
import numpy as np
# Create a 2D array and a 1D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
array_1d = np.array([10, 20, 30])
# Add the 1D array to each row of the 2D array
result = array_2d + array_1d
print(result)
Output:
The output obtained is as shown below −
[[11 22 33]
 [14 25 36]]
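The same mechanism works along the other axis. The sketch below adds a column of shape (2, 1) to the 2D array and then shows what happens when the shapes are incompatible. The column values are made up for illustration, and np.broadcast_shapes is assumed to be available (it was added in NumPy 1.20).

```python
import numpy as np

array_2d = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
column = np.array([[100], [200]])            # shape (2, 1)

# (2, 3) with (2, 1): the size-1 axis is stretched to length 3
by_row = array_2d + column
print(by_row)

# Report the result shape without doing any arithmetic
result_shape = np.broadcast_shapes((2, 3), (2, 1))
print(result_shape)

# Trailing dimensions 3 and 2 are neither equal nor 1, so this fails
try:
    array_2d + np.array([10, 20])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print("mismatch raised:", mismatch_raised)
```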
In-place operations in NumPy modify the data of an array directly, without creating a new array to store the result, which saves memory and avoids an allocation.
They are written with augmented assignment operators (e.g., +=, -=, *=, /=) or with functions that accept an out argument naming the array to write into. The result must fit the data type of the target array: for example, /= on an integer array raises an error because the result is floating-point.
In this example, we are applying arithmetic operation “+=” directly on an array without creating a new one −
import numpy as np
# Create an array
arr = np.array([1, 2, 3, 4, 5])
# Add 10 to each element in-place
arr += 10
print(arr)
Output:
After executing the above code, we get the following output −
[11 12 13 14 15]
Here, we calculate the exponential of each element in-place by passing the array itself as the out argument of NumPy's exp() function −
import numpy as np
# Create an array with a floating-point data type
arr = np.array([1, 2, 3, 4, 5], dtype=np.float64)
# Compute the exponential of each element in-place
np.exp(arr, out=arr)
print(arr)
Output:
After executing the above code, we get the following output −
[ 2.71828183 7.3890561 20.08553692 54.59815003 148.4131591 ]
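One way to check whether an operation really ran in-place is to keep a second reference to the array and see whether it still points at the same object afterwards. The sketch below does that; the names original and same_object are illustrative. It also shows the data type restriction on in-place results.

```python
import numpy as np

arr = np.array([1.0, 2.0, 3.0])
original = arr  # second name for the same array object

# Out-of-place: builds a new array and rebinds the name arr
arr = arr + 10
made_new_array = arr is not original

# In-place: both forms write into the existing array
same_object = arr
arr += 10
np.sqrt(arr, out=arr)
stayed_in_place = arr is same_object

# True division of an integer array yields floats, which cannot be
# stored back into the integer array, so NumPy raises a TypeError subclass
ints = np.array([1, 2, 3])
try:
    ints /= 2
    cast_error = False
except TypeError:
    cast_error = True

print(made_new_array, stayed_in_place, cast_error)
```

Integer floor division (//=) is one alternative that keeps an integer array in-place.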
Memory views refer to different ways of accessing or viewing the same underlying data in an array without duplicating it. This concept allows you to create different “views” or “slices” of the array that can operate on the same data in various ways −
In the example below, we create a 2D NumPy array and a view (slice) of the original array. Modifying the view also affects the original array −
import numpy as np
# Create a 2D array
arr = np.array([[1, 2, 3], [4, 5, 6]])
# Create a view (slice) of the original array
view = arr[:, 1:]
# Modify the view
view[0, 0] = 99
print(arr)
Output:
We get the output as shown below −
[[ 1 99  3]
 [ 4  5  6]]
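Not every indexing expression returns a view. As a rough guide, basic slicing shares memory with the original, while copy() and indexing with a list of integers produce independent arrays. The sketch below checks this with np.shares_memory and the base attribute; the variable names are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

view = arr[:, 1:]             # basic slicing returns a view
copied = arr[:, 1:].copy()    # explicit copy owns its own data
fancy = arr[:, [1, 2]]        # integer-list indexing returns a copy

print(np.shares_memory(arr, view))    # True
print(np.shares_memory(arr, copied))  # False
print(np.shares_memory(arr, fancy))   # False
print(view.base is arr)               # True

# Writing to the copy leaves the original untouched
copied[0, 0] = -1
print(arr[0, 1])  # 2
```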
Here, we create a 1D NumPy array using the arange() function and then reshape it into a 2D array with 3 rows and 4 columns. Where possible, reshape() returns a view, so the structure changes while the underlying data is shared rather than copied −
import numpy as np
# Create a 1D array
arr = np.arange(12)
# Reshape to a 2D array
reshaped = arr.reshape((3, 4))
print(reshaped)
Output:
We get the output as shown below −
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
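The following sketch checks that the reshaped array shares memory with the original, and shows a case where reshape() has to copy instead. The names auto and flat are illustrative.

```python
import numpy as np

arr = np.arange(12)
reshaped = arr.reshape((3, 4))

# A contiguous array can be reshaped without copying
print(np.shares_memory(arr, reshaped))  # True
reshaped[0, 0] = 99
print(arr[0])  # 99, the write is visible through the original

# -1 asks NumPy to infer that dimension from the total size
auto = arr.reshape(2, -1)
print(auto.shape)  # (2, 6)

# Flattening a transposed array cannot be expressed as a view,
# so reshape falls back to copying the data
flat = reshaped.T.reshape(-1)
print(np.shares_memory(arr, flat))  # False
```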
The strides of an array are a tuple giving the number of bytes to step in each dimension when traversing the array. They describe how the elements are laid out in memory and how NumPy reaches each one.
For instance, in a 2D array stored row by row, the stride for the first dimension is the number of bytes from one row to the next, and the stride for the second dimension is the number of bytes from one element to the next within a row.
In the following example, we create a 2D NumPy array and use the strides attribute to retrieve the number of bytes to step in each dimension when traversing the array −
import numpy as np
# Create a 2D array
arr = np.array([[1, 2, 3], [4, 5, 6]])
# Print the strides of the array
print(arr.strides)
Output:
We get the output as shown below, assuming the default integer type is 64-bit (8 bytes per element, 3 elements per row) −
(24, 8)
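Strides also explain why transposes and slices are cheap: they change the strides, not the data. The sketch below fixes the dtype to int64 so that the numbers do not depend on the platform. The names small and every_other are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print(arr.itemsize)  # 8 bytes per element
print(arr.strides)   # (24, 8): 3 elements * 8 bytes per row, 8 per element

# Transposing swaps the strides instead of moving data
print(arr.T.strides)  # (8, 24)

# A smaller dtype shrinks every stride
small = arr.astype(np.int32)
print(small.strides)  # (12, 4)

# Taking every other column doubles the step along that axis
every_other = arr[:, ::2]
print(every_other.strides)  # (24, 16)
```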
