Exploring Data Analysis with NumPy: 10 Essential Functions

18 Jul, 2023

NumPy, the core library for numerical computing in Python, provides a wide range of functions for data analysis. Whether you’re working with huge datasets or doing complex calculations, it gives you the tools to summarize your data and extract relevant insights.

In this blog post, we will delve into 10 common NumPy functions that are essential for effective data analysis. Let’s explore these functions and understand their applications in detail.

Here's an example dataset with random numbers:

import numpy as np

# Create a dataset
dataset = np.random.randint(1, 100, size=(5, 5))

print("Dataset:")
print(dataset)
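
Because the dataset above is random, its values change on every run. If you want repeatable results while following along, one option is a seeded generator. This is a minimal sketch; the seed value 42 and the name rng are arbitrary choices for illustration, not part of the original example.

```python
import numpy as np

# A seeded generator produces the same "random" dataset on every run.
# The seed value 42 is an arbitrary choice for illustration.
rng = np.random.default_rng(42)

# Like np.random.randint, integers() excludes the upper bound by default,
# so the values fall between 1 and 99.
dataset = rng.integers(1, 100, size=(5, 5))

print("Dataset:")
print(dataset)
```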

np.mean()

This function calculates the mean (average) of all the elements in a dataset. It takes the dataset as input and returns the mean value, which is the sum of all elements divided by the number of elements. For a multi-dimensional array, np.mean() averages over every element unless an axis is specified.

mean = np.mean(dataset)
print("Mean:", mean)
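
To make the calculation concrete, here is a small sketch on a fixed array (the name sample and its values are made up for illustration). It also shows the axis argument, which averages down the columns or across the rows instead of over the whole array.

```python
import numpy as np

sample = np.array([[1, 2, 3],
                   [4, 5, 6]])

overall_mean = np.mean(sample)          # (1+2+3+4+5+6) / 6 = 3.5
column_means = np.mean(sample, axis=0)  # [2.5, 3.5, 4.5]
row_means = np.mean(sample, axis=1)     # [2.0, 5.0]

print("Overall mean:", overall_mean)
print("Column means:", column_means)
print("Row means:", row_means)
```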

np.median()

The median is the middle value of a dataset. To find it, the values are first sorted in ascending order. If the dataset has an odd number of elements, the median is the middle value; if it has an even number of elements, the median is the average of the two middle values. By default, np.median() treats a multi-dimensional array as one flat list of values.

median = np.median(dataset)
print("Median:", median)
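
The odd and even cases can be checked with two tiny arrays. The arrays below are made-up values for illustration only.

```python
import numpy as np

odd_values = np.array([7, 1, 5])      # sorted: 1, 5, 7
even_values = np.array([7, 1, 5, 3])  # sorted: 1, 3, 5, 7

odd_median = np.median(odd_values)    # middle value: 5.0
even_median = np.median(even_values)  # average of 3 and 5: 4.0

print("Median (odd count):", odd_median)
print("Median (even count):", even_median)
```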

np.std()

The standard deviation measures the spread of a dataset, that is, how far the data points typically lie from the mean. It is calculated as the square root of the variance, where the variance is the average of the squared differences from the mean. A higher standard deviation indicates more variation in the dataset. By default, np.std() computes the population standard deviation, dividing by the number of elements.

std = np.std(dataset)
print("Standard Deviation:", std)
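
As a rough check of the definition, the sketch below computes the standard deviation by hand and compares it with np.std(). The values array is made up for illustration. It also shows the ddof=1 argument, which gives the sample standard deviation (dividing by n - 1) that many statistics texts use.

```python
import numpy as np

values = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # mean is 5

# Square root of the average squared difference from the mean.
manual_std = np.sqrt(np.mean((values - values.mean()) ** 2))

population_std = np.std(values)      # divides by n (the default): 2.0
sample_std = np.std(values, ddof=1)  # divides by n - 1, so slightly larger

print("Manual:", manual_std)
print("Population:", population_std)
print("Sample:", sample_std)
```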

np.min()

This function returns the dataset’s smallest value. It looks through every element and finds the one with the smallest value.

min_value = np.min(dataset)
print("Minimum Value:", min_value)

np.max()

Conversely, np.max() returns a dataset’s largest value. It looks through every element and finds the one with the highest value.

max_value = np.max(dataset)
print("Maximum Value:", max_value)
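
Both functions accept an axis argument, and the related np.argmin() and np.argmax() functions report where the extreme value sits. The sketch below uses a made-up array named sample.

```python
import numpy as np

sample = np.array([[3, 8, 1],
                   [9, 2, 7]])

smallest = np.min(sample)               # 1
largest = np.max(sample)                # 9
column_minima = np.min(sample, axis=0)  # [3, 2, 1]
row_maxima = np.max(sample, axis=1)     # [8, 9]

# argmax returns a position in the flattened array;
# unravel_index converts it back to (row, column).
flat_index = np.argmax(sample)
row, col = np.unravel_index(flat_index, sample.shape)

print("Largest value", largest, "is at row", row, "column", col)
```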

np.sum()

This function calculates the total of all elements in a dataset. It adds up every value and returns the result.

sum_value = np.sum(dataset)
print("Sum:", sum_value)
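
Column and row totals are a common variation. The following sketch uses a small made-up array to show how the axis argument changes what gets added up.

```python
import numpy as np

sample = np.array([[1, 2, 3],
                   [4, 5, 6]])

total = np.sum(sample)               # 21
column_totals = np.sum(sample, axis=0)  # [5, 7, 9]
row_totals = np.sum(sample, axis=1)     # [6, 15]

print("Total:", total)
print("Column totals:", column_totals)
print("Row totals:", row_totals)
```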

np.prod()

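The np.prod() function calculates the product of all of a dataset’s elements. It multiplies all of the values together and returns the result. Because the product of 25 integers between 1 and 99 is usually far too large for a 64-bit integer and would silently overflow, the example below accumulates the product as a floating-point number.

# dtype=np.float64 avoids silent integer overflow for this dataset
product = np.prod(dataset, dtype=np.float64)
print("Product:", product)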
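
Here is a small sketch with made-up values that is easy to verify by hand, including products along each axis. The last part is one possible way to get an exact result for many values: it hands the numbers to Python's own integers, which do not have a fixed width, through the standard library's math.prod().

```python
import math
import numpy as np

small = np.array([[1, 2],
                  [3, 4]])

total_product = np.prod(small)            # 1 * 2 * 3 * 4 = 24
column_products = np.prod(small, axis=0)  # [1 * 3, 2 * 4] = [3, 8]
row_products = np.prod(small, axis=1)     # [1 * 2, 3 * 4] = [2, 12]

# Exact product of 25 copies of 99, computed with Python integers
# rather than a fixed-width NumPy integer type.
large = np.full(25, 99)
exact_product = math.prod(int(v) for v in large)

print("Total product:", total_product)
print("Exact large product:", exact_product)
```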

np.unique()

The unique elements in a dataset are located using the np.unique() function. It removes any duplicate items and returns a sorted array of unique values.

unique_values = np.unique(dataset)
print("Unique Values:", unique_values)
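
In data analysis you often want to know how many times each value occurs, not just which values exist. The sketch below, on a made-up array, uses the return_counts argument of np.unique() for that.

```python
import numpy as np

values = np.array([3, 1, 2, 3, 1, 3])

# With return_counts=True, np.unique returns two arrays:
# the sorted unique values and how often each one appears.
unique_values, counts = np.unique(values, return_counts=True)

for value, count in zip(unique_values, counts):
    print(value, "appears", count, "time(s)")
```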

np.transpose()

A dataset can be transposed using the np.transpose() function. It swaps the rows and columns of a two-dimensional array, effectively flipping it across its main diagonal (for arrays with more dimensions, it reverses the order of the axes by default). This function is helpful whenever you want to convert rows into columns or vice versa.

transposed_dataset = np.transpose(dataset)
print("Transposed Dataset:")
print(transposed_dataset)
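
Transposing a square dataset does not change its shape, so the effect is easier to see on a non-square array. The sketch below uses a made-up 2 x 3 matrix and also shows the .T shorthand, which gives the same result.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

transposed = np.transpose(matrix)

print(matrix.shape)      # (2, 3)
print(transposed.shape)  # (3, 2)

# The element at row 0, column 2 moves to row 2, column 0.
print(matrix[0, 2], transposed[2, 0])

# .T is a shorthand for the same operation.
same_result = np.array_equal(transposed, matrix.T)
```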

np.reshape()

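With the np.reshape() function, you can change a dataset’s shape while keeping its original elements. It takes the dataset and the intended shape as inputs and returns an array with the requested shape. The new shape must hold exactly the same number of elements as the original: the 5×5 dataset has 25 elements, so it can be reshaped to (1, 25) or (25, 1), but not to (10, 5). Reshaping can be used to change a multi-dimensional array’s dimensions or turn a 1D array into a 2D array.

# 5 x 5 = 25 elements, so the target shape must also hold 25
reshaped_dataset = np.reshape(dataset, (1, 25))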
print("Reshaped Dataset:")
print(reshaped_dataset)
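
The sketch below, using a made-up 12-element array, turns a 1D array into a 2D one, lets NumPy infer one dimension with -1, and shows what happens when the requested shape does not match the number of elements.

```python
import numpy as np

flat = np.arange(12)  # 12 elements: 0, 1, ..., 11

grid = np.reshape(flat, (3, 4))  # 3 rows x 4 columns = 12 elements

# -1 asks NumPy to work out that dimension: 12 / 6 = 2 rows.
inferred = np.reshape(flat, (-1, 6))

# A shape whose size differs from the element count raises ValueError.
mismatch_error = None
try:
    np.reshape(flat, (5, 5))  # 25 slots for 12 elements
except ValueError as error:
    mismatch_error = str(error)

print(grid.shape)      # (3, 4)
print(inferred.shape)  # (2, 6)
print("Error:", mismatch_error)
```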