Python Pandas – Calculations with Missing Data

Handling Missing Data During Calculations in Pandas


When working with data, you will often encounter missing values, which Pandas represents as NaN (Not a Number). Calculations involving missing values need extra care because NaN propagates through most arithmetic operations and can silently change your results.

Pandas offers flexible ways to manage missing data during calculations, allowing you to control how these values affect your results. In this tutorial, we will learn how Pandas handles missing data during calculations, including arithmetic operations, descriptive statistics, and cumulative operations.

Arithmetic Operations with Missing Data

When performing arithmetic operations between Pandas objects, missing values (NaN) are propagated by default. For example, when you add two series, the result is NaN at every position where either series has a missing value.

Example

The following example adds two Series objects that contain missing values.

Code:
import pandas as pd
import numpy as np

# Create 2 input series objects
ser1 = pd.Series([1, np.nan, np.nan, 2])
ser2 = pd.Series([2, np.nan, 1, np.nan])

# Display the series
print("Input Series 1:\n", ser1)
print("\nInput Series 2:\n", ser2)

# Adding two series with NaN values
result = ser1 + ser2
print('\nResult After adding Two series:\n', result)

Output:
Input Series 1:
0    1.0
1    NaN
2    NaN
3    2.0
dtype: float64

Input Series 2:
0    2.0
1    NaN
2    1.0
3    NaN
dtype: float64

Result After adding Two series:
0    3.0
1    NaN
2    NaN
3    NaN
dtype: float64
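
If propagation is not what you want, one option is the add() method with its fill_value parameter. The following is a minimal sketch using the same two series; the variable name filled_sum is only illustrative. A value missing on one side is replaced by fill_value before the addition, while a position that is missing in both series stays NaN.

```python
import numpy as np
import pandas as pd

ser1 = pd.Series([1, np.nan, np.nan, 2])
ser2 = pd.Series([2, np.nan, 1, np.nan])

# fill_value=0 stands in for a value that is missing on one side only;
# index 1 is missing in both series, so it remains NaN in the result
filled_sum = ser1.add(ser2, fill_value=0)
print(filled_sum)
```

Whether 0 is a sensible stand-in depends on what a missing value means in your data, so treat this as one possible choice, not a default recommendation.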

Handling Missing Data in Descriptive Statistics

The Pandas library provides several methods for computing descriptive statistics, such as computing a sum, a product, or a cumulative sum or product. By default, these methods skip missing values (skipna=True) rather than propagating them.
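
The same default applies to other statistics. As a small sketch (the variable names are illustrative), mean() divides by the number of non-missing values, which count() reports, and passing skipna=False makes the result NaN instead.

```python
import numpy as np
import pandas as pd

values = pd.Series([np.nan, 2, np.nan, 4])

# count() returns the number of non-missing values
non_missing = values.count()

# mean() skips NaN by default: (2 + 4) / 2
mean_default = values.mean()

# With skipna=False, any NaN makes the result NaN
mean_strict = values.mean(skipna=False)

print(non_missing, mean_default, mean_strict)
```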

Example: Summing with Missing Values

When summing data with missing values, NaN values are excluded by default. This allows you to calculate meaningful totals even when some data is missing.

The following example sums a DataFrame column using the sum() method. Because skipna defaults to True, the NaN values are skipped in the summation.

Code:
import pandas as pd
import numpy as np

# Create a sample DataFrame
data = {'A': [np.nan, 2, np.nan, 4], 'B': [5, 6, 7, 8]}
df = pd.DataFrame(data)

# Display the input DataFrame
print("Input DataFrame:\n", df)

# Summing a column with NaN values
result = df['A'].sum()

print('\nResult After Summing the values of a column:\n', result)

Output:
Input DataFrame:
     A  B
0  NaN  5
1  2.0  6
2  NaN  7
3  4.0  8

Result After Summing the values of a column:
6.0
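
Two parameters of sum() change this default. The sketch below (variable names are illustrative) shows skipna=False, which makes any NaN propagate into the total, and min_count, which sets how many non-missing values are required before a number is returned. Without min_count, the sum of an all-NaN series is 0.0.

```python
import numpy as np
import pandas as pd

col = pd.Series([np.nan, 2, np.nan, 4])

# skipna=False: a single NaN makes the whole sum NaN
strict_total = col.sum(skipna=False)

all_missing = pd.Series([np.nan, np.nan])

# Default behavior: nothing to add, so the result is 0.0
empty_total = all_missing.sum()

# min_count=1: require at least one non-missing value, else return NaN
guarded_total = all_missing.sum(min_count=1)

print(strict_total, empty_total, guarded_total)
```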

Example: Product Calculation with Missing Values

Similarly, when calculating a product, missing values (NaN) are skipped by default, which is equivalent to treating each NaN as 1. This ensures that missing values do not alter the final product.

The following example uses the df.prod() method to calculate the product of each column of a DataFrame.

Code:
import pandas as pd
import numpy as np

# Create a sample DataFrame
data = {'A': [np.nan, 2, np.nan, 4], 'B': [5, 6, np.nan, np.nan]}
df = pd.DataFrame(data)

# Display the input DataFrame
print("Input DataFrame:\n", df)

# Product with NaN values
result = df.prod()

print('\nResult After Product the values of a DataFrame:\n', result)

Output:
Input DataFrame:
     A    B
0  NaN  5.0
1  2.0  6.0
2  NaN  NaN
3  4.0  NaN

Result After Product the values of a DataFrame:
A     8.0
B    30.0
dtype: float64
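
One consequence of treating NaN as 1 is that a row or column containing only missing values produces a product of 1.0. The following sketch computes row-wise products on the same data (axis=1) and then uses min_count=1 to return NaN when a row has no valid values; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 2, np.nan, 4], 'B': [5, 6, np.nan, np.nan]})

# Row-wise product: row 2 has no valid values, so its product is 1.0
row_products = df.prod(axis=1)

# min_count=1 requires at least one non-missing value per row
guarded_products = df.prod(axis=1, min_count=1)

print(row_products)
print(guarded_products)
```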

Cumulative Operations with Missing Data

Pandas provides cumulative methods such as cumsum() and cumprod() to generate running totals or products. By default (skipna=True), these methods skip missing values when accumulating but leave NaN at the corresponding positions in the output. If you set the skipna parameter to False, the first NaN in a column propagates to every later position in that column.

Example: Cumulative Sum with Missing Values

The following example demonstrates calculating the cumulative sum of a DataFrame with missing values using the df.cumsum() method.

Code:
import pandas as pd
import numpy as np

# Create a sample DataFrame
data = {'A': [np.nan, 2, np.nan, 4], 'B': [5, 6, np.nan, np.nan]}
df = pd.DataFrame(data)

# Display the input DataFrame
print("Input DataFrame:\n", df)

# Calculate cumulative sum by ignoring NaN
print('Cumulative sum by ignoring NaN:\n', df.cumsum())

Output:
Input DataFrame:
     A    B
0  NaN  5.0
1  2.0  6.0
2  NaN  NaN
3  4.0  NaN
Cumulative sum by ignoring NaN:
     A     B
0  NaN   5.0
1  2.0  11.0
2  NaN   NaN
3  6.0   NaN
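
The cumprod() method follows the same rule. As a brief sketch on the same data (the name running_product is illustrative), the running product skips each NaN while accumulating and keeps NaN at that position in the result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 2, np.nan, 4], 'B': [5, 6, np.nan, np.nan]})

# Running product per column; NaN positions stay NaN,
# and the accumulation continues past them
running_product = df.cumprod()
print(running_product)
```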

Example: Including NaN in Cumulative Sum

This example computes the cumulative sum with df.cumsum(skipna=False), so missing values are included in the calculation and every position from the first NaN onward becomes NaN.

Code:
import pandas as pd
import numpy as np

# Create a sample DataFrame
data = {'A': [np.nan, 2, np.nan, 4], 'B': [5, 6, np.nan, np.nan]}
df = pd.DataFrame(data)

# Display the input DataFrame
print("Input DataFrame:\n", df)

# Calculate the cumulative sum without skipping NaN (skipna=False)
print('Cumulative sum by including NaN:\n', df.cumsum(skipna=False))

Output:
Input DataFrame:
     A    B
0  NaN  5.0
1  2.0  6.0
2  NaN  NaN
3  4.0  NaN
Cumulative sum by including NaN:
     A     B
0  NaN   5.0
1  NaN  11.0
2  NaN   NaN
3  NaN   NaN
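
If you need a running total with no gaps at all, one possible approach is to replace the missing values before accumulating. The sketch below uses fillna(0), which is not covered elsewhere in this tutorial and assumes that a missing value can reasonably be counted as zero in your data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 2, np.nan, 4], 'B': [5, 6, np.nan, np.nan]})

# Assumption: a missing value contributes nothing to the total,
# so it is replaced by 0 before the cumulative sum is computed
gapless_total = df.fillna(0).cumsum()
print(gapless_total)
```

Unlike the default cumsum(), the result no longer shows where values were missing, so keep the original DataFrame if that information matters.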

Conclusion

When working with missing data in Pandas, it’s important to understand how NaN values influence different types of calculations. Arithmetic operations, descriptive statistics, and cumulative functions behave differently in the presence of NaN values, but Pandas provides powerful tools like skipna to help you control this behavior.

As demonstrated above, arithmetic operations propagate NaN, summing and product functions skip them by default, and cumulative functions can either skip or include them depending on parameters. Mastering these techniques is crucial for accurate data analysis and reporting.

Brought to you by Vista Academy

Vista Academy is dedicated to providing high-quality tutorials in Data Analytics and Data Science. With our real-world focused curriculum, you can gain the skills needed to become a data professional.
