Table of Contents
TogglePivoting in Python Pandas is a powerful data transformation technique that reshapes data for easier analysis and visualization. It changes the data representation from a “long” format to a “wide” format, making it simpler to perform aggregations and comparisons.
This technique is particularly useful when dealing with time series data or datasets with multiple columns. Pandas provides two primary methods for pivoting −
In this tutorial, we will learn about the pivoting in Pandas using these methods with examples to demonstrate their applications in data manipulation.
The Pandas df.pivot() method is used to reshape data when there are unique values for the specified index and column pairs. It is straightforward and useful when your data is well-structured without duplicate entries for the index/column combination.
Here is a basic example demonstrating pivoting a Pandas DataFrame with the Pandas df.pivot() method.
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({"Col1": range(12),
"Col2": ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D"],
"date": pd.to_datetime(["2024-01-03", "2024-01-04", "2024-01-05"] * 4)})
# Display the Input DataFrame
print('Original DataFrame:\n', df)
# Pivot the DataFrame
pivoted = df.pivot(index="date", columns="Col2", values="Col1")
# Display the output
print('Pivoted DataFrame:\n', pivoted)
Following is the output of the above code −
Original DataFrame:
Col1 Col2 date
0 0 A 2024-01-03
1 1 A 2024-01-04
2 2 A 2024-01-05
3 3 B 2024-01-03
4 4 B 2024-01-04
5 5 B 2024-01-05
6 6 C 2024-01-03
7 7 C 2024-01-04
8 8 C 2024-01-05
9 9 D 2024-01-03
10 10 D 2024-01-04
11 11 D 2024-01-05
Pivoted DataFrame:
Col2 A B C D
date
2024-01-03 0 3 6 9
2024-01-04 1 4 7 10
2024-01-05 2 5 8 11
Note: The pivot() method requires that the index and columns specified have unique values. If your data contains duplicates, you should use the pivot_table() method instead.
The pivot() method is a straightforward way to reshape data, while pivot_table() offers flexibility for aggregation, making it suitable for more complex data manipulation tasks. This is particularly useful for summarizing data when dealing with duplicates and requires aggregation of data.
This example demonstrates pivoting a DataFrame using the df.pivot_table() method.
import numpy as np
import pandas as pd
import datetime
# Create a DataFrame
df = pd.DataFrame({
"A": [1, 1, 2, 3] * 6,
"B": ["A", "B", "C"] * 8,
"C": ["x", "x", "x", "y", "y", "y"] * 4,
"D": np.random.randn(24),
"E": np.random.randn(24),
"F": [datetime.datetime(2013, i, 1) for i in range(1, 13)] +
[datetime.datetime(2013, i, 15) for i in range(1, 13)]
})
# Display the Input DataFrame
print('Original DataFrame:\n', df)
# Pivot the DataFrame
pivot_table = pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"])
# Display the output
print('Pivoted DataFrame:\n', pivot_table)
Following is the output of the above code −
Original DataFrame:
A B C D E F
0 1 A x -0.530395 2.819976 2013-01-01
1 1 B x -0.593347 0.639877 2013-02-01
2 2 C x 0.817982 -0.270499 2013-03-01
... (output truncated for brevity)
Pivoted DataFrame:
C x y
A B
1 A -0.182276 -0.605178
B -0.816068 0.127548
C 0.070736 0.786979
2 A 1.559430 NaN
B NaN -0.490681
C 0.754310 NaN
3 A NaN -1.015125
B -1.008318 NaN
C NaN -0.436260
The Pandas pivot_table() method can be used to specify an aggregation function. By default, it calculates the mean, but you can also use functions like sum, count, or even custom functions for applying aggregation to the pivoting.
This example demonstrates how to apply aggregation function with pivoting a DataFrame using the df.pivot_table() method.
import numpy as np
import datetime
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
"A": [1, 1, 2, 3] * 6,
"B": ["A", "B", "C"] * 8,
"C": ["x", "x", "x", "y", "y", "y"] * 4,
"D": np.random.randn(24),
"E": np.random.randn(24),
"F": [datetime.datetime(2013, i, 1) for i in range(1, 13)] +
[datetime.datetime(2013, i, 15) for i in range(1, 13)]
})
# Display the Input DataFrame
print('Original DataFrame:\n', df)
# Pivot the DataFrame with an aggregate function
pivot_table = pd.pivot_table(df, values=["D", "E"], index=["B"], columns=["A", "C"], aggfunc="sum")
# Display the output
print('Pivoted DataFrame:\n', pivot_table)
Following is the output of the above code −
Original DataFrame:
A B C D E F
0 1 A x 0.921728 0.807799 2013-01-01
1 1 B x 0.565152 -0.369947 2013-02-01
2 2 C x 1.260114 0.352844 2013-03-01
... (output truncated for brevity)
Pivoted DataFrame:
D E
A 1 2 3 1 2 3
C x y x y x y x y x y x y
B
A 1.125233 -0.383475 1.585623 NaN NaN NaN 2.531849 NaN NaN NaN NaN NaN
B 1.167782 -0.067283 NaN -2.263622 1.197377 NaN
C 1.599579 0.232435 1.122273 NaN NaN 0.411078
