A Pandas DataFrame is a two-dimensional data structure for storing and manipulating tabular data. It consists of rows and columns, which makes it similar to a spreadsheet or an SQL table. Modifying a DataFrame is a crucial step in data preprocessing, data analysis, and data cleaning.
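Some of the most common DataFrame modifications include:
- Renaming column or row labels
- Adding or inserting new columns
- Replacing existing values
- Deleting columns
In this tutorial, we will learn how to modify Pandas DataFrames in different ways.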
Renaming column or row labels improves data readability and helps standardize column names. The rename() method in Pandas allows renaming one or more columns or row labels.
The following example uses the DataFrame.rename() method to rename a column of a DataFrame.
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3],'B': [4, 5, 6]})
# Display original DataFrame
print("Original DataFrame:")
print(df)
# Rename column 'A' to 'aa'
df = df.rename(columns={'A': 'aa'})
# Display modified DataFrame
print("Modified DataFrame:")
print(df)
Original DataFrame:
A B
0 1 4
1 2 5
2 3 6
Modified DataFrame:
aa B
0 1 4
1 2 5
2 3 6
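The rename() method is not limited to dictionaries. The sketch below reuses the same sample DataFrame and passes Python's built-in str.lower as the mapper, so the function is applied to every column label. It also shows that a dictionary only changes the labels it names. The variable names lower and partial are only illustrative.

```python
import pandas as pd

# Same sample DataFrame as above
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# A function passed as the columns mapper is applied to every column label
lower = df.rename(columns=str.lower)
print(list(lower.columns))    # ['a', 'b']

# A dictionary only touches the labels it names; 'A' is left as-is
partial = df.rename(columns={'B': 'bb'})
print(list(partial.columns))  # ['A', 'bb']

# rename() returns a new DataFrame, so df itself still has its original labels
print(list(df.columns))       # ['A', 'B']
```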
Similarly, you can rename row labels using the index parameter of the rename() method.
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3],'B': [4, 5, 6]}, index=['x', 'y', 'z'])
# Display original DataFrame
print("Original DataFrame:")
print(df)
# Rename multiple row labels at once
df = df.rename(index={'x': 'r1', 'y':'r2', 'z':'r3'})
# Display modified DataFrame
print("Modified DataFrame:")
print(df)
Original DataFrame:
A B
x 1 4
y 2 5
z 3 6
Modified DataFrame:
A B
r1 1 4
r2 2 5
r3 3 6
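A dictionary key that does not match any existing label is silently ignored by rename(). The minimal sketch below assumes the same x/y/z index as above and uses a hypothetical missing label 'q' to show how the errors='raise' option turns that silent no-op into a KeyError, which helps catch typos in label names.

```python
import pandas as pd

# Same sample DataFrame with custom row labels
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])

# By default, a key that matches no existing label ('q' here) is ignored
same = df.rename(index={'q': 'r9'})
print(list(same.index))  # ['x', 'y', 'z']

# With errors='raise', the same missing label triggers a KeyError
raised = False
try:
    df.rename(index={'q': 'r9'}, errors='raise')
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```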
Adding a new column to an existing DataFrame is straightforward. The simplest way is to assign values to a new column name, as in df['C'] = [...]. Additionally, you can use the DataFrame.insert() method to insert a new column at a specified position.
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3],'B': [4, 5, 6]})
# Add a new column 'C' with values
df['C'] = [7, 8, 9]
# Display updated DataFrame
print("DataFrame after adding a new column 'C':")
print(df)
DataFrame after adding a new column 'C':
A B C
0 1 4 7
1 2 5 8
2 3 6 9
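The values of a new column do not have to be typed out; they can also be computed from existing columns. The sketch below derives a column with element-wise arithmetic and then uses DataFrame.assign(), which returns a new DataFrame instead of modifying the original. The column names C and D and the arithmetic are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Element-wise sum of two existing columns stored as a new column (modifies df)
df['C'] = df['A'] + df['B']

# assign() adds column 'D' to a copy and leaves df unchanged
df2 = df.assign(D=df['A'] * 10)

print(df)
print(df2)
```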
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3],'B': [4, 5, 6]})
# Insert a new column 'D' at position 1
df.insert(1, 'D', [10, 11, 12])
# Display updated DataFrame
print("DataFrame after inserting column 'D' at position 1:")
print(df)
DataFrame after inserting column 'D' at position 1:
A D B
0 1 10 4
1 2 11 5
2 3 12 6
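Two details of insert() are worth knowing: it modifies the DataFrame in place and returns None, so its result should not be assigned back to df, and it refuses to reuse an existing column name unless allow_duplicates=True is passed. A short sketch, in which the column name ID and its values are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# insert() changes df in place and returns None
result = df.insert(0, 'ID', [101, 102, 103])
print(result)            # None
print(list(df.columns))  # ['ID', 'A', 'B']

# Reusing an existing name raises ValueError (allow_duplicates defaults to False)
failed = False
try:
    df.insert(1, 'A', [0, 0, 0])
except ValueError as exc:
    failed = True
    print("ValueError:", exc)
```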
The contents of a DataFrame can be replaced in several ways. One of the easiest is to assign new values directly to a particular part of the DataFrame, such as a column. To substitute specific values wherever they occur, you can use the DataFrame.replace() method.
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3],'B': [4, 5, 6]})
# Replace the contents of column 'A' with new values
df['A'] = [10, 20, 30]
# Display updated DataFrame
print("DataFrame after replacing column 'A':")
print(df)
DataFrame after replacing column 'A':
A B
0 10 4
1 20 5
2 30 6
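Direct assignment can also target only part of a column. This sketch uses .loc with a boolean mask to overwrite selected rows, and with a row label and a column name to set a single cell. The condition A > 1 and the replacement values are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Boolean mask: True for the rows where column 'A' is greater than 1
mask = df['A'] > 1

# Overwrite column 'B' only in the rows selected by the mask
df.loc[mask, 'B'] = 0

# Set a single cell by row label 0 and column name 'A'
df.loc[0, 'A'] = 99

print(df)
```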
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3],'B': [4, 5, 6]})
# Display the Input DataFrame
print("Original DataFrame:", df, sep='\n')
# Replace 1 in column 'A' and 6 in column 'B' with 100
df.replace({'A': 1, 'B': 6}, 100, inplace=True)
# Display updated DataFrame
print("DataFrame after replacing values:")
print(df)
Original DataFrame:
A B
0 1 4
1 2 5
2 3 6
DataFrame after replacing values:
A B
0 100 4
1 2 5
2 3 100
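When each column needs its own old-to-new mapping, replace() also accepts a nested dictionary of the form {column: {old_value: new_value}}. The sketch below uses that form and omits inplace=True, so replace() returns a new DataFrame and leaves the original untouched. The replacement values are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Nested dict: 1 -> 100 in column 'A', 6 -> 600 in column 'B'
new_df = df.replace({'A': {1: 100}, 'B': {6: 600}})

# Without inplace=True, df keeps its original values
print(df)
print(new_df)
```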
Removing unnecessary columns is essential for data cleaning. You can delete single or multiple columns of a DataFrame using the DataFrame.drop() method.
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3],'B': [4, 5, 6],'C': [7, 8, 9]})
# Display the original DataFrame
print("Original DataFrame:", df, sep='\n')
# Delete columns 'A' and 'B'
df = df.drop(columns=['A', 'B'])
# Display updated DataFrame
print("DataFrame after deleting columns 'A' and 'B':")
print(df)
Original DataFrame:
A B C
0 1 4 7
1 2 5 8
2 3 6 9
DataFrame after deleting columns 'A' and 'B':
C
0 7
1 8
2 9
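Besides drop(), a column can be removed in place with the pop() method, which also returns the removed column as a Series, or with Python's del statement. The sketch below shows both, and also uses drop() with errors='ignore' so that a missing label does not raise a KeyError. The label 'Z' is hypothetical and deliberately absent.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# pop() removes column 'A' in place and returns it as a Series
popped = df.pop('A')

# del removes column 'B' in place without returning it
del df['B']

# errors='ignore' makes drop() skip labels that are not present
df = df.drop(columns=['Z'], errors='ignore')

print(popped.tolist())   # [1, 2, 3]
print(list(df.columns))  # ['C']
```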
