Insert, Update, and Delete Rows in Python Pandas

A Guide to Insert, Update, and Delete Rows in Python Pandas Data Analytics

In the ever-evolving realm of data science, the ability to manipulate and mold data is a fundamental skill. Whether you’re an aspiring data scientist or a seasoned analyst, proficiency in tools like Pandas can make all the difference. Among its many capabilities, Pandas excels at handling tabular data, providing a robust foundation for data manipulation in Python.

This blog aims to delve into the intricacies of efficient data manipulation in Pandas, with a specific focus on inserting, updating, and deleting rows. These operations are the building blocks of data transformation, and mastering them opens the door to a world of possibilities in data analysis and exploration.

So, fasten your seatbelts as we embark on a journey to unravel the secrets of Pandas, exploring techniques that will empower you to wield this powerful tool with finesse. Whether you’re looking to add new insights to your data, update existing records, or clean up unnecessary information, this guide has you covered. Let’s dive into the heart of Pandas and unlock the potential of data manipulation in Python.

 

import and read in Pandas

import pandas as pd
df=pd.read_excel('E:\JOINS\CUSTOMERDATA.xlsx')
print(df)
  • import pandas as pd
    This line imports the Pandas library and aliases it as pd for brevity. The alias is a common convention in the data science community.
  • Reading Excel File:
    df = pd.read_excel(‘E:\JOINS\CUSTOMERDATA.xlsx’)
    Here, you use the pd.read_excel() function to read the contents of an Excel file into a Pandas DataFrame (df). The file path ‘E:\JOINS\CUSTOMERDATA.xlsx’ points to the location of your Excel file.
  • If your Excel file has multiple sheets, you can specify the sheet name using the sheet_name parameter (e.g., pd.read_excel(‘file.xlsx’, sheet_name=’Sheet1′)).
  • Displaying the DataFrame:
  • print(df)
    This line prints the entire DataFrame to the console. This is useful for quickly inspecting the structure and contents of your data.

Note: If your DataFrame is too large, printing it in its entirety may not be practical. In such cases, you can use df.head() to display the first few rows or specify the number of rows to display.

In summary, your code reads an Excel file into a Pandas DataFrame and then prints the DataFrame to the console, allowing you to inspect the data. If you have any specific questions about the DataFrame or if you’d like to perform certain operations on it, feel free to let me know!

Basic Operator in Pandas

# Displaying the current DataFrame
print(df)

# Selecting Columns
names = df['customer_name']
print("\nSelected 'customer_name' column:\n", names)

# Filtering Data
young_customers = df[df['customer_id'] < 104]
print("\nCustomers with 'customer_id' less than 104:\n", young_customers)

# Sorting Data
sorted_df = df.sort_values(by='customer_name', ascending=False)
print("\nDataFrame sorted by 'customer_name' in descending order:\n", sorted_df)

1.Displaying the DataFrame:

print(df) This prints the entire DataFrame, showing all the rows and columns.

2.Selecting Columns:

names = df[‘customer_name’] print(“\nSelected ‘customer_name’ column:\n”, names) It extracts the ‘customer_name’ column from the DataFrame and prints it. names is now a Series containing only the ‘customer_name’ column.

3.Filtering Data:

young_customers = df[df[‘customer_id’] < 104] print(“\nCustomers with ‘customer_id’ less than 104:\n”, young_customers) It filters the DataFrame to include only the rows where the ‘customer_id’ is less than 104.

4. Sorting Data:

sorted_df = df.sort_values(by=’customer_name’, ascending=False) print(“\nDataFrame sorted by ‘customer_name’ in descending order:\n”, sorted_df) It sorts the DataFrame based on the ‘customer_name’ column in descending order and then prints the sorted DataFrame. Seems like you’re manipulating and exploring your data. Anything specific you’re trying to achieve with this DataFrame? OUTPUT
 customer_id customer_name       customer_email
0          101         Alice    alice@example.com
1          102           Bob      bob@example.com
2          103       Charlie  charlie@example.com
3          104         David    david@example.com
4          105           Eve      eve@example.com
5          106        Friday        Fri@fmail.com

Selected 'customer_name' column:
 0      Alice
1        Bob
2    Charlie
3      David
4        Eve
5     Friday
Name: customer_name, dtype: object

Customers with 'customer_id' less than 104:
    customer_id customer_name       customer_email
0          101         Alice    alice@example.com
1          102           Bob      bob@example.com
2          103       Charlie  charlie@example.com

DataFrame sorted by 'customer_name' in descending order:
    customer_id customer_name       customer_email
5          106        Friday        Fri@fmail.com
4          105           Eve      eve@example.com
3          104         David    david@example.com
2          103       Charlie  charlie@example.com
1          102           Bob      bob@example.com
0          101         Alice    alice@example.com

Adding a new row using loc with label-based indexing

new_data_loc = {'customer_id': 110.0, 'customer_name': 'Grace', 'customer_email': 'grace@example.com'}
df.loc[len(df)] = new_data_loc

new_data_loc

is a dictionary that contains information about a new customer. It has keys such as ‘customer_id’, ‘customer_name’, and ‘customer_email’, along with their respective values.

df.loc[len(df)]

is selecting a specific location within the DataFrame df. len(df) returns the length of the DataFrame, which corresponds to the index where the new row will be added.

df.loc[len(df)] =

new_data_loc assigns the values in the new_data_loc dictionary to the row in the DataFrame at the index position obtained by len(df). This effectively appends a new row to the DataFrame df with the information provided in the new_data_loc dictionary.

So, the code snippet is taking the dictionary new_data_loc and using it to add a new row to the DataFrame df with the information of a customer named Grace.

Adding a new row using iloc with integer-based indexing

new_data_iloc = {'customer_id': 111.0, 'customer_name': 'Harry', 'customer_email': 'harry@example.com'}
df.loc[df.index.max() + 1] = new_data_iloc

# Displaying the updated DataFrame
print(df)

This code is similar to the previous one but uses df.index.max() + 1 to find the next available index for the new row. Here’s a breakdown:

new_data_iloc

 

is a dictionary containing details about a new customer with keys ‘customer_id’, ‘customer_name’, and ‘customer_email’, each with their respective values.

df.loc[df.index.max() + 1]

locates the row in the DataFrame df at the next available index. df.index.max() returns the maximum index value in the DataFrame, and adding 1 ensures that the new row is added to the next index.

df.loc[df.index.max() + 1]

= new_data_iloc assigns the values from the new_data_iloc dictionary to the row in the DataFrame df at the index position determined by df.index.max() + 1. This effectively appends a new row to the DataFrame df with the information provided in the new_data_iloc dictionary.

The print(df) statement displays the updated DataFrame with the new row added, showing the customer named Harry with the given details.

This code is adding a new row to the DataFrame df with the information of a customer named Harry. The printed output will show the updated DataFrame with this new addition.

Delete row in data frame

df=df[df['customer_id']!=106]
print(df)

This line of code performs a filtering operation on the DataFrame df using a condition to remove rows where the ‘customer_id’ column is equal to 106. Let’s break it down:

df[‘customer_id’] != 106

creates a boolean Series where each row is evaluated to True or False based on whether the ‘customer_id’ for that row is not equal to 106.

df[df[‘customer_id’] != 106]

filters the DataFrame df by selecting only those rows where the condition df[‘customer_id’] != 106 is True. This effectively removes the rows where the ‘customer_id’ is equal to 106 from the DataFrame.

The updated DataFrame is reassigned to the variable df. Now, df contains all the rows where the ‘customer_id’ is not equal to 106.

The result displayed by print(df) will show the DataFrame after removing all rows where the ‘customer_id’ is equal to 106. It filters out those specific rows, leaving only the rows where the ‘customer_id’ is different from 106.

Update row in data frame

df.loc[df['customer_id']==104,'customer_name']='Ramesh'
print(df)

This line of code modifies the ‘customer_name’ in the DataFrame df for the rows where the ‘customer_id’

is equal to 104, setting their ‘customer_name’ to ‘Ramesh’. Let’s break it down:

df[‘customer_id’] == 104

creates a boolean Series that checks each row in the DataFrame to identify where the ‘customer_id’ column is equal to 104.

df.loc[df[‘customer_id’] == 104, ‘customer_name’] = ‘Ramesh’ uses the .loc function to locate rows meeting the condition df[‘customer_id’] == 104. It specifically targets the ‘customer_name’ column and sets the value to ‘Ramesh’ for those rows.

So, this code line updates the ‘customer_name’ to ‘Ramesh’ for all rows in the DataFrame df where the ‘customer_id’ is equal to 104. This operation effectively changes the ‘customer_name’ selectively for rows that meet the specified condition. The printed output via print(df) will display the DataFrame after this modification.

Inserting a New Column

ages = [25, 30, 28, 33, 27,22,21]

# Inserting 'age' column into the DataFrame
df['age'] = ages

To insert a new column ‘age’ in the DataFrame:


 Assuming ‘age’ data for each customer
ages = [25, 30, 28, 33, 27]

 Inserting ‘age’ column into the DataFrame


df[‘age’] = ages

  1. The code uses this line to add the ‘age’ column to the DataFrame df and populates it with the values from the ages list.

    • This operation aligns the ages in the list with the rows of the DataFrame based on their indices. The length of the list should match the number of rows in the DataFrame.

    • Each age value in the list corresponds to a row in the DataFrame, with the first age (25) being assigned to the first row, the second age (30) to the second row, and so on.

By executing this code, the ‘age’ column will be added to your DataFrame, providing age information aligned with the existing rows of your data.

Deleting a Column:

Deleting a Column:
Scroll to Top
Data Analytics course in Dehradun Uttarakhand