Python Pandas – Merging/Joining

Pandas Merge and Join Operations

Pandas provides high-performance, in-memory join operations similar to those in SQL databases. These operations allow you to merge multiple DataFrame objects based on common keys or indexes efficiently.

The merge() Method in Pandas

The DataFrame.merge() method in Pandas enables merging of DataFrame or named Series objects using database-style joins. A named Series is treated as a DataFrame with a single named column. Joins can be performed on columns or indexes.

If merging column on column, the DataFrame indexes are ignored. If merging index on index, or index on one or more columns, the index is passed on to the result. In a cross merge (how='cross'), no join columns or index flags may be specified.
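The three index rules above can be checked with a small sketch. The DataFrames `a` and `b` and their labels are hypothetical, made up only for illustration; the merge calls use the standard `left_index`, `right_index`, and `how='cross'` options.

```python
import pandas as pd

# Hypothetical frames with non-default index labels
a = pd.DataFrame({'key': ['k1', 'k2'], 'a_val': [1, 2]}, index=['x', 'y'])
b = pd.DataFrame({'key': ['k1', 'k2'], 'b_val': [3, 4]}, index=['p', 'q'])

# Column-on-column merge: the original index labels are discarded
on_cols = a.merge(b, on='key')
print(on_cols.index.tolist())   # [0, 1]

# Index-on-index merge: rows are matched by label and the index is kept
b2 = b.set_axis(['x', 'y'])     # give b the same labels as a
on_index = a.merge(b2, left_index=True, right_index=True)
print(on_index.index.tolist())  # ['x', 'y']

# Cross merge: naming a join column is rejected
raised = False
try:
    a.merge(b, how='cross', on='key')
except pd.errors.MergeError as exc:
    raised = True
    print('MergeError:', exc)
```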

Below is the syntax of this method −

DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'))

The key parameters are −

right: A DataFrame or a named Series to merge with.
on: Columns (names) to join on. Must be found in both the DataFrame objects.
left_on: Columns from the left DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.
right_on: Columns from the right DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.
left_index: If True, use the index (row labels) from the left DataFrame as its join key(s).
right_index: Same usage as left_index for the right DataFrame.
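how: The type of join to perform: 'left', 'right', 'outer', 'inner', or 'cross'. Defaults to 'inner'.
sort: If True, sort the join keys lexicographically in the result DataFrame. Defaults to False, in which case the row order depends on the join type.
suffixes: A pair of strings appended to overlapping column names that are not join keys. Defaults to ('_x', '_y').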
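When the key columns have different names on each side, left_on and right_on name them separately. The sketch below assumes two hypothetical tables, `students` and `subjects`, invented for illustration, and also passes sort=True so the rows come back ordered by the join key.

```python
import pandas as pd

# Hypothetical tables whose key columns have different names
students = pd.DataFrame({'student': ['Alex', 'Amy', 'Allen'],
                         'subj': ['sub2', 'sub1', 'sub3']})
subjects = pd.DataFrame({'subject_id': ['sub1', 'sub2', 'sub3'],
                         'title': ['Math', 'Physics', 'History']})

# left_on/right_on pick the key column on each side;
# sort=True orders the result lexicographically by the join key
result = students.merge(subjects, left_on='subj', right_on='subject_id', sort=True)
print(result)
```

Both key columns (subj and subject_id) are kept in the result, because pandas cannot assume they should be collapsed into one.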

Example

Let’s create two DataFrames and perform merge operations on them.

import pandas as pd

# Creating the first DataFrame
left = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']
})

# Creating the second DataFrame
right = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']
})

print("Left DataFrame:")
print(left)
print("\nRight DataFrame:")
print(right)

Left DataFrame:
   id    Name subject_id
0   1    Alex       sub1
1   2     Amy       sub2
2   3   Allen       sub4
3   4   Alice       sub6
4   5  Ayoung       sub5

Right DataFrame:
   id   Name subject_id
0   1  Billy       sub2
1   2  Brian       sub4
2   3   Bran       sub3
3   4  Bryce       sub6
4   5  Betty       sub5

Merge Two DataFrames on a Key

result = left.merge(right, on='id')
print(result)

   id Name_x subject_id_x Name_y subject_id_y
0   1   Alex         sub1  Billy         sub2
1   2    Amy         sub2  Brian         sub4
2   3  Allen         sub4   Bran         sub3
3   4  Alice         sub6  Bryce         sub6
4   5 Ayoung         sub5  Betty         sub5
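The _x and _y endings above come from the default suffixes argument, which pandas applies to overlapping columns that are not join keys. A minimal sketch, using a trimmed-down copy of the two DataFrames, shows how to choose more descriptive suffixes.

```python
import pandas as pd

# Trimmed-down versions of the left and right DataFrames
left = pd.DataFrame({'id': [1, 2], 'Name': ['Alex', 'Amy']})
right = pd.DataFrame({'id': [1, 2], 'Name': ['Billy', 'Brian']})

# suffixes replaces the default ('_x', '_y') for the overlapping Name column
result = left.merge(right, on='id', suffixes=('_left', '_right'))
print(result.columns.tolist())  # ['id', 'Name_left', 'Name_right']
```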

Merge Two DataFrames on Multiple Keys

result = left.merge(right, on=['id', 'subject_id'])
print(result)

   id   Name_x subject_id Name_y
0   4   Alice       sub6  Bryce
1   5  Ayoung       sub5  Betty

Merge Using ‘how’ Argument

The how argument determines which keys are kept in the resulting DataFrame. When a key appears in only one of the two DataFrames and the join type keeps it, the columns coming from the other DataFrame are filled with NaN for that row.
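To see where each key came from, merge() accepts indicator=True, which adds a _merge column with the values 'left_only', 'right_only', or 'both'. The small DataFrames below are illustrative stand-ins for left and right.

```python
import pandas as pd

# Small illustrative stand-ins for the left and right DataFrames
left = pd.DataFrame({'subject_id': ['sub1', 'sub2'], 'Name': ['Alex', 'Amy']})
right = pd.DataFrame({'subject_id': ['sub2', 'sub3'], 'Name': ['Billy', 'Bran']})

# indicator=True records which side(s) each key was found in
result = left.merge(right, on='subject_id', how='outer', indicator=True)
print(result)
```

The sub1 row is marked left_only and has NaN in Name_y, while the sub3 row is marked right_only and has NaN in Name_x.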

Merge Methods and Their SQL Equivalents

Merge Method	SQL Equivalent	Description
left	LEFT OUTER JOIN	Use keys from left object
right	RIGHT OUTER JOIN	Use keys from right object
outer	FULL OUTER JOIN	Union of keys from both DataFrames
inner	INNER JOIN	Intersection of keys from both DataFrames
cross	CROSS JOIN	Cartesian product of the rows of both DataFrames
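Among the how options listed earlier, 'cross' is the one that takes no join keys at all: every row of the left DataFrame is paired with every row of the right one. The `sizes` and `colors` tables below are hypothetical examples.

```python
import pandas as pd

# Hypothetical tables with no common key
sizes = pd.DataFrame({'size': ['S', 'L']})
colors = pd.DataFrame({'color': ['red', 'blue', 'green']})

# how='cross' builds the Cartesian product: 2 rows x 3 rows = 6 rows
result = sizes.merge(colors, how='cross')
print(result)
print(len(result))  # 6
```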

Example: Left Join

print(left.merge(right, on='subject_id', how='left'))

   id_x Name_x subject_id  id_y Name_y
0     1   Alex       sub1   NaN    NaN
1     2    Amy       sub2   1.0  Billy
2     3  Allen       sub4   2.0  Brian
3     4  Alice       sub6   4.0  Bryce
4     5 Ayoung       sub5   5.0  Betty

Example: Right Join

print(left.merge(right, on='subject_id', how='right'))

   id_x Name_x subject_id  id_y Name_y
0   2.0    Amy       sub2     1  Billy
1   3.0  Allen       sub4     2  Brian
2   NaN    NaN       sub3     3   Bran
3   4.0  Alice       sub6     4  Bryce
4   5.0 Ayoung       sub5     5  Betty

Example: Outer Join

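print(left.merge(right, how='outer', on='subject_id'))

   id_x Name_x subject_id  id_y Name_y
0   1.0   Alex       sub1   NaN    NaN
1   2.0    Amy       sub2   1.0  Billy
2   NaN    NaN       sub3   3.0   Bran
3   3.0  Allen       sub4   2.0  Brian
4   5.0 Ayoung       sub5   5.0  Betty
5   4.0  Alice       sub6   4.0  Bryce

Since pandas 2.2, an outer join sorts the join keys lexicographically, so the right-only key sub3 appears between sub2 and sub4. Earlier versions kept the left keys in their original order and appended sub3 at the end.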

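Example: Inner Join

An inner join keeps only the keys present in both DataFrames, so sub1 (only in left) and sub3 (only in right) are dropped, and the rows follow the order of the left keys. The object on which the method is called still matters: column order and the _x/_y suffixes depend on which DataFrame is the caller, just as a.join(b) is not equal to b.join(a).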

print(left.merge(right, on='subject_id', how='inner'))

   id_x Name_x subject_id  id_y Name_y
0     2    Amy       sub2     1  Billy
1     3  Allen       sub4     2  Brian
2     4  Alice       sub6     4  Bryce
3     5 Ayoung       sub5     5  Betty
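A short sketch makes the point about the calling object concrete. It uses small illustrative stand-ins for left and right and swaps the caller for both a left join and an inner join.

```python
import pandas as pd

# Small illustrative stand-ins for the left and right DataFrames
left = pd.DataFrame({'subject_id': ['sub1', 'sub2'], 'Name': ['Alex', 'Amy']})
right = pd.DataFrame({'subject_id': ['sub2', 'sub3'], 'Name': ['Billy', 'Bran']})

# how='left' with different callers: the caller's keys are the ones kept
a_first = left.merge(right, on='subject_id', how='left')
b_first = right.merge(left, on='subject_id', how='left')
print(a_first['subject_id'].tolist())  # ['sub1', 'sub2']
print(b_first['subject_id'].tolist())  # ['sub2', 'sub3']

# how='inner': the same key survives, but the _x/_y columns swap sides
inner_ab = left.merge(right, on='subject_id', how='inner')
inner_ba = right.merge(left, on='subject_id', how='inner')
print(inner_ab['Name_x'].tolist(), inner_ba['Name_x'].tolist())  # ['Amy'] ['Billy']
```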

The join() Method in Pandas

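Pandas also provides a DataFrame.join() method, a convenience method for index-based joins. By default it aligns the two DataFrames on their row labels and performs a left join; with on=, it matches a column of the calling DataFrame against the index of the other. It can also join several DataFrames at once when they are passed as a list.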

Below is the syntax of this method −

DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='')

Example

result = left.join(right, lsuffix='_left', rsuffix='_right')
print(result)

   id_left Name_left subject_id_left  id_right Name_right subject_id_right
0        1      Alex            sub1         1      Billy             sub2
1        2       Amy            sub2         2      Brian             sub4
2        3     Allen            sub4         3       Bran             sub3
3        4     Alice            sub6         4      Bryce             sub6
4        5    Ayoung            sub5         5      Betty             sub5
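In the example above, lsuffix and rsuffix are needed because both DataFrames have the same column names; without them, join() raises an error about overlapping columns. The sketch below shows the other common form, where on= matches a column of the caller against the index of the other DataFrame. The `subjects` table and its titles are hypothetical.

```python
import pandas as pd

left = pd.DataFrame({'Name': ['Alex', 'Amy', 'Allen'],
                     'subject_id': ['sub1', 'sub2', 'sub4']})

# Hypothetical lookup table indexed by the join key
subjects = pd.DataFrame({'title': ['Physics', 'Chemistry']},
                        index=pd.Index(['sub2', 'sub4'], name='subject_id'))

# on='subject_id' matches left's column against subjects' index;
# how defaults to 'left', so sub1 (no match) gets NaN in title
result = left.join(subjects, on='subject_id')
print(result)
```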
