Concatenation in Pandas refers to the process of joining two or more Pandas objects (like DataFrames or Series) along a specified axis. This operation is very useful when you need to merge data from different sources or datasets.
The primary tool for this operation is the pd.concat() function, which works on both Series and DataFrame objects and can combine them either row-wise or column-wise.
In this tutorial, we’ll explore how to concatenate Pandas objects using the pd.concat() function, covering several scenarios: concatenating along rows, using keys to distinguish the concatenated DataFrames, ignoring indexes during concatenation, and concatenating along columns.
The pandas.concat() function is the primary method used for concatenation in Pandas. It allows you to concatenate pandas objects along a particular axis with various options for handling indexes.
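As a quick illustration of what "along a particular axis" means, the following minimal sketch concatenates two small Series first end to end and then side by side. The Series values and labels are hypothetical and chosen only for this illustration.

```python
import pandas as pd

# Two small Series with made-up values, used only for illustration
s1 = pd.Series([10, 20], index=['a', 'b'])
s2 = pd.Series([30, 40], index=['c', 'd'])

# The default axis=0 stacks the Series end to end
stacked = pd.concat([s1, s2])
print(stacked)

# axis=1 places them side by side as columns of a DataFrame;
# labels missing from one Series are filled with NaN
side_by_side = pd.concat([s1, s2], axis=1)
print(side_by_side)
```

Because the two Series share no index labels, the side-by-side result has four rows, each with one missing value.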
The syntax of the pd.concat() function is as follows −
pandas.concat(objs, *, axis=0, join='outer', ignore_index=False, keys=None,
levels=None, names=None, verify_integrity=False, sort=False, copy=None)
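Where,
objs − A sequence or mapping of Series or DataFrame objects to concatenate.
axis − The axis to concatenate along: 0 (rows, the default) or 1 (columns).
join − How to handle the indexes on the other axis: 'outer' (the default) takes their union, 'inner' takes their intersection.
ignore_index − If True, the labels along the concatenation axis are discarded and the result is labelled 0 to n-1.
keys − A sequence of labels, one per input object, used as the outermost level of a hierarchical index.
levels − Specific levels (unique values) to use when constructing the hierarchical index; otherwise they are inferred from keys.
names − Names for the levels of the resulting hierarchical index.
verify_integrity − If True, raises an error when the concatenated axis contains duplicate labels.
sort − If True, sorts the non-concatenation axis when it is not already aligned.
copy − Whether to copy the data rather than reuse it where possible.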
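To make the join option in the signature above more concrete, here is a minimal sketch that contrasts 'outer' and 'inner' when the inputs do not share all of their columns. The frames left and right and the Grade column are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical frames that share only the 'Name' column
left = pd.DataFrame({'Name': ['Alex', 'Amy'], 'Marks_scored': [98, 90]})
right = pd.DataFrame({'Name': ['Billy', 'Brian'], 'Grade': ['A', 'B']})

# join='outer' (the default) keeps the union of the columns,
# filling the gaps with NaN
outer = pd.concat([left, right], join='outer', ignore_index=True)
print(outer)

# join='inner' keeps only the columns present in every input
inner = pd.concat([left, right], join='inner', ignore_index=True)
print(inner)
```

With 'outer' the result carries all three columns, while 'inner' keeps only Name, the one column both inputs have.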
The concat() function does the heavy lifting of concatenating objects along an axis. Let us create a few objects and concatenate them.
In this example, the two DataFrames are concatenated along rows, with the resulting DataFrame having duplicated indices.
import pandas as pd
# Creating two DataFrames
one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])
two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])
# Concatenating the DataFrames along rows (axis=0 is the default)
result = pd.concat([one, two])
print(result)
The output is as follows −
Name subject_id Marks_scored
1 Alex sub1 98
2 Amy sub2 90
3 Allen sub4 87
4 Alice sub6 69
5 Ayoung sub5 78
1 Billy sub2 89
2 Brian sub4 80
3 Bran sub3 79
4 Bryce sub6 97
5 Betty sub5 88
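Duplicated index labels like these can be surprising, because a label lookup then returns several rows. The sketch below, using shortened versions of the DataFrames, shows one way to detect the duplicates and how verify_integrity=True refuses to produce them. The variable overlap_rejected is a hypothetical name used only for this illustration.

```python
import pandas as pd

one = pd.DataFrame({'Name': ['Alex', 'Amy']}, index=[1, 2])
two = pd.DataFrame({'Name': ['Billy', 'Brian']}, index=[1, 2])

result = pd.concat([one, two])

# Labels 1 and 2 each appear twice, so the index is not unique
print(result.index.is_unique)

# A lookup by label now returns two rows instead of one
print(result.loc[1])

# verify_integrity=True raises a ValueError instead of
# silently producing duplicate labels
try:
    pd.concat([one, two], verify_integrity=True)
    overlap_rejected = False
except ValueError as err:
    overlap_rejected = True
    print('Concatenation rejected:', err)
```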
If you want to distinguish between the concatenated DataFrames, you can use the keys parameter to label each input. The keys become the outermost level of a hierarchical index on the result.
import pandas as pd
one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])
two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])
# keys labels the rows of each input DataFrame in the result's index
print(pd.concat([one, two], keys=['x', 'y']))
The output is as follows −
Name subject_id Marks_scored
x 1 Alex sub1 98
2 Amy sub2 90
3 Allen sub4 87
4 Alice sub6 69
5 Ayoung sub5 78
y 1 Billy sub2 89
2 Brian sub4 80
3 Bran sub3 79
4 Bryce sub6 97
5 Betty sub5 88
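Once keys are attached, each original DataFrame can be recovered from the result through the outer index level. The following sketch uses shortened versions of the DataFrames; the level names 'source' and 'row' passed to names are hypothetical labels chosen for this illustration.

```python
import pandas as pd

one = pd.DataFrame({'Name': ['Alex', 'Amy'], 'Marks_scored': [98, 90]},
                   index=[1, 2])
two = pd.DataFrame({'Name': ['Billy', 'Brian'], 'Marks_scored': [89, 80]},
                   index=[1, 2])

# names labels the two index levels (illustrative names)
combined = pd.concat([one, two], keys=['x', 'y'], names=['source', 'row'])

# Selecting on the outer level returns the rows that came from 'two'
print(combined.loc['y'])

# A (key, label) tuple addresses a single row
print(combined.loc[('x', 2), 'Name'])
```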
If the result should not retain the original index labels, set ignore_index to True. The rows are then renumbered from 0 to n-1.
import pandas as pd
one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])
two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])
# ignore_index=True discards the existing labels, including the keys
print(pd.concat([one, two], keys=['x', 'y'], ignore_index=True))
The output is as follows −
Name subject_id Marks_scored
0 Alex sub1 98
1 Amy sub2 90
2 Allen sub4 87
3 Alice sub6 69
4 Ayoung sub5 78
5 Billy sub2 89
6 Brian sub4 80
7 Bran sub3 79
8 Bryce sub6 97
9 Betty sub5 88
Observe that the original index is replaced entirely by the labels 0 to 9, and the keys are discarded as well because ignore_index=True takes effect over keys.
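The difference can also be checked programmatically by inspecting the index of each result. This is a minimal sketch with shortened DataFrames; with_keys and reset are hypothetical variable names.

```python
import pandas as pd

one = pd.DataFrame({'Name': ['Alex', 'Amy']}, index=[1, 2])
two = pd.DataFrame({'Name': ['Billy', 'Brian']}, index=[1, 2])

# With keys only, the result carries a two-level hierarchical index
with_keys = pd.concat([one, two], keys=['x', 'y'])
print(with_keys.index.tolist())

# Adding ignore_index=True drops both the keys and the original labels
reset = pd.concat([one, two], keys=['x', 'y'], ignore_index=True)
print(reset.index.tolist())
```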
Instead of concatenating along rows, you can concatenate along columns by setting the axis parameter to 1.
import pandas as pd
one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])
two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])
# axis=1 places the DataFrames side by side, matching rows by index label
print(pd.concat([one, two], axis=1))
The output is as follows −
Name subject_id Marks_scored Name subject_id Marks_scored
1 Alex sub1 98 Billy sub2 89
2 Amy sub2 90 Brian sub4 80
3 Allen sub4 87 Bran sub3 79
4 Alice sub6 69 Bryce sub6 97
5 Ayoung sub5 78 Betty sub5 88
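In the output above the two DataFrames happen to share the same index labels, and the column names are repeated. The sketch below, under the assumption of only partially overlapping indexes, shows how rows are matched by label and how keys can disambiguate the duplicated column names. The shortened DataFrames and the variable names wide, common, and labelled are hypothetical.

```python
import pandas as pd

one = pd.DataFrame({'Name': ['Alex', 'Amy', 'Allen']}, index=[1, 2, 3])
two = pd.DataFrame({'Name': ['Billy', 'Brian', 'Bran']}, index=[2, 3, 4])

# Rows are matched by index label; a label missing on one side gives NaN
wide = pd.concat([one, two], axis=1)
print(wide)

# join='inner' keeps only the labels present in both inputs
common = pd.concat([one, two], axis=1, join='inner')
print(common)

# keys on axis=1 adds an outer column level, so the two 'Name'
# columns can be told apart
labelled = pd.concat([one, two], axis=1, keys=['x', 'y'])
print(labelled[('y', 'Name')])
```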
