Sorting is a fundamental operation when working with data in Pandas, whether you’re organizing rows, columns, or specific values. Sorting can help you to arrange your data in a meaningful way for better understanding and easy analysis.
Pandas provides powerful tools for sorting your data efficiently, which can be done by labels or actual values. In this tutorial, we’ll explore various methods for sorting data in Pandas, from basic sorting by index or column labels to more advanced techniques like sorting by multiple columns and choosing specific sorting algorithms.
There are two kinds of sorting available in Pandas: sorting by label and sorting by actual value.
To sort by index labels, use the sort_index() method. The axis argument selects whether row or column labels are sorted, and the ascending argument sets the sort order. By default, this method sorts the DataFrame in ascending order based on the row labels.
Let’s take a basic example that demonstrates sorting a DataFrame with the sort_index() method.
import pandas as pd
import numpy as np
unsorted_df = pd.DataFrame(np.random.randn(10, 2), index=[1, 4, 6, 2, 3, 5, 9, 8, 0, 7], columns=['col2', 'col1'])
print("Original DataFrame:\n", unsorted_df)
# Sort the DataFrame by its row labels
sorted_df = unsorted_df.sort_index()
print("\nOutput Sorted DataFrame:\n", sorted_df)
Its output is as follows (the values are random, so your numbers will differ) −
Original DataFrame:
col2 col1
1 1.116188 1.631727
4 0.287900 -1.097359
6 0.058885 -0.642273
2 -2.070172 0.148255
3 -1.458229 1.298907
5 -0.723663 2.220048
9 -1.271494 2.001025
8 -0.412954 -0.808688
0 0.922697 -0.429393
7 -0.476054 -0.351621
Output Sorted DataFrame:
col2 col1
0 0.922697 -0.429393
1 1.116188 1.631727
2 -2.070172 0.148255
3 -1.458229 1.298907
4 0.287900 -1.097359
5 -0.723663 2.220048
6 0.058885 -0.642273
7 -0.476054 -0.351621
8 -0.412954 -0.808688
9 -1.271494 2.001025
By passing a Boolean value to the ascending parameter, you can control the sort order. The following example sorts the row labels in descending order.
import pandas as pd
import numpy as np
unsorted_df = pd.DataFrame(np.random.randn(10, 2), index=[1, 4, 6, 2, 3, 5, 9, 8, 0, 7], columns=['col2', 'col1'])
print("Original DataFrame:\n", unsorted_df)
# Sort the DataFrame by its row labels in descending order
sorted_df = unsorted_df.sort_index(ascending=False)
print("\nOutput Sorted DataFrame:\n", sorted_df)
Its output is as follows −
Original DataFrame:
col2 col1
1 -0.668366 0.576422
4 0.605218 -0.066065
6 1.140478 0.236687
2 0.137617 0.312423
3 -0.055631 0.774057
5 0.108002 1.038820
9 -0.929134 -0.982358
8 -0.207542 -1.283386
0 -0.210571 -0.656371
7 -0.106388 0.672418
Output Sorted DataFrame:
col2 col1
9 -0.929134 -0.982358
8 -0.207542 -1.283386
7 -0.106388 0.672418
6 1.140478 0.236687
5 0.108002 1.038820
4 0.605218 -0.066065
3 -0.055631 0.774057
2 0.137617 0.312423
1 -0.668366 0.576422
0 -0.210571 -0.656371
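Because the examples above use random values, a reproducible variant can make the effect of ascending easier to check. The following is a minimal sketch with made-up values (the column name and numbers are only illustrative); it also shows that sort_index() returns a new object and leaves the original DataFrame unchanged.

```python
import pandas as pd

# Fixed, illustrative values so the result is the same on every run
df = pd.DataFrame({"col1": [10, 20, 30]}, index=[1, 0, 2])

ascending_df = df.sort_index()                  # default: ascending=True
descending_df = df.sort_index(ascending=False)  # reverse label order

print(ascending_df.index.tolist())   # [0, 1, 2]
print(descending_df.index.tolist())  # [2, 1, 0]
# The original DataFrame keeps its order because a new object is returned
print(df.index.tolist())             # [1, 0, 2]
```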
The axis argument selects which labels are sorted. By default, axis=0 sorts the row labels; passing axis=1 sorts the column labels instead. Let us consider the following example to understand the same.
import pandas as pd
import numpy as np
unsorted_df = pd.DataFrame(np.random.randn(6, 4), index=[1, 4, 2, 3, 5, 0], columns=['col2', 'col1', 'col4', 'col3'])
print("Original DataFrame:\n", unsorted_df)
# Sort the DataFrame by its column labels
sorted_df = unsorted_df.sort_index(axis=1)
print("\nOutput Sorted DataFrame:\n", sorted_df)
Its output is as follows −
Original DataFrame:
col2 col1 col4 col3
1 -0.828951 -0.798286 -1.794752 -0.082656
4 0.440243 -0.693218 -0.218277 -0.790168
2 1.017670 1.443679 -1.939119 -1.887223
3 -0.992471 -1.425046 0.651336 -0.278247
5 -0.103537 -0.879433 0.471838 0.860885
0 -0.222297 1.094805 0.501531 -0.580382
Output Sorted DataFrame:
col1 col2 col3 col4
1 -0.798286 -0.828951 -0.082656 -1.794752
4 -0.693218 0.440243 -0.790168 -0.218277
2 1.443679 1.017670 -1.887223 -1.939119
3 -1.425046 -0.992471 -0.278247 0.651336
5 -0.879433 -0.103537 0.860885 0.471838
0 1.094805 -0.222297 -0.580382 0.501531
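The axis and ascending arguments can be combined. As a small sketch with made-up values, the following sorts the column labels in descending order while leaving the rows where they are.

```python
import pandas as pd

# Illustrative data with columns deliberately out of order
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["col2", "col3", "col1"])

# axis=1 targets the column labels; ascending=False reverses their order
reversed_cols = df.sort_index(axis=1, ascending=False)

print(reversed_cols.columns.tolist())  # ['col3', 'col2', 'col1']
print(reversed_cols.index.tolist())    # [0, 1] (row order is untouched)
```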
Sorting by actual values is done with the sort_values() method. For a Series, it sorts the values directly. For a DataFrame, it takes a by argument naming the column (or list of columns) whose values determine the row order. The following example sorts a Series by its values.
import pandas as pd
panda_series = pd.Series([18, 95, 66, 12, 55, 0])
print("Unsorted Pandas Series: \n", panda_series)
panda_series_sorted = panda_series.sort_values(ascending=True)
print("\nSorted Pandas Series: \n", panda_series_sorted)
On executing the above code you will get the following output −
Unsorted Pandas Series:
0 18
1 95
2 66
3 12
4 55
5 0
dtype: int64

Sorted Pandas Series:
5 0
3 12
0 18
4 55
2 66
1 95
dtype: int64
The following example sorts a DataFrame by the values in a single column.
import pandas as pd
import numpy as np
unsorted_df = pd.DataFrame({'col1':[2,9,5,0],'col2':[1,3,2,4]})
print("Original DataFrame:\n", unsorted_df)
# Sort the DataFrame by values
sorted_df = unsorted_df.sort_values(by='col1')
print("\nOutput Sorted DataFrame:\n", sorted_df)
Its output is as follows −
Original DataFrame:
col1 col2
0 2 1
1 9 3
2 5 2
3 0 4
Output Sorted DataFrame:
col1 col2
3 0 4
0 2 1
2 5 2
1 9 3
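The ascending parameter shown earlier for sort_index() is also accepted by sort_values(). As a brief sketch, reusing the same data as the example above, the rows can be ordered from the largest col1 value to the smallest.

```python
import pandas as pd

df = pd.DataFrame({"col1": [2, 9, 5, 0], "col2": [1, 3, 2, 4]})

# Sort rows by col1 from largest to smallest
desc_df = df.sort_values(by="col1", ascending=False)

print(desc_df["col1"].tolist())  # [9, 5, 2, 0]
print(desc_df.index.tolist())    # [1, 2, 0, 3] (original labels travel with the rows)
```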
You can also sort by multiple columns by passing a list of column names to the by parameter. Rows are ordered by the first column, and ties are broken by the next column in the list.
import pandas as pd
import numpy as np
unsorted_df = pd.DataFrame({'col1':[2,1,0,1],'col2':[1,3,4,2]})
print("Original DataFrame:\n", unsorted_df)
# Sort the DataFrame by the values of multiple columns
sorted_df = unsorted_df.sort_values(by=['col1','col2'])
print("\nOutput Sorted DataFrame:\n", sorted_df)
Its output is as follows −
Original DataFrame:
col1 col2
0 2 1
1 1 3
2 0 4
3 1 2
Output Sorted DataFrame:
col1 col2
2 0 4
3 1 2
1 1 3
0 2 1
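When sorting by several columns, ascending may also be given as a list with one Boolean per column. The sketch below reuses the data from the example above and sorts col1 in ascending order while breaking ties on col2 in descending order.

```python
import pandas as pd

df = pd.DataFrame({"col1": [2, 1, 0, 1], "col2": [1, 3, 4, 2]})

# One direction per column: col1 ascending, col2 descending within ties
mixed_df = df.sort_values(by=["col1", "col2"], ascending=[True, False])

print(mixed_df.index.tolist())    # [2, 1, 3, 0]
print(mixed_df["col2"].tolist())  # [4, 3, 2, 1]
```

Compared with the output above, only the two rows that share col1 = 1 swap places.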
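Pandas allows you to specify the sorting algorithm using the kind parameter of the sort_values() method. You can choose between ‘quicksort’ (the default), ‘mergesort’, ‘heapsort’, and ‘stable’. Only ‘mergesort’ and ‘stable’ are stable, meaning rows with equal keys keep their original relative order. For a DataFrame, kind is only applied when sorting on a single column or label.
The following example sorts a DataFrame using the sort_values() method with a specific algorithm.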
import pandas as pd
import numpy as np
unsorted_df = pd.DataFrame({'col1':[2,5,0,1],'col2':[1,3,0,4]})
print("Original DataFrame:\n", unsorted_df)
# Sort the DataFrame by col1 using merge sort
sorted_df = unsorted_df.sort_values(by='col1', kind='mergesort')
print("\nOutput Sorted DataFrame:\n", sorted_df)
Its output is as follows −
Original DataFrame:
col1 col2
0 2 1
1 5 3
2 0 0
3 1 4
Output Sorted DataFrame:
col1 col2
2 0 0
3 1 4
0 2 1
1 5 3
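The example above has no duplicate keys, so the effect of a stable algorithm is not visible in it. As a rough sketch with made-up data containing ties in col1, the following shows what stability means in practice: rows with equal keys stay in their original relative order.

```python
import pandas as pd

# Illustrative data: col1 has repeated values, col2 tags each original row
df = pd.DataFrame({"col1": [1, 0, 1, 0], "col2": ["a", "b", "c", "d"]})

# mergesort is stable, so tied rows keep the order they had in df
stable_df = df.sort_values(by="col1", kind="mergesort")

print(stable_df.index.tolist())    # [1, 3, 0, 2]
print(stable_df["col2"].tolist())  # ['b', 'd', 'a', 'c']
```

Within each group of equal col1 values, 'b' still precedes 'd' and 'a' still precedes 'c', just as in the original DataFrame.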
