Iterating over pandas objects is a fundamental task in data manipulation, and the behavior of iteration depends on the type of object you’re dealing with. This tutorial explains how iteration works in pandas, specifically focusing on Series and DataFrame objects.
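Iteration behaves differently for Series and DataFrame objects. Iterating over a Series yields its values, much like an array, while iterating over a DataFrame yields its column labels, much like the keys of a dictionary.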
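As a minimal sketch of that difference, the snippet below iterates directly over a small Series and a small DataFrame. The objects and the variable names (s, df, series_items, frame_items) are made up for illustration.

```python
import pandas as pd

# Hypothetical sample objects, used only for illustration.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# Iterating over a Series yields its values, like an array.
series_items = [value for value in s]

# Iterating over a DataFrame yields its column labels, like dict keys.
frame_items = [label for label in df]

print(series_items)
print(frame_items)
```

The Series yields 10, 20, 30 (not the labels a, b, c), while the DataFrame yields the labels col1 and col2 rather than any cell values.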
To iterate over the contents of a DataFrame, we can use the following methods −
items(): iterate over the columns as (label, Series) pairs
iterrows(): iterate over the rows as (index, Series) pairs
itertuples(): iterate over the rows as namedtuples
The items() method allows you to iterate over each column as a key-value pair, with the column label as the key and the column values as a Series object. This method is consistent with the dictionary-like interface of a DataFrame.
Example
The following example iterates over the columns of a DataFrame using the items() method. Each column is yielded separately as a (label, Series) pair.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(4,3),columns=['col1','col2','col3'])
print("Original DataFrame:\n", df)
# Iterate through the DataFrame columns
print("Iterated Output:")
for key, value in df.items():
    print(key, value)
Its output is as follows −
Original DataFrame:
col1 col2 col3
0 0.422561 0.094621 -0.214307
1 0.430612 -0.334812 -0.010867
2 0.350962 -0.145470 0.988463
3 1.466426 -1.258297 -0.824569
Iterated Output:
col1 0 0.422561
1 0.430612
2 0.350962
3 1.466426
Name: col1, dtype: float64
col2 0 0.094621
1 -0.334812
2 -0.145470
3 -1.258297
Name: col2, dtype: float64
col3 0 -0.214307
1 -0.010867
2 0.988463
3 -0.824569
Name: col3, dtype: float64
Observe that each column is iterated separately: key is the column name and value is the corresponding Series object.
The iterrows() method returns an iterator that yields index and row pairs, where each row is represented as a Series object, containing the data in each row.
Example
The following example iterates the DataFrame rows using the iterrows() method.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(4,3),columns = ['col1','col2','col3'])
print("Original DataFrame:\n", df)
# Iterate Through DataFrame rows
print("Iterated Output:")
for row_index, row in df.iterrows():
    print(row_index, row)
Its output is as follows −
Original DataFrame:
col1 col2 col3
0 0.468160 -0.634193 -0.603612
1 1.231840 0.090565 -0.449989
2 -1.645371 0.032578 -0.165950
3 1.956370 -0.261995 2.168167
Iterated Output:
0 col1 0.468160
col2 -0.634193
col3 -0.603612
Name: 0, dtype: float64
1 col1 1.231840
col2 0.090565
col3 -0.449989
Name: 1, dtype: float64
2 col1 -1.645371
col2 0.032578
col3 -0.165950
Name: 2, dtype: float64
3 col1 1.956370
col2 -0.261995
col3 2.168167
Name: 3, dtype: float64
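Note: Because iterrows() returns each row as a single Series, it does not preserve data types across the row; when the columns have different dtypes, the row values are upcast to a common dtype. In the output above, 0, 1, 2, 3 are the row index labels and col1, col2, col3 are the column labels.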
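The effect on data types is easiest to see with columns of mixed dtypes. The following is a minimal sketch; the column names qty and ratio are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: one integer column and one float column.
df = pd.DataFrame({"qty": [1, 2], "ratio": [0.5, 1.5]})
print(df.dtypes)

# iterrows() packs each row into a single Series, so the integer
# value is upcast to float to share one dtype with the float value.
_, first_row = next(df.iterrows())
print(first_row["qty"], first_row.dtype)

# itertuples() keeps each value's own type within the row.
first_tuple = next(df.itertuples())
print(first_tuple.qty, first_tuple.ratio)
```

Here the row Series from iterrows() has dtype float64 even though the qty column holds integers, while the tuple from itertuples() still carries an integer for qty.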
The itertuples() method will return an iterator yielding a named tuple for each row in the DataFrame. The first element of the tuple will be the row’s corresponding index value, while the remaining values are the row values. This method is generally faster than iterrows() and preserves the data types of the row elements.
Example
The following example uses the itertuples() method to loop through a DataFrame's rows as namedtuples.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(4,3),columns = ['col1','col2','col3'])
print("Original DataFrame:\n", df)
# Iterate Through DataFrame rows
print("Iterated Output:")
for row in df.itertuples():
    print(row)
Its output is as follows −
Original DataFrame:
col1 col2 col3
0 0.501238 -0.353269 -0.058190
1 -0.426044 -0.012733 -0.532594
2 -0.704042 2.201186 -1.960429
3 0.514151 -0.844160 0.508056
Iterated Output:
Pandas(Index=0, col1=0.5012381423628608, col2=-0.3532690739340918, col3=-0.058189913290578134)
Pandas(Index=1, col1=-0.42604395958954777, col2=-0.012733326002509393, col3=-0.5325942971498149)
Pandas(Index=2, col1=-0.7040424042099052, col2=2.201186165472291, col3=-1.9604285032438307)
Pandas(Index=3, col1=0.5141508750506754, col2=-0.8441600001815068, col3=0.5080555294913854)
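Because each row is a namedtuple, its fields can be read as attributes. The sketch below also uses the index and name parameters of itertuples(); the DataFrame, its row labels, and the type name "Record" are made up for illustration.

```python
import pandas as pd

# Hypothetical data with string row labels.
df = pd.DataFrame(
    {"col1": [1.0, 2.0], "col2": [3.0, 4.0]},
    index=["r1", "r2"],
)

# Fields are available as attributes; Index holds the row label.
for row in df.itertuples():
    print(row.Index, row.col1 + row.col2)

# index=False drops the row label; name sets the namedtuple type name.
records = list(df.itertuples(index=False, name="Record"))
print(records)

# name=None yields plain tuples instead of namedtuples.
plain_rows = list(df.itertuples(index=False, name=None))
print(plain_rows)
```

Plain tuples are a reasonable choice when the field names are not needed, for example when passing rows on to code that expects ordinary sequences.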
Iterating directly over a DataFrame, without any of the methods above, simply yields its column names.
Example
Let us consider the following example to understand how direct iteration over a DataFrame yields its columns.
import pandas as pd
import numpy as np
N = 5
df = pd.DataFrame({
'A': pd.date_range(start='2016-01-01', periods=N, freq='D'),
'x': np.linspace(0, stop=N-1, num=N),
'y': np.random.rand(N),
'C': np.random.choice(['Low', 'Medium', 'High'], N).tolist(),
'D': np.random.normal(100, 10, size=N).tolist()
})
print("Original DataFrame:\n", df)
# Iterate Through DataFrame Columns
print("Output:")
for col in df:
    print(col)
Its output is as follows −
Original DataFrame:
A x y C D
0 2016-01-01 0.0 0.990949 Low 114.143838
1 2016-01-02 1.0 0.314517 High 95.559640
2 2016-01-03 2.0 0.180237 Low 121.134817
3 2016-01-04 3.0 0.170095 Low 95.643132
4 2016-01-05 4.0 0.920718 Low 96.379692
Output:
A
x
y
C
D
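The column names yielded this way can be used to look up each column. The following is a minimal sketch, assuming a small hypothetical DataFrame with numeric columns x and y, that also checks the lookup against what items() yields.

```python
import pandas as pd

# Hypothetical numeric data, used only for illustration.
df = pd.DataFrame({"x": [0.0, 1.0, 2.0], "y": [3.0, 5.0, 7.0]})

# Direct iteration yields labels; index the DataFrame to get each column.
column_means = {col: df[col].mean() for col in df}
print(column_means)

# items() yields the same labels together with the column Series.
same_pairs = all(df[col].equals(series) for col, series in df.items())
print(same_pairs)
```

When both the label and the column data are needed, items() avoids the extra lookup.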
While iterating over a DataFrame, you should not modify the object you are iterating over. Iteration is meant for reading: depending on the data types, the iterator returns a copy of the data rather than a view, so changes made to it are not reflected in the original object. The following example demonstrates this.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(4,3),columns = ['col1','col2','col3'])
for index, row in df.iterrows():
    row['a'] = 10
print(df)
Its output is as follows −
col1 col2 col3
0 -1.739815 0.735595 -0.295589
1 0.635485 0.106803 1.527922
2 -0.939064 0.547095 0.038585
3 -1.016509 -0.116580 -0.523158
As you can see, no changes are reflected in the DataFrame, because each row yielded by iterrows() is a copy and assigning to it does not write back to the original data.
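To actually change the data, assign through the DataFrame itself rather than through the objects produced by iteration. The following is a minimal sketch under that assumption; the sample values and the condition on col1 are hypothetical.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

# Assigning a scalar to a new label adds a column to the DataFrame itself.
df["a"] = 10

# A boolean mask with .loc updates the selected rows in place, no loop needed.
df.loc[df["col1"] > 1, "col2"] = 0

print(df)
```

Both assignments operate on df directly, so the new column and the updated values appear in the printed DataFrame.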
