Python Pandas – IO Tools

Pandas I/O Tools (Reading and Writing Data)

The Pandas library offers powerful I/O tools (API) for data import and export, enabling seamless handling of various file formats like CSV, Excel, JSON, and many more. This API includes top-level reader functions like, pd.read_csv(), read_clipboard() and corresponding writer methods like, to_csv(), to_clipboard() for easy data handling.

In this tutorial, we will learn about the overview of the Pandas I/O tools and learn how to use them effectively.

Overview of Pandas IO Tools

The Pandas I/O API supports a wide variety of data formats. Here is a summary of supported formats and their corresponding reader and writer functions −

Format	Reader Function	Writer Function
Tabular Data	read_table()	NA
CSV	read_csv()	to_csv()
Fixed-Width Text File	read_fwf()	NA
Clipboard	read_clipboard()	to_clipboard()
Pickling	read_pickle()	to_pickle()
Excel	read_excel()	to_excel()
JSON	read_json()	to_json()
HTML	read_html()	to_html()
XML	read_xml()	to_xml()
LaTeX	NA	to_latex()
HDF5 Format	read_hdf()	to_hdf()
Feather	read_feather()	to_feather()
Parquet	read_parquet()	to_parquet()
ORC	read_orc()	to_orc()
SQL	read_sql()	to_sql()
Stata	read_stata()	to_stata()

Among these, the most frequently used functions for handling text files are read_csv() and read_table(). Both convert flat files into DataFrame objects.

Example: Reading CSV Data

This example shows reading the CSV data using the pandas read_csv() function. In this example we are using the StringIO to load the CSV string into a Pandas DataFrame object.

import pandas as pd
from io import StringIO

data = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900"""

obj = StringIO(data)
df = pd.read_csv(obj)
print(df)

Output:

S.No	Name	Age	City	Salary
0	1	Tom	28	Toronto	20000
1	2	Lee	32	HongKong	3000
2	3	Steven	43	Bay Area	8300
3	4	Ram	38	Hyderabad	3900

Customizing Parsing Options

Pandas allows several customization options when parsing data. You can modify how the data is parsed using parameters like −

index_col
dtype
names
skiprows

Customizing the index

You can customize the row labels or index of the Pandas object by using index_col parameter. Setting index_col=False forces Pandas to not use the first column as the index, which can be helpful when handling malformed files with extra delimiters.

import pandas as pd
from io import StringIO

data = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900"""

obj = StringIO(data)
df = pd.read_csv(obj, index_col=['S.No'])
print(df)

Output:

S.No	Name	Age	City	Salary
1	Tom	28	Toronto	20000
2	Lee	32	HongKong	3000
3	Steven	43	Bay Area	8300
4	Ram	38	Hyderabad	3900

Converters

Pandas also provides the ability to specify the data type for columns using the dtype parameter.

import pandas as pd
from io import StringIO
import numpy as np

data = """[
    {"Name": "Braund", "Gender": "Male", "Age": 30},
    {"Name": "Cumings", "Gender": "Female", "Age": 25},
    {"Name": "Heikkinen", "Gender": "Female", "Age": 35}
]"""

obj = StringIO(data)
df = pd.read_json(obj, dtype={'Age': np.float64})
print(df.dtypes)

Output:

Name       object
Gender     object
Age       float64
dtype: object

Thus, the data looks like float −

Name	Gender	Age
0	Braund	Male	30.0
1	Cumings	Female	25.0
2	Heikkinen	Female	35.0

Customizing the Header Names

When reading data files, Pandas assumes the first row as the header. However, you can customize this using the names Parameter.

import pandas as pd
from io import StringIO

xml = """

  
    Everyday Italian
    Giada De Laurentiis
    2005
    30.00
  
  
    Harry Potter
    J K. Rowling
    2005
    29.99
  
  
    Learning XML
    Erik T. Ray
    2003
    39.95
  
"""

df = pd.read_xml(StringIO(xml), names=['a', 'b', 'c','d','e'])
print('Output DataFrame from XML:')
print(df)

Output:

a	b	c	d	e
0	cooking	Everyday Italian	Giada De Laurentiis	2005	30.00
1	children	Harry Potter	J K. Rowling	2005	29.99
2	web	Learning XML	Erik T. Ray	2003	39.95

Pandas Written Edition English Tutorial

Curriculum

Python Pandas – IO Tools

Pandas I/O Tools (Reading and Writing Data)

Overview of Pandas IO Tools

Example: Reading CSV Data

Customizing Parsing Options

Customizing the index

Converters

Customizing the Header Names

Modal title