A DataFrame in Python’s pandas library is a two-dimensional labeled data structure used for data manipulation and analysis. Its columns can hold different data types, such as integers, floats, and strings. Each column has a label, and each row has an index label that can be used to access specific rows. Labels are usually unique, although pandas does not require it.
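As a minimal sketch of these ideas, the snippet below builds a small DataFrame whose columns hold different data types and then uses the row and column labels to access data. The column names, values, and the row labels "r1" to "r3" are invented for illustration.

```python
import pandas as pd

# Hypothetical data: each column holds a different data type
df = pd.DataFrame(
    {
        "item_id": [101, 102, 103],              # integers
        "price": [2.50, 4.75, 1.20],             # floats
        "item": ["pen", "notebook", "eraser"],   # strings
    },
    index=["r1", "r2", "r3"],  # illustrative row labels
)

print(df.dtypes)      # one data type per column
print(df.loc["r2"])   # access a row by its index label
print(df["price"])    # access a column by its label
```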
DataFrames are widely used in data analysis and machine learning workflows because they let users manipulate and analyze large datasets. They support operations such as filtering, sorting, merging, grouping, and transforming data.
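The following sketch shows one common way to perform each of the operations listed above. The student names, classes, scores, and ages are hypothetical sample data, and the methods shown (boolean indexing, `sort_values`, `merge`, `groupby`) are only one of several ways pandas offers for each task.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
scores = pd.DataFrame({
    "student": ["Asha", "Ben", "Chen", "Dara"],
    "class": ["A", "B", "A", "B"],
    "score": [88, 72, 95, 64],
})
ages = pd.DataFrame({"student": ["Asha", "Ben", "Chen", "Dara"], "age": [15, 16, 15, 16]})

passed = scores[scores["score"] >= 70]                  # filtering with a boolean mask
ranked = scores.sort_values("score", ascending=False)   # sorting by a column
combined = scores.merge(ages, on="student")             # merging on a shared key column
class_mean = scores.groupby("class")["score"].mean()    # grouping and aggregating
scores["percent"] = scores["score"] / 100               # transforming: derive a new column

print(passed, ranked, combined, class_mean, scores, sep="\n\n")
```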
You can think of a DataFrame as similar to an SQL table or a spreadsheet. For example, suppose we create a DataFrame that holds students’ data.
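A minimal sketch of such a student DataFrame is shown below. The names, ages, grades, and the use of roll numbers 1 to 3 as row labels are all assumptions made for this example; like a table, each row is one record and each column is one field.

```python
import pandas as pd

# Hypothetical student records, invented for this example
students = pd.DataFrame(
    {
        "Name": ["Asha", "Ben", "Chen"],
        "Age": [15, 16, 15],
        "Grade": ["A", "B", "A"],
    },
    index=[1, 2, 3],  # roll numbers used as row labels (an assumption)
)
print(students)
```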
A pandas DataFrame can be created using the following constructor −
```python
pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)
```
The parameters of the constructor are as follows −
| Sr.No | Parameter & Description |
|---|---|
| 1 | data: The input data. It can take various forms, such as an ndarray, Series, map, list, dict, constant, or another DataFrame. |
| 2 | index: Row labels for the resulting frame. Optional; defaults to np.arange(n) (a RangeIndex) if no index is passed and the data has no row labels of its own. |
| 3 | columns: Column labels for the resulting frame. Optional; defaults to np.arange(n) (a RangeIndex) if no column labels are passed and the data has no column labels of its own. |
| 4 | dtype: Data type to force on the data. Only a single dtype is allowed; if None, pandas infers the type of each column. |
| 5 | copy: Whether to copy data from the inputs. The default, None, lets pandas decide based on the input type; for example, dict input is always copied. |
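To make the parameters above concrete, here is a small sketch that passes index, columns, dtype, and copy explicitly. The array values and the labels "a" to "c", "x", and "y" are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

values = np.array([[1, 2], [3, 4], [5, 6]])

# index sets the row labels, columns sets the column labels,
# and dtype forces a single data type onto all of the data
df = pd.DataFrame(values, index=["a", "b", "c"], columns=["x", "y"], dtype="float64")
print(df)

# Without index/columns, pandas falls back to default integer labels 0..n-1
default = pd.DataFrame(values)
print(list(default.index), list(default.columns))  # [0, 1, 2] [0, 1]

# copy=True makes the DataFrame own its data, so later edits to the
# source array do not show up in the DataFrame
copied = pd.DataFrame(values, copy=True)
values[0, 0] = 100
print(copied.iloc[0, 0])  # 1
```

Passing copy=True explicitly is a simple way to make the result independent of the source array, regardless of how the default behaves for a given input type.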
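A pandas DataFrame can be created from various inputs, such as lists, dicts, Series, NumPy ndarrays, or another DataFrame. The most basic DataFrame is an empty one, created by calling the constructor with no arguments −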
```python
# Import the pandas library and alias it as pd
import pandas as pd

df = pd.DataFrame()
print(df)
```
Output:
```
Empty DataFrame
Columns: []
Index: []
```
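One way to confirm that a DataFrame is empty is to check its `empty` attribute and its `shape`. The sketch below does that and then adds a column afterwards; the column name "score" and its values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame()

# An empty DataFrame has no rows and no columns
print(df.empty)   # True
print(df.shape)   # (0, 0)

# Columns can be added later; assigning a list creates the column
# and a default integer row index of matching length
df["score"] = [90, 85]
print(df.shape)   # (2, 1)
```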
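A DataFrame can also be created from a single list. Each element becomes one row in a column labeled 0 −

```python
import pandas as pd

data = [1, 2, 3, 4, 5]
df = pd.DataFrame(data)
print(df)
```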
Output:
```
   0
0  1
1  2
2  3
3  4
4  5
```
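The other input types mentioned earlier can be handled the same way. The following sketch builds one DataFrame from each; the names, ages, and labels are illustrative values only.

```python
import numpy as np
import pandas as pd

# List of lists: each inner list becomes one row
from_rows = pd.DataFrame([["Alex", 10], ["Bob", 12]], columns=["Name", "Age"])

# Dict of lists: each key becomes a column label
from_dict = pd.DataFrame({"Name": ["Alex", "Bob"], "Age": [10, 12]})

# Dict of Series: rows are aligned on the Series index labels
from_series = pd.DataFrame({
    "one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
    "two": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"]),
})  # row "d" gets NaN in column "one"

# 2-D NumPy ndarray
from_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["x", "y"])

# Another DataFrame
from_frame = pd.DataFrame(from_dict)

print(from_rows, from_dict, from_series, from_array, from_frame, sep="\n\n")
```

Note that with a dict of Series, pandas takes the union of the Series indexes as the row labels, which is why the missing value shows up as NaN rather than raising an error.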
