Pandas DataFrame
A DataFrame is a two-dimensional labeled data structure, containing heterogeneous data. The data is arranged in a tabular format – with both a row and a column index.
DataFrame is, in fact, the most widely used data structure of Pandas. It can be visualized as a spreadsheet or a collection of series, for example,
Student_Name | Age | Marks | Grade
Tom          | 19  | 75.5  | C
Maria        | 20  | 83.5  | B
John         | 18  | 92.0  | A
This dataset is represented in rows and columns, with attributes such as Student_Name, Age, Marks, and Grade.
The columns of a DataFrame can have different data types. In the table above, Student_Name and Grade hold strings, Age holds integers, and Marks holds floating-point numbers.
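As a minimal sketch, the table above can be rebuilt as a DataFrame to check the type pandas assigns to each column. The variable names students and marks are illustrative, and the dtype reported for text columns (object or a dedicated string type) depends on the pandas version.

```python
import pandas as pd

# The student table from above, built column by column.
students = pd.DataFrame({
    "Student_Name": ["Tom", "Maria", "John"],
    "Age": [19, 20, 18],
    "Marks": [75.5, 83.5, 92.0],
    "Grade": ["C", "B", "A"],
})

# Each column carries its own dtype (integer for Age, float for Marks).
print(students.dtypes)

# Selecting a single column returns a Series, which is why a DataFrame
# can be thought of as a collection of Series sharing one row index.
marks = students["Marks"]
print(type(marks).__name__)  # Series
```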
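A DataFrame is size-mutable as well as value-mutable: rows and columns can be added or removed, and individual values can be changed after the DataFrame is created.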
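Both kinds of mutability can be sketched with a small frame. The data and column names below are made up for illustration, and this is only one of several ways to add or remove rows and columns.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "Maria"], "Marks": [75.5, 83.5]})

# Size mutable: add a column, then add a row under a new index label.
df["Age"] = [19, 20]
df.loc[2] = ["John", 92.0, 18]

# Value mutable: overwrite a single cell, addressed by row and column label.
df.loc[0, "Marks"] = 80.0

# Size mutable again: remove a column in place.
del df["Age"]

print(df)
```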
Creating a DataFrame
The syntax for creating a Pandas DataFrame is:
    pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)
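data, index, columns, dtype, and copy are its parameters, and all of them are optional:
- data holds the values: an ndarray, a dict, a list or other iterable, a Series, or another DataFrame.
- index gives the row labels; if omitted, the rows are numbered with integers starting at 0.
- columns gives the column labels; if omitted and the data carries no column names, the columns are also numbered from 0.
- dtype forces a single data type on all columns; if omitted, pandas infers a type for each column.
- copy controls whether the input data is copied rather than shared with the new DataFrame.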
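The parameters above can be tried out on a small NumPy array. This is a sketch only; the names values, labeled, and independent and the row and column labels are illustrative.

```python
import numpy as np
import pandas as pd

values = np.array([[1, 2], [3, 4]])

# index and columns label the two axes; dtype casts every column to float64.
labeled = pd.DataFrame(
    values,
    index=["row-1", "row-2"],
    columns=["a", "b"],
    dtype="float64",
)
print(labeled)

# copy=True makes the frame independent of the source array,
# so a later change to the array does not show up in the frame.
independent = pd.DataFrame(values, copy=True)
values[0, 0] = 99
print(independent.iloc[0, 0])  # 1
```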
A Pandas DataFrame can be created from lists, dictionaries, ndarrays, Series, or another DataFrame. In real-world applications, however, the data is generally loaded from CSV files, SQL databases, and similar sources.
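A few of these sources can be sketched briefly. The data is invented for illustration, and io.StringIO stands in for a real CSV file so that the example is self-contained; in practice pd.read_csv would usually receive a file path.

```python
import io

import numpy as np
import pandas as pd

# From a NumPy ndarray: 3 rows, 2 columns.
from_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["x", "y"])

# From a dict of Series: the Series index becomes the row index.
from_series = pd.DataFrame(
    {"Marks": pd.Series([75.5, 83.5], index=["Tom", "Maria"])}
)

# From CSV text: the first line supplies the column labels.
csv_text = "Name,Age\nTom,19\nMaria,20\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

print(from_array)
print(from_series)
print(from_csv)
```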
Let us look at some basic examples of creating a DataFrame.
Creating an empty DataFrame
    # import pandas with the alias pd
    import pandas as pd

    # creating an empty DataFrame
    df = pd.DataFrame()
    print(df)
The output of the above code would be:
    Empty DataFrame
    Columns: []
    Index: []
Creating a DataFrame using lists
    import pandas as pd

    # a list of values
    fruits = ['Apple', 'Mango', 'Banana', 'Pineapple', 'Grapes']

    # creating DataFrame using a list
    df = pd.DataFrame(fruits)
    print(df)
The output of the above code would be:
               0
    0      Apple
    1      Mango
    2     Banana
    3  Pineapple
    4     Grapes
Similarly, we can create DataFrames with several rows and columns using a list of lists, as:
    import pandas as pd

    # a list of lists of values
    student_records = [['John', 14, 82.5], ['Maria', 12, 90.0], ['Tom', 13, 77.0]]

    # creating DataFrame using the list of lists
    df = pd.DataFrame(student_records, columns=['Name', 'Age', 'Marks'])
    print(df)
The output of the above is:
        Name  Age  Marks
    0   John   14   82.5
    1  Maria   12   90.0
    2    Tom   13   77.0
Creating a DataFrame using a dictionary
    import pandas as pd

    # a dictionary mapping column names to lists of values
    student_records = {'Name': ['John', 'Maria', 'Tom'],
                       'Age': [14, 12, 13],
                       'Marks': [82.5, 90.0, 77.0]}

    # creating DataFrame using the dictionary, with custom row labels
    df = pd.DataFrame(student_records, index=['Student-1', 'Student-3', 'Student-2'])
    print(df)
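The output of the above is shown below. In current versions of Python and pandas, the columns follow the order of the dictionary keys:
                Name  Age  Marks
    Student-1   John   14   82.5
    Student-3  Maria   12   90.0
    Student-2    Tom   13   77.0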
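Custom row labels such as these can then be used to look rows up by name. The sketch below rebuilds the same frame and uses the standard .loc accessor, which the article itself does not cover; the variable name row is illustrative.

```python
import pandas as pd

student_records = {
    "Name": ["John", "Maria", "Tom"],
    "Age": [14, 12, 13],
    "Marks": [82.5, 90.0, 77.0],
}
df = pd.DataFrame(student_records, index=["Student-1", "Student-3", "Student-2"])

# .loc selects a row by its index label and returns it as a Series
# whose own index is the column labels.
row = df.loc["Student-3"]
print(row["Name"])   # Maria
print(row["Marks"])  # 90.0
```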
Summary
In this article, we have looked at one of the main data structures of Pandas, the DataFrame. In the upcoming articles, we will focus on another core data structure of Pandas, the Series.