A DataFrame is a two-dimensional, labeled data structure that can hold heterogeneous data. The data is arranged in a tabular format, with both a row index and a column index.
The DataFrame is the most widely used data structure in Pandas. It can be visualized as a spreadsheet, or as a collection of Series that share the same index. For example:
| Student_Name | Age | Marks | Grade |
|---|---|---|---|
| Tom | 19 | 75.5 | C |
| Maria | 20 | 83.5 | B |
| John | 18 | 92.0 | A |
This dataset is represented in rows and columns, with attributes such as Student_Name, Age, Marks, and Grade.
The columns of a DataFrame can be of different data types. For example, in the above table, Student_Name holds strings, Age holds integers, Marks holds floating-point numbers, and Grade holds strings.
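To make the per-column types concrete, here is a minimal sketch that rebuilds the student table above and prints its `dtypes`. The variable name `students` is only illustrative. How text columns are reported may differ between Pandas versions (commonly `object`), so the comments describe only the numeric columns.

```python
import pandas as pd

# The student table from above, rebuilt as a dictionary of columns.
students = pd.DataFrame({
    "Student_Name": ["Tom", "Maria", "John"],
    "Age": [19, 20, 18],
    "Marks": [75.5, 83.5, 92.0],
    "Grade": ["C", "B", "A"],
})

# dtypes reports one data type per column:
# Age is an integer type and Marks is a float type.
print(students.dtypes)
```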
A DataFrame is size-mutable (rows and columns can be added or removed) as well as data-mutable (the values it holds can be changed).
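Both kinds of mutability can be sketched in a few lines. The data below is made up for illustration: the example adds a column and a row (changing the size), then overwrites one value (changing the data).

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "Maria"], "Age": [19, 20]})

# Size mutability: add a new column, then add a new row with label 2.
df["Marks"] = [75.5, 83.5]
df.loc[2] = ["John", 18, 92.0]

# Data mutability: overwrite a single value, addressed by row and column label.
df.loc[0, "Marks"] = 80.0

print(df)
```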
The syntax for creating a Pandas DataFrame is:
    pandas.DataFrame(data, index, columns, dtype, copy)
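Its parameters are data (the input values), index (the row labels), columns (the column labels), dtype (a data type to apply to the data), and copy (whether to copy the input data). All of them are optional.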
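As a rough sketch of how these parameters fit together, the example below passes all five explicitly. The array contents and the labels `s1`, `s2`, `s3` are invented for illustration.

```python
import numpy as np
import pandas as pd

# Input data: a 3x2 NumPy array of integers (illustrative values).
values = np.array([[14, 82], [12, 90], [13, 77]])

df = pd.DataFrame(
    data=values,                  # the values to store
    index=["s1", "s2", "s3"],     # row labels
    columns=["Age", "Marks"],     # column labels
    dtype="float64",              # convert the data to floats
    copy=True,                    # do not share memory with `values`
)

print(df)
```

Because the data is copied, later changes to `values` should not show up in `df`.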
A Pandas DataFrame can be created from lists, dictionaries, NumPy ndarrays, Series, or another DataFrame. In real-world applications, however, the data in a DataFrame is generally loaded from CSV files, SQL databases, and similar sources.
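As an illustration of the file-based route, the sketch below uses `pd.read_csv`. To keep it self-contained, an in-memory string stands in for a CSV file on disk; the data is made up, and with a real file you would pass its path instead.

```python
import io

import pandas as pd

# Stand-in for the contents of a CSV file (illustrative data).
csv_text = """Name,Age,Marks
John,14,82.5
Maria,12,90.0
Tom,13,77.0
"""

# read_csv accepts a file path or a file-like object; the first line
# of the CSV is used as the column names.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
```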
Let us look at some basic examples of creating a DataFrame.
    # import pandas with the alias pd
    import pandas as pd

    # creating an empty DataFrame
    df = pd.DataFrame()
    print(df)
The output of the above code would be:
    Empty DataFrame
    Columns: []
    Index: []
A DataFrame can also be created from a simple list, in which case each element becomes a row:

    import pandas as pd

    # a list of values
    fruits = ['Apple', 'Mango', 'Banana', 'Pineapple', 'Grapes']

    # creating a DataFrame using a list
    df = pd.DataFrame(fruits)
    print(df)
The output of the above code would be:
               0
    0      Apple
    1      Mango
    2     Banana
    3  Pineapple
    4     Grapes
Similarly, we can create DataFrames with several rows and columns using a list of lists, as:
    import pandas as pd

    # a list of lists of values, one inner list per row
    student_records = [['John', 14, 82.5], ['Maria', 12, 90.0], ['Tom', 13, 77.0]]

    # creating a DataFrame using a list of lists
    df = pd.DataFrame(student_records, columns=['Name', 'Age', 'Marks'])
    print(df)
The output of the above is:
        Name  Age  Marks
    0   John   14   82.5
    1  Maria   12   90.0
    2    Tom   13   77.0
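A DataFrame can also be created from a dictionary of lists, where each key becomes a column name. Here, custom row labels are passed through the index parameter:

    import pandas as pd

    # a dictionary mapping column names to lists of values
    student_records = {'Name': ['John', 'Maria', 'Tom'],
                       'Age': [14, 12, 13],
                       'Marks': [82.5, 90.0, 77.0]}

    # creating a DataFrame using a dictionary, with custom row labels
    df = pd.DataFrame(student_records, index=['Student-1', 'Student-3', 'Student-2'])
    print(df)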
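The output of the above is shown below. Recent versions of Pandas keep the columns in the order of the dictionary keys; older versions sorted them alphabetically (Age, Marks, Name).

                Name  Age  Marks
    Student-1   John   14   82.5
    Student-3  Maria   12   90.0
    Student-2    Tom   13   77.0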
In this article, we have looked at the main data structure of Pandas, the DataFrame. In the upcoming articles, we will focus on another Pandas data structure, the Series.