
Pandas DataFrame

A DataFrame is a two-dimensional labeled data structure whose columns can hold heterogeneous data. The data is arranged in a tabular format, with both a row index and a column index.

The DataFrame is, in fact, the most widely used data structure in Pandas. You can picture it as a spreadsheet or as a collection of Series that share the same row index. For example:

Student_Name  Age  Marks  Grade
Tom           19   75.5   C
Maria         20   83.5   B
John          18   92.0   A

This dataset is organized into rows and columns: each row is a student, and the columns are Student_Name, Age, Marks, and Grade.

The columns of a DataFrame can hold different data types. In the table above, for example, Student_Name is a string, Age is an integer, Marks is a float, and Grade is a string.
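As a minimal sketch, the table above can be rebuilt as a DataFrame and its per-column types inspected with the dtypes attribute. Note that Pandas has no separate "String" type by default: text columns are usually stored with the object dtype (newer Pandas versions may use a dedicated string dtype instead), so only the numeric columns are checked below. Selecting one column also shows the "collection of Series" view, since it returns a Series.

```python
import pandas as pd

# Rebuild the student table shown above; each dict key becomes a column
df = pd.DataFrame({
    'Student_Name': ['Tom', 'Maria', 'John'],
    'Age': [19, 20, 18],
    'Marks': [75.5, 83.5, 92.0],
    'Grade': ['C', 'B', 'A'],
})

# Each column keeps its own dtype (integers, floats, text)
print(df.dtypes)

# Selecting a single column returns a Series
print(type(df['Age']))
```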

A DataFrame is size-mutable (rows and columns can be added or removed) as well as value-mutable (existing values can be changed).
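Both kinds of mutability can be sketched in a few lines. The small frame and the values used here are only illustrative: a column is added by assigning to a new label, a row is added with .loc and a new row label (setting with enlargement), a single value is changed in place, and a column is removed with drop.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Maria'], 'Age': [19, 20]})

# Size mutability: add a new column and a new row
df['Grade'] = ['C', 'B']
df.loc[2] = ['John', 18, 'A']

# Data mutability: change an existing value in place
df.loc[0, 'Age'] = 21

# Size mutability again: remove a column
df = df.drop(columns='Grade')

print(df)
```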

Creating a DataFrame

The syntax for creating a Pandas DataFrame is:

pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)

Its parameters are:

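  • data: the input data, such as a list, dict, NumPy ndarray, Series, or another DataFrame
  • index: the row labels of the resulting frame; defaults to a RangeIndex (0, 1, 2, and so on) if not provided
  • columns: the column labels of the resulting frame; defaults to a RangeIndex if the data has no labels of its own
  • dtype: a single data type to force on the data; if omitted, Pandas infers the type of each column
  • copy: whether to copy the input data; by default, dict input is copied, while DataFrame and 2-D ndarray input is not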
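The parameters above can be exercised together in one call. This is a minimal sketch: the array contents and the labels row1, row2, a, and b are arbitrary choices for illustration. With copy=True the DataFrame holds its own copy of the values, so editing the DataFrame leaves the original array unchanged.

```python
import numpy as np
import pandas as pd

# Raw 2-D data: two rows, two columns
data = np.array([[1, 2], [3, 4]])

df = pd.DataFrame(
    data,
    index=['row1', 'row2'],   # row labels
    columns=['a', 'b'],       # column labels
    dtype='float64',          # force a single dtype on all columns
    copy=True,                # keep a separate copy of the input values
)

# Editing the DataFrame does not touch the original array
df.loc['row1', 'a'] = 100.0
print(df)
print(data)
```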

A Pandas DataFrame can be created from lists, dictionaries, NumPy ndarrays, Series, or another DataFrame. In real-world applications, however, the data is usually loaded from external sources such as CSV files or SQL databases.
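A short sketch of three of these sources follows. The variable names are illustrative, and io.StringIO stands in for a real CSV file so that the example runs without any file on disk; with an actual file you would pass its path to pd.read_csv instead.

```python
import io

import numpy as np
import pandas as pd

# From a NumPy ndarray, supplying column labels
arr = np.arange(6).reshape(3, 2)
df_array = pd.DataFrame(arr, columns=['x', 'y'])

# From a dict of Series; rows are aligned on the Series index
ages = pd.Series([14, 12, 13], index=['John', 'Maria', 'Tom'])
marks = pd.Series([82.5, 90.0, 77.0], index=['John', 'Maria', 'Tom'])
df_series = pd.DataFrame({'Age': ages, 'Marks': marks})

# From CSV text; io.StringIO behaves like an open file here
csv_text = """Name,Age,Marks
John,14,82.5
Maria,12,90.0
Tom,13,77.0
"""
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_array)
print(df_series)
print(df_csv)
```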

Let us look at some basic examples of creating a DataFrame.

Creating an empty DataFrame:

# import pandas under the alias pd
import pandas as pd

# Creating an empty DataFrame
df = pd.DataFrame()
print(df)

The output of the above code would be:

Empty DataFrame
Columns: []
Index: []
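An empty DataFrame can also be created with column labels but no rows, which is sometimes handy as a template. The sketch below shows this, together with the empty attribute, which is True whenever either axis has length 0. The column names are just examples.

```python
import pandas as pd

# A completely empty DataFrame: no rows and no columns
empty = pd.DataFrame()

# An empty DataFrame that already has column labels but no rows
template = pd.DataFrame(columns=['Name', 'Age', 'Marks'])

# .empty is True for both, since each has zero rows
print(empty.empty, template.empty)
print(template.shape)
```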

Creating a DataFrame using lists

import pandas as pd

# a list of values
fruits = ['Apple', 'Mango', 'Banana', 'Pineapple', 'Grapes']

# creating DataFrame using a list
df = pd.DataFrame(fruits)
print(df)

The output of the above code would be:

           0
0      Apple
1      Mango
2     Banana
3  Pineapple
4     Grapes
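Because a flat list becomes a single column, Pandas gives that column the default label 0. A minimal sketch of naming it explicitly with the columns parameter follows; the label 'Fruit' is simply an illustrative choice.

```python
import pandas as pd

fruits = ['Apple', 'Mango', 'Banana', 'Pineapple', 'Grapes']

# Without column labels, the single column is named 0
df_default = pd.DataFrame(fruits)
print(df_default.columns.tolist())

# Pass columns= to give the column a meaningful name
df_named = pd.DataFrame(fruits, columns=['Fruit'])
print(df_named)
```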

Similarly, we can create a DataFrame with several rows and columns from a list of lists:

import pandas as pd

# a list of lists; each inner list becomes one row
student_records = [['John', 14, 82.5], ['Maria', 12, 90.0], ['Tom', 13, 77.0]]

# creating a DataFrame from the list of lists, with column labels
df = pd.DataFrame(student_records, columns=['Name', 'Age', 'Marks'])
print(df)

The output of the above is:

    Name  Age  Marks
0   John   14   82.5
1  Maria   12   90.0
2    Tom   13   77.0
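To confirm that each inner list became one row, a quick sketch can check the shape of the result and pull a single row back out by its integer position with iloc, which returns that row as a Series. The variable name second is only for illustration.

```python
import pandas as pd

student_records = [['John', 14, 82.5], ['Maria', 12, 90.0], ['Tom', 13, 77.0]]
df = pd.DataFrame(student_records, columns=['Name', 'Age', 'Marks'])

# Three inner lists of three values each: 3 rows, 3 columns
print(df.shape)

# iloc selects a row by its integer position and returns it as a Series
second = df.iloc[1]
print(second['Name'], second['Marks'])
```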

Creating a DataFrame using a dictionary

import pandas as pd

# a dictionary mapping each column label to a list of values
student_records = {'Name': ['John', 'Maria', 'Tom'], 'Age': [14, 12, 13], 'Marks': [82.5, 90.0, 77.0]}

# creating a DataFrame from the dictionary, with custom row labels
df = pd.DataFrame(student_records, index=['Student-1', 'Student-3', 'Student-2'])
print(df)

The output of the above is:

            Name  Age  Marks
Student-1   John   14   82.5
Student-3  Maria   12   90.0
Student-2    Tom   13   77.0
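The custom labels passed through index can then be used to look up rows. The sketch below uses .loc for label-based selection and sort_index to order the rows by label instead of by the order they were supplied in. In current Pandas versions, the columns follow the insertion order of the dictionary keys.

```python
import pandas as pd

student_records = {'Name': ['John', 'Maria', 'Tom'], 'Age': [14, 12, 13], 'Marks': [82.5, 90.0, 77.0]}
df = pd.DataFrame(student_records, index=['Student-1', 'Student-3', 'Student-2'])

# .loc selects by row label and column label
print(df.loc['Student-3', 'Name'])

# sort_index orders rows by their labels rather than insertion order
print(df.sort_index())
```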

Summary

In this article, we looked at one of the main data structures of Pandas, the DataFrame. In upcoming articles, we will focus on another core Pandas data structure, the Series.

Pallavi Pandey
