In this article, we will work with a few of the general functions of Pandas, namely crosstab, pivot_table, and melt.
Pandas crosstab() function is used for computing a simple cross-tabulation of two or more factors. It computes a frequency table of the factors by default unless an array of values and an aggregation function are passed.
The syntax is:
pandas.crosstab(index, columns, values, rownames, colnames, aggfunc, margins, margins_name, dropna, normalize) |
The parameters are:
Let us look at some examples of Pandas crosstab. Consider the following data:
import pandas as pd student_records = [[‘John’,14,’M’,’USA’,82.5],[‘Maria’,12,’F’,’Canada’,90.0],[‘Tom’,13,’M’,’Brazil’,77.0],[‘Adam’,15,’M’,’Canada’,87.0],[‘Carla’,14,’F’,’UK’,73.0],[‘Ben’,12,’M’,’USA’,65.5],[‘David’,14,’M’,’UK’,91.5],[‘Laila’,15,’F’,’Brazil’,81.0],[‘Amy’,12,’F’,’USA’,71.0],[‘Tina’,14,’F’,’Canada’,63.5]] df = pd.DataFrame(student_records, columns=[‘Name’,’Age’,’Gender’,’Country’,’Marks’]) print(df) |
The DataFrame is:
Name Age Gender Country Marks 0 John 14 M USA 82.5 1 Maria 12 F Canada 90.0 2 Tom 13 M Brazil 77.0 3 Adam 15 M Canada 87.0 4 Carla 14 F UK 73.0 5 Ben 12 M USA 65.5 6 David 14 M UK 91.5 7 Laila 15 F Brazil 81.0 8 Amy 12 F USA 71.0 9 Tina 14 F Canada 63.5 |
Now let’s create a crosstab table by ‘Country’ and ‘Gender’:
pd.crosstab(df.Country, df.Gender, margins=True) |
The resulting crosstab table is:
Gender F M All Country Brazil 1 1 2 Canada 2 1 3 UK 1 1 2 USA 1 2 3 All 5 5 10 |
The table stores the count of the number of records by ‘Country’ and ‘Gender’.
We can also create a crosstab of the number of students of Country age-wise and ‘Gender’. For this take ‘Country’ and ‘Age’ as indices and ‘Gender’ as columns:
pd.crosstab([df.Country, df.Age], df.Gender, margins=True) |
Output:
Gender F M All Country Age Brazil 13 0 1 1 15 1 0 1 Canada 12 1 0 1 14 1 0 1 15 0 1 1 UK 14 1 1 2 USA 12 1 1 2 14 0 1 1 All 5 5 10 |
Pandas pivot_table is used to create a spreadsheet-style pivot table as a DataFrame. The levels of the pivot table will be stored in multiIndex objects on the index and columns of the resulting DataFrame.
The syntax is:
pandas.pivot_table(data, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name, observed) |
The parameters are:
Let’s look at some examples of pivot tables. Consider the following student record:
import pandas as pd student_records = [[‘John’,14,’M’,’USA’,82.5],[‘Maria’,12,’F’,’Canada’,90.0],[‘Tom’,13,’M’,’Brazil’,77.0],[‘Adam’,15,’M’,’Canada’,87.0],[‘Carla’,14,’F’,’Canada’,73.0],[‘Ben’,12,’M’,’USA’,65.5],[‘David’,14,’M’,’Brazil’,91.5],[‘Laila’,15,’F’,’Brazil’,81.0],[‘Amy’,12,’F’,’USA’,71.0],[‘Tina’,14,’F’,’Canada’,63.5]] df = pd.DataFrame(student_records, columns=[‘Name’,’Age’,’Gender’,’Country’,’Marks’]) print(df) |
The DataFrame is:
Name Age Gender Country Marks 0 John 14 M USA 82.5 1 Maria 12 F Canada 90.0 2 Tom 13 M Brazil 77.0 3 Adam 15 M Canada 87.0 4 Carla 14 F Canada 73.0 5 Ben 12 M USA 65.5 6 David 14 M Brazil 91.5 7 Laila 15 F Brazil 81.0 8 Amy 12 F USA 71.0 9 Tina 14 F Canada 63.5 |
Now to create a pivot table which stores the average age of male and female students of different countries, the code is:
table = pd.pivot_table(df, values=[‘Age’], index=[‘Country’], columns=[‘Gender’]) print(table) |
This results in the following Pivot table:
Age Gender F M Country Brazil 15.000000 13.5 Canada 13.333333 15.0 USA 12.000000 13.0 |
Here, by default, the aggfunc is np.mean.
We can also store max and min age of respective genders of the countries, as:
table = pd.pivot_table(df, values=[‘Age’], index=[‘Country’], columns=[‘Gender’], aggfunc=[max,min]) print(table) |
This gives:
max min Age Age Gender F M F M Country Brazil 15 14 15 13 Canada 14 15 12 15 USA 12 14 12 12 |
Pandas melt() function is used for transforming or reshaping DataFrames. It unpivots a DataFrame from wide to long format.
The syntax is:
pandas.melt(frame, id_vars, value_vars, var_name, value_name, col_level, ignore_index) |
The parameters are:
Let’s look at some examples of the melt() operation. Consider the following data:
import pandas as pd student_records = [[‘John’,14,’M’,’USA’,82.5],[‘Maria’,12,’F’,’Canada’,90.0],[‘Tom’,13,’M’,’Brazil’,77.0],[‘Adam’,15,’M’,’Canada’,87.0],[‘Amy’,12,’F’,’USA’,71.0],[‘Tina’,14,’F’,’Canada’,63.5]] df = pd.DataFrame(student_records, columns=[‘Name’,’Age’,’Gender’,’Country’,’Marks’]) print(df) |
The DataFrame is:
Name Age Gender Country Marks 0 John 14 M USA 82.5 1 Maria 12 F Canada 90.0 2 Tom 13 M Brazil 77.0 3 Adam 15 M Canada 87.0 4 Amy 12 F USA 71.0 5 Tina 14 F Canada 63.5 |
Applying melt() function on this DataFrame:
pd.melt(df, id_vars=[‘Name’], value_vars=[‘Age’,’Gender’]) |
This results in:
Name variable value 0 John Age 14 1 Maria Age 12 2 Tom Age 13 3 Adam Age 15 4 Amy Age 12 5 Tina Age 14 6 John Gender M 7 Maria Gender F 8 Tom Gender M 9 Adam Gender M 10 Amy Gender F 11 Tina Gender F |
In this article, we look at three functions of Pandas, namely, crosstab, pivot_tables, and melt. In the upcoming article, we will work with Pandas Date and Time.
In this tutorial, we will focus on MapReduce Algorithm, its working, example, Word Count Problem,…
Learn how to use Pyomo Packare to solve linear programming problems. In recent years, with…
In today's rapidly evolving technological landscape, machine learning has emerged as a transformative discipline, revolutionizing…
Analyze employee churn, Why employees are leaving the company, and How to predict, who will…
Airflow operators are core components of any workflow defined in airflow. The operator represents a…
Machine Learning Operations (MLOps) is a multi-disciplinary field that combines machine learning and software development…