Python

Data Visualization using Matplotlib

Matplotlib is the most popular Python library for Data Visualization. It is a multi-platform, 2D plotting library and supports a wide variety of Operating Systems. In this article, we will focus on Data Visualization using matplotlib.

matplotlib

We generally import matplotlib as:

import matplotlib.pyplot as plt

Let’s consider the following sales_records data for visualization using matplotlib:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

sales_records = [[200,500,450], [700,750,550], [250,450,350], [300,550,250], [600,300,350], [300,350,150], [700,850,600], [650,700,700], [900,450,500], [400,300,200]]
year = [2000,2001,2002,2003,2004,2005,2006,2007,2008,2009]

df = pd.DataFrame(sales_records, columns=[‘Company1′,’Company2′,’Company3’], index=year)

print(df)

The DataFrame is:

Company1 Company2 Company3
2000 200 500 450
2001 700 750 550
2002 250 450 350
2003 300 550 250
2004 600 300 350
2005 300 350 150
2006 700 850 600
2007 650 700 700
2008 900 450 500
2009 400 300 200

This sales_records dataset consists of the sales profile of three companies over the years. Let’s plot some graphs to visualize this data more clearly.

Line Plot

A line plot is useful to visualize the frequency of data along the number line. This is highly useful in the case of Time-series data. We can visualize the trend in Sales of the Company1 over the given 10 years using a line graph. The code for doing so using Matplotlib is:

plt.plot(year, df.Company1)
plt.xlabel(‘Year’)
plt.ylabel(‘Sales’)
plt.show()

Output:

plt.xlabel() is used to label the x-axis. Similarly, plt.ylabel() labels the y-axis. plt.show() is used to display the plot. The color of the plot can also be modified. For example, to get the line plot in red color:

plt.plot(year, df.Company1, color=’r’)
plt.xlabel(‘Year’)
plt.ylabel(‘Sales’)
plt.show()

Output:

We can also view the Sales trend of three companies together in the same plot, as:

plt.plot(year, df.Company1)
plt.plot(year, df.Company2)
plt.plot(year, df.Company3)
plt.legend([‘Company1′,’Company2′,’Company3’])
plt.xlabel(‘Year’)
plt.ylabel(‘Sales’)
plt.title(‘Sales of Companies’)
plt.show()

Output:

plt.title() is used to specify the title of the chart. plt.legend() displays associated legend.

Functions can be plotted using line graphs of matplotlib, as:

x = np.linspace(-20, 19, 20)
y = (x**2) – 7

plt.plot(x, y)
plt.xlabel(‘x-axis’)
plt.ylabel(‘y-axis’)
plt.show()

Output:

Bar Graph

Data can also be visualized using horizontal or vertical straight lines. To do so in matplotlib:

plt.bar(df.index,df.Company1) plt.xlabel(‘Year’) plt.ylabel(‘Sales’) plt.show()

Output:

Multiple variables can also be represented on the same graph. This can be done with some modifications as:

plt.bar(df.index + 0.0, df.Company1, width = 0.2)
plt.bar(df.index + 0.2, df.Company2, width = 0.2)
plt.bar(df.index + 0.4, df.Company3, width = 0.2)
plt.xlabel(‘Year’)
plt.ylabel(‘Sales’)
plt.legend([‘Company1′,’Company2′,’Company3’])
plt.show()

Output:

Histogram

A histogram is useful for showing distribution frequency for continuous data. For example, for the generated random data, we can view the distribution, as:

x = np.random.randn(1000)
plt.hist(x)
plt.show()

Output:

Pie Chart

We can also visualize the sales data using a Pie chart. This is useful for a quick comparison between the quantities. Consider the example for the following data:

company = [‘Company1′,’Company2′,’Company3′,’Company4′,’Company5’]
sales = [650,200,700,450,350]
plt.pie(sales, labels = company)
plt.show()

Output:

Stack Plot

We can also plot a stack plot where data of different categories are stacked together. This is an extension of the line chart and bar plot. For example, for sales_records data, the stack plot is:

plt.stackplot(year, df.Company1, df.Company2, df.Company3)
plt.legend([‘Company1′,’Company2′,’Company3’])
plt.xlabel(‘Year’)
plt.ylabel(‘Sales’)
plt.title(‘Sales of Companies’)
plt.show()

Output:

Scatter Plot

To view data in form of Scatter plots, can be done using matplotlib as:

plt.scatter(year, df.Company1)
plt.scatter(year, df.Company2)
plt.scatter(year, df.Company3)
plt.legend([‘Company1′,’Company2′,’Company3’])
plt.xlabel(‘Year’)
plt.ylabel(‘Sales’)
plt.title(‘Sales of Companies’)
plt.show()

Output:

Box Plots

Box plot displays from 1st to 3rd quartile of a set of data containing the minimum, maximum, first quartile, third quartile, and median. Consider the example on the following random data:

df = [np.random.normal(100, 10, 150), np.random.normal(80, 20, 150)]

plt.boxplot(df)
plt.show()

Output:

Summary

In this article, we worked with Data Visualization using Matplotlib, its various features, and the different types of graphical representations that can be achieved through it. In the next tutorial, we will focus on visualization using Seaborn.

Leave a Reply

Your email address will not be published. Required fields are marked *