Data visualization lets us represent complex data in a visually appealing and summarized form. We often hear that a picture is worth a thousand words, and the same can be said of data visualizations.
In this lesson, we're going to create some data visualizations using the Matplotlib library.
Firstly, open up a terminal window and install Matplotlib via pip as follows:
pip install matplotlib
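To confirm that the installation worked, one option is to import the library and print its version. This is only a minimal sketch; the variable name installed_version is chosen here for illustration, and the version printed depends on your environment.

```python
# Quick check that Matplotlib can be imported after installation.
import matplotlib

# __version__ is a string; its exact value depends on what pip installed.
installed_version = matplotlib.__version__
print("Matplotlib version:", installed_version)
```

If the import fails with ModuleNotFoundError, the package was most likely installed into a different Python environment than the one running the script.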
Let's start by creating a simple plot using Matplotlib.
Here, we're going to plot a simple line graph that shows the population growth of a hypothetical city over time using the following code:
import matplotlib.pyplot as plt
# Data
years = [1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020]
population = [2.5, 3.0, 3.7, 4.5, 5.3, 6.1, 6.9, 7.7]
# Create plot
plt.plot(years, population)
# Show plot
plt.show()
Here's a line-by-line breakdown of the code:
- Import matplotlib.pyplot as plt (so that we can later refer to matplotlib.pyplot literally as plt instead of having to type the full version, matplotlib.pyplot).
- Create the years and population variables that will be used for subsequent steps in creating the plot.
- Create the plot via plt.plot(), specifying years and population as input arguments. This will create a line plot.
- Finally, we're going to display the plot via plt.show().
This gives us the following plot:
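If you are running the code somewhere without a display (for example, on a remote server), a window from plt.show() may not appear. As a hedged alternative, the sketch below writes the same plot to an image file with plt.savefig(). The file name population_growth.png, the temporary directory, and the choice of the non-interactive Agg backend are assumptions made for this example and are not part of the lesson's code.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Data (same as in the lesson)
years = [1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020]
population = [2.5, 3.0, 3.7, 4.5, 5.3, 6.1, 6.9, 7.7]

# Create plot
plt.plot(years, population)

# Hypothetical output location; any writable path would work.
output_path = os.path.join(tempfile.gettempdir(), "population_growth.png")

# Save the current figure to disk instead of showing it
plt.savefig(output_path)
plt.close()  # release the figure once it has been saved

print("Saved plot to", output_path)
```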
Now, we're going to add a title to the plot and labels to the X and Y axes:
# Add labels
plt.title('Population Growth')
plt.xlabel('Year')
plt.ylabel('Population (millions)')
Adding this to the line plot code above gives us:
import matplotlib.pyplot as plt
# Data
years = [1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020]
population = [2.5, 3.0, 3.7, 4.5, 5.3, 6.1, 6.9, 7.7]
# Create plot
plt.plot(years, population)
# Add labels <- (New line of code)
plt.title('Population Growth')
plt.xlabel('Year')
plt.ylabel('Population (millions)')
# Show plot
plt.show()
Here's a line-by-line breakdown of the code:
- Import matplotlib.pyplot as plt (so that we can later refer to matplotlib.pyplot literally as plt instead of having to type the full version, matplotlib.pyplot).
- Create the years and population variables that will be used for subsequent steps in creating the plot.
- Create the plot via plt.plot(), specifying years and population as input arguments. This will create a line plot.
- New line of code: Add labels to the plot as well as the X and Y axes.
- Finally, we're going to display the plot via plt.show().
This gives us the following revised plot with labels:
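The functions plt.title(), plt.xlabel() and plt.ylabel() act on the current Axes, which plt.gca() returns. As a small sketch, the labels can be read back from that Axes to confirm they were applied. The dictionary name applied_labels and the use of the Agg backend are assumptions made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Data (same as in the lesson)
years = [1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020]
population = [2.5, 3.0, 3.7, 4.5, 5.3, 6.1, 6.9, 7.7]

# Create plot and add labels
plt.plot(years, population)
plt.title('Population Growth')
plt.xlabel('Year')
plt.ylabel('Population (millions)')

# Read the labels back from the current Axes
ax = plt.gca()
applied_labels = {
    "title": ax.get_title(),
    "xlabel": ax.get_xlabel(),
    "ylabel": ax.get_ylabel(),
}
print(applied_labels)
```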
Let's customize the plot. To change the line style and color, we can pass additional arguments to the plot function (i.e. plt.plot()).
For example, we can change the line from the default blue to red by passing 'r' as the third argument to plt.plot(). Similarly, the line style can be changed to a dashed line by passing '--' as the third argument. Color and line style share the same format string, so to get a red dashed line we combine them and pass 'r--' as the third argument as follows:
plt.plot(years, population, 'r--')
Adding this to the full code gives us:
import matplotlib.pyplot as plt
# Data
years = [1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020]
population = [2.5, 3.0, 3.7, 4.5, 5.3, 6.1, 6.9, 7.7]
# Create plot
plt.plot(years, population, 'r--')
# Add labels
plt.title('Population Growth')
plt.xlabel('Year')
plt.ylabel('Population (millions)')
# Show plot
plt.show()
Here's a line-by-line breakdown of the code:
- Import matplotlib.pyplot as plt (so that we can later refer to matplotlib.pyplot literally as plt instead of having to type the full version, matplotlib.pyplot).
- Create the years and population variables that will be used for subsequent steps in creating the plot.
- Create the plot via plt.plot(), specifying years and population as input arguments. This will create a line plot.
- New line of code: The line is changed to a red dashed line via the third argument, 'r--'.
- Add labels to the plot as well as the X and Y axes.
- Finally, we're going to display the plot via plt.show().
And the revised plot gives us the red dashed line:
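The 'r--' format string is shorthand. The same styling can also be spelled out with the color and linestyle keyword arguments of plt.plot(). The sketch below draws the line both ways and compares the resulting styles. The variable names line_fmt, line_kw, same_color and same_style are illustrative, and the Agg backend is an assumption so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

# Data (same as in the lesson)
years = [1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020]
population = [2.5, 3.0, 3.7, 4.5, 5.3, 6.1, 6.9, 7.7]

# Shorthand: color and line style combined in one format string
(line_fmt,) = plt.plot(years, population, 'r--')

# Keyword form: the same styling written out explicitly
(line_kw,) = plt.plot(years, population, color='red', linestyle='--')

# Compare the two lines; to_rgba normalizes 'r' and 'red' to the same RGBA tuple
same_color = to_rgba(line_fmt.get_color()) == to_rgba(line_kw.get_color())
same_style = line_fmt.get_linestyle() == line_kw.get_linestyle()
print(same_color, same_style)
```

The keyword form is longer but can be easier to read, and it accepts colors that have no single-letter code.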
We can also add data points to the plot by using the plt.scatter() function. While we're at it, let's also color the data points black (c='k') and make them translucent (alpha=0.6):
plt.scatter(years, population, c='k', alpha=0.6)
Adding this to the full code gives us the revised code:
import matplotlib.pyplot as plt
# Data
years = [1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020]
population = [2.5, 3.0, 3.7, 4.5, 5.3, 6.1, 6.9, 7.7]
# Create plot
plt.plot(years, population, 'r--')
plt.scatter(years, population, c='k', alpha=0.6)
# Add labels
plt.title('Population Growth')
plt.xlabel('Year')
plt.ylabel('Population (millions)')
# Show plot
plt.show()
Here's a line-by-line breakdown of the code:
- Import matplotlib.pyplot as plt (so that we can later refer to matplotlib.pyplot literally as plt instead of having to type the full version, matplotlib.pyplot).
- Create the years and population variables that will be used for subsequent steps in creating the plot.
- Create the plot via plt.plot(), specifying years and population as input arguments. This will create a line plot.
- The line is changed to a red dashed line via the third argument, 'r--'.
- New line of code: Data points are added via the plt.scatter function.
- Add labels to the plot as well as the X and Y axes.
- Finally, we're going to display the plot via plt.show().
And the revised plot gives us additional black (translucent) data points:
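As an alternative to plt.scatter(), which is the approach used in this lesson, markers can also be drawn by plt.plot() itself. A format string such as 'ko' asks for black circle markers with no connecting line. The sketch below is only an illustration of that option; the variable names line and markers, and the Agg backend, are assumptions made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Data (same as in the lesson)
years = [1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020]
population = [2.5, 3.0, 3.7, 4.5, 5.3, 6.1, 6.9, 7.7]

# Red dashed line, as in the lesson
(line,) = plt.plot(years, population, 'r--')

# Black ('k') circle ('o') markers drawn by plt.plot instead of plt.scatter;
# alpha=0.6 makes the markers translucent
(markers,) = plt.plot(years, population, 'ko', alpha=0.6)

print(markers.get_marker(), markers.get_alpha())
```

plt.scatter() remains the more flexible choice when each point needs its own size or color.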
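Let's say that we want to add a second line (and data points) to the plot. We can do so by simply calling plt.plot and plt.scatter again on a new set of data (i.e. population2). Here, we also change the data points of the first line to red (c='r') so that they match their line: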
import matplotlib.pyplot as plt
# Data
years = [1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020]
population = [2.5, 3.0, 3.7, 4.5, 5.3, 6.1, 6.9, 7.7]
population2 = [7.7, 8.1, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0]
# Create plot
plt.plot(years, population, 'r--')
plt.scatter(years, population, c='r', alpha=0.6)
plt.plot(years, population2, 'g')
plt.scatter(years, population2, c='g', alpha=0.6)
# Add labels
plt.title('Population Growth')
plt.xlabel('Year')
plt.ylabel('Population (millions)')
# Show plot
plt.show()
Here's a line-by-line breakdown of the code:
- Import matplotlib.pyplot as plt (so that we can later refer to matplotlib.pyplot literally as plt instead of having to type the full version, matplotlib.pyplot).
- Create the years and population variables that will be used for subsequent steps in creating the plot.
- Create the plot via plt.plot(), specifying years and population as input arguments. This will create a line plot.
- The line is changed to a red dashed line via the third argument, 'r--'.
- Data points are added via the plt.scatter function.
- New line of code: A second green line with green translucent data points is also added here, using the new population2 data.
- Add labels to the plot as well as the X and Y axes.
- Finally, we're going to display the plot via plt.show().
And the revised plot gives us a second green line (and while we're at it red data points for the first line):
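With two lines on one plot, a legend helps the reader tell them apart. A minimal sketch, assuming the two series belong to two cities: pass a label to each plt.plot() call and then call plt.legend(). The labels 'City A' and 'City B' are hypothetical names invented for this example, and the Agg backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Data (same as in the lesson)
years = [1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020]
population = [2.5, 3.0, 3.7, 4.5, 5.3, 6.1, 6.9, 7.7]
population2 = [7.7, 8.1, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0]

# Create plot; the label values are hypothetical series names
plt.plot(years, population, 'r--', label='City A')
plt.scatter(years, population, c='r', alpha=0.6)
plt.plot(years, population2, 'g', label='City B')
plt.scatter(years, population2, c='g', alpha=0.6)

# Add labels
plt.title('Population Growth')
plt.xlabel('Year')
plt.ylabel('Population (millions)')

# Build the legend from the labeled lines (the unlabeled scatter points are skipped)
legend = plt.legend()
legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)
```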
In this lesson, we've learned how to install Matplotlib, create and customize a basic line plot, and display it via plt.show(). All of this was taught incrementally so that you can evolve your plot from a basic plot to a more refined one.
🚀 Proceed to Project 4 to build a Streamlit app that displays a plot created with Matplotlib.