
How to plot a scatter plot in Matplotlib

Understanding Scatter Plots

Before diving into the technicalities of plotting a scatter plot using Matplotlib, let's first understand what a scatter plot is. Imagine you have a bunch of sticky notes, and you want to arrange them on a wall to see if there's any pattern in how they relate to each other. Each note has two pieces of information, like the amount of time you study and the grades you get. If you stick them on the wall, with the horizontal direction representing study time and the vertical direction representing grades, you've essentially created a scatter plot!

A scatter plot is a type of diagram that uses Cartesian coordinates to display the values of (typically) two variables for a set of data. If the points are color-coded, one additional variable can be displayed. The data appear as a collection of points: the value of one variable determines each point's position on the horizontal axis, and the value of the other determines its position on the vertical axis.

Setting Up Your Environment

To start plotting, you'll need to set up your programming environment. This means you'll need Python installed on your computer along with the Matplotlib library. You can install Matplotlib by running the command pip install matplotlib in your terminal or command prompt. Once you have Matplotlib installed, you're ready to start coding.
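A quick way to confirm the installation worked is to import the library and print its version. This is only a sanity check; the exact version you see depends on what pip installed.

```python
import matplotlib

# If this import succeeds, Matplotlib is installed in the active environment.
# The printed version string depends on your installation.
print(matplotlib.__version__)
```

If the import fails with a ModuleNotFoundError, pip most likely installed the package into a different Python environment than the one you are running.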

Importing Matplotlib

The first step in your code is to import the Matplotlib library, specifically its pyplot module. This module provides a MATLAB-like interface for making plots and is commonly used for its simplicity. Here's how you do it:

import matplotlib.pyplot as plt

This imports the pyplot module under the short alias plt, so every plotting function is available as plt.something in the rest of your program.

Creating Your First Scatter Plot

Now, let's create a simple scatter plot. You'll need two lists of numbers: one for the x-axis (horizontal) and one for the y-axis (vertical). These lists should be of the same length because each pair of numbers will correspond to a point on the plot.
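To see the pairing concretely before plotting anything, the short sketch below zips two small lists together. The three values are just the first entries of the sample data used in this article, and the length check is an optional habit rather than something Matplotlib requires you to write (Matplotlib raises its own error when the lengths differ).

```python
x = [5, 7, 8]
y = [7, 4, 3]

# Each (x, y) pair becomes one point on the plot
points = list(zip(x, y))
print(points)  # [(5, 7), (7, 4), (8, 3)]

# A cheap sanity check before plotting
assert len(x) == len(y), "x and y must have the same number of values"
```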

Here's an example:

# Sample data
x = [5, 7, 8, 5, 6, 7, 9, 2, 3, 4, 4, 4, 2, 6, 3, 6, 8, 6]
y = [7, 4, 3, 9, 1, 3, 2, 5, 2, 4, 8, 7, 1, 6, 7, 8, 2, 5]

# Creating scatter plot
plt.scatter(x, y)
plt.show()

When you run this code as a script, a window typically opens showing your scatter plot, with one point for each pair of values from your x and y lists. In a Jupyter notebook, the plot usually appears inline below the cell instead.
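If no window can open, for example on a remote server, saving the figure to an image file is a common alternative. The sketch below uses a shortened copy of the sample data, and the file name scatter.png is an arbitrary choice for illustration.

```python
import matplotlib.pyplot as plt

x = [5, 7, 8, 5, 6]
y = [7, 4, 3, 9, 1]

plt.scatter(x, y)

# Write the current figure to an image file instead of opening a window.
# 'scatter.png' is a made-up file name; it is created in the working directory.
plt.savefig('scatter.png')

# Close the figure so later plots start from a clean slate
plt.close()
```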

Customizing Your Scatter Plot

A plain scatter plot can be informative, but adding customization can make it clearer and more appealing. Here are some ways to customize your scatter plot:

Adding Titles and Labels

To make your plot more understandable, you should add a title and labels to your axes:

plt.scatter(x, y)

# Adding a title
plt.title('My First Scatter Plot')

# Adding labels to the axes
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')

plt.show()
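Generic labels like the ones above are placeholders; in practice the labels should name what is being measured. As a minimal sketch, here is the study-time example from the introduction with descriptive labels, written with Matplotlib's object-oriented interface (a figure and an axes object) as an alternative to the plt functions. The numbers are made up for illustration.

```python
import matplotlib.pyplot as plt

# Made-up data echoing the study-time example
study_hours = [1, 2, 3, 4, 5]
grades = [55, 60, 68, 75, 83]

# plt.subplots() returns a figure and a single axes to draw on
fig, ax = plt.subplots()
ax.scatter(study_hours, grades)

# The axes methods mirror plt.title, plt.xlabel and plt.ylabel
ax.set_title('Grades vs. Study Time')
ax.set_xlabel('Study time (hours)')
ax.set_ylabel('Grade')
```

Calling plt.show() afterwards displays the figure just as in the earlier examples.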

Changing Point Colors and Sizes

You can change the color and size of the points to convey more information or to make your plot more visually appealing:

plt.scatter(x, y, c='red', s=10)  # 'c' sets the color; 's' sets the marker size (area in points squared)

plt.title('Scatter Plot with Red Points')
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')

plt.show()
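Both c and s also accept one value per point, which is how color and size can carry extra variables. The sketch below assumes two hypothetical extra measurements, sizes and values, invented purely for illustration; the colormap name 'viridis' is one of Matplotlib's built-in colormaps.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 5, 3]
sizes = [20, 60, 100, 140, 180]      # hypothetical per-point sizes (points squared)
values = [0.1, 0.3, 0.5, 0.7, 0.9]   # hypothetical per-point values mapped to color

fig, ax = plt.subplots()

# A list for 's' sizes each point individually; a list of numbers for 'c'
# is mapped to colors through the colormap given by 'cmap'
points = ax.scatter(x, y, s=sizes, c=values, cmap='viridis', alpha=0.8)

# The colorbar shows how colors map back to the underlying values
fig.colorbar(points, ax=ax, label='Value')
```

Call plt.show() afterwards to display the result.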

Adding a Grid

A grid can help in estimating the position of the points:

plt.scatter(x, y, c='green', s=40)

plt.title('Scatter Plot with a Grid')
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')

# Adding a grid
plt.grid(True)

plt.show()
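The default grid lines can compete with the points for attention. One possible refinement, sketched here with a shortened copy of the sample data, is to make the grid dashed and semi-transparent and to draw it underneath the points. The specific style values are a matter of taste.

```python
import matplotlib.pyplot as plt

x = [5, 7, 8, 5, 6]
y = [7, 4, 3, 9, 1]

fig, ax = plt.subplots()
ax.scatter(x, y, c='green', s=40)

# Dashed, semi-transparent grid lines are less distracting
ax.grid(True, linestyle='--', alpha=0.5)

# Draw the grid underneath the points rather than on top of them
ax.set_axisbelow(True)
```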

Plotting Multiple Groups

Often, you'll want to compare different groups in your scatter plot. You can do this by plotting multiple sets of points and giving them different colors:

# Data for two different groups
x1 = [5, 7, 8, 7, 6, 7, 9, 2, 3, 4, 4, 4, 2, 6, 3, 6, 8, 6]
y1 = [7, 4, 3, 9, 1, 3, 2, 5, 2, 4, 8, 7, 1, 6, 7, 8, 2, 5]

x2 = [2, 2, 3, 3, 2, 7, 9, 10, 11, 14, 14, 13, 12, 9, 11, 15, 15, 16]
y2 = [10, 11, 12, 13, 14, 10, 8, 5, 6, 1, 10, 11, 12, 9, 8, 7, 2, 6]

# Plotting the first group
plt.scatter(x1, y1, c='blue', s=10, label='Group 1')

# Plotting the second group
plt.scatter(x2, y2, c='orange', s=10, label='Group 2')

plt.title('Scatter Plot with Multiple Groups')
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')

# Adding a legend to differentiate the groups
plt.legend()

plt.grid(True)
plt.show()
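With more than two groups, repeating the scatter() call by hand gets tedious. A minimal sketch of one alternative is to keep the groups in a dictionary and loop over it. The groups dictionary, its shortened data, and the third group are all invented here for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical groups: name -> (x values, y values, color)
groups = {
    'Group 1': ([5, 7, 8, 7], [7, 4, 3, 9], 'blue'),
    'Group 2': ([2, 9, 14, 15], [10, 8, 1, 7], 'orange'),
    'Group 3': ([10, 11, 12, 13], [3, 5, 4, 6], 'green'),
}

fig, ax = plt.subplots()

# One scatter() call per group; the label feeds the legend
for name, (xs, ys, color) in groups.items():
    ax.scatter(xs, ys, c=color, s=10, label=name)

ax.legend()
```

Adding a fourth group then only requires one more dictionary entry. Call plt.show() afterwards to display the plot.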

Understanding the Code Intuitively

Think of your scatter plot code as a recipe. You start by gathering your ingredients (import matplotlib.pyplot as plt), then you prepare your main dish (the scatter() function), and finally, you garnish it to make it presentable (adding titles, labels, and customizations).
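Taken literally, the recipe can be wrapped in a small reusable function. The helper below, make_scatter, is hypothetical and not part of Matplotlib; it simply bundles the steps covered in this article.

```python
import matplotlib.pyplot as plt

def make_scatter(x, y, title, xlabel, ylabel):
    """Hypothetical helper: build a labeled scatter plot and return it."""
    fig, ax = plt.subplots()   # gather the ingredients
    ax.scatter(x, y)           # prepare the main dish
    ax.set_title(title)        # garnish: title, labels, grid
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.grid(True)
    return fig, ax

fig, ax = make_scatter([1, 2, 3], [4, 5, 6], 'Demo', 'X', 'Y')
```

Returning the figure and axes lets the caller keep customizing the plot before calling plt.show().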

Conclusion

Creating scatter plots with Matplotlib can be likened to storytelling. Each plot you create is a visual story of the data you're presenting. With the simple steps outlined above, you've learned to tell your story compellingly, using colors, labels, and grouping to highlight the key messages in your data.

Remember, the best plots are not just technically accurate but also intuitive and accessible to your audience. As you become more comfortable with Matplotlib, you'll find yourself able to wield these plots like an artist uses their palette, combining technical know-how with creativity to convey insights in the most impactful way. Keep practicing, and soon you'll be plotting like a pro, ready to tackle even more complex data visualizations with confidence and ease.