How to make a scatter plot in matplotlib

Understanding Scatter Plots

Before diving into code, let's understand what a scatter plot is. Imagine you have a handful of pebbles and you throw them on the ground. Each pebble represents a point on the ground, and together, they form a pattern. In the world of data, a scatter plot is a type of graph that shows the relationship between two variables by displaying points at the intersection of their values. It's like a snapshot of where each pebble (or data point) falls on an imaginary grid based on two characteristics.

Setting Up Your Environment

To create a scatter plot in Python, we'll use a library called Matplotlib. Think of Matplotlib as a box of crayons that lets you draw different types of graphs. Before we can start drawing, we need to make sure we have these crayons ready to use. If you haven't already, install Matplotlib by running this command in your terminal or command prompt:

pip install matplotlib

Now, let's get our canvas ready by importing the necessary tools from Matplotlib:

import matplotlib.pyplot as plt

Here, plt is the conventional alias for matplotlib.pyplot, Matplotlib's plotting module. It's like giving a nickname to your favorite crayon so you can quickly grab it when you need it.
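To confirm that the installation worked, you can print the installed version. This is a quick, optional check and not part of the plotting code itself:

```python
import matplotlib

# __version__ holds the installed release as a string
print(matplotlib.__version__)
```

If this prints a version number instead of raising ModuleNotFoundError, Matplotlib is ready to use.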

Your First Scatter Plot

Let's start with something simple. We'll create a scatter plot from a small, made-up data set. Imagine you're plotting the ages and favorite ice cream flavors of a group of people. Each flavor will be represented by a number (just for simplicity) and each age by the person's actual age.

Here's how you can create this scatter plot:

# Sample data: ages and favorite ice cream flavors (as numbers)
ages = [25, 36, 47, 58, 22, 34, 49, 28]
flavors = [1, 3, 2, 5, 2, 4, 4, 1]

# Create a scatter plot
plt.scatter(ages, flavors)

# Add a title and labels to the axes
plt.title("Favorite Ice Cream Flavor by Age")
plt.xlabel("Age")
plt.ylabel("Ice Cream Flavor (Number)")

# Show the plot
plt.show()

When you run this code, you'll see a graph with dots scattered around. Each dot represents one person: its horizontal position is the person's age, and its vertical position is the number of their favorite flavor.
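Since the flavor numbers are only stand-ins, the y-axis is easier to read with names on it. Here is a minimal sketch of one way to do that. The flavor names are hypothetical (the data above doesn't say which number means which flavor), and the code uses plt.subplots() with Axes methods, which is an equivalent alternative to the plt.* functions shown above:

```python
import matplotlib.pyplot as plt

ages = [25, 36, 47, 58, 22, 34, 49, 28]
flavors = [1, 3, 2, 5, 2, 4, 4, 1]

# Hypothetical mapping from flavor number to name (an assumption for illustration)
flavor_names = ["Vanilla", "Chocolate", "Strawberry", "Mint", "Pistachio"]


def build_labeled_plot():
    """Scatter plot whose y-axis shows flavor names instead of numbers."""
    fig, ax = plt.subplots()
    ax.scatter(ages, flavors)
    # Put one tick at each flavor number, then swap in the names
    ax.set_yticks([1, 2, 3, 4, 5])
    ax.set_yticklabels(flavor_names)
    ax.set_title("Favorite Ice Cream Flavor by Age")
    ax.set_xlabel("Age")
    ax.set_ylabel("Ice Cream Flavor")
    return fig, ax


if __name__ == "__main__":
    build_labeled_plot()
    plt.show()
```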

Customizing Your Scatter Plot

Now that you've made a basic scatter plot, let's make it prettier and more informative. We can change the color and size of the points, add a grid for easier reading, and more.

Changing Point Colors and Sizes

Suppose you want to color the points based on another variable, like the number of scoops each person orders. A darker color could mean more scoops. And the size of each point could represent how much that person loves ice cream.

Here's how you can do that:

# Additional data: number of scoops and love for ice cream
scoops = [2, 3, 1, 5, 4, 2, 3, 1]
love_for_ice_cream = [7, 9, 6, 10, 8, 5, 9, 6]

# Create a scatter plot with colors and sizes based on additional data
plt.scatter(ages, flavors, c=scoops, cmap='Greys', s=love_for_ice_cream)

# Add a color bar to show the color scale
plt.colorbar(label='Number of Scoops')

# Add a title and labels to the axes
plt.title("Favorite Ice Cream Flavor by Age with Scoops and Love")
plt.xlabel("Age")
plt.ylabel("Ice Cream Flavor (Number)")

# Show the plot
plt.show()

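In this plot, the colors and sizes of the points give us more information at a glance. The c parameter supplies the values that are mapped to colors, and cmap chooses the color palette (here, 'Greys' runs from white for the lowest value to black for the highest). The s parameter sets each marker's area in points squared, so the values 5 to 10 used here produce small dots whose size differences are subtle.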
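If the size differences are hard to see, one option is to scale the scores before passing them to s. The sketch below multiplies each score by 20, an arbitrary factor chosen for illustration. It also adds a black outline with edgecolors so that the lightest points stay visible on a white background:

```python
import matplotlib.pyplot as plt

ages = [25, 36, 47, 58, 22, 34, 49, 28]
flavors = [1, 3, 2, 5, 2, 4, 4, 1]
scoops = [2, 3, 1, 5, 4, 2, 3, 1]
love_for_ice_cream = [7, 9, 6, 10, 8, 5, 9, 6]

# s is the marker area in points squared; the factor 20 is an arbitrary choice
sizes = [20 * score for score in love_for_ice_cream]


def build_scaled_plot():
    """Scatter plot with enlarged markers and outlined points."""
    fig, ax = plt.subplots()
    points = ax.scatter(ages, flavors, c=scoops, cmap="Greys",
                        s=sizes, edgecolors="black")
    # The color bar needs the object returned by scatter to read its color scale
    fig.colorbar(points, ax=ax, label="Number of Scoops")
    ax.set_xlabel("Age")
    ax.set_ylabel("Ice Cream Flavor (Number)")
    return fig, ax, points


if __name__ == "__main__":
    build_scaled_plot()
    plt.show()
```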

Adding a Grid and Customizing Axes

Sometimes, a grid can help us read the scatter plot more accurately. Let's add one and play with the axes a bit:

# Create a scatter plot with a grid
plt.scatter(ages, flavors, c=scoops, cmap='Greys', s=love_for_ice_cream)
plt.colorbar(label='Number of Scoops')

# Customize axes limits
plt.xlim(20, 60)  # Set x-axis limits
plt.ylim(0, 6)    # Set y-axis limits

# Add a grid
plt.grid(True, which='both', linestyle='--', linewidth=0.5)

# Add a title and labels to the axes
plt.title("Favorite Ice Cream Flavor by Age with Scoops and Love")
plt.xlabel("Age")
plt.ylabel("Ice Cream Flavor (Number)")

# Show the plot
plt.show()

The xlim and ylim functions set the lower and upper limits of the x-axis and y-axis, and grid draws grid lines in the given line style (dashed here) and line width.
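One detail about which='both': it asks for grid lines at both major and minor ticks, but minor ticks are switched off by default on a linear axis, so only the major lines appear. The following sketch turns minor ticks on and gives the two grids different styles. The specific line styles and widths are just example choices:

```python
import matplotlib.pyplot as plt

ages = [25, 36, 47, 58, 22, 34, 49, 28]
flavors = [1, 3, 2, 5, 2, 4, 4, 1]


def build_grid_plot():
    """Scatter plot with separate major and minor grid lines."""
    fig, ax = plt.subplots()
    ax.scatter(ages, flavors)
    ax.set_xlim(20, 60)
    ax.set_ylim(0, 6)
    # Minor ticks are off by default; enable them so the minor grid has ticks to follow
    ax.minorticks_on()
    ax.grid(True, which="major", linestyle="--", linewidth=0.5)
    ax.grid(True, which="minor", linestyle=":", linewidth=0.3)
    return fig, ax


if __name__ == "__main__":
    build_grid_plot()
    plt.show()
```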

Understanding Your Data Through Scatter Plots

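Scatter plots are not just about throwing points on a graph; they're tools for storytelling with data. By looking at how the points are spread out, you can start to see patterns. For example, if most of the points are clustered in one area, it might mean that people of a certain age group prefer a specific ice cream flavor. If the points form a line going upwards, it suggests that the variable on the y-axis tends to increase with the one on the x-axis; in a plot of age against number of scoops, for instance, that would mean older people tend to order more scoops. Keep in mind that the flavor numbers here are arbitrary labels, so an upward or downward trend in the age-versus-flavor plot has no real meaning.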
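One way to put a number on an upward or downward pattern is the Pearson correlation coefficient, which ranges from -1 (a perfect downward line) to 1 (a perfect upward line). This sketch uses NumPy, which is an assumption beyond the tutorial's code, although it is installed as a dependency of Matplotlib. With only eight points, the result is a rough indication at best:

```python
import numpy as np

ages = [25, 36, 47, 58, 22, 34, 49, 28]
scoops = [2, 3, 1, 5, 4, 2, 3, 1]

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the
# correlation between the two inputs
r = np.corrcoef(ages, scoops)[0, 1]
print(f"Correlation between age and scoops: {r:.2f}")
```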

Saving Your Scatter Plot

Once you're happy with your scatter plot, you might want to save it to share with others or to include in a report. Here's how you can save your plot:

# Create a scatter plot
plt.scatter(ages, flavors, c=scoops, cmap='Greys', s=love_for_ice_cream)
plt.colorbar(label='Number of Scoops')

# Add a title and labels to the axes
plt.title("Favorite Ice Cream Flavor by Age with Scoops and Love")
plt.xlabel("Age")
plt.ylabel("Ice Cream Flavor (Number)")

# Save the plot as a PNG file
plt.savefig('scatter_plot.png', dpi=300)

# Show the plot
plt.show()

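The savefig function saves the plot as an image file, and the file extension ('.png' here) determines the format. The dpi parameter sets the resolution in dots per inch: at dpi=300, Matplotlib's default 6.4 by 4.8 inch figure is saved as a 1920 by 1440 pixel image. Call savefig before show, because with an interactive backend the figure is discarded when its window is closed, and saving afterwards can produce a blank image.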
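If you only need the file and not a window, for example in a script that runs on a server, a sketch like the following may help. It selects the non-interactive Agg backend and wraps the steps in a helper. The name save_scatter and the file names are made up for this example. The bbox_inches="tight" option trims extra white space around the plot:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt

ages = [25, 36, 47, 58, 22, 34, 49, 28]
flavors = [1, 3, 2, 5, 2, 4, 4, 1]


def save_scatter(path, dpi=300):
    """Draw the basic scatter plot and write it to path (hypothetical helper)."""
    fig, ax = plt.subplots()
    ax.scatter(ages, flavors)
    ax.set_title("Favorite Ice Cream Flavor by Age")
    ax.set_xlabel("Age")
    ax.set_ylabel("Ice Cream Flavor (Number)")
    # The file extension in path selects the output format
    fig.savefig(path, dpi=dpi, bbox_inches="tight")
    plt.close(fig)  # free the figure's memory once it is saved
    return path


if __name__ == "__main__":
    save_scatter("scatter_plot.png")
    save_scatter("scatter_plot.pdf")
```

PNG is a pixel-based format, so dpi matters for it. PDF is a vector format and stays sharp when zoomed, which can be a better fit for reports.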

Conclusion

Creating a scatter plot with Matplotlib is like painting a picture of your data. Each point tells a small part of the story, and together, they reveal insights that might not be obvious at first glance. Whether you're exploring the relationship between age and ice cream preferences or something more complex, scatter plots are a powerful way to visualize and understand the nuances of your data.

As you continue your journey in programming and data visualization, remember that each graph you create is an opportunity to communicate something unique about the data you're working with. With each scatter plot, you're not just plotting points; you're laying out a constellation of information that can illuminate understanding for yourself and others. So keep experimenting with different styles and customizations, and enjoy the process of discovering and sharing the stories hidden within your data.