Altcademy - Forbes Best Coding Bootcamp 2023

How to plot histogram in Python

Understanding Histograms

Before we delve into the coding aspect, let's take a moment to understand what a histogram is. Remember the bar graphs you used to make in school? A histogram is quite similar, but there's a subtle difference. While a bar graph represents categorical data, a histogram represents numerical data.

In a histogram, the data is divided into a set of intervals (or 'bins'), and the number of data points that fall into each bin is represented by the height of the corresponding bar. The result is a visual picture of how the data is distributed.
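The counting step behind every histogram is simple. Here is a minimal sketch in plain Python, assuming equal-width bins. The function name count_in_bins and the sample values are hypothetical and only for illustration. Libraries such as numpy do this counting for you and handle edge cases more carefully.

```python
def count_in_bins(values, start, width, num_bins):
    """Count how many values fall into each of num_bins equal-width intervals.

    Interval i covers [start + i*width, start + (i+1)*width). The last
    interval also includes its right edge, which is how numpy and
    matplotlib treat the final bin.
    """
    counts = [0] * num_bins
    end = start + width * num_bins
    for v in values:
        if v < start or v > end:
            continue  # value lies outside the histogram's range
        index = int((v - start) // width)
        if index == num_bins:  # v == end belongs to the last bin
            index -= 1
        counts[index] += 1
    return counts


# Hypothetical sample data
sample = [1.2, 2.5, 2.7, 3.1, 3.3, 3.9, 4.4, 5.0]
print(count_in_bins(sample, start=1, width=1, num_bins=4))  # [1, 2, 3, 2]
```

Each number in the result would become the height of one bar.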

Let's take an example. Imagine you have a bag of apples and you want to understand their weight distribution. You could weigh each apple and plot a histogram that shows how many apples fall into different weight categories (like 100-120g, 120-140g, and so on).
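The apple example can be sketched with numpy's np.histogram function, which does the counting without drawing anything. The weights below are made-up values, not real measurements. Passing a list of edges instead of a number lets you choose the exact weight categories.

```python
import numpy as np

# Hypothetical apple weights in grams (made-up values for illustration)
apple_weights = np.array([105, 112, 118, 121, 126, 133, 138, 139, 144, 151, 158])

# Bin edges for the 100-120g, 120-140g and 140-160g categories
edges = [100, 120, 140, 160]

counts, bin_edges = np.histogram(apple_weights, bins=edges)

for low, high, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{int(low)}-{int(high)}g: {count} apples")
```

Every bin includes its left edge but not its right edge, except the last bin, which includes both. So an apple weighing exactly 120g would be counted in the 120-140g category.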

Getting Started with Python Libraries

To plot a histogram in Python, we're going to use two important libraries: matplotlib and numpy. Think of them as collections of pre-written code: numpy handles numerical operations, and matplotlib handles data visualization.

Here's how to import them:

import matplotlib.pyplot as plt
import numpy as np

Creating A Basic Histogram

Now that we've laid the groundwork, let's create a simple histogram. For this tutorial, we'll generate a random set of data using numpy's np.random.normal() function. This function generates numbers that follow a normal distribution. Don't worry too much about the term: it just means most values cluster around an average, and values become rarer the farther they are from it, giving the familiar bell-shaped curve.

# Generating 1000 random numbers
data = np.random.normal(size=1000)
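By default, np.random.normal() centers its bell curve at 0 with a standard deviation of 1, and it gives different numbers on every run. If you want the same "random" data each time, a seeded generator helps. This sketch uses numpy's newer default_rng interface instead of np.random.normal(). The seed 42 is an arbitrary choice.

```python
import numpy as np

# A seeded generator makes the "random" numbers repeatable; 42 is an arbitrary seed
rng = np.random.default_rng(42)

# loc is the center (mean) of the bell curve and scale is its spread
# (standard deviation); these values match np.random.normal's defaults
data = rng.normal(loc=0.0, scale=1.0, size=1000)

print(data.shape)             # (1000,)
print(round(data.mean(), 2))  # a value close to 0
print(round(data.std(), 2))   # a value close to 1

# Re-creating the generator with the same seed reproduces the same numbers
same_data = np.random.default_rng(42).normal(loc=0.0, scale=1.0, size=1000)
print(np.array_equal(data, same_data))  # True
```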

Next, let's use the plt.hist() function from matplotlib to create a histogram from this data:

plt.hist(data, bins=30)
plt.show()

Here, bins=30 tells matplotlib to split the range of the data, from its smallest to its largest value, into 30 intervals of equal width. plt.show() then displays the plot.

If you run this code, you'll see a histogram showing the distribution of the generated random numbers.
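plt.hist() does more than draw. It also returns the bar heights, the bin edges, and the bar objects it created, which is handy for checking what bins=30 actually did. This sketch switches to matplotlib's non-interactive "Agg" backend so it runs without a display. Remove that line and call plt.show() if you want to see the window.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

data = np.random.default_rng(0).normal(size=1000)

# plt.hist returns the bar heights, the bin edges, and the drawn bars
counts, edges, patches = plt.hist(data, bins=30)

print(len(counts))   # 30 bars
print(len(edges))    # 31 edges: always one more than the number of bins
print(counts.sum())  # 1000.0: every data point lands in exactly one bin

# Every bin has the same width: (largest value - smallest value) / 30
widths = np.diff(edges)
print(np.allclose(widths, (data.max() - data.min()) / 30))  # True

plt.close()
```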

Customizing Your Histogram

The histogram we just created is pretty basic. Let's add some customization to make it more informative and aesthetically pleasing.

Adding Labels and Title

First, let's label the x- and y-axes and give our histogram a title:

plt.hist(data, bins=30)
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('My First Histogram')
plt.show()
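The same labelled histogram can also be built with matplotlib's object-oriented interface. Here you create a figure and axes explicitly and call methods on the axes object. This style is convenient once a figure holds more than one plot. The sketch again uses the off-screen "Agg" backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line to use plt.show() interactively
import matplotlib.pyplot as plt
import numpy as np

data = np.random.default_rng(1).normal(size=1000)

# Create a figure containing a single set of axes
fig, ax = plt.subplots()
ax.hist(data, bins=30)
ax.set_xlabel("Values")
ax.set_ylabel("Frequency")
ax.set_title("My First Histogram")

print(ax.get_xlabel(), ax.get_ylabel(), ax.get_title())
plt.close(fig)
```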

Changing the Number of Bins

The number of bins you choose can greatly affect your histogram's appearance. Let's try a few different values:

# 10 bins: wide bars
plt.hist(data, bins=10)
plt.title('Histogram with 10 Bins')
plt.show()

# 50 bins: narrow bars
plt.hist(data, bins=50)
plt.title('Histogram with 50 Bins')
plt.show()

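You'll notice that fewer bins (wider bars) give a more generalized view of the data, while more bins (narrower bars) show finer detail in its distribution. Too many bins, though, leave only a handful of points in each bar and can make the shape look noisy.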
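The trade-off between bin count and bar width can be checked numerically with numpy's np.histogram, which uses the same binning as plt.hist. numpy can also choose a bin count for you through rule-based strings such as "auto". plt.hist accepts the same strings, for example plt.hist(data, bins="auto"). The seed below is arbitrary.

```python
import numpy as np

data = np.random.default_rng(2).normal(size=1000)

# Same data, same range: more bins means each bar covers a narrower interval
widths = {}
for bins in (10, 50):
    counts, edges = np.histogram(data, bins=bins)
    widths[bins] = edges[1] - edges[0]
    print(f"{bins} bins -> each bar is {widths[bins]:.3f} wide, "
          f"tallest bar holds {counts.max()} points")

# Let numpy pick the number of bins with a rule of thumb
auto_edges = np.histogram_bin_edges(data, bins="auto")
print(f"'auto' chose {len(auto_edges) - 1} bins")
```

Because the data range stays the same, 10 bins are exactly five times as wide as 50 bins.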

Changing Color

Let's add some color to our histogram. The color parameter accepts any color matplotlib recognizes, such as a name like 'skyblue' or a hex code like '#ff7f0e':

plt.hist(data, bins=30, color='skyblue')
plt.title('A Colorful Histogram')
plt.show()
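Color is not the only styling option. plt.hist() passes extra keyword arguments on to the bars it draws. Two common ones are edgecolor, which outlines each bar so neighbouring bars are easier to tell apart, and alpha, which sets transparency from 0 (invisible) to 1 (opaque). The values below are just one possible choice. The sketch uses the off-screen "Agg" backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove it and call plt.show() to see the plot
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba
import numpy as np

data = np.random.default_rng(3).normal(size=1000)

# color fills the bars, edgecolor outlines them, alpha sets their transparency
counts, edges, patches = plt.hist(
    data, bins=30, color="skyblue", edgecolor="black", alpha=0.7
)
plt.title("A Colorful Histogram")

# Each bar is a Rectangle; its fill is skyblue with the 0.7 alpha applied
print(patches[0].get_facecolor())
print(np.allclose(patches[0].get_facecolor(), to_rgba("skyblue", 0.7)))  # True

plt.close()
```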

Wrapping Up

And there you have it! You're now equipped with the know-how to plot and customize histograms in Python. It's just like baking cookies - you start with a basic recipe and then add your favorite ingredients to make it your own.

Remember, the best way to learn is by doing. So play around with different data sets, try different numbers of bins, change colors, and see how each change affects your histogram. Soon you'll find that creating histograms is not just about coding, but also about the art of visualizing data.

In the next blog post, we'll dive deeper into other data visualization techniques. Until then, keep exploring, keep learning, and most importantly, have fun with it. Happy coding!