
How to drop columns in Pandas

Understanding DataFrames in Pandas

Before we dive into the specifics of dropping columns, let's quickly review what DataFrames are in the context of Pandas. Think of a DataFrame as a table, much like one you'd see in Excel or Google Sheets. It has rows and columns, with each column having a label, and each row being identifiable by an index. When working with data, it's common to realize that not all columns are necessary for your analysis, and that's where the concept of dropping columns becomes relevant.
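To make the table analogy concrete, here is a minimal sketch that builds a tiny DataFrame and looks at its column labels and row index. The data is made up for illustration, and the variable names (column_labels, row_index) are just placeholders.

```python
import pandas as pd

# A small, made-up table: three rows and two labeled columns
df = pd.DataFrame({
    'Roses': [10, 15, 20],
    'Tulips': [5, 10, 15],
})

# Column labels, as a plain Python list
column_labels = list(df.columns)

# Row index; Pandas assigns 0, 1, 2, ... when none is given
row_index = list(df.index)

print(column_labels)
print(row_index)
print(df.shape)  # (number of rows, number of columns)
```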

Identifying Columns to Drop

The first step in dropping columns is to identify which ones you want to remove. This can be based on various factors such as the relevance of the data, the presence of too many missing values, or simply the need to simplify your dataset.
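If missing values are your main criterion, one possible way to spot candidate columns is to measure the share of missing entries in each column. The sketch below uses made-up data, and the 0.5 cutoff is an arbitrary assumption that you would adjust to your own dataset.

```python
import pandas as pd

# Made-up data with some missing entries (None becomes NaN)
df = pd.DataFrame({
    'Roses': [10, 15, 20, 12],
    'Tulips': [5, None, 15, 9],
    'Daisies': [None, None, None, 9],
})

# Fraction of missing values in each column
missing_share = df.isna().mean()

# Hypothetical cutoff: flag columns that are more than half empty
threshold = 0.5
to_drop = missing_share[missing_share > threshold].index.tolist()

print(missing_share)
print(to_drop)
```

This only produces a list of candidates. Whether a column is worth keeping still depends on what your analysis needs.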

Imagine you have a table representing a garden. Each column is a type of flower, and each row is a day of the week recording how many of each flower you have. If you're only interested in roses and tulips, you might decide to drop the columns for daisies and sunflowers.

The drop Method: The Basics

Pandas provides a method called drop that allows you to remove columns (and rows, but we'll focus on columns here). The basic syntax to drop a column is as follows:

import pandas as pd

# Assume df is our DataFrame
df = pd.DataFrame({
    'Roses': [10, 15, 20],
    'Tulips': [5, 10, 15],
    'Daisies': [7, 8, 9],
    'Sunflowers': [1, 3, 5]
})

# Dropping the 'Daisies' column
df = df.drop('Daisies', axis=1)
print(df)

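After running this code, the DataFrame df will no longer include the 'Daisies' column. By default, drop returns a new DataFrame rather than changing the existing one, which is why the result is assigned back to df. Notice the axis=1 part? This tells Pandas that we want to drop a column, not a row. If you wanted to drop a row, you would use axis=0, which is also the default when axis is omitted.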
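As a small variation on the example above, drop also accepts a columns keyword, which some people find easier to read than axis=1. The sketch below reuses the same garden data. The names without_daisies and without_first_row are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Roses': [10, 15, 20],
    'Tulips': [5, 10, 15],
    'Daisies': [7, 8, 9],
    'Sunflowers': [1, 3, 5]
})

# Same effect as df.drop('Daisies', axis=1)
without_daisies = df.drop(columns='Daisies')

# The result is stored under a new name, so df keeps all four columns
print(list(df.columns))
print(list(without_daisies.columns))

# With axis=0, the label refers to a row index instead of a column
without_first_row = df.drop(0, axis=0)
print(list(without_first_row.index))
```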

Dropping Multiple Columns

Sometimes you need to drop more than one column. This is just as straightforward as dropping a