How to remove a column in Pandas

Understanding DataFrames in Pandas

Before we dive into the specifics of removing a column, let's first understand what we're working with. In Pandas, the primary data structure we use is called a DataFrame. You can think of a DataFrame as a table, similar to what you might see in a spreadsheet. It has rows and columns, with the columns often representing different variables and the rows representing individual records.

Identifying the Column to Remove

Imagine your DataFrame is like a bookshelf, with each column being a book. When you want to remove a book, you need to know its title. Similarly, in Pandas, each column has a label, and you'll need to know this label to remove the column.
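If you're not sure what the labels are, the DataFrame's columns attribute lists them. The following is a small sketch using made-up sample data; note that labels are case-sensitive, so 'Age' and 'age' are different columns.

```python
import pandas as pd

# Illustrative sample data; the dictionary keys become the column labels
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

# List every column label, in order
labels = df.columns.tolist()
print(labels)

# Check whether a particular label is present (case-sensitive)
has_age = 'Age' in df.columns
has_lowercase_age = 'age' in df.columns
print(has_age, has_lowercase_age)
```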

Removing a Column Using drop

The drop method is the equivalent of taking a book off the shelf. It's a versatile tool that can remove both rows and columns. However, for the purpose of this blog, we'll focus on column removal.

Here's a simple example:

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

# Removing the 'Age' column
df = df.drop('Age', axis=1)
print(df)

In this code, axis=1 tells drop that 'Age' is a column label, not a row label. With axis=0 (the default), drop looks for the label in the row index; with axis=1, it looks for it among the column labels. Note that drop returns a new DataFrame and leaves the original untouched, which is why we assign the result back to df.
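If remembering which axis is which feels awkward, drop also accepts a columns keyword that does the same job. Here is a minimal sketch comparing the two spellings on the same sample data; the variable names by_axis and by_keyword are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

# Two equivalent ways to drop the 'Age' column
by_axis = df.drop('Age', axis=1)
by_keyword = df.drop(columns='Age')

# Both calls produce the same result
print(by_axis.equals(by_keyword))

# The original DataFrame still has all three columns
print(df.columns.tolist())
```

Many people find the columns= form easier to read, since it says what is being removed without relying on an axis number.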

Using del to Remove a Column

Another way to remove a column is by using the del keyword. This is like grabbing a book from your shelf and giving it away. Once you do this, the book (or in our case, the column) is gone for good.

Here's how you can use del:

# Sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

# Using `del` to remove the 'Age' column
del df['Age']
print(df)

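The del keyword is straightforward and doesn't require specifying an axis. However, it's a bit more 'brutal' than drop: it removes only one column at a time, it always modifies the DataFrame directly, and it raises a KeyError if the column is missing. The drop method, by contrast, can take a list of labels, returns a new DataFrame by default, and accepts errors='ignore' to skip labels that don't exist.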

Using pop to Remove and Retrieve a Column

Sometimes when you remove a book from your shelf, you might want to read it one last time before it's gone. The pop method in Pandas allows you to do just that with columns. It removes the column and gives it back to you, so you can use it one last time.

# Sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

# Using `pop` to remove and retrieve the 'Age' column
age_column = df.pop('Age')
print(df)
print(age_column)

After using pop, you'll see that the 'Age' column has been removed from df, and we also have a separate Series (a one-dimensional array in Pandas) containing the data from the 'Age' column.
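To make the "use it one last time" idea concrete, here is a small sketch of what you might do with the popped Series: compute a summary from it, then attach it back as the last column. The variable name average_age is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

# pop removes 'Age' from df in place and returns it as a Series
age_column = df.pop('Age')

# The Series is still fully usable after the column is gone from df
average_age = age_column.mean()
print(average_age)

# Assigning it back adds the column at the end of the DataFrame
df['Age'] = age_column
print(df.columns.tolist())
```

Popping and reassigning like this is one simple way to move a column to the end of a DataFrame.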

Handling Errors When Removing Columns

When you're trying to take a book off your shelf, what happens if the book isn't there? You can't remove what doesn't exist. Pandas will raise a KeyError if you try to remove a column that isn't in the DataFrame.

To handle this gracefully, you can check if the column exists before attempting to remove it:

# Sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

# Attempting to remove a non-existent column
column_to_remove = 'Age'
if column_to_remove in df.columns:
    df = df.drop(column_to_remove, axis=1)
else:
    print(f"The column {column_to_remove} does not exist in the DataFrame.")

This code snippet checks for the presence of the 'Age' column before trying to remove it, thus preventing a KeyError.
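As an alternative to checking first, drop has an errors parameter: passing errors='ignore' tells it to skip labels that aren't present instead of raising. The sketch below shows both behaviors on the same sample data; the variable names raised and result are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

# Default behavior: dropping a missing column raises a KeyError
raised = False
try:
    df.drop(columns=['Age'])
except KeyError as exc:
    raised = True
    print(f"KeyError: {exc}")

# With errors='ignore', missing labels are silently skipped
result = df.drop(columns=['Age'], errors='ignore')
print(result.columns.tolist())
```

The trade-off is that errors='ignore' will also hide typos in column names, so the explicit check can be the safer choice when a missing column would be a surprise.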

Removing Multiple Columns at Once

What if you want to remove more than one book from your shelf at the same time? In Pandas, you can remove multiple columns by passing a list of column names to the drop method.

# Sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
    'Occupation': ['Engineer', 'Doctor', 'Artist']
})

# Removing multiple columns
df = df.drop(['Age', 'Occupation'], axis=1)
print(df)

By providing a list of column names, you can remove them all in one go.
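The list doesn't have to be typed out by hand. As a sketch, suppose you know which columns you want to keep (the keep list here is a hypothetical choice for this example); you can build the list of columns to drop from df.columns:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
    'Occupation': ['Engineer', 'Doctor', 'Artist']
})

# Hypothetical list of columns we want to keep
keep = ['Name', 'City']

# Every other column goes into the list passed to drop
to_drop = [col for col in df.columns if col not in keep]
print(to_drop)

trimmed = df.drop(columns=to_drop)
print(trimmed.columns.tolist())
```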

In-Place Removal

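In the drop examples above, we've been reassigning the result to df, because drop returns a new DataFrame and leaves the original unchanged by default. (del and pop, by contrast, always modify the DataFrame directly.) Pandas also lets drop change the original DataFrame directly through the inplace parameter.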

# Sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

# Removing a column in-place
df.drop('Age', axis=1, inplace=True)
print(df)

When you set inplace=True, the DataFrame df is updated directly, and there's no need to reassign it.
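One thing worth checking is what drop returns when inplace=True is set. The short sketch below captures the return value in a variable (named result here purely for illustration) to show that it is None, which is why writing df = df.drop(..., inplace=True) would overwrite your DataFrame with None.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

# With inplace=True, drop modifies df and returns None
result = df.drop('Age', axis=1, inplace=True)

print(result)
print(df.columns.tolist())
```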

Conclusion: Tidying Up Your DataFrame

Removing columns from a DataFrame is like tidying up your bookshelf: it's about keeping what's necessary and clearing out the rest to make room for new information. Whether you use drop, del, or pop, each method has its own use case and can help you manipulate your data effectively.

As you become more comfortable with these operations, you'll find that managing the structure of your DataFrame becomes as intuitive as organizing your own bookshelf. And just like with books, handling your data with care and understanding will lead to a more enjoyable and productive experience in your programming journey.