
How to delete a column in Pandas

Understanding Pandas DataFrame

Before diving into the process of deleting a column, it's essential to grasp what a DataFrame is. In simple terms, a DataFrame is like a table or a spreadsheet that you can manipulate with code. It's one of the primary data structures in Pandas, a popular Python library for data analysis. Think of a DataFrame as a collection of named columns, each of which behaves like a labeled list of entries (Pandas calls a single column a Series). These entries can be numbers, strings, or even more complex objects.
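
To make the "collection of columns" picture concrete, the short sketch below (with made-up sample data) selects a single column and shows that Pandas hands it back as a Series, its one-dimensional labeled type:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Selecting one column by its label returns a Series
ages = df['Age']

print(list(df.columns))      # the column labels
print(type(ages).__name__)   # the type of a single column
print(ages.tolist())         # the entries as a plain Python list
```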

Adding and Viewing Columns in a DataFrame

To understand how to delete a column, let's first quickly go over how to add one and view it. This will give us a better idea of how a DataFrame is structured. Here's a simple example:

import pandas as pd

# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# Adding a new column
df['City'] = ['New York', 'Los Angeles', 'Chicago']

# Viewing the DataFrame
print(df)

The output will look like this:

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago

Deleting a Column Using drop

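Now, let's say we want to remove the 'City' column. Pandas provides a method called drop that allows us to do this. By default, drop returns a new DataFrame without the column and leaves the original unchanged, so we assign the result back to df. Here's how you can use it: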

# Deleting the 'City' column
df = df.drop('City', axis=1)

# Viewing the DataFrame after deletion
print(df)

The axis=1 argument tells Pandas to drop a column rather than a row (rows are axis=0, which is the default). The output will be:

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
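
If remembering which axis number means "columns" feels error-prone, drop also accepts a columns= keyword that names the axis directly. A minimal sketch, rebuilding the same sample DataFrame so it runs on its own, compares the two spellings:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# columns= names the axis explicitly, so axis=1 is not needed
by_keyword = df.drop(columns='City')

# The same deletion written with a label plus axis=1
by_axis = df.drop('City', axis=1)

print(by_keyword.equals(by_axis))  # both calls give the same result
print(list(by_keyword.columns))
```

Which spelling to use is a matter of taste; columns= tends to read more clearly when someone else has to maintain the code.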

Deleting Multiple Columns

What if we want to delete more than one column at a time? We can pass a list of column names to the drop method. For example:

# Adding the 'City' column back, plus a new 'Country' column, for demonstration
df['City'] = ['New York', 'Los Angeles', 'Chicago']
df['Country'] = ['USA', 'USA', 'USA']

# Deleting the 'City' and 'Country' columns
df = df.drop(['City', 'Country'], axis=1)

# Viewing the DataFrame after deletion
print(df)

Both the 'City' and 'Country' columns are removed:

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
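
One caveat when passing a list: drop raises a KeyError if any label in the list is missing from the DataFrame. The sketch below, which assumes a DataFrame that has a 'City' column but no 'Country' column, shows the errors='ignore' option, which drops the labels that exist and skips the rest:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# 'Country' is not a column here, so a plain drop raises KeyError
try:
    df.drop(columns=['City', 'Country'])
    raised = False
except KeyError:
    raised = True

# errors='ignore' removes the labels that exist and skips the missing ones
trimmed = df.drop(columns=['City', 'Country'], errors='ignore')

print(raised)
print(list(trimmed.columns))
```

This can be handy in cleanup scripts that run against files whose columns vary, though silently ignoring a misspelled column name is the trade-off.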

Using del to Delete a Column

Another way to delete a column is by using Python's del keyword. It modifies the DataFrame directly and removes one column per statement, so it is more straightforward but less flexible than drop. Here's how it works:

# Adding the 'City' column back for demonstration
df['City'] = ['New York', 'Los Angeles', 'Chicago']

# Deleting the 'City' column
del df['City']

# Viewing the DataFrame after deletion
print(df)

The 'City' column will be gone, and you'll see:

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
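
Because del acts on the DataFrame directly, running it twice on the same column fails the second time with a KeyError. A minimal sketch of that behavior, with a simple membership check as one possible guard (the sample data is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'City': ['New York', 'Chicago']})

# del changes df itself; there is no return value to assign
del df['City']

# The column is already gone, so deleting it again raises KeyError
try:
    del df['City']
    second_delete_failed = False
except KeyError:
    second_delete_failed = True

# One possible guard: only delete when the column is present
if 'City' in df.columns:
    del df['City']

print(second_delete_failed)
print(list(df.columns))
```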

Using pop to Delete and Retrieve a Column

Sometimes, you might want to delete a column but also keep its data for later use. The pop method allows you to do just that. It removes the column from the DataFrame and returns it as a Series. Here's an example:

# Adding the 'City' column back for demonstration
df['City'] = ['New York', 'Los Angeles', 'Chicago']

# Deleting the 'City' column and storing its data
city_data = df.pop('City')

# Viewing the DataFrame after deletion
print(df)
print(city_data)

This will print the DataFrame without the 'City' column and the data that was in the 'City' column:

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35

And city_data will be:

0       New York
1    Los Angeles
2        Chicago
Name: City, dtype: object
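
As one illustration of "later use", the popped Series can be transformed on its own and then put back wherever you like. The sketch below is only an example of that pattern: it upper-cases the city names and reinserts them as the first column using DataFrame.insert. The choice of transformation and position is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# pop removes 'City' from df and returns it as a Series
city_data = df.pop('City')

# Work with the column on its own (an arbitrary example transformation)
upper_cities = city_data.str.upper()

# Put the transformed data back as the first column (position 0)
df.insert(0, 'City', upper_cities)

print(df)
```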

In-Place Deletion

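When using drop, you might have noticed that we reassigned the DataFrame (df = df.drop(...)) to apply the deletion. That's because drop returns a new DataFrame by default and leaves the original untouched. If you want to modify the DataFrame directly instead, you can pass inplace=True. In that case drop returns None, so don't assign the result back to df: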

# Adding the 'City' column back for demonstration
df['City'] = ['New York', 'Los Angeles', 'Chicago']

# Deleting the 'City' column in place
df.drop('City', axis=1, inplace=True)

# Viewing the DataFrame after deletion
print(df)

Now the 'City' column will be deleted without needing to reassign df.
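
The difference between the two styles is easy to check for yourself. A small sketch, using made-up sample data, captures what each call returns and what happens to the original DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'City': ['New York', 'Chicago']})

# Without inplace, drop returns a new DataFrame and leaves df alone
smaller = df.drop('City', axis=1)
cols_after_plain_drop = list(df.columns)

# With inplace=True, df itself changes and the call returns None
result = df.drop('City', axis=1, inplace=True)

print(cols_after_plain_drop)  # df still had 'City' after the plain drop
print(result)                 # the in-place call has no return value
print(list(df.columns))       # df no longer has 'City'
```

A common slip is writing df = df.drop(..., inplace=True), which replaces df with None.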

Intuition Behind Deleting Columns

Imagine your DataFrame as a bookshelf, and each column is a book. When you use drop, it's like you're telling someone, "Please set up a copy of this shelf without that book." Your original shelf stays as it was unless you replace it with the copy. If you use del, it's as if you're directly pulling the book out yourself. With pop, you're asking someone to remove the book but also hand it to you, so you can read it later.

Conclusion

Deleting columns in Pandas is a fundamental task that you'll often encounter in data manipulation. Whether you choose to use drop, del, or pop will depend on your specific needs. Remember that drop is versatile and can handle multiple columns at once, del is straightforward but limited to one column, and pop gives you the additional benefit of retrieving the deleted data. By understanding these methods, you'll have the tools to keep your data tidy and focused, ensuring that your analysis is as clear and efficient as possible. Just like maintaining a well-organized bookshelf, keeping your DataFrame neat will make your data storytelling all the more compelling.