How to Add a Column to a Pandas DataFrame

Understanding DataFrames in Pandas

Before we dive into the specifics of adding a column, let's familiarize ourselves with what a DataFrame is in the context of Pandas. A DataFrame can be thought of as a table, much like the ones you might create in Excel. It has rows and columns, with each column having a name and each row having an index label. When you're working with data in Python using Pandas, you're often manipulating these DataFrames: adding columns, removing rows, sorting the data, and so on.

Adding a New Column with a Default Value

Imagine you have a list of fruits and their prices, and you want to add a column that shows the quantity of each fruit in stock. Let's start by creating a simple DataFrame.

import pandas as pd

# Create a DataFrame with fruits and prices
data = {
    'Fruit': ['Apple', 'Banana', 'Cherry'],
    'Price': [1.2, 0.5, 2.0]
}
df = pd.DataFrame(data)

print(df)

This will output:

    Fruit  Price
0   Apple    1.2
1  Banana    0.5
2  Cherry    2.0

Now, let's add a new column called 'Quantity' with a default value of 10. Assigning a single value to a new column name fills every row with that value.

df['Quantity'] = 10

print(df)

After adding the 'Quantity' column, the DataFrame looks like this:

    Fruit  Price  Quantity
0   Apple    1.2        10
1  Banana    0.5        10
2  Cherry    2.0        10

Inserting a Column with Different Values for Each Row

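What if we want to specify a different quantity for each fruit? We can do this by assigning a list to the column, with one element per row, in the same order as the rows. Because 'Quantity' already exists in our DataFrame, this assignment overwrites its values rather than creating a second column; with a new column name, the same syntax would add a column. The list must have exactly as many elements as the DataFrame has rows.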

df['Quantity'] = [15, 30, 45]

print(df)

The DataFrame now reflects the different quantities:

    Fruit  Price  Quantity
0   Apple    1.2        15
1  Banana    0.5        30
2  Cherry    2.0        45
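The length requirement is easy to trip over, so here is a minimal sketch of what happens when the list is too short. It rebuilds the fruit DataFrame so it can run on its own; the variable name length_error is only an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Cherry'],
    'Price': [1.2, 0.5, 2.0]
})

# Three rows but only two values: pandas rejects the assignment
try:
    df['Quantity'] = [15, 30]
    length_error = None
except ValueError as exc:
    length_error = exc

print(length_error)
```

Pandas raises a ValueError here instead of guessing how to fill the missing row.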

Using the assign Method to Add Columns

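Another way to add columns to a DataFrame is by using the assign method. Unlike the bracket syntax, assign does not modify the original DataFrame; it returns a new DataFrame that contains the extra column, which is why the result is assigned back to df below. This makes it useful for chaining commands or for creating temporary DataFrames.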

df = df.assign(In_Stock = ['Yes', 'No', 'Yes'])

print(df)

The DataFrame with the 'In_Stock' column:

    Fruit  Price  Quantity In_Stock
0   Apple    1.2        15      Yes
1  Banana    0.5        30       No
2  Cherry    2.0        45      Yes
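To see why assign suits chaining, consider the following sketch. It rebuilds the fruit DataFrame so it can run on its own, and the names with_stock and with_total are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Cherry'],
    'Price': [1.2, 0.5, 2.0],
    'Quantity': [15, 30, 45]
})

# assign returns a new DataFrame; df itself is left untouched
with_stock = df.assign(In_Stock=['Yes', 'No', 'Yes'])

# A callable receives the DataFrame being built, which helps in chains
with_total = with_stock.assign(Total_Value=lambda d: d['Price'] * d['Quantity'])

print(list(df.columns))
print(list(with_total.columns))
```

The original df still has only its three columns, while with_total carries both new ones.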

Adding a Column Based on Other Columns

Sometimes, you might want to create a new column whose values depend on other columns. For instance, you might want to calculate the total value of each fruit in stock. You can do this by multiplying the 'Price' column by the 'Quantity' column.

df['Total_Value'] = df['Price'] * df['Quantity']

print(df)

This results in a new 'Total_Value' column:

    Fruit  Price  Quantity In_Stock  Total_Value
0   Apple    1.2        15      Yes         18.0
1  Banana    0.5        30       No         15.0
2  Cherry    2.0        45      Yes         90.0

Using the insert Method to Add Columns at Specific Positions

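If you want to add a column at a specific position, rather than at the end, you can use the insert method. Its first argument is the zero-based position of the new column, and it modifies the DataFrame in place. For example, let's say you want to add a 'Color' column between 'Fruit' and 'Price', which is position 1.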

df.insert(1, 'Color', ['Red', 'Yellow', 'Red'])

print(df)

The DataFrame now has the 'Color' column in the desired position:

    Fruit   Color  Price  Quantity In_Stock  Total_Value
0   Apple     Red    1.2        15      Yes         18.0
1  Banana  Yellow    0.5        30       No         15.0
2  Cherry     Red    2.0        45      Yes         90.0
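Two behaviors of insert are worth checking for yourself: it changes the DataFrame in place and returns None, and by default it refuses to insert a column name that already exists. The sketch below shows both on a fresh copy of the fruit data; the names result and duplicate_error are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Cherry'],
    'Price': [1.2, 0.5, 2.0]
})

# insert modifies df in place and returns None
result = df.insert(1, 'Color', ['Red', 'Yellow', 'Red'])

# Inserting the same column name again raises ValueError by default
try:
    df.insert(1, 'Color', ['Red', 'Yellow', 'Red'])
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc

print(list(df.columns))
print(duplicate_error)
```

Because insert returns None, writing df = df.insert(...) would replace your DataFrame with None, so call it on its own line.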

Using Functions to Populate a New Column

For more complex operations, you can write a function that computes each value and run it on an existing column with the apply method. For example, suppose you want to add a column that categorizes the fruits based on their price.

def categorize_price(price):
    if price < 1.0:
        return 'Cheap'
    elif price < 2.0:
        return 'Moderate'
    else:
        return 'Expensive'

df['Price_Category'] = df['Price'].apply(categorize_price)

print(df)

The DataFrame with the 'Price_Category' column:

    Fruit   Color  Price  Quantity In_Stock  Total_Value Price_Category
0   Apple     Red    1.2        15      Yes         18.0       Moderate
1  Banana  Yellow    0.5        30       No         15.0          Cheap
2  Cherry     Red    2.0        45      Yes         90.0      Expensive
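On larger DataFrames, calling a Python function once per row can become slow. One possible alternative, which goes beyond the approach shown above, is NumPy's select function. The sketch below assumes the same thresholds as categorize_price, and the names conditions and labels are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Cherry'],
    'Price': [1.2, 0.5, 2.0]
})

# Conditions are checked in order; the first one that is True wins
conditions = [df['Price'] < 1.0, df['Price'] < 2.0]
labels = ['Cheap', 'Moderate']

# Rows matching no condition fall back to the default label
df['Price_Category'] = np.select(conditions, labels, default='Expensive')

print(df)
```

For a three-row table the difference is negligible, so apply remains a perfectly reasonable choice when readability matters more than speed.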

Conclusion: The Flexibility of Adding Columns

In this post, we've explored several methods for adding columns to a Pandas DataFrame: assigning a default value or a list, using assign, computing a column from other columns, inserting at a specific position with insert, and populating a column with a function. Adding columns is just one part of the data wrangling process. As you become more comfortable with these operations, you'll find that they are like the ingredients in a recipe, each contributing to the final dish: your analyzed and understood dataset. Keep experimenting with the functionality that Pandas provides, and you'll be well on your way to becoming a proficient data handler!