
How to Add Columns to a Pandas DataFrame

Understanding DataFrames in Pandas

Before we delve into the process of adding columns to a Pandas DataFrame, it's essential to grasp what a DataFrame is. Think of a DataFrame as a table, much like one you would find in a spreadsheet program like Microsoft Excel. This table is composed of rows and columns, with the rows representing individual records (like different people) and the columns representing attributes or features of these records (like age, height, etc.).

Pandas is a powerful library in Python that allows for easy manipulation of these tables, including adding, deleting, and modifying rows and columns.
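As a quick illustration of the table analogy, the sketch below builds a tiny DataFrame and inspects its rows and columns. The names, ages, and heights are made-up sample values used only for this example.

```python
import pandas as pd

# Hypothetical sample records: each row is a person, each column an attribute
people = pd.DataFrame({'Name': ['Ana', 'Ben'],
                       'Age': [31, 45],
                       'Height': [1.68, 1.80]})

# shape is a (number of rows, number of columns) tuple
n_rows, n_cols = people.shape
column_names = list(people.columns)

print(n_rows, n_cols)
print(column_names)
```

Checking shape and columns like this before and after adding a column is a simple way to confirm that the change did what you expected.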

Adding a New Column with a Default Value

The simplest way to add a new column to a DataFrame is to assign a single value to a new column name; Pandas repeats (broadcasts) that value to every row. Imagine you have a list of fruits and their prices, and you want to add a column indicating the stock status with a default value of 'In Stock'.

import pandas as pd

# Sample DataFrame
data = {'Fruit': ['Apple', 'Banana', 'Cherry'],
        'Price': [1.2, 0.5, 2.0]}
df = pd.DataFrame(data)

# Adding a new column with a default value
df['Stock Status'] = 'In Stock'

print(df)

This code will output:

    Fruit  Price Stock Status
0   Apple    1.2      In Stock
1  Banana    0.5      In Stock
2  Cherry    2.0      In Stock

Adding a Column with Different Values for Each Row

Sometimes, you'll want to add a column where each row has a different value. Let's say you now have the quantity for each fruit and want to add that as a new column. You can do this by assigning a list of values to the new column. The list must contain exactly one value per row, in row order; a list of any other length raises a ValueError.

# Quantities for each fruit
quantities = [15, 30, 7]

# Adding a new column with different values
df['Quantity'] = quantities

print(df)

The DataFrame now looks like this:

    Fruit  Price Stock Status  Quantity
0   Apple    1.2      In Stock        15
1  Banana    0.5      In Stock        30
2  Cherry    2.0      In Stock         7
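The following sketch shows what can go wrong when the values do not line up with the rows, and how a Series behaves differently from a list. It rebuilds a small version of the fruit table so it can run on its own; the `partial` Series and its index labels are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Cherry'],
                   'Price': [1.2, 0.5, 2.0]})

# A list needs one value per row; a shorter list raises ValueError
try:
    df['Quantity'] = [15, 30]  # only two values for three rows
    length_error = None
except ValueError as exc:
    length_error = str(exc)

# A Series is aligned on index labels rather than on position,
# so rows without a matching label receive a missing value
partial = pd.Series([15, 7], index=[0, 2])
df['Quantity'] = partial

print(length_error)
print(df)
```

Because of this label alignment, assigning a Series taken from a differently indexed DataFrame can silently introduce missing values, so it is worth checking the result when the indexes may differ.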

Adding a Column Based on Operations with Existing Columns

In many cases, you might want to add a column that is a result of some calculation based on other columns. For instance, if you want to calculate the total price for each fruit based on its price and quantity, you can do the following:

# Adding a new column by calculating total price
df['Total Price'] = df['Price'] * df['Quantity']

print(df)

This will add a new column 'Total Price' to the DataFrame:

    Fruit  Price Stock Status  Quantity  Total Price
0   Apple    1.2      In Stock        15         18.0
1  Banana    0.5      In Stock        30         15.0
2  Cherry    2.0      In Stock         7         14.0
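Derived columns do not have to be arithmetic. As a minimal sketch, a comparison produces a True/False column, and NumPy's where function picks between two labels based on a condition. The 'Low Stock' and 'Price Tier' column names and the thresholds of 10 and 1.0 are assumptions chosen for this example, not part of the tutorial's data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Cherry'],
                   'Price': [1.2, 0.5, 2.0],
                   'Quantity': [15, 30, 7]})

# A comparison on a column yields one boolean per row
df['Low Stock'] = df['Quantity'] < 10  # threshold is an arbitrary example

# np.where(condition, value_if_true, value_if_false) works element-wise
df['Price Tier'] = np.where(df['Price'] >= 1.0, 'Premium', 'Budget')

print(df)
```

Both operations work on whole columns at once, which is generally faster and easier to read than looping over rows.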

Using the assign Method to Add Columns

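Another way to add columns to a DataFrame is the assign method. It returns a new DataFrame and leaves the original unchanged, which is useful when you want to add several columns in one call or keep the original intact. Each keyword argument becomes a column name, so the name must be a valid Python identifier (hence Discounted_Price rather than 'Discounted Price').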

# Using assign to add a new column
new_df = df.assign(Discounted_Price=lambda x: x['Price'] * 0.9)

print(new_df)

This will create a new DataFrame with an additional 'Discounted_Price' column:

    Fruit  Price Stock Status  Quantity  Total Price  Discounted_Price
0   Apple    1.2      In Stock        15         18.0              1.08
1  Banana    0.5      In Stock        30         15.0              0.45
2  Cherry    2.0      In Stock         7         14.0              1.80
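The example above adds a single column. The sketch below shows assign adding two columns in one call, where the second builds on the first, and confirms that the original DataFrame is untouched. The 'Discounted_Total' column is a hypothetical addition for illustration, and the dictionary-unpacking form is one possible way to use a column name that contains a space.

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Cherry'],
                   'Price': [1.2, 0.5, 2.0],
                   'Quantity': [15, 30, 7]})

# Later keyword arguments can refer to columns created earlier in the same call
new_df = df.assign(
    Discounted_Price=lambda x: x['Price'] * 0.9,
    Discounted_Total=lambda x: x['Discounted_Price'] * x['Quantity'],
)

# Unpacking a dict allows column names that are not valid Python identifiers
spaced_df = df.assign(**{'Total Price': df['Price'] * df['Quantity']})

print(new_df)
print(list(df.columns))  # the original still has only its three columns
```

Because assign returns a new object, it fits naturally into method chains where each step produces the input for the next.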

Inserting a Column at a Specific Position

Sometimes the order of columns matters, and you might want to insert a new column at a specific position. Pandas provides the insert method, which takes a zero-based position, the column name, and the values. Unlike assign, insert modifies the DataFrame in place and returns None.

Let's say you want to add a 'Country of Origin' column as the second column (position 1) in the DataFrame:

# Inserting a new column at a specific position
df.insert(1, 'Country of Origin', ['USA', 'Ecuador', 'Turkey'])

print(df)

The DataFrame now has the new column inserted at the specified position:

    Fruit Country of Origin  Price Stock Status  Quantity  Total Price
0   Apple              USA    1.2      In Stock        15         18.0
1  Banana          Ecuador    0.5      In Stock        30         15.0
2  Cherry           Turkey    2.0      In Stock         7         14.0
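Hard-coding the position can be fragile if the column order changes. One possible approach, sketched below, is to look up the position of an existing column with columns.get_loc and insert relative to it. The 'Currency' column and its 'USD' value are hypothetical. The sketch also shows that insert refuses to add a column name that already exists.

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Cherry'],
                   'Price': [1.2, 0.5, 2.0],
                   'Quantity': [15, 30, 7]})

# Find where 'Price' sits and place the new column directly after it
position = df.columns.get_loc('Price') + 1
df.insert(position, 'Currency', 'USD')  # a scalar is repeated for every row

# Inserting a name that already exists raises ValueError by default
try:
    df.insert(0, 'Price', [0, 0, 0])
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)

print(list(df.columns))
print(duplicate_error)
```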

Adding a Column from Another DataFrame

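Sometimes, you have two DataFrames and you want to add a column from one to the other. This is common when you have related data in separate tables. To do this, we can use the merge function, which matches rows on a shared key column. By default merge performs an inner join, so a row whose key is missing from either DataFrame is dropped from the result; pass how='left' to keep every row of the original.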

Imagine we have another DataFrame with the 'Fruit' column and a 'Color' column:

# Another DataFrame with fruit colors
colors_df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Cherry'],
                          'Color': ['Red', 'Yellow', 'Red']})

# Merging the new column into the original DataFrame
df = pd.merge(df, colors_df, on='Fruit')

print(df)

After merging, the 'Color' column is added to our original DataFrame:

    Fruit Country of Origin  Price Stock Status  Quantity  Total Price   Color
0   Apple              USA    1.2      In Stock        15         18.0     Red
1  Banana          Ecuador    0.5      In Stock        30         15.0  Yellow
2  Cherry           Turkey    2.0      In Stock         7         14.0     Red
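The merge above works cleanly because every fruit appears in both tables. The sketch below shows what happens when one key is missing, comparing the default inner join with a left join. The colors table here deliberately omits 'Cherry'; that omission is an assumption made to demonstrate the difference.

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Cherry'],
                   'Price': [1.2, 0.5, 2.0]})

# 'Cherry' is intentionally left out of the lookup table
colors_df = pd.DataFrame({'Fruit': ['Apple', 'Banana'],
                          'Color': ['Red', 'Yellow']})

# Default (inner) join keeps only fruits present in both tables
inner = pd.merge(df, colors_df, on='Fruit')

# Left join keeps every row of df and marks the unmatched color as missing
left = pd.merge(df, colors_df, on='Fruit', how='left')

print(inner)
print(left)
```

When the goal is to add a column without losing rows, a left join is usually the safer choice, followed by a check for missing values in the new column.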

Handling Missing Values When Adding Columns

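When adding columns, you might encounter situations where some data is missing. In numeric columns, Pandas marks missing values with NaN (Not a Number). In a column of strings, a None you pass in may be kept as None (as in the output below) or displayed as NaN, depending on the Pandas version and the column's dtype. Either way, methods such as isna() treat it as missing.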

# Adding a column with a missing value
df['Season'] = ['Fall', 'Summer', None]  # None represents a missing value

print(df)

The new 'Season' column includes a missing value:

    Fruit Country of Origin  Price Stock Status  Quantity  Total Price   Color  Season
0   Apple              USA    1.2      In Stock        15         18.0     Red    Fall
1  Banana          Ecuador    0.5      In Stock        30         15.0  Yellow  Summer
2  Cherry           Turkey    2.0      In Stock         7         14.0     Red    None
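Once a column contains missing values, you will usually want to find or replace them. The sketch below uses isna to locate them and fillna to substitute a placeholder. The 'Unknown' placeholder and the 'Rating' column are hypothetical, added only to show the behavior in both a text column and a numeric one.

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Cherry']})

# Text column with one missing entry
df['Season'] = ['Fall', 'Summer', None]

# isna() returns True wherever a value is missing (None or NaN)
missing_mask = df['Season'].isna()

# fillna() returns a copy with missing entries replaced
df['Season Filled'] = df['Season'].fillna('Unknown')

# In a numeric column, None is treated as a missing number
df['Rating'] = [4.5, None, 3.0]

print(missing_mask.tolist())
print(df)
```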

Conclusion: Expanding Your DataFrame Horizons

Adding columns to a Pandas DataFrame is a fundamental skill that opens up a world of possibilities for data manipulation and analysis. Whether you're setting default values, calculating new data based on existing columns, or merging information from multiple sources, Pandas provides a variety of ways to enrich your data.

As you continue to experiment with adding columns, remember that each method serves different purposes and that choosing the right one depends on your specific needs. With practice, you'll find that these techniques become second nature, allowing you to manage and analyze data with ease and confidence. So, go forth and transform your DataFrames into treasure troves of insightful information!