Altcademy - Forbes Best Coding Bootcamp 2023

How to add a column to a dataframe in Python

Introduction

As you learn programming, you will sometimes need to manipulate data to make it more useful or easier to understand. One common data structure that you'll encounter is the dataframe. A dataframe is a two-dimensional table that can store and manage data. In Python, the Pandas library is a popular tool for working with dataframes.

In this tutorial, we will explore how to add a column to a dataframe in Python using the Pandas library. We will cover several methods, each with an example. By the end of this tutorial, you will be able to add a new column to an existing dataframe with ease.

What is a Dataframe?

A dataframe is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). In simpler terms, it's like a table in a spreadsheet program (like Microsoft Excel or Google Sheets) where data is organized in rows and columns. Each column in a dataframe can be of a different data type (e.g., numbers, strings, dates), which makes it extremely versatile for handling real-world data.
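To see the "different data type per column" idea in practice, here is a small sketch. The column names and values (`product`, `price`, `in_stock`, `added`) are hypothetical, chosen only for illustration, and the exact dtype names printed may differ between pandas versions (string columns in particular).

```python
import pandas as pd

# Hypothetical data: each column holds one type, but columns can differ.
example = pd.DataFrame({
    "product": ["pen", "notebook"],                         # strings
    "price": [1.5, 3.25],                                   # floating-point numbers
    "in_stock": [True, False],                              # booleans
    "added": pd.to_datetime(["2023-01-15", "2023-02-01"]),  # dates
})

# dtypes lists the data type pandas chose for each column.
print(example.dtypes)
```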

Getting Started with Pandas

Before we dive into adding columns to a dataframe, let's first ensure that you have the Pandas library installed in your Python environment. You can install Pandas using pip:

pip install pandas

Once Pandas is installed, you can import it in your Python script or notebook like this:

import pandas as pd

The import pandas as pd statement is a common convention that allows you to refer to the Pandas library using the shorter alias 'pd' in your code.
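A quick way to confirm that the installation worked is to import the library under its usual alias and print its version string. The number you see depends on your environment.

```python
import pandas as pd

# If this line runs without an ImportError, pandas is installed.
print(pd.__version__)
```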

Creating a Dataframe

Let's start by creating a simple dataframe to work with. We will create a dataframe with three columns: 'Name', 'Age', and 'City'. The dataframe will have four rows of data.

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Austin']
}

df = pd.DataFrame(data)
print(df)

Output:

      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles
3    David   40         Austin
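Before adding anything, it can help to confirm the structure you started with: each dictionary key becomes a column label, and pandas assigns a default row index of 0, 1, 2, 3. The check below reuses only the data defined above.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Austin']
}
df = pd.DataFrame(data)

# Each dictionary key became a column label, in insertion order.
print(list(df.columns))  # ['Name', 'Age', 'City']

# shape is (number of rows, number of columns).
print(df.shape)          # (4, 3)
```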

Now that we have a dataframe to work with, let's explore different methods for adding a new column to it.

Method 1: Adding a Column using Bracket Notation

One of the simplest ways to add a new column to a dataframe is bracket notation: put the new column name inside square brackets and assign the values to it, for example as a list, a pandas Series, or a single value. A list must contain exactly one value per row. Let's add a new column named 'Salary' to our example dataframe.

df['Salary'] = [70000, 80000, 90000, 100000]
print(df)

Output:

      Name  Age           City  Salary
0    Alice   25       New York   70000
1      Bob   30  San Francisco   80000
2  Charlie   35    Los Angeles   90000
3    David   40         Austin  100000

As you can see, a new column named 'Salary' has been added to the dataframe with the specified values.
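Bracket assignment also accepts a single scalar value, and a Series is matched to rows differently from a list. The sketch below uses hypothetical 'Country', 'Bonus', and 'Team' columns to show three behaviours worth knowing: a scalar is repeated for every row, a Series is aligned by index label rather than by position, and a list of the wrong length is rejected.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Salary': [70000, 80000, 90000, 100000],
})

# A scalar is broadcast to every row.
df['Country'] = 'USA'

# A Series is aligned on the index: rows whose label is missing
# from the Series (here 1 and 3) get NaN.
bonus = pd.Series({0: 5000, 2: 7000})
df['Bonus'] = bonus

# A plain list must have one value per row; otherwise pandas raises
# ValueError and the column is not created.
try:
    df['Team'] = ['A', 'B']
except ValueError as err:
    print('ValueError:', err)

print(df)
```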

Method 2: Adding a Column using the assign() Method

Pandas dataframes provide an assign() method that adds one or more new columns. Rather than modifying the existing dataframe, assign() returns a new dataframe containing the original columns plus the new ones. This is useful when you want to keep the original dataframe unchanged.

Here's how you can use assign() to add a new column named 'Experience' to our example dataframe:

experience = [2, 5, 7, 10]
df_new = df.assign(Experience=experience)
print(df_new)

Output:

      Name  Age           City  Salary  Experience
0    Alice   25       New York   70000           2
1      Bob   30  San Francisco   80000           5
2  Charlie   35    Los Angeles   90000           7
3    David   40         Austin  100000          10

Notice that we have created a new dataframe 'df_new' with the additional 'Experience' column, while the original dataframe 'df' remains unchanged.
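assign() can also add several columns in one call, and a value can be a function (such as a lambda) that receives the dataframe being built. In this sketch, the 'SalaryPerExpYear' column (salary divided by years of experience) is a hypothetical example; it relies on 'Experience', which is created earlier in the same call.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Salary': [70000, 80000, 90000, 100000],
})

df_new = df.assign(
    Experience=[2, 5, 7, 10],
    # A callable receives the intermediate dataframe, so it can use
    # columns created earlier in the same assign() call.
    SalaryPerExpYear=lambda d: d['Salary'] / d['Experience'],
)

print(df_new)
print(list(df.columns))  # ['Name', 'Salary'] (the original is untouched)
```

Because assign() takes column names as keyword arguments, a name containing a space, such as 'Income Tax', can't be written directly; unpacking a dictionary, as in df.assign(**{'Income Tax': df['Salary'] * 0.2}), works around that.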

Method 3: Adding a Column with Derived Values

In many cases, you may want to add a new column to a dataframe with values derived from existing columns. You can perform arithmetic operations, apply functions, or use conditional statements to create new columns based on existing data.

For example, let's add a new column 'Income Tax' to our example dataframe, where the tax is calculated as 20% of the 'Salary' column.

df['Income Tax'] = df['Salary'] * 0.2
print(df)

Output:

      Name  Age           City  Salary  Income Tax
0    Alice   25       New York   70000     14000.0
1      Bob   30  San Francisco   80000     16000.0
2  Charlie   35    Los Angeles   90000     18000.0
3    David   40         Austin  100000     20000.0

In this example, we have added a new column 'Income Tax' to the dataframe by multiplying the 'Salary' column values by 0.2.
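Derived columns can also combine two existing columns or depend on a condition. As a sketch beyond the original example, the code below subtracts one column from another and uses numpy.where (NumPy is installed alongside pandas as a required dependency) to pick a value per row. 'Net Salary' and 'Over 30' are hypothetical column names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [70000, 80000, 90000, 100000],
})
df['Income Tax'] = df['Salary'] * 0.2

# Element-wise arithmetic between two existing columns.
df['Net Salary'] = df['Salary'] - df['Income Tax']

# Conditional value: 'Yes' where Age > 30, 'No' everywhere else.
df['Over 30'] = np.where(df['Age'] > 30, 'Yes', 'No')

print(df)
```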

Method 4: Adding a Column using a Function

Sometimes you may want to apply a custom function to each value in an existing column to create a new column. The apply() method of a column (a pandas Series) does this: it takes a function as an argument, calls it on each element of the column, and returns the results as a new Series.

Let's say we want to add a column 'Salary Category' to our dataframe, where the category is determined based on the salary range:

  • Low: Salary <= 75,000
  • Medium: 75,000 < Salary <= 90,000
  • High: Salary > 90,000

We can define a function get_salary_category() and pass it to apply() to create the new column:

def get_salary_category(salary):
    if salary <= 75000:
        return 'Low'
    elif salary <= 90000:
        return 'Medium'
    else:
        return 'High'

df['Salary Category'] = df['Salary'].apply(get_salary_category)
print(df)

Output:

      Name  Age           City  Salary  Income Tax Salary Category
0    Alice   25       New York   70000     14000.0             Low
1      Bob   30  San Francisco   80000     16000.0          Medium
2  Charlie   35    Los Angeles   90000     18000.0          Medium
3    David   40         Austin  100000     20000.0            High

In this example, the 'Salary Category' column has been added to the dataframe by passing the custom function get_salary_category() to apply().
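apply() calls your Python function once per value, which is fine for small data. For range-based categories like these, pandas also offers pd.cut, a vectorized binning function. The sketch below is an alternative to the approach above, not the author's method; it reproduces the same three categories, but the resulting column has the categorical dtype rather than plain strings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Salary': [70000, 80000, 90000, 100000],
})

# Bin edges mirror the Low/Medium/High rules. With right=True (the default)
# each bin includes its upper edge: (-inf, 75000], (75000, 90000], (90000, inf).
df['Salary Category'] = pd.cut(
    df['Salary'],
    bins=[-np.inf, 75000, 90000, np.inf],
    labels=['Low', 'Medium', 'High'],
)

print(df)
```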

Conclusion

In this tutorial, we have covered the basics of dataframes in Python using the Pandas library and explored different methods for adding new columns to a dataframe: bracket notation, the assign() method, columns derived from existing values, and custom functions applied with the apply() method.

As you continue to learn programming, you will find that adding and manipulating columns in dataframes is a common task. With these techniques in your toolbox, you'll be well-equipped to handle real-world data analysis tasks in Python.