How to Add a Row to a Pandas DataFrame

Understanding DataFrames in Pandas

Before we dive into the process of adding rows to a DataFrame, it's essential to understand what a DataFrame is. Think of a DataFrame as a table, much like one you'd find in a spreadsheet program like Microsoft Excel. It has rows and columns, with the rows representing individual records (or observations) and the columns representing attributes (or variables) of those records.

In Pandas, a DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). Imagine it as a grid that is flexible and can hold data of different types.
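
To make "heterogeneous" and "labeled axes" concrete, here is a small sketch. The name `example` and its two columns are made up for illustration; the point is only that each column carries its own data type and that both rows and columns have labels.

```python
import pandas as pd

# A tiny, hypothetical table: one text column and one integer column
example = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

print(example.shape)          # (number of rows, number of columns)
print(example.dtypes)         # one data type per column
print(list(example.columns))  # column labels
print(list(example.index))    # row labels (0, 1, ... by default)
```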

Setting Up Your Environment

To follow along with the code examples in this post, you need to have Python and Pandas installed on your computer. If you haven't installed Pandas yet, you can do so using pip, Python's package installer. Run the following command in your terminal or command prompt:

pip install pandas

Once you have Pandas installed, you can import it into your Python script or notebook using:

import pandas as pd

We use pd as an alias for Pandas to make our code cleaner and to type less when calling Pandas functions.
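
Some row-adding methods differ between Pandas releases, so it can help to check which version is installed before going further. A minimal check, assuming nothing beyond a working Pandas installation (the variable name `has_append` is just for illustration):

```python
import pandas as pd

# Print the installed Pandas version as a string
print(pd.__version__)

# DataFrame.append was removed in Pandas 2.0, so this is False on 2.0 and later
has_append = hasattr(pd.DataFrame, "append")
print(has_append)
```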

Creating a Simple DataFrame

Let's start by creating a simple DataFrame. This will give us a foundation to work with when we add new rows.

import pandas as pd

# Create a DataFrame with some data
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)

print(df)

This code will output:

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago

Adding a Single Row to a DataFrame

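Now, suppose we want to add a new person's details to our DataFrame. Older tutorials use the DataFrame.append method for this, but append was deprecated in Pandas 1.4 and removed in Pandas 2.0. In current versions, we wrap the new row in a one-row DataFrame and join it to the original with pd.concat.

# New data to add as a row
new_row = {'Name': 'David', 'Age': 40, 'City': 'Miami'}

# Wrap the dict in a one-row DataFrame and concatenate it to the original
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)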

print(df)

After appending, the DataFrame will look like this:

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40        Miami

Notice that we set ignore_index=True. This tells Pandas to discard the incoming index labels and renumber the rows of the result from 0. Without it, a row that arrives with its own label keeps that label (a one-row DataFrame is labeled 0 by default), which would leave two rows labeled 0.
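
The effect of ignore_index is easiest to see side by side. The sketch below uses two small, hypothetical DataFrames (`base` and `extra`) rather than the running example, and compares the resulting index with and without the flag.

```python
import pandas as pd

base = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})
extra = pd.DataFrame([{"Name": "David", "Age": 40}])  # labeled 0 by default

# Without ignore_index, each input keeps its own row labels
kept = pd.concat([base, extra])
print(list(kept.index))        # [0, 1, 0]

# With ignore_index=True, the result is renumbered from 0
renumbered = pd.concat([base, extra], ignore_index=True)
print(list(renumbered.index))  # [0, 1, 2]
```

Duplicate labels are legal in Pandas, but they make label-based lookups such as kept.loc[0] return more than one row, which is rarely what you want.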

Adding Multiple Rows

What if we have more than one row to add? We can collect the rows in a list of dictionaries, where each dictionary represents a row, turn the list into a DataFrame, and concatenate it in one step.

# New rows to add as a list of dictionaries
new_rows = [
    {'Name': 'Eve', 'Age': 28, 'City': 'Denver'},
    {'Name': 'Frank', 'Age': 33, 'City': 'Austin'}
]

# Convert the list to a DataFrame and concatenate it to the original
# (DataFrame.append is not available in Pandas 2.0 and later)
df = pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)

print(df)

Our DataFrame now includes Eve and Frank:

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40        Miami
4      Eve   28       Denver
5    Frank   33       Austin
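
When rows arrive one at a time, for example inside a loop, a common pattern is to gather them in a plain Python list and build the DataFrame once at the end, rather than growing the DataFrame on every iteration. The sketch below assumes a hypothetical starting table `df_start` and a hard-coded list of incoming records.

```python
import pandas as pd

df_start = pd.DataFrame({"Name": ["Alice"], "Age": [25], "City": ["New York"]})

# Collect each incoming record as a dictionary
collected = []
for name, age, city in [("Eve", 28, "Denver"), ("Frank", 33, "Austin")]:
    collected.append({"Name": name, "Age": age, "City": city})

# Build one DataFrame from the list and concatenate a single time
result = pd.concat([df_start, pd.DataFrame(collected)], ignore_index=True)
print(result)
```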

Using loc to Add Rows

Another way to add rows to a DataFrame is by using the loc indexer. Assigning to a row label that doesn't exist yet adds a new row in place, with no need to build a second DataFrame.

# Adding a new row using the loc indexer
df.loc[len(df.index)] = ['Grace', 27, 'Seattle']

print(df)

The DataFrame will now include Grace:

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40        Miami
4      Eve   28       Denver
5    Frank   33       Austin
6    Grace   27      Seattle

We use len(df.index) to find the next available index position. It's like saying, "Place this new data at the end of the DataFrame."
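
One caveat worth checking: len(df.index) is only the next free label when the index is the default 0, 1, 2, ... sequence. The sketch below uses a hypothetical `people` DataFrame whose first row has been dropped, and shows the assignment overwriting an existing row. Taking the largest label plus one is one possible workaround for integer indexes, not the only one.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})
people = people.drop(index=0)          # remaining labels: 1, 2

# len(people.index) is 2, and label 2 already belongs to Charlie,
# so this assignment overwrites his row instead of adding a new one
people.loc[len(people.index)] = ["Grace", 27]
print(people["Name"].tolist())         # ['Bob', 'Grace']

# For integer labels, the largest label plus one is guaranteed to be unused
people.loc[people.index.max() + 1] = ["Heidi", 31]
print(people["Name"].tolist())         # ['Bob', 'Grace', 'Heidi']
print(list(people.index))              # [1, 2, 3]
```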

Using pd.concat to Add Rows

pd.concat is also the tool to reach for when you have a large number of rows to add or want to combine two existing DataFrames. It takes a list of DataFrames and concatenates them along one axis, which is rows (axis=0) by default, in a single operation rather than one copy per added row.

# New DataFrame to concatenate
new_data = pd.DataFrame({
    'Name': ['Hannah', 'Ian'],
    'Age': [22, 20],
    'City': ['Philadelphia', 'San Francisco']
})

# Concatenate the DataFrames
df = pd.concat([df, new_data], ignore_index=True)

print(df)

Our DataFrame has grown with Hannah and Ian:

      Name  Age            City
0    Alice   25        New York
1      Bob   30     Los Angeles
2  Charlie   35         Chicago
3    David   40           Miami
4      Eve   28          Denver
5    Frank   33          Austin
6    Grace   27         Seattle
7   Hannah   22    Philadelphia
8      Ian   20   San Francisco
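
pd.concat is not limited to two inputs. As a small sketch with three hypothetical batches (`batch_a`, `batch_b`, `batch_c`), the list passed to it can hold any number of DataFrames:

```python
import pandas as pd

# Three small, hypothetical batches with the same columns
batch_a = pd.DataFrame({"Name": ["Alice"], "Age": [25]})
batch_b = pd.DataFrame({"Name": ["Bob", "Charlie"], "Age": [30, 35]})
batch_c = pd.DataFrame({"Name": ["David"], "Age": [40]})

# axis=0 stacks rows; it is the default and is written out here only for clarity
stacked = pd.concat([batch_a, batch_b, batch_c], axis=0, ignore_index=True)

print(len(stacked))         # 4
print(list(stacked.index))  # [0, 1, 2, 3]
```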

Handling Different Column Names

Sometimes, the row you want to add might not have the same columns as your DataFrame. In such cases, Pandas aligns the data by column name: the result contains every column that appears in either input, and any cell without a value is filled with NaN (Not a Number), which signifies missing data.

# New row with different columns
new_row_different_columns = {'Name': 'Jack', 'Age': 26, 'Profession': 'Engineer'}

# Concatenate the new row; columns are matched by name
# (pd.concat replaces DataFrame.append, which was removed in Pandas 2.0)
df = pd.concat([df, pd.DataFrame([new_row_different_columns])], ignore_index=True)

print(df)

Jack's row shows NaN for the missing 'City' value, and every earlier row shows NaN in the new 'Profession' column:

      Name  Age            City  Profession
0    Alice   25        New York         NaN
1      Bob   30     Los Angeles         NaN
2  Charlie   35         Chicago         NaN
3    David   40           Miami         NaN
4      Eve   28          Denver         NaN
5    Frank   33          Austin         NaN
6    Grace   27         Seattle         NaN
7   Hannah   22    Philadelphia         NaN
8      Ian   20   San Francisco         NaN
9     Jack   26             NaN    Engineer
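
If the NaN values or the extra column are unwanted, they can be cleaned up after the rows are combined. The sketch below is one possible approach, using a hypothetical `people` DataFrame and an assumed placeholder value of "Unknown". It fills the missing city and then keeps only the original columns.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Alice"], "Age": [25], "City": ["New York"]})
row = {"Name": "Jack", "Age": 26, "Profession": "Engineer"}

combined = pd.concat([people, pd.DataFrame([row])], ignore_index=True)

# Replace the missing city with a placeholder (the value is an arbitrary choice)
combined["City"] = combined["City"].fillna("Unknown")

# Keep only the columns the original DataFrame had, dropping 'Profession'
trimmed = combined[list(people.columns)]

print(trimmed)
```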

Conclusion

Adding rows to a DataFrame is a common task in data manipulation. Whether you're adding a single record or merging large datasets, Pandas provides more than one way to do it. By understanding pd.concat and loc, and knowing that the older append method is gone as of Pandas 2.0, you can handle most use cases efficiently.

Remember, when working with data, it's like nurturing a garden. Each row is a new plant, and you must know where to place it and how it fits into the ecosystem of your dataset. With the tools you've learned today, you're well-equipped to expand your data garden in Pandas, one row at a time. Keep practicing, and soon you'll be cultivating complex data landscapes with ease and confidence!