Altcademy - a Forbes magazine logo Best Coding Bootcamp 2023

How to read excel file in Python

Introduction

In this blog post, we will learn how to read Excel files using Python, a powerful and easy-to-learn programming language. Excel files are widely used for storing, editing, and analyzing data. As a beginner in programming, you might come across situations where you need to read data from Excel files and perform some operations on it. By the end of this tutorial, you will have a solid understanding of how to read data from Excel files using Python.

To make this tutorial beginner-friendly, we will avoid using technical jargons as much as possible. If we do use any jargons, we will explain them in simple terms. We will also provide code examples, intuitions, and analogies to help you better understand the concepts discussed in this tutorial.

Prerequisites

Before diving into the tutorial, make sure you have the following prerequisites:

  1. Python installed on your machine. If you don't have Python installed, you can download it from Python's official website.
  2. A basic understanding of Python programming. If you're new to Python, you can check out Python's official tutorial or other online resources to get started.
  3. An Excel file that you want to read data from.

Choosing a Library

Python has a vast ecosystem of libraries that can help you accomplish various tasks easily. To read data from Excel files, there are several libraries available, but the two most popular ones are pandas and openpyxl. In this tutorial, we will be using pandas because it is more powerful, easier to use, and offers additional functionality for data manipulation and analysis.

Installing Pandas

Before we can use pandas, we need to install it. Open your terminal or command prompt and run the following command:

pip install pandas

This command will download and install the pandas library and its dependencies. Once the installation is complete, you can start using pandas in your Python scripts.

Reading Data from Excel

Now that we have installed pandas, let's learn how to read data from an Excel file. We will break this process down into the following steps:

  1. Importing the pandas library
  2. Reading data from the Excel file
  3. Displaying the data

Step 1: Importing the Pandas Library

To use pandas in our Python script, we first need to import it. Add the following line at the beginning of your script:

import pandas as pd

Here, we are importing the pandas library and giving it an alias pd. This is a common convention that allows us to write shorter and cleaner code.

Step 2: Reading Data from the Excel File

To read data from an Excel file, we will use the pd.read_excel() function, which takes the file path as its argument. Replace your_file_path.xlsx with the path to your Excel file:

data = pd.read_excel("your_file_path.xlsx")

Here, we are calling the pd.read_excel() function and passing the file path as an argument. The function reads the data from the file and returns a DataFrame, which is a two-dimensional tabular data structure with labeled axes (rows and columns). We are storing this DataFrame in a variable called data.

Step 3: Displaying the Data

To display the data stored in the DataFrame, we can simply use the print() function:

print(data)

Putting all these steps together, our complete Python script should look like this:

import pandas as pd

data = pd.read_excel("your_file_path.xlsx")
print(data)

Run this script, and you should see the data from your Excel file printed in the terminal or command prompt.

Additional Tips

Now that you know how to read data from Excel files using Python, here are some additional tips to help you work more efficiently with Excel data:

Selecting Specific Columns

In some cases, you might only be interested in specific columns from the Excel file. You can select these columns using the following code:

selected_columns = data[["Column1", "Column2", "Column3"]]

Replace Column1, Column2, and Column3 with the names of the columns you want to select. This code will create a new DataFrame containing only the specified columns.

Slicing Rows

To select a specific range of rows from the DataFrame, you can use slicing. For example, if you want to select the first 10 rows, you can use the following code:

first_ten_rows = data[:10]

This will create a new DataFrame containing only the first 10 rows of the original DataFrame.

Accessing Individual Cells

To access the value of an individual cell in the DataFrame, you can use the .at[] accessor. For example, to get the value of the cell in the second row and third column, you can use the following code:

cell_value = data.at[1, "Column3"]

Remember that row and column indices in pandas are zero-based, so the second row has an index of 1.

Conclusion

In this tutorial, we learned how to read data from Excel files using Python and the pandas library. We also discussed some additional tips for working with Excel data, such as selecting specific columns, slicing rows, and accessing individual cells. With this knowledge, you can now easily read and manipulate data from Excel files in your Python projects.