
How to read a csv file in Python

Introduction

Learning to read data from a file is a crucial skill for any programmer. In this blog post, we'll be focusing on how to read data from a CSV (Comma Separated Values) file in Python. CSV files are widely used for storing and exchanging data since they're simple, portable, and easy to read and write.

Think of a CSV file as a table: each row is a record and each column is a field. Within a row, the fields are separated by commas (,). You'll often encounter CSV files when working with databases, APIs, and other data sources.

In this tutorial, we'll go through the following steps:

  1. What is a CSV file?
  2. Reading a CSV file using the csv module
  3. Reading a CSV file using pandas
  4. Conclusion

What is a CSV file?

To better understand CSV files, let's take a look at an example. Consider the following file called students.csv, which contains information about students and their grades:

Name,Age,Grade
Alice,20,85
Bob,21,90
Charlie,22,92
David,19,88

Here, the first row contains the header (field names), and each subsequent row represents a student's record. The values in each row are separated by commas, hence the name "Comma Separated Values."
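
If you'd like to follow along, here is a minimal sketch that creates the sample file. It assumes you are happy to write students.csv into the current working directory; the variable names (SAMPLE, path) are only illustrative.

```python
from pathlib import Path

# Sample data copied from the students.csv listing above.
SAMPLE = """Name,Age,Grade
Alice,20,85
Bob,21,90
Charlie,22,92
David,19,88
"""

# Write the file into the current working directory.
path = Path("students.csv")
path.write_text(SAMPLE, encoding="utf-8")

# Read it back as plain text to confirm the contents.
print(path.read_text(encoding="utf-8"))
```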

Now that we understand the structure of a CSV file, let's dive into how we can read this data using Python.

Reading a CSV file using the csv module

Python provides a built-in csv module to work with CSV files. We'll start by importing the module and then reading our sample students.csv file.

Import the csv module

First, let's import the csv module by adding the following line to our script:

import csv

Open the CSV file

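Before we can read the contents of the file, we need to open it. We'll use Python's built-in open() function; its first two parameters are the file name and the mode in which to open the file. In our case, we'll open the file in read mode ('r'). The csv module documentation also recommends passing newline='' when opening a CSV file, so that line breaks inside quoted fields are handled correctly.

file = open('students.csv', 'r', newline='')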

Create a CSV reader

Now that we have our file open, we can create a CSV reader object to read the contents of the file. The csv module provides a reader() function, which takes a file object as its parameter.

csv_reader = csv.reader(file)

Read the contents of the CSV file

We can now read the contents of the CSV file using a loop. The csv.reader() function returns an iterator, which allows us to loop through the rows of the CSV file one by one.

for row in csv_reader:
    print(row)

This will print the following output:

['Name', 'Age', 'Grade']
['Alice', '20', '85']
['Bob', '21', '90']
['Charlie', '22', '92']
['David', '19', '88']

As you can see, the CSV reader yields the rows one at a time, and each row is a list of strings. Note that even numeric values such as the age and grade are read as strings. The first row is the header, and the subsequent rows contain the student records.
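
Since every field comes back as a string, you will usually want to skip the header and convert the numeric columns yourself. The following is one possible sketch, not the only way to do it. It uses an in-memory io.StringIO object as a stand-in for the open file so that it runs on its own, and the students list is a name chosen just for this example.

```python
import csv
import io

# In-memory stand-in for students.csv so the snippet runs on its own.
sample = io.StringIO(
    "Name,Age,Grade\n"
    "Alice,20,85\n"
    "Bob,21,90\n"
    "Charlie,22,92\n"
    "David,19,88\n"
)

csv_reader = csv.reader(sample)

# The reader is an iterator, so next() consumes the header row first.
header = next(csv_reader)

students = []
for name, age, grade in csv_reader:
    # Convert the numeric fields from strings to integers.
    students.append((name, int(age), int(grade)))

print(header)
print(students)
```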

Closing the CSV file

It's important to close the file once we're done reading it. We can do this using the close() method of the file object.

file.close()
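
A common alternative to calling close() by hand is a with statement, which closes the file automatically when the block ends, even if an error occurs while reading. The sketch below assumes a temporary copy of the sample data (written to a temporary directory purely so the snippet is self-contained); in your own script you would open 'students.csv' directly.

```python
import csv
import os
import tempfile

# Create a small sample file so the snippet is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "students.csv")
with open(path, "w", newline="", encoding="utf-8") as f:
    f.write("Name,Age,Grade\nAlice,20,85\nBob,21,90\n")

# The with block closes the file automatically when it ends.
with open(path, "r", newline="", encoding="utf-8") as file:
    rows = list(csv.reader(file))

print(rows)
print(file.closed)  # the file is already closed at this point
```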

Putting it all together

Here's the complete code to read the students.csv file using the csv module:

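import csv

# Open the file in read mode; newline='' is recommended by the csv docs
file = open('students.csv', 'r', newline='')
csv_reader = csv.reader(file)

# Each row is a list of strings
for row in csv_reader:
    print(row)

# Release the file handle when done
file.close()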
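
If you would rather look fields up by name than by position, the csv module also offers csv.DictReader, which uses the header row as keys. This is an optional variation on the code above, sketched with an in-memory stand-in for the file; the records variable is just an illustrative name.

```python
import csv
import io

# In-memory stand-in for students.csv so the snippet runs on its own.
sample = io.StringIO(
    "Name,Age,Grade\n"
    "Alice,20,85\n"
    "Bob,21,90\n"
    "Charlie,22,92\n"
    "David,19,88\n"
)

# DictReader reads the first row as field names and yields one dict per record.
reader = csv.DictReader(sample)
records = list(reader)

for record in records:
    # Values are still strings, exactly as with csv.reader.
    print(record["Name"], record["Grade"])
```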

Reading a CSV file using pandas

While the csv module provides a simple way to read CSV files, it lacks many features needed for advanced data manipulation and analysis. For such tasks, the pandas library is a popular choice among Python developers.

pandas is a powerful data analysis library that provides data structures and functions needed to manipulate and analyze data in a simple and efficient manner. One of its key features is the ability to read and write data in various formats, including CSV, Excel, and SQL.

Install pandas

To use pandas, we first need to install it. You can install it using pip, the Python package manager, by running the following command:

pip install pandas

Import the pandas library

Once you have pandas installed, you can import it in your script like this:

import pandas as pd

We're using the alias pd for pandas, which is a common convention in the Python community.

Read the CSV file

To read a CSV file using pandas, we can use the read_csv() function, which takes the file path (or an open file-like object) as its first argument and returns a DataFrame object.

A DataFrame is a two-dimensional tabular data structure with labeled axes (rows and columns). In our case, the rows represent student records and the columns represent the fields (name, age, and grade).

data = pd.read_csv('students.csv')

Access the data in the DataFrame

We can now access the data in the DataFrame using various methods and attributes. For example, we can print the first few rows of the DataFrame using the head() method:

print(data.head())

This will print the following output:

      Name  Age  Grade
0    Alice   20     85
1      Bob   21     90
2  Charlie   22     92
3    David   19     88

As you can see, pandas by default treats the first row as the header, infers a type for each column, and adds an integer index (0 to 3) on the left, which makes the data easier to read.

We can also access individual columns of the DataFrame using their labels:

print(data['Name'])

This will print the following output:

0      Alice
1        Bob
2    Charlie
3      David
Name: Name, dtype: object
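
Because pandas infers numeric types for the Age and Grade columns, you can compute on them directly. Here is a short sketch, assuming the same sample data loaded from an in-memory io.StringIO object rather than from disk; average_grade and top are names chosen for this example only.

```python
import io

import pandas as pd

# In-memory stand-in for students.csv so the snippet runs on its own.
sample = io.StringIO(
    "Name,Age,Grade\n"
    "Alice,20,85\n"
    "Bob,21,90\n"
    "Charlie,22,92\n"
    "David,19,88\n"
)
data = pd.read_csv(sample)

# Inspect the type pandas inferred for each column.
print(data.dtypes)

# Grade was parsed as a number, so arithmetic works without conversion.
average_grade = data["Grade"].mean()
print(average_grade)

# Keep only the rows where the grade is at least 90.
top = data[data["Grade"] >= 90]
print(top["Name"].tolist())
```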

Putting it all together

Here's the complete code to read the students.csv file using pandas:

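import pandas as pd

# Load the CSV into a DataFrame; the first row becomes the column labels
data = pd.read_csv('students.csv')
# Show the first five rows (here, all four records)
print(data.head())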

Conclusion

In this tutorial, we've learned two different ways to read a CSV file in Python: using the built-in csv module and the popular pandas library. The csv module needs no installation and is well suited to simple tasks, while pandas adds type inference and tools for filtering and analysis, which usually makes it the more convenient choice for larger datasets.

By understanding how to read CSV files in Python, you're now equipped with the knowledge to import and manipulate data from a variety of sources. This is an essential skill in the world of data analysis and programming in general.