Measure the Closeness Centrality of Nodes in a Graph

Introduction: Closeness Centrality in Graphs

Closeness centrality measures how close a node is, on average, to every other node in a graph. It is computed as the inverse of the average shortest path length from the node to all other nodes. A node with high closeness centrality can reach the rest of the graph in few steps, which makes it a good candidate for information dissemination, resource allocation, or maintaining network connectivity.

This tutorial will walk you through the steps to measure the closeness centrality of nodes in a graph, using a real-world example. We will also provide a code implementation in Python, along with an explanation of the code.

Real-world Examples and Scenarios

Closeness centrality is used in various real-world applications, such as:

  1. Social network analysis: Identifying influencers or key individuals in a social network, who can spread information quickly and efficiently.
  2. Transportation networks: Identifying central nodes in a transportation network that can be used to optimize routing and reduce travel times.
  3. Biological networks: Identifying key proteins or genes in a biological network, which may have important functional roles or be potential drug targets.

Real-world Scenario: Identifying Influencers in a Social Network

Consider a social network where nodes represent individuals and edges represent friendships between them. We want to identify the most influential individuals in the network, who can quickly spread information or influence others. This problem can be framed as measuring the closeness centrality of nodes in the graph.

Problem Statement and Definition

Given a graph G = (V, E) with nodes V and edges E, the closeness centrality C_c(v) of a node v is defined as the inverse of the average shortest path length from v to all other nodes in the graph:

C_c(v) = 1 / (Σ_{u ∈ V, u ≠ v} d(u, v) / (n - 1))

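where d(u, v) is the shortest path length (the number of edges on a shortest path) between nodes u and v, and n is the total number of nodes in the graph. Equivalently, C_c(v) = (n - 1) / Σ_{u ∈ V, u ≠ v} d(u, v). The definition assumes a connected graph: if some node cannot be reached from v, its distance is infinite and the average is no longer finite.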
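
As a quick check of the formula, consider a hypothetical three-node path graph a - b - c (an illustration only, not part of the scenario above), so n = 3. Working the definition through by hand gives:

```latex
\begin{align*}
C_c(b) &= \frac{1}{\bigl(d(a,b) + d(c,b)\bigr)/(n-1)} = \frac{1}{(1+1)/2} = 1 \\
C_c(a) &= \frac{1}{\bigl(d(b,a) + d(c,a)\bigr)/(n-1)} = \frac{1}{(1+2)/2} = \frac{2}{3}
\end{align*}
```

By symmetry C_c(c) = 2/3 as well, so the middle node b is the most central, which matches the intuition that it is closest to everyone else.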

The problem is to compute the closeness centrality of all nodes in the graph and identify the node(s) with the highest closeness centrality.

Real-world Problem to Code Solution

We will now implement a solution in Python to compute the closeness centrality of nodes in a social network graph. The graph will be represented as an adjacency list, and we will use the breadth-first search (BFS) algorithm to compute the shortest path lengths between nodes.
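
Friendship data often arrives as a list of pairs rather than as an adjacency list. The sketch below shows one possible way to convert it, assuming friendships are mutual (an undirected graph). The helper name build_adjacency_list and the sample pairs are hypothetical and not part of the original tutorial.

```python
from collections import defaultdict

def build_adjacency_list(edges):
    """Build an undirected adjacency list from (u, v) friendship pairs."""
    adjacency = defaultdict(list)
    for u, v in edges:
        # Friendship is mutual, so record the edge in both directions,
        # skipping duplicates if the same pair appears more than once
        if v not in adjacency[u]:
            adjacency[u].append(v)
        if u not in adjacency[v]:
            adjacency[v].append(u)
    return dict(adjacency)

# Hypothetical friendship pairs, for illustration only
friendships = [('A', 'B'), ('A', 'C'), ('B', 'C'),
               ('B', 'D'), ('C', 'D'), ('D', 'E')]
graph = build_adjacency_list(friendships)
```

Each key of the resulting dictionary is a person, and its value is the list of that person's friends, which is the shape the functions in this section expect.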

from collections import deque

def bfs_shortest_path(graph, start_node):
    # Distance (in edges) from start_node; unreachable nodes stay at infinity
    distances = {node: float('inf') for node in graph}
    distances[start_node] = 0
    queue = deque([start_node])

    # Process nodes in the order they were discovered
    while queue:
        # popleft() is O(1), whereas list.pop(0) is O(n)
        current_node = queue.popleft()

        # Check neighbors of the current node
        for neighbor in graph[current_node]:
            # In BFS, the first visit to a node already uses a shortest path
            if distances[neighbor] == float('inf'):
                distances[neighbor] = distances[current_node] + 1
                queue.append(neighbor)

    return distances

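def closeness_centrality(graph):
    centrality = {}
    num_nodes = len(graph)

    for node in graph:
        # Shortest path lengths from the node to every node (itself included, at distance 0)
        shortest_path_lengths = bfs_shortest_path(graph, node).values()
        total_distance = sum(shortest_path_lengths)

        # A single-node graph has total_distance == 0; report 0.0 by convention
        # instead of dividing by zero. If some node is unreachable, the total is
        # infinite and the score below also evaluates to 0.0.
        if total_distance == 0:
            centrality[node] = 0.0
        else:
            # Same as 1 / (total_distance / (num_nodes - 1))
            centrality[node] = (num_nodes - 1) / total_distance

    return centrality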

To test the code with a sample social network graph, we can define the graph as an adjacency list and call the closeness_centrality function:

# Sample social network graph as an adjacency list
graph = {
    'A': ['B', 'C'],
    'B': ['A', 'C', 'D'],
    'C': ['A', 'B', 'D'],
    'D': ['B', 'C', 'E'],
    'E': ['D']
}

# Compute closeness centrality
centrality = closeness_centrality(graph)
print(centrality)
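
The problem statement also asks for the node(s) with the highest closeness centrality, which the printed dictionary leaves to the reader. A minimal sketch of that last step follows. The helper most_central_nodes is a hypothetical name, and the scores are worked out by hand from the sample graph: the distances from A sum to 7, those from B, C, and D each sum to 5, and those from E sum to 8, with n - 1 = 4.

```python
import math

def most_central_nodes(centrality, rel_tol=1e-9):
    """Return every node whose score ties for the maximum (hypothetical helper)."""
    if not centrality:
        return []
    best = max(centrality.values())
    # Compare with a tolerance so floating-point ties are not missed
    return sorted(node for node, score in centrality.items()
                  if math.isclose(score, best, rel_tol=rel_tol))

# Scores worked out by hand for the sample graph: (n - 1) / sum of distances
hand_computed = {'A': 4 / 7, 'B': 4 / 5, 'C': 4 / 5, 'D': 4 / 5, 'E': 4 / 8}
print(most_central_nodes(hand_computed))  # ['B', 'C', 'D']
```

In this graph B, C, and D tie for first place, so returning a list rather than a single node avoids picking an arbitrary winner.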

Intuitions and Analogies

The code solution consists of two main functions: bfs_shortest_path and closeness_centrality. The bfs_shortest_path function computes the shortest path lengths from a given start node to all other nodes in the graph using the BFS algorithm. The closeness_centrality function iterates through all nodes in the graph, computes the average shortest path length from each node to all other nodes, and calculates its closeness centrality.

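The intuition behind using BFS for computing shortest path lengths is that BFS explores nodes in increasing order of distance from the start node: first the start node itself, then its direct neighbors, then their unvisited neighbors, and so on. In an unweighted graph, where every edge counts as one step, this means the first time a node is reached it is reached by a shortest path.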
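
To make the layer-by-layer behavior visible, the following sketch groups nodes by their BFS distance from the start node. The function bfs_layers is a hypothetical helper written for this illustration, and the graph is the sample friendship network used in this tutorial.

```python
from collections import deque

def bfs_layers(graph, start_node):
    """Group nodes by their BFS distance from start_node (hypothetical helper)."""
    distance = {start_node: 0}
    layers = [[start_node]]
    queue = deque([start_node])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                # Open a new layer the first time a larger distance appears
                if distance[neighbor] == len(layers):
                    layers.append([])
                layers[distance[neighbor]].append(neighbor)
                queue.append(neighbor)
    return layers

graph = {
    'A': ['B', 'C'],
    'B': ['A', 'C', 'D'],
    'C': ['A', 'B', 'D'],
    'D': ['B', 'C', 'E'],
    'E': ['D'],
}
print(bfs_layers(graph, 'A'))  # [['A'], ['B', 'C'], ['D'], ['E']]
```

The index of each inner list is the shortest path length from A to the nodes in it, so summing index times layer size (0 + 2 + 2 + 3 = 7) gives the total distance used in A's closeness score.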

Extending the Solution to Other Real-world Problems

The code solution provided can be adapted to other real-world problems related to closeness centrality, such as identifying central nodes in transportation networks or key proteins in biological networks. As long as the problem can be represented as an unweighted graph, only the adjacency list needs to change, and the closeness_centrality function can be reused as-is. If edges carry weights, such as travel times between stations, BFS no longer yields shortest paths, and a weighted shortest-path algorithm such as Dijkstra's is needed in its place.
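
For a transportation network, edges usually carry weights such as travel times, which BFS does not account for. The sketch below swaps BFS for Dijkstra's algorithm, which is not part of the original solution. It assumes non-negative weights and a graph stored as a dictionary of dictionaries; the function names and the station data are hypothetical.

```python
import heapq

def dijkstra_distances(weighted_graph, start_node):
    """Shortest weighted distances from start_node; assumes non-negative weights."""
    distances = {node: float('inf') for node in weighted_graph}
    distances[start_node] = 0
    heap = [(0, start_node)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > distances[node]:
            continue  # Stale heap entry: a shorter path was already found
        for neighbor, weight in weighted_graph[node].items():
            candidate = dist + weight
            if candidate < distances[neighbor]:
                distances[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return distances

def weighted_closeness_centrality(weighted_graph):
    """Closeness centrality using weighted shortest paths (illustrative sketch)."""
    n = len(weighted_graph)
    centrality = {}
    for node in weighted_graph:
        total = sum(dijkstra_distances(weighted_graph, node).values())
        # Report 0.0 for a single-node graph; unreachable nodes also give 0.0
        centrality[node] = (n - 1) / total if total > 0 else 0.0
    return centrality

# Hypothetical stations with travel times in minutes
stations = {
    'X': {'Y': 4, 'Z': 10},
    'Y': {'X': 4, 'Z': 3},
    'Z': {'X': 10, 'Y': 3},
}
scores = weighted_closeness_centrality(stations)
print(max(scores, key=scores.get))  # Y
```

Here the direct X-Z link takes 10 minutes, but going through Y takes only 7, so Y has the smallest total travel time (4 + 3 = 7) and therefore the highest score.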