Understanding PageRank: A Simple Walk Through Web Authority

PageRank is an algorithm designed to measure the importance of web pages. Rather than simply counting links, it assigns weight to links based on their source, making it more difficult to game than naive link counting. Originally developed as part of the early Google search engine, it remains one of the most elegant and intuitive approaches to ranking nodes in a graph.

This post offers a basic walkthrough of how PageRank works, using simple Python code to simulate the core idea.

1. The Idea Behind PageRank

In a web of documents, not all links are equal. A link from an authoritative page (e.g., a major university site) should carry more weight than one from a random blog. PageRank models this using a random surfer: someone who clicks on links at random. The rank of a page is the probability that the surfer lands there.

The rank of page P depends on the ranks of pages linking to it, divided by their number of outgoing links. Mathematically, it becomes a recursive problem solved via iteration.
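In its standard form, the rank of a page P is

PR(P) = (1 - d) / N + d * Σ ( PR(Q) / L(Q) )

where the sum runs over every page Q that links to P, d is the damping factor (introduced below), N is the total number of pages, and L(Q) is the number of outbound links on page Q.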

2. Representing the Web

Let’s represent a tiny internet as a directed graph:

graph = {
    'A': ['B', 'C'],
    'B': ['C'],
    'C': ['A'],
    'D': []
}

Here, page A links to B and C, B links to C, and C links back to A. D is a so-called dangling node: it has no outbound links, a case our implementation will need to handle explicitly.

3. Simple PageRank Implementation

The PageRank vector is initialized uniformly and updated over multiple iterations using the power iteration method. We’ll also include a damping factor (typically 0.85), representing the chance that a surfer follows a link instead of jumping to a random page.

def pagerank(graph, damping=0.85, max_iter=100, tol=1e-6):
    N = len(graph)
    # Start every page with an equal share of rank
    ranks = {node: 1 / N for node in graph}
    for _ in range(max_iter):
        new_ranks = {}
        for node in graph:
            rank_sum = 0
            for src in graph:
                # If src links to node, it passes on an equal share of its rank
                if node in graph[src]:
                    rank_sum += ranks[src] / len(graph[src])
                # Dangling node: spread its rank evenly across all pages
                if not graph[src]:
                    rank_sum += ranks[src] / N
            new_ranks[node] = (1 - damping) / N + damping * rank_sum

        # Convergence check: stop once the total change drops below tol
        delta = sum(abs(new_ranks[n] - ranks[n]) for n in graph)
        ranks = new_ranks
        if delta < tol:
            break
    return ranks

Running it on our toy graph:

ranks = pagerank(graph)
for page, score in ranks.items():
    print(f"{page}: {score:.4f}")

This gives us a ranking of pages based on their structural importance in the link graph.
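
As a sanity check, we can also simulate the random surfer from section 1 directly and count where it spends its time. The function below is a quick Monte Carlo sketch of my own (simulate_surfer and the step count are illustrative choices, not a standard API); its visit frequencies should roughly match the ranks above:

import random

def simulate_surfer(graph, damping=0.85, steps=100_000):
    nodes = list(graph)
    visits = {node: 0 for node in nodes}
    current = random.choice(nodes)
    for _ in range(steps):
        visits[current] += 1
        if graph[current] and random.random() < damping:
            # Follow a random outbound link from the current page
            current = random.choice(graph[current])
        else:
            # Teleport to a random page; this also rescues the surfer
            # from the dangling node D
            current = random.choice(nodes)
    return {node: count / steps for node, count in visits.items()}

print(simulate_surfer(graph))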

4. From Research to Revolution

What made PageRank groundbreaking wasn't just its cleverness. It aligned with a powerful intuition: authority flows through connections. Unlike keyword stuffing or metadata tricks, link structure was hard to manipulate at scale.

Later evolutions include:

  • Personalized PageRank, for individual users or domains (see the sketch after this list)
  • Topic-sensitive PageRank, for filtering by subject
  • HITS (Hyperlink-Induced Topic Search), a related algorithm that distinguishes hubs from authorities
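
As a rough sketch of the personalization idea, here is our pagerank() from section 3 with one conceptual change: the surfer teleports according to a preference vector instead of uniformly. The personalized_pagerank name and the handling of dangling mass are my own simplifications, not a canonical implementation:

def personalized_pagerank(graph, personalization, damping=0.85,
                          max_iter=100, tol=1e-6):
    # personalization maps node -> teleport probability (should sum to 1)
    N = len(graph)
    ranks = {node: 1 / N for node in graph}
    for _ in range(max_iter):
        new_ranks = {}
        for node in graph:
            rank_sum = 0
            for src in graph:
                if node in graph[src]:
                    rank_sum += ranks[src] / len(graph[src])
                if not graph[src]:
                    # Dangling rank also follows the preference vector
                    rank_sum += ranks[src] * personalization.get(node, 0)
            # The only real change: teleporting is biased, not uniform
            new_ranks[node] = ((1 - damping) * personalization.get(node, 0)
                               + damping * rank_sum)
        delta = sum(abs(new_ranks[n] - ranks[n]) for n in graph)
        ranks = new_ranks
        if delta < tol:
            break
    return ranks

print(personalized_pagerank(graph, {'A': 1.0}))  # surfer always returns to A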

5. Real-World Applications

Although Google's modern ranking system is vastly more complex, PageRank remains a conceptual backbone in graph analysis. It’s used in:

  • Citation analysis for academic papers (a toy example follows this list)
  • Social network influence modeling
  • Recommendation engines
  • Biological network centrality (e.g., protein-protein interaction networks)
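
As a toy illustration of the citation case, we can reuse pagerank() from section 3 unchanged; the paper names below are made up. An edge X → Y means "X cites Y", so rank flows from citing papers to the papers they cite:

citations = {
    'paper1': ['paper3'],
    'paper2': ['paper1', 'paper3'],
    'paper3': [],                      # cites nothing we track (dangling)
    'paper4': ['paper2', 'paper3'],
}

influence = pagerank(citations)
print(max(influence, key=influence.get))  # 'paper3', the most-cited paper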

6. Try it with NetworkX

For practical experimentation, Python's networkx library ships a built-in pagerank() function:

import networkx as nx

# Build a directed graph straight from our adjacency dict
G = nx.DiGraph(graph)
ranks = nx.pagerank(G, alpha=0.85)  # alpha is the damping factor
print(ranks)

This opens the door to large-scale experiments, useful not only for search engines but for understanding any system where connections matter more than raw counts.

Want to go deeper? Read the original 1998 paper: “The PageRank Citation Ranking: Bringing Order to the Web” by Page et al.
