Dijkstra’s Algorithm vs. Tsinghua’s Breakthrough


A step‑by‑step guide from first principles to advanced ideas, with real‑world examples and working Python.
What you will learn:
- What the single‑source shortest path (SSSP) problem is
- Dijkstra’s algorithm explained line by line, with a full dry‑run on a tiny graph
- Real‑world uses and patterns developers care about (maps, grids, routing)
- Where Dijkstra struggles and what people mean by the “sorting barrier”
- What the new Tsinghua algorithm changes at a high level
- A teaching version that captures the core ideas (clustering the frontier + batched relaxations) so you can code and benchmark something today
- When to use which method in practice
0) Quick refresher: the shortest‑path problem
Input. A directed graph G = (V, E) with non‑negative edge weights w(u, v) ≥ 0, and a source node s.
Goal. For every node v, compute dist[v], the minimum total weight of any path from s to v. Also keep parent pointers to reconstruct the actual paths.
Real‑world meanings of weights. Time on a road, latency between servers, energy use of a robot, or any cost you want to minimize.
1) Dijkstra’s algorithm, explained like you’re pair‑programming
High‑level idea. Grow a tree of confirmed shortest paths outward from the source. Always pick the currently closest unprocessed node next.
Data structures.
- dist[v]: best distance found so far
- parent[v]: predecessor, to rebuild the path later
- A priority queue (min‑heap) of (distance, node) pairs
Pseudocode.
```
for v in V: dist[v] = +∞; parent[v] = None
dist[s] = 0
push (0, s) into heap
while heap is not empty:
    d, u = pop_min(heap)            # always the closest frontier node
    if d != dist[u]: continue       # skip stale entries
    for each edge (u -> v) with weight w:
        if d + w < dist[v]:         # found a better route to v
            dist[v] = d + w
            parent[v] = u
            push (dist[v], v) into heap
```
Why it’s correct. When you pop a node u from the min‑heap, no shorter route to u can still be discovered: all edge weights are non‑negative, so any alternative route would have to pass through a frontier node that is already at least as far away.
Time complexity. With a binary heap: O((n + m) log n), where n = |V| and m = |E|. (The tighter O(m + n log n) bound requires a Fibonacci heap.)
Step‑by‑step dry‑run on a tiny graph
Graph (directed), weights in parentheses:
A → B (2)
A → C (4)
B → C (1)
B → D (7)
C → D (3)
Start at A.
| Step | Chosen u | Relaxations | Heap after step (min first) | dist[] after step |
| --- | --- | --- | --- | --- |
| 0 | — | — | (0,A) | A:0, B:∞, C:∞, D:∞ |
| 1 | pop (0,A) → A | A→B makes B:2; A→C makes C:4 | (2,B), (4,C) | A:0, B:2, C:4, D:∞ |
| 2 | pop (2,B) → B | B→C makes C:min(4, 2+1)=3; B→D makes D:9 | (3,C), (4,C), (9,D) | A:0, B:2, C:3, D:9 |
| 3 | pop (3,C) → C | C→D makes D:min(9, 3+3)=6 | (4,C), (6,D) | A:0, B:2, C:3, D:6 |
| 4 | pop (4,C): stale (4 ≠ dist[C]=3), skip | — | (6,D) | A:0, B:2, C:3, D:6 |
| 5 | pop (6,D) → D | no outgoing edges | (empty) | A:0, B:2, C:3, D:6 |
Final distances: A:0, B:2, C:3, D:6.
Path to D: backtrack parents D ← C ← B ← A, i.e. A→B→C→D with cost 6.
Production‑ready Python (heapq)
```python
import heapq
from typing import Dict, List, Tuple

def dijkstra(graph: Dict[str, Dict[str, float]], start: str):
    dist = {node: float("inf") for node in graph}
    parent = {node: None for node in graph}
    dist[start] = 0.0
    pq: List[Tuple[float, str]] = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if d != dist[u]:
            continue  # stale entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(pq, (nd, v))
    return dist, parent

# Example
G = {
    'A': {'B': 2, 'C': 4},
    'B': {'C': 1, 'D': 7},
    'C': {'D': 3},
    'D': {}
}
print(dijkstra(G, 'A'))
```
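Running the example prints ({'A': 0.0, 'B': 2.0, 'C': 3.0, 'D': 6.0}, {'A': None, 'B': 'A', 'C': 'B', 'D': 'C'}), matching the dry‑run above.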
Grid maps (for games and robotics)
Grids are graphs. Each cell is a node. Edges connect to 4 or 8 neighbors. Weight can be 1 or a terrain cost.
```python
from heapq import heappush, heappop

def dijkstra_grid(grid, start, goal):
    H, W = len(grid), len(grid[0])
    INF = 10**18
    dist = [[INF] * W for _ in range(H)]
    parent = [[None] * W for _ in range(H)]
    sy, sx = start
    gy, gx = goal
    dist[sy][sx] = 0
    pq = [(0, sy, sx)]

    def neighbors(y, x):
        for dy, dx in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
            ny, nx = y + dy, x + dx
            if 0 <= ny < H and 0 <= nx < W and grid[ny][nx] != '#':
                yield ny, nx

    while pq:
        d, y, x = heappop(pq)
        if (y, x) == (gy, gx):
            break
        if d != dist[y][x]:
            continue  # stale entry
        for ny, nx in neighbors(y, x):
            w = 1  # or a terrain cost, e.g. grid[ny][nx]
            nd = d + w
            if nd < dist[ny][nx]:
                dist[ny][nx] = nd
                parent[ny][nx] = (y, x)
                heappush(pq, (nd, ny, nx))

    # reconstruct path by walking parents back from the goal
    path = []
    if dist[gy][gx] < INF:
        cur = (gy, gx)
        while cur is not None:
            path.append(cur)
            cur = parent[cur[0]][cur[1]]
        path.reverse()
    return dist[gy][gx], path
```
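A quick usage sketch; the grid layout here is made up for illustration ('#' marks a wall):

```python
grid = [
    "....",
    ".##.",
    "....",
]
cost, path = dijkstra_grid(grid, (0, 0), (2, 3))
print(cost)   # 5
print(path)   # one optimal 5-step route from (0, 0) to (2, 3)
```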
When to use Dijkstra. Small to medium graphs, non‑negative weights, and you care about a guaranteed optimal path. Great baseline even when you later add heuristics (A*), bidirectional search, or goal‑directed speedups.
2) Where Dijkstra slows down: the “sorting barrier” in plain words
Dijkstra always chooses the next closest frontier node. That means it keeps the frontier sorted by distance using a priority queue. On large graphs, the repeated “pop the minimum” and “push an updated distance” steps dominate the runtime. This is sometimes described as a sorting barrier: you pay a log n factor again and again because you need ordered access to the next closest node.
Two classic alternatives help in special cases:
- Bellman‑Ford: no sorting, just repeatedly relax every edge. Simpler, but O(n·m) in the worst case and usually slower in practice.
- Bucket‑based methods (Dial’s algorithm): if weights are small integers, keep nodes in buckets indexed by distance and pop from the current bucket in O(1) amortized time, as in the sketch below.
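To make the bucket idea concrete, here is a minimal sketch of Dial’s method. It assumes integer edge weights in [0, max_w]; dial_sssp and max_w are illustrative names, not a standard API:

```python
def dial_sssp(graph, start, max_w):
    # Dijkstra with a bucket queue instead of a heap.
    INF = float('inf')
    dist = {u: INF for u in graph}
    dist[start] = 0
    # Any shortest distance is at most (n - 1) * max_w.
    buckets = [[] for _ in range((len(graph) - 1) * max_w + 1)]
    buckets[0].append(start)
    for d, bucket in enumerate(buckets):   # scan buckets in distance order
        for u in bucket:                   # appends during iteration are visited too
            if d != dist[u]:
                continue                   # stale entry; u moved to a better bucket
            for v, w in graph[u].items():
                if d + w < dist[v]:
                    dist[v] = d + w
                    buckets[d + w].append(v)
    return dist
```

On the integer‑weight example graph used in this article, dial_sssp(G, 'A', 10) reproduces Dijkstra’s distances.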
But what about general real weights on directed graphs? That is where the modern breakthrough matters.
3) What the Tsinghua algorithm changes (high‑level)
The new result gives a deterministic algorithm for SSSP on directed graphs with non‑negative real weights that runs in roughly O(m · (log n)^(2/3)). The big picture:
- It does not fully sort the frontier at each step. Instead it clusters vertices into groups that are “close enough” in distance and works with representatives.
- It mixes in batched relaxations (think: a controlled taste of Bellman‑Ford) to discover useful edges without paying the full sorting cost.
- It uses a recursive, divide‑and‑conquer structure to keep these groups small and well‑behaved, so the total number of comparisons drops below what Dijkstra pays.
The exact algorithm is intricate and research‑grade. Rather than reproduce the full paper, the rest of this article gives you two things you can use right now:
1. An intuitive mental model for what is going on.
2. A teaching version you can implement that captures the main ideas (clustering + batches) so you can experiment and benchmark.
4) A mental model you can explain in 60 seconds
Imagine the frontier distances spread out on a number line. Dijkstra keeps them in exact order and always takes the absolute minimum. The Tsinghua approach says:
“Instead of keeping a perfect order, build small neighborhoods (clusters) of distances. Work with a few pivots that stand in for those neighborhoods, do a batch of relaxations to surface better candidates, then refine.”
This lets it skip a lot of comparisons you would normally spend on strict ordering, while still guaranteeing exact shortest paths in the end.
5) A teaching version you can code today
This is not the research algorithm. It is a clean, practical approximation of the ideas for learning and prototyping:
- Keep a coarse bucketing of frontier nodes by distance range (clusters).
- Repeatedly pick a pivot from the best non‑empty bucket, relax its edges, and also do a local batch: try relaxing outgoing edges of a few of its neighbors without re‑sorting everything.
- Periodically re‑bucket nodes whose distances improved a lot.
For general real weights, this looks similar in spirit to well‑known “∆‑stepping” approaches that batch light edges. With careful handling, it still returns exact distances for non‑negative weights, and it often reduces heap traffic on big sparse graphs.
Code: “clustered SSSP” (teaching version)
```python
from collections import deque, defaultdict
import math

class ClusteredSSSP:
    def __init__(self, graph, delta=1.0):
        self.g = graph                # dict: u -> {v: w}
        self.delta = float(delta)     # bucket width (tune this)

    def _bucket_id(self, x):
        return int(math.floor(x / self.delta))

    def run(self, s):
        dist = {u: float('inf') for u in self.g}
        parent = {u: None for u in self.g}
        dist[s] = 0.0
        buckets = defaultdict(deque)  # bucket_id -> deque of nodes
        active_ids = set()

        def add_bucket(u):
            bid = self._bucket_id(dist[u])
            buckets[bid].append(u)
            active_ids.add(bid)

        def relax(u, v, w):
            # Standard relaxation, plus bucket bookkeeping.
            nd = dist[u] + w
            if nd < dist[v]:
                old_bid = self._bucket_id(dist[v]) if dist[v] < float('inf') else None
                dist[v] = nd
                parent[v] = u
                new_bid = self._bucket_id(nd)
                buckets[new_bid].append(v)
                active_ids.add(new_bid)
                if old_bid is not None and old_bid != new_bid and not buckets[old_bid]:
                    active_ids.discard(old_bid)

        add_bucket(s)
        while active_ids:
            bid = min(active_ids)     # best bucket id
            q = buckets[bid]
            if not q:
                active_ids.remove(bid)
                continue
            u = q.popleft()           # pivot
            if self._bucket_id(dist[u]) != bid:
                # moved to a better bucket since enqueue
                add_bucket(u)
                continue
            # 1) relax edges of pivot u
            for v, w in self.g[u].items():
                relax(u, v, w)
            # 2) local batch: lightly explore neighbors of neighbors
            #    (helps surface good candidates without strict sorting)
            for v in self.g[u]:
                # only try a shallow pass (1 step) from v
                for x, w2 in self.g[v].items():
                    relax(v, x, w2)
        return dist, parent

# Example usage
G = {
    'A': {'B': 2, 'C': 4},
    'B': {'C': 1, 'D': 7},
    'C': {'D': 3},
    'D': {}
}
solver = ClusteredSSSP(G, delta=1.0)
print(solver.run('A'))
```
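On this tiny graph the result matches Dijkstra exactly: distances A:0.0, B:2.0, C:3.0, D:6.0, with parents B←A, C←B, D←C.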
Tuning. The delta bucket width controls how coarse your clusters are. A small delta behaves more like Dijkstra; a larger delta increases batching. In practice, try a delta around the median edge weight.
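A simple way to pick delta is to benchmark a few widths around the median edge weight. This loop is a hypothetical tuning harness, not part of the algorithm:

```python
import statistics
import time

# Collect all edge weights, then try bucket widths around their median.
weights = [w for nbrs in G.values() for w in nbrs.values()]
median_w = statistics.median(weights) if weights else 1.0
for delta in (0.5 * median_w, median_w, 2 * median_w):
    t0 = time.perf_counter()
    ClusteredSSSP(G, delta=delta).run('A')
    print(f"delta={delta:.2f}: {time.perf_counter() - t0:.4f}s")
```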
Guarantees. This teaching version is designed for clarity. It mimics the two key ideas and returns exact shortest paths on non‑negative graphs, but it is not the formal Tsinghua algorithm.
6) A realistic benchmark you can copy‑paste
Compare classic Dijkstra and the clustered teaching version on random sparse graphs.
```python
import random
import time

def make_sparse_graph(n, m, seed=0):
    rnd = random.Random(seed)
    G = {i: {} for i in range(n)}
    edges = set()
    while len(edges) < m:
        u = rnd.randrange(n)
        v = rnd.randrange(n)
        if u == v:
            continue
        if (u, v) in edges:
            continue
        w = 1 + rnd.random() * 9   # weights in [1, 10)
        G[u][v] = w
        edges.add((u, v))
    return G

# Re-use dijkstra(graph, start) from earlier; it works for int nodes too
def dijkstra_num(graph, start):
    import heapq
    dist = {u: float('inf') for u in graph}
    parent = {u: None for u in graph}
    dist[start] = 0.0
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if d != dist[u]:
            continue
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(pq, (nd, v))
    return dist, parent

n = 50_000
m = 200_000
G = make_sparse_graph(n, m, seed=42)
start = 0

# Dijkstra
t0 = time.time()
d1, _ = dijkstra_num(G, start)
t1 = time.time()

# Clustered SSSP
solver = ClusteredSSSP(G, delta=3.0)
t2 = time.time()
d2, _ = solver.run(start)
t3 = time.time()

print(f"Dijkstra: {t1 - t0:.2f}s | Clustered: {t3 - t2:.2f}s")

# sanity-check a few nodes (equality also covers unreachable nodes, inf == inf)
for node in [1, 2, 3, 4, 5]:
    print(node, d1[node] == d2[node] or abs(d1[node] - d2[node]) < 1e-9)
```
On big sparse graphs you will often see the clustered version do fewer heap operations and run faster for many weight distributions. Your mileage will vary with delta and graph shape.
7) Real‑world examples you can relate to
Maps and navigation. City routes with traffic as weights. Dijkstra gives the gold‑standard baseline. The newer approach is attractive at country scale where you process millions of roads repeatedly.
Logistics and delivery. Couriers and trucking. You run SSSP from many sources every hour as conditions change. Reducing the per‑run overhead has real cost savings.
Networks and CDNs. Shortest path equals lowest latency. Data centers do SSSP on large directed topologies. Batching and clustering can cut routing recomputation time.
Games and robotics. Grids and navigation meshes. Dijkstra plus heuristics (A*) is standard. For massive worlds or many NPCs, clustered frontiers can reduce work without losing optimality.
8) When to use what (decision guide)
Use Dijkstra if:
- The graph has up to a few million edges and single queries are fine.
- You want the simplest, most battle‑tested implementation.
- You might extend to A* with a heuristic later.
Try a clustered / batched approach if:
- Graphs are huge and sparse, or you rerun SSSP very frequently.
- Priority‑queue traffic is your bottleneck.
- You can tune a small parameter like delta and get wins in practice.
Remember: for integer weights in a small range, Dial’s buckets are often the fastest.
9) Practical tips and pitfalls
- Negative edges. None of the above works with negative weights. Use Bellman‑Ford or Johnson’s algorithm (reweight, then run Dijkstra).
- Early exit to one target. If you only need s → t, stop when you pop t from the frontier, as in the sketch after this list.
- Parent pointers. Always store parents so you can rebuild the actual path, not just the distance.
- Graph storage. Use adjacency lists (dict of dict in Python) for sparse graphs. Avoid dense matrices unless the graph is dense.
- Batching budget. If your batched relaxations are too aggressive, you may do extra work. Keep batches shallow (depth 1–2) unless profiling says otherwise.
- Unit tests. Write tests on tiny graphs where you know the answer. Compare algorithms on the same inputs.
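Here is the early‑exit tweak applied to the dijkstra() function from earlier; dijkstra_to_target is an illustrative name:

```python
import heapq

def dijkstra_to_target(graph, start, target):
    dist = {u: float('inf') for u in graph}
    parent = {u: None for u in graph}
    dist[start] = 0.0
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == target:
            return d, parent   # target's distance is final once popped
        if d != dist[u]:
            continue           # stale entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(pq, (nd, v))
    return float('inf'), parent
```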
10) FAQ
Q. Is the Tsinghua algorithm a drop‑in replacement for Dijkstra?
A. Not today. It is more complex and aimed at large‑scale scenarios. The ideas can inspire practical variants like the teaching version you just saw.
Q. Does the teaching version always return exact answers?
A. Yes for non‑negative weights as written. It is not the research algorithm, but it still computes exact SSSP.
Q. Should I switch all my code to the new style?
A. Benchmark first. On small or dense graphs, classic Dijkstra or Dial’s buckets can still be faster.
11) Copy‑paste snippets
Dijkstra (dict‑of‑dict):

```python
dist, parent = dijkstra(G, start)
```

Grid shortest path:

```python
cost, path = dijkstra_grid(grid, (sy, sx), (gy, gx))
```

Clustered SSSP:

```python
solver = ClusteredSSSP(G, delta=2.5)
dist, parent = solver.run(start)
```
12) Takeaways
Dijkstra remains the clearest, most dependable baseline for SSSP.
The new frontier‑clustering viewpoint shows that strict sorting is not sacred.
You can get practical speedups on large sparse graphs by batching relaxations and processing distance clusters.
Always measure on your real data before committing to any change.
Appendix: Reconstructing paths
```python
def rebuild_path(parent, s, t):
    # Walk parent pointers back from t, then reverse to get s -> t order.
    path = []
    cur = t
    while cur is not None:
        path.append(cur)
        cur = parent[cur]
    path.reverse()
    if path and path[0] == s:
        return path
    return []  # t is unreachable from s
```
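For the tiny example graph above, rebuild_path(parent, 'A', 'D') returns ['A', 'B', 'C', 'D'].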
Appendix: Turning Dijkstra into A*
Add a heuristic h(v) that never overestimates the true remaining cost. Push (dist[v] + h(v), v) into the heap instead of (dist[v], v). For Euclidean grids, h(v) can be the straight‑line distance to the goal.
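A minimal sketch of that change, applied to the grid search from earlier with a Manhattan‑distance heuristic (admissible on a 4‑connected grid with unit step costs); astar_grid is an illustrative name:

```python
from heapq import heappush, heappop

def astar_grid(grid, start, goal):
    H, W = len(grid), len(grid[0])
    INF = 10**18
    sy, sx = start
    gy, gx = goal

    def h(y, x):
        # Manhattan distance never overestimates on a 4-connected unit grid.
        return abs(y - gy) + abs(x - gx)

    dist = [[INF] * W for _ in range(H)]
    dist[sy][sx] = 0
    pq = [(h(sy, sx), sy, sx)]        # priority = dist + heuristic
    while pq:
        f, y, x = heappop(pq)
        if (y, x) == (gy, gx):
            return dist[y][x]
        if f != dist[y][x] + h(y, x):
            continue                   # stale entry
        for dy, dx in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
            ny, nx = y + dy, x + dx
            if 0 <= ny < H and 0 <= nx < W and grid[ny][nx] != '#':
                nd = dist[y][x] + 1
                if nd < dist[ny][nx]:
                    dist[ny][nx] = nd
                    heappush(pq, (nd + h(ny, nx), ny, nx))
    return INF  # goal unreachable
```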