Generating Unique and Sortable IDs in Distributed Systems


When building distributed systems, generating unique identifiers becomes tricky. But why do we need them to be both unique AND sortable?
Uniqueness is obvious - you can't have duplicate IDs causing data corruption or overwriting records. Sortability, though, brings some real benefits:
Natural ordering - IDs created later sort after earlier ones, giving you chronological order for free.
Database performance - inserting keys in ascending order appends to the end of a B-tree index instead of splitting pages in the middle, which means faster inserts, less index fragmentation, and efficient range queries.
Easier debugging - you can instantly see which records are newer just by looking at the ID.
In this post, I'll walk through the possible solutions to this problem and their trade-offs.
The Usual Suspects
Most developers reach for one of two standard approaches when they need unique IDs. Both seem reasonable at first, but they often show their limitations in distributed environments.
Database auto-increment works great until you realize that properly separated services don't share a database. Each service's database counts from 1 on its own, so two services will hand out the same IDs and the system as a whole loses uniqueness.
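To make the collision concrete, here's a minimal sketch using Python's built-in `sqlite3`, with two in-memory databases standing in for two services. The service, table, and helper names are hypothetical.

```python
import sqlite3


def make_service_db() -> sqlite3.Connection:
    """Hypothetical service database with its own auto-increment primary key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)"
    )
    return conn


# Two independent services, each owning its own database (names are illustrative)
orders_db = make_service_db()
billing_db = make_service_db()

# Each database numbers its rows independently, starting from 1
order_id = orders_db.execute("INSERT INTO records (payload) VALUES ('order')").lastrowid
invoice_id = billing_db.execute("INSERT INTO records (payload) VALUES ('invoice')").lastrowid

print(order_id, invoice_id)  # 1 1
print(order_id == invoice_id)  # True: the same ID now refers to two different records
```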
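Random (version 4) UUIDs are unique for all practical purposes and need no coordination, but they aren't sortable. Their value carries no information about when they were created, so they can't preserve creation order, which is often exactly what you need.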
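A quick check with Python's standard `uuid` module shows the problem: a version 4 UUID is 122 random bits plus a few fixed version/variant bits, so nothing in the value reflects when it was generated.

```python
import uuid

# Generate a few random (version 4) UUIDs, one after another
ids = [uuid.uuid4() for _ in range(5)]

print(all(u.version == 4 for u in ids))  # True
print(len(set(ids)) == len(ids))  # True: no duplicates
# Sorting by value almost always scrambles creation order
# (only 1 of the 120 possible orderings of 5 IDs matches it)
print(sorted(ids) == ids)
```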
Separate ID Service
You could create a dedicated service backed by an auto-increment database, serving as the single source of truth for IDs across all other services.
But here's the issue: it becomes a single point of failure for your entire system. Every other service now depends on it, and a simple operation like generating an ID suddenly costs a network round trip. That's a lot of overhead for such a small task.
Timestamp-Based IDs (The Smart Move)
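A more practical approach is to put a timestamp in the most significant bits of the ID and fill the rest with randomness (or, in Snowflake's case, a machine ID and a counter). Because the timestamp comes first, sorting the IDs sorts them by creation time, and no central service is involved. Depending on your use case, you can choose one of the following options: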
ULID
A 128-bit identifier made of a 48-bit millisecond timestamp followed by 80 bits of randomness. Encoded in Crockford's Base32, it produces a 26-character string that is unique with overwhelming probability and sortable by creation time (to the millisecond) without any coordination.
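```python
# pip install python-ulid
import time

from ulid import ULID  # python-ulid exposes a ULID class (ulid.new() is a different package's API)

id1 = ULID()
time.sleep(0.002)  # move to a later millisecond so the order doesn't depend on the random part
id2 = ULID()
print(f"ULID 1: {id1}")  # e.g. 01K47RE8HM22176GF3QEA5VXC4
print(f"ULID 2: {id2}")  # e.g. 01K47REDKRVYZ67YBB275ZFHJB
print(f"Sortable: {str(id1) < str(id2)}")  # True - chronological order preserved
```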
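To see where the sort order comes from, here's a hand-rolled, stdlib-only sketch of the layout described in the ULID spec: a 48-bit millisecond timestamp in the most significant bits, 80 random bits after it, and Crockford's Base32 encoding. It's an illustration, not how `python-ulid` is implemented, and it skips the spec's optional monotonic mode for IDs created within the same millisecond. `make_ulid` and `ulid_timestamp_ms` are hypothetical helper names.

```python
import os
import time
from typing import Optional

# Crockford's Base32 alphabet (no I, L, O, U), already in ASCII order
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def make_ulid(timestamp_ms: Optional[int] = None) -> str:
    """Sketch: 48-bit ms timestamp + 80 random bits, encoded as 26 Base32 chars."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    if not 0 <= timestamp_ms < 1 << 48:
        raise ValueError("timestamp does not fit in 48 bits")
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = (timestamp_ms << 80) | randomness  # 128-bit integer
    # 26 chars x 5 bits = 130 bits; the top 2 bits are always zero
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def ulid_timestamp_ms(ulid: str) -> int:
    """The first 10 chars hold 50 bits: 2 zero bits followed by the 48-bit timestamp."""
    value = 0
    for ch in ulid[:10]:
        value = (value << 5) | CROCKFORD.index(ch)
    return value


a = make_ulid(1_700_000_000_000)
b = make_ulid(1_700_000_000_001)
print(len(a))  # 26
print(a < b)  # True: a later millisecond always sorts later
print(ulid_timestamp_ms(a))  # 1700000000000
```

Because the alphabet is in ASCII order and every ULID has the same length, comparing the strings gives the same result as comparing the underlying 128-bit numbers.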
KSUID
A 160-bit identifier made of a 32-bit timestamp with one-second precision and 128 bits of randomness. Encoded in Base62 as a 27-character string, it’s highly collision-resistant and naturally ordered by time (to the second), making it well suited for event logs and analytics.
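```python
# pip install ksuid
import time

from ksuid import ksuid

# KSUID timestamps have one-second resolution: IDs created within the same
# second are ordered by their random payload, not by creation time
id1 = ksuid()
time.sleep(1)  # move to the next second so the timestamp decides the order
id2 = ksuid()
# As the sample output shows, str() here prints the 20 raw bytes as hex, not Base62
print(f"KSUID 1: {id1}")  # 1545e3c6325d397529c1124712f93aeaa3b2bf5c
print(f"KSUID 2: {id2}")  # 1545e3cac9c006f4bb64c753c68ff9143af33539
print(f"Sortable: {str(id1) < str(id2)}")  # True
```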
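The same idea with KSUID's layout, as described in Segment's KSUID specification: a 32-bit count of seconds since a custom epoch of 1400000000 (2014-05-13 16:53:20 UTC), followed by a 16-byte random payload, Base62-encoded and left-padded to 27 characters. Again, this is a stdlib-only sketch with hypothetical helper names (`make_ksuid`, `ksuid_unix_seconds`), not the `ksuid` package's code.

```python
import os
import time
from typing import Optional

KSUID_EPOCH = 1_400_000_000  # 2014-05-13 16:53:20 UTC
# Base62 alphabet in ASCII order, so string order matches numeric order
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def make_ksuid(unix_seconds: Optional[int] = None) -> str:
    """Sketch: 32-bit seconds since the KSUID epoch + 128 random bits, Base62."""
    if unix_seconds is None:
        unix_seconds = int(time.time())
    offset = unix_seconds - KSUID_EPOCH
    if not 0 <= offset < 1 << 32:
        raise ValueError("timestamp outside the 32-bit KSUID range")
    raw = offset.to_bytes(4, "big") + os.urandom(16)  # 20 bytes = 160 bits
    value = int.from_bytes(raw, "big")
    # 62**27 > 2**160, so 27 digits always suffice; leading zeros keep a fixed width
    digits = []
    for _ in range(27):
        value, remainder = divmod(value, 62)
        digits.append(BASE62[remainder])
    return "".join(reversed(digits))


def ksuid_unix_seconds(ksuid: str) -> int:
    """Decode the Base62 string and read the timestamp from the top 32 bits."""
    value = 0
    for ch in ksuid:
        value = value * 62 + BASE62.index(ch)
    return (value >> 128) + KSUID_EPOCH


a = make_ksuid(1_700_000_000)
b = make_ksuid(1_700_000_001)
print(len(a))  # 27
print(a < b)  # True: a later second always sorts later
print(ksuid_unix_seconds(a))  # 1700000000
```

Compared with the ULID sketch, the extra 48 random bits buy collision resistance, while the one-second timestamp gives coarser ordering.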
Snowflake ID
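Snowflake packs everything into a 64-bit integer: 1 unused sign bit (kept at 0 so the ID stays a positive signed integer), 41 bits for a millisecond timestamp, 10 bits for the machine ID, and 12 bits for a per-millisecond sequence number. That allows 1,024 machines, each issuing up to 4,096 IDs per millisecond, over about 69 years of timestamps, usually counted from a custom epoch rather than 1970. IDs from a single generator are strictly increasing, and IDs from different machines are roughly time-ordered, which makes them a good fit for database primary keys. The catch? You need to coordinate machine IDs across your fleet so that no two generators share one.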
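```python
import threading
import time


class Snowflake:
    # Layout: 1 unused sign bit | 41-bit ms timestamp | 10-bit machine ID | 12-bit sequence
    def __init__(self, machine_id: int):
        if not 0 <= machine_id <= 0x3FF:
            # Silently masking would alias two machines onto the same ID space
            raise ValueError("machine_id must fit in 10 bits (0-1023)")
        self.machine_id = machine_id
        self.sequence = 0
        self.last_timestamp = -1
        self.lock = threading.Lock()

    def _timestamp(self) -> int:
        # Milliseconds since the Unix epoch. For simplicity no custom epoch is
        # subtracted, so the 41-bit timestamp field only lasts until 2039.
        return int(time.time() * 1000)

    def next_id(self) -> int:
        with self.lock:
            timestamp = self._timestamp()
            if timestamp < self.last_timestamp:
                # The clock moved backwards; issuing an ID now could repeat an old one
                raise RuntimeError("Clock moved backwards; refusing to generate an ID")
            if timestamp == self.last_timestamp:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12 bits
                if self.sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one
                    while timestamp <= self.last_timestamp:
                        timestamp = self._timestamp()
            else:
                self.sequence = 0
            self.last_timestamp = timestamp
            return (timestamp << 22) | (self.machine_id << 12) | self.sequence


gen = Snowflake(machine_id=1)
id1 = gen.next_id()
id2 = gen.next_id()
print(id1)  # 7368985504009687040
print(id2)  # 7368985504009687041
print(id1 < id2)  # True
```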
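Because the layout is fixed, an ID can be taken apart again with a few shifts and masks. The sketch below (the helper name `decode_snowflake` is hypothetical) decodes the sample ID printed above and works out the limits of the 41/10/12 bit split. Since the class above counts milliseconds from the Unix epoch, its timestamp field runs out in September 2039, which is why deployments normally subtract a custom epoch.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_BITS, MACHINE_BITS, SEQUENCE_BITS = 41, 10, 12


def decode_snowflake(snowflake_id: int) -> dict:
    """Split a Snowflake ID into its timestamp, machine ID and sequence fields."""
    return {
        "timestamp_ms": snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS),
        "machine_id": (snowflake_id >> SEQUENCE_BITS) & ((1 << MACHINE_BITS) - 1),
        "sequence": snowflake_id & ((1 << SEQUENCE_BITS) - 1),
    }


print(decode_snowflake(7368985504009687040))
# {'timestamp_ms': 1756903053286, 'machine_id': 1, 'sequence': 0}

# Capacity of the 41/10/12 split
print(1 << MACHINE_BITS)  # 1024 machines
print(1 << SEQUENCE_BITS)  # 4096 IDs per millisecond per machine
years = (1 << TIMESTAMP_BITS) / (1000 * 60 * 60 * 24 * 365.25)
print(round(years, 1))  # 69.7

# Last millisecond representable when counting from the Unix epoch
unix_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
print(unix_epoch + timedelta(milliseconds=(1 << TIMESTAMP_BITS) - 1))
# 2039-09-07 15:47:35.551000+00:00
```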
Summary
Snowflake: Compact 64-bit numeric IDs, a good fit for database primary keys and ordered event streams (e.g., social media posts). Requires coordination to assign unique machine IDs. Use Snowflake if you want small, numeric IDs and can manage machine IDs.
ULID: Designed as a UUID replacement: 128 bits, so it fits wherever a UUID does, sortable, and encoded as a case-insensitive Base32 string that avoids ambiguous characters. No coordination needed. Use ULID if you want UUID-sized IDs that sort by creation time.
KSUID: Similar to ULID, but with 128 bits of randomness instead of 80 and Base62 encoding. The price is slightly larger IDs (160 bits, 27 characters) and coarser, one-second ordering. Use KSUID if collision resistance matters more to you than fine-grained ordering, e.g. for logs and analytics.
Choosing the right ID scheme depends on your system’s priorities. By understanding the trade-offs (for example, compact numeric IDs versus coordination-free generation, or collision resistance versus timestamp resolution), you can pick the approach that keeps your distributed system both scalable and easy to reason about.
Written by

jorzel
Backend developer with special interest in software design, architecture and system modelling. Trying to stay in a continuous learning mindset. Enjoy refactoring, clean code, DDD philosophy and TDD approach.