CRDTs: The Secret Sauce Behind Real-Time Collaboration

MaverickMaverick
7 min read

Introduction

Have you ever wondered how apps like Google Docs, Figma, or Notion allow multiple people to edit the same document at the same time—without conflicts, even if someone is offline? The answer is a mind-blowing concept called CRDTs (Conflict-free Replicated Data Types).

This blog will take you on a deep dive into CRDTs, with visuals, analogies, and Java code examples. By the end, you’ll have a superpower few developers know!


What Are CRDTs?

A CRDT is a data structure designed for distributed systems, where multiple copies (replicas) can be updated independently and concurrently, then merged automatically—without conflicts or coordination.

  • Conflict-free: No matter the order or timing of updates, all replicas converge to the same state.

  • Replicated: Each user/device/server has its own copy.

  • Data Type: It can be a counter, set, list, map, etc.


Real-World Analogy

Imagine a group of friends keeping a shared to-do list on sticky notes. Each friend can add or check off items on their own copy. When they meet, they merge their lists by combining all unique items and checked-off tasks. No matter the order, everyone ends up with the same list!

  [User A]   [User B]   [User C]
     |          |          |
     v          v          v
 [Replica]  [Replica]  [Replica]
     |          |          |
     +---- Merge/Sync ----+
               |
         [Converged State]

Why Are CRDTs Exciting?

  • No Merge Conflicts: Edits from different users never clash.

  • Offline-First: Users can work offline; changes sync later.

  • No Central Server Needed: Works peer-to-peer or in distributed clouds.

  • Mathematically Proven: Guarantees eventual consistency.


Types of CRDTs

  1. State-based (CvRDT): Each replica periodically sends its full state to others, which merge using a mathematical join.

  2. Operation-based (CmRDT): Replicas send only the operations (like “add X”), which are commutative and can be applied in any order.


Example: G-Counter (Grow-only Counter)

Let’s see a simple CRDT in action—a distributed counter that only increments.

A G-Counter (Grow-only Counter) is a state-based CRDT. Each replica (user, device, or server) keeps its own integer counter. When a replica wants to increment, it only increases its own counter. To get the total value, all counters are summed. When replicas sync, they merge by taking the maximum value for each replica’s counter—so no increments are ever lost.

Why is this conflict-free?

  • Each replica only increments its own slot, so there are no write conflicts.

  • Merging is done by taking the maximum for each slot, so increments are never overwritten or lost, no matter the order of merges.

  • The sum of all slots always reflects the total number of increments across all replicas.

Here’s the code:

import java.util.HashMap;
import java.util.Map;

public class GCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    public void increment(String replicaId) {
        counts.put(replicaId, counts.getOrDefault(replicaId, 0) + 1);
    }

    public void merge(GCounter other) {
        for (String id : other.counts.keySet()) {
            int current = counts.getOrDefault(id, 0);
            int incoming = other.counts.get(id);
            counts.put(id, Math.max(current, incoming));
        }
    }

    public int getValue() {
        return counts.values().stream().mapToInt(Integer::intValue).sum();
    }
}

Usage:

GCounter a = new GCounter();
GCounter b = new GCounter();
a.increment("A"); // Replica A increments its own slot
b.increment("B"); // Replica B increments its own slot
// Now, a: {A=1}, b: {B=1}
a.merge(b); // or b.merge(a)
// After merge, both: {A=1, B=1}
System.out.println(a.getValue()); // 2

No matter the order of merges, the value is always correct! This makes G-Counter a perfect fit for distributed systems where increments can happen independently and need to be merged without conflicts.


Advanced Example: 2P-Set (Two-Phase Set)

A 2P-Set (Two-Phase Set) is a CRDT that allows both adding and removing elements, but with a twist: once an element is removed, it can never be added again. This is achieved by maintaining two grow-only sets:

  • addSet: Tracks all elements ever added.

  • removeSet: Tracks all elements ever removed.

How does it work?

  • To add an element, insert it into addSet.

  • To remove an element, insert it into removeSet (but only if it was previously added).

  • The current set is addSet - removeSet (all added elements that haven't been removed).

  • When merging two 2P-Sets, simply take the union of both addSet and removeSet.

Why is this conflict-free?

  • Both sets only ever grow (no deletions), so merging is just a union—no conflicts.

  • If two replicas add the same element, it appears only once.

  • If one replica removes an element, it is removed everywhere after merging, and cannot be re-added.

Here’s the code:

import java.util.HashSet;
import java.util.Set;

public class TwoPSet<T> {
    private final Set<T> addSet = new HashSet<>();
    private final Set<T> removeSet = new HashSet<>();

    public void add(T element) {
        addSet.add(element);
    }

    public void remove(T element) {
        if (addSet.contains(element)) {
            removeSet.add(element);
        }
    }

    public void merge(TwoPSet<T> other) {
        addSet.addAll(other.addSet);
        removeSet.addAll(other.removeSet);
    }

    public Set<T> getElements() {
        Set<T> result = new HashSet<>(addSet);
        result.removeAll(removeSet);
        return result;
    }
}

Usage:

TwoPSet<String> a = new TwoPSet<>();
TwoPSet<String> b = new TwoPSet<>();
a.add("apple");      // Replica A adds "apple"
b.add("banana");     // Replica B adds "banana"
a.remove("apple");   // Replica A removes "apple"
a.merge(b);          // Merge both replicas
System.out.println(a.getElements()); // [banana]

Step-by-step:

  • Initially, a has {apple} in addSet, b has {banana} in addSet.

  • a removes "apple", so removeSet in a is {apple}.

  • After merging, addSet is {apple, banana}, removeSet is {apple}.

  • The result is {banana} (since apple is in both addSet and removeSet).

Why can't you re-add?

  • Because removeSet only grows, once an element is removed, it will always be subtracted from the result, even if you try to add it again later.

This makes 2P-Set a simple, robust CRDT for distributed sets where "remove forever" semantics are acceptable.


Advanced Example: RGA (Replicated Growable Array) for Collaborative Text

Imagine two people are writing on the same whiteboard, but they can't see each other in real time. Each person writes their version of a word. Later, they meet and combine their work so that nothing is lost.

Let's say:

  • Person 1 writes: "Hello"

  • Person 2 writes: "Hallo"

When they combine their work, both versions of the second letter ('e' and 'a') are kept. The result could look like:

H e/a l l o

This means both 'e' and 'a' are present after 'H', and the rest of the word is the same. The exact order might depend on the system, but the important thing is that no one's work is lost.

How does this work?

  • Every letter you type is given a unique tag (like a sticky note with your name and a number).

  • When you merge, you line up all the sticky notes from both people, keeping all the letters.

  • If two people write different letters in the same spot, both are kept.

Why is this useful?

  • No matter how many people are editing, or if they are offline, everyone's changes are saved and combined.

  • This is how apps like Google Docs let people type at the same time without losing anyone's work.

Snapshot:

  • Think of collaborative text CRDTs as a super-smart whiteboard that never erases anyone's writing, even if you write at the same time as someone else. When you combine everyone's work, all the letters are still there!

  • Implementing a full RGA is complex, but libraries like Automerge (JS), Yjs (JS), and Automerge-java exist for real-world use.


More Powerful CRDTs

  • PN-Counter: Supports both increments and decrements.

  • G-Set / 2P-Set: Sets that allow only adds, or both adds and removes.

  • RGA (Replicated Growable Array): For collaborative text editing.


Where Are CRDTs Used?

  • Real-time collaborative editors (Google Docs, Figma, Notion)

  • Distributed databases (Redis, Riak, AntidoteDB)

  • Offline-first mobile apps

  • Messaging and chat apps


CRDTs vs. Other Approaches

  • Operational Transformation (OT): Used in Google Docs, but requires a central server and complex logic. OT can be tricky to scale and reason about.

  • CRDTs: Peer-to-peer, mathematically simpler, and more robust for distributed systems. CRDTs guarantee eventual consistency without a central coordinator.

Visual: OT vs. CRDTs

OT:  [User A] <-> [Server] <-> [User B]
CRDT: [User A] <-> [User B] (direct, no central server needed)

Challenges and Gotchas

  • Memory Usage: State-based CRDTs can grow large, especially with many users or edits. Garbage collection and pruning strategies are important.

  • Complexity: Some CRDTs (like for text editing) are tricky to implement. Use libraries when possible.

  • Security: Merging untrusted data can be risky—validate inputs and consider cryptographic signatures for sensitive data.

  • Testing: Testing distributed, eventually consistent systems is hard! Use property-based testing and simulation frameworks.


Summary Table

FeatureCRDTsTraditional Data Types
Merge ConflictsNeverPossible
Offline SupportYesOften No
Peer-to-PeerYesRare
ComplexityMedium-HighLow

Conclusion

CRDTs are a game-changer for building collaborative, distributed, and offline-capable apps. They let you focus on features, not conflict resolution. Next time you want to impress a fellow developer, explain CRDTs—and watch their eyes light up!


May your replicas always converge, and your merges be forever conflict-free!

0
Subscribe to my newsletter

Read articles from Maverick directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Maverick
Maverick