Welcome Back! if this is your first time Welcome …

In today’s article, we dive deeper into one of the most fascinating data structure/ cryptographic proof I’ve become obsessed with. I think you’ll find it interesting too! From the title, you already know that we will be talking about Merkle Trees and Merkle Proofs, and how they are used in an airdrop distribution system.

We are going to explore:

What is a Merkle Tree?
How does a Merkle Proof work?
How are Merkle Trees applied in an airdrop distribution system

Understanding Merkle Trees

A Merkle tree, also known as a hash tree, is a fundamental data structure used in cryptography and blockchain systems. It efficiently verifies the integrity of large datasets by breaking them down into smaller parts and hashing those parts repeatedly until a single root hash is formed. Each “leaf” in a Merkle tree represents a hash of a data block, while every “parent” node represents a hash of its child nodes’ combined hashes. This structure allows for efficient and secure verification since only part of the tree needs to be traversed to confirm data validity. Think of it as a sort of upside down tree. Let’s take a look at a basic example .

Example of a Basic Merkle Tree

Let’s take the example of four data blocks, which are labeled as Data Block 1, Data Block 2, Data Block 3, and Data Block 4.

First, we hash each data block:
Hash of Data Block 1 results in Hash 1.
Hash of Data Block 2 results in Hash 2.
Hash of Data Block 3 results in Hash 3.
Hash of Data Block 4 results in Hash 4.

2. Then, we pair up adjacent hashes and combine them:

Hash 1 and Hash 2 are combined and hashed to form Hash 1–2.
Hash 3 and Hash 4 are combined and hashed to form Hash 3–4.

3. Finally, Hash 1–2 and Hash 3–4 are combined and hashed to create the Root Hash, which is the top node of the tree.

Here’s the Merkle tree visualization:

In this diagram:

The Root Hash is at the top, which represents the combined hash of all data blocks.
Each intermediate node, such as Hash 1–2 and Hash 3–4, is the hash of the concatenated hashes of the data blocks beneath it.
The leaf nodes (Data Block 1 to Data Block 4) contain the actual data, which are hashed to create the tree.

The advantage of this structure is that it allows for efficient verification of data. For example, if someone wants to verify the integrity of Data Block 1, they only need to know Hash 2, Hash 3–4, and the root hash. This reduces the amount of data needed for verification without requiring the entire dataset cool right ?.

Why Merkle Trees ?

You might be wondering, why use a Merkle tree, and why am I even writing this article? Well, the truth is, Merkle trees are a cornerstone in many verification and chain-link systems, from blockchain to version control systems like Git. Their primary strength is in ensuring data integrity while minimizing the amount of data stored and transferred.

In essence, a Merkle tree allows you to verify that a particular piece of data belongs to a larger dataset, without needing to store or transfer the entire dataset. This is especially useful for systems where efficiency and security are crucial, like blockchains (Bitcoin, Ethereum), decentralized applications, or even systems like AirDrop distribution, which we will dive into very soon.

But it’s not just in blockchain. Ever wondered how Git keeps track of all your file changes so efficiently? That’s right — Merkle trees. Git uses a Merkle-like structure to store commits, with each commit representing a snapshot of your project. When Linus Torvalds created Git, he probably wasn’t thinking, “I need a Merkle tree!” — but that’s exactly what Git ended up using. It helps Git keep track of every version of your code while ensuring that each commit can be verified for integrity.

And it’s not just Git. BitTorrent, the peer-to-peer file-sharing protocol, uses Merkle trees to efficiently verify the integrity of large files during download. By using small pieces of the file and their corresponding hashes, it ensures that a downloaded chunk matches the original without needing the entire file at once, making it a robust system for decentralized data transfer.

Even Amazon DynamoDB, a distributed database system, employs Merkle trees for replication and consistency. Dynamo uses them to compare data between replicas, ensuring that only differences are transferred, which saves bandwidth and accelerates data synchronization in distributed systems.

And hey, if Torvalds hadn’t built Git so well, we’d probably still be using CVS — and nobody wants that! now let dive into how to use it in an airdrop system and see how the proof works

Designing an Airdrop Distribution System with Merkle Tree

In an airdrop system, a Merkle Tree allows us to keep the entire list of airdrop recipients and their respective token amounts off-chain. Instead of storing all the data on-chain, we only store a commitment — a sort of cryptographic fingerprint known as the Merkle Root (or root hash as discuss earlier) — on the blockchain.

Here’s how it works: when someone wants to claim their airdrop, they submit a proof (we will look into that shortly) that includes their address and claimable amount (e.g., 100 tokens). The contract will then verify this proof by comparing it with the on-chain Merkle Root. If the proof is valid, the contract confirms that the claim is legitimate and releases the tokens.

The beauty of this system is that it significantly reduces on-chain storage costs (so storing on the blockchain is expensive ) while maintaining secure and efficient verification of claims through the Merkle Tree structure.

How It Works:

All the tree construction happens off-chain.
The contract storage only contains the Merkle Root (the “fingerprint”).
The off-chain part includes building the Merkle Tree from a set of data, such as airdrop addresses and token amounts.

At the bottom of our tree (the leaves), we have the actual data we care about:

All the addresses eligible for the airdrop.
The corresponding token amounts for each address.

In this example, we also include an index for each leaf, but I’ll explain that in a minute. For now, let’s focus on the basic chunks of data.

Hashing the Leaves: We take each of these leaves (the address and amount) and hash them. This hashing can be done by concatenating the address, amount, and other values (e.g., the index). The key point is to stay consistent. Once we concatenate the values, we hash them — represented by H. The result is a 256-bit hash.
Hashing the Pairs: After hashing each piece of data individually, we take each pair of hashes and combine them. We then hash the combined values together. This step is repeated at each level of the tree.
Constructing the Root: We continue combining and hashing pairs until we reach the top, resulting in the Merkle Root. This root is the fingerprint stored on-chain.

As developers, we compute all of this off-chain — it might require some computation, but since it’s off-chain, gas fees aren’t a concern. After the Merkle Tree is constructed, we either:

Deploy the contract with the Merkle Root stored within it, or
Use an administrative function to set the Merkle Root and write it into the contract.

Once the Merkle Root is in the contract, the airdrop system is ready to verify claims.

Here’s visualization of structure and flow of the Merkle tree for an airdrop system,

Before we implement the code let us take a sometime to talk about the way the proof works, that is how we know a particular data belongs to the tree

How Does This Work?

At first glance, it might seem strange that a system based on Merkle Trees works with such minimal data, yet maintains a high level of security. You might wonder, “How can I simply provide my data, a few additional hashes, and have the contract verify my claim without needing to see the full dataset?”

The magic lies in how hash functions and Merkle Trees interact to create a secure proof of membership, even when most of the data is not directly provided to the smart contract.

Step-by-Step Breakdown:

Merkle Tree Construction:
A Merkle Tree is a binary tree where:
Leaves are the actual data (e.g., addresses and token amounts in an airdrop).
Internal nodes are hashes derived from combining the hashes of their child nodes.
The Merkle Root at the top is the ultimate “fingerprint” of the entire dataset.

All the leaves at the bottom represent individual pieces of data (addresses and claim amounts). When the Merkle Tree is created off-chain, each leaf is hashed, and those hashes are combined in pairs, hashed again, until reaching the Merkle Root, which is stored in the contract.

2. How Verification Works: When you want to claim your tokens, instead of submitting the entire dataset, you submit:

Your leaf data (address and claim amount).
A Merkle Proof, which includes the sibling hashes along the path from your leaf to the root.

The smart contract verifies the authenticity of your data by:

Hashing your leaf data.
Using the sibling hashes to recreate the path up the Merkle Tree.
Once the contract computes the Merkle Root, it compares it to the stored root.

If the roots match, your proof is valid, and the contract approves your claim.

Proofs and Tree Structure

If we simply wanted to store a fingerprint of data on-chain, we could concatenate the entire dataset, hash it, and store the hash in the contract. However, this wouldn’t allow us to produce Merkle Proofs.

Without a tree structure, you’d need to provide all the other data alongside your own to verify it, which would be incredibly expensive. This is where Merkle Trees shine, allowing us to prove membership efficiently with only a fraction of the data.

How Merkle Proofs Work:

To claim tokens, you need to provide the following data:

Your Leaf (Data): This is your address and the token amount you’re claiming. The contract hashes this data.
Sibling Hashes: These are the hashes needed to compute the path from your leaf to the root.

The contract performs the following steps:

Hashes your data.
Uses sibling hashes to compute the next level’s hash.
Repeats this process until it reaches the Merkle Root.
Compares the computed root with the stored root.

If the computed root matches, the proof is valid.

Proof Complexity: ( for my cs nerds)

For each proof of membership in the Merkle Tree, you need to provide one additional hash (sibling) per level of the tree. The beauty of this approach lies in the fact that the height of the tree is logarithmic in relation to the number of leaves.

For example:

With 1 million data leaves, the tree has a height of 21.
If you increase the number of leaves to 1 billion (a 1000x increase), the tree’s height only grows to 30 or 31 levels.

This logarithmic property means the proof complexity is very efficient. Even if the dataset grows massively, the height of the tree increases only slightly. The formula for proof complexity is O(log₂(n)), where n is the number of leaves.

Therefore, producing and verifying proofs is highly efficient, no matter how large the dataset is. The contract only needs to store one root (a 256-bit hash), even if the tree contains billions of items.

Why the System is Secure

It seems almost too simple that I can provide just my leaf (address and amount) and a few sibling hashes, and the system will trust that my claim is legitimate. But the key lies in the properties of cryptographic hash functions and Merkle Trees. Here’s why this works securely:

1. Cryptographic Hash Functions are Collision-Resistant:

A hash function is a one-way cryptographic algorithm that maps an input (of any size) to a fixed-size output, called a hash. Good hash functions, like SHA-256, have a few important properties:

Deterministic: The same input will always produce the same output.
Irreversible: It is practically impossible to reverse-engineer the original input from its hash.
Collision-Resistant: It is incredibly difficult to find two different inputs that produce the same hash (a collision).

Because of this, the hash function ensures that even the smallest change to the input data results in a completely different hash. For example, if I try to modify the amount I’m eligible to claim by even a single bit, the resulting hash will be wildly different from the correct hash.

2. The Role of the Merkle Tree:

The structure of the Merkle Tree takes advantage of this collision resistance. Let’s say you try to manipulate your claim:

Step 1: You alter your address or claim amount. This change results in a completely different hash for your leaf.
Step 2: You would then need to find a fake sibling hash that, when combined with your new leaf hash, still produces the same parent hash as before.
Step 3: You continue this process all the way up the tree to the root.

The challenge here is that you need to ensure each new hash at every level matches what the contract expects. But because hash functions are so sensitive to input changes, finding valid sibling hashes that would result in the same root is essentially impossible.

3. Impossibility of Faking Proofs:

If you manipulate any part of your data (e.g., increasing the amount of tokens you want to claim), the hash of your leaf changes.
This means you would need to invent fake hashes at every level of the tree to produce the correct root.
Due to the collision-resistant nature of hash functions, finding such a set of fake hashes would take an astronomically long time — far longer than the lifetime of the universe, using any current computational methods.

Even though there are technically multiple inputs that could hash to the same output (due to the pigeonhole principle), finding those collisions is computationally infeasible with modern hash functions like SHA-256 incase of the EVM Keccak256 . So, it’s effectively impossible to cheat the system.

Implementing a Merkle Tree Airdrop

Let’s walk through the process of implementing a Merkle tree-based airdrop system using Solidity and JavaScript.

1. Creating the Merkle Tree

First, we need to create our Merkle tree off-chain. Here’s a JavaScript implementation using the OpenZeppelin Merkle tree library:

const { StandardMerkleTree } = require("@openzeppelin/merkle-tree");  
const fs = require("fs");  
const values = \[  

  \["0x70997970C51812dc3A010C7d01b50e0d17dc79C8", "0", "10000000000000000000"\],  
  \["0x3C44CdDdB6a900fa2b585dd299e03d12FA4293BC", "1", "20000000000000000000"\],  
  // ... more addresses and amounts ...  
\];  

const tree = StandardMerkleTree.of(values, \["address", "uint256", "uint256"\]);  

console.log("Merkle Root:", tree.root);  
fs.writeFileSync("tree.json", JSON.stringify(tree.dump(), null, 2), "utf8");

This script creates a Merkle tree from a list of addresses, indices, and token amounts, then saves the tree structure to a JSON file.

2. Smart Contract Implementation

Now, let’s look at the Solidity contract that handles the airdrop claims:

// SPDX-License-Identifier: MIT  
pragma solidity 0.8.24;  

import {IERC20} from "@openzeppelin/contracts/token/ERC20/IERC20.sol";  
import {BitMaps} from "@openzeppelin/contracts/utils/structs/BitMaps.sol";  
import {MerkleProof} from "@openzeppelin/contracts/utils/cryptography/MerkleProof.sol";  

contract MerkleDrop {  
    IERC20 public immutable tokenAddress;  
    bytes32 public merkleRoot;  
    BitMaps.BitMap internal airdropCheckList;  

    constructor(address _tokenAddress, bytes32 _merkleRoot) {  

        tokenAddress = IERC20(_tokenAddress);  
        merkleRoot = _merkleRoot;  
    }  

    function claimAirDrop(bytes32[] calldata proof, uint256 index, uint256 amount) external {  

        require(!BitMaps.get(airdropCheckList, index), "Already claimed");  

        bytes32 leaf = keccak256(bytes.concat(keccak256(abi.encode(msg.sender, index, amount))));  

        require(MerkleProof.verify(proof, merkleRoot, leaf), "Invalid proof");  

        BitMaps.setTo(airdropCheckList, index, true);  

        tokenAddress.transfer(msg.sender, amount);  
    }  
}

Key points in this contract:

We use a bitmap to efficiently track claimed airdrops we could have used mapping but we are just trying to save a bit of 1(true) and 0(false) so using mapping will be an over kill .
The **claimAirDrop** function verifies the Merkle proof before transferring tokens.
Double-hashing is used to prevent preimage attacks.

3. Generating Proofs for Claims

To claim their airdrop, users need to provide a Merkle proof. Here’s how we can generate these proofs:

const { StandardMerkleTree } = require("@openzeppelin/merkle-tree");  
const fs = require("fs");  

function generateProof(address) {  
  const tree = StandardMerkleTree.load(JSON.parse(fs.readFileSync("tree.json", "utf8")));  
  for (const \[i, v\] of tree.entries()) {  
    if (v\[0\].toLowerCase() === address.toLowerCase()) {  
      return {  
        value: v,  
        proof: tree.getProof(i),  
      };  
    }  
  }  
  throw new Error(\`Address ${address} not found in Merkle tree\`);  
}  
/// the output of this will be out proof that we will pass into the claimairdrop   
/// function

This function loads the Merkle tree from our JSON file and generates a proof for a given address.

Full Codebase

If you want the full codebase, check out my GitHub repository and wrote some hardhat tests too.

Gas Efficiency and Scalability

The beauty of using Merkle trees for airdrops lies in their gas efficiency and scalability:

Constant Gas Cost: The gas cost for claiming an airdrop remains constant regardless of the number of total recipients.
Minimal On-Chain Storage: Only the Merkle root is stored on-chain, not the entire list of recipients.
Scalable to Millions: The system can handle millions of recipients without significant increases in complexity or cost.

Security Considerations

While Merkle trees provide an efficient solution, there are security aspects to consider:

Proof Generation: Ensure that proof generation is done securely off-chain.
Double-Claiming Prevention: The bitmap in the contract prevents double-claiming.
Root Updates: If the airdrop list needs updating, a new Merkle root can be set by the contract owner.

Conclusion: Security Without Sacrifice — Why Merkle Trees Are Pure Genius

Merkle Trees work like magic — but it’s all math! Here’s the quick rundown:

First, hash functions (like SHA-256) are super secure. They take any input, hash it, and even the tiniest change in the input completely alters the hash. This makes tampering with data nearly impossible.

Second, Merkle Trees are incredibly efficient. Even with millions of entries, verifying a single airdrop claim only takes a few steps. All you need is your claim and a few sibling hashes. The smart contract then verifies everything without storing huge amounts of data on-chain, keeping costs low while maintaining security.

The reason this system works so well is that it leverages the collision resistance of hash functions and the logarithmic structure of Merkle Trees. The Merkle Proof lets the contract efficiently check whether your data is part of a larger set, all while keeping on-chain operations minimal.

In short, Merkle Trees make airdrop verification fast, secure, and scalable. Pretty genius, right?

Thanks for Sticking With Me!

If you’ve made it this far, pat yourself on the back! You’re now well-versed in the magical world of Merkle Trees, and you’ve seen firsthand how they can make airdrop distribution (and other use cases) both scalable and secure. It’s not every day that you dive into the inner workings of cryptographic proofs and come out on the other side with a smile!

So, whether you’re a developer looking to implement this system, or someone just curious about how blockchain sorcery works under the hood, I hope you enjoyed the ride! Thanks for reading to the end — and remember, you now hold the power of Merkle Magic in your hands!

Go forth and prove your membership with confidence!

Referecences

https://hackernoon.com/merkle-trees-181cb4bc30b4
Merkle Trees for Airdrops on EVM: Logic, Use Cases, and Solidity Implementation (I strongly recommend)

Merkle Trees and Efficient Airdrop Distribution System

Table of contents