5/5
## Understanding Incremental Merkle Trees Incremental Merkle Trees (IMTs) are a specialized type of Merkle tree designed to address a critical challenge in blockchain systems: efficiently updating Merkle trees that are stored on-chain. While standard Merkle trees are powerful for data verification, they become cumbersome and expensive when data needs to be added or changed dynamically within a decentralized environment. IMTs provide an elegant solution, enabling Merkle trees to be stored entirely on-chain and updated incrementally without excessive computational overhead. This lesson delves into the conceptual workings of IMTs, exploring how their fixed size, pre-population with zero values, and caching mechanisms achieve remarkable efficiency. Understanding IMTs is key to grasping the mechanics behind advanced blockchain technologies like ZK-sync and applications such as Tornado Cash, which rely on dynamic, on-chain Merkle trees. ## The Challenge with Standard Merkle Trees On-Chain Standard Merkle trees are a cornerstone of blockchain technology, primarily used for verifying data integrity and inclusion. Typically, the full Merkle tree is constructed and maintained off-chain. Only its single, unique Merkle root hash is stored on-chain, for instance, within a smart contract. This setup is efficient for static datasets. However, a "sneaky little problem" arises when the dataset is dynamic – what if we need to add a new piece of data, like an address to an allowlist, or update an existing entry? If a leaf node in the off-chain tree changes, its hash changes. This, in turn, changes the hash of its parent node, and this effect cascades all the way up to the Merkle root. The new Merkle root would then need to be resubmitted to the on-chain smart contract. This process typically requires a centralized entity to manage the off-chain tree, perform the recalculations, and submit the new root. Such centralization introduces a point of trust, which is often undesirable in decentralized systems. Consider the alternative: storing the entire Merkle tree on-chain. If we attempted this with a standard Merkle tree, adding a new leaf node would necessitate recalculating all affected parent nodes up to the root, *all on the blockchain*. Each hash calculation consumes gas. For a large tree, the cumulative gas cost of these on-chain recalculations would be astronomically high, likely exceeding block gas limits and rendering the operation impractical. This is where Incremental Merkle Trees offer a viable path forward. ## A Quick Refresher on Merkle Trees Before diving deeper into IMTs, let's briefly revisit the fundamentals of standard Merkle trees. A Merkle tree is a hierarchical data structure used for efficient data storage and verification. It allows one to prove that a specific piece of data is part of a larger set without revealing the entire set. Construction involves taking individual data items (leaves), hashing them, then pairing adjacent hashes and hashing them together. This process repeats, with pairs of hashes being combined and hashed, moving up the levels of the tree until a single hash, the Merkle root, is produced. In a typical scenario, this tree is managed off-chain. To verify if a piece of data (let's call its hash "Hash X") is part of the set, a user provides the original data (the preimage of Hash X) and a Merkle proof. The Merkle proof consists of the sibling hashes along the path from Hash X up to the Merkle root. A smart contract on-chain can then take the preimage, hash it, and use the provided sibling hashes to recalculate the path to the root. If the calculated root matches the Merkle root stored in the contract, the data's inclusion is verified. This works well for static data. However, applications like Tornado Cash, where new deposits (leaves) are continually added to the tree, require a dynamic Merkle tree. For trustlessness, the entire tree, or at least its state, needs to be accessible and verifiable on-chain. Furthermore, adding new leaves must be efficient and decentralized, avoiding the high gas costs and centralization issues of standard on-chain tree recalculations. This necessitates two primary requirements for dynamic on-chain Merkle trees: 1. The ability to store the Merkle tree (or its representation) on-chain. 2. A mechanism to efficiently add or modify leaves without a full, costly recalculation. Incremental Merkle Trees are engineered to meet these precise requirements. ## Introducing Incremental Merkle Trees: The Core Concepts An Incremental Merkle Tree (IMT) is a Merkle tree distinguished by two fundamental properties: it has a **fixed depth (or height)**, and its leaves are **initially populated with "zero" values**. These characteristics, combined with intelligent caching, enable efficient on-chain updates. **1. Fixed Size (Depth)** IMTs are defined with a predetermined, fixed size, specifically a fixed depth. The depth of a tree is the number of hash operations (or "hops") required to go from any leaf node to the root. For instance, a tree with 4 leaves might have a depth of 2 (leaf -> intermediate hash -> root hash). An 8-leaf tree would have a depth of 3. The levels of a tree can be visualized as distinct layers: * Level 0: The leaf nodes (actual data or their hashes). * Level 1: Hashes of pairs of Level 0 nodes. * Level 2: Hashes of pairs of Level 1 nodes. * And so on, until the final level, which contains the single Merkle root. Nodes in the tree can be identified by their (level, index) coordinates, both typically zero-indexed. The fixed depth (D) directly determines the maximum number of leaves the tree can hold: **Number of leaves = 2<sup>D</sup>**. Each additional level in the tree doubles the potential number of leaves. For example, Tornado Cash famously uses an IMT with a depth of 20, allowing for 2<sup>20</sup> (1,048,576) leaves. This is a substantial number, and if updating it required recalculating a significant portion of 2<sup>19</sup> hashes at the first level of hashing, it would be computationally prohibitive on-chain without the optimizations IMTs provide. **2. Pre-populating Leaves with "Zero" Values** The second key property is that, upon initialization, all leaf nodes in an IMT are filled with a predefined "zero" value. This isn't literally the number zero, but rather a constant, known value, such as the hash of a specific string (e.g., `hash("tornado")` in Tornado Cash's case). This initial state, where all leaves are "zero," allows for the pre-calculation of the entire tree. The initial intermediate nodes and the initial Merkle root are all derived from these zero leaves. For example, in an 8-leaf tree (depth 3), all 8 leaves at Level 0 would initially be `hash("tornado")`. The Level 1 nodes would be `hash(hash("tornado") || hash("tornado"))`, and this pattern would continue up to the root. **Understanding Zero Subtrees** A "subtree" is any node within the main Merkle tree along with all its descendants; it can be thought of as a smaller Merkle tree in its own right, with that node as its root. Even a single leaf node is a subtree of depth 0. In an IMT, subtrees composed entirely of these initial "zero" values are called "zero subtrees." The crucial insight is that the Merkle roots of these zero subtrees can be pre-calculated and known beforehand. Instead of storing the entire initial zero tree, smart contracts leveraging IMTs often store the roots of these zero subtrees for every possible depth up to the main tree's depth (minus one, as the full tree's initial zero root can be derived). For a depth 3 tree, this means storing: * The hash of a single zero leaf (root of a zero subtree of depth 0). * The root of a zero subtree of depth 1 (formed from two zero leaves). * The root of a zero subtree of depth 2 (formed from four zero leaves). These precomputed zero subtree roots are vital for efficient updates. **Adding Leaves: The Incremental Update Process** New, actual data leaves are added to the IMT by replacing the "zero" leaves, typically in a left-to-right sequence. When a new leaf is inserted at a specific index (say, index `i`), it replaces the zero value at that position. Consider an 8-leaf tree (depth 3). If we insert the first actual leaf at index 0, this new leaf value is used. To calculate the new Merkle root, we only need to recompute the nodes along the direct path from this new leaf at index 0 to the root. Its sibling at index 1 is still a zero leaf (or more precisely, we use the precomputed root of a zero subtree of depth 0 for that position). Their parent hash is computed. This new parent is then hashed with its sibling, which would be the root of a precomputed zero subtree of depth 1 (representing zero leaves at indices 2 and 3), and so on. If we later add a leaf at index 4, leaves 0-3 might already be filled with actual data. Leaf 4 is the new insertion. Leaves 5-7 are still zero values. Again, only the path from index 4 to the root needs recomputation, utilizing the actual data to its left (or cached representations) and the precomputed zero values to its right. The key observations for efficiency here are: 1. Everything to the **left** of the newly inserted leaf (which is already filled with actual data) remains unchanged and can be represented by cached values. 2. Everything to the **right** of the inserted leaf's path consists of **zero values**, whose combined hashes (zero subtree roots) are already known. **The Role of Caching Filled Subtrees** As leaves are added from left to right, subtrees become entirely filled with actual (non-zero) data. Once a subtree is completely populated, its root hash can be calculated and **cached** within the smart contract. This is a powerful optimization. A cached subtree root represents all the information of its child nodes. Since new leaves are added sequentially to the right, a filled subtree to the left will not change until the entire tree is filled (a scenario not typically focused on for incremental additions). How many cached subtree roots do we need to store? Remarkably, we only need to cache the roots of the smallest set of non-overlapping, fully populated subtrees that represent the current state of the occupied portion of the tree. This set of necessary cached subtree roots corresponds directly to the binary representation of the index of the *next leaf to be added*. For example, in a depth 3 tree (8 leaves), if we are about to insert a leaf at index 4: * The binary representation of 4 is `100`. * Reading from left to right (most significant bit to least significant bit, corresponding to larger to smaller subtrees): * The `1` indicates we need one cached subtree of depth 2 (2<sup>2</sup>=4 leaves). This subtree covers leaves 0-3, which are presumed to be filled. Its root is retrieved from cache. * The next `0` indicates no cached subtree of depth 1 is needed from this segment. * The final `0` indicates no cached subtree of depth 0 is needed from this segment. So, to update for leaf 4, we'd use the cached root of the subtree covering leaves 0-3, the new leaf 4 itself, and then zero subtree roots for the remaining parts to its right. If inserting at index 13 in a depth 4 tree (16 leaves): * 13 in binary is `1101`. * We would need: * `1` cached root of a depth 3 subtree (covering leaves 0-7). * `1` cached root of a depth 2 subtree (covering leaves 8-11). * `0` cached roots of depth 1 subtrees. * `1` cached root of a depth 0 subtree (representing leaf 12). These three cached roots, plus the new leaf 13, are combined with appropriate zero subtree roots to calculate the new Merkle root. **Determining Hashing Order** When combining two nodes (either two leaves, two intermediate hashes, a leaf and a zero subtree root, or a cached root and another value) to compute their parent hash, the order matters: `Hash(A || B)` is different from `Hash(B || A)`. IMTs follow a consistent rule: **hashing is performed Left to Right**. * Nodes at an **even index** within their level are considered to be on the left. * Nodes at an **odd index** within their level are considered to be on the right. Let's trace an example of adding a leaf at an **even index**: inserting into index 4 (depth 3 tree, leaves 0-3 are filled and their combined root is cached as a depth 2 subtree). 1. The new leaf is at **index 4 (even)**. Its sibling is at index 5 (odd), which is currently a zero leaf. We take the new leaf value (left) and hash it with the precomputed "zero subtree root of depth 0" (right): `Hash(new_leaf_4 || zero_depth_0_root)`. This result is the intermediate node at Level 1, Index 2. 2. This new node (Level 1, Index 2 - also an even index relative to its hashing partner) is then combined with the node to its right. This right-hand node represents leaves 6 and 7, which are currently zero values. So, we use the precomputed "zero subtree root of depth 1". The hash is: `Hash(intermediate_node_L1_I2 || zero_depth_1_root)`. This result is the intermediate node at Level 2, Index 1. 3. This new node (Level 2, Index 1 - an odd index relative to its hashing partner) is combined with the node to its left. This left-hand node represents leaves 0-3, which are filled. We use the "cached subtree root of depth 2" for this. The hash is: `Hash(cached_depth_2_root || intermediate_node_L2_I1)`. This gives the new overall Merkle Root. The newly inserted leaf at index 4 is now cached as a "filled subtree of depth 0" for future updates (e.g., when adding to index 5). Now, consider adding a leaf at an **odd index**: inserting into index 5 (depth 3 tree, leaves 0-3 are filled, leaf 4 is filled). 1. The new leaf is at **index 5 (odd)**. Its sibling is at index 4 (even), which is already filled (and its value cached as a depth 0 subtree from the previous step). We take the "cached root for leaf 4" (left) and hash it with the new leaf value for index 5 (right): `Hash(cached_leaf_4_root || new_leaf_5)`. This result is the intermediate node at Level 1, Index 2. 2. This new node (Level 1, Index 2 - even index) represents leaves 4 and 5, both now filled. It is combined with the node to its right (representing leaves 6-7, still zero values), so we use the "zero subtree root of depth 1". The hash is: `Hash(intermediate_node_L1_I2 || zero_depth_1_root)`. This result is the intermediate node at Level 2, Index 1. 3. This new node (Level 2, Index 1 - odd index) is combined with the node to its left (the cached root for leaves 0-3). The hash is: `Hash(cached_depth_2_root || intermediate_node_L2_I1)`. This is the new Merkle Root. Now, the intermediate node at Level 1, Index 2 (representing filled leaves 4 and 5) can be cached as a "filled subtree root of depth 1". This systematic use of precomputed zero values and cached filled subtrees drastically reduces the number of on-chain hash calculations needed for each update. ## The Efficiency and Benefits of Incremental Merkle Trees Incremental Merkle Trees achieve remarkable efficiency through a combination of their core design principles: * **Precomputation of Zero Subtrees:** By knowing the structure and the initial "zero" placeholder value, the hashes for all-zero subtrees of various depths can be calculated once and stored. These serve as default values for unoccupied parts of the tree. * **Caching of Filled Subtrees:** As sections of the tree are populated with actual data, their root hashes are computed and cached. These cached roots represent known, filled portions of the tree, avoiding redundant calculations. * **Logarithmic Complexity:** When adding a new leaf, the number of new hash calculations required is proportional to the depth of the tree (D), not the total number of leaves (2<sup>D</sup>). For a depth 3 tree, only 3 new hashes (nodes along the path to the root) need to be computed for each new leaf insertion, utilizing the precomputed zero roots and cached filled roots. This logarithmic complexity (O(log N) where N is the number of leaves) is a massive improvement over potentially recalculating large portions of the tree. The primary benefit of IMTs is their ability to enable Merkle trees to be stored and managed directly within smart contracts, allowing for on-chain updates without incurring prohibitive gas costs or creating vulnerabilities like unbounded loops or denial-of-service vectors through excessive computation. This makes them extremely useful for a variety of on-chain applications that require dynamic, trustlessly verifiable datasets, such as the deposit management in Tornado Cash or state management in ZK-Rollups like ZK-sync. IMTs represent a significant advancement in building scalable and efficient decentralized systems.
A crucial dive into Understanding Incremental Merkle Trees - Discover how IMTs tackle the challenge of efficient on-chain updates, outperforming standard Merkle trees for dynamic data. This lesson covers their fixed-depth structure, zero-value pre-population, and caching strategies for optimized blockchain interactions.
Previous lesson
Previous
Next lesson
Next
Give us feedback
Course Overview
About the course
Noir syntax
Create a witness, a proof, and Solidity verifier contracts
Use the Poseidon commitment scheme
Create ZK circuits and build a full ZK protocol
ZK Merkle trees and hashing in Noir
Verify signatures without revealing the signer
Build the backend for a full-stack ZK application with noir.js and bb.js
How to create proofs and verify them in a front-end
Last updated on June 12, 2025
Duration: 6min
Duration: 1h 11min
Duration: 2h 12min
Duration: 3h 19min
Course Overview
About the course
Noir syntax
Create a witness, a proof, and Solidity verifier contracts
Use the Poseidon commitment scheme
Create ZK circuits and build a full ZK protocol
ZK Merkle trees and hashing in Noir
Verify signatures without revealing the signer
Build the backend for a full-stack ZK application with noir.js and bb.js
How to create proofs and verify them in a front-end
Last updated on June 12, 2025