At their core, blockchains are databases where data is cryptographically secured using data structures such as Merkle trees, Merkle Patricia trees, and more. The unique characteristic of these data structures is that once data is securely committed to them, it is possible to produce proofs to confirm the inclusion of data inside the structure.
Ethereum employs different types of these trees for different purposes. The Merkle Patricia tree, a variant of the Merkle tree, is used for the State Trie, Receipts Trie, and the Transactions Trie.
Using these data structures provides multiple benefits. First and foremost, the use of Merkle trees and Merkle Patricia trees enhances the security of the Ethereum blockchain. By cryptographically hashing data at every level of the tree, it becomes virtually impossible to alter the data without detection. Any change to a data point would necessitate changes to its corresponding hashes up the tree to the root hash, which is publicly visible in the blockchain header. This fundamental feature of blockchain provides a high level of data integrity and immutability.
Secondly, these trees allow for efficient data verification. For example, when verifying the inclusion of a transaction or the state of a contract, instead of having to search through the entire Ethereum blockchain, one only needs to verify a path within the relevant Merkle tree.
To make generating proofs easy, Ethereum nodes support a eth_getProof
JSON RPC method which can be invoked to get a proof of inclusion (or commonly known as a state proof). Smart contracts cannot directly invoke this method as it is part of the Ethereum node’s interface and is intended to be used by external entities that interact with the node, such as users or applications running on the same machine or over a network. It’s not part of the Ethereum Virtual Machine’s (EVM) capabilities, which is the runtime environment for smart contracts. The EVM and the smart contracts that run within it are intentionally isolated for security reasons. They have limited capabilities and can only perform operations that are deterministic and consume a known amount of gas. They don’t have the ability to directly invoke JSON RPC methods, make HTTP requests, or perform other similar operations.
However, it’s possible for an external entity, such as a client or a server-side application, to call eth_getProof
to get a proof and then supply this proof as data to a function call to a smart contract. Finally, the proof can be verified against the state root of the block who’s hash can be retrieved from the EVM.
A key limitation of this method is that as the amount of data included in the proof grows , the size of the proof correspondingly increases. As a result , the amount of gas necessary to validate such proofs on-chain increases, making standard inclusion proofs not economically feasible in most cases.
Zero-knowledge proofs
Zero-knowledge proofs, such as zk-SNARKs and zk-STARKs, can be leveraged to reduce costs associated with proving the inclusion of data in large data sets and to validate that a workflow has been properly executed.
This is possible due to their ability to allow an entity (the prover) to demonstrate the accuracy of a computation to another party (the verifier) without disclosing any specifics of the computation or the underlying data.
The prover performs the required computation and creates a proof verifying its correctness. The verifier’s task is then to simply check the validity of this proof rather than having to repeat the entire computation. Due to this property, the verifier only needs to access a small piece of data, such as the root of a Merkle tree and not the full dataset involved in the computation.
This shift in approach is significant as it provides a viable way to lower costs associated with using inclusion proofs in smart contracts, especially when large datasets are involved.
Proving the continuity of the chain
A fundamental property of a blockchain is that it is a cryptographically linked list structure, where each element in the list is connected to its predecessor through a cryptographic hash. This linkage ensures the continuity and integrity of the chain, as any tampering with the data in a block would change its hash and break the connection to subsequent blocks.
By leveraging this property, a prover can demonstrate the authenticity of data originating from a blockchain by providing the verifier with a block hash that corresponds to a block older or equal to the blocks on which the computation is being performed. The verifier can then use this block hash to confirm the validity of the data used in the computation without needing to access the entire blockchain.
Providing millions of block headers for on-chain verification would be costly and difficult. To address this challenge, the computation of block hashes and corresponding computation validity proofs can be off-loaded to an off-chain prover. Even then, repeating this process each time data is accessed would be highly inefficient. To address this issue, we implement a cache that enables the prover to continuously update a cryptographic accumulator we call the “Headers Store” .
A defining feature of cryptographic accumulators is their adaptability. Depending on the specific blockchain or layer, accumulators can be tailored to utilize different hashing functions, optimizing both efficiency and computational costs. This flexibility is crucial given the varied designs of blockchains.
For instance, when dealing with a system that primarily uses STARKs or SNARKs, the Poseidon hash would be an ideal choice for hashing block headers and building the accumulator. This hash is more efficient than alternatives, such as the Keccak hash, making the accumulator both STARK and SNARK friendly.
Building the headers store
The first step to building the initial header store is to establish the authenticity of a block header. The verifier must start with a known block hash. A recent blockhash can be obtained by invoking the BLOCKHASH
opcode from within the EVM. We register this blockhash in the state of the contract that manages the process of proving the entire Ethereum blockchain.
The prover’s responsibility is to showcase that they hold the original data (preimage) from which this hash was produced. This data corresponds to the block header.
Next, we need to verify the lineage of blocks by tracing back their origins. From the affirmed block header, one can decode and pinpoint the hash of the preceding (or parent) block. The prover must demonstrate that they are aware of the preimage of this parent block’s hash. This process is iterative, allowing for verification back to the foundational block, commonly termed the ‘genesis block’.
On the prover side, once the original data (preimage) from the hash is identified, it undergoes a hash function specifically tailored for STARK and/or SNARK compatibility. The resulting hash is then incorporated into the accumulator.
The Merkle Mountain Range (MMR) data structure has been selected as our accumulator of choice primarily because it offers cheap appends compared to other similar data structures. This is crucial given the constant and predictable growth of blockchains. Additionally, their design also allows for cheap and efficient data queries, particularly when reading sequential data from large datasets.
Storage Proofs
A storage proof defined by Herodotus merges inclusion proofs, which confirm data’s presence, and proofs of computation which validate the execution of a multi-step workflow to attest the validity of one or multiple elements in a large dataset such as the whole Ethereum blockchain or a rollup.
The off-chain provers general flow is as follows:
Step 1
The Header Store accumulator will need to be used to obtain a commitment we can verify a proof against. If a recent block header is not yet in the accumulator, we must first prove the continuity of the chain to cover the block range where the target header data is found. For example, if the Headers Store smart contract only covers blocks 1,000,000 to 0, and the data we want to prove is found inside block 1,000,001, then an append to the Headers Store has to be made. If the target block is already covered by the accumulator, then we may proceed with generating the required proof.
Step 2
Next, the existence of the specific account within that block must be proven. This is achieved by generating an inclusion proof from the State Trie, which is a Merkle Patricia tree of all the accounts in the Ethereum network. The state root is a component of the block header that was used to derive the blocks commitment hash, which is a part of the Headers Store. Note that the block header hash appended to the accumulator may not necessarily be identical to the blockhash as we may have hashed the block header with a different hash to improve the efficiency of the system.
Step 3
The next step is to prove specific data within the accounts tree. Inclusion proofs for nonce, balance, storage root, or codeHash can now be generated. Each account in Ethereum also has its own storage trie — a Merkle Patricia Tree that holds the account’s storage data. If the data we want to prove is located in the Account Storage Trie, an additional inclusion proof is generated for a particular data point within this storage.
Once the off-chain prover generates all required inclusion proofs and associated proofs of computation we have what we define as a storage proof. The storage proof is next sent on-chain where it will be verified against a single initial commitment such as a blockhash or against the Header Store MMR root.
The use of zero-knowledge proofs, such as zk-SNARKs and zk-STARKs, plays a crucial role in reducing the costs associated with proving the inclusion of data in large datasets and validating the proper execution of workflows. By allowing the prover to demonstrate the accuracy of a computation without disclosing specifics, zero-knowledge proofs enable efficient verification while maintaining data privacy.
The flexibility of cryptographic accumulators, such as the Merkle Mountain Range (MMR) data structure, further enhances the efficiency of the storage proof verification process. By tailoring the hashing functions to optimize efficiency and computational costs, the system can be adapted to various blockchain designs and layer configurations.
In conclusion, Storage Proofs provide a robust and efficient solution for verifying the inclusion and validity of data within the Ethereum blockchain. By leveraging zero-knowledge proofs and cryptographic accumulators, this approach enables cost-effective and secure verification of large datasets, paving the way for more efficient and trustworthy blockchain-based applications.