Introduction

This document describes experiments in Geth with chunk-based code merkleization according to EIP-2926, analyzing the merkleization of the contracts touched during block execution in a full sync.

We use the SSZ merkleization implemented in the fastssz library, which included:

  • New structures/schemas defined to represent a binary tree.
  • Proof generation and proof verification.

Geth changes

A Geth fork was created, and features were added to merkleize contracts when performing a full sync.

Changes are described below.

New modules

Two new modules are introduced to the Geth code base:

  • ssz (codetrie/ssz): contains the schema defined for the code tree.
  • codetrie: contains the logic needed to manipulate the tree.

codemerkleization flag

Changed files:

  • cmd/geth/chaincmd.go
  • cmd/geth/usage.go
  • cmd/geth/main.go
  • cmd/utils/flags.go
  • eth/gen_config.go
  • eth/backend.go
  • eth/config.go

A new flag was added to indicate to Geth that we want to analyze how the contracts in a block would be merkleized, and to obtain the size of the proofs required for the contracts in each block.

state_processor

path: core/state_processor.go

One of the main changes in Geth is in the state_processor. In the Process function, a new "Contract Bag" is created. A ContractBag is a map where the key is the code hash and the value is the contract code; this avoids duplicating contracts with identical bytecode.

Then, each transaction in the block is applied. While the EVM interpreter executes the opcodes, it collects information about the touched opcodes and the chunks those opcodes belong to.

After the contracts' bytecode and the touched opcodes/chunks are collected, the stats are calculated by calling the bag.Stats() method. This method merkleizes each contract and generates the proof needed for that contract in that specific block. The proof consists of indices, hashes, zero levels, and leaves; the sum of those values across all contracts is taken as a "proof size" metric for the block. The indices, hashes, and zero levels are also serialized as RLP, and the RLP size is used as another measure of the proof size.

The results are stored in a csv (cm-result.csv) file with the following fields:

  • Block number
  • Code size: the sum of all contracts' bytecode sizes in the block.
  • Proof size: the sum of all indices, zero levels, hashes, and leaves.
  • RLP size: size of the RLP-encoded compressed Multiproof (indices, zero levels, hashes, and leaves).
  • UnRLPSize: size of the RLP-encoded uncompressed Multiproof (indices, leaves, hashes).
  • SnappySize: size of the indices, zero levels, hashes, and leaves, serialized as RLP and then compressed using Snappy compression.
  • Total indices
  • Total zero levels
  • Total hashes
  • Total Leaves

Run (EVM Interpreter)

path: core/vm/interpreter.go

When evaluating the contract code in the Run function, it checks that the -codemerkleization flag is set and that the ContractBag was initialized correctly; it also avoids merkleizing code that does not correspond to a deployed contract (i.e. contract creation code).

The Contract is retrieved from the contract bag; if the contract does not exist in the bag yet, a new Contract object is created.

The chunk containing the opcode at the current program counter (pc) is marked as "touched".

If the current opcode is CODECOPY, it also touches the chunks for the range of code being copied.

When the current opcode is EXTCODECOPY, it also retrieves the code of the "external" contract being copied and marks that code as touched.

In the case of init code (contract creation code), it checks whether the total code size is greater than 0xc000 (49152) bytes; if so, the code is added to a map in ContractBag, where the key is the contract's codeHash and the value is its total code size.

Added code

This is the directory tree for the new code added to Geth:

├── codetrie
│   ├── ssz
│   │   ├── types_encoding.go
│   │   └── types.go
│   ├── bin_hex_test.go
│   ├── codetrie.go
│   ├── codetrie_test.go
│   ├── contract.go
│   ├── op.go
│   ├── ssz_test.go
│   └── transition.go

Changed code

├── cmd
│   ├── geth
│   │   ├── chaincmd.go
│   │   ├── main.go
│   │   └── usage.go
│   └── utils
│       ├── flags.go
├── core
│   ├── ...
│   ├── state_processor.go
│   ├── ...
├── eth
│   ├── backend.go
│   ├── config.go
│   ├── gen_config.go

Schema

The SSZ schema is defined as a series of types needed to represent the code trie. After these types are defined, the schema is processed by FastSSZ to generate all the code needed to create the tree and to generate and verify proofs; this process generates Go source code in the types_encoding.go file.

Types

path: codetrie/ssz/types.go

CodeTrie

CodeTrie is the type created to represent the code tree:

type CodeTrie struct {
	Metadata *Metadata
	Chunks   []*Chunk `ssz-max:"1024"`
}

The code trie consists of some Metadata and the code Chunks.

Metadata

The metadata stores the Version, CodeHash, and total CodeLength, according to the EIP:

type Metadata struct {
	Version    uint8
	CodeHash   Hash `ssz-size:"32"`
	CodeLength uint16
}

Hash

The Hash used in the Metadata's CodeHash is defined as a byte array.

type Hash []byte

Chunk

Each Chunk contains a FIO, the First Instruction Offset: the offset of the first byte in the chunk that is an actual instruction (i.e. not part of a PUSH* opcode's input).

The Chunk type also contains the chunk's bytecode, up to 32 bytes.

type Chunk struct {
	FIO  uint8
	Code []byte `ssz-size:"32"` // Last chunk is right-padded with zeros
}

CodeTrie

A new Go module codetrie was added; it contains functions to create contract objects and the corresponding trees for each contract.

ContractBag (Type)

path: codetrie/contract.go.

A ContractBag contains a contracts map where the key is a Hash (the contract code hash) and the value is a Contract object pointer.

type ContractBag struct {
	contracts map[common.Hash]*Contract
}

Contract (Type)

path: codetrie/contract.go.

A Contract object contains the code and the set of chunks that were touched during the execution of the contract. When a chunk is touched, the map entry keyed by the chunk index is set to true.

type Contract struct {
	code          []byte
	touchedChunks map[int]bool
}

CMStats (type)

path: codetrie/contract.go

type CMStats struct {
	NumContracts int
	ProofSize    int
	CodeSize     int
	ProofStats   *ssz.ProofStats
	RLPStats     *ssz.RLPStats
}

Chunk (Type)

path: codetrie/codetrie.go.

A Chunk object contains two fields:

  • fio: the first instruction offset in the chunk.
  • code: the bytecode in the chunk, up to 32 bytes.

type Chunk struct {
	fio  uint8 // firstInstructionOffset
	code []byte
}

RLPStats (type)

type RLPStats struct {
	RLPSize    int
	UnRLPSize  int
	SnappySize int
}

NewContractBag

path: codetrie/contract.go

Returns a new empty ContractBag object.

func NewContractBag() *ContractBag {
	return &ContractBag{
		contracts: make(map[common.Hash]*Contract),
	}
}

Stats (ContractBag Method)

path: codetrie/contract.go

For each contract in the ContractBag, it gets the code size and proves the contract, obtaining a Multiproof and a CompressedMultiproof.

Stats for the Multiproof, the CompressedMultiproof, and the RLP-encoded proof are accumulated in the main stats object.

func (b *ContractBag) Stats() (*CMStats, error) {
	stats := NewCMStats()
	stats.NumContracts = len(b.contracts)
	for _, c := range b.contracts {
		stats.CodeSize += c.CodeSize()
		rawProof, err := c.Prove()
		if err != nil {
			return nil, err
		}
		p := ssz.NewMultiproof(rawProof)
		cp := ssz.NewCompressedMultiproof(rawProof.Compress())

		ps := cp.ProofStats()
		stats.ProofStats.Add(ps)

		rs, err := ssz.NewRLPStats(p, cp)
		if err != nil {
			return nil, err
		}
		stats.RLPStats.Add(rs)
	}
	stats.ProofSize = stats.ProofStats.Sum()
	return stats, nil
}

Get (ContractBag Method)

path: codetrie/contract.go

It checks whether the contract is already in the contract bag by looking up the hash of the input code in the contracts map; if found, it returns the pointer to the Contract for that code hash.

If the contract was not previously added to the bag, it creates a new Contract for the provided bytecode.
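A minimal runnable sketch of this lookup, using crypto/sha256 as a stand-in for the Keccak-256 code hash Geth actually uses; the simplified types here are assumptions for illustration:

```go
package main

import (
	"crypto/sha256" // stand-in for Geth's Keccak-256 code hash
	"fmt"
)

type Contract struct {
	code          []byte
	touchedChunks map[int]bool
}

func NewContract(code []byte) *Contract {
	return &Contract{code: code, touchedChunks: make(map[int]bool)}
}

type ContractBag struct {
	contracts map[[32]byte]*Contract
}

// Get returns the Contract for the given code, creating it on first sight.
func (b *ContractBag) Get(code []byte) *Contract {
	h := sha256.Sum256(code)
	if c, ok := b.contracts[h]; ok {
		return c
	}
	c := NewContract(code)
	b.contracts[h] = c
	return c
}

func main() {
	bag := &ContractBag{contracts: make(map[[32]byte]*Contract)}
	a := bag.Get([]byte{0x60, 0x00})
	b := bag.Get([]byte{0x60, 0x00})
	fmt.Println(a == b) // true: identical bytecode maps to one Contract
}
```

This is what avoids duplicating contracts with identical bytecode: both calls return the same pointer.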

CodeSize (Contract Method)

path: codetrie/contract.go

Returns the contract's code size.

Prove (Contract Method)

path: codetrie/contract.go

Creates a new SSZ tree from the contract's code, specifying a chunk size of 32.

A new array of metadata indices is created, initialized with the indices 7, 8, 9, and 10.

mdIndices := []int{7, 8, 9, 10}
=> [7, 8, 9, 10]

Another array is created containing the touched chunks; each chunk obtains a generalized index corresponding to its place in the tree.

Creates a Multiproof based on the array of metadata indices and chunk indices.

The multiproof is compressed and returned.

This Multiproof will have the following Tree structure:

          1
         / \
        /   \
       /     \
      /       \
     /         \
    2           3
   / \        /   \ 
  4   5      6     7
 / \ / \    / \  
8  9 10 11 ... ... 

  • 6 - Root of the tree of depth 10 (Contains the touched chunks)
  • 7 - Number of chunks
  • 8 - Version
  • 9 - CodeHash
  • 10 - Code length
  • 11 - Empty field
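The numbering above follows SSZ generalized indices: the root is 1 and node i has children 2i and 2i+1. A minimal sketch of these index helpers (hypothetical names, not taken from the codebase):

```go
package main

import "fmt"

// parent returns the generalized index of node i's parent.
func parent(i int) int { return i / 2 }

// children returns the generalized indices of node i's two children.
func children(i int) (int, int) { return 2 * i, 2*i + 1 }

func main() {
	l, r := children(3)
	fmt.Println(l, r)       // 6 7, matching the diagram above
	fmt.Println(parent(10)) // 5
}
```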

NewContract

path: codetrie/contract.go

Returns a Contract object containing the code and an empty map of touchedChunks.

TouchPC (Contract Method)

path: codetrie/contract.go

Calculates the chunk number for the pc (chunk_number = pc / 32).

Marks the chunk as touched by creating an entry (if it does not already exist) in the touchedChunks map, where the key is the chunk number and the value is true.

TouchRange (Contract Method)

path: codetrie/contract.go

Calculates the chunks covering the given range of code and marks them as touched.
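A runnable sketch of these two methods under the chunk_number = pc / 32 rule; the simplified Contract type and the TouchRange signature are assumptions, and the fork's actual code may differ:

```go
package main

import "fmt"

type Contract struct {
	touchedChunks map[int]bool
}

// TouchPC marks the 32-byte chunk containing the program counter as touched.
func (c *Contract) TouchPC(pc int) {
	c.touchedChunks[pc/32] = true
}

// TouchRange marks every chunk overlapping the byte range [start, end).
func (c *Contract) TouchRange(start, end int) {
	for i := start / 32; i <= (end-1)/32; i++ {
		c.touchedChunks[i] = true
	}
}

func main() {
	c := &Contract{touchedChunks: make(map[int]bool)}
	c.TouchPC(70)                     // pc 70 lives in chunk 2
	c.TouchRange(30, 70)              // e.g. a CODECOPY over bytes 30..69 spans chunks 0, 1, 2
	fmt.Println(len(c.touchedChunks)) // 3 distinct chunks touched
}
```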

GetSSZTree

path: codetrie/codetrie.go

Builds a new CodeTrie, then calls the fastssz-generated code to obtain the SSZ tree, which is returned.

prepareSSZ

path: codetrie/codetrie.go

Chunkifies the provided code; if the requested chunk size is different from 32, the error "MerkleizeSSZ only supports chunk size of 32" is returned.

Calculates the First Instruction Offset (FIO) for each of the chunks.

Creates the metadata, which consists of the version (0), the code hash, and the code length.

Returns a new CodeTrie containing the metadata and chunks.
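The metadata step can be sketched as follows, using crypto/sha256 as a stand-in for the Keccak-256 code hash; newMetadata is a hypothetical helper name:

```go
package main

import (
	"crypto/sha256" // stand-in for Geth's Keccak-256 code hash
	"fmt"
)

type Metadata struct {
	Version    uint8
	CodeHash   [32]byte
	CodeLength uint16
}

// newMetadata builds the metadata described above: version 0,
// the code hash, and the total code length.
func newMetadata(code []byte) Metadata {
	return Metadata{
		Version:    0,
		CodeHash:   sha256.Sum256(code),
		CodeLength: uint16(len(code)),
	}
}

func main() {
	md := newMetadata([]byte{0x60, 0x00, 0x00})
	fmt.Println(md.Version, md.CodeLength) // 0 3
}
```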

Chunkify

path: codetrie/codetrie.go

Calculates the total number of chunks required for the given bytecode.

Splits the code into 32-byte chunks and also finds each chunk's first instruction offset (FIO); this accounts for the fact that a chunk's initial bytes might be part of a preceding PUSH* input and must not be treated as opcodes.

setFIO

Loops over the bytecode in the chunks; when a PUSH* opcode is found, it checks whether the PUSH* input data extends beyond the current chunk and, if so, computes the offset of the first instruction in the next chunk (so that PUSH* data is not treated as instructions).
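The chunking and FIO logic can be sketched as follows. This is a simplified illustration of the idea, not the actual Chunkify/setFIO implementation; for instance, chunks that fall entirely inside a large PUSH*'s data are not specially handled here.

```go
package main

import "fmt"

type Chunk struct {
	fio  uint8  // first instruction offset within the chunk
	code []byte // up to 32 bytes (last chunk not padded in this sketch)
}

// chunkify splits code into 32-byte chunks, then walks the bytecode once,
// skipping PUSH* immediates, to set each chunk's FIO when push data spills
// over a chunk boundary.
func chunkify(code []byte) []Chunk {
	n := (len(code) + 31) / 32
	chunks := make([]Chunk, n)
	for i := 0; i < n; i++ {
		end := (i + 1) * 32
		if end > len(code) {
			end = len(code)
		}
		chunks[i].code = code[i*32 : end]
	}
	for pc := 0; pc < len(code); {
		op := code[pc]
		size := 1
		if op >= 0x60 && op <= 0x7f { // PUSH1..PUSH32
			size += int(op-0x60) + 1 // length of the immediate data
		}
		next := pc + size
		// If the instruction's data crosses a chunk boundary, the chunk
		// the next instruction lands in starts with that many data bytes.
		if next/32 > pc/32 && next < len(code) {
			chunks[next/32].fio = uint8(next % 32)
		}
		pc = next
	}
	return chunks
}

func main() {
	code := append([]byte{0x7f}, make([]byte, 32)...) // PUSH32 + 32 data bytes
	code = append(code, 0x00)                         // STOP lands in the second chunk
	fmt.Println(chunkify(code)[1].fio)                // 1: one data byte spills into chunk 1
}
```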

NewRLPStats

path: codetrie/ssz/proof.go

Serializes the CompressedMultiproof as RLP and gets the RLP size (RLPSize).

Serializes the uncompressed Multiproof as RLP and gets the RLP size (UnRLPSize).

The uncompressed Multiproof's RLP is compressed using Snappy compression (SnappySize).

The RLPStats object is returned containing the previous fields RLPSize, UnRLPSize, and SnappySize.

FastSSZ

https://github.com/ferranbt/fastssz

FastSSZ Changes

#26 - Added the possibility to use a different hash function; Sina wanted to use keccak256 as the hash function.

#27 - Added PutUint32, PutUint16, and PutUint8 to the hasher, which previously only had PutUint64.

#28 - Created a new schema; added a verification function for one of the structs; some hand-crafted test cases; pending: make verify more general so it is not tied to one struct. New files:

  • tests/codetrie.go
  • tests/codetrie_encoding.go: automatically generated by fastssz
  • tests/codetrie_test.go: verify proof tests

#32 - Added a new proof.go, which includes functions to verify proofs and multiproofs, as well as other helpers.

#33 - Added support for multiproof generation.

#38 - Adds a --experimental flag to get the tree-backing of an object. Given the tree-backing, you can compute the root and generate and verify proofs.

#39 (pending PR) - Remove duplicate leafs from proof

ProveMulti

path: tree.go

Gets the required indices (getRequiredIndices).

Defines proof as a Multiproof object, initialized with the provided indices, an empty Leaves array of arrays, and an empty Hashes array of arrays.

For each provided index, it gets the node at that index and adds its value as a leaf.

For each index in reqIndices, it gets the node at that index and adds its hash to Hashes.

Returns the proof.

func (n *Node) ProveMulti(indices []int) (*Multiproof, error) {
	reqIndices := getRequiredIndices(indices)
	proof := &Multiproof{Indices: indices, Leaves: make([][]byte, len(indices)), Hashes: make([][]byte, len(reqIndices))}

	for i, gi := range indices {
		node, err := n.Get(gi)
		if err != nil {
			return nil, err
		}
		proof.Leaves[i] = node.value
	}

	for i, gi := range reqIndices {
		cur, err := n.Get(gi)
		if err != nil {
			return nil, err
		}
		proof.Hashes[i] = hashNode(cur)
	}

	return proof, nil
}

Compress (Multiproof)

  • (Multiproof) Compress: Returns a *CompressedMultiproof

path: tree.go

// Compress returns a new proof with zero hashes omitted.
// See `CompressedMultiproof` for more info.
func (p *Multiproof) Compress() *CompressedMultiproof {
	compressed := &CompressedMultiproof{
		Indices:    p.Indices,
		Leaves:     p.Leaves,
		Hashes:     make([][]byte, 0, len(p.Hashes)),
		ZeroLevels: make([]int, 0, len(p.Hashes)),
	}

	for _, h := range p.Hashes {
		if l, ok := zeroHashLevels[string(h)]; ok {
			compressed.ZeroLevels = append(compressed.ZeroLevels, l)
			compressed.Hashes = append(compressed.Hashes, nil)
		} else {
			compressed.Hashes = append(compressed.Hashes, h)
		}
	}

	return compressed
}

Multiproof:

// Multiproof represents a merkle proof of several leaves.
type Multiproof struct {
	Indices []int
	Leaves  [][]byte
	Hashes  [][]byte
}

CompressedMultiproof:

// CompressedMultiproof represents a compressed merkle proof of several leaves.
// Compression is achieved by omitting zero hashes (and their hashes). `ZeroLevels`
// contains information which helps the verifier fill in those hashes.
type CompressedMultiproof struct {
	Indices    []int
	Leaves     [][]byte
	Hashes     [][]byte
	ZeroLevels []int // Stores the level for every omitted zero hash in the proof
}