Introduction
This document describes experiments in Geth using chunk-based merkleization according to EIP-2926, analyzing the merkleization of contracts touched during block execution in a full sync.
We are using the SSZ merkleization implemented in the fastssz library, which included:
- New structures/schemas defined in order to represent a binary tree.
- Proof generation and proof verification.
Geth changes
A Geth fork was created with added features to merkleize contracts when performing a full sync.
Changes are described below.
New modules
Two new modules are introduced to the Geth code base:
- ssz (codetrie/ssz): contains the schema defined for the code tree.
- codetrie: contains the logic needed to manipulate the tree.
codemerkleization flag
Changes:
- cmd/geth/chaincmd.go
- cmd/geth/usage.go
- cmd/geth/main.go
- cmd/utils/flags.go
- eth/gen_config.go
- eth/backend.go
- eth/config.go
A new flag was added to indicate to Geth that we want to analyze how the contracts in a block would be merkleized and to obtain the size of the proofs required for the contracts in each block.
state_processor
path: core/state_processor.go
One of the main changes in Geth is in the state_processor.
In the Process function, a new "contract bag" is created. A ContractBag is a map where the key is the code hash and the value is the contract code; this avoids duplicating contracts with identical bytecode.
Then, each transaction in the block is applied. While the EVM interpreter executes the opcodes, it collects information about which opcodes were touched and which chunks those opcodes belong to.
After the contracts' bytecode and the touched opcodes/chunks are collected, the stats are calculated by calling the bag.Stats() method. This method merkleizes each contract and generates the proof needed for that contract in that specific block. The proof consists of indices, hashes, zero levels, and leaves, and the sum of those values over all contracts is taken as the "proof size" metric of the block. Indices, Hashes, and Zero Levels are also serialized as RLP, and the RLP size is used as another measure of the proof size.
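As a rough, self-contained illustration of how these pieces fit together (the import path and method signatures follow the sketches later in this document and are assumptions, not the fork's exact API):

import (
    "fmt"

    "github.com/ethereum/go-ethereum/codetrie"
)

// Illustrative only: one ContractBag per block, contracts registered as they
// are executed, chunks marked as touched, then Stats() merkleizes everything.
func exampleBlockMerkleization() error {
    bag := codetrie.NewContractBag()

    code := []byte{0x60, 0x01, 0x60, 0x02, 0x01} // PUSH1 0x01, PUSH1 0x02, ADD
    c := bag.Get(code)                           // assumed signature, see Get below
    c.TouchPC(0)                                 // mark the chunk containing pc=0

    stats, err := bag.Stats()
    if err != nil {
        return err
    }
    fmt.Println("contracts:", stats.NumContracts, "proof size:", stats.ProofSize)
    return nil
}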
The results are stored in a CSV file (cm-result.csv) with the following fields:
- Block number
- Code size: The sum of all contracts' bytecode sizes in the block.
- Proof size: The sum of all indices, zero levels, hashes, and leaves.
- RLP Size: Size of the RLP-encoded Compressed Multiproof (indices, zero levels, hashes, and leaves).
- UnRLPSize: Size of the RLP-encoded uncompressed proof (indices, leaves, hashes).
- SnappySize: Size of the indices, zero levels, hashes, and leaves, serialized as RLP and then compressed using Snappy compression.
- Total indices
- Total zero levels
- Total hashes
- Total Leaves
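For reference, a row with those fields could be written using the standard encoding/csv package, roughly as in the sketch below (illustrative only; the fork's actual CSV writer and column formatting are not shown in this document):

import (
    "encoding/csv"
    "os"
    "strconv"
)

// writeResultRow appends one result row to cm-result.csv.
// Sketch only; field order follows the list above.
func writeResultRow(blockNum uint64, fields []int) error {
    f, err := os.OpenFile("cm-result.csv", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
    if err != nil {
        return err
    }
    defer f.Close()

    record := []string{strconv.FormatUint(blockNum, 10)}
    for _, v := range fields { // code size, proof size, RLP size, ...
        record = append(record, strconv.Itoa(v))
    }

    w := csv.NewWriter(f)
    if err := w.Write(record); err != nil {
        return err
    }
    w.Flush()
    return w.Error()
}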
Run (EVM Interpreter)
path: core/vm/interpreter.go
When evaluating the contract code in the Run function, the interpreter checks whether the -codemerkleization flag is set and the ContractBag was initialized correctly; it also avoids merkleizing code which does not correspond to a contract (i.e. contract creation code).
The Contract is retrieved from the contract bag; if the contract does not exist in the contract bag yet, a new Contract object is created.
The chunk corresponding to the opcode at the current Program Counter (pc) is marked as "touched".
If the current opcode is CODECOPY, it will also touch the range of code being copied. When the current opcode is EXTCODECOPY, it will also retrieve the code of the "external" contract we want to copy and mark that code as touched.
In the case of init code (contract creation code), it checks whether the total code length is greater than 0xc000 (49152) bytes; if it is, the code is added to a map in ContractBag, where the key is the codeHash of the contract and the value is its total code size.
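A minimal sketch of that check, with hypothetical names (only the 0xc000 threshold and the key/value shapes come from the description above; the map field and method are illustrative):

import "github.com/ethereum/go-ethereum/common"

// Hypothetical sketch: record init code larger than 0xc000 bytes.
const initCodeSizeThreshold = 0xc000 // 49152 bytes

func (b *ContractBag) noteLargeInitCode(codeHash common.Hash, code []byte) {
    if len(code) > initCodeSizeThreshold {
        b.largeInitCodes[codeHash] = len(code) // hypothetical map field on ContractBag
    }
}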
Added code
This is the directory tree for the new code added to Geth
├── codetrie
│ ├── ssz
│ │ ├── types_encoding.go
│ │ └── types.go
│ ├── bin_hex_test.go
│ ├── codetrie.go
│ ├── codetrie_test.go
│ ├── contract.go
│ ├── op.go
│ ├── ssz_test.go
│ └── transition.go
Changed code
├── cmd
│ ├── geth
│ │ ├── chaincmd.go
│ │ ├── main.go
│ │ └── usage.go
│ └── utils
│ ├── flags.go
├── core
│ ├── ...
│ ├── state_processor.go
│ ├── ...
├── eth
│ ├── backend.go
│ ├── config.go
│ ├── gen_config.go
Schema
The SSZ schema is defined as a series of types needed to represent the code trie. After these types are defined, the schema is processed by FastSSZ to generate everything needed to create the tree and to generate and verify proofs; this process produces Go source code in the types_encoding.go file.
Types
path: codetrie/ssz/types.go
CodeTrie
CodeTrie
is the type created to represent the code tree:
type CodeTrie struct {
Metadata *Metadata
Chunks []*Chunk `ssz-max:"1024"`
}
The code trie consists of some Metadata and the code Chunks.
Metadata
The metadata will store the Version
, CodeHash
and total CodeLength
,
according to the EIP
type Metadata struct {
Version uint8
CodeHash Hash `ssz-size:"32"`
CodeLength uint16
}
Hash
The Hash used in the Metadata's CodeHash is defined as a byte array.
type Hash []byte
Chunk
Each Chunk contains a FIO, which is the First Instruction Offset, i.e. the offset of the first byte in the chunk that is not part of a PUSH* opcode's input.
The Chunk type also contains the chunk's bytecode, which can be up to 32 bytes.
type Chunk struct {
FIO uint8
Code []byte `ssz-size:"32"` // Last chunk is right-padded with zeros
}
CodeTrie
A new Go module codetrie was added. This module contains functions that allow creating contract objects and the corresponding trees for each contract.
ContractBag (Type)
path: codetrie/contract.go
A ContractBag
contains a contracts
map where the key is a Hash
(the
contract code hash) and the value is a Contract
object pointer.
type ContractBag struct {
contracts map[common.Hash]*Contract
}
Contract (Type)
path: codetrie/contract.go
A contract object contains the code and a map of the chunks that were touched during the execution of the contract. When a chunk is touched, the value of the element whose key is the chunk index is set to true.
type Contract struct {
code []byte
touchedChunks map[int]bool
}
CMStats (type)
path: codetrie/contract.go
type CMStats struct {
NumContracts int
ProofSize int
CodeSize int
ProofStats *ssz.ProofStats
RLPStats *ssz.RLPStats
}
Chunk (Type)
path: codetrie/codetrie.go
A chunk object contains two fields:
- fio: the first instruction offset in the chunk.
- code: the bytecode in the chunk, which can be up to 32 bytes.
type Chunk struct {
fio uint8 // firstInstructionOffset
code []byte
}
RLPStats (type)
type RLPStats struct {
RLPSize int
UnRLPSize int
SnappySize int
}
NewContractBag
path: codetrie/contract.go
Returns a new empty ContractBag
object.
func NewContractBag() *ContractBag {
return &ContractBag{
contracts: make(map[common.Hash]*Contract),
}
}
Stats (ContractBag Method)
path: codetrie/contract.go
For each contract in the ContractBag, it gets the code size and proves the contract, obtaining a Multiproof and a CompressedMultiproof.
Stats for the multiproofs, the compressed multiproofs, and the RLP-encoded proofs are accumulated in the main stats object.
func (b *ContractBag) Stats() (*CMStats, error) {
stats := NewCMStats()
stats.NumContracts = len(b.contracts)
for _, c := range b.contracts {
stats.CodeSize += c.CodeSize()
rawProof, err := c.Prove()
if err != nil {
return nil, err
}
p := ssz.NewMultiproof(rawProof)
cp := ssz.NewCompressedMultiproof(rawProof.Compress())
ps := cp.ProofStats()
stats.ProofStats.Add(ps)
rs, err := ssz.NewRLPStats(p, cp)
if err != nil {
return nil, err
}
stats.RLPStats.Add(rs)
}
stats.ProofSize = stats.ProofStats.Sum()
return stats, nil
}
Get (ContractBag Method)
path: codetrie/contract.go
Checks whether the contract is already in the contract bag, by looking up the hash of the input code in the contracts map. If it exists, the pointer to the Contract corresponding to that code hash is returned.
If the contract was not previously added to the bag, a new Contract is created for the provided bytecode.
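A minimal sketch of what Get might look like, following the description above (the exact signature in the fork may differ; crypto.Keccak256Hash is used here as the hashing helper):

import "github.com/ethereum/go-ethereum/crypto"

// Get returns the Contract for the given bytecode, creating and registering
// it in the bag if it is not there yet. Sketch only; signature assumed.
func (b *ContractBag) Get(code []byte) *Contract {
    codeHash := crypto.Keccak256Hash(code)
    if c, ok := b.contracts[codeHash]; ok {
        return c
    }
    c := NewContract(code)
    b.contracts[codeHash] = c
    return c
}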
CodeSize (Contract Method)
path: codetrie/contract.go
Returns the contract's code size.
Prove (Contract Method)
path: codetrie/contract.go
Creates a new SSZ tree using the contract's code, specifying a chunk size of 32.
A new array of metadata indices is created, initialized with the indices 7, 8, 9, and 10:
mdIndices := []int{7, 8, 9, 10}
Another array is created containing the indices of the touched chunks; each chunk obtains an index corresponding to the place it occupies in the tree (see the sketch after the legend below).
A Multiproof is created from the metadata indices and the chunk indices; the multiproof is then compressed and returned.
This Multiproof has the following tree structure:
1
/ \
/ \
/ \
/ \
/ \
2 3
/ \ / \
4 5 6 7
/ \ / \ / \
8 9 10 11 ... ...
- 6: Root of the tree of depth 10 (contains the touched chunks)
- 7: Number of chunks
- 8: Version
- 9: CodeHash
- 10: Code length
- 11: Empty field
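Assuming the chunk subtree is rooted at generalized index 6 with depth 10 (as in the legend above), the index of a touched chunk can be derived with the standard SSZ generalized-index rule (the children of node g are 2g and 2g+1). This is an inference from the tree shown here, not code taken from the fork:

// Generalized index of touched chunk i, assuming the chunk subtree is rooted
// at generalized index 6 and has depth 10 (so its leftmost leaf is 6<<10).
func chunkGIndex(i int) int {
    return 6<<10 + i // e.g. chunk 0 -> 6144, chunk 1 -> 6145
}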
NewContract
path: codetrie/contract.go
Returns a Contract object containing the code
and an empty map of touchedChunks
.
TouchPC (Contract Method)
path: codetrie/contract.go
Calculates the corresponding chunk number for the pc (chunk_number = pc / 32).
Marks the chunk as touched by creating an element (if it does not exist yet) in the touchedChunks map, where the key is the chunk number and the value is true.
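A sketch of TouchPC following that description (assumed; the parameter type in the fork may differ):

// TouchPC marks the chunk containing the given program counter as touched.
// Sketch only, based on the description above.
func (c *Contract) TouchPC(pc int) {
    chunk := pc / 32
    c.touchedChunks[chunk] = true
}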
TouchRange (Contract Method)
path: codetrie/contract.go
Calculates the chunks covered by the given range of code and marks them as touched.
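A sketch of TouchRange under the same assumptions (whether the upper bound is inclusive is not specified in this document, so the range semantics here are assumed):

// TouchRange marks every chunk overlapped by code[from:to] as touched.
// Sketch only.
func (c *Contract) TouchRange(from, to int) {
    if to <= from {
        return
    }
    for chunk := from / 32; chunk <= (to-1)/32; chunk++ {
        c.touchedChunks[chunk] = true
    }
}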
GetSSZTree
path: codetrie/codetrie.go
It builds a new CodeTrie, then calls the fastssz-generated code to obtain the backing SSZ tree, which is returned.
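Assembled from the descriptions in this document, GetSSZTree could look roughly like this (prepareSSZ is described next; GetTree is the tree-backing method generated by fastssz's experimental mode, see PR #38 below, so the method name here is an assumption):

import ssz "github.com/ferranbt/fastssz"

// GetSSZTree builds the CodeTrie for the given code and returns its backing
// SSZ tree. Sketch only.
func GetSSZTree(code []byte, chunkSize uint) (*ssz.Node, error) {
    trie, err := prepareSSZ(code, chunkSize)
    if err != nil {
        return nil, err
    }
    return trie.GetTree()
}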
prepareSSZ
path: codetrie/codetrie.go
Chunkifies the provided code; if the requested chunk size is different from 32, the error MerkleizeSSZ only supports chunk size of 32 occurs.
Calculates the First Instruction Offset (FIO) for each of the chunks.
Creates the metadata, which consists of the version (0), the code hash, and the code length.
Returns a new CodeTrie containing the metadata and chunks.
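Putting those steps together, prepareSSZ might look roughly like the sketch below (built on the SSZ types shown earlier; the helper names chunkify/setFIOs, the import path, and the exact error handling are assumptions):

import (
    "errors"

    sszTypes "github.com/ethereum/go-ethereum/codetrie/ssz"
    "github.com/ethereum/go-ethereum/crypto"
)

// prepareSSZ builds the CodeTrie described above. Sketch only.
func prepareSSZ(code []byte, chunkSize uint) (*sszTypes.CodeTrie, error) {
    if chunkSize != 32 {
        return nil, errors.New("MerkleizeSSZ only supports chunk size of 32")
    }
    chunks := chunkify(code)  // 32-byte chunks (see Chunkify below)
    setFIOs(code, chunks)     // first instruction offsets (see setFIO below)

    sszChunks := make([]*sszTypes.Chunk, len(chunks))
    for i, c := range chunks {
        // The last chunk's Code may need right-padding to 32 bytes; not shown.
        sszChunks[i] = &sszTypes.Chunk{FIO: c.fio, Code: c.code}
    }
    md := &sszTypes.Metadata{
        Version:    0,
        CodeHash:   crypto.Keccak256(code),
        CodeLength: uint16(len(code)),
    }
    return &sszTypes.CodeTrie{Metadata: md, Chunks: sszChunks}, nil
}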
Chunkify
path: codetrie/codetrie.go
Calculates the total number of chunks required for the given bytecode.
Splits the code into 32-byte chunks and also finds the first instruction offset (FIO) of each chunk; this accounts for the fact that a chunk's initial bytes might be data of a PUSH* in the previous chunk and should not be treated as opcodes.
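A minimal sketch of the chunk-splitting part (illustrative; the internal Chunk type was shown earlier, and the FIO calculation is covered by setFIO below):

// chunkify splits code into 32-byte chunks. FIOs are filled in afterwards
// (see setFIO). Sketch only.
func chunkify(code []byte) []*Chunk {
    numChunks := (len(code) + 31) / 32
    chunks := make([]*Chunk, numChunks)
    for i := 0; i < numChunks; i++ {
        end := (i + 1) * 32
        if end > len(code) {
            end = len(code)
        }
        chunks[i] = &Chunk{code: code[i*32 : end]}
    }
    return chunks
}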
setFIO
Loops over the bytecode contained in the chunks; when a PUSH* opcode is found, it checks whether the PUSH* input data extends past the current chunk. If it does, it calculates the offset of the first instruction in the next chunk (so that PUSH* data is not considered as instructions).
NewRLPStats
path: codetrie/ssz/proof.go
Serializes the CompressedMultiproof as RLP and records the RLP size (RLPSize).
Serializes the uncompressed Multiproof as RLP and records its RLP size (UnRLPSize).
The uncompressed Multiproof's RLP is compressed using Snappy compression (SnappySize).
The RLPStats object is returned, containing the fields RLPSize, UnRLPSize, and SnappySize.
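The size measurements could be obtained roughly as in the sketch below, using go-ethereum's rlp package and the snappy compressor (how the proofs are actually made RLP-encodable in the fork, e.g. which fields are included, is not shown here):

import (
    "github.com/ethereum/go-ethereum/rlp"
    "github.com/golang/snappy"
)

// proofSizes computes the three sizes described above, given already
// RLP-encodable representations of the compressed and uncompressed proofs.
// Sketch only.
func proofSizes(compressed, uncompressed interface{}) (rlpSize, unRLPSize, snappySize int, err error) {
    cBytes, err := rlp.EncodeToBytes(compressed)
    if err != nil {
        return 0, 0, 0, err
    }
    uBytes, err := rlp.EncodeToBytes(uncompressed)
    if err != nil {
        return 0, 0, 0, err
    }
    return len(cBytes), len(uBytes), len(snappy.Encode(nil, uBytes)), nil
}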
FastSSZ
https://github.com/ferranbt/fastssz
FastSSZ Changes
#26 - Added the possibility
to use a different hash function. Sina wanted to use keccak256
as hash
function.
#27 - Added PutUint32, PutUint16, and PutUint8 to the hasher; previously only PutUint64 was available.
#28
- Created new schema,
- Added verification function for one of the structs,
- Some hand-crafted test cases,
- Pending: make verify more general so it's not only for one struct.
- New Files:
  - tests/codetrie.go
  - tests/codetrie_encoding.go: automatically generated by fastssz
  - tests/codetrie_test.go: verify proof tests
#32
- Added a new proof.go, which includes functions to verify proofs and multiproofs, as well as other helper functions.
#33 - Added support for multiproof generation
#38
- Adds a --experimental
flag to get the tree-backing of an object
- Given the tree-backing you can compute the root, generate and verify
proofs.
#39 (pending PR) - Remove duplicate leafs from proof
ProveMulti
path: tree.go
- Input: indices
- Output: Multiproof
Gets the required indices (getRequiredIndices).
Defines proof as a Multiproof object initialized with the provided indices, an empty Leaves array of arrays, and an empty Hashes array of arrays.
For each provided index, gets the node at that index and adds its value to Leaves.
For each required index, gets the node at that index and adds its hash to Hashes.
Returns the proof.
func (n *Node) ProveMulti(indices []int) (*Multiproof, error) {
reqIndices := getRequiredIndices(indices)
proof := &Multiproof{Indices: indices, Leaves: make([][]byte, len(indices)), Hashes: make([][]byte, len(reqIndices))}
for i, gi := range indices {
node, err := n.Get(gi)
if err != nil {
return nil, err
}
proof.Leaves[i] = node.value
}
for i, gi := range reqIndices {
cur, err := n.Get(gi)
if err != nil {
return nil, err
}
proof.Hashes[i] = hashNode(cur)
}
return proof, nil
}
Compress (Multiproof)
path: tree.go
Returns a *CompressedMultiproof with zero hashes omitted.
// Compress returns a new proof with zero hashes omitted.
// See `CompressedMultiproof` for more info.
func (p *Multiproof) Compress() *CompressedMultiproof {
compressed := &CompressedMultiproof{
Indices: p.Indices,
Leaves: p.Leaves,
Hashes: make([][]byte, 0, len(p.Hashes)),
ZeroLevels: make([]int, 0, len(p.Hashes)),
}
for _, h := range p.Hashes {
if l, ok := zeroHashLevels[string(h)]; ok {
compressed.ZeroLevels = append(compressed.ZeroLevels, l)
compressed.Hashes = append(compressed.Hashes, nil)
} else {
compressed.Hashes = append(compressed.Hashes, h)
}
}
return compressed
}
Multiproof:
// Multiproof represents a merkle proof of several leaves.
type Multiproof struct {
Indices []int
Leaves [][]byte
Hashes [][]byte
}
CompressedMultiproof:
// CompressedMultiproof represents a compressed merkle proof of several leaves.
// Compression is achieved by omitting zero hashes (and their hashes). `ZeroLevels`
// contains information which helps the verifier fill in those hashes.
type CompressedMultiproof struct {
Indices []int
Leaves [][]byte
Hashes [][]byte
ZeroLevels []int // Stores the level for every omitted zero hash in the proof
}