A Geth fork was created that adds features to merkleize contracts while performing a full sync.
Changes are described below.
Two new modules are introduced to the Geth code base:
- codetrie/ssz: contains the schema defined for the code tree.
- codetrie: contains the logic needed to manipulate the tree.
A new flag was added to tell Geth to analyze how the contracts in a block would be merkleized and to report the size of the proofs required for the contracts in each block.
One of the main changes in Geth is in the Process function (core/state_processor.go), where a new "Contract Bag" is created.
ContractBag is a map where the key is the code hash and the value is the contract code. Keying by code hash avoids storing contracts with identical bytecode more than once.
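The deduplication described above can be sketched as follows. This is a minimal illustration, not the fork's actual code: the type and method names are hypothetical, and SHA-256 stands in for the code hash only so that the example runs with the standard library alone.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// codeHash is a stand-in for the 32-byte code hash used as the map key.
// Assumption: SHA-256 replaces the real hash function here so the sketch
// needs no external dependencies.
type codeHash [32]byte

// contractBag maps a code hash to the contract bytecode (hypothetical type).
type contractBag map[codeHash][]byte

// add stores code under its hash and reports whether it was newly inserted.
func (b contractBag) add(code []byte) bool {
	h := codeHash(sha256.Sum256(code))
	if _, ok := b[h]; ok {
		return false // identical bytecode is already in the bag
	}
	b[h] = code
	return true
}

func main() {
	bag := contractBag{}
	bag.add([]byte{0x60, 0x00})
	bag.add([]byte{0x60, 0x00}) // same bytecode, deduplicated
	bag.add([]byte{0x60, 0x01})
	fmt.Println(len(bag)) // prints 2
}
```

Because the key is derived from the bytecode itself, two deployments of the same code at different addresses occupy a single entry.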
Then, each transaction in the block is applied. While the EVM interpreter executes the opcodes, it collects information about which opcodes are touched and which chunks those opcodes belong to.
After the contract bytecode and the touched opcodes/chunks are collected, the stats are calculated by calling the bag.Stats() method. This method merkleizes each contract and generates the proof needed for that contract in that specific block. The proof consists of indices, hashes, zero levels, and leaves. The sum of those values across all contracts is used as a metric of the "proof size" of the block. The indices, hashes, and zero levels are also serialized as RLP, and the RLP size is used as another measure of the proof size.
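As a rough sketch of the "proof size" metric, the following adds up the four proof components over all contracts in a block. The struct and function names are hypothetical, and the sketch assumes the metric counts elements rather than bytes, which the text does not state explicitly.

```go
package main

import "fmt"

// proof is a hypothetical container for the four components named above.
type proof struct {
	Indices    []uint64
	ZeroLevels []uint64
	Hashes     [][]byte
	Leaves     [][]byte
}

// size returns the number of elements in the proof.
// Assumption: the metric is an element count, not a byte count.
func (p proof) size() int {
	return len(p.Indices) + len(p.ZeroLevels) + len(p.Hashes) + len(p.Leaves)
}

// blockProofSize adds up the proof sizes of all contracts touched in a block.
func blockProofSize(proofs []proof) int {
	total := 0
	for _, p := range proofs {
		total += p.size()
	}
	return total
}

func main() {
	// Made-up proofs for two contracts, for illustration only.
	proofs := []proof{
		{Indices: []uint64{4, 5}, Hashes: [][]byte{{0xaa}}, Leaves: [][]byte{{0x60}, {0x00}}},
		{Indices: []uint64{2}, ZeroLevels: []uint64{1}, Leaves: [][]byte{{0x01}}},
	}
	fmt.Println(blockProofSize(proofs)) // prints 8
}
```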
The results are stored in a CSV file (cm-result.csv) with the following columns:
- Block number
- Code size: The sum of the bytecode sizes of all contracts in the block.
- Proof size: The sum of all indices, zero levels, hashes, and leaves.
- RLP size: Size of the RLP-encoded compressed multiproof (indices, zero levels, hashes, and leaves).
- UnRLPSize: Size of the RLP-encoded uncompressed proof (indices, leaves, and hashes).
- SnappySize: Size of the indices, zero levels, hashes, and leaves, serialized as RLP and then compressed with Snappy.
- Total indices
- Total zero levels
- Total hashes
- Total leaves
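One way to produce a row in this column order is sketched below with Go's standard encoding/csv package. The struct and its field names are illustrative and are not taken from the fork; the numbers are made up.

```go
package main

import (
	"encoding/csv"
	"os"
	"strconv"
)

// blockStats holds one result row; the field names are hypothetical.
type blockStats struct {
	BlockNumber, CodeSize, ProofSize, RLPSize, UnRLPSize, SnappySize uint64
	Indices, ZeroLevels, Hashes, Leaves                             uint64
}

// record converts the stats into CSV fields in the column order listed above.
func (s blockStats) record() []string {
	vals := []uint64{
		s.BlockNumber, s.CodeSize, s.ProofSize, s.RLPSize, s.UnRLPSize,
		s.SnappySize, s.Indices, s.ZeroLevels, s.Hashes, s.Leaves,
	}
	out := make([]string, len(vals))
	for i, v := range vals {
		out[i] = strconv.FormatUint(v, 10)
	}
	return out
}

func main() {
	w := csv.NewWriter(os.Stdout)
	// Made-up values, for illustration only.
	row := blockStats{BlockNumber: 1, CodeSize: 2048, ProofSize: 40}
	if err := w.Write(row.record()); err != nil {
		panic(err)
	}
	w.Flush()
	if err := w.Error(); err != nil {
		panic(err)
	}
}
```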
When evaluating the contract code, the Run function checks whether the -codemerkleization flag is set and whether ContractBag was initialized correctly. It also skips merkleization for code that does not correspond to a deployed contract (i.e. contract creation code).
It retrieves the Contract from the contract bag. If the contract does not exist in the contract bag yet, a new Contract object is created.
It then marks the chunk corresponding to the current opcode at the current program counter (pc) as "touched".
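Mapping a program counter to its chunk can be sketched as below. The 32-byte chunk length is an assumption (the text does not state it), and the contract type and touchPC method are hypothetical names used only for illustration.

```go
package main

import "fmt"

// chunkSize is an assumption: the text does not give the chunk length.
const chunkSize = 32

// contract tracks which code chunks were touched (hypothetical type).
type contract struct {
	code    []byte
	touched map[int]bool
}

// touchPC marks the chunk containing the program counter as touched.
func (c *contract) touchPC(pc uint64) {
	c.touched[int(pc/chunkSize)] = true
}

func main() {
	c := &contract{code: make([]byte, 100), touched: map[int]bool{}}
	for _, pc := range []uint64{0, 1, 31, 32, 70} {
		c.touchPC(pc)
	}
	fmt.Println(len(c.touched)) // prints 3 (chunks 0, 1 and 2)
}
```

Using a set keyed by chunk index means that executing many opcodes inside the same chunk records that chunk only once.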
If the current opcode is CODECOPY, it also touches the range of code bytes being copied.
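Touching a copied range amounts to finding every chunk that overlaps it. The sketch below assumes 32-byte chunks and assumes that the part of a copy extending past the end of the code touches nothing; neither detail is stated in the text, and the function name is hypothetical.

```go
package main

import "fmt"

// chunkSize is an assumption: the text does not give the chunk length.
const chunkSize = 32

// touchedChunks returns the indices of the chunks overlapped by the byte
// range [offset, offset+size), clamped to the code length.
func touchedChunks(codeLen, offset, size uint64) []uint64 {
	if size == 0 || offset >= codeLen {
		return nil // nothing inside the code is read
	}
	end := offset + size
	if end > codeLen || end < offset { // clamp; also guards against overflow
		end = codeLen
	}
	var out []uint64
	for i := offset / chunkSize; i <= (end-1)/chunkSize; i++ {
		out = append(out, i)
	}
	return out
}

func main() {
	// Copying 10 bytes starting at offset 30 straddles chunks 0 and 1.
	fmt.Println(touchedChunks(100, 30, 10)) // prints [0 1]
}
```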
When the current opcode is EXTCODECOPY, it also retrieves the code of the "external" contract being copied from, and that code is marked as touched.
In the case of init code (contract creation code), it checks whether the total code size is greater than 0xc000 (49152). If it is, the code is added to a map in ContractBag, where the key is the codeHash of the contract and the value is its total code size.
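The size check for init code can be sketched as follows. Only the 0xc000 threshold and the shape of the map (code hash to total code size) come from the text; the type, field, and method names are hypothetical, and a string stands in for the code hash.

```go
package main

import "fmt"

// largeInitCodeThreshold is the 0xc000 (49152) byte limit from the text.
const largeInitCodeThreshold = 0xc000

// bag is a hypothetical stand-in for the relevant part of ContractBag.
type bag struct {
	largeInitCodes map[string]int // code hash -> total code size
}

// recordInitCode stores the size of init code that exceeds the threshold
// and reports whether an entry was recorded.
func (b *bag) recordInitCode(codeHash string, code []byte) bool {
	if len(code) <= largeInitCodeThreshold {
		return false
	}
	b.largeInitCodes[codeHash] = len(code)
	return true
}

func main() {
	b := &bag{largeInitCodes: map[string]int{}}
	b.recordInitCode("small", make([]byte, 49152)) // equal to the limit: skipped
	b.recordInitCode("large", make([]byte, 49153)) // above the limit: recorded
	fmt.Println(len(b.largeInitCodes), b.largeInitCodes["large"]) // prints 1 49153
}
```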
This is the directory tree for the new code added to Geth:
├── codetrie
│   ├── ssz
│   │   ├── types_encoding.go
│   │   └── types.go
│   ├── bin_hex_test.go
│   ├── codetrie.go
│   ├── codetrie_test.go
│   ├── contract.go
│   ├── op.go
│   ├── ssz_test.go
│   └── transition.go
├── cmd
│   ├── geth
│   │   ├── chaincmd.go
│   │   ├── main.go
│   │   └── usage.go
│   └── utils
│       ├── flags.go
├── core
│   ├── ...
│   ├── state_processor.go
│   ├── ...
├── eth
│   ├── backend.go
│   ├── config.go
│   ├── gen_config.go