Geth changes
A fork of Geth was created with added features for merkleizing contract code while performing a full sync.
The changes are described below.
New modules
Two new modules are introduced to the Geth code base:
- ssz (codetrie/ssz): contains the schema defined for the code tree.
- codetrie: contains the logic needed to manipulate the tree.
codemerkleization flag
changes:
cmd/geth/chaincmd.go
cmd/geth/usage.go
cmd/geth/main.go
cmd/utils/flags.go
eth/gen_config.go
eth/backend.go
eth/config.go
A new flag was added to tell Geth to analyze how the contracts in each block would be merkleized and to measure the size of the proofs those contracts require.
state_processor
path: core/state_processor.go
One of the main changes in Geth is in the state_processor.
In the function Process, a new "Contract Bag" is created. A ContractBag is a map where the key is the code hash and the value is the contract code; this avoids storing contracts with identical bytecode more than once.
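The deduplication described above can be sketched as a map keyed by code hash. This is a minimal illustration, not the fork's actual type: the names `CodeHash` and `Add` are hypothetical, and SHA-256 is used only to keep the example within the standard library (Geth hashes contract code with Keccak-256).

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// CodeHash is a stand-in for the 32-byte code hash used as the bag key.
// Assumption: SHA-256 replaces Keccak-256 to keep this sketch stdlib-only.
type CodeHash [32]byte

// ContractBag maps a code hash to the contract bytecode (hypothetical shape).
type ContractBag map[CodeHash][]byte

// Add stores code under its hash and reports whether a new entry was created.
// Identical bytecode always hashes to the same key, so it is stored once.
func (b ContractBag) Add(code []byte) (CodeHash, bool) {
	h := CodeHash(sha256.Sum256(code))
	if _, ok := b[h]; ok {
		return h, false
	}
	b[h] = append([]byte(nil), code...) // keep a private copy of the code
	return h, true
}

func main() {
	bag := ContractBag{}
	codeA := []byte{0x60, 0x00, 0x60, 0x00, 0xf3}
	codeB := []byte{0x60, 0x01, 0x60, 0x00, 0xf3}

	bag.Add(codeA)
	bag.Add(codeB)
	bag.Add(codeA) // same bytecode as the first call, so no new entry

	fmt.Println(len(bag)) // prints 2
}
```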
Then, each transaction in the block is applied. While the EVM interpreter executes the opcodes, it collects information about which opcodes were touched and which chunks those opcodes belong to.
After the contract bytecode and the touched opcodes/chunks are collected, the stats are calculated by calling the bag.Stats() method. This method merkleizes each contract and generates the proof needed for that contract in that specific block. The proof consists of indices, hashes, zero levels and leaves, and the sum of those values over all the contracts is taken as a metric of the "proof size" of the block. Indices, hashes, and zero levels are serialized as RLP, and the RLP size is used as another measure of the proof size.
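As a rough illustration of the "proof size" metric, the sketch below adds up the four proof components per contract and then over the whole block. It assumes the metric is a plain sum of component counts; the text does not say whether counts or byte sizes are summed. The type and function names are hypothetical, and the numbers are made up for the example.

```go
package main

import "fmt"

// ProofStats holds the per-contract proof components named in the text.
// Field names are hypothetical; whether the fork sums counts or byte sizes
// is an assumption of this sketch.
type ProofStats struct {
	Indices    int
	ZeroLevels int
	Hashes     int
	Leaves     int
}

// Size returns the sum of the four proof components for one contract.
func (s ProofStats) Size() int {
	return s.Indices + s.ZeroLevels + s.Hashes + s.Leaves
}

// BlockProofSize sums the proof size over every contract touched in a block.
func BlockProofSize(contracts []ProofStats) int {
	total := 0
	for _, c := range contracts {
		total += c.Size()
	}
	return total
}

func main() {
	// Illustrative values only, not measurements.
	block := []ProofStats{
		{Indices: 3, ZeroLevels: 1, Hashes: 4, Leaves: 2},
		{Indices: 2, ZeroLevels: 0, Hashes: 3, Leaves: 1},
	}
	fmt.Println(BlockProofSize(block)) // prints 16
}
```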
The results are stored in a CSV file (cm-result.csv) with the following fields:
- Block number
- Code size: The sum of the bytecode sizes of all contracts in the block.
- Proof size: The sum of all indices, zero levels, hashes, and leaves.
- RLP Size: Size of the RLP-encoded compressed multiproof (indices, zero levels, hashes, and leaves).
- UnRLPSize: Size of the RLP-encoded uncompressed proof (indices, leaves, and hashes).
- SnappySize: Size of the Indices, Zero Levels, Hashes, and Leaves, serialized as RLP and then compressed using Snappy compression.
- Total indices
- Total zero levels
- Total hashes
- Total Leaves
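One way to produce a row with these fields is Go's standard `encoding/csv` writer. The sketch below writes to standard output rather than to cm-result.csv, and the struct and its field names are hypothetical; only the column order follows the list above. The values in `main` are placeholders.

```go
package main

import (
	"encoding/csv"
	"os"
	"strconv"
)

// BlockResult mirrors the CSV fields listed above. The Go field names are
// hypothetical; only the column meanings come from the text.
type BlockResult struct {
	BlockNumber uint64
	CodeSize    int
	ProofSize   int
	RLPSize     int
	UnRLPSize   int
	SnappySize  int
	Indices     int
	ZeroLevels  int
	Hashes      int
	Leaves      int
}

// Record renders the result as one CSV record, in the order of the field list.
func (r BlockResult) Record() []string {
	return []string{
		strconv.FormatUint(r.BlockNumber, 10),
		strconv.Itoa(r.CodeSize),
		strconv.Itoa(r.ProofSize),
		strconv.Itoa(r.RLPSize),
		strconv.Itoa(r.UnRLPSize),
		strconv.Itoa(r.SnappySize),
		strconv.Itoa(r.Indices),
		strconv.Itoa(r.ZeroLevels),
		strconv.Itoa(r.Hashes),
		strconv.Itoa(r.Leaves),
	}
}

func main() {
	w := csv.NewWriter(os.Stdout)

	// Placeholder values, not real measurements.
	row := BlockResult{BlockNumber: 1, CodeSize: 2, ProofSize: 3}
	if err := w.Write(row.Record()); err != nil {
		panic(err)
	}

	// Flush buffered data and surface any write error.
	w.Flush()
	if err := w.Error(); err != nil {
		panic(err)
	}
}
```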
Run (EVM Interpreter)
path: core/vm/interpreter.go
When evaluating the contract code, the Run function checks whether the -codemerkleization flag is set and the ContractBag was initialized correctly. It also avoids merkleizing code that does not correspond to a contract (i.e. contract creation code).
It retrieves the Contract from the contract bag; if the contract does not exist in the contract bag yet, a new Contract object is created.
It marks the chunk corresponding to the current opcode at the current program counter (pc) as "touched".
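Marking a chunk as touched comes down to mapping the program counter to a chunk index. The sketch below assumes fixed 32-byte chunks, a value the text does not state, and uses hypothetical names (`Contract`, `TouchPC`) rather than the fork's real API.

```go
package main

import "fmt"

// chunkSize is an assumption of this sketch: the text does not state the
// chunk length, so 32-byte chunks are used for illustration.
const chunkSize = 32

// Contract tracks which code chunks were touched during execution
// (hypothetical shape).
type Contract struct {
	code    []byte
	touched map[uint64]struct{}
}

// NewContract wraps bytecode with an empty touched set.
func NewContract(code []byte) *Contract {
	return &Contract{code: code, touched: make(map[uint64]struct{})}
}

// TouchPC marks the chunk containing the opcode at pc as touched.
// A pc past the end of the code is ignored.
func (c *Contract) TouchPC(pc uint64) {
	if pc >= uint64(len(c.code)) {
		return
	}
	c.touched[pc/chunkSize] = struct{}{}
}

func main() {
	c := NewContract(make([]byte, 100))

	// pcs 0, 5 and 31 fall in chunk 0, pc 32 in chunk 1, pc 99 in chunk 3.
	for _, pc := range []uint64{0, 5, 31, 32, 99} {
		c.TouchPC(pc)
	}
	fmt.Println(len(c.touched)) // prints 3
}
```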
If the current opcode is CODECOPY, it also touches the range of opcodes being copied.
When the current opcode is EXTCODECOPY, it also retrieves the code of the "external" contract being copied from, and that code is marked as touched.
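For CODECOPY and EXTCODECOPY, a whole byte range has to be mapped to chunks instead of a single pc. The following sketch shows one way to do that, again assuming 32-byte chunks and a hypothetical helper name; the range is clamped to the code length so that copies reading past the end of the code touch only chunks that exist.

```go
package main

import "fmt"

// chunkSize is an assumption of this sketch (the text does not state it).
const chunkSize = 32

// TouchRange marks every chunk overlapping code[offset : offset+size] as
// touched. The range is clamped to codeLen; an empty or out-of-bounds range
// touches nothing. The function name and signature are hypothetical.
func TouchRange(touched map[uint64]struct{}, codeLen, offset, size uint64) {
	if size == 0 || offset >= codeLen {
		return
	}
	end := offset + size
	if end < offset || end > codeLen { // uint64 overflow or past end of code
		end = codeLen
	}
	for chunk := offset / chunkSize; chunk <= (end-1)/chunkSize; chunk++ {
		touched[chunk] = struct{}{}
	}
}

func main() {
	touched := make(map[uint64]struct{})

	// Copying bytes 30..69 of a 100-byte contract overlaps chunks 0, 1 and 2.
	TouchRange(touched, 100, 30, 40)
	fmt.Println(len(touched)) // prints 3
}
```

For EXTCODECOPY the same helper would be applied to the touched set of the external contract rather than the one currently executing.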
In the case of init code (contract creation code), it checks whether the total code size is greater than 0xc000 (49152). If it is, the code is added to a map in ContractBag, where the key is the codeHash of the contract and the value is its total code size.
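The size check for init code can be sketched as follows. The 0xc000 threshold and the hash-to-size map come from the description above; the type and method names are hypothetical.

```go
package main

import "fmt"

// largeInitCodeThreshold is the 0xc000 (49152) byte limit from the text.
const largeInitCodeThreshold = 0xc000

// CodeHash is a stand-in for the 32-byte code hash.
type CodeHash [32]byte

// LargeInitCodes maps a code hash to the total size of init code that
// exceeded the threshold (hypothetical shape of the map kept in ContractBag).
type LargeInitCodes map[CodeHash]int

// Record stores the code size if it is strictly greater than the threshold
// and reports whether an entry was written.
func (m LargeInitCodes) Record(hash CodeHash, code []byte) bool {
	if len(code) <= largeInitCodeThreshold {
		return false
	}
	m[hash] = len(code)
	return true
}

func main() {
	large := LargeInitCodes{}

	atLimit := large.Record(CodeHash{1}, make([]byte, 0xc000))
	overLimit := large.Record(CodeHash{2}, make([]byte, 0xc000+1))

	fmt.Println(atLimit, overLimit, len(large)) // prints false true 1
}
```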
Added code
This is the directory tree for the new code added to Geth:
├── codetrie
│   ├── ssz
│   │   ├── types_encoding.go
│   │   └── types.go
│   ├── bin_hex_test.go
│   ├── codetrie.go
│   ├── codetrie_test.go
│   ├── contract.go
│   ├── op.go
│   ├── ssz_test.go
│   └── transition.go
Changed code
├── cmd
│   ├── geth
│   │   ├── chaincmd.go
│   │   ├── main.go
│   │   └── usage.go
│   └── utils
│       ├── flags.go
├── core
│   ├── ...
│   ├── state_processor.go
│   ├── ...
├── eth
│   ├── backend.go
│   ├── config.go
│   ├── gen_config.go