evm-opcodes: Opcode types for Ethereum Virtual Machine (EVM)


This library provides opcode types for the Ethereum Virtual Machine.



Properties

Versions 0.1.0
Change log None available
Dependencies base (>=4.12 && <4.16), bytestring (>=0.10 && <0.12), cereal (==0.5.*), containers (==0.6.*), data-dword (==0.3.*), text (==1.2.*)
License MIT
Copyright 2020 Simon Shine
Author Simon Shine
Maintainer shreddedglory@gmail.com
Category Ethereum, Finance, Network
Home page https://github.com/sshine/evm-opcodes
Bug tracker https://github.com/sshine/evm-opcodes/issues
Source repo head: git clone https://github.com/sshine/evm-opcodes
Uploaded by sshine at 2021-09-12T10:38:00Z

Readme for evm-opcodes-0.1.0


evm-opcodes


This Haskell library provides opcode types for the Ethereum Virtual Machine (EVM).

The library has two purposes:

First, it provides one parameterised type, Opcode' j, where j is the annotation for the jump-related instructions JUMP, JUMPI and JUMPDEST. It has three concrete variants: Opcode, where jumps carry no annotation, PositionalOpcode, where jumps are annotated with a byte position, and LabelledOpcode, where jumps are annotated with a label.

Second, it provides a fixpoint algorithm that translates labelled jumps into positional jumps, and another function that translates those positional jumps into plain EVM opcodes where a constant is pushed before a jump is made.
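To illustrate why a fixpoint is needed, here is a minimal, self-contained sketch of resolving label addresses. It is not the library's implementation: the Op type, byteWidth, sizeWith, layout and resolve are hypothetical names made up for this example, and undefined labels simply default to address 0 here instead of being reported as errors.

```haskell
-- Toy opcode type (hypothetical; not the library's Opcode' j).
data Op j
  = Push Integer
  | Jump j
  | JumpI j
  | JumpDest j
  | Other String        -- any one-byte opcode, e.g. "dup1"
  deriving (Eq, Show)

-- Bytes needed to encode a constant (at least one).
byteWidth :: Integer -> Int
byteWidth n
  | n < 256   = 1
  | otherwise = 1 + byteWidth (n `div` 256)

-- Encoded size of an opcode once jumps are expanded to "pushN addr; jump".
sizeWith :: (j -> Integer) -> Op j -> Int
sizeWith _    (Push n)     = 1 + byteWidth n
sizeWith addr (Jump l)     = 1 + byteWidth (addr l) + 1
sizeWith addr (JumpI l)    = 1 + byteWidth (addr l) + 1
sizeWith _    (JumpDest _) = 1
sizeWith _    (Other _)    = 1

-- Byte position of every label, given a current guess at label addresses.
layout :: (String -> Integer) -> [Op String] -> [(String, Integer)]
layout addr ops =
  [ (l, pos) | (JumpDest l, pos) <- zip ops offsets ]
  where
    offsets = scanl (+) 0 (map (fromIntegral . sizeWith addr) ops)

-- Start from the guess "every label is at 0" and re-run the layout
-- until the label addresses stop changing.
resolve :: [Op String] -> [(String, Integer)]
resolve ops = go (layout (const 0) ops)
  where
    go env
      | env' == env = env
      | otherwise   = go env'
      where
        env' = layout (\l -> maybe 0 id (lookup l env)) ops
```

Under these assumptions, resolve applied to a jump over 253 one-byte opcodes first places the destination at 256, notices that 256 needs a two-byte push, and settles on 257 in the next round.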

Library conventions

When the documentation refers to a lowercase opcode (e.g. push1), then that means the EVM opcode. When the documentation instead refers to an uppercase opcode (e.g. PUSH), then that refers to the Haskell data constructor.

The opcodes dup1-dup16, swap1-swap16 and log0-log4 are implemented using the data constructors DUP, SWAP and LOG. These are convenient for the library maintainer but not ergonomic to use, so pattern synonyms such as DUP1 are provided.
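The general technique can be sketched with GHC's PatternSynonyms extension. This is a toy illustration only: the Op type below uses a plain Int index, and the describe function is hypothetical; the library's actual constructor arguments may differ.

```haskell
{-# LANGUAGE PatternSynonyms #-}

-- Toy opcode type; the index is a plain Int here for brevity.
data Op = DUP Int | SWAP Int
  deriving (Eq, Show)

-- Bidirectional synonyms: usable both in expressions and in patterns.
pattern DUP1 :: Op
pattern DUP1 = DUP 1

pattern DUP2 :: Op
pattern DUP2 = DUP 2

pattern SWAP1 :: Op
pattern SWAP1 = SWAP 1

-- Matching on a synonym reads like matching on a real constructor.
describe :: Op -> String
describe DUP1  = "duplicate the top stack item"
describe SWAP1 = "swap the top two stack items"
describe op    = "other: " ++ show op
```

With this in place, DUP1 and DUP 1 are the same value, so code can use whichever form reads better.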

When pushing a constant to the stack, EVM uses push1, push2, ..., push32 where the number 1-32 refers to how many bytes the constant occupies. Instead of having 32 unique push commands, this library has a single PUSH !Word256 constructor that serializes to the right push1, push2, etc.
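As a rough sketch of how the right push variant can be chosen, the following self-contained code counts the bytes a constant occupies. The names constantBytes, pushText and pushSize are hypothetical and not part of the library, Integer stands in for Word256, and the sketch assumes that zero is pushed as a one-byte constant.

```haskell
-- Number of bytes needed to hold a non-negative constant (zero takes one).
constantBytes :: Integer -> Int
constantBytes n
  | n < 256   = 1
  | otherwise = 1 + constantBytes (n `div` 256)

-- Mnemonic for pushing that constant, e.g. "push1 255" or "push2 256".
pushText :: Integer -> String
pushText n = "push" ++ show (constantBytes n) ++ " " ++ show n

-- Encoded size: one byte for the opcode plus the bytes of the constant.
pushSize :: Integer -> Int
pushSize n = 1 + constantBytes n
```

For example, pushText 255 gives "push1 255", pushText 256 gives "push2 256", and the largest 256-bit value needs all 32 bytes.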

Example

Imagine translating the following C program to EVM opcodes:

int x = 1;
while (x != 0) { x *= 2; }

Since EVM is stack-based, let's put x on the stack.

λ> import EVM.Opcode
λ> import EVM.Opcode.Labelled as L
λ> import EVM.Opcode.Positional as P

λ> let opcodes = [PUSH 1,JUMPDEST "loop",DUP1,ISZERO,JUMPI "end",PUSH 2,MUL,JUMP "loop",JUMPDEST "end"]

λ> L.translate opcodes
Right [PUSH 1,JUMPDEST 2,DUP1,ISZERO,JUMPI 14,PUSH 2,MUL,JUMP 2,JUMPDEST 14]

λ> P.translate <$> L.translate opcodes
Right [PUSH 1,JUMPDEST,DUP1,ISZERO,PUSH 14,JUMPI,PUSH 2,MUL,PUSH 2,JUMP,JUMPDEST]

λ> fmap opcodeText . P.translate <$> L.translate opcodes
Right ["push1 1","jumpdest","dup1","iszero","push1 14","jumpi","push1 2","mul","push1 2","jump","jumpdest"]
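The second step in the session above, from positional jumps to plain opcodes, amounts to emitting a push of the target address before each jump. A minimal sketch of that idea follows; the Positional and Plain types and the lower function are hypothetical stand-ins for illustration, not the library's definitions.

```haskell
-- Opcodes whose jumps carry a byte position (toy version).
data Positional
  = Push Integer
  | Jump Integer
  | JumpI Integer
  | JumpDest Integer
  | Simple String       -- any other opcode, e.g. "mul"
  deriving (Eq, Show)

-- Plain opcodes: jumps take their target from the stack.
data Plain
  = PPush Integer
  | PJump
  | PJumpI
  | PJumpDest
  | PSimple String
  deriving (Eq, Show)

-- Each positional jump becomes "push the address, then jump";
-- a jump destination simply loses its annotation.
lower :: [Positional] -> [Plain]
lower = concatMap step
  where
    step (Push n)     = [PPush n]
    step (Jump a)     = [PPush a, PJump]
    step (JumpI a)    = [PPush a, PJumpI]
    step (JumpDest _) = [PJumpDest]
    step (Simple s)   = [PSimple s]
```

Applied to the positional form of the loop example, lower yields eleven plain opcodes in the same order as the library's output shown above.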

Accounts for size of PUSHes when doing absolute jumps

EVM's jump and jumpi instructions are parameterless. Instead they pop and jump to the address on the top of the stack. In order to perform absolute jumps in the code, it is necessary to PUSH an address on the stack first. This is inconvenient, and so PositionalOpcode and LabelledOpcode are easier to use.

But what's more inconvenient is what happens to an absolute jump when the address being jumped to crosses a boundary where its byte index can no longer be represented in the same number of bytes.

Take for example this EVM code:

0x00: push1 255
0x02: jump
0x03: stop
0x04: stop
0x05: stop
...
0xfe: stop
0xff: jumpdest

which can be represented with the following LabelledOpcode:

λ> import EVM.Opcode
λ> import EVM.Opcode.Labelled as L
λ> import EVM.Opcode.Positional as P

λ> let opcodes = [JUMP "skip"] <> replicate 252 STOP <> [JUMPDEST "skip"]
λ> fmap (fmap opcodeText . P.translate) (L.translate opcodes)
Right ["push1 255","jump","stop","stop","stop",...,"jumpdest"]

Note especially the byte size of a PUSH 255 vs. a PUSH 256:

λ> opcodeSize (PUSH 255)
2
λ> opcodeSize (PUSH 256)
3

Then add another one-byte opcode between the jump and the jumpdest:

λ> let opcodes = [JUMP "skip"] <> replicate 253 STOP <> [JUMPDEST "skip"]
λ> fmap (fmap opcodeText . P.translate) (L.translate opcodes)
Right ["push2 257","jump","stop","stop","stop",...,"jumpdest"]

Even though only one byte was added, the address of jumpdest is now greater than 255, so every reference to it takes more than 2 bytes. Concretely, the one reference here went from 2 bytes to 3 bytes: JUMP "skip" became a push2 257 instead of a push1 255, which in turn moved jumpdest from 256 to 257. With many such jumps, this amounts to a fair bit of book-keeping.

This happens at subsequent boundaries as well. This library handles each boundary the same way, although EVM bytecode of more than a few kilobytes is uncommon at present.

λ> let opcodes = [JUMP "skip"] <> replicate 65532 STOP <> [JUMPDEST "skip"]
λ> fmap (fmap opcodeText . P.translate) (L.translate opcodes)
Right ["push3 65537","jump","stop","stop","stop",...,"jumpdest"]
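The addresses in the three examples above can be checked by hand with a small calculation. The sketch below is not library code: byteWidth and skipAddress are hypothetical helpers that model only this one program shape, a single jump over n one-byte stop opcodes.

```haskell
-- Bytes needed to encode an address (at least one).
byteWidth :: Int -> Int
byteWidth n
  | n < 256   = 1
  | otherwise = 1 + byteWidth (n `div` 256)

-- Address of the jumpdest in
--   [JUMP "skip"] <> replicate n STOP <> [JUMPDEST "skip"]
-- Try a push of w bytes; if the resulting address does not fit
-- in w bytes, widen the push and try again.
skipAddress :: Int -> Int
skipAddress n = go 1
  where
    go w
      | byteWidth addr <= w = addr
      | otherwise           = go (w + 1)
      where
        -- pushN takes 1 + w bytes, jump takes 1 byte, then n stops.
        addr = (1 + w) + 1 + n
```

Here skipAddress 252 is 255, skipAddress 253 is 257 and skipAddress 65532 is 65537, matching the push1 255, push2 257 and push3 65537 in the sessions above.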