# evm-opcodes [![Haskell CI Status](https://github.com/sshine/evm-opcodes/workflows/Haskell%20CI/badge.svg)](https://github.com/sshine/evm-opcodes/actions?query=workflow%3A%22Haskell+CI%22) This Haskell library provides opcode types for the Ethereum Virtual Machine (EVM). The library has two purposes: - Provide interface between EVM-related libraries: Lower cost of interoperability. - Provide easy access to labelled jumps. Labelled jumps are most useful when generating EVM code, but actual EVM `jump` instructions pop the destination address from the stack. The library has one parameterised type, `Opcode' j` where `j` is the annotation for the jump-related instructions `JUMP`, `JUMPI` and `JUMPDEST`, and it has three concrete variants: - `type Opcode = Opcode' ()` - `type PositionalOpcode = Opcode' Word` - `type LabelledOpcode = Opcode' Text` The library has a fixpoint algorithm that translates labelled jumps into positional jumps, and it has another function that translates those positional jumps into plain EVM opcodes where a constant is pushed before a jump is made. ## Library conventions When the documentation refers to a lowercase opcode (e.g. `push1`), then that means the EVM opcode. When the documentation instead refers to an uppercase opcode (e.g. `PUSH`), then that refers to the Haskell data constructor. While `dup1`-`dup16`, `swap1`-`swap16` and `log1`-`log4` were implemented using the data constructors `DUP`, `SWAP` and `LOG` that are not ergonomic to use but convenient for the library maintainer, pattern synonyms were made: - `DUP1`, `DUP2`, ..., `DUP16` - `SWAP1`, `SWAP2`, ..., `SWAP16` - `LOG1`, `LOG2`, `LOG3`, `LOG4` When pushing a constant to the stack, EVM uses `push1`, `push2`, ..., `push32` where the number 1-32 refers to how many bytes the constant occupies. Instead of having 32 unique push commands, this library has a single `PUSH !Word256` constructor that serializes to the right `push1`, `push2`, etc. ## Example Imagine translating the following C program to EVM opcodes: ``` int x = 1; while (x != 0) { x *= 2 }; ``` Since EVM is stack-based, let's put `x` on the stack. ```haskell λ> import EVM.Opcode λ> import EVM.Opcode.Labelled as L λ> import EVM.Opcode.Positional as P λ> let opcodes = [PUSH 1,JUMPDEST "loop",DUP1,ISZERO,JUMPI "end",PUSH 2,MUL,JUMP "loop",JUMPDEST "end"] λ> L.translate opcodes Right [PUSH 1,JUMPDEST 2,DUP1,ISZERO,JUMPI 14,PUSH 2,MUL,JUMP 2,JUMPDEST 14] λ> P.translate <$> L.translate opcodes Right [PUSH 1,JUMPDEST,DUP1,ISZERO,PUSH 14,JUMPI,PUSH 2,MUL,PUSH 2,JUMP,JUMPDEST] λ> fmap opcodeText . P.translate <$> L.translate opcodes Right ["push1 1","jumpdest","dup1","iszero","push1 14","jumpi","push1 2","mul","push1 2","jump","jumpdest"] ``` ## Accounts for size of `PUSH`es when doing absolute jumps EVM's `jump` and `jumpi` instructions are parameterless. Instead they pop and jump to the address on the top of the stack. In order to perform absolute jumps in the code, it is necessary to `PUSH` an address on the stack first. This is inconvenient, and so `PositionalOpcode` and `LabelledOpcode` are easier to use. But what's more inconvenient is what happens to the offset of an absolute jump when the address being jumped to crosses a boundary where its byte index can no longer be represented by the same amount of bytes. Take for example this EVM code: ``` 0x00: push1 255 0x02: jump 0x03: stop 0x04: stop 0x05: stop ... 0xfe: stop 0xff: jumpdest ``` which can be represented with the following `LabelledOpcode`: ```haskell λ> import EVM.Opcode λ> import EVM.Opcode.Labelled as L λ> import EVM.Opcode.Positional as P λ> let opcodes = [JUMP "skip"] <> replicate 252 STOP <> [JUMPDEST "skip"] λ> fmap (fmap opcodeText . P.translate) (L.translate opcodes) Right ["push1 255","jump","stop","stop","stop",...,"jumpdest"] ``` Note especially the byte size of a `PUSH 255` vs. a `PUSH 256`: ```haskell λ> opcodeSize (PUSH 255) 2 λ> opcodeSize (PUSH 256) 3 ``` Then add another one-byte opcode between the `jump` and the `jumpdest`: ```haskell λ> let opcodes = [JUMP "skip"] <> replicate 253 STOP <> [JUMPDEST "skip"] λ> fmap (fmap opcodeText . P.translate) (L.translate opcodes) Right ["push2 257","jump","stop","stop","stop",...,"jumpdest"] ``` Even though one byte was added, because the address of `jumpdest` is now greater than 255, all references to it now take more than 2 bytes. Concretely, one reference went from 2 bytes to 3 bytes, or rather, one `JUMP "skip"` became a `push2 257` instead of a `push1 255`. And if there were many such `jump`s, this amounts to a bit of book-keeping. This happens at subsequent boundaries as well. While this library handles each boundary the same way, it is unlikely to have EVM bytecode of more than a few kilobytes at present time. ```haskell λ> let opcodes = [JUMP "skip"] <> replicate 65532 STOP <> [JUMPDEST "skip"] λ> fmap (fmap opcodeText . P.translate) (L.translate opcodes) Right ["push3 65537","jump","stop","stop","stop",...,"jumpdest"] ```