EVM - grasping its architecture, bytecode and opcodes

Hey, fren. Glad to see you back!

During the past week, I have been expanding my knowledge on the various parts of the puzzle that Ethereum is. You know - in order to be good at something, you really need to understand it fundamentally.

There are various explanations and videos on the topic of the Ethereum Virtual Machine. And I have gone through the majority of them. This article is the aggregated version of all these sources, containing only what you really need to know about the EVM, explained in words that even I can understand. Kek.

Soo, what is the EVM?

Well, let's think of Ethereum as one big "global" imaginary computer. We say "global" because this computer is "located" on, or more correctly "run" by, all the nodes in the network. The concept of nodes requires an article of its own, so for now just take it as it is.

Your computer has fundamental hardware components, namely - CPU, GPU, RAM and so on. And if we stick to the example, we can say that the EVM is the CPU (central processing unit) of the Ethereum "computer". The CPU in your computer is responsible for executing tasks in the form of code, right? The same thing is done by the Ethereum Virtual Machine - it deploys, executes and updates all the "programs" on the Ethereum "computer". And those "programs" are also known as "smart contracts".

TLDR: The Ethereum Virtual Machine is the CPU of the Ethereum protocol, responsible for deploying the "programs" (smart contracts), running them and updating their state.

So far - so good. We have a basic idea of what the EVM is. But how does it take instructions? What language does it understand? This is a three-layered process. We can define it as:

Solidity->bytecode->opcode

Solidity:

That one is pretty easy. This is the name of the most famous language for writing the Ethereum programs (smart contracts). Solidity is meant to be understandable by humans. Once you become a "gigabrain" of a blockchain developer, you will (most probably) write your smart contracts in Solidity. It looks like this:

pragma solidity ^0.8.0;

contract Ownable {
    address owner;

    constructor() {
        owner = msg.sender;
    }

    modifier onlyOwner {
        require(msg.sender == owner);
        _;
    }
}

Nothing too scary.

After writing our Solidity code, and prior to deploying it on the blockchain, we need to compile it. I assume that you may know what compiling is, but I promised that this article is very beginner-friendly, so, in other words, compiling is the act of "translating" code written in a given programming language into a form that the machine can understand. In the EVM's case, that form is called "bytecode". It looks like this:

Bytecode:


0x6080604052348015600f57600080fd5b50336000806101000a81548173ffffffffffffffffffffffffffffffffffffffff021916908373ffffffffffffffffffffffffffffffffffffffff160217905550603f80605d6000396000f3fe6080604052600080fdfea2646970667358221220790cd6cf75edcc8c8304966b1db9b9b8b80e4fc7cc0e02a4482aefe74964faaf64736f6c63430008000033

DON'T leave!!! Before you close the tab and return to watching BitBoyCrypto's latest Youtube video with a misleading thumbnail, just give me a chance to explain. It is not that hard, I promise you!

The mumbo-jumbo above is the bytecode a.k.a the Solidity code from the first snippet, but translated into a language that the EVM can understand.

...Ok I lied. Because actually, the EVM does not swallow the code above as one big blob. It reads it piece by piece, as opcodes. An opcode is a one-byte instruction: the EVM walks through the bytecode, interprets each byte as an operation and executes it. The one exception is the PUSH family, where the bytes right after the opcode are data to be put on the stack, not operations. So the opcodes are simply the list of instructions through which the bytecode is understood and executed by the EVM. We will not go into details about this today, but if you are interested, you may refer to this link, containing all the opcodes: (https://ethereum.org/en/developers/docs/evm/opcodes/)

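The final opcode version of our bytecode above looks like this (the tail after the second INVALID is not real code, but metadata that the Solidity compiler appends, which the disassembler blindly tries to decode as well):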

PUSH1 0x80 PUSH1 0x40 MSTORE CALLVALUE DUP1 ISZERO PUSH1 0xF JUMPI PUSH1 0x0 DUP1 REVERT JUMPDEST POP CALLER PUSH1 0x0 DUP1 PUSH2 0x100 EXP DUP2 SLOAD DUP2 PUSH20 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF MUL NOT AND SWAP1 DUP4 PUSH20 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF AND MUL OR SWAP1 SSTORE POP PUSH1 0x3F DUP1 PUSH1 0x5D PUSH1 0x0 CODECOPY PUSH1 0x0 RETURN INVALID PUSH1 0x80 PUSH1 0x40 MSTORE PUSH1 0x0 DUP1 REVERT INVALID LOG2 PUSH5 0x6970667358 0x22 SLT KECCAK256 PUSH26 0xCD6CF75EDCC8C8304966B1DB9B9B8B80E4FC7CC0E02A4482AEF 0xE7 0x49 PUSH5 0xFAAF64736F PUSH13
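If you want to see the bytecode->opcode step with your own eyes, here is a minimal sketch of a disassembler in Python. It is only an illustration and not how real tooling is built: the function name `disassemble` and the `OPCODES` table are my own assumptions, and the table covers just a handful of the opcodes from the listing above. It shows the idea of reading one byte at a time and treating the bytes after a PUSH as data.

```python
# Tiny, hand-picked subset of the EVM opcode table (assumption: enough for the demo).
# The full list is at the ethereum.org link mentioned earlier.
OPCODES = {
    0x00: "STOP",
    0x15: "ISZERO",
    0x33: "CALLER",
    0x34: "CALLVALUE",
    0x50: "POP",
    0x52: "MSTORE",
    0x54: "SLOAD",
    0x55: "SSTORE",
    0x57: "JUMPI",
    0x5B: "JUMPDEST",
    0x80: "DUP1",
    0xF3: "RETURN",
    0xFD: "REVERT",
}


def disassemble(bytecode_hex: str) -> list:
    """Turn a hex string of EVM bytecode into a list of readable instructions."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)

    instructions = []
    pc = 0  # position of the byte we are currently reading
    while pc < len(code):
        op = code[pc]
        pc += 1
        if 0x60 <= op <= 0x7F:
            # PUSH1..PUSH32: the next 1..32 bytes are data, not opcodes.
            size = op - 0x5F
            data = code[pc:pc + size]
            pc += size
            instructions.append(f"PUSH{size} 0x{data.hex().upper()}")
        else:
            # Bytes missing from our small table are printed raw.
            instructions.append(OPCODES.get(op, f"0x{op:02X}"))
    return instructions


# The first few bytes of the bytecode from this article.
print(" ".join(disassemble("0x6080604052348015600f57600080fd5b50")))
```

The output lines up with the start of the listing above, except that this sketch keeps leading zeros (it prints `PUSH1 0x0F` where the listing shows `PUSH1 0xF`).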

Up until now we understood that the EVM interprets smart contracts by taking their bytecode and "separating" it into small executable instructions called "opcodes". The last section of this article will briefly cover the architecture of the Ethereum Virtual Machine - more precisely, how it stores and handles the data during a contract call. Here is a graphical representation of it:

Let's dissect it.

In order to grasp it more easily, imagine that you want to interact with a smart contract by sending a transaction to it:

a/ Virtual ROM

This is where the code of the smart contract that we chose to call is loaded for the EVM to execute. The information here is read-only: it "tells" the EVM what the code of the chosen contract is, and it cannot be changed during execution.

b/ Machine state

This changes while the transaction that initiated the contract call is being executed. After the transaction is completed, the machine state is wiped out and reset. It consists of:

Program counter - a fundamental component of every virtual machine. It holds the position in the bytecode of the instruction that is executed next. The smart contract is executed by moving the program counter through the code in the virtual ROM, reading the opcodes one by one and executing them (jump instructions can move it to a different position).
Gas available - Initially, this is the amount of gas that we sent together with the transaction. During the execution (while the EVM goes through the opcodes, each of which costs a specific amount of gas) the amount gradually depletes. If the gas runs out before the execution is complete, the transaction is reverted and the state of the contract goes back to what it was before the transaction was initiated. The gas that was already spent is not returned.
Stack - The EVM is a stack-based virtual machine. The stack can hold up to 1024 items, and each item is 256 bits (32 bytes) wide. Just remember that this is where the opcodes take their inputs from and put their outputs.
Memory - a temporary byte array used to store data during execution, for example the arguments passed to other functions (if there are any) and the data to be returned. It is wiped when the call ends.
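To see how the program counter, the gas and the stack play together, here is a toy interpreter in Python. Treat it as a sketch under my own assumptions, not as the real EVM: the names `run`, `OPS` and `OutOfGas` are made up for this article, it knows only four opcodes, and it skips memory entirely. The gas costs used (3 for PUSH1 and ADD, 5 for MUL, 0 for STOP) are the ones the EVM charges for these instructions.

```python
class OutOfGas(Exception):
    """Raised when the remaining gas cannot cover the next instruction."""


# opcode byte -> (name, gas cost). Only four opcodes: this is a toy.
OPS = {0x00: ("STOP", 0), 0x01: ("ADD", 3), 0x02: ("MUL", 5), 0x60: ("PUSH1", 3)}
WORD = 2**256       # stack items are 256-bit words, so arithmetic wraps around
MAX_STACK = 1024    # maximum number of items on the stack


def run(code: bytes, gas: int):
    """Execute the toy bytecode and return (stack, gas_left)."""
    pc = 0       # program counter: position of the next instruction in the code
    stack = []   # fresh, empty stack for every execution
    while pc < len(code):
        name, cost = OPS[code[pc]]  # an unknown opcode raises KeyError in this toy
        if gas < cost:
            raise OutOfGas(f"{name} at pc={pc} needs {cost} gas, only {gas} left")
        gas -= cost
        pc += 1
        if name == "STOP":
            break
        if name == "PUSH1":
            if len(stack) >= MAX_STACK:
                raise RuntimeError("stack overflow")
            stack.append(code[pc])  # the byte after PUSH1 is data
            pc += 1                 # so the program counter skips over it
        elif name == "ADD":
            a, b = stack.pop(), stack.pop()
            stack.append((a + b) % WORD)
        elif name == "MUL":
            a, b = stack.pop(), stack.pop()
            stack.append((a * b) % WORD)
    return stack, gas


# PUSH1 2, PUSH1 3, ADD, PUSH1 4, MUL, STOP  ->  (2 + 3) * 4
program = bytes.fromhex("600260030160040200")
print(run(program, gas=100))  # ([20], 83)
```

Calling `run(program, gas=10)` on the same program raises `OutOfGas` at the third PUSH1, which is the toy version of a transaction being reverted because it ran out of gas.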

c/ Account storage

The storage of the contract with which we interact. Unlike the machine state, it is persistent: it lives on the blockchain between transactions. It contains mainly the global (state) variables of the contract (in programming, a global variable is one that can be accessed by every function in the program, in our case the smart contract).
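The difference between the short-lived machine state and the persistent storage is easier to feel in code, so here is another small Python sketch. Everything in it is hypothetical: `ToyContract`, `transact` and the flat `cost_per_write` fee are names and numbers I picked for illustration, not real EVM gas rules. What it does mirror is that storage is a slot->value mapping that survives between transactions, and that a transaction which runs out of gas leaves the storage exactly as it found it. In our Ownable contract, `owner` is the first state variable, so it sits in slot 0.

```python
class ToyContract:
    """Hypothetical stand-in for a deployed contract."""

    def __init__(self):
        # slot number -> value; kept between transactions
        self.storage = {}


def transact(contract, writes, gas, cost_per_write=20_000):
    """Apply (slot, value) writes; undo all of them if the gas runs out.

    cost_per_write is an illustrative flat fee, not the real SSTORE pricing.
    """
    snapshot = dict(contract.storage)  # the state before the transaction
    for slot, value in writes:
        if gas < cost_per_write:
            contract.storage = snapshot  # revert: as if nothing had happened
            return False
        gas -= cost_per_write
        contract.storage[slot] = value
    return True


ownable = ToyContract()

# Enough gas: slot 0 (the "owner") is written and stays there.
transact(ownable, [(0, 0xABCD)], gas=50_000)

# Not enough gas for the second write: the first write is rolled back too.
transact(ownable, [(0, 0x1234), (1, 7)], gas=30_000)

print(ownable.storage)  # {0: 43981}, i.e. slot 0 still holds 0xABCD
```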

And we are done! Congrats! Now you know the basics of the Ethereum virtual machine. Let's sum up:

What is the EVM? - the "CPU" of the Ethereum protocol. Responsible for deploying and running smart contracts, as well as updating their state.
How does the EVM understand what operation to perform? - by going through 3 separate steps: Solidity->bytecode->opcode.
The major components of the EVM are:
- The virtual ROM and the account storage - both come from the contract with which we interact.
- The machine state (which includes the gas available, program counter, memory and stack).

I hope you learned something new today or got enlightened on a topic that you've heard of, but never really understood.

WAGMI

Sources: In-depth Understanding of the EVM

Antonopoulos, A.M. and Wood, G. (2019) Mastering Ethereum: Building Smart Contracts and DApps. O'Reilly Media.

Simplifying the EVM (Ethereum Virtual Machine)

Petar Todorov