Understanding the CPython Compiler


CPython is the reference implementation of Python, written in C. When you run a .py
file, the source goes through several internal stages before your code is actually executed.
Overview of the Compilation Process
Here’s a simplified overview of what happens when you run a Python script:
1. Lexical Analysis (Tokenizer)
2. Parsing (AST generation)
3. Abstract Syntax Tree to Bytecode Compilation
4. Execution by Python Virtual Machine
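The four stages above can be walked through by hand with standard-library modules. This is only a minimal sketch: the tokenize and ast modules mirror what the interpreter does internally rather than being the exact code path it takes, and the variable names (source, namespace) are chosen for illustration.

```python
import ast
import io
import tokenize

source = "x = 42\n"

# 1. Lexical analysis: source text -> tokens
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# 2. Parsing: source text -> AST
tree = ast.parse(source)

# 3. Compilation: AST -> code object holding bytecode
code_obj = compile(tree, "<string>", "exec")

# 4. Execution: the virtual machine runs the bytecode
namespace = {}
exec(code_obj, namespace)

print(type(tree).__name__, type(code_obj).__name__, namespace["x"])
```

Each of these stages is covered in more detail in the sections that follow.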
Lexical Analysis (Tokenizer)
The tokenizer splits the raw source code into tokens. This is like identifying words and punctuation in a sentence.
source_code = "x = 42"
CPython's internal tokenizer is written in C, but the standard library's tokenize module produces a comparable token stream, so you can use it to inspect how Python breaks the source down:
import tokenize
from io import BytesIO
code = b"x = 42"
tokens = list(tokenize.tokenize(BytesIO(code).readline))
for token in tokens:
    print(token)
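Output (abridged; the leading ENCODING token and the trailing NEWLINE and ENDMARKER tokens are omitted, and the numeric type values vary by Python version):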
TokenInfo(type=1 (NAME), string='x', start=(1, 0), end=(1, 1), line='x = 42')
TokenInfo(type=54 (OP), string='=', ...)
TokenInfo(type=2 (NUMBER), string='42', ...)
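To see the complete token stream with readable type names, one option is to map each token's numeric type through token.tok_name. A small sketch, using generate_tokens (the string-based counterpart of tokenize.tokenize), so no ENCODING token appears:

```python
import io
import token
import tokenize

source = "x = 42\n"
names = []
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    name = token.tok_name[tok.type]   # numeric type -> symbolic name
    names.append(name)
    print(f"{name:<10} {tok.string!r}")
```

Looking types up by name avoids depending on the numeric values, which differ between Python releases.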
Parsing (AST Generation)
Next, Python parses those tokens into an Abstract Syntax Tree (AST), a tree that represents the syntactic structure of your code.
import ast
tree = ast.parse("x = 42")
print(ast.dump(tree, indent=4))
Output (simplified; the exact layout and extra fields vary by Python version):
Module(
    body=[
        Assign(
            targets=[Name(id='x', ctx=Store())],
            value=Constant(value=42)
        )
    ]
)
This tree shows an assignment of the constant 42 to the variable x.
[Figure: AST for the assignment x = y + 3]
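The tree for x = y + 3 can also be generated directly. The sketch below prints it and then checks its shape node by node; the ast.dump layout differs between Python versions, so the comments describe the structure rather than quoting the printed output.

```python
import ast

tree = ast.parse("x = y + 3")
print(ast.dump(tree, indent=4))

assign = tree.body[0]   # the single Assign statement
binop = assign.value    # right-hand side: BinOp(left, op, right)

print(type(binop).__name__)      # BinOp
print(binop.left.id)             # y
print(type(binop.op).__name__)   # Add
print(binop.right.value)         # 3
```

Compared with x = 42, the only change is that the value of the Assign node is now a BinOp subtree instead of a single Constant.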
AST to Bytecode Compilation
Now, the AST is compiled into bytecode, the low-level instructions that Python's virtual machine can understand.
You can do this using compile():
code = compile("x = 42", "<string>", "exec")
print(code.co_code) # raw bytecode
To disassemble it into human-readable instructions:
import dis
dis.dis(code)
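Output (as printed by Python 3.10 and earlier; newer releases add or rename instructions, for example RESUME in 3.11+):
  1           0 LOAD_CONST               0 (42)
              2 STORE_NAME               0 (x)
              4 LOAD_CONST               1 (None)
              6 RETURN_VALUE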
Each instruction here corresponds to an operation in the Python Virtual Machine.
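The same information is available programmatically. As a sketch, dis.get_instructions yields one Instruction object per opcode, and the code object's co_consts and co_names tuples are the tables that the numeric arguments index into. The exact instruction sequence depends on the Python version, so this example only relies on STORE_NAME being present.

```python
import dis

code = compile("x = 42", "<string>", "exec")

# One Instruction per opcode: byte offset, name, and readable argument
for instr in dis.get_instructions(code):
    print(instr.offset, instr.opname, instr.argrepr)

opnames = [instr.opname for instr in dis.get_instructions(code)]
print(code.co_consts)   # constants table used by constant-loading instructions
print(code.co_names)    # names table: ('x',)
```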
Do try these commands in your terminal. I’ve also made a small web tool you can use to check out the AST and bytecode for any Python program: https://anistark.github.io/python-bytecode-inspector/
Python Virtual Machine
Finally, the bytecode is interpreted by the Python Virtual Machine (PVM), a stack-based virtual machine that executes instructions like LOAD_CONST, STORE_NAME, etc.
You can think of the interpreter as a loop that fetches, decodes, and executes each instruction in the bytecode. Simplified C-style pseudocode for it might look like:
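while (1) {
    opcode = *ip++;  // fetch the opcode and advance the instruction pointer
    arg = *ip++;     // fetch its argument (one byte in this simplified model)
    switch (opcode) {
        case LOAD_CONST:
            push(consts[arg]);            // push a constant onto the value stack
            break;
        case STORE_NAME:
            locals[names[arg]] = pop();   // bind the popped value to a name
            break;
        ...
    }
}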
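To make the loop concrete, here is a toy stack machine in Python that follows the same fetch, decode, execute pattern. It is an illustrative sketch, not how CPython is implemented: the run function, the (opname, arg) tuple encoding, and the hand-written program are all assumptions made for this example, and only three opcodes are supported.

```python
def run(instructions, consts, names):
    """Execute (opname, arg) pairs on a tiny stack machine."""
    stack = []
    namespace = {}
    ip = 0  # instruction pointer
    while ip < len(instructions):
        opname, arg = instructions[ip]   # fetch
        ip += 1
        if opname == "LOAD_CONST":       # decode and execute
            stack.append(consts[arg])
        elif opname == "STORE_NAME":
            namespace[names[arg]] = stack.pop()
        elif opname == "RETURN_VALUE":
            return stack.pop(), namespace
        else:
            raise ValueError(f"unsupported opcode: {opname}")
    return None, namespace


# Hand-written program equivalent to `x = 42`
program = [
    ("LOAD_CONST", 0),
    ("STORE_NAME", 0),
    ("LOAD_CONST", 1),
    ("RETURN_VALUE", None),
]
result, namespace = run(program, consts=(42, None), names=("x",))
print(result, namespace)  # None {'x': 42}
```

The constants and names are kept in separate tables and referenced by index, which is the same arrangement the disassembly of x = 42 shows.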
Compiling and Running Custom Code
Here’s how you can compile and execute custom Python code dynamically:
code = "for i in range(3): print(i)"
compiled = compile(code, "<string>", "exec")
exec(compiled)
Output:
0
1
2
You can inspect and understand how Python internally handles this by analyzing the AST and bytecode.
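For instance, the loop above can be taken through each stage separately. The sketch below parses it to an AST, compiles that AST (compile() accepts an AST object as well as a string), and runs it with stdout captured so the result can be checked. The names loop, buffer, and output are just for illustration.

```python
import ast
import contextlib
import dis
import io

source = "for i in range(3): print(i)"

# Parse: the module body holds a single For node
tree = ast.parse(source)
loop = tree.body[0]
print(type(loop).__name__)  # For

# Compile the AST and disassemble it (instruction names vary by version)
compiled = compile(tree, "<string>", "exec")
dis.dis(compiled)

# Execute in a fresh namespace, capturing what the loop prints
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(compiled, {})
output = buffer.getvalue()
print(output, end="")
```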
The CPython compiler is elegant and modular, allowing dynamic features like exec() and eval() because Python code is always just one compile() away from being bytecode. Understanding this pipeline gives you deeper insight into debugging, performance optimization, and even writing your own language features.
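As a small illustration of how exec() and eval() both sit on top of compile(), the sketch below compiles one expression in "eval" mode and one statement in "exec" mode. The variable names and the sample expressions are made up for this example.

```python
# "eval" mode: a single expression, whose value eval() returns
expr_code = compile("a * 2 + 1", "<string>", "eval")
value = eval(expr_code, {"a": 20})
print(value)  # 41

# "exec" mode: statements, run for their effect on the namespace
stmt_code = compile("b = a + 1", "<string>", "exec")
scope = {"a": 20}
exec(stmt_code, scope)
print(scope["b"])  # 21
```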
