Inside the Python Interpreter: How Code Becomes Action
Ever wondered how Python transforms your code into pure awesomeness? Let’s uncover the mischief happening under the hood!
Python is an interpreted language, so naturally it uses something we call an interpreter. But what exactly does this mean?
When you write a Python script, you’re essentially giving a list of instructions to the computer. Compiled languages such as C translate the entire program into machine code ahead of time. Python takes a different approach: the interpreter first compiles your source into an intermediate form called bytecode, and then a virtual machine executes that bytecode one instruction at a time. (Java and C# also compile to an intermediate form that a virtual machine runs.)
The Life Cycle of a Python Program
Source Code: It all starts with us writing our Python code. This code is saved in a file with a .py extension.
Compilation: When we run our Python script, the Python interpreter gets to work, compiling our code into bytecode. This step happens so seamlessly that we usually don't even notice it.
Bytecode: Bytecode is like a bridge between our human-readable code and the machine-readable instructions. When a module is imported, CPython caches its bytecode in a file with a .pyc extension inside the __pycache__ directory; the script you run directly is compiled in memory and not cached.
Python Virtual Machine (PVM): The PVM is where the real action happens. It reads and executes the bytecode instructions, making our code come to life.
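The same life cycle can be walked through by hand with the built-in compile() and exec() functions. This is only a minimal sketch: the source string, the "<example>" filename label, and the variable names are made up for illustration.

```python
# Stage 1: source code, here held in a string instead of a .py file.
source = "x = 2 + 3\nresult = x * 10\n"

# Stage 2: compile the source into a code object that holds the bytecode.
code_obj = compile(source, "<example>", "exec")

# Stage 3: hand the code object to the virtual machine for execution.
namespace = {}
exec(code_obj, namespace)

print(type(code_obj).__name__)   # the compiled form is a "code" object
print(namespace["result"])       # the value computed while executing it
```

Running a script with the python command performs these same steps for us, just without exposing the intermediate code object.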
Detailed Breakdown
Source Code
We begin by writing our Python code. Python's syntax is designed to be clean and easy to read, which is one of the reasons many of us love it. Here's a simple example to illustrate:
def foo():
    print("Hello World!")
This snippet defines a function that, when called, prints "Hello World!" to the console.
Compilation
First, the source code is parsed according to Python's formal grammar, a long set of rules that defines every valid statement and expression.
If you want to have a look at the whole grammar, you can find it at https://docs.python.org/3.8/reference/grammar.html
The interpreter first splits our source code into tokens and then uses this grammar to parse those tokens into a tree. Here's a detailed look into the process:
Lexical Analysis: First, the source code undergoes lexical analysis, where the code is broken down into tokens. Tokens are the basic building blocks of the code, such as keywords (like if, else, for), operators (like +, -, *), identifiers (like variable names), and literals (like numbers and strings).
Syntax Analysis: After tokenization, the parser takes these tokens and arranges them into a syntax tree. This tree represents the hierarchical syntactic structure of the source code according to the grammar rules of the Python language. Each node in the tree corresponds to a construct occurring in the source code.
Concrete Syntax Tree (CST): In CPython 3.8 and earlier, the parser first generates a CST, which is a detailed representation of the source code, including all syntactic details. (The PEG parser introduced in CPython 3.9 builds the AST directly.)
Abstract Syntax Tree (AST): The CST is then transformed into an AST, which is a simplified, abstract representation that omits some syntactic details while retaining the essential hierarchical structure of the code.
Semantic Analysis: The compiler then performs a limited set of checks on the AST. It builds a symbol table that classifies each name as local, global, or free, and it rejects constructs that are syntactically well formed but not allowed, such as a return statement outside a function. Because Python is dynamically typed, it does not check at this stage that variables are defined before use, that operand types are compatible, or that calls pass the right number of arguments; those errors surface at run time as NameError or TypeError.
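The lexical analysis step can be observed with the standard tokenize module. The sketch below tokenizes a made-up one-line assignment (the variable names total and price are only for illustration) and prints each token's type and text.

```python
import io
import tokenize

# A hypothetical one-line program to feed to the tokenizer.
source = "total = price * 2\n"

# generate_tokens() expects a readline-style callable that yields source lines.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for tok in tokens:
    # exact_type distinguishes individual operators such as EQUAL and STAR.
    print(tokenize.tok_name[tok.exact_type], repr(tok.string))
```

Identifiers show up as NAME tokens, the literal as a NUMBER token, and each operator under its own type, followed by NEWLINE and ENDMARKER tokens that mark the end of the line and of the input.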
The result of the parsing process is a well-formed AST, which serves as an intermediate representation of the source code.
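A quick experiment shows which problems the compile stage actually catches and which are left to run time. This is a rough sketch using the built-in compile() and exec(); the name undefined_name and the "<example>" label are placeholders.

```python
runtime_error = None
compile_error = None

# Using a name that was never defined still compiles without complaint...
code_obj = compile("print(undefined_name)", "<example>", "exec")
try:
    exec(code_obj, {})
except NameError as exc:
    # ...and only fails once the virtual machine tries to look the name up.
    runtime_error = exc
    print("run time:", type(exc).__name__)

# A return statement outside a function is rejected before anything runs.
try:
    compile("return 42", "<example>", "exec")
except SyntaxError as exc:
    compile_error = exc
    print("compile time:", type(exc).__name__)
```

The first failure is a NameError raised during execution, while the second is a SyntaxError raised by compile() itself.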
To look at the AST representation of our Hello World function, we can use the ast module:
import ast
import inspect

# Our Hello World function
def foo():
    print("Hello World!")

# Get the source code of the function (this works when the code is run from a file)
source_code = inspect.getsource(foo)
# Parse the source code into an AST
parsed_ast = ast.parse(source_code)
# Display the AST (the indent argument requires Python 3.9 or later)
print(ast.dump(parsed_ast, indent=4))
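Output: a nested tree whose root is a Module node containing a FunctionDef named foo. Its body holds an Expr wrapping a Call to the name print with one Constant argument, 'Hello World!'. The exact fields printed vary between Python versions.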
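Since the exact dump format differs between Python versions, a shorter way to inspect the tree is to list just the node types. The sketch below parses the function from a string (so it also works at the interactive prompt) and then compiles the tree itself, which shows that the AST is the input to the next stage.

```python
import ast

# The same Hello World function, held as a string.
source = 'def foo():\n    print("Hello World!")\n'
tree = ast.parse(source)

# ast.walk() visits every node in the tree; collect the class name of each.
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)

# compile() accepts an AST as well as source text.
namespace = {}
exec(compile(tree, "<ast>", "exec"), namespace)
namespace["foo"]()   # calls the function built from the tree
```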
Bytecode
Bytecode is a lower-level, more compact representation of our source code, designed for execution by the PVM. For imported modules, bytecode files are saved with a .pyc extension in the __pycache__ directory. For example, on CPython 3.8 our simple foo function is translated into bytecode instructions like these (opcode names and offsets differ in other versions):
  2           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('Hello World!')
              4 CALL_FUNCTION            1
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
To view these instructions, we can use the built-in dis module:
import dis

def foo():
    print("Hello World!")

dis.dis(foo)
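When the instructions need to be processed in code rather than just read, dis.get_instructions() returns them as objects. The following is a small sketch; the opcode names it prints depend on the Python version, so they may not match the listing above exactly.

```python
import dis

def foo():
    print("Hello World!")

# Each Instruction carries its byte offset, opcode name, and decoded argument.
instructions = list(dis.get_instructions(foo))
for instr in instructions:
    print(instr.offset, instr.opname, instr.argrepr)

opnames = [instr.opname for instr in instructions]
```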
Every function carries a code object, which we can access through its __code__ attribute. Let's have a look at the various attributes the code object contains:
def foo():
    print("Hello World!")
# Access the code object of the function
code_obj = foo.__code__
# Print various attributes of the code object
print(f"Function name: {code_obj.co_name}")
print(f"Argument count: {code_obj.co_argcount}")
print(f"Variable names: {code_obj.co_varnames}")
print(f"Bytecode: {code_obj.co_code}")
print(f"Constants: {code_obj.co_consts}")
print(f"Names: {code_obj.co_names}")
# Output:
# Function name: foo
# Argument count: 0
# Variable names: ()
# Bytecode: b't\x00d\x01\x83\x01\x01\x00d\x00S\x00'
# Constants: (None, 'Hello World!')
# Names: ('print',)
co_name: The name of the function.
co_argcount: The number of positional parameters the function accepts.
co_varnames: Tuple containing the names of the function's local variables, starting with its parameters.
co_code: The raw bytecode as a bytes object. (The output looks like a bunch of escape sequences, but that's only how Python displays bytes when we print them.)
co_consts: Tuple containing all the constants referenced in the function. Notice that we didn't reference None anywhere, but it is still there: it is the default return value when the function lacks an explicit return statement.
co_names: Not to be confused with co_name. This is the tuple of non-local names the bytecode refers to, such as globals, built-ins, and attribute names.
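Because foo takes no arguments and has no locals, several of these attributes come out empty. A slightly richer function makes the difference between them easier to see. The area function below is a hypothetical example written only for this purpose.

```python
def area(width, height):
    scale = 2                              # a local that is not a parameter
    return abs(width * height) * scale     # abs is looked up as a non-local name

code_obj = area.__code__

print(code_obj.co_argcount)   # number of positional parameters
print(code_obj.co_varnames)   # parameters first, then other locals
print(code_obj.co_names)      # non-local names the bytecode refers to
```

Here co_argcount is 2, co_varnames is ('width', 'height', 'scale'), and co_names is ('abs',).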
Python Virtual Machine (PVM)
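The PVM is the powerhouse that executes the bytecode. It is a loop that fetches each bytecode instruction and carries out the corresponding operation, using a stack to hold intermediate values. Here's a simplified look at what happens when the PVM executes the bytecode shown earlier:
LOAD_GLOBAL: Looks up the name print and pushes the function object onto the stack.
LOAD_CONST: Pushes the constant value 'Hello World!' onto the stack.
CALL_FUNCTION: Pops the argument and the function, calls print with that argument, and pushes the return value (None).
POP_TOP: Discards the unused return value of print.
LOAD_CONST and RETURN_VALUE: Push None and return it to the caller as the function's result.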
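To make the stack idea concrete, here is a toy stack machine that runs the six instructions from the disassembly listing of foo shown earlier. It is only a sketch for illustration and not how CPython is implemented: the run function, the tuple-based instruction format, and recording_print are all invented for this example.

```python
def run(instructions, constants, names, global_scope):
    """Execute (opname, arg) pairs on a simple value stack (toy model)."""
    stack = []
    for opname, arg in instructions:
        if opname == "LOAD_GLOBAL":
            stack.append(global_scope[names[arg]])     # push the named object
        elif opname == "LOAD_CONST":
            stack.append(constants[arg])               # push a constant
        elif opname == "CALL_FUNCTION":
            call_args = [stack.pop() for _ in range(arg)]
            call_args.reverse()                        # restore argument order
            func = stack.pop()
            stack.append(func(*call_args))             # push the return value
        elif opname == "POP_TOP":
            stack.pop()                                # discard the top value
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unsupported instruction: {opname}")
    raise RuntimeError("program ended without RETURN_VALUE")


printed = []

def recording_print(text):
    """Stand-in for print that also records what was printed."""
    printed.append(text)
    print(text)


program = [
    ("LOAD_GLOBAL", 0),
    ("LOAD_CONST", 1),
    ("CALL_FUNCTION", 1),
    ("POP_TOP", None),
    ("LOAD_CONST", 0),
    ("RETURN_VALUE", None),
]
result = run(program, (None, "Hello World!"), ("print",), {"print": recording_print})
print("returned:", result)
```

The constants and names tuples mirror the co_consts and co_names values of foo, which is why the instruction arguments are plain indexes.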
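This process makes Python platform-independent, as the same bytecode can run on any machine with a compatible PVM. Compatible here means the same interpreter version: the bytecode format changes between Python releases, so cached .pyc files are tagged with the version that produced them.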
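The link between cached bytecode and the interpreter version is visible in the cache file name. The sketch below asks importlib where the bytecode for a module would be cached; foo.py is a hypothetical file name and does not need to exist.

```python
import importlib.util
import sys

# The tag identifies the implementation and version, for example "cpython-312".
print(sys.implementation.cache_tag)

# Where the compiled bytecode for foo.py would be stored.
cache_path = importlib.util.cache_from_source("foo.py")
print(cache_path)

# A version-specific marker written at the start of every .pyc file.
print(importlib.util.MAGIC_NUMBER)
```

Two different Python versions therefore write to different .pyc files in the same __pycache__ directory instead of overwriting each other's bytecode.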
Additional Components
While the steps above cover the core execution process, there are a few more components and optimizations worth mentioning:
Garbage Collection: Python uses automatic memory management to handle memory allocation and deallocation. CPython frees most objects through reference counting as soon as their last reference disappears, and a cyclic garbage collector reclaims groups of objects that only reference each other.
Interpreter Optimizations: The Python interpreter includes various optimizations, such as constant folding and peephole optimizations, to speed up execution.
Just-In-Time (JIT) Compilation: Some Python implementations, like PyPy, use JIT compilation to further optimize performance. JIT compilers translate bytecode into machine code at runtime, significantly speeding up execution.
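Two of these components can be observed directly from Python code. The sketch below assumes CPython and uses only the standard gc and weakref modules plus compile(); the Node class and all variable names are invented for illustration. The first part builds a reference cycle and shows that the garbage collector reclaims it, and the second part shows constant folding by checking the constants of a compiled expression.

```python
import gc
import weakref


class Node:
    """Hypothetical class used only to build a reference cycle."""


# Part 1: garbage collection of a reference cycle.
gc.disable()                       # keep the automatic collector out of the way
first, second = Node(), Node()
first.partner = second             # first refers to second
second.partner = first             # second refers back to first (a cycle)
probe = weakref.ref(first)         # observe the object without keeping it alive

del first, second                  # no names left, but the cycle keeps both alive
alive_before = probe() is not None
gc.collect()                       # the cycle detector finds and frees the pair
alive_after = probe() is not None
gc.enable()

print("alive before collect:", alive_before)
print("alive after collect:", alive_after)

# Part 2: constant folding at compile time.
folded = compile("seconds_per_day = 60 * 60 * 24", "<example>", "exec")
print("folded constant present:", 86400 in folded.co_consts)
```

On CPython the three printed values should be True, False, and True: the cycle survives until the collector runs, and the product 86400 is already stored as a constant before the code is ever executed.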
Conclusion
Understanding how Python runs under the hood can give us valuable insights into its performance and behavior. From source code to bytecode and finally to execution by the PVM, Python's execution model is designed to be efficient and platform-independent. Whether you're a beginner or an experienced developer, knowing the inner workings of Python can help you write better code and optimize your programs more effectively.
I hope you enjoyed this journey through the inner workings of Python as much as I did. There’s always more to learn and discover, so keep exploring and happy coding! 😄
Written by Pranay Sinha