Understanding JIT


Over the past few days, I’ve been diving into how PyTorch 2's TorchDynamo compiles and optimizes ML models. TorchDynamo is a “Python-level JIT compiler” that “dynamically modifies Python bytecode.”

Since I’ve never taken a course on compilers, the concept of JIT compilation—especially in Python—was initially hard to wrap my head around. While I still don’t fully grasp all the details of TorchDynamo, I’ve gained a better understanding of how Python and Java execute code under the hood. In this post, I’ll share some of what I’ve learned.


JIT in Java

I had come across the term Just-In-Time (JIT) compilation many times in the context of Java, but I never really understood what it meant.

The easiest way to make sense of JIT is by comparing it to its counterpart: Ahead-Of-Time (AOT). While AOT compiles all the code upfront before execution, JIT takes a more dynamic approach. It waits to see which parts of the code are actually executed frequently and compiles just those parts into machine code.

Let’s examine this idea by exploring how Java code gets executed under the hood.

Java source code is first compiled into bytecode, which is saved in .class files. Bytecode is an intermediate, low-level representation of the original source code—it's more compact and easier to interpret than high-level code, but it’s not machine code that runs directly on the CPU.

This bytecode is then executed by the Java Virtual Machine (JVM). As the program runs, the JVM monitors which parts of the code are executed frequently. These frequently-used sections—such as loops, methods, or other performance-critical paths—are marked as hot.

The JIT (Just-In-Time) compiler then compiles these hot paths into machine code at runtime, i.e., while the program is still running. From that point on, the JVM executes the compiled machine code directly instead of interpreting the bytecode, resulting in much faster performance.

On the other hand, code that is used less frequently remains as bytecode, which the JVM continues to interpret line by line.

I used to think of Java as a purely "compiled" language, but that turns out to be a misconception. Java is best described as a hybrid: it’s first compiled to bytecode, then interpreted or JIT-compiled at runtime by the JVM.


JIT in PyTorch

Now that we've covered the basics of JIT, let’s explore how these ideas apply to compiling and optimizing ML models. But first, it’s important to understand the trade-offs that early ML frameworks had to navigate.

Performance vs. Flexibility

Early ML frameworks generally took one of two approaches: static graph construction or dynamic (eager) execution.

Static frameworks like the original TensorFlow required users to define models as static computation graphs. This meant constructing the entire graph ahead of time — before any actual computation happened. The benefit of this approach was that it allowed for aggressive optimization: operations could be fused, reordered, or parallelized for better performance.

But this came at the cost of flexibility. Python is a dynamic language — it runs code line by line, allowing developers to use native control flow (like if statements or loops), insert print() statements for debugging, and interact with external libraries naturally.

Static graphs didn’t align well with Python’s dynamic nature. For example, you couldn’t use regular Python print() or if logic — instead, you had to use special graph-compatible versions like tf.print() and tf.cond(), which made the code harder to read, debug, and experiment with. For many Python developers, this felt unintuitive and restrictive.

Dynamic frameworks like PyTorch quickly gained popularity because they worked like regular Python — executing code line by line with eager execution. There was no need to build a computation graph, which made development feel natural, flexible, and easy to debug. However, this dynamic nature made it harder for the system to optimize performance, since it could only see and execute one operation at a time.
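
To see what this feels like in practice, here is a small sketch of an eager PyTorch function that freely mixes tensor operations with ordinary Python control flow and print() debugging (the thresholding logic is just an illustrative toy, not part of any real model):

import torch

def forward(x):
    # Native Python control flow, evaluated eagerly on a real tensor
    if x.sum() > 0:
        x = x * 2
    else:
        x = x - 1
    print("intermediate:", x)  # plain print() works for debugging
    return torch.relu(x)

out = forward(torch.randn(3))

Each line runs immediately, so you can inspect real values at any point, which is exactly the workflow static graphs made awkward.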

So the creators of PyTorch 2 aimed to combine the best of both worlds: the flexibility of dynamic execution and the performance benefits of graph-based optimization. This is where TorchDynamo and JIT compilation come in, enabling PyTorch to trace and optimize models without giving up the dynamic Python experience.

JIT in PyTorch 2

PyTorch’s JIT works a bit differently from Java’s. In Java, the JIT compiler identifies frequently executed hot paths and compiles them into machine code. In contrast, PyTorch’s JIT focuses specifically on compiling tensor operations, while leaving regular Python code untouched. Let’s take a closer look.

Consider the following function:

import torch

def f(x):
    y = x + 1          # elementwise tensor addition
    z = torch.relu(y)  # elementwise ReLU activation
    return z

When you run this in Python, the source code is first parsed into an Abstract Syntax Tree (AST), then compiled into Python bytecode. That bytecode might look roughly like this:

LOAD_FAST      x
LOAD_CONST     1
BINARY_ADD
STORE_FAST     y
LOAD_GLOBAL    torch
LOAD_ATTR      relu
LOAD_FAST      y
CALL_FUNCTION  1
STORE_FAST     z
LOAD_FAST      z
RETURN_VALUE
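
You don't have to take this on faith: CPython ships a dis module in the standard library that disassembles any function, so you can inspect the real bytecode yourself (the exact instruction names vary across Python versions):

import dis
import torch

def f(x):
    y = x + 1
    z = torch.relu(y)
    return z

dis.dis(f)  # prints the actual bytecode instructions for f's body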

Normally, the Python interpreter would execute this bytecode line by line. But in PyTorch 2, there’s an additional layer: TorchDynamo intercepts this process at runtime to optimize tensor-heavy code.

TorchDynamo analyzes the bytecode and identifies the tensor operations (in this case, the addition and the ReLU). It then extracts these operations into an FX graph, a static representation of the computation. That graph is handed to a backend compiler (TorchInductor by default), which generates optimized low-level code: C++ for CPUs and Triton kernels for GPUs.
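
If you're curious what the captured FX graph looks like, torch.compile accepts a custom backend: a plain Python function that receives the traced torch.fx.GraphModule along with example inputs and returns a callable. The sketch below (my_debug_backend is just a name for this example) prints the graph and then runs it without any further optimization:

import torch

def my_debug_backend(gm, example_inputs):
    # gm is a torch.fx.GraphModule containing the captured tensor ops
    print(gm.graph)    # show the FX graph TorchDynamo extracted
    return gm.forward  # return a callable; here we just run the graph as-is

def f(x):
    y = x + 1
    z = torch.relu(y)
    return z

compiled_f = torch.compile(f, backend=my_debug_backend)
compiled_f(torch.randn(4))  # first call triggers tracing and prints the graph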

Rather than modifying the original bytecode, TorchDynamo wraps the original function with a new one that calls the compiled version. The resulting execution looks conceptually like this:

LOAD_GLOBAL compiled_graph
LOAD_FAST x
CALL_FUNCTION 1
RETURN_VALUE

So when the interpreter executes this bytecode, it calls the optimized graph directly. Once the function has been compiled, subsequent calls reuse the compiled version instead of re-running the original Python logic, as long as the inputs still satisfy the assumptions recorded during tracing.
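
In user code, all of this is triggered by wrapping the function with torch.compile; here is a minimal sketch using the same function as above:

import torch

def f(x):
    y = x + 1
    z = torch.relu(y)
    return z

compiled_f = torch.compile(f)      # wrap f; nothing is compiled yet
x = torch.randn(8)
print(compiled_f(x))               # first call: trace, compile, then execute
print(compiled_f(torch.randn(8)))  # later calls reuse the compiled code

The first call pays the compilation cost; repeated calls with similar inputs amortize it.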


So far, I’ve explored how PyTorch 2 leverages JIT compilation to optimize tensor operations into fast, low-level code. The PyTorch 2 paper goes much deeper into the technical internals, like guard conditions and graph break handling, which I’m still working to fully understand. As I continue learning about Big Data and ML systems, I’ll be back with more posts on what I discover. Thanks for reading! 🚀
