Disabling GIL - Unleashing the True Power of Python

Hemachandra
10 min read

Introduction

Python is finally starting to shed its best-known limitation, the Global Interpreter Lock (GIL), which has been part of the interpreter since the early 1990s. Before explaining what the GIL is, let's look at how computers and programming languages were designed in the 1990s.

In the 1990s, everyday computers had a single CPU core, so there was little to gain from running code in parallel. Programs from that time prioritized memory efficiency and simplicity over parallel speed. High speed needs software and hardware to work together; without multi-core hardware, building software for parallel execution would have been pointless. Given those hardware limits, Python focused on memory efficiency, robustness, and simplicity rather than on parallel multi-threading, which lets multiple tasks run at the same time. Because CPython tracks every object through reference counts, its developers added the Global Interpreter Lock (GIL) so that object references could be managed safely and cheaply once threads were supported.

Global Interpreter Lock (GIL)

When it comes to multi-threading, Python behaves a bit weird. Let's understand this with a small example.

In my childhood, my younger brother and I used to play computer games. My mother made a rule that my brother would play first for 30 minutes and then I could play for 30 minutes. The rule made sure that no one messed up the game by trying to do things at the same time. So I would wait 30 minutes for my turn to play.

In the world of computers, programs are like me and my brother trying to play the game. The Global Interpreter Lock (GIL) is like the rule that only allows one, either me or my brother (or thread), to play the game (or execute Python bytecode) at a time.

The GIL is a lock that protects access to Python objects by allowing only one thread at a time to execute Python bytecode within a process. This means that even in a multi-threaded Python program, only one thread executes Python bytecode at any given moment.

As a result, multi-threading gives little or no speedup for CPU-bound tasks. CPU-bound tasks are tasks that spend most of their time computing on the CPU rather than waiting on I/O. Mathematical computations, compression and decompression of files, and compilation of programs are a few examples of CPU-bound tasks.
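By contrast, threads can still help with tasks that mostly wait. The following is a minimal sketch, not a benchmark: it assumes time.sleep is a fair stand-in for a blocking I/O call (sleep releases the GIL while it waits), and the names fake_io_task, delay, and num_threads are made up for illustration.

```python
import threading
from time import perf_counter, sleep

def fake_io_task(delay):
    # sleep() stands in for a blocking I/O call; the GIL is released while waiting
    sleep(delay)

delay = 0.2       # seconds each hypothetical I/O call takes
num_threads = 4

start_time = perf_counter()
threads = [threading.Thread(target=fake_io_task, args=(delay,)) for _ in range(num_threads)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
elapsed = perf_counter() - start_time

# The waits overlap, so the total should be close to one delay, not four
print(f"{num_threads} waiting threads finished in {elapsed:.2f} seconds")
```

Because the threads spend their time waiting instead of executing bytecode, the GIL does not serialize them the way it serializes CPU-bound work.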

Why GIL & What is the Advantage of GIL?

If GIL doesn't allow multiple threads to run at the same time then why do we need GIL?

Memory Management In Python

  • CPython uses reference counting as a primary mechanism for memory management. Every Python object has a reference count, which tracks how many references point to the object. When the count drops to zero, the memory occupied by the object is freed.

  • Without the GIL, multiple threads could simultaneously modify the reference count of an object, leading to race conditions and potentially corrupting the reference count, resulting in memory leaks or crashes.

  • The GIL simplifies memory management by ensuring that only one thread can modify an object’s reference count at a time.
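Reference counting can be observed from Python itself. This is a small sketch that assumes CPython and uses sys.getrefcount; the absolute numbers it reports include temporary references and vary between versions, so only the change in the count is shown. The names data and holders are illustrative.

```python
import sys

data = ["some", "object"]
holders = []

before = sys.getrefcount(data)
holders.append(data)          # the holders list now keeps one more reference to data
after_add = sys.getrefcount(data)
holders.clear()               # dropping that reference lowers the count again
after_drop = sys.getrefcount(data)

print(f"change after adding a reference: {after_add - before}")
print(f"change after dropping it: {after_drop - after_add}")
```

Every one of these count updates is a read-modify-write on shared memory, which is why unsynchronized updates from several threads would be dangerous.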

Simplicity and Safety:

  • The GIL makes the Python interpreter easier to implement because it removes the need for detailed locking around shared resources. This simplifies maintenance and development, especially with Python's dynamic typing and complex object model.

  • It ensures thread safety for Python’s built-in types (like lists, dictionaries, etc.) without needing locks for every operation, which could slow down performance significantly.
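As a rough illustration of that built-in safety, the sketch below has several threads append to one shared list without any explicit lock. It assumes CPython, where a single list.append call is expected to behave atomically; the names shared, appends_per_thread, and append_many are illustrative.

```python
import threading

shared = []
appends_per_thread = 10_000
num_threads = 4

def append_many():
    for i in range(appends_per_thread):
        shared.append(i)   # one list.append call per step, no explicit lock

threads = [threading.Thread(target=append_many) for _ in range(num_threads)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# No append is lost, even though four threads wrote to the same list
print(f"Items in the shared list: {len(shared)}")
```

This covers individual operations on built-in types only. A sequence of several operations still needs its own lock.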

Let us understand with an example

import threading

counter = 0
iterations = 1000000

def increment_counter():
    global counter
    for _ in range(iterations):
        counter += 1

# Create two threads
thread1 = threading.Thread(target=increment_counter)
thread2 = threading.Thread(target=increment_counter)

# Start both threads
thread1.start()
thread2.start()

# Wait for both threads to finish
thread1.join()
thread2.join()

print(f"Final counter value: {counter}")

Expected vs. Actual Outcome

  • Expected Outcome (Without GIL): If Python didn't have the GIL, you might expect the final value of counter to be 2 * iterations (i.e., 2 million). However, without proper locking mechanisms, there's a risk that both threads might read and write counter simultaneously, leading to race conditions. The final count could be less than 2 million due to lost increments.

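  • Actual Outcome (With GIL): Because of the GIL, only one thread executes Python bytecode at a time, so the threads are interleaved rather than running in parallel and the program gains nothing from extra cores. Note that the GIL does not make counter += 1 atomic: the statement compiles to several bytecode steps (read, add, write), and a thread switch between them can still lose an increment. In practice the result will frequently be 2 million, but the only way to guarantee it is to protect the increment with a lock such as threading.Lock.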
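A minimal sketch of making the counter update explicitly safe, with or without the GIL, assuming a threading.Lock around the increment. The names counter_lock and increment_counter_safely are illustrative, and iterations is reduced to keep the run short.

```python
import threading

counter = 0
iterations = 100_000
counter_lock = threading.Lock()

def increment_counter_safely():
    global counter
    for _ in range(iterations):
        # Only one thread at a time may perform the read-add-write sequence
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=increment_counter_safely) for _ in range(2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(f"Final counter value: {counter}")
```

The lock makes each increment a single indivisible step, at the cost of acquiring and releasing the lock on every iteration.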

Why not remove GIL completely?

Removing the Global Interpreter Lock (GIL) from Python has been a topic of discussion and research for many years. While it seems like an obvious solution to improve multi-threaded performance, the decision to keep or remove the GIL is not straightforward.

Limitations to remove GIL from Python

  1. Slower Single-Threaded Performance: The majority of Python programs are single-threaded. Removing the GIL would require adding fine-grained locking mechanisms around Python objects to ensure thread safety. These locks introduce overhead, which could slow down single-threaded programs significantly, especially in areas like reference counting, object allocation, and garbage collection.

  2. Fine-Grained Locking: To remove the GIL, the Python interpreter would need to use fine-grained locks around shared data structures (like lists, dictionaries, and other objects). Implementing and maintaining this locking mechanism across all Python data types and operations is highly complex and could lead to subtle bugs and race conditions that are difficult to detect and debug.

  3. Backward Compatibility: Removing the GIL could break existing extensions and C libraries that rely on the GIL for thread safety. The Python ecosystem includes a vast number of third-party libraries, many of which are written in C and assume the presence of the GIL. Removing the GIL would require significant changes to these libraries, leading to compatibility issues.

  4. Library Performance: Many C extensions (e.g., NumPy, pandas) rely on the GIL for simplicity and performance. These libraries could suffer performance degradation if they have to handle their own locking mechanisms or if Python's core operations become slower.

  5. Frameworks: Code built on major frameworks like Flask, FastAPI, Django, and Tornado has been written and tested with the GIL in place, and some of it may implicitly depend on the GIL for thread safety. Removing the GIL could expose such assumptions. However, adapting these frameworks to work without the GIL could improve throughput and reduce latency.

Why is Python thinking of removing GIL now?

The answer is simple. The growing need for AI and machine learning computations, along with the availability of powerful hardware like multi-core CPUs and GPUs, led to the decision to make the GIL optional. Letting threads run in parallel on several cores can greatly speed up CPU-bound computations.

Python 3.13 Release Candidate 1

Installing no-gil version of Python

Python 3.13 includes the initial experimental removal of the GIL and is now available for download and testing. Visit the official Python downloads page to get Python 3.13 RC. You can build the Python 3.13 interpreter from source, or use the GUI installer for Mac and Windows listed further down the page.

If you use the GUI installer, select the option "free-threaded binaries (experimental)" during setup to install the build that can run without the GIL.

If you are building from source, pass the --disable-gil option to the configure script to get the free-threaded build.

After installing Python 3.13, check the installation folder (e.g., C:\Program Files\Python313). You will find an executable file named python3.13t.exe. Notice the letter t in the name of the executable.

The t marks the free-threaded binary, which lets you try out the "removal of GIL" feature. From the command line, you can run the following commands to use the feature.

Using -Xgil=0 disables the GIL, and -Xgil=1 enables it.

python3.13t -Xgil=0 sample.py

python3.13t -Xgil=1 sample.py
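To confirm which mode a script is actually running in, a small check can be placed at the top of sample.py. This is a sketch that assumes Python 3.13 or later for sys._is_gil_enabled() and the Py_GIL_DISABLED build variable; on older interpreters it falls back to reporting the GIL as enabled. The helper name gil_status is made up for illustration.

```python
import sys
import sysconfig

def gil_status():
    # Py_GIL_DISABLED is set to 1 for free-threaded builds (the "t" binaries)
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists from Python 3.13; older versions always have the GIL
    checker = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = bool(checker()) if checker is not None else True
    return free_threaded_build, gil_enabled

free_threaded_build, gil_enabled = gil_status()
print(f"Free-threaded build: {free_threaded_build}")
print(f"GIL enabled at runtime: {gil_enabled}")
```

On a regular build the first line reports False and the second True, regardless of the command-line option.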

Metrics

Let's get started. I want to write a program and run it in single-threaded and multi-threaded versions. Then, I'll measure the time taken by the threads when the GIL is disabled and enabled.

The program below calculates the sum of squares of all numbers from 0 to n - 1 in both single-threaded and multi-threaded versions and measures the time taken by Python in each version. In the multi-threaded version, each thread handles a portion of the numbers and calculates the sum of squares for that part. For example, if n is 100 and the number of threads is 4, I divide the range (0, 100) into 4 equal parts, where each part includes its start and excludes its end. Each thread then works on one part: [(0, 25), (25, 50), (50, 75), (75, 100)].

Example: If I provided 5 as the value of n then it gives sum([0*0, 1*1, 2*2, 3*3, 4*4])
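Before the full program, the chunking itself can be checked in isolation. This sketch reuses the same splitting rule as the program (the last chunk absorbs any remainder) and compares the combined partial sums against the closed form (n - 1) * n * (2n - 1) / 6. The helper names chunk_bounds and sum_of_squares_range are illustrative and not part of the program.

```python
def chunk_bounds(n, num_threads):
    # Split range(n) into num_threads half-open (start, end) parts
    chunk_size = n // num_threads
    bounds = []
    for i in range(num_threads):
        start = i * chunk_size
        end = start + chunk_size if i != num_threads - 1 else n
        bounds.append((start, end))
    return bounds

def sum_of_squares_range(start, end):
    return sum(i * i for i in range(start, end))

n = 100
bounds = chunk_bounds(n, 4)
partial_sums = [sum_of_squares_range(start, end) for start, end in bounds]
total = sum(partial_sums)
closed_form = (n - 1) * n * (2 * n - 1) // 6

print(f"Chunks: {bounds}")
print(f"Sum of partial results: {total}")
print(f"Closed-form value: {closed_form}")
```

Since the parts do not overlap and together cover the whole range, adding the partial results gives the same value as the single-threaded sum.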

import threading
from time import perf_counter

def sum_of_squares(n):
    return sum(i * i for i in range(n))

def sum_of_squares_partial(start, end, result, index):
    result[index] = sum(i * i for i in range(start, end))

def run_single_threaded(n):
    start_time = perf_counter()
    result = sum_of_squares(n)
    end_time = perf_counter()
    print(f"Single-threaded sum of squares: {result}")
    print(f"Time taken: {end_time - start_time:.4f} seconds")

def run_multi_threaded(n, num_threads):
    chunk_size = n // num_threads
    threads = []
    results = [0] * num_threads  # one result slot per thread, e.g. [0, 0, 0, 0]

    start_time = perf_counter()

    for i in range(num_threads):
        start = i * chunk_size
        end = start + chunk_size if i != num_threads - 1 else n
        thread = threading.Thread(target=sum_of_squares_partial, args=(start, end, results, i))
        threads.append(thread)
        thread.start()

    for thread in threads:
        thread.join()

    total_result = sum(results)
    end_time = perf_counter()

    print(f"Multi-threaded sum of squares: {total_result}")
    print(f"Time taken: {end_time - start_time:.4f} seconds")

def main():
    n = 10**9  # Adjust the size as needed
    num_threads = 4  # Number of threads

    print("Running single-threaded version...")
    run_single_threaded(n)

    print("\nRunning multi-threaded version...")
    run_multi_threaded(n, num_threads)

if __name__ == "__main__":
    main()

In the above program, I set n to 10**9 and num_threads to 4. Since n is very large, the CPU needs time to compute the result. Let's run the program with the GIL enabled and then with the GIL disabled.

GIL is enabled (Python 3.13)

# command to enable GIL
python3.13t -Xgil=1 sample.py

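If you look at the time taken, the multi-threaded version took about 58 seconds, more than the roughly 50 seconds of the single-threaded version. With the GIL enabled the four threads cannot run in parallel, so the multi-threaded run does the same work one thread at a time and additionally pays for thread creation and switching.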

GIL is Disabled (Python 3.13)

# command to disable GIL
python3.13t -Xgil=0 sample.py

If you look at the time taken, the multi-threaded version takes about 21.8 seconds with the GIL disabled, roughly 62% less than the single-threaded version in the same run (57.4 seconds). Compared with the multi-threaded version with the GIL enabled, the times are 21.8 vs 58.8 seconds.

However, the single-threaded version with the GIL disabled is noticeably slower than the single-threaded version with the GIL enabled (57.4 vs 50.3 seconds, about 14% slower). This is something Python needs to improve. Remember, this is still in the experimental phase.

Note: The timings on your computer will likely differ from mine.

Python 3.12

I want to compare the Python 3.13 metrics with Python 3.12, where the GIL is always enabled and there is no option to disable it. Run the same program with Python 3.12 and get the statistics.

# GIL is enabled by default in Python 3.12 and we can't disable it
python3.12 sample.py

That was a bit different. The Python 3.13 version with the GIL enabled shows a slight improvement over Python 3.12 with the GIL enabled. This might be because of improvements in the threading library.

At a glance

                   GIL is Enabled (Python 3.13)   GIL is Disabled (Python 3.13)   GIL is Enabled (Python 3.12)
Single-threaded    50.3166 sec                    57.4112 sec                     54.3308 sec
Multi-threaded     58.7743 sec                    21.840 sec                      60.1747 sec
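The ratios behind these numbers can be worked out with a few lines of arithmetic. The sketch below only uses the timings reported above from a single machine, so the derived figures are indicative rather than general; the dictionary layout and variable names are illustrative.

```python
# Timings in seconds, copied from the measurements above
timings = {
    "3.13 GIL enabled": {"single": 50.3166, "multi": 58.7743},
    "3.13 GIL disabled": {"single": 57.4112, "multi": 21.840},
    "3.12 GIL enabled": {"single": 54.3308, "multi": 60.1747},
}
num_threads = 4

no_gil = timings["3.13 GIL disabled"]
with_gil = timings["3.13 GIL enabled"]

speedup = no_gil["single"] / no_gil["multi"]                # multi vs single, GIL disabled
time_saved = 1 - no_gil["multi"] / no_gil["single"]         # fraction of time saved
efficiency = speedup / num_threads                          # share of the ideal 4x speedup
single_overhead = no_gil["single"] / with_gil["single"] - 1 # single-thread cost of no-GIL

print(f"Speedup with GIL disabled: {speedup:.2f}x")
print(f"Time saved by multi-threading: {time_saved:.1%}")
print(f"Parallel efficiency on {num_threads} threads: {efficiency:.1%}")
print(f"Single-threaded slowdown without the GIL: {single_overhead:.1%}")
```

The gap between the measured speedup and the ideal 4x comes from the single-threaded slowdown of the free-threaded build plus the usual threading overhead.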

Summary

Removing the GIL can greatly boost the performance of multi-threaded, CPU-bound Python code. In the test above, four threads finished in well under half the single-threaded time once the GIL was disabled.

Just imagine if Python ran without the GIL. It could greatly speed up machine learning and AI workloads, improve performance in libraries like pandas, and reduce latency in web applications built on Django, Flask, and FastAPI, where each process can currently run Python code on only one thread at a time.

However, removing the GIL is not easy and requires a major redesign of Python. But I believe it is worth it because of the benefits it offers in the AI field.

Until then, your friend here, Hemachandra, is signing off...

For more courses, visit my website here.

Have a nice day!
