Data Race in Python despite GIL?
Data race
While the definite definition of data race differs with the concurrency model and the language, it's safe to assume that data race is when 2 or more threads try to access the same memory where at least one of them is a write operation.
Seeing the above requirements, it might even seem silly even to assume that GIL (Global Interpreter Lock) would let you have a condition like data race. But there are nuances to it and to understand that we'll need to understand what GIL is and how it works.
Note: Python's semantics aren't tight or specific enough to definitively call something a data race vs. a race condition, so I'll avoid going too deep into its classification.
Global Interpreter Lock (GIL)
As the name suggests, GIL locks the interpreter to running only 1 thread at a time. This means it limits one thread to execute bytecode at a time or it runs one bytecode instruction at a time.
The GIL also guarantees lock is released every sys.getswitchinterval()
second, at which point a thread switch can take place. This means that for Python code, a thread switch can still take place, but only between byte code instructions.
But what does "one bytecode instruction at a time" even mean?
Bytecode
Bytecode is the low-level representation of the Python code which is platform-independent. In simpler words, it is the compiled version of the code you wrote which is run by the Python interpreter. You may be confused and thinking "Isn't Python an interpreted language?", while yes you're right, that is outside the scope of this article so you can read more about it here.
Now to understand the statement "one bytecode instruction at a time", we'll need to look at the statement x=x+1.
While it seems like a simple statement where x
's value is incremented by one, but at the bytecode level a simple statement is turned into multiple instructions, such as;
Load
x
into the registerIncrement
1
to the value ofx
Write the value in the register back to
x
Note: We'll call this order of operations the "load-modify-store
statement' from now on.
A simple operation like incrementing a variable has to use multiple instructions. So, if threads are switched in the middle of executing an instruction, it might lead to what we know as a data race.
How does data race occur?
Going back to the load-store-modify
statement, we discussed that it might lead to a race condition. That is because, at any point in the load-store-modify
statement, another thread might take over and run its own operation before the 3-part operation is complete, which will affect the correctness of the output of the program. Hence, it may lead to a race condition.
As a simple example, assume that GIL doesn't exist in this version of Python's interpreter.
x = 10
def increment():
global x
x += 1
t1 = Thread(target=increment)
t2 = Thread(target=increment)
t1.start()
t2.start()
t1.join()
t2.join()
Now let's try to dry-run the following code; you need to keep in mind that, in data races, the code will not be deterministic hence we'll dry-run a possibility which might happen.
t1
reads the value ofx
and stores it in a temporary variablet1_x
,t1
increments the value oft1_x
by1
,t2
reads the value ofx
and stores it in a temporary variablet2_x
,t1
writes the value oft1_x
back tox
,t2
increments the value oft2_x
,t2
writes the value oft2_x
back tox
If you don't look at the dry run, you might assume the value of x
to be 12
, but due to a possible data race condition, the value of x
will be 11
. The value of t1_x
will be overwritten as t2_x
never knew about the t1_x
's value being written to x
and only cares about the initial value of x
.
You also have to remember that, the following condition isn't possible in current Python (with GIL) due to it having locks on thread switches which lasts sys.getswitchinterval() seconds
(5 milliseconds on average). This means that Python won't switch threads unless an operation takes more than 5 milliseconds and as x+=1
requires less than that to complete, so threads won't switch context and a data race won't happen.
Data race example
While the above example can't practically demonstrate data race in Python, the following example can practically demonstrate data race on Python versions below 3.10.
from threading import Thread
from time import sleep
counter = 0
def increase():
global counter
for _ in range(0, 100000):
counter = counter + 1
threads = []
for i in range(0, 400):
threads.append(Thread(target=increase))
for thread in threads:
thread.start()
for thread in threads:
thread.join()
print(f'Final counter: {counter}')
# Final counter: 31735072
# Final counter: 32829326
# Final counter: 31496003
The above example works because the loop of 100,000 takes more than 5 milliseconds to complete, hence data race can be seen practically.
Relation between race condition and data race
A race condition is a flaw that occurs when a program's correctness depends on the order and/or timing of the events executed. A race condition is a higher-level error than a data race. Many race conditions can be caused by data races, but this is not necessary. In summary, a data race is a specific thing that a program does, and a race condition is a flaw in the program's behaviour.
It's very important to understand that race condition and data race aren't the same things. They are not a subset of one another. They are also neither necessary nor sufficient conditions for one another. But in this context load-modify-store
statement causes a data race which might lead to a race condition as well.
To clarify, the load-modify-store
statement causes a data race which leads to a race condition, and that is the relationship between data race and race condition.
Note: Because access to data structures is not atomic, race conditions can take place there as well.
Conclusion
The data race condition mentioned before is also known as load-modify-store
data race. A true data race condition (same memory location being accessed and modified by our processor at the same time) isn't technically possible, most of the data race conditions happen due to cache maintenance problems.
Data race can be classified as a cache maintenance problem as well. If you load a value from a memory location, modify it, and then at some point write it back to the original memory location then you realize the value at the shared location prior to write-back is no longer the value you originally copied then information is potentially lost and that's where things start to go bad.
Data race in Python is definitely possible and GIL doesn't prevent data race as GIL only makes sure that only one thread executes bytecode at a time, which doesn't guarantee higher-level operations will be atomic, hence data race and race conditions can happen in Python.
Subscribe to my newsletter
Read articles from Samrid Pandit directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
Samrid Pandit
Samrid Pandit
Async >>>