Concurrency and Threads

Understanding Threads in Operating Systems
Operating systems provide abstractions to make hardware easier to use and share:
CPU abstraction: A single physical CPU is turned into multiple virtual CPUs, giving the illusion that many programs run at once.
Memory abstraction: Each process is given its own virtual address space, so programs behave as if they have private memory, even though the OS is multiplexing physical memory (and sometimes disk).
Now, let’s look at a new abstraction within a single process: the thread.
What is a Thread?
A traditional process has a single execution point (a Program Counter, or PC).
A multi-threaded process can have multiple execution points, each running independently.
Threads are almost like separate processes, with one critical difference: they share the same address space.
This is why threads are often called “lightweight processes.”
Thread State
Each thread maintains its own state:
Program Counter (PC) – tracks the next instruction
Registers – used for computation
Stack – stores local variables, function arguments, return values, etc.
When switching between threads (a context switch), the CPU must save the register state of one thread and restore that of another.
Unlike switching between processes, the page table does not need to be changed, since threads share the same address space.
Threads and the Stack
In a multi-threaded process, each thread gets its own stack (sometimes called thread-local storage), holding that thread's local variables, arguments, and return values.
This breaks the "clean" model where one heap grows upward and one stack grows downward without interference: now there is one stack per thread spread through the address space.
This is usually fine (stacks don't need much space), but programs with deep recursion may run into stack problems.
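To make this concrete, here is a minimal sketch (the function and variable names are my own, not from any standard example) that prints addresses from two threads: the global variable has the same address in both threads because they share one address space, while each thread's local variable lives on its own private stack.
#include <stdio.h>
#include <pthread.h>

int shared_value = 0; // lives once, in the shared address space

void* show_addresses(void* arg) {
    int local = 0; // lives on this thread's private stack
    printf("[%s] &shared_value = %p, &local = %p\n",
           (char*) arg, (void*) &shared_value, (void*) &local);
    return NULL;
}

int main() {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, show_addresses, "thread A");
    pthread_create(&t2, NULL, show_addresses, "thread B");
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
Both threads print the same address for shared_value but different addresses for local, showing the shared heap/globals and the per-thread stacks side by side.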
Why Use Threads?
Threads are mainly used for two reasons:
Parallelism
On multi-core systems, threads let you split work across CPUs.
Example: dividing large array operations among threads → faster execution (see the sketch after this list).
Avoiding I/O Blocking
While one thread waits for I/O (disk, network, page fault), others can keep working.
This overlaps computation and I/O, improving responsiveness.
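Here is a minimal parallelism sketch along the lines of the array example above; the array size, thread count, and names (partial_sum, sums) are illustrative choices, not a canonical implementation.
#include <stdio.h>
#include <pthread.h>

#define N 10000000

static int data[N];
static long sums[2]; // one partial result per thread, so no sharing is needed

void* partial_sum(void* arg) {
    int id = *(int*) arg;
    long s = 0;
    // each thread sums its own half of the array
    for (int i = id * (N / 2); i < (id + 1) * (N / 2); i++)
        s += data[i];
    sums[id] = s;
    return NULL;
}

int main() {
    for (int i = 0; i < N; i++) data[i] = 1;
    pthread_t t[2];
    int ids[2] = {0, 1};
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, partial_sum, &ids[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    printf("total = %ld\n", sums[0] + sums[1]); // prints 10000000
    return 0;
}
Each thread writes only its own slot in sums, so the partial results need no synchronization; on a multi-core machine the two halves can genuinely run at the same time.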
Threads vs Processes
Threads share the same address space → easier data sharing.
Processes are better when tasks are independent and need isolation.
👉 In short: use threads for speed (parallelism) and responsiveness (I/O overlap).
Threads Example
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h> // sleep()
void* compute_task(void* arg) {
    for (int i = 1; i <= 5; i++) {
        printf("[Compute] step %d\n", i);
        sleep(1); // simulate a long computation
    }
    return NULL;
}

void* io_task(void* arg) {
    FILE* fp = fopen("log.txt", "w");
    if (!fp) {
        perror("fopen");
        return NULL;
    }
    for (int i = 1; i <= 5; i++) {
        fprintf(fp, "[I/O] writing line %d\n", i);
        fflush(fp); // flush immediately to disk
        printf("[I/O] wrote line %d\n", i);
        sleep(2); // simulate slow I/O
    }
    fclose(fp);
    return NULL;
}

int main() {
    pthread_t t1, t2;
    printf("Main: starting threads...\n");
    // create threads
    pthread_create(&t1, NULL, compute_task, NULL);
    pthread_create(&t2, NULL, io_task, NULL);
    // wait for both threads to finish
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("Main: all threads finished.\n");
    return 0;
}
Result
Main: starting threads...
[Compute] step 1
[I/O] wrote line 1
[Compute] step 2
[Compute] step 3
[I/O] wrote line 2
[Compute] step 4
[Compute] step 5
[I/O] wrote line 3
[I/O] wrote line 4
[I/O] wrote line 5
Main: all threads finished.
🔎 Result Analysis
Order is unpredictable
Sometimes Compute runs first, sometimes I/O. Thread scheduling is non-deterministic.
CPU vs I/O overlap
Compute is fast and CPU-bound; I/O is slower and waits on the disk. The run shows how threads let the CPU keep working while I/O waits.
Key takeaway
Threads improve efficiency by overlapping tasks.
But since the outputs interleave, shared data must be carefully synchronized.
Why It Gets Worse: Shared Data
You might think: “Computers are deterministic, right? Same input, same output!”
But in reality, things aren’t so simple.
Here’s a classic example:
#include <stdio.h>
#include <pthread.h>

static volatile int counter = 0;

void* mythread(void* arg) {
    printf("%s: begin\n", (char*) arg);
    for (int i = 0; i < 10000000; i++) {
        counter = counter + 1;
    }
    printf("%s: done\n", (char*) arg);
    return NULL;
}

int main() {
    pthread_t p1, p2;
    printf("main: begin (counter = %d)\n", counter);
    pthread_create(&p1, NULL, mythread, "A");
    pthread_create(&p2, NULL, mythread, "B");
    pthread_join(p1, NULL);
    pthread_join(p2, NULL);
    printf("main: done with both (counter = %d)\n", counter);
    return 0;
}
The intention is simple:
Thread A and Thread B each add 1 ten million times.
The expected final result is: 20,000,000.
But when you run the program, the result varies:
main: done with both (counter = 20000000) ✅ (lucky case)
main: done with both (counter = 19345221) ❌ (wrong)
main: done with both (counter = 19221041) ❌ (wrong)
It looks like a single operation in C, but the CPU breaks it down into multiple steps:
Load the value of counter from memory
Add 1
Store the new value back into memory
When multiple threads interleave these steps, things go wrong:
Thread A: reads counter = 100
Thread B: reads counter = 100
Thread A: adds 1 → writes 101
Thread B: adds 1 → writes 101 (overwrites A’s update)
So instead of incrementing twice, the counter only increased once.
This phenomenon is called a Race Condition.
Uncontrolled Scheduling
Increment Isn’t Atomic
Consider a simple counter increment:
counter = counter + 1;
At the C level it looks like one operation, but at the CPU level it breaks into three steps:
Load counter from memory into a register
Add 1 to the register
Store the register value back into memory
In x86 assembly, it might look like this:
mov 0x8049a1c, %eax   # load counter into register eax
add $0x1, %eax        # add 1
mov %eax, 0x8049a1c   # store the result back to memory
How Scheduling Breaks It
Now imagine two threads running this code at the same time:
Thread 1 loads counter = 50 into its register and adds 1, so eax = 51.
A timer interrupt occurs; the OS saves Thread 1's state before it can store the result.
Thread 2 runs, also loads counter = 50, increments it, and stores 51 back to memory.
The OS switches back to Thread 1, which still has eax = 51, and stores 51 again.
➡ The result is counter = 51 instead of 52. One increment is lost.
Race Condition
This phenomenon is called a race condition (data race):
The final outcome depends on the exact timing of thread execution.
Sometimes you get the correct result, sometimes not.
Each run may give a different result → the program becomes indeterminate, not deterministic.
Critical Sections
The code sequence that updates counter is a critical section:
A region of code that accesses shared resources
Must not be executed by more than one thread at a time
The property we need is mutual exclusion:
Only one thread can be in the critical section at once
Prevents lost updates and nondeterminism
Because the OS scheduler can interrupt threads at any time, we cannot rely on “simple” code being safe in multithreaded environments. Even a one-line increment can break.
This is the heart of concurrency problems: uncontrolled scheduling creates race conditions.
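The standard pthread remedy is to enforce mutual exclusion with a lock around the critical section. Below is a minimal sketch of the fixed counter program (the lock variable name is my own choice):
#include <stdio.h>
#include <pthread.h>

static volatile int counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void* mythread(void* arg) {
    for (int i = 0; i < 10000000; i++) {
        pthread_mutex_lock(&lock);   // enter the critical section
        counter = counter + 1;       // only one thread at a time runs this
        pthread_mutex_unlock(&lock); // leave the critical section
    }
    return NULL;
}

int main() {
    pthread_t p1, p2;
    pthread_create(&p1, NULL, mythread, NULL);
    pthread_create(&p2, NULL, mythread, NULL);
    pthread_join(p1, NULL);
    pthread_join(p2, NULL);
    printf("counter = %d\n", counter); // now always 20000000
    return 0;
}
With the lock held around the load-add-store sequence, the interleaving shown above becomes impossible, and the program prints 20,000,000 every time, at the cost of some locking overhead.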