The C Programmer's Guide to Multithreading: From Concurrency to Parallelism

For a long time, C programs have run like a solo actor on a stage, performing one instruction after another in a predictable sequence. This is simple and effective, but what happens when your program needs to do multiple things at once? Imagine a desktop application where a long calculation freezes the entire user interface, or a web server that can handle only one client request at a time.
The solution is to move from a single-threaded model to a multi-threaded one. This guide will walk you through the world of threads, concurrency, and parallelism in C, from the foundational concepts to the practical tools you need to write powerful, modern C applications.
Part 1: The Core Concepts
Before we write a single line of threaded code, we must understand the "what" and the "why."
What is a Process?
When you run a program (e.g., ./my_app), the operating system creates a process. A process is an instance of a program in execution. It has its own:
Private Memory Space: Its own stack, heap, and global variables. One process cannot directly access the memory of another.
System Resources: File handles, network sockets, etc.
Processes are "heavyweight" and isolated from each other. Communicating between them (Inter-Process Communication or IPC) is possible but relatively slow and complex.
What is a Thread?
A thread is the smallest unit of execution within a process. Think of it as a "lightweight process." A single process can have multiple threads, all running seemingly at the same time.
All threads within the same process share:
The same memory space (heap and global variables).
The same system resources (file handles, etc.).
Each thread gets its own:
Stack: For its local variables and function call history.
Instruction Pointer: To keep track of where it is executing.
This shared memory is the single most important feature of threads. It's what makes them powerful and fast for collaboration, but it's also the root of most bugs in multithreaded programming.
Concurrency vs. Parallelism: The Great Distinction
These two terms are often used interchangeably, but they describe different concepts.
Concurrency is about dealing with multiple tasks at once. It’s a design concept. On a system with a single CPU core, you can still have concurrency. The OS rapidly switches between threads, running each one for a fraction of a second. The tasks appear to progress simultaneously, but they are actually taking turns.
Analogy: A chef making a soup. They chop vegetables, then stir the pot, then add spices. They are handling multiple tasks concurrently, but at any given microsecond, they are only doing one thing.
Parallelism is about doing multiple tasks at once. This is a hardware reality. It requires a system with multiple CPU cores. If you have two threads and two cores, each thread can run on its own core: truly, physically, at the same time.
Analogy: Two chefs in a kitchen. One is chopping vegetables while the other is stirring the pot. Two tasks are being performed in parallel.
Key Takeaway: You can have concurrency without parallelism, but you cannot have parallelism without concurrency. All multithreaded programs are concurrent. Only those running on multi-core hardware can be parallel.
Part 2: Your Toolkit - The C11 Threads Library (<threads.h>)
For decades, C had no standard way to create threads. Programmers had to rely on platform-specific APIs like POSIX Threads (pthreads) on Linux/macOS or Windows Threads on Windows.
The C11 standard changed this by introducing a portable threading library, declared in <threads.h>.
The "Hello, World!" of Threads
Let's create a program where the main function starts another thread that runs concurrently.
The Tools:
thrd_t: A type that represents a thread identifier.
thrd_create(): Creates and starts a new thread.
thrd_join(): Waits for a thread to finish its execution.
#include <stdio.h>
#include <threads.h> // C11 threads library

// This is the function our new thread will run.
// It must have this specific signature: int function_name(void*).
int say_hello(void* arg) {
    // The 'arg' is the data passed from the main thread.
    const char* name = (const char*)arg;
    printf("Hello from the new thread, %s!\n", name);
    return 0; // The return value can be retrieved by thrd_join.
}

int main(void) {
    thrd_t my_thread; // A variable to hold the new thread's ID.
    const char* my_name = "World";

    printf("Starting a new thread from main...\n");

    // 1. CREATE the thread.
    // thrd_create(&thread_id, function_to_run, argument_to_pass);
    if (thrd_create(&my_thread, say_hello, (void*)my_name) != thrd_success) {
        fprintf(stderr, "Error creating thread.\n");
        return 1;
    }

    // 2. WAIT for the thread to complete.
    // If we don't 'join', main might finish and exit the whole program
    // before the new thread gets a chance to run!
    int result;
    thrd_join(my_thread, &result);
    printf("Thread finished with result: %d\n", result);

    printf("All done from main!\n");
    return 0;
}
How to Compile:
On GCC or Clang, build with the C11 standard enabled and link the pthreads library, as most implementations of <threads.h> use pthreads as their backend:

    gcc -std=c11 my_program.c -o my_program -pthread
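If your toolchain does not provide <threads.h> at all, the same program can be written almost line-for-line against POSIX threads instead. As a minimal sketch (note that a pthreads thread function returns void* rather than int):

    #include <stdio.h>
    #include <pthread.h> // POSIX threads

    // pthreads thread functions return void* instead of int.
    void* say_hello(void* arg) {
        const char* name = (const char*)arg;
        printf("Hello from the new thread, %s!\n", name);
        return NULL;
    }

    int main(void) {
        pthread_t my_thread;
        const char* my_name = "World";

        if (pthread_create(&my_thread, NULL, say_hello, (void*)my_name) != 0) {
            fprintf(stderr, "Error creating thread.\n");
            return 1;
        }
        pthread_join(my_thread, NULL); // Wait for the thread to finish.

        printf("All done from main!\n");
        return 0;
    }

It compiles the same way: gcc my_program.c -o my_program -pthread.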
Part 3: The Danger Zone - Race Conditions
What happens when two threads try to modify the same shared variable? Chaos. This is called a race condition.
Consider this "simple" counter:
#include <stdio.h>
#include <threads.h>

#define NUM_THREADS 10
#define NUM_INCREMENTS 100000

long long counter = 0; // Our shared resource!

int increment_counter(void* arg) {
    for (int i = 0; i < NUM_INCREMENTS; i++) {
        counter++; // DANGER!
    }
    return 0;
}

int main(void) {
    thrd_t threads[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++) {
        thrd_create(&threads[i], increment_counter, NULL);
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        thrd_join(threads[i], NULL);
    }

    // Expected result: 10 * 100000 = 1,000,000
    printf("Final counter value: %lld\n", counter);
    return 0;
}
If you run this, you will almost certainly not get 1,000,000. You'll get some random, smaller number.
Why? The operation counter++ is not a single, indivisible action. On most machines it compiles down to three separate steps:
Read the value of counter from memory into a CPU register.
Increment the value in the register.
Write the new value from the register back to memory.
Imagine two threads, A and B, when counter is 50:
Thread A reads 50.
Context Switch! The OS pauses Thread A and runs Thread B.
Thread B reads 50.
Thread B increments its value to 51 and writes 51 back to memory.
Context Switch! The OS pauses Thread B and resumes Thread A.
Thread A, which still has the old value 50, increments it to 51 and writes 51 back to memory.
We performed two increments, but the counter only went up by one. This is a "lost update," and it's a classic race condition. The part of the code that accesses the shared resource (counter++) is called the critical section.
Part 4: Synchronization - Taming the Chaos
To fix race conditions, we must ensure that only one thread can be inside a critical section at any given time. This is called mutual exclusion. The C11 library provides tools for this.
Mutexes: The Lock and Key
A mutex (short for MUTual EXclusion) is like a lock on a door. Before a thread enters a critical section, it must acquire the lock. When it's done, it releases the lock, allowing another thread to enter.
The Tools:
mtx_t: The mutex type.
mtx_init(): Initializes a mutex.
mtx_lock(): Acquires the lock. If another thread holds it, this call will block (wait) until it's released.
mtx_unlock(): Releases the lock.
mtx_destroy(): Cleans up the mutex resources.
Let's fix our counter program:
// ... includes and defines from before ...

long long counter = 0;
mtx_t counter_mutex; // Declare the mutex

int increment_counter_safe(void* arg) {
    for (int i = 0; i < NUM_INCREMENTS; i++) {
        mtx_lock(&counter_mutex);   // <<< Lock
        counter++;                  // --- Critical section
        mtx_unlock(&counter_mutex); // <<< Unlock
    }
    return 0;
}

int main(void) {
    // Initialize the mutex before creating any threads.
    if (mtx_init(&counter_mutex, mtx_plain) != thrd_success) {
        fprintf(stderr, "Mutex init failed.\n");
        return 1;
    }

    thrd_t threads[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++) {
        thrd_create(&threads[i], increment_counter_safe, NULL);
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        thrd_join(threads[i], NULL);
    }

    // Destroy the mutex after all threads are done.
    mtx_destroy(&counter_mutex);

    printf("Final (safe) counter value: %lld\n", counter); // Now it's 1,000,000!
    return 0;
}
Now, the counter++ operation is atomic (indivisible) from the perspective of other threads. The program is thread-safe.
Beware of Deadlock!
If Thread A locks Mutex 1 and waits for Mutex 2, while Thread B locks Mutex 2 and waits for Mutex 1, both threads will be stuck forever. This is a deadlock. A common way to prevent this is to always lock mutexes in the same order.
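To illustrate, here is a minimal sketch of a consistent locking order. The Account type and transfer function are invented for this example; the pattern is the point: both threads always take the lower-addressed lock first, so a cycle of waiting can never form.

    #include <stdint.h>
    #include <threads.h>

    typedef struct {
        mtx_t lock;
        long  balance;
    } Account;

    // Hypothetical example: each account has its own mutex. Locking both
    // mutexes in a fixed global order (here: by object address) means two
    // concurrent transfers can never wait on each other in a cycle.
    void transfer(Account* a, Account* b, long amount) {
        Account* first  = ((uintptr_t)a < (uintptr_t)b) ? a : b;
        Account* second = (first == a) ? b : a;

        mtx_lock(&first->lock);
        mtx_lock(&second->lock);

        a->balance -= amount;
        b->balance += amount;

        mtx_unlock(&second->lock);
        mtx_unlock(&first->lock);
    }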
Part 5: Advanced Synchronization
Condition Variables: Efficient Waiting
What if a thread needs to wait for a specific condition to become true? For example, a "consumer" thread waiting for a "producer" thread to add an item to a queue.
Spinning in a loop and constantly locking/unlocking a mutex to check the condition is horribly inefficient. This is where condition variables come in. They allow a thread to go to sleep until it is "signaled" by another thread that the condition it's waiting for might now be true.
Key Idea: A condition variable is always used with a mutex.
The Tools:
cnd_t: The condition variable type.
cnd_init() / cnd_destroy(): For setup and cleanup.
cnd_wait(): Atomically unlocks the associated mutex and puts the thread to sleep. When woken up, it re-acquires the lock before continuing.
cnd_signal(): Wakes up one waiting thread.
cnd_broadcast(): Wakes up all waiting threads.
A classic producer-consumer pattern might look like this (pseudo-code):
// Producer Thread
mtx_lock(&mutex);
add_item_to_queue();
cnd_signal(&cond_var); // Signal the consumer that there's data
mtx_unlock(&mutex);
// Consumer Thread
mtx_lock(&mutex);
while (is_queue_empty()) {
// Atomically unlocks mutex and waits. When it wakes up, it has the lock again.
cnd_wait(&cond_var, &mutex);
}
item = get_item_from_queue();
mtx_unlock(&mutex);
process(item);
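Note the while loop rather than an if: cnd_wait is allowed to wake up spuriously, so the condition must be re-checked after every wakeup. To make the pattern concrete, here is a minimal runnable sketch where the "queue" is simplified to a single-slot buffer (one producer, one consumer, one item):

    #include <stdio.h>
    #include <threads.h>

    mtx_t mutex;
    cnd_t cond_var;
    int item = 0;           // Single-slot "queue".
    int item_available = 0; // The condition the consumer waits on.

    int producer(void* arg) {
        (void)arg;
        mtx_lock(&mutex);
        item = 42;             // "Add" an item to the queue.
        item_available = 1;
        cnd_signal(&cond_var); // Wake the consumer.
        mtx_unlock(&mutex);
        return 0;
    }

    int consumer(void* arg) {
        (void)arg;
        mtx_lock(&mutex);
        while (!item_available) {        // Guards against spurious wakeups.
            cnd_wait(&cond_var, &mutex); // Unlocks, sleeps, relocks on wakeup.
        }
        int my_item = item;              // "Take" the item.
        item_available = 0;
        mtx_unlock(&mutex);
        printf("Consumed item: %d\n", my_item);
        return 0;
    }

    int main(void) {
        mtx_init(&mutex, mtx_plain);
        cnd_init(&cond_var);

        thrd_t c, p;
        thrd_create(&c, consumer, NULL);
        thrd_create(&p, producer, NULL);
        thrd_join(p, NULL);
        thrd_join(c, NULL);

        cnd_destroy(&cond_var);
        mtx_destroy(&mutex);
        return 0;
    }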
Atomic Operations: Lock-Free Programming
For very simple operations like incrementing a counter, a full mutex can be overkill. The <stdatomic.h> header provides a way to perform certain operations in a guaranteed atomic, lock-free manner, using special hardware instructions.
#include <stdatomic.h>

atomic_llong counter = 0; // An atomic type: safe to modify from many threads.

// In the thread function:
atomic_fetch_add(&counter, 1); // This is an atomic increment.
Atomic operations are typically much faster than mutexes, but they are only suitable for simple read-modify-write operations on a single variable.
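Putting it together, the counter program from Part 3 needs only two changes, the type of counter and the increment itself. A minimal sketch:

    #include <stdio.h>
    #include <stdatomic.h>
    #include <threads.h>

    #define NUM_THREADS 10
    #define NUM_INCREMENTS 100000

    atomic_llong counter = 0; // Shared, but now atomic.

    int increment_counter_atomic(void* arg) {
        (void)arg;
        for (int i = 0; i < NUM_INCREMENTS; i++) {
            atomic_fetch_add(&counter, 1); // Indivisible read-modify-write.
        }
        return 0;
    }

    int main(void) {
        thrd_t threads[NUM_THREADS];
        for (int i = 0; i < NUM_THREADS; i++) {
            thrd_create(&threads[i], increment_counter_atomic, NULL);
        }
        for (int i = 0; i < NUM_THREADS; i++) {
            thrd_join(threads[i], NULL);
        }
        printf("Final (atomic) counter value: %lld\n", (long long)counter);
        return 0;
    }

No mutex, no lock contention, and the result is reliably 1,000,000.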
Final Words and Best Practices
Identify Shared Data: The first step is always to identify which data will be accessed by multiple threads.
Protect by Default: Every access to shared data (read or write) should be protected by a synchronization primitive, usually a mutex.
Keep Critical Sections Small: Lock the mutex, do the absolute minimum work required, and unlock it immediately. Don't make slow function calls (like I/O) inside a critical section.
Beware of Deadlock: Always acquire multiple locks in a consistent, global order.
Use Condition Variables for Waiting: Don't burn CPU cycles in a spin-lock.
Compile with Thread Sanitizers: Modern compilers have tools to detect race conditions automatically. For GCC/Clang, use the -fsanitize=thread flag during compilation. It will save you countless hours of debugging.
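For example, assuming the racy program from Part 3 is saved as counter.c, rebuilding it with ThreadSanitizer enabled makes the race visible at runtime:

    gcc -std=c11 -g -fsanitize=thread counter.c -o counter -pthread
    ./counter    # ThreadSanitizer reports a data race on 'counter'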
Multithreading is a deep and complex topic, but by understanding the fundamentals of processes, threads, concurrency, and race conditions, and by using the tools provided by the C11 standard library correctly, you can unlock a new level of performance and responsiveness in your C programs.
Written by Rafal Jackiewicz
Rafal Jackiewicz is an author of books about programming in C and Java. You can find more information about him and his work on https://www.jackiewicz.org