Inside the Operating System: A Journey Through Compilation, Execution, and Concurrency

Nurul Hasan

Written by: Nurul Hasan
For: Fellow C++ learners and curious minds
Special thanks to: ChatGPT.

Let’s understand the full behind-the-scenes system behavior — from writing a C++ program, compiling it, linking it, and finally executing it. This will include how the operating system handles processes, file descriptors (FDs), fork(), execve(), and how everything fits together during execution.


🚀 Starting Point: You Open a Terminal

You open a terminal (e.g., bash, zsh, gnome-terminal):

  • The OS creates a new process for the terminal application.

  • This process is assigned:

    • A PID (Process ID)

    • Virtual memory

    • File Descriptor table

    • Other metadata: parent PID, UID, GID, etc.

  • FDs 0, 1, 2 are initialized:

    • FD 0 → stdin (keyboard)

    • FD 1 → stdout (terminal output)

    • FD 2 → stderr (terminal error output)

✅ At this point, you have an interactive shell running.
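As a rough illustration, the sketch below asks the kernel whether a descriptor is open and whether it refers to a terminal. The helper names fd_is_open and fd_is_terminal are made up for this example; fcntl() and isatty() are the standard POSIX calls.

```cpp
#include <cassert>
#include <fcntl.h>    // fcntl, F_GETFD, open
#include <unistd.h>   // isatty, close, STDIN_FILENO

// Hypothetical helper: true if `fd` is a valid open descriptor in this process.
bool fd_is_open(int fd) {
    assert(fd >= 0);  // negative numbers are never valid descriptors
    return fcntl(fd, F_GETFD) != -1;
}

// Hypothetical helper: true if `fd` refers to a terminal device.
bool fd_is_terminal(int fd) {
    return isatty(fd) == 1;
}
```

In an interactive shell, fd_is_terminal(STDIN_FILENO) would normally be true. If input is redirected from a file or a pipe, it would be false even though FD 0 is still open.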


✍️ You Write Your Code

Let’s say you have the following C++ files:

main.cpp          → contains `main()`
logic.cpp         → contains `doSomething()`
utils.cpp         → contains `helperFunction()`

Each .cpp file is a translation unit — a partial program that needs to be compiled and later linked.


🏗️ Step 1: Compilation — Translating Code to Object Files

You run:

g++ -c main.cpp    → main.o
g++ -c logic.cpp   → logic.o
g++ -c utils.cpp   → utils.o

For each g++ -c command:

  • The shell forks a child process.

  • That child calls execve() to launch the compiler (g++ binary).

  • g++ (strictly, the compiler and assembler programs that the g++ driver runs for you):

    • Opens the source file using open("file.cpp", O_RDONLY) → maybe FD 3

    • Creates the .o file using open("file.o", O_WRONLY | O_CREAT | O_TRUNC, 0644) → maybe FD 4

    • Compiles source to machine code, writes to .o file.

  • Then exits, returning control to the shell.

✅ Each compilation step is independent and sequential unless run using make -j (we’ll cover parallelism later).


🔗 Step 2: Linking — Creating the Final Executable

Now you run:

g++ main.o logic.o utils.o -o myApp

Here’s what happens:

  1. Shell forks a new process.

  2. Child execs g++, now running the compiler driver.

  3. Internally, g++ calls the linker (usually ld) as a subprocess.

  4. ld:

    • Opens all .o files

    • Resolves all symbols (e.g., where is doSomething() defined?)

    • Writes the final single binary executable file myApp

To create myApp, the linker does:

open("myApp", O_WRONLY | O_CREAT | O_TRUNC, 0755) → maybe FD 4

  • FD 4 now points to the output binary.

  • It writes the ELF headers, code, data segments into myApp.

✅ This does not run your app, just builds it.


🏃 Step 3: You Run the Program

You type:

./myApp

Here’s what happens under the hood:

  1. Shell calls fork() → creates a child process.

  2. The child process calls:

     execve("./myApp", ...)
    

This tells the OS: “Replace my current program (shell) with this new one (myApp).”

What the OS does next:

  • Opens the file myApp

  • Parses the ELF (Executable and Linkable Format) binary

  • Maps sections into memory:

    • .text → executable code

    • .data, .bss → global/static data

    • Stack and heap

  • Closes the binary file (after mapping it)

  • Initializes:

    • Registers (including program counter)

    • argc, argv, and envp

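  • Jumps to the program's entry point: the runtime startup code (and, for dynamically linked programs, the dynamic loader) runs first and then calls main() in your code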

✅ Your compiled C++ program is now running as a new process, with:

  • A new PID

  • Its own virtual memory

  • Its own file descriptor table (copied from the shell at fork())

  • FDs 0, 1, 2 still point to the terminal (same as parent)
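The fork, exec, wait sequence a shell performs can be sketched roughly as follows. This is a minimal sketch, not how any particular shell is implemented: run_and_wait is a hypothetical helper, and it uses execvp() (which searches $PATH) rather than the raw execve() call for convenience.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs a command the way a shell would (fork, exec, wait).
// Returns the exit status, or -1 if the child could not be created or was killed by a signal.
int run_and_wait(const std::vector<std::string>& args) {
    assert(!args.empty());  // caller must pass at least the program name

    // Build the NULL-terminated argv array that the exec family expects.
    std::vector<char*> argv;
    for (const auto& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execvp(argv[0], argv.data());  // replaces the child's program; returns only on failure
        _exit(127);                    // conventional "command not found" status
    }

    int status = 0;                    // parent: block until the child finishes
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_and_wait({"./myApp"}) would correspond to typing ./myApp at the prompt: the child inherits FDs 0, 1, 2 from the caller, and the caller resumes once the child exits.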


💡 Summary So Far

✅ From the moment you opened the terminal to the point your C++ program runs:

  1. Terminal runs as a process with stdin/stdout/stderr.

  2. You compile your .cpp files into .o files → each g++ -c is a new process.

  3. You link .o files using g++ → internally calls ld linker to produce the final binary.

  4. Running ./myApp:

    • Shell forks → child

    • Child execs → replaces memory with new program

    • OS maps code/data/stack/heap

    • Process runs main() with inherited FDs (0, 1, 2)


🔑 What is an FD (File Descriptor)?

A File Descriptor (FD) is a simple integer number that the Linux kernel uses to represent an open file, socket, or I/O resource within a process.

Think of it like a small ticket stub that points to an open file or resource — the process uses this stub to read/write to that file, terminal, socket, etc.

Each process has its own FD table, and each entry in that table points to an open file object in the kernel.
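To make the "small integer" idea concrete, here is a small sketch of how FD numbers are handed out. POSIX specifies that open() returns the lowest-numbered descriptor not currently in use, so a closed number gets recycled. The function name lowest_fd_is_reused is invented for this example.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Opens `path` twice, closes the first FD, and opens again.
// Returns true if the two live FDs were distinct and the freed number was reused.
bool lowest_fd_is_reused(const char* path) {
    int first = open(path, O_RDONLY);
    int second = open(path, O_RDONLY);
    assert(first >= 0 && second >= 0);  // assumes `path` exists and is readable

    bool distinct = (first != second);  // two separate entries in this process's FD table

    close(first);
    int third = open(path, O_RDONLY);   // lowest unused number, i.e. the one just freed
    bool reused = (third == first);

    close(second);
    close(third);
    return distinct && reused;
}
```

This is why the article keeps saying "maybe FD 3": with 0, 1, and 2 already taken, 3 is usually the first free slot, but nothing guarantees it.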

📂 Understanding FD Behavior During Compilation vs Execution

Let’s now make this mental model more precise using a drawer analogy.


🧱 During Compilation (g++ main.cpp -o myApp)

When we pass main.cpp to the compiler:

The compiler does not touch the file's contents directly. It works through a token, the file descriptor: think of the kernel putting the file into a drawer and handing back a number (the FD) that refers to that file behind the scenes.

So:

  • main.cpp is just plain text, not executable.

  • It is data for the compiler to read.

  • The compiler asks the OS to open it:

      open("main.cpp", O_RDONLY) → FD 3
    
  • The compiler reads from FD 3.

Since it’s non-executable data, the OS treats it just like a text file:

  • It allocates an FD.

  • That FD links to the file on disk.

  • The compiler reads through that FD — like reading a piece of paper through a slot in a locked drawer.

Because it's not executable, the OS doesn't do any memory mapping or ELF parsing. It’s just:
"Here's the drawer handle (FD), read your text."


⚙️ During Execution of an Executable (e.g., ./myApp)

Now the behavior changes, because the file is being executed, not just read as data.

During execve(), the OS:

  • Opens myApp inside the kernel (conceptually like getting FD 3, although no entry needs to appear in your process’s FD table)

  • Parses the ELF header to understand how to map it into memory

  • Uses mmap() or equivalent to map the binary sections of the file into virtual memory:

    • .text → executable code

    • .data, .bss → global/static variables

  • Stack, heap are initialized

  • Closes the FD!

✅ Once the mapping exists, the process no longer needs a file descriptor for the binary. The file itself is still used behind the scenes, because pages are loaded from it on demand.

So the model holds:

The file is opened, mapped into memory, and the handle is then closed.
Execution happens from memory, not through the FD.

The FD was only the handle to read and load the code, not to run it directly.
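The same "map, then close the handle" behavior is available to ordinary programs through mmap(). The sketch below is an assumption-laden illustration (the helper name is invented, and the file is assumed to contain at least n bytes): it closes the FD before touching the mapped bytes, and the bytes are still readable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: maps the first `n` bytes of `path`, closes the FD,
// and only then reads the bytes through the mapping.
// Assumes the file is at least `n` bytes long.
std::string read_via_mapping_after_close(const char* path, std::size_t n) {
    assert(n > 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return {};

    void* p = mmap(nullptr, n, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                        // the mapping keeps its own reference to the file
    if (p == MAP_FAILED) return {};

    std::string bytes(static_cast<const char*>(p), n);  // read from memory, not via the FD
    munmap(p, n);
    return bytes;
}
```

On Linux, calling this with "/proc/self/exe" and n = 4 should return the ELF magic bytes (0x7f followed by "ELF"), which is the first thing the kernel checks when it parses an executable.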


🎬 Now: What About Video Files and VLC?

Let’s apply the same principle to video files like .mp4.

When you double-click a file like movie.mp4, the following happens:

  1. The OS resolves the default program for .mp4 → maybe VLC.

  2. It forks a child process, which runs:

     execve("/usr/bin/vlc", ["vlc", "movie.mp4"], ...)
    
  3. That child process is now VLC.

  4. VLC itself does:

     int fd = open("movie.mp4", O_RDONLY); // gets FD 3 maybe
    

FD 3 is now pointing to the raw binary data of the video file.

VLC:

  • Reads that data using the FD

  • Decodes the video and audio in memory

  • Sends output to:

    • The screen (via GPU)

    • The speakers (via ALSA or PulseAudio)

The video file is never directly executed — it is read via FD as data, just like the .cpp source file during compilation.


🧠 Final Perspective: FD as a Drawer Handle

| File Type | Treated As | Opened via FD? | Executed via FD? | FD Closed After Use? |
| --- | --- | --- | --- | --- |
| .cpp | Plain data | ✅ Yes | ❌ No | ✅ After read |
| Executable | Binary code | ✅ Yes | ❌ Not directly | ✅ After mmap() |
| .mp4 | Media data | ✅ Yes | ❌ No | ✅ After playback |

✅ In all these cases, the FD is just the drawer handle — a temporary link to a file, used for reading or loading, but not kept open unnecessarily.


🧠 What Is a Process?

A process is an instance of a running program — it is how the operating system runs, isolates, and manages code execution.


🔧 Components of a Process

When a process is created, the OS assigns and manages several key components:

| Component | Description |
| --- | --- |
| PID (Process ID) | Unique identifier for the process. |
| Parent PID (PPID) | The PID of the process that created (forked) it. |
| Virtual Memory Space | Includes sections for code (.text), data (.data, .bss), heap, stack. |
| File Descriptor Table | A table of open file descriptors (FDs like stdin, stdout, stderr). |
| Execution Context | CPU state: program counter, registers, etc. |
| Environment Variables | User and system-defined vars (e.g., $PATH, $HOME). |
| Process Metadata | Priority, state, user ID, resource usage, etc. |
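A few of these attributes can be queried directly from inside a running program. The struct and function names below are invented for illustration; getpid(), getppid(), getuid(), and getgid() are standard POSIX calls.

```cpp
#include <cassert>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical snapshot of a few per-process attributes.
struct ProcessInfo {
    pid_t pid;    // this process's ID
    pid_t ppid;   // ID of the parent that forked it
    uid_t uid;    // real user ID
    gid_t gid;    // real group ID
};

ProcessInfo current_process_info() {
    ProcessInfo info{getpid(), getppid(), getuid(), getgid()};
    assert(info.pid > 0);  // PIDs of running processes are always positive
    return info;
}
```

Run from a shell, the ppid field would normally be the shell's PID, which is the parent/child relationship the following steps walk through.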

🏁 How Is a Process Created?

Let’s use a real example involving your terminal and g++.


📟 Step 1: You Open a Terminal

  1. Your desktop environment (like GNOME, KDE) is already running.

    • It has a process running the GUI and background services.

    • You click the terminal icon (say, GNOME Terminal).

  2. GNOME creates the terminal window using:

     fork();      // Spawns a new child process
     execve("/usr/bin/gnome-terminal", ...)  // Loads terminal program into memory
    
    • This new process now becomes the terminal.

    • The parent process is the desktop environment (e.g., gnome-shell).

    • It inherits:

      • Environment variables

      • File descriptors (like for the display, sound)

      • Working directory

  3. The child’s memory is replaced with the terminal binary using execve().

🧠 execve() = "execute a program, replacing this process’s memory"

  4. The terminal process is now running with its own:

    • PID (e.g., 2205)

    • FD table

    • Allocated memory for the terminal program


🛠️ Step 2: You Run g++ main.cpp -o app

Now you're inside the terminal and you type:

g++ main.cpp -o app

Here’s what happens:

  1. The terminal is running a shell (bash, zsh, etc.).

  2. The shell receives the command and calls:

     fork();      // Creates a new child process
     execve("/usr/bin/g++", ["g++", "main.cpp", "-o", "app"], ...)
    
    • A new child process is created to run g++.

    • It gets its own:

      • PID (e.g., 2210)

      • FD table (inherits 0, 1, 2 — stdin, stdout, stderr)

      • Virtual memory space

      • Environment variables

  3. Once g++ finishes compiling:

    • The child process calls exit(). It stays in the process table as a zombie until the shell collects its exit status with wait(), and is then removed.

    • Control returns to the shell (the parent process).

✅ So now you're back at the shell prompt.


🚀 Step 3: You Run the Compiled Program ./app

Now you execute:

./app

Again:

  1. The shell (bash) calls:

     fork();    // Create child
     execve("./app", ["./app"], ...)   // Replace child memory with app binary
    
  2. The ./app process is now a new process:

    • Own PID (e.g., 2215)

    • Memory layout initialized:

      • .text, .data, .bss, heap, stack

    • FDs 0, 1, 2 connected to your terminal

    • ELF binary loaded via FD and mmap() into memory

    • FD used for the binary is closed after loading

  3. Once your program finishes:

    • It calls exit()

    • Process dies

    • Control returns to the shell


🧾 Recap of Flow:

| Step | Action | Result |
| --- | --- | --- |
| Open Terminal | Parent (GUI) forks & execs | New process: Terminal GUI |
| Type g++ ... | Shell forks & execs | New process: g++ compiler runs |
| Type ./app | Shell forks & execs | New process: Your app runs |

In all cases:

  • fork() creates a copy of the parent process

  • execve() replaces the child’s memory with the new program

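  • The child gets a new PID and its own address space; open FDs (such as 0, 1, 2) are inherited from the parent rather than created fresh

  • When the child exits, control goes back to the parent (usually your shell)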
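The "copy of the parent" part is easy to check. In this minimal sketch (the function name is invented), the child changes a variable after fork() and reports its own view through its exit status, while the parent's copy stays untouched.

```cpp
#include <cassert>
#include <sys/wait.h>
#include <unistd.h>

// Returns true if a write made by the child after fork() is invisible to the parent.
bool child_write_is_private() {
    int value = 1;

    pid_t pid = fork();
    if (pid < 0) return false;
    if (pid == 0) {
        value = 42;        // modifies the child's copy only
        _exit(value);      // report the child's view through the exit status
    }
    assert(pid > 0);       // only the parent reaches this point

    int status = 0;
    waitpid(pid, &status, 0);
    bool child_saw_42 = WIFEXITED(status) && WEXITSTATUS(status) == 42;
    return child_saw_42 && value == 1;  // the parent's copy is unchanged
}
```

Both processes start with identical memory contents, but each write lands in a private copy. That isolation is what makes processes safe and also what makes sharing data between them require pipes or other IPC.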


So far we have discussed sequential processing. Let’s now look at parallel processing.

⚙️ What Is Parallel Processing?

Parallel processing means:

Running multiple processes at the same time, so work gets done concurrently — ideally using multiple CPU cores.

This is useful when:

  • Tasks are independent

  • You have multi-core hardware

  • You want to speed things up (compilation, server requests, file processing)


🧪 Example 1: Compiling Multiple .cpp Files in Parallel

Earlier, you compiled files like this:

g++ -c main.cpp
g++ -c logic.cpp
g++ -c utils.cpp

This ran one after another, wasting time if you had multiple cores.


✅ Using a Makefile with Parallelism

🎯 Makefile

all: app

app: main.o logic.o utils.o
	g++ main.o logic.o utils.o -o app

%.o: %.cpp
	g++ -c $< -o $@

🧨 Run It With Parallelism:

make -j3

This means:

  • Make can spawn up to 3 separate g++ processes at once

  • These processes run in parallel, each on its own core if available

🔄 How it works internally:

  1. Make forks a process for each file:

     fork(); execve("/usr/bin/g++", ["g++", "-c", "main.cpp", ...]);
     fork(); execve("/usr/bin/g++", ["g++", "-c", "logic.cpp", ...]);
     fork(); execve("/usr/bin/g++", ["g++", "-c", "utils.cpp", ...]);
    
  2. Each process has:

    • Its own memory

    • FD table (stdin/out/err, plus source/output files)

    • PID

  3. The OS schedules these across CPU cores.

Result: Faster builds using real parallelism.
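The key difference from the sequential case is that the parent starts every child before waiting for any of them. A rough sketch of that pattern follows; it is not how make is implemented, and run_in_parallel is a hypothetical helper that uses execvp() for $PATH lookup.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: starts every command at once, then waits for all of them.
// Returns exit statuses in the same order as `commands` (-1 for failures or signals).
std::vector<int> run_in_parallel(const std::vector<std::vector<std::string>>& commands) {
    std::vector<pid_t> pids;
    for (const auto& args : commands) {
        assert(!args.empty());
        std::vector<char*> argv;
        for (const auto& a : args) argv.push_back(const_cast<char*>(a.c_str()));
        argv.push_back(nullptr);

        pid_t pid = fork();
        if (pid == 0) {
            execvp(argv[0], argv.data());
            _exit(127);              // exec failed
        }
        pids.push_back(pid);         // parent: remember the child, do not wait yet
    }

    std::vector<int> results;
    for (pid_t pid : pids) {         // all children are already running by now
        int status = 0;
        if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
            results.push_back(WEXITSTATUS(status));
        else
            results.push_back(-1);
    }
    return results;
}
```

Passing three g++ -c command lines would approximate what make -j3 does for this project, with the kernel free to schedule the three children on different cores.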


🌐 Example 2: Web Server Handling Multiple Clients in Parallel

Let’s say you build a simple HTTP server.

🛑 Sequential Model (Not Parallel):

while (true) {
    int client_fd = accept(server_fd, ...);
    handle_request(client_fd); // Blocks: only one client at a time
    close(client_fd);
}

  • Only one request at a time

  • Others must wait

  • Bad for performance


✅ Fork-Based Parallel Model

while (true) {
    int client_fd = accept(server_fd, ...);
    if (fork() == 0) {
        // In child process
        handle_request(client_fd);
        close(client_fd);
        exit(0); // Exit child
    }
    close(client_fd); // Parent closes its copy
}

  • Each request handled by a new process

  • OS runs child processes in parallel

  • Each has its own memory, FD, stack

✅ This process-per-connection idea is what Apache’s classic prefork model and PostgreSQL’s backend processes are built on.
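Here is a compilable sketch of the per-client step, under some loud assumptions: handle_request is a stand-in that upper-cases one message, serve_in_child is an invented name, and the "client" can be any connected socket FD (for experiments, one end of a socketpair() works in place of a real accept()).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Stand-in for handle_request(): reads one message and echoes it back in upper case.
void handle_request(int client_fd) {
    char buf[256];
    ssize_t n = read(client_fd, buf, sizeof buf);
    for (ssize_t i = 0; i < n; ++i)
        buf[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(buf[i])));
    if (n > 0) {
        ssize_t written = write(client_fd, buf, static_cast<std::size_t>(n));
        (void)written;  // a real server would loop on short writes
    }
}

// Serves one already-accepted connection in a child process.
// Returns the child's PID (or -1 if fork failed); the caller should eventually reap it.
pid_t serve_in_child(int client_fd) {
    assert(client_fd >= 0);
    pid_t pid = fork();
    if (pid == 0) {
        handle_request(client_fd);
        close(client_fd);
        _exit(0);
    }
    close(client_fd);  // parent closes its copy; the child still holds the connection
    return pid;
}
```

One design point the short loop glosses over: a long-running server also has to reap finished children (with waitpid() or a SIGCHLD handler), otherwise they pile up as zombies.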


🔗 Example 3: Pipe with Two Processes Running in Parallel


🧵 What Is a Pipe in Unix?

A pipe is a one-way communication channel used to pass data from one process to another, without using intermediate files.

Think of it like this:

🧪 Imagine process A is pouring data into a hose, and process B is drinking from the other end.

📦 A Pipe Connects:

  • STDOUT (output) of one process

  • STDIN (input) of another process

This allows:

  • One process to write data into the pipe

  • Another process to read that data in real-time


🔧 How Is a Pipe Created?

The shell or program uses the pipe() system call:

int pipefd[2];
pipe(pipefd); // pipefd[0] = read end, pipefd[1] = write end

Then it forks two processes and assigns:

  • Process 1: writes to pipefd[1]

  • Process 2: reads from pipefd[0]

🧠 Pipes Work Like:

  • A shared buffer in the kernel

  • Data written to the pipe by one process gets buffered

  • Data is then read by the other process
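A minimal sketch of that setup, with the parent acting as the reader and a forked child as the writer (send_through_pipe is a made-up name for this example):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <sys/wait.h>
#include <unistd.h>

// The child writes `message` into a pipe; the parent reads it back until end of file.
std::string send_through_pipe(const std::string& message) {
    int pipefd[2];
    if (pipe(pipefd) != 0) return {};

    pid_t pid = fork();
    if (pid < 0) { close(pipefd[0]); close(pipefd[1]); return {}; }
    if (pid == 0) {
        close(pipefd[0]);                       // child only writes
        ssize_t w = write(pipefd[1], message.data(), message.size());
        (void)w;
        close(pipefd[1]);                       // closing the write end signals end of file
        _exit(0);
    }
    assert(pid > 0);                            // parent side from here on

    close(pipefd[1]);                           // parent only reads
    std::string received;
    char buf[256];
    ssize_t n;
    while ((n = read(pipefd[0], buf, sizeof buf)) > 0)
        received.append(buf, static_cast<std::size_t>(n));
    close(pipefd[0]);
    waitpid(pid, nullptr, 0);                   // reap the child
    return received;
}
```

Each side closes the end it does not use. If the parent kept its copy of the write end open, read() would never report end of file, because the kernel only does that once every write end is closed.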


🔗 Example: yes hello | head -n 5

Let’s use this example now to see a pipe in action:

yes hello | head -n 5

🛠 Step-by-Step Breakdown

✅ Step 1: The Shell Sets Up a Pipe

int pipefd[2];
pipe(pipefd); // pipefd[1] is write end, pipefd[0] is read end

✅ Step 2: The Shell Forks Two Child Processes

🔹 Child Process 1 — yes hello

  • Writes infinite "hello\n" to stdout

  • Shell redirects its stdout → pipefd[1]

So now, yes is writing into the pipe.

🔹 Child Process 2 — head -n 5

  • Reads from stdin

  • Shell redirects its stdin → pipefd[0]

So now, head is reading from the pipe.


🧪 What Happens During Execution?

  1. yes hello starts spamming:

     hello\nhello\nhello\n...
    

    into the pipe.

  2. head -n 5 reads from the pipe, line by line.

    It reads:

     hello
     hello
     hello
     hello
     hello
    
  3. Once head reads 5 lines:

    • It closes its end of the pipe.

    • It exits.

  4. yes is still trying to write, but suddenly:

    • The read end of the pipe is gone.

    • The kernel sends a SIGPIPE signal to yes.

    • yes is terminated.


✅ What You See on Your Terminal

Only what head prints:

hello
hello
hello
hello
hello

Even though yes was generating infinite lines, only 5 reached your terminal, because head stopped reading and exited after 5. Any extra lines still sitting in the pipe buffer were simply discarded.
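The whole exchange can be reproduced in one small program. In this sketch the forked child plays the role of yes hello (an endless write loop instead of running the real yes binary) and the parent plays head; the names yes_head_demo and PipelineResult are invented here.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

struct PipelineResult {
    int lines_read;            // lines the reader consumed before quitting
    bool writer_got_sigpipe;   // true if the writer was terminated by SIGPIPE
};

// The child stands in for `yes hello`; the parent stands in for `head -n <max_lines>`.
PipelineResult yes_head_demo(int max_lines) {
    assert(max_lines > 0);
    int pipefd[2];
    if (pipe(pipefd) != 0) return {0, false};

    pid_t pid = fork();
    if (pid < 0) { close(pipefd[0]); close(pipefd[1]); return {0, false}; }
    if (pid == 0) {
        signal(SIGPIPE, SIG_DFL);   // make sure the default action (terminate) applies
        close(pipefd[0]);           // the writer must not hold a read end itself
        for (;;) {
            if (write(pipefd[1], "hello\n", 6) < 0) _exit(1);  // only reached if SIGPIPE did not kill us
        }
    }

    close(pipefd[1]);
    int lines = 0;
    char c;
    while (lines < max_lines && read(pipefd[0], &c, 1) == 1)
        if (c == '\n') ++lines;
    close(pipefd[0]);               // last read end goes away, like head exiting

    int status = 0;
    waitpid(pid, &status, 0);
    return {lines, WIFSIGNALED(status) && WTERMSIG(status) == SIGPIPE};
}
```

The writer has no idea how many lines the reader wants. It just keeps writing until the kernel tells it, via SIGPIPE, that nobody is listening anymore.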



🧵 Process vs Thread

Both processes and threads represent independent flows of execution, but they differ in how they manage resources, isolation, and performance characteristics.


🔍 1. What Is a Process?

A process is an independent instance of a running program, managed by the OS.

🧱 Key Characteristics:

  • Has its own memory space, completely isolated from other processes

  • Has its own PID, FD table, and stack

  • Created with fork() (heavyweight)

  • Expensive to create and switch between

  • Safer: crashing one process doesn’t affect others

🔍 2. What Is a Thread?

A thread is a lightweight unit of execution within a process. Multiple threads share the same memory.

🧱 Key Characteristics:

  • Shares memory with other threads in the same process

  • Each thread has its own stack, program counter, and registers

  • Created with pthread_create() in C/C++, or std::thread in C++

  • Fast to create and context switch

  • Riskier: a crash in one thread can corrupt shared memory and usually takes down the whole process

🧪 Real-World Example: Web Server

Let’s say you build a server to handle client requests.

🔁 Using Processes:

int client = accept(...);
if (fork() == 0) {
    handle_request(client);
    exit(0);
}
close(client); // Parent closes its copy

  • Each client request spawns a new process

  • These processes run in parallel

  • Completely isolated

  • Expensive in terms of system resources

🧵 Using Threads:

int client = accept(...);
std::thread t(handle_request, client);
t.detach();

  • Each client handled by a new thread

  • Shares memory with other threads

  • Fast and efficient

  • More complex to handle safely (due to shared data)
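To show what "shared data" means in practice, here is a small sketch in which several threads update one counter. The function name is invented, and it joins its threads (rather than detaching them as above) so the final value can be returned.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Several threads increment one shared counter; the mutex serialises the updates.
long count_with_threads(int num_threads, int increments_per_thread) {
    assert(num_threads > 0 && increments_per_thread >= 0);

    long counter = 0;             // one variable, visible to every thread
    std::mutex counter_mutex;     // protects `counter`

    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                std::lock_guard<std::mutex> lock(counter_mutex);
                ++counter;        // safe: only one thread at a time gets here
            }
        });
    }
    for (auto& w : workers) w.join();  // wait for all threads before reading the result
    return counter;
}
```

With the lock in place, 4 threads doing 1000 increments each always add up to 4000. Without it, the increments would race and the total could come up short, which is exactly the kind of bug that separate processes cannot have. Depending on the toolchain, building this may require the -pthread flag.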


📊 Comparison Table

| Feature | Process | Thread |
| --- | --- | --- |
| Memory | Separate address space | Shared address space |
| Creation cost | High (fork) | Low (pthread_create / std::thread) |
| Communication | Via IPC (pipes, sockets, shared memory) | Direct via shared variables |
| Crash impact | Isolated – doesn’t affect others | Shared fate – may crash the whole process |
| Use case examples | Chrome tabs, microservices, CLI commands | Server threads, GUI responsiveness, AI tasks |
| Scheduling | Managed by OS (context switch = expensive) | Managed by OS or language runtime |

🧠 Analogy

🍱 Processes = Bento Boxes

  • Each box is self-contained.

  • Opening one doesn’t mess with the others.

  • More overhead, but safer.

🍜 Threads = Compartments in a Bowl

  • All in the same bowl (shared memory).

  • Fast to create new compartments (threads).

  • But if one spills, the whole bowl is messy.


✅ Summary

| When to Use Processes | When to Use Threads |
| --- | --- |
| Need isolation and safety (e.g., running untrusted code) | Need speed and shared memory |
| Crashes should be contained | Threads must coordinate carefully (mutexes, locks) |
| Example: Browser tabs (separate PIDs) | Example: Game engine threads (render, audio, input) |

Thank you for reading through this article. I'm currently learning and exploring some of these concepts myself, and while going through them, I thought it might be helpful to write things down in a way that’s easy to revisit and understand. If it helped you too, I’m really glad.

That’s all — just wanted to share what I’m learning. Thanks again for taking the time to read ❤️.
