Clang & LLVM Under the Hood: From C++ to Machine Code

ADITYA SINGH

The Compiler Factory: A Simple Analogy

Imagine a car factory that builds vehicles from blueprints:

  1. Blueprint = Your C++ code (source.cpp)
  2. Universal Car Frame = LLVM IR (adapts to any model)
  3. Assembly Line Robots = LLVM optimization passes
  4. Final Car Models = Executables for Windows/Mac/Linux

Clang/LLVM works like this factory - it first creates a universal intermediate frame (LLVM IR) before specializing it for different targets.


The Hidden Translation Step

When you run clang++:

clang++ hello.cpp -o hello

It secretly goes through 5 stages:

graph LR
A[C++ Code] --> B[Preprocessor]
B --> C[LLVM IR]
C --> D[Optimizer]
D --> E[Assembly]
E --> F[Machine Code]

Why This Extra Step?

  • Universal translator for 20+ CPU architectures
  • Performance boost - optimizes before platform-specific decisions
  • Language flexibility - same engine for C++, Rust, Swift
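To make the "optimize once, target many" idea concrete, here is a toy sketch in C++. It is not how LLVM is implemented: `ToyIR`, `foldConstants`, `emitX86Like`, and `emitArmLike` are hypothetical names invented for this illustration, and the "IR" is just a list of constants added to one input value.

```cpp
#include <numeric>
#include <string>
#include <vector>

// Toy "IR" (hypothetical): constants that are added to one input, in order.
using ToyIR = std::vector<int>;

// Toy "optimization pass": fold all constant additions into a single one.
// It works on the IR, so every backend below benefits from it for free.
ToyIR foldConstants(const ToyIR& ir) {
    if (ir.empty()) {
        return ir;
    }
    return ToyIR{std::accumulate(ir.begin(), ir.end(), 0)};
}

// Toy "backend" 1: prints x86-flavored text, one line per IR entry.
std::string emitX86Like(const ToyIR& ir) {
    std::string out;
    for (int constant : ir) {
        out += "add eax, " + std::to_string(constant) + "\n";
    }
    return out;
}

// Toy "backend" 2: prints AArch64-flavored text from the same IR.
std::string emitArmLike(const ToyIR& ir) {
    std::string out;
    for (int constant : ir) {
        out += "add w0, w0, #" + std::to_string(constant) + "\n";
    }
    return out;
}
```

Calling `emitX86Like(foldConstants({2, 3}))` yields one `add` of 5 instead of two separate additions, and `emitArmLike` gets the same improvement without any extra work. That is the payoff of a shared middle layer: a new language needs one frontend and a new CPU needs one backend, instead of one compiler per language/CPU pair.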

Understanding LLVM IR: The Universal Blueprint

LLVM IR (Intermediate Representation) is:

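  • A low-level, strongly typed language that sits between C++ and assembly
  • Largely target-independent: the same instructions describe code for every processor (Intel, ARM, etc.), although IR emitted by Clang still records target details such as the triple and data layout
  • Readable as text: it looks like assembly with named values and explicit types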

Simple C++ → LLVM IR Example

C++ Code:

int addFive(int num) {
    return num + 5;
}

LLVM IR (simplified; unoptimized output also contains stack loads and stores):

define i32 @addFive(i32 %num) {
  %result = add nsw i32 %num, 5
  ret i32 %result
}

Breaking Down the IR:

  • i32: 32-bit integer (like int in C++)
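  • @addFive: Function name (simplified here; clang++ emits the mangled name @_Z7addFivei unless the function is declared extern "C")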
  • %num: Input parameter
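  • add nsw: "Add with no signed wrap". This is not a safety check: if the signed addition overflows, the result is a poison value. It mirrors C++, where signed overflow is undefined behavior, and lets the optimizer assume overflow never happens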
  • ret: Return instruction

The Complete Compilation Journey

Preprocessing: The Copy Machine

clang++ -E hello.cpp -o hello.ii
  • Expands #include files
  • Expands macros, e.g. replaces every PI with 3.14 after #define PI 3.14
  • Output: Single massive text file

Frontend: C++ → Universal IR

clang++ -S -emit-llvm hello.cpp -o hello.ll
  • Parses and type-checks the C++, then lowers it to largely architecture-neutral IR
  • Like converting English to Esperanto

Optimization: The Tuning Workshop

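opt -S -O3 hello.ll -o optimized.ll
  • -S keeps the output as readable text; without it, opt writes binary bitcode
  • IR emitted at -O0 marks every function optnone, so generate hello.ll with -O1 -Xclang -disable-llvm-passes if you want opt to transform it
  • Removes unused code
  • Simplifies calculations
  • Reorders instructions for speed
  • Draws on a large catalog of passes (inlining, loop unrolling, vectorization, and many more)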
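Two small functions that give the optimizer something to do. This is only a sketch with invented names (`sumUpTo`, `scaleByEight`); the comments describe what Clang commonly does at -O2 or higher, not a guarantee for every version.

```cpp
// Adds 1 + 2 + ... + n with a loop. At higher optimization levels Clang
// commonly replaces the loop with closed-form arithmetic.
unsigned sumUpTo(unsigned n) {
    unsigned total = 0;
    for (unsigned i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}

// `scratch` never influences the result, so it is dead code the optimizer
// can delete. The two multiplications can be merged into a single
// multiply by 8 (often emitted as a shift).
int scaleByEight(int value) {
    int scratch = value * 1000;  // unused: candidate for removal
    (void)scratch;               // silences the unused-variable warning
    return value * 2 * 4;
}
```

Comparing the IR for this file at -O0 and -O2 (for example in Compiler Explorer) is a quick way to watch "removes unused code" and "simplifies calculations" happen.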

Backend: IR → Assembly

llc -march=x86-64 optimized.ll -o hello.s
  • Converts universal IR to CPU-specific instructions
  • Supports x86, ARM, RISC-V, etc.
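One way to see the backend make CPU-specific choices is a bit-counting function. The sketch below uses `__builtin_popcount`, a compiler builtin provided by Clang and GCC (not standard C++); the wrapper name `countSetBits` is made up for this example.

```cpp
// Counts the 1 bits in a 32-bit value using a Clang/GCC builtin.
// The IR for this is the same on every target, but the backend decides
// how to implement it: on x86-64 with -mpopcnt it can pick a single
// popcnt instruction, while other targets may get a longer sequence of
// shifts, masks, and adds.
int countSetBits(unsigned value) {
    return __builtin_popcount(value);
}
```

Generating assembly for this function with and without `-mpopcnt`, or for an ARM target, should show noticeably different instruction sequences from identical IR.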

Assembly: Human → Machine

clang++ hello.s -o hello
  • Assembles the text instructions into binary object code
  • Links in libraries (the C and C++ runtime libraries that provide printf, malloc, etc.)
  • Creates the executable file
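A minimal illustration of why the link step is needed, using a hypothetical helper called `messageLength`. The function below calls `std::strlen`, which is only declared in the header; its machine code lives in the C library.

```cpp
#include <cstring>

// The compiler sees only a declaration of strlen in <cstring>. When it
// emits a real call, the object file carries an unresolved reference to
// the symbol `strlen`, and the linker fills in the address from the
// C library when it builds the executable.
std::size_t messageLength(const char* message) {
    return std::strlen(message);
}
```

Compiling this file with `-c` produces an object file without complaint; the missing definition only matters once something tries to link it into a program.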

Why This Matters to You

For Beginners:

  • See inside the "magic box" of compilers
  • Understand errors better - knowing the stages tells you whether a message comes from the preprocessor, the frontend, or the linker
  • Cross-compile easily for Raspberry Pi/phones

For Professionals:

  • Inspect optimizations with -S -emit-llvm -O2
  • Add custom optimizations with LLVM passes
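  • Use LTO (Link-Time Optimization, enabled with -flto) for extra speed; gains depend on the codebase, with 10-15% as a rough rule of thumb rather than a guarantee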

Try It Yourself: Beginner's Lab

Experiment 1: See Different Stages

# Preprocessed output (messy!)
clang++ -E hello.cpp -o hello.ii

# Human-readable IR
clang++ -S -emit-llvm hello.cpp -o hello.ll

# Final assembly
clang++ -S hello.cpp -o hello.s

Experiment 2: Cross-compile for ARM

# Target 32-bit ARM Linux, e.g. Raspberry Pi OS (requires an ARM sysroot and cross linker to be installed)
clang++ --target=arm-linux-gnueabihf hello.cpp -o hello_arm
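To check which architecture a build actually targeted, a program can inspect the macros the compiler predefines for each target. The sketch below relies on macros that Clang and GCC define (`__x86_64__`, `__aarch64__`, `__arm__`, `__riscv`, `__i386__`); the function name `targetArchitecture` is invented for this example, and the list of targets is not exhaustive.

```cpp
// Returns a label for the architecture this file was compiled for,
// based on macros predefined by Clang and GCC for each target.
const char* targetArchitecture() {
#if defined(__x86_64__)
    return "x86-64";
#elif defined(__aarch64__)
    return "AArch64 (64-bit ARM)";
#elif defined(__arm__)
    return "ARM (32-bit)";
#elif defined(__riscv)
    return "RISC-V";
#elif defined(__i386__)
    return "x86 (32-bit)";
#else
    return "unknown";
#endif
}
```

Printing this string from `main` in hello.cpp gives a quick sanity check: a native build on a typical desktop should report x86-64 or AArch64, while a binary built for arm-linux-gnueabihf should report 32-bit ARM when run on the Pi.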

LLVM vs Traditional Compilers

Feature | Old Compilers (GCC) | LLVM
Intermediate Step | Internal IRs tied closely to the compiler | Documented, reusable IR
Add New CPU | Rewrite entire backend | Add one backend module
Optimizations | Fixed order | Plug-and-play passes
Error Messages | Historically cryptic | Detailed with context

Real-World Applications

  1. iOS/macOS development (Clang is default compiler)
  2. Rust compiler uses LLVM for code generation
  3. Chrome browser uses LTO for faster performance
  4. Scientific computing (Julia language)

Learning Resources

Level | Resource
Beginners | Compiler Explorer (godbolt.org) - See C++ → ASM in browser
Intermediate | LLVM for Grad Students (free online article by Adrian Sampson)
Advanced | LLVM Essentials book

pie
    title "Why Developers Love LLVM"
    "Cross-platform" : 35
    "Better Optimizations" : 30
    "Clear Errors" : 20
    "Modern Infrastructure" : 15

“LLVM is the Linux of compilers - an open-source project that revolutionized how we build software.”
— Chris Lattner, Creator of LLVM and Swift

