C++ Program Building

Introduction

You might have been writing programs in C++ for several months or years. But have you ever wondered how the program, which is just text following some rules, actually gets translated so that the computer understands what to do?

This blog provides an in-depth, hands-on experience of what happens in the background when a simple program is being built. Almost every C++ program starts with including header files like #include <iostream> statements. But what are header files and why do we need them?

Header files

Here is a simple program that assigns a value of 2 to a variable a and then increments its value using the increment operator (++) and prints the result.

#include <iostream>
using namespace std;

int main(){
    int a = 2;
    a++;
    std::cout<<a<<std::endl;
}

The iostream header file must be included to use cout and cin in our program. These are not keywords but objects: iostream declares cout, cin, and other facilities, although here we're focusing on these two. The actual definitions of cout and cin (objects of type ostream and istream respectively) are provided by the C++ Standard Library implementation, which forms part of the compiler's runtime library.
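Because cout is an ordinary object of type std::ostream, any function that accepts a std::ostream& can write to it. The sketch below illustrates this with a hypothetical helper, print_incremented, which is not part of the original program; it writes either to std::cout or to an in-memory std::ostringstream.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original program): increments a copy
// of the value and writes it to whichever output stream it is given.
void print_incremented(std::ostream& out, int a) {
    a++;
    out << a << '\n';
}

// Runs the helper against an in-memory stream and returns what was written.
std::string capture_incremented(int a) {
    std::ostringstream buffer;      // an std::ostringstream is also an std::ostream
    print_incremented(buffer, a);
    return buffer.str();
}

// Quick self-check of the sketch.
void check_stream_sketch() {
    assert(capture_incremented(2) == "3\n");
    print_incremented(std::cout, 2);   // same helper, this time writing to the console
}
```

Calling check_stream_sketch() from main writes 3 to the console, just like the program above.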

In C++, there are various other header files like fstream, cmath, thread, string, etc. These files generally contain class and function declarations, as well as macro definitions and typedefs. They may also include other header files. In fact, we can create our own header files too.
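As a rough sketch of what a hand-written header could look like, the listing below keeps a header and its source file in a single block so that it compiles on its own. The file names math_utils.h and math_utils.cpp and the function multiply are assumptions made for illustration.

```cpp
#include <cassert>

// ---- what would live in a hypothetical header, math_utils.h ----
#ifndef MATH_UTILS_H      // include guard: the body is skipped on a repeated #include
#define MATH_UTILS_H

int multiply(int a, int b);   // declaration only: name, parameters, return type

#endif // MATH_UTILS_H

// ---- what would live in a hypothetical math_utils.cpp ----
int multiply(int a, int b) {  // the definition, compiled once
    return a * b;
}

// ---- what a file that includes the header could do ----
int area_of_rectangle(int width, int height) {
    return multiply(width, height);
}

// Quick self-check of the sketch.
void check_header_sketch() {
    assert(multiply(3, 4) == 12);
    assert(area_of_rectangle(5, 2) == 10);
}
```

Split at the marked comments, any source file that includes math_utils.h could call multiply, because the header gives the compiler the declaration it needs.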

Namespace

A very common statement you might encounter is using namespace std;, which makes the names in the std namespace usable without the std:: prefix. A namespace is a declarative region that provides scope to the identifiers (names of types, functions, variables, etc.) within it. Namespaces are primarily used to prevent naming conflicts in programs. The scope resolution operator (::) selects a name from a specific namespace, as in std::cout.

std is the namespace used by the C++ Standard Library. It contains the input/output streams (cin and cout), types such as vector, map, set, and tuple, and the standard algorithms such as find and sort.
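To make the idea of a naming conflict concrete, here is a minimal sketch with two hypothetical namespaces, metric and imperial, that each define a function called to_base. The scope resolution operator picks the intended one, and names from std are written with the std:: prefix.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Two hypothetical namespaces that both define a function named to_base.
namespace metric {
    int to_base(int kilometres) { return kilometres * 1000; }   // kilometres -> metres
}
namespace imperial {
    int to_base(int miles) { return miles * 5280; }             // miles -> feet
}

// Names from std used with explicit qualification; no using-directive needed.
// Assumes values is not empty.
int largest(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    return values.back();
}

// Quick self-check of the sketch.
void check_namespace_sketch() {
    assert(metric::to_base(2) == 2000);     // :: selects the metric version
    assert(imperial::to_base(2) == 10560);  // :: selects the imperial version
    std::vector<int> numbers = {3, 1, 2};
    assert(largest(numbers) == 3);
}
```

Without the namespaces, the two to_base definitions would clash, since they have the same name and parameter list.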

Now that you understand its purpose, it's clear why using namespace std appears in so many programs. It is a convenience rather than a requirement: the program above already writes std::cout and std::endl in full, so it would compile without the statement. Building and running that program prints 3 on the console.

Command to build the executable file:

g++ source_file.cpp -o output_file

Command to run the program:

./output_file

But what exactly happened in the background? Let's dive deep into...

Steps involved before running a program

The CPU serves as the processing unit of a computer, operating solely in binary. Due to this limitation, it cannot directly interpret high-level programming languages such as C++. To overcome this challenge, a system is needed to translate high-level code into a format that the CPU can process. This is where compilers, assemblers, and linkers play crucial roles. Collectively, they convert C++ code into a binary executable file, allowing the CPU to execute the program effectively.

Four steps are involved in building C++ code into an executable file: preprocessing, compiling, assembling, and linking.

The g++ command is a driver that internally invokes the preprocessor, compiler, assembler, and linker to complete the build. Let's delve into each of these components by analyzing the output after each step.

01] Preprocessing

Before the code is handed to the compiler for conversion into assembly code, it needs to be preprocessed, because some lines in the program are not statements the compiler itself can interpret, and because the compiler requires a declaration for every identifier used in the program. The preprocessor handles all such directives. Directives are lines that start with a # (hash), such as #include, #ifdef, and #define.

Let's understand how some directives are preprocessed. For #include <iostream>, the preprocessor locates the iostream header file in the system include directories and replaces the directive with the entire content of that file.
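The #ifdef directive mentioned above works in a similar text-level way: the preprocessor keeps one branch and discards the other before the compiler runs. A minimal sketch, assuming a hypothetical flag named VERBOSE and a hypothetical constant BUFFER_SIZE:

```cpp
#include <cassert>
#include <string>

#define VERBOSE          // hypothetical flag; delete this line to select the #else branch
#define BUFFER_SIZE 64   // hypothetical object-like macro, replaced by 64 wherever it appears

std::string build_mode() {
#ifdef VERBOSE
    return "verbose";    // kept, because VERBOSE is defined above
#else
    return "quiet";      // discarded by the preprocessor while VERBOSE is defined
#endif
}

int buffer_size() {
    return BUFFER_SIZE;  // the compiler sees: return 64;
}

// Quick self-check of the sketch.
void check_conditional_sketch() {
    assert(build_mode() == "verbose");
    assert(buffer_size() == 64);
}
```

Running this file through g++ -E shows only the "verbose" return statement inside build_mode; the other branch never reaches the compiler.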

Command to preprocess a source file:

 g++ -E source_file.cpp -o preprocessed.i

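After opening the preprocessed.i file, you will notice a significant increase in lines of code. This is because the entire content of the iostream header file, along with every header it includes in turn, has been pasted into it. At the end, you'll find your actual code. The abridged sample below was generated from a main.cpp whose body differs slightly from the program shown earlier.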

# 0 "main.cpp"
# 0 "<built-in>"
# 0 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 0 "<command-line>" 2
# 1 "main.cpp"
# 1 "/usr/include/c++/14.1.1/iostream" 1 3
# 36 "/usr/include/c++/14.1.1/iostream" 3

# 37 "/usr/include/c++/14.1.1/iostream" 3

# 1 "/usr/include/c++/14.1.1/bits/requires_hosted.h" 1 3
# 31 "/usr/include/c++/14.1.1/bits/requires_hosted.h" 3
# 1 "/usr/include/c++/14.1.1/x86_64-pc-linux-gnu/bits/c++config.h" 1 3
# 33 "/usr/include/c++/14.1.1/x86_64-pc-linux-gnu/bits/c++config.h" 3

# 34 "/usr/include/c++/14.1.1/x86_64-pc-linux-gnu/bits/c++config.h" 3
# 308 "/usr/include/c++/14.1.1/x86_64-pc-linux-gnu/bits/c++config.h" 3

# 308 "/usr/include/c++/14.1.1/x86_64-pc-linux-gnu/bits/c++config.h" 3

namespace std __attribute__ ((__visibility__ ("default")))
{

# 62 "/usr/include/c++/14.1.1/iostream" 3
  extern istream cin;
  extern ostream cout;
  extern ostream cerr;
  extern ostream clog;

  extern wistream wcin;
  extern wostream wcout;
  extern wostream wcerr;
  extern wostream wclog;
# 82 "/usr/include/c++/14.1.1/iostream" 3
  __extension__ __asm (".globl _ZSt21ios_base_library_initv");
}

# 2 "main.cpp" 2
# 3 "main.cpp"
int main(){
    int a = 2;
    int b = 3;

    std::cout<<a+b<<std::endl;
}

The #define directive defines a macro, and it is commonly used in programs. For example, let's say we want a macro named print that prints the passed argument to the console, serving as a shorthand for cout.

#include <iostream>
// No trailing semicolon inside the macro: the caller supplies it.
#define print(arg) std::cout<<arg<<std::endl

int main()
{
    int a = 2;
    a++;
    print(a);
    return 0;
}

After preprocessing the source_file.cpp file:

# 11 "main.cpp" 2
# 13 "main.cpp"
int main()
{
    int a = 2;
    a++;
    std::cout<<a<<std::endl;
    return 0;
}

We can see that print(a) has been replaced by the contents of the print macro.
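Since macro expansion is plain text substitution, the argument is pasted in exactly as written, which can change the meaning of an expression. The sketch below uses two hypothetical macros, SQUARE_NAIVE and SQUARE_SAFE, to show why macro arguments are usually wrapped in parentheses.

```cpp
#include <cassert>

// Hypothetical macros for illustration; neither appears in the original program.
#define SQUARE_NAIVE(x) x * x          // argument pasted as-is
#define SQUARE_SAFE(x)  ((x) * (x))    // parentheses keep the argument together

int naive_result() {
    return SQUARE_NAIVE(1 + 2);   // expands to 1 + 2 * 1 + 2
}

int safe_result() {
    return SQUARE_SAFE(1 + 2);    // expands to ((1 + 2) * (1 + 2))
}

// Quick self-check of the sketch.
void check_macro_sketch() {
    assert(naive_result() == 5);  // multiplication binds tighter than addition
    assert(safe_result() == 9);
}
```

The preprocessor has no notion of operator precedence; it only rearranges text, so the parentheses have to be written into the macro itself.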

💡

An interesting thing to note is that all comments are removed during preprocessing.

💡

Additionally, if there are any syntax errors in the C++ code, the preprocessor still processes it, because it doesn't perform any syntax analysis of the language.

02] Compiling

This is one of the most interesting steps in building a C++ program because here the compiler translates the preprocessed output (the .i file) into assembly language code. If you're unfamiliar with assembly language, don't worry. Simply put, it's a language just above the machine-level language, bringing us closer to our final executable. To keep the assembly short, the example below leaves out iostream: the first listing is the source file and the second is its preprocessed form.

int main(){
    int a = 2;
    a++;
}

# 0 "main.cpp"
# 0 "<built-in>"
# 0 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 0 "<command-line>" 2
# 1 "main.cpp"

int main(){
    int a = 2;
    a++;
}

Command to get assembly language code :

g++ -S preprocessed.i -o assembly.s

    .file    "main.cpp"           # Source file name
    .text                     # Text section (code)
    .globl    main               # Global declaration of main function
    .type    main, @function      # Define main as a function

main:
.LFB0:
    .cfi_startproc            # Start of function main
    pushq    %rbp                  # Push the base pointer onto the stack
    .cfi_def_cfa_offset 16    # Define the offset of CFA (Canonical Frame Address)
    .cfi_offset 6, -16        # Define offset for %rbp
    movq    %rsp, %rbp            # Move stack pointer to base pointer
    .cfi_def_cfa_register 6    # Define base register
    movl    $2, -4(%rbp)          # Move immediate value 2 to memory location at %rbp-4
    addl    $1, -4(%rbp)          # Add immediate value 1 to memory location at %rbp-4
    movl    $0, %eax              # Move immediate value 0 to register %eax (return value)
    popq    %rbp                  # Pop the base pointer from the stack
    .cfi_def_cfa 7, 8         # Define CFA with offset and register
    ret                       # Return from function
    .cfi_endproc              # End of function main

.LFE0:
    .size    main, .-main          # Size of main function
    .ident    "GCC: (GNU) 14.1.1 20240522"    # Compiler identification
    .section    .note.GNU-stack,"",@progbits    # GNU stack note

The listing above is the assembly language program for the source code, annotated with # comments that explain each line.

Command to disassemble an object file (produced by the assembling step described next) and view its assembly code:

objdump -d main.o > main.asm

0000000000000000 <main>:
   0:    55                       push   %rbp                # Push the base pointer onto the stack
   1:    48 89 e5                 mov    %rsp,%rbp           # Move stack pointer to base pointer
   4:    c7 45 fc 02 00 00 00     movl   $0x2,-0x4(%rbp)     # Move immediate value 2 to memory location at %rbp-4
   b:    83 45 fc 01              addl   $0x1,-0x4(%rbp)     # Add immediate value 1 to memory location at %rbp-4
   f:    b8 00 00 00 00           mov    $0x0,%eax           # Move immediate value 0 to register %eax (return value)
  14:    5d                       pop    %rbp                # Pop the base pointer from the stack
  15:    c3                       ret                       # Return from function

💡

NOTE: The compiler provides different levels of optimization through flags such as -O1, -O2, and -O3, which tend to improve the performance of the generated code.

03] Assembling

Next, the assembler converts the assembly language code into machine code.

Command to produce an object file directly from a source file (preprocess, compile, and assemble, but not link):

g++ -c source_file.cpp -o object.o

Command to assemble an assembly file:

as assembly.s -o object.o

The output of assembling is an object file, which contains the machine code.

If you've reached this point and are curious to dive deeper into the intricacies of different statements in C++, consider exploring this GitHub repository: cpp-build-pipeline.

While we have the machine code, one might think we could execute it directly on the computer. However, there's a crucial step we're missing: linking. Let's delve into that further.

04] Linking

While building any project in C++, we usually work with multiple source files. Thus, we get multiple object files after compilation, and we need to combine them to get a single executable file; that's what the linker does.

The linker resolves references to functions and variables that are spread across different object files.

Even with a single C++ file, the linker is still needed: it combines our object file with the standard library and with the startup code that calls the main function, the entry point of the program.

Command to link object files together:

g++ -o executable main.o

Let's consider a simple example to understand the linking of multiple files. The first listing is main.cpp and the second is add.cpp.

#include <iostream>

int main(){
    add(10, 20);
    return 0;
}

#include <iostream>

void add(int a, int b)
{
    std::cout<<a+b<<std::endl;
}

In this case, if we try to compile main.cpp we get a compilation error, since no declaration of the add function is visible in main.cpp.

#include <iostream>

void add(int a,int b);

int main(){
    add(10, 20);
    return 0;
}

Thus, adding just the declaration of the add function solves the compilation problem. After compiling both main.cpp and add.cpp, we have to link the resulting object files together so that the call to add in main.o is resolved to its definition in add.o.

g++ -o executable main.o add.o
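The same declaration-versus-definition split applies to variables as well as functions. The sketch below keeps both sides in one listing so that it compiles on its own; the file name sum.cpp and the names sum, call_count, and run are assumptions made for illustration. Split at the marked comments, the references in the first part would be left for the linker to resolve.

```cpp
#include <cassert>

// ---- as it could appear in a hypothetical main.cpp ----
int sum(int a, int b);      // function declaration: the definition lives elsewhere
extern int call_count;      // variable declaration: the definition lives elsewhere

int run() {
    return sum(10, 20);     // compiles with only the declaration in view
}

// ---- as it could appear in a hypothetical sum.cpp ----
int call_count = 0;         // the single definition of the variable

int sum(int a, int b) {     // the single definition of the function
    ++call_count;
    return a + b;
}

// Quick self-check of the sketch.
void check_linking_sketch() {
    const int before = call_count;
    assert(run() == 30);
    assert(call_count == before + 1);   // run() called sum exactly once
}
```

If the definitions in the second part were missing from every object file, compilation would still succeed and the linker would report an undefined reference instead.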

05] Loading

Now we have the executable file, but to run it, it needs to be loaded into main memory. The loader, which is part of the operating system, loads the executable file into the correct memory locations and initializes the program.

Command to load/run the program:

./executable

Conclusion

Building a C++ program involves translating human-readable code into machine-executable instructions through several steps: preprocessing, compiling to assembly, assembling to machine code, linking object files, and loading the executable. Understanding these steps helps programmers troubleshoot issues and optimize development.

If this blog sparked your curiosity about the workings of any system, consider hitting the like button and sharing it with your friends. For deeper insights through examples, don't forget to check the GitHub repository: cpp-build-pipeline.
