Decoding the Magic of C Compilation: A Beginner's Guide

Emmanuel Oyibo
12 min read

The C programming language is one of the most interesting and powerful programming languages out there. Like other compiled languages such as C++, it requires a compiler to translate the code you write into a form your machine can understand and execute.

As a newbie C programmer, you’ve probably written your first C program (my guess is a “Hello World”) and maybe a few more. If you’re curious enough, you may wonder what happens under the hood when compiling a C program. Don’t worry. Your curiosity will be satisfied before the end of this article.

Understanding the compilation process will help you diagnose and debug errors efficiently and write more optimized code as a C programmer. Over time, this understanding will make you a better programmer.

Hence, this article explores the different stages of C compilation, and you'll see how the compiler transforms your C source code into an executable file your computer can run.

To follow and understand this article well, you need the following:

  • Basic knowledge of how to use a compiler for C programs

  • Knowledge of some basic Linux commands

  • A command line editor (e.g. vim, nano)

Ready? Let’s dive in!

Overview of the C Compilation Process

In simple terms, C compilation refers to how the compiler converts your C source code into a machine-readable form. The computer only understands machine code (i.e., 0s and 1s), so the compiler needs to convert your human-readable code into this machine language.

Think of the compiler as a translator who speaks both your language and your computer's. It converts (translates, in this context) your code into a form your machine understands. Your machine then executes the instructions in the code and returns feedback. A win-win situation, yeah?

There are 4 stages of C compilation, namely:

  1. Preprocessing

  2. Compilation

  3. Assembly

  4. Linking

As mentioned earlier, C is a compiled language, so a compiler must convert the source code into an executable form. Many compilers work with C programs, but we'll use the GNU Compiler Collection (GCC) in this article. Strictly speaking, the gcc command is a driver: it runs the preprocessor, the compiler proper, the assembler, and the linker in turn to carry out the 4 stages listed above.

Each stage plays a part in transforming C source code into an executable program. If an error occurs at any stage, the process stops there, and the compiler displays an error message.

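It's also essential to note that each stage produces output that the next stage consumes. For example, the preprocessing stage generates an intermediate file with the .i extension, the compilation stage outputs an assembly file with the .s extension, and the assembly stage produces an object file with the .o extension. By default, gcc keeps these intermediate files in a temporary location and deletes them when it's done (the -save-temps option keeps them). You'll learn more about these shortly.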

Preprocessing

This is the first stage in converting C source code into an executable file. The preprocessing phase prepares the source code for compilation and typically performs the following tasks:

  • Comments removal

  • Macros expansion

  • File inclusion

  • Conditional compilation

Without further ado, let’s consider what each task of the preprocessing stage entails.

Comments Removal

When writing code in any programming language, adding comments explaining certain aspects or offering brief details about your code is good practice. This practice benefits others who may read your code and your future self.

You, or someone else, may revisit your source code a few months later and wonder what each line does. Well-placed comments help prevent that confusion.

However, the computer doesn't need these comments, so the preprocessor removes them (strictly speaking, it replaces each comment with a single space) before compilation begins. For example, consider the simple 0-main.c program below:

#include <stdio.h>
/**
 * main - Entry point of the program
 *
 * Return: 0 at success
 */

int main(void)
{
    printf("This is an example.\n");
    /** This is another comment. Just like the comments above,
     *  the preprocessor will remove it.
     */
    return (0);
}

We can compile 0-main.c above and stop the process just after preprocessing using the -E option with gcc. Let's find out what happens:

gcc -E 0-main.c | tail

When you run the command above in your terminal, the -E option tells gcc to stop after the preprocessing phase and print the result to the terminal (standard output). The pipe then passes that output to the Linux tail command.

The pipe operator (|) combines two or more Linux commands by feeding the output of one into the input of the next, while the tail command displays the last 10 lines of its input. You can find out more about pipes and the tail command in the Linux documentation (for example, man tail).

So, the last 10 lines of the preprocessed file are displayed.

# 9 "0-main.c"
int main(void)
{
 printf("This is an example.\n");




 return (0);
}

This is the output of gcc -E 0-main.c | tail. As you can see, unlike our original program, the comments are gone. Lines beginning with #, such as # 9 "0-main.c", are linemarkers that record which file and line the following code came from. Feel free to try this in your terminal.
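To see the "comments become whitespace" rule in action, here is a minimal sketch you can run through gcc -E yourself. It relies on the C rule that each comment is replaced by a single space, so a comment can even sit between two tokens, while comment-like text inside a string literal is not a comment at all. The names value and not_a_comment are made up for this illustration.

```c
/* After preprocessing, the comment below is replaced by one space, so the
 * compiler sees the declaration as: int value = 7; */
int/* this comment acts as whitespace */value = 7;

/* Comment markers inside a string literal are ordinary characters, so the
 * preprocessor leaves the literal untouched. */
const char *not_a_comment(void)
{
    return "/* still here */";
}
```

Running gcc -E on a file containing this code should show the declaration as int value = 7; while the string literal keeps its /* and */ characters.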

Macros Expansion

A macro is a name that stands for a piece of code. During preprocessing, the preprocessor replaces each use of that name in your program with the code it stands for. Macros are created using the preprocessor directive, #define.

For example:

#define PI 3.142

After defining the macro, PI, the preprocessor will replace it with the value, 3.142, anywhere you use it in your code before proceeding to the next phase.

In addition, you can create macros that take arguments like functions. See an example below:

#define SQUARE(x) ((x) * (x))

/*You can use the above macro in your code like below*/
int y = SQUARE(4); //The value of y will be 16.
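The extra parentheses in SQUARE are not decoration. The preprocessor substitutes the argument as-is without understanding arithmetic, so a version without parentheses gives surprising results when the argument is an expression. The sketch below compares the two; BAD_SQUARE and the function names are hypothetical, chosen for this illustration.

```c
/* Hypothetical macro WITHOUT the extra parentheses, for comparison. */
#define BAD_SQUARE(x) x * x
/* The macro from the article: each use of x, and the whole body, is wrapped. */
#define SQUARE(x) ((x) * (x))

/* BAD_SQUARE(1 + 2) expands to 1 + 2 * 1 + 2, which is 5. */
int bad_square_of_three(void)
{
    return BAD_SQUARE(1 + 2);
}

/* SQUARE(1 + 2) expands to ((1 + 2) * (1 + 2)), which is 9. */
int square_of_three(void)
{
    return SQUARE(1 + 2);
}
```

One caveat even the parenthesized version can't fix: the argument appears twice in the expansion, so something like SQUARE(i++) modifies i twice, which is undefined behavior. Pass plain values or variables to macros like this one.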

Now that we know what macros are, let's see how the preprocessor expands them before compilation. Let's modify our 0-main.c example above by adding the SQUARE and PI macros:

#include <stdio.h>
/**
 * main - Entry point of the program
 *
 * Return: 0 at success
 */

#define PI 3.142
#define SQUARE(x) ((x) * (x))

int main(void)
{
    printf("This is an example.\n");
    // The preprocessor will replace the macros with their respective values
    int x = PI;
    int y = SQUARE(4);
    return (0);
}

Let's run the same command we did earlier in our terminal to see the output:

gcc -E 0-main.c | tail

This time, the above commands will output the following result:


# 11 "0-main.c"
int main(void)
{
 printf("This is an example.\n");

 int x = 3.142;
 int y = ((4) * (4));
 return (0);
}

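After running gcc -E 0-main.c | tail, you can see that the preprocessor has replaced every use of the macros with their definitions: PI became 3.142, and SQUARE(4) became ((4) * (4)). Note that the preprocessor only substitutes code; it doesn't evaluate anything. What x finally holds is decided later by the compiler, and since x is an int, it will actually hold 3, not 3.142.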
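The substitution works on tokens rather than as a blind search-and-replace over text. PI inside a string literal, or as part of a longer identifier, is left alone. Here is a small sketch of that behavior. pi_label, PI_DIGITS and pi_value are made-up names for this illustration.

```c
#define PI 3.142

/* "PI" inside a string literal is not a macro use, so it is left alone. */
const char *pi_label(void)
{
    return "PI";
}

/* PI_DIGITS is a different identifier (a single, distinct token), so the
 * preprocessor does not replace the PI at its start either. */
int PI_DIGITS = 4;

/* Here PI is a standalone token, so it is replaced with 3.142. */
double pi_value(void)
{
    return PI;
}
```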

File Inclusion

If you've been writing C programs, you have probably included a few standard library header files, such as stdio.h, or even a custom header file of your own.

Also, you may have noticed that to add a header file to your program, you had to use the preprocessor directive, #include. Don't worry if you've been using these header files without knowing their purpose or how they work. I was once like you!

So, how do header files work?

Think of them as gateways to code that lives elsewhere. A header file mainly contains declarations (function prototypes, macros, and type definitions) that tell the compiler how to use functions and data defined in another, prewritten part of the program or in a library.

For example, the C functions printf() and scanf() are prewritten functions in the standard C library, so there's no need to write them from scratch. You can use them in your code by including the stdio.h header with the #include preprocessor directive.

During the preprocessing phase, the preprocessor replaces each #include line with the full contents of the named file. For instance, the one-liner #include <stdio.h> pastes in the declarations for the standard input/output functions. The compiled code behind those functions isn't pasted in; it lives in the C standard library and is attached to your program during linking.
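Here is a minimal sketch of what a header actually buys you. stdio.h supplies the declaration of snprintf, and the size_t type, so the compiler can check the call. The function describe_square is a hypothetical example. If you delete the #include line, gcc complains about an implicit declaration of snprintf. Depending on your gcc version, that is a warning or an error.

```c
#include <stdio.h>  /* declares snprintf (and printf, scanf, ...) plus size_t */

/* Writes "<n> squared is <n*n>" into buf. Returns what snprintf returns: the
 * number of characters it would write for a large enough buffer, excluding
 * the terminating '\0'. The header only gives the compiler snprintf's
 * declaration; snprintf's compiled code comes from the C library when the
 * program is linked. */
int describe_square(char *buf, size_t size, int n)
{
    return snprintf(buf, size, "%d squared is %d", n, n * n);
}
```

Calling describe_square(buf, sizeof buf, 4) with a 32-byte buffer fills it with 4 squared is 16.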

Let's illustrate this using our 0-main.c program:

#include <stdio.h>
/**
 * main - Entry point of the program
 *
 * Return: 0 at success
 */

#define PI 3.142
#define SQUARE(x) ((x) * (x))

int main(void)
{
    printf("This is an example.\n");
    // The preprocessor will replace the macros with their respective values
    int x = PI;
    int y = SQUARE(4);
    return (0);
}

Let's preprocess the program again and look at the first 10 lines of the output, this time using the Linux head command, which displays the first 10 lines of its input. Feel free to learn more about the head command.

gcc -E 0-main.c | head

Running the commands above on your terminal will generate something like the result below:

# 1 "0-main.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 31 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 32 "<command-line>" 2
# 1 "0-main.c"
# 1 "/usr/include/stdio.h" 1 3 4
# 27 "/usr/include/stdio.h" 3 4
# 1 "/usr/include/aarch64-linux-gnu/bits/libc-header-start.h" 1 3 4

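Now you can see that the first line of our original 0-main.c program, #include <stdio.h>, has expanded into many more lines that may not make sense to you yet. The preprocessor replaced it with the contents of stdio.h and of every header stdio.h itself includes, which adds up to hundreds of lines. The lines beginning with # are linemarkers recording which file each piece of text came from. The intricate details are beyond the scope of this article; the essential point is that the preprocessor has pasted prewritten code into your program.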
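Because #include is plain text pasting, including the same header twice pastes its contents twice. That can cause redefinition errors. Headers usually protect themselves with an include guard built from #ifndef, #define and #endif, directives covered in the conditional compilation section. The sketch below simulates a hypothetical header, rect.h, being pasted into a file twice. Without the guard, the second copy of rect_area would be a redefinition error.

```c
/* ---- first paste of a hypothetical header, rect.h ---- */
#ifndef RECT_H          /* RECT_H is not defined yet, so this block is kept */
#define RECT_H

static inline int rect_area(int width, int height)
{
    return width * height;
}

#endif /* RECT_H */

/* ---- second paste of the same header (e.g. included again indirectly) ---- */
#ifndef RECT_H          /* RECT_H is now defined, so the preprocessor drops */
#define RECT_H          /* everything up to the matching #endif             */

static inline int rect_area(int width, int height)
{
    return width * height;
}

#endif /* RECT_H */
```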

Conditional Compilation

There are situations where you want certain parts of your code to be compiled or ignored depending on how you build your program. This is what conditional compilation means: the preprocessor keeps or removes a block of code based on a condition, most often whether a macro is defined.

You can achieve conditional compilation using preprocessor directives such as #ifdef, #ifndef, #if, #elif, #else, and #endif.

Here's a piece of code illustrating conditional compilation:

#define DEBUG 1

...

#ifdef DEBUG
    printf("Debugging information: x=%d\n", x);
#endif

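The printf() line is only kept in your program if the macro DEBUG is defined. Otherwise, the preprocessor removes it before compilation, so it never reaches the compiler. Note that #ifdef checks only whether DEBUG is defined, not its value, so even #define DEBUG 0 would keep the line. You can also define the macro from the command line by passing the -D option to gcc (for example, -DDEBUG) instead of writing #define DEBUG in the file. As with macro expansion, all of this happens during the preprocessing phase.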
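When the decision depends on a value rather than on whether a macro exists, #if, #elif and #else let you choose between several blocks. Here is a minimal sketch. DEBUG_LEVEL and verbosity are hypothetical names. The #ifndef block gives DEBUG_LEVEL a default value only when it wasn't already set on the command line.

```c
/* DEBUG_LEVEL is a hypothetical macro. It defaults to 0 here, but you can
 * override it when compiling, e.g.: gcc -DDEBUG_LEVEL=2 file.c */
#ifndef DEBUG_LEVEL
#define DEBUG_LEVEL 0
#endif

/* Only one of the return statements survives preprocessing; the other two
 * never reach the compiler. */
int verbosity(void)
{
#if DEBUG_LEVEL >= 2
    return 2;
#elif DEBUG_LEVEL == 1
    return 1;
#else
    return 0;
#endif
}
```

Running this through gcc -E (with and without -DDEBUG_LEVEL=2) shows a different single return statement each time.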

Compilation

As the name implies, the actual compilation of your preprocessed source code begins here. The compiler parses (reads) your program, checks its syntax and types, and reports any problems it finds. Errors, such as a syntax error, stop the process, and you must fix them and run the compilation again. Warnings point out suspicious code but let compilation continue.

Remember, we stated earlier that an intermediate file with the .i extension is created after preprocessing. In this phase, the compiler translates that file into an assembly file with the .s extension. The assembly file contains human-readable assembly instructions that the assembler will eventually convert into machine code your computer can understand and execute.

Just as we stopped the process immediately after preprocessing, we'll do the same after compilation, this time using the -S option with gcc.

We shall use our 0-main.c program above in its last modified form:

#include <stdio.h>
/**
 * main - Entry point of the program
 *
 * Return: 0 at success
 */

#define PI 3.142
#define SQUARE(x) ((x) * (x))

int main(void)
{
    printf("This is an example.\n");
    // The preprocessor will replace the macros with their respective values
    int x = PI;
    int y = SQUARE(4);
    return (0);
}

Now let's stop the process in the second stage:

gcc -S 0-main.c

When you run the command above, gcc stops after the second stage. The assembly output file has the same base name as the source file but a different extension: 0-main.s. This file contains assembly code specific to your machine's architecture.

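You can view the contents of 0-main.s in any text editor or by running cat 0-main.s in your terminal. Here's what it looks like on my machine, a 64-bit ARM (aarch64) system. Notice that printf has become a call to puts (GCC makes this substitution when the format string contains no conversion specifiers and ends with a newline), and that x and y are stored as ready-made constants: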

    .arch armv8-a
    .file    "0-main.c"
    .text
    .section    .rodata
    .align    3
.LC0:
    .string    "This is an example."
    .text
    .align    2
    .global    main
    .type    main, %function
main:
.LFB0:
    .cfi_startproc
    stp    x29, x30, [sp, -32]!
    .cfi_def_cfa_offset 32
    .cfi_offset 29, -32
    .cfi_offset 30, -24
    mov    x29, sp
    adrp    x0, .LC0
    add    x0, x0, :lo12:.LC0
    bl    puts
    mov    w0, 3
    str    w0, [sp, 24]
    mov    w0, 16
    str    w0, [sp, 28]
    mov    w0, 0
    ldp    x29, x30, [sp], 32
    .cfi_restore 30
    .cfi_restore 29
    .cfi_def_cfa_offset 0
    ret
    .cfi_endproc
.LFE0:
    .size    main, .-main
    .ident    "GCC: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0"
    .section    .note.GNU-stack,"",@progbits
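In the listing, mov w0, 3 and mov w0, 16 store the values of x and y. The 16 is SQUARE(4), computed at compile time. The 3 is PI after conversion: because x is an int, the compiler converts 3.142 to int and discards the fractional part. The sketch below isolates that conversion. pi_as_int and pi_as_double are made-up names for this illustration.

```c
/* Initializing an int from a double constant converts the value to int,
 * discarding the fractional part (truncation toward zero). GCC performs this
 * at compile time. Compiling with -Wfloat-conversion (also enabled by
 * -Wconversion) should produce a warning for this line. */
int pi_as_int(void)
{
    int x = 3.142;
    return x;
}

/* Keeping the fractional part requires a floating-point variable instead. */
double pi_as_double(void)
{
    double x = 3.142;
    return x;
}
```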

Assembly

This phase converts the assembly instructions from the previous step into machine code. The assembler is simply a program that translates each assembly instruction into its binary encoding. The file generated in this phase is an object file with the .o extension.

We can halt the compilation process after this phase using the -c option with gcc.

gcc -c 0-main.c

Running the command above makes gcc start again from 0-main.c. It preprocesses, compiles, and assembles the file, then stops before linking, leaving an object file named 0-main.o. The file's content is machine code, so it isn't as pretty as 0-main.s. (Tools such as objdump -d 0-main.o can disassemble it back into readable assembly.)

Opening 0-main.o in the Vim editor on my machine shows this:

1 ^?ELF^B^A^A^@^@^@^@^@^@^@^@^@^A^@·^@^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ ^C^@^@^@^@^@^@^@^@^@^@@^@^@^@^@^@@^@^M^@^L^@ý{¾©ý^C^@<91>^@^@^@<90>^@^@^@<91>^@^@^@<94>`^@<8
0>Rà^[^@¹^@^B<80>Rà^_^@¹^@^@<80>Rý{¨À^C_ÖThis is an example.^@^@GCC: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0^@^P^@^@^@^@^@^@^@^AzR^@^Dx^^^A^[^L^_^@ ^@^@^@^X^@^@^@^@^@^@^@
0^@^@^@^@A^N <9d>^D<9e>^CJÞÝ^N^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^D^@ñÿ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^C^@^A^@^@^@^@^@^@^@^
@^@^@^@^@^@^@^@^@^@^@^@^@^@^C^@^C^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^C^@^D^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^C^@^E^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
2 ^@^@^@^@^@^E^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^M^@^@^@^@^@^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^C^@^G^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
3 ^@^@^@^@^@^H^@^T^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^C^@^H^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^C^@^F^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^P^@^@^@^R^@^A^@^@^@^@^@^@^
@^@^@0^@^@^@^@^@^@^@^U^@^@^@^P^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@0-main.c^@$d^@$x^@main^@puts^@^@^@^@^@^@^@^H^@^@^@^@^@^@^@^S^A^@^@^E^@^@^@^@^@^@^@^@^@^@^@^L^@^@^@^
@^@^@^@^U^A^@^@^E^@^@^@^@^@^@^@^@^@^@^@^P^@^@^@^@^@^@^@^[^A^@^@^M^@^@^@^@^@^@^@^@^@^@^@^\^@^@^@^@^@^@^@^E^A^@^@^B^@^@^@^@^@^@^@^@^@^@^@^@.symtab^@.strtab^@.shstrtab^@.re
la.text^@.data^@.bss^@.rodata^@.comment^@.note.GNU-stack^@.rela.eh_frame^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ ^@^@^@^A^@^@^@^F^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@@^@^@^@^@^@^@^@0^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^D^@^@^@^@^@^@^@^@^@^@^@^@^@^
@^@^[^@^@^@^D^@^@^@@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@X^B^@^@^@^@^@^@H^@^@^@^@^@^@^@
4 ^@^@^@^A^@^@^@^H^@^@^@^@^@^@^@^X^@^@^@^@^@^@^@&^@^@^@^A^@^@^@^C^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@p^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^@^@^@^@^@^@^
@^@,^@^@^@^H^@^@^@^C^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@p^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@1^@^@^@^A^@^@^@^B^@^@^@^@^@^@^@^@^@^@^@^
@^@^@^@p^@^@^@^@^@^@^@^T^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^H^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@9^@^@^@^A^@^@^@0^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@<84>^@^@^@^@^@^@^@,^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^A^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@B^@^@^@^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@°^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
@W^@^@^@^A^@^@^@^B^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@°^@^@^@^@^@^@^@8^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^H^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@R^@^@^@^D^@^@^@@^@^@^@^@^@^@^@^@^@^@^@^@^@^
@^@ ^B^@^@^@^@^@^@^X^@^@^@^@^@^@^@
5 ^@^@^@^H^@^@^@^H^@^@^@^@^@^@^@^X^@^@^@^@^@^@^@^A^@^@^@^B^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@è^@^@^@^@^@^@^@P^A^@^@^@^@^@^@^K^@^@^@^L^@^@^@^H^@^@^@^@^@^@^@^X^@^@^@^@^@^
@^@    ^@^@^@^C^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@8^B^@^@^@^@^@^@^Z^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^Q^@^@^@^C^@^@^@^@^@^@^@^@^@^@^@^@^@^
@^@^@^@^@^@¸^B^@^@^@^@^@^@a^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
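Most of the dump is unreadable, but the very first characters, ^?ELF, are a clue. On Linux, gcc produces object files in the ELF format. Every ELF file starts with the byte 0x7F, which Vim displays as ^?, followed by the letters E, L and F. Here is a minimal sketch that checks for that signature, assuming a Linux system; other platforms use different formats, such as Mach-O on macOS. The function is_elf_file is a hypothetical helper. For real inspection, file 0-main.o or readelf -h 0-main.o report the same information.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the file at path begins with the ELF magic number
 * (the byte 0x7F followed by 'E', 'L', 'F'), 0 if it does not,
 * and -1 if the file cannot be opened. */
int is_elf_file(const char *path)
{
    static const unsigned char elf_magic[4] = { 0x7F, 'E', 'L', 'F' };
    unsigned char header[4];

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    /* Read just the first four bytes of the file. */
    size_t n = fread(header, 1, sizeof header, fp);
    fclose(fp);

    return n == sizeof header && memcmp(header, elf_magic, sizeof header) == 0;
}
```

On a Linux machine, is_elf_file("0-main.o") should return 1 after you run gcc -c 0-main.c.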

Linking

This is the final stage of the C compilation process. Linking combines your object file with other object files and library files to create an executable file. The object file generated after assembly contains references to symbols it doesn't define itself. For example, 0-main.o calls puts, whose code isn't in 0-main.o.

Hence, the linker searches the other object files and libraries for these symbols and connects each reference to the matching definition.

For more context, think of your object file as a novel that uses a few unfamiliar words. When the linker comes across one, it looks the word up in a dictionary (a library or another object file) that provides its meaning.

But what exactly are library files? They contain pre-compiled pieces of code (such as functions and variables) packaged for reuse across many programs. There are two types of libraries. The code of a static library (usually a .a file on Linux) is copied into your executable at link time. A dynamic, or shared, library (usually a .so file) is loaded when the program runs. These are used for static and dynamic linking, respectively.

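Furthermore, the linker reports errors when it can't complete its job. One example is an undefined reference: a function you declared and called but never defined. Another is a symbol defined more than once. Like compiler errors, these stop the process, and you must fix the error and start again. Note that the linker doesn't catch logic errors; those only show up when you run and test your program.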
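To make "unresolved symbols" concrete, here is a minimal sketch of a declaration and the definition the linker matches it with. In a real project these two halves would be separate files, for example a hypothetical main.c and counter.c, each compiled with gcc -c and then linked with gcc main.o counter.o. They are shown together here only so the snippet is self-contained. call_count and next_id are made-up names.

```c
/* ---- what main.c would contain (usually via a header): declarations only.
 * Compiled on its own, main.c leaves these as unresolved symbols in main.o. */
extern int call_count;
int next_id(void);

/* ---- what counter.c would contain: the definitions the linker matches
 * those unresolved symbols against. ---- */
int call_count = 0;

/* Returns 1, 2, 3, ... on successive calls. */
int next_id(void)
{
    return ++call_count;
}
```

If you linked main.o without counter.o, GNU ld would stop with an error along the lines of undefined reference to `next_id'.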

This last phase generates an executable file, named a.out by default, which you can use to run the program from your terminal. You can choose another name with the -o option, e.g. gcc 0-main.c -o main. Now let's compile our 0-main.c file one last time using gcc, without stopping the process at any point.

gcc 0-main.c

After running the above command, an executable named a.out is generated. Running it in your terminal with ./a.out prints the statement in the program: This is an example.

Below is a simple illustration of the C compilation process:

Diagrammatic illustration of the C compilation process

Note that when you compile your C program with gcc filename.c, all the stages described above happen in one go to produce the a.out executable. We only stopped the process at each stage to understand what happens under the hood.

Conclusion

C compilation describes how the compiler transforms your human-readable C code into an executable file that your machine can run. The process occurs in 4 stages:

  • Preprocessing

  • Compilation

  • Assembly

  • Linking

It's essential to understand this process as a C programmer, as it'll sharpen your debugging skills and make you a better programmer.

Also, like every other skill in life, regular practice will help you go a long way. Therefore, I encourage you to practice, experiment, and explore further resources on C compilation and related concepts.

Thanks for reading! If you found this article helpful (which I bet you did 😉), have a question, or spotted an error or typo, do well to leave your feedback in the comment section.

And if you’re feeling generous (which I hope you are 🙂) or want to encourage me, you can put a smile on my face by getting me a cup (or thousand cups) of coffee below. :)

Also, feel free to connect with me via LinkedIn.


Written by

Emmanuel Oyibo

As a budding DevOps engineer and a detail-oriented technical writer, I tackle the ever-evolving realm of system automation, deployment, and integration on a regular basis. This blog is my online space where I document my journey, share the interesting things I discover, and untangle the challenging issues I face. My mission is to break down complex technical topics and make them straightforward and engaging. Whether you're deeply involved in tech or just starting to get curious, you're welcome here in my digital nook!