The C Compilation Process: A Step-by-Step Guide
Introduction
C, which is often regarded as a low-level language, is a compiled language. Programming language implementations are broadly either compiled or interpreted, and C belongs to the compiled group.
Saying that a language is compiled means that its source code is converted, ahead of time, into a form that the computer's central processing unit (CPU) can execute directly. This conversion from the source file to a form the CPU can execute is called compilation.
The C Compilation Process
The C compilation process is the sequence of steps that converts a C source file (file.c) into an executable file, that is, a file the computer's processor can run directly.
This process has four main steps. We will look at what each step does and what it outputs.
These steps are:
Preprocessing (Preprocessor)
Compiling (Compiler)
Assembling (Assembler)
Linking (Linker)
Each of these steps can be run on its own by passing gcc an option that tells it to stop at a particular point in the process.
Preprocessing
Preprocessing is the first step of compilation. It is carried out by the preprocessor, a program that does the following:
Removes comments that are in the source file.
Includes header files in the source code.
Expands macros, replacing each macro name with its definition.
Let us look at a simple piece of code and what the preprocessor does to it under the hood.
#include <stdio.h>
#define PI 3.142
/**
 * main - prints the value of PI
 * Return: 0
 */
int main(void)
{
    printf("The value of PI = %f\n", PI);
    return 0;
}
The code above is a simple program that prints the value of PI. It contains #include <stdio.h>, which pulls in the standard input and output header file, and #define PI 3.142, which defines a macro named PI with the value 3.142. Then comes a comment, written in the Betty style for C, followed by the main function.
The preprocessor replaces the #include line with the contents of the standard input and output header file (stdio.h).
It also expands the macro. We defined PI as 3.142 and used it in the printf() call, so the preprocessor replaces every use of PI in the code with 3.142. Text inside string literals is left alone, which is why the PI in "The value of PI" is unchanged.
The preprocessor also removes the comments. So our code effectively becomes:
(contents of the stdio.h file)
int main(void)
{
    printf("The value of PI = %f\n", 3.142);
    return 0;
}
To run a file through only the preprocessor, pass the -E option to the gcc command. Here we use a C source file named source_file.c; replace it with the name of your own C source file. The command looks like this:
gcc -E source_file.c
When you run this command, the preprocessed code is printed to standard output. Most of it is the contents of the stdio.h header file; at the very end is our own code, with the comment removed and the macro PI replaced with its value of 3.142.
Compiling
Compiling is the second step of the whole compilation process. Here the compiler takes the preprocessed file as input and converts it to assembly code. Assembly code is a human-readable form of the object code (machine code).
To stop the C compilation process after the compiling step, use the -S option with gcc, like this:
gcc -S source_file.c
Running this command on the same code we used for the preprocessor generates a new file with the same name as the source file but a different extension. For assembly code the extension is .s, so we get source_file.s.
If you open the file, you will see the assembly code that was generated from the C source file. Its exact contents depend on your processor, operating system, and gcc version.
Assembling
This is the third step in the compilation process. Here the assembler takes the assembly code as input and converts it into machine code, also referred to as object code.
The object code generated by the assembler is the binary form of the assembly code. It contains instructions the processor can execute, but it is not yet a complete program: references to code defined elsewhere, such as printf(), are resolved later by the linker.
This step creates an object file. If the C source file is source_file.c, the object file will be source_file.o, that is, it has the .o extension.
To stop the compilation process after the assembling step, use the gcc command with the -c option:
gcc -c source_file.c
The command prints nothing when it succeeds; it simply creates source_file.o. Because the object file is binary, opening it in a text editor shows mostly unreadable characters.
Linking
This is the fourth and last step in the compilation process. When a program is split across multiple source files (.c files), each one is compiled and assembled into its own object file, and the linker combines all of these object files into a single executable file.
The linker also links in the code for any functions used from the C library.
The result is an executable file in a format that the operating system can load and run.
On Windows we end up with a .exe file, while on Linux the executable simply has the name we give the output file. When no output name is specified, gcc on Linux saves it in a file called a.out.
To run all four steps, simply use the gcc command without any of the stop options; it compiles the source file all the way to an executable that can be run on the computer.
gcc source_file.c
With this, an executable file called a.out is created, which we can run with the following command:
./a.out
Running it prints The value of PI = 3.142000, since %f prints six digits after the decimal point by default.
To compile it into an executable file with the same name as our source file, add the -o option to gcc, followed by the name we want the executable file to have. Here is the command to use:
gcc source_file.c -o source_file
To execute this file, we simply run the command:
./source_file
The output is the same as when running a.out.
Conclusion
The C compilation process can be summarized as follows:
The preprocessor removes comments, includes header files, and expands macros in the C source code.
The compiler translates the preprocessed C code into assembly code (assembly language), which is the human-readable form of the machine language.
The assembler translates the assembly language code into object/machine code.
The linker combines the object/machine code files into an executable file that can be run on the operating system.
Thank you for reading. You can connect with me on Twitter and LinkedIn.
Written by
Gideon Bature
A Seasoned Software Engineer and Technical Writer.