We Don't Need No IDE

Something you should know about me is that I hate the feeling of being reliant on things. Whenever I use any sort of tool (especially a very specialized one), the question that always persists in my mind is: "If I did not have this tool, would I even be able to accomplish my task?"
On that same note, I also hate being force-fed endless layers of abstraction.
IDEs do both.
This is especially frustrating when I am trying to learn a new concept and the tool I am using abstracts my process so much that I forget what I am even trying to accomplish in the first place. Yes, abstractions are very helpful when working on very complicated projects, but why does it always feel like they take away my ability to learn something from the ground up?
In summary, abstracting concepts is not how I learn. I need to get a sense of all of the details first.
Now, with that being said: Why do I have to learn how to set up and use an IDE? If I want to create a simple single-file program, why can't I just write some code in a basic text editor and just compile/build it myself? It is my project after all. If I know how to write it, why can't I also learn how to build it?
And so, I decided to write this guide as a future reference for myself and for others who share my frustration at not understanding how C compilation ACTUALLY works.
No longer will you be waist-deep in auto-generated boilerplate code, nested config menus, and fifty mysterious settings that make your build almost work. If you just drop into a terminal and run a few simple lines of GCC commands, you'll be done already.
No IDE, no build system, not even Make. Just pure terminal GCC commands. I decided to show this process from first principles as best as I can. Another great benefit of learning this way is that the knowledge transfers easily to IDEs, build systems, and other operating systems. If you can do it from scratch, you can do it with any tool too.
Please note this explanation is demonstrated in Linux using GCC, but I know many other development tools such as MinGW and Clang share significant similarities with GCC. I believe using GCC on Linux is the easiest way to demonstrate each step of the compilation process. Additionally, understanding this build process should help with building C++ applications too.
GCC Compilation Process
Here is the full GCC build process. Note that GCC is a toolchain, meaning that it is a collection of tools (e.g. cpp, cc1, as, ld) that are called as intermediate stages. In total, to get from the C source code to an executable, GCC needs to invoke the preprocessor, compiler, assembler, and linker, in that order. Usually this is handled behind the scenes, but you can also use special GCC flags to stop after each of the stages. In this guide, I will be dissecting each stage independently.
For the first example, let's start off with dissecting what happens when the following "Hello World!"-like example is given:
A simple header file, "myheader.h"
A single C implementation file, "main.c"
Both exist in the same directory.
// myheader.h
#ifndef MY_HEADER_H
#define MY_HEADER_H
char* string_from_header = "Greetings!";
#endif
// main.c
#include <stdio.h>
#include "myheader.h"
int main() {
    printf("%s\n", string_from_header);
    return 0;
}
The "main.c" file simply prints the string defined in "myheader.h".
To build this program through all of GCC's stages and skip straight to the output executable, all we need is to just write this single line in the terminal which produces an executable from "main.c" and names it "main":
gcc main.c -o main
And here is the output after running the executable:
Greetings!
Now, to break down what exactly is happening throughout the whole process, we can use some of GCC's compiler flags.
Technically, GCC (the GNU Compiler Collection) is not just a compiler but a compiler driver that manages the different stages of compilation. By definition, a compiler driver acts as an interface between the user and the compiler's core components. In other words, GCC is a wrapper around the typical build stages (preprocessor, compiler, assembler, linker), and every stage has its own associated tool that GCC invokes. In this guide, as I dissect each stage, I will mention both the GCC command for that stage and the standalone tool.
A Very Brief Note on the History of GCC
Something that can often be confusing is what GCC stands for. In this context, GCC is the abbreviation for the GNU Compiler Collection, but it originally stood for the GNU C Compiler. Since then, GCC has grown into a full-blown toolkit that controls the entire build process by invoking the preprocessor, the assembler, and the linker. Nowadays, GCC also supports other languages such as C++, Objective-C, Objective-C++, Fortran, Ada, Go, D, Modula-2, and COBOL.
Step 1: Preprocessor
First, let's take a look at what is happening at the Preprocessor Stage with the "-E" flag.
gcc -E main.c -o main.i
Note that GCC here is invoking CPP (the C Preprocessor). To perform the preprocessor step manually, you can also execute CPP in the following way:
cpp main.c -o main.i
The output is a file with a ".i" file extension, which represents the preprocessed source code. It essentially just contains the code from "stdio.h" and "myheader.h" pasted into "main.c". Here is what that looks like. (I am only showing the last 20 lines since "stdio.h" is very large.)
Output after Preprocessor Stage:
806 extern void funlockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__)) __attribute__ ((__nonnull__ (1)));
807 # 959 "/usr/include/stdio.h" 3 4
808 extern int __uflow (FILE *);
809 extern int __overflow (FILE *, int);
810 # 983 "/usr/include/stdio.h" 3 4
811
812 # 2 "main.c" 2
813 # 1 "myheader.h" 1
814
815
816
817
818 # 4 "myheader.h"
819 char* string_from_header = "Greetings!";
820 # 3 "main.c" 2
821
822 int main() {
823 printf("%s\n", string_from_header);
824 return 0;
825 }
You can see the first 810 lines were from "stdio.h" and the remaining lines are from "myheader.h" and "main.c". Note that nothing was compiled here. All that was done was the two header files were preprocessed and expanded into "main.c" to create one large file of source code. The preprocessor output strongly resembles the original code.
Aside from the output including the expanded code from the included header files and replacing all macros with their defined values, here are some other things the preprocessor does:
Preprocessing directive lines (starting with '#') are removed or replaced with blank lines.
Comments are replaced with spaces.
Multiple consecutive blank lines might be collapsed into a single blank line.
Whitespace between tokens might be preserved or replaced with single spaces, depending on the implementation.
The output will often contain special lines (called linemarkers) of the form
# linenum filename flags
These lines convey information about the original source file and line numbers, helping the compiler generate accurate error messages. The flags provide additional details, like whether a new file has started (flag 1), if the preprocessor is returning to a previously included file (flag 2), if the following code comes from a system header (flag 3), or if the following text should be treated as being wrapped in an implicit extern "C" block (flag 4).
How does GCC know where the associated header files are located?
In this particular case, GCC knows where to search without us having to specify anything, since both headers sit in locations that GCC searches by default.
The "stdio.h" header in #include <stdio.h> is found because it is located in one of the standard system directories configured for GCC (or for whichever compiler you use). The "myheader.h" header in #include "myheader.h" is found because it shares the same directory as "main.c".
The convention is to use angle brackets, <>, for system headers and double quotes, "", for your own headers.
Here is what the official GCC documentation states about the convention:
By default, the preprocessor looks for header files included by the quote form of the directive #include "file" first relative to the directory of the current file, and then in a preconfigured list of standard system directories.
For the angle-bracket form #include <file>, the preprocessor’s default behavior is to look only in the standard system directories. The exact search directory list depends on the target system, how GCC is configured, and where it is installed.
One way to find the paths of where GCC searches for system directories is to run the following command.
echo | gcc -E -Wp,-v -
What happens if you need to specify a specific location for a header file in the GCC command?
In that case, you can use the following "-I" GCC flag to tell the compiler to look in that directory for your header files:
gcc -I/path/to/myheader -E main.c -o main.i
Note you can do this for multiple header file paths:
gcc -I/path/to/myheader1 -I/path/to/myheader2 -E main.c -o main.i
The "-I" flag also supports relative paths, for example if your header directory is one level up from your current directory (e.g. -I../include).
Other ways to include headers from specific locations are to set environment variables that extend the default include paths (such as CPATH or C_INCLUDE_PATH), or to simply write the relative path in the include directive (e.g. #include "path/to/myheader/myheader.h").
Step 2: Compiler
This step is fairly straightforward. Here, GCC takes the preprocessed code from Step 1 and converts it into platform-specific assembly code.
To get to this stage from "main.c", you use the "-S" flag and run the following GCC command:
gcc -S main.c -o main.s
Under the hood, GCC calls a compiler known as "cc1" during this stage. The cc1 compiler is only marginally documented and is not meant to be invoked directly by users. Instead, it is normally run through the GCC command. Therefore, I will only demonstrate how to compile with GCC's "-S" flag.
The result is the following assembly source code file (with a ".s" file extension).
Output after Compiler Stage:
    .file "main.c"
    .text
    .globl string_from_header
    .section .rodata
.LC0:
    .string "Greetings!"
    .section .data.rel.local,"aw"
    .align 8
    .type string_from_header, @object
    .size string_from_header, 8
string_from_header:
    .quad .LC0
    .text
    .globl main
    .type main, @function
main:
.LFB0:
    .cfi_startproc
    endbr64
    pushq %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq %rsp, %rbp
    .cfi_def_cfa_register 6
    movq string_from_header(%rip), %rax
    movq %rax, %rdi
    call puts@PLT
    movl $0, %eax
    popq %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size main, .-main
    .ident "GCC: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0"
    .section .note.GNU-stack,"",@progbits
    .section .note.gnu.property,"a"
    .align 8
    .long 1f - 0f
    .long 4f - 1f
    .long 5
0:
    .string "GNU"
1:
    .align 8
    .long 0xc0000002
    .long 3f - 2f
2:
    .long 0x3
3:
    .align 8
4:
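Even if you are not proficient with x86_64 assembly, it is still clear that this code represents "main.c". One detail worth noticing is the "call puts@PLT" line: since the format string is just "%s\n", GCC replaced the printf call with the simpler puts function.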
Compiler Optimizations:
During this stage, you can also perform compiler optimizations using "-O" flags. This will result in some differences in how the assembly code is generated. Here are some common optimization flags.
-O0: Default option. No optimization; meant for reducing compilation time and keeping debugging predictable.
-O1: Reduces code size and execution time without a large increase in compilation time.
-O2: Optimizes further, enabling nearly all optimizations that do not trade code size for speed, at the expense of increased compilation time and memory usage.
-O3: Optimizes even more aggressively for execution time, at the expense of even more compilation time. Unlike the previous levels, this can increase code size.
-Os: Optimizes for reduced code size.
Example of producing assembly code with GCC that is optimized for reducing the size of the executable:
gcc -S -Os main.c
If you are interested in reading more about the details of GCC's optimizations, you can visit GCC's documentation.
Step 3: Assembler
The Assembler Stage converts the assembly code (from a ".s" file) to machine code. The result is an object file, a binary (not human-readable) file with a ".o" file extension. In simple terms, an object file is just an intermediate file that contains machine code and data that have been translated from source code. Object files are not complete executables and have to be passed to the linker in the following Linker Stage.
An object file can be generated from "main.c" with GCC's "-c" flag in the following way:
gcc -c main.c -o main.o
Under the hood, executing this GCC command invokes the GNU Assembler (also known as "as" or "gas"). Starting from the "main.s" file of Step 2, the equivalent direct call to the assembler is:
as main.s -o main.o
While we cannot display the object file directly, the objdump command (another GNU utility) can disassemble the machine code. The "-D" flag disassembles every section of the file, including the ones that hold data rather than code; the lowercase "-d" flag limits the output to the executable sections.
objdump -D main.o
Output of Assembler Stage (Disassembled Machine Code):
main.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <main>:
   0:  f3 0f 1e fa            endbr64
   4:  55                     push %rbp
   5:  48 89 e5               mov %rsp,%rbp
   8:  48 8d 05 00 00 00 00   lea 0x0(%rip),%rax # f <main+0xf>
   f:  48 89 c7               mov %rax,%rdi
  12:  e8 00 00 00 00         call 17 <main+0x17>
  17:  b8 00 00 00 00         mov $0x0,%eax
  1c:  5d                     pop %rbp
  1d:  c3                     ret

Disassembly of section .rodata:

0000000000000000 <.rodata>:
   0:  48                     rex.W
   1:  65 6c                  gs insb (%dx),%es:(%rdi)
   3:  6c                     insb (%dx),%es:(%rdi)
   4:  6f                     outsl %ds:(%rsi),(%dx)
   5:  20 57 6f               and %dl,0x6f(%rdi)
   8:  72 6c                  jb 76 <main+0x76>
   a:  64 21 00               and %eax,%fs:(%rax)

Disassembly of section .comment:

0000000000000000 <.comment>:
   0:  00 47 43               add %al,0x43(%rdi)
   3:  43 3a 20               rex.XB cmp (%r8),%spl
   6:  28 55 62               sub %dl,0x62(%rbp)
   9:  75 6e                  jne 79 <main+0x79>
   b:  74 75                  je 82 <main+0x82>
   d:  20 31                  and %dh,(%rcx)
   f:  33 2e                  xor (%rsi),%ebp
  11:  33 2e                  xor (%rsi),%ebp
  13:  30 2d 36 75 62 75      xor %ch,0x75627536(%rip) # 7562754f <main+0x7562754f>
  19:  6e                     outsb %ds:(%rsi),(%dx)
  1a:  74 75                  je 91 <main+0x91>
  1c:  32 7e 32               xor 0x32(%rsi),%bh
  1f:  34 2e                  xor $0x2e,%al
  21:  30 34 29               xor %dh,(%rcx,%rbp,1)
  24:  20 31                  and %dh,(%rcx)
  26:  33 2e                  xor (%rsi),%ebp
  28:  33 2e                  xor (%rsi),%ebp
  2a:  30 00                  xor %al,(%rax)

Disassembly of section .note.gnu.property:

0000000000000000 <.note.gnu.property>:
   0:  04 00                  add $0x0,%al
   2:  00 00                  add %al,(%rax)
   4:  10 00                  adc %al,(%rax)
   6:  00 00                  add %al,(%rax)
   8:  05 00 00 00 47         add $0x47000000,%eax
   d:  4e 55                  rex.WRX push %rbp
   f:  00 02                  add %al,(%rdx)
  11:  00 00                  add %al,(%rax)
  13:  c0 04 00 00            rolb $0x0,(%rax,%rax,1)
  17:  00 03                  add %al,(%rbx)
  19:  00 00                  add %al,(%rax)
  1b:  00 00                  add %al,(%rax)
  1d:  00 00                  add %al,(%rax)
  ...

Disassembly of section .eh_frame:

0000000000000000 <.eh_frame>:
   0:  14 00                  adc $0x0,%al
   2:  00 00                  add %al,(%rax)
   4:  00 00                  add %al,(%rax)
   6:  00 00                  add %al,(%rax)
   8:  01 7a 52               add %edi,0x52(%rdx)
   b:  00 01                  add %al,(%rcx)
   d:  78 10                  js 1f <.eh_frame+0x1f>
   f:  01 1b                  add %ebx,(%rbx)
  11:  0c 07                  or $0x7,%al
  13:  08 90 01 00 00 1c      or %dl,0x1c000001(%rax)
  19:  00 00                  add %al,(%rax)
  1b:  00 1c 00               add %bl,(%rax,%rax,1)
  1e:  00 00                  add %al,(%rax)
  20:  00 00                  add %al,(%rax)
  22:  00 00                  add %al,(%rax)
  24:  1e                     (bad)
  25:  00 00                  add %al,(%rax)
  27:  00 00                  add %al,(%rax)
  29:  45 0e                  rex.RB (bad)
  2b:  10 86 02 43 0d 06      adc %al,0x60d4302(%rsi)
  31:  55                     push %rbp
  32:  0c 07                  or $0x7,%al
  34:  08 00                  or %al,(%rax)
  ...
The values before the colon are the offsets of each instruction from the start of its section (they only turn into real virtual addresses after linking). The hex values after the offset are the individual bytes of the instruction on that line. Because "-D" disassembles everything, the listings for data sections such as .rodata and .comment are just data bytes decoded as if they were instructions, so the mnemonics shown there are meaningless.
Each section name describes the type of data being stored. For example:
The .text section stores actual machine code (compiled functions).
The .rodata section stores read-only data (such as string literals or constant variables).
The .comment section stores compiler metadata.
The .note.gnu.property section stores properties that the linker and loader read.
The .eh_frame section contains stack-unwinding information, which is used for exception handling and backtraces.
Other sections (not present in this example) often include:
.data for storing initialized global and initialized static variables.
.bss for storing uninitialized global and uninitialized static variables.
The heap (dynamically allocated memory) and the stack (function call frames with function arguments, local variables, and return addresses) often appear next to these sections in memory layout diagrams, but they are memory regions created at run time and are not sections of an object file.
A few more notes on object files:
The code and data within an object file are relocatable, as shown by the offsets that start at zero and by the zeroed placeholder bytes in the call instruction. This means their exact memory addresses are not yet fixed; the linker (and later the loader) decides where they end up.
Object files contain metadata and information necessary for the linker to combine them with other object files and libraries, which includes symbol tables (names of functions and global variables) and relocation information.
Since object files are not executable, they require a linker to resolve external references and to combine them with other necessary components (e.g. libraries).
Object files are modular, meaning each one can be compiled or recompiled independently from its own source code file. Then, in the Linker Stage, the separate object files are combined into a single executable file. Keeping a collection of object files ready to be linked is an easy way to speed up builds when only some source files have changed and need to be recompiled.
Step 4: Linker
The Linker Stage is the last major step; it combines all object files and any libraries in order to produce an executable file. A linker script is also required to tell the linker how to properly map the executable to memory addresses, but on a general-purpose PC a default linker script is provided for you. More details on linker scripts are discussed later. Linking is quite a complex topic, so I will attempt to give a general overview without too many details. (Perhaps in a future article, I can go over linking in much more detail.)
Here are the main purposes of the linker:
Symbol Resolution: The linker builds a global symbol table that matches all undefined symbols with their definitions. Symbols are named entities of the program such as functions, global variables, and static variables, and their purpose is to allow different parts of your program (e.g. separate object files and libraries) to refer to each other. The symbol table maps the names of functions and variables to their locations within the object file. For example, if one object file calls a function defined in another object file, the linker uses the symbol for that function to connect the call instruction to the correct memory address where the function's code resides.
Relocation: Before relocation, instructions that reference functions or global variables use placeholders. Relocation patches these instructions with the actual addresses.
Section Merging: The linker combines the memory sections for each object file into one large .text, .data, and .bss section in the final binary.
Output Binary Generation: The output binary on Linux is stored in the ELF file format (Executable and Linkable Format), while macOS uses "Mach-O" and Windows uses the PE (Portable Executable) file format.
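To produce an executable from the object file, GCC does not require any special flags. Simply do:
gcc main.o -o main
Under the hood, GCC runs the "ld" linker (through a small helper program called collect2) to perform linking during the build process. You can invoke ld directly, but for a C program a bare command like the following is not enough:
ld main.o -o main
This fails because puts is left undefined and no _start entry point is found. GCC normally also hands ld the C runtime startup objects (Scrt1.o, crti.o, crtn.o and others), the C library, and the path of the dynamic loader. To see the complete linker command that GCC builds on your system, add the "-v" flag to the gcc command above.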
And so, voila! Now we have an executable we can run!
$ ./main
Greetings!
To inspect the executable ELF file, a good tool is the readelf command. Here is an example that uses readelf with the "-s" flag to display the symbol table, where we can see the details of every variable and function.
readelf -s main
Output Of ELF Symbol Table:
Symbol table '.dynsym' contains 7 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND _[...]@GLIBC_2.34 (2)
2: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterT[...]
3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND puts@GLIBC_2.2.5 (3)
4: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
5: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMC[...]
6: 0000000000000000 0 FUNC WEAK DEFAULT UND [...]@GLIBC_2.2.5 (3)
Symbol table '.symtab' contains 37 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS Scrt1.o
2: 000000000000038c 32 OBJECT LOCAL DEFAULT 4 __abi_tag
3: 0000000000000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
4: 0000000000001090 0 FUNC LOCAL DEFAULT 16 deregister_tm_clones
5: 00000000000010c0 0 FUNC LOCAL DEFAULT 16 register_tm_clones
6: 0000000000001100 0 FUNC LOCAL DEFAULT 16 __do_global_dtors_aux
7: 0000000000004018 1 OBJECT LOCAL DEFAULT 26 completed.0
8: 0000000000003dc0 0 OBJECT LOCAL DEFAULT 22 __do_global_dtor[...]
9: 0000000000001140 0 FUNC LOCAL DEFAULT 16 frame_dummy
10: 0000000000003db8 0 OBJECT LOCAL DEFAULT 21 __frame_dummy_in[...]
11: 0000000000000000 0 FILE LOCAL DEFAULT ABS main.c
12: 0000000000000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
13: 00000000000020f0 0 OBJECT LOCAL DEFAULT 20 __FRAME_END__
14: 0000000000000000 0 FILE LOCAL DEFAULT ABS
15: 0000000000003dc8 0 OBJECT LOCAL DEFAULT 23 _DYNAMIC
16: 0000000000002010 0 NOTYPE LOCAL DEFAULT 19 __GNU_EH_FRAME_HDR
17: 0000000000003fb8 0 OBJECT LOCAL DEFAULT 24 _GLOBAL_OFFSET_TABLE_
18: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_mai[...]
19: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterT[...]
20: 0000000000004000 0 NOTYPE WEAK DEFAULT 25 data_start
21: 0000000000000000 0 FUNC GLOBAL DEFAULT UND puts@GLIBC_2.2.5
22: 0000000000004018 0 NOTYPE GLOBAL DEFAULT 25 _edata
23: 0000000000001168 0 FUNC GLOBAL HIDDEN 17 _fini
24: 0000000000004010 8 OBJECT GLOBAL DEFAULT 25 string_from_header
25: 0000000000004000 0 NOTYPE GLOBAL DEFAULT 25 __data_start
26: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
27: 0000000000004008 0 OBJECT GLOBAL HIDDEN 25 __dso_handle
28: 0000000000002000 4 OBJECT GLOBAL DEFAULT 18 _IO_stdin_used
29: 0000000000004020 0 NOTYPE GLOBAL DEFAULT 26 _end
30: 0000000000001060 38 FUNC GLOBAL DEFAULT 16 _start
31: 0000000000004018 0 NOTYPE GLOBAL DEFAULT 26 __bss_start
32: 0000000000001149 30 FUNC GLOBAL DEFAULT 16 main
33: 0000000000004018 0 OBJECT GLOBAL HIDDEN 25 __TMC_END__
34: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMC[...]
35: 0000000000000000 0 FUNC WEAK DEFAULT UND __cxa_finalize@G[...]
36: 0000000000001000 0 FUNC GLOBAL HIDDEN 12 _init
And here is the output of the ELF file header (using the "-h" flag), showing basic information such as the file's type, architecture, entry point, and header offsets:
readelf -h main
Output Of ELF File Headers:
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Position-Independent Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x1060
Start of program headers: 64 (bytes into file)
Start of section headers: 14024 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 13
Size of section headers: 64 (bytes)
Number of section headers: 31
Section header string table index: 30
Something to keep in mind is that this was a very simple example with just one C source code file and one header file. If we had to compile and link multiple C source code files, we would typically compile the C source code files independently. This would result in multiple separate object files (one object file for each C source code file). Then, we would finally link all of the separate object files into one executable using the linker. GCC does the same thing behind the scenes if you hand it multiple C source code files in one command.
To link multiple object files together to produce an executable, you can use the following GCC command:
gcc main.o file1.o file2.o -o main
And there you go! I believe I pretty much covered the basics. If you are still interested, the next few sections of this article go over a few more specifics, such as linking libraries, using linker scripts, and converting to raw binary format.
Linker Map Files
We can also get the linker to generate a map file, which is a detailed output file that provides a comprehensive overview of how the components of your program are arranged in memory. It lists the memory layout, the symbol locations, discarded sections, and the included object files and libraries. Here is an example of how to generate a map file:
gcc main.o -o main -Wl,-Map=mapfile.map
You can use the "-Wl" flag in a GCC command to directly pass options to the linker, such as the linker's "-Map" flag.
Here is an example of what a map file looks like (ignoring the hundreds of lines of the linker script):
Output of mapfile.map:
Merging program properties


As-needed library included to satisfy reference by file (symbol)

libc.so.6                     main.o (puts@@GLIBC_2.2.5)

Discarded input sections

 .note.GNU-stack
                0x0000000000000000        0x0 /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/Scrt1.o
 .note.gnu.property
                0x0000000000000000       0x20 /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/crti.o
 .note.GNU-stack
                0x0000000000000000        0x0 /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/crti.o
 .note.GNU-stack
                0x0000000000000000        0x0 /usr/lib/gcc/x86_64-linux-gnu/13/crtbeginS.o
 .note.gnu.property
                0x0000000000000000       0x20 /usr/lib/gcc/x86_64-linux-gnu/13/crtbeginS.o
 .note.GNU-stack
                0x0000000000000000        0x0 main.o
 .note.gnu.property
                0x0000000000000000       0x20 main.o
 .note.GNU-stack
                0x0000000000000000        0x0 /usr/lib/gcc/x86_64-linux-gnu/13/crtendS.o
 .note.gnu.property
                0x0000000000000000       0x20 /usr/lib/gcc/x86_64-linux-gnu/13/crtendS.o
 .note.gnu.property
                0x0000000000000000       0x20 /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/crtn.o
 .note.GNU-stack
                0x0000000000000000        0x0 /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/crtn.o

Memory Configuration

Name             Origin             Length             Attributes
*default*        0x0000000000000000 0xffffffffffffffff

Linker script and memory map

LOAD /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/Scrt1.o
LOAD /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/crti.o
... <LINKER SCRIPT LINES>
OUTPUT(testmap elf64-x86-64)
What are Linker Scripts?
Linker scripts give you much more advanced control over the linking process by defining the memory layout and symbol placement. Writing linker scripts tends to be a pretty niche skill, since a linker script is normally provided for you based on the device/architecture you use. Even in embedded systems development, vendor toolchains typically provide a linker script for the given hardware architecture.
A linker has another tool built into it known as a "Locator" which assigns physical memory addresses to the code and data sections within the combined program. In reality, when you feed a linker a linker script, you are actually feeding it to the locator. Essentially the linker prepares the code for execution, and the locator places it into the correct memory locations.
From my experience, information on how to write linker scripts from scratch is very scarce. I will not go into detail here, as writing linker scripts is beyond the scope of this article, but if you are brave enough, aside from the official GNU ld documentation, there is also this incredible guide you can visit. Note that linker scripts are heavily hardware-dependent, meaning that you need to understand the memory segments, addressing, and other hardware requirements of your chosen device very well.
If you do decide to write your own linker script files, they are stored with an ".ld" extension. You can include a linker script with the "-T" flag during the Linker Stage.
gcc main.o -T linker_script.ld -o main
In general, here are some examples when you may write your own linker script:
Embedded System Development: When needing precise control over the memory layout to place code and data in specific memory regions.
OS Kernel Development: When building your own operating system or kernel, you may need to define the memory map, entry points, and section placement to match the hardware architecture and boot process.
Custom Hardware Architectures: When you have a custom architecture, you will need to manually specify to the linker how the memory regions are mapped.
How does the linker work with libraries?
Without going into too much detail about libraries, I will summarize the essentials you need to know to create or work with libraries.
Firstly, it is important to understand the basic definition of a library: a collection of pre-compiled code that you can reuse. The simplest example is the C Standard Library (libc), which has many pre-compiled functions and type definitions for basic tasks such as string manipulation, mathematical computation, input/output, and memory management. Without it, you would not even be able to use printf. There is also no need to link the C Standard Library manually since it is linked by default.
Secondly, it is important to understand the basic differences between static libraries and shared libraries (or dynamically linked libraries).
Static libraries are libraries that are linked at compile/link time. This means the linker copies all necessary code from the library directly into the final executable. Static libraries are often beneficial to use in embedded systems development for two reasons: portability and performance. However, linking static libraries results in a larger executable size since the library code is directly included in the executable. Additionally, each program that uses the static library requires its own copy.
Shared libraries are libraries that are linked to executables at runtime instead of copying the code directly into the final executable. This means the linker only includes the references to the library's functions in the executable. The operating system is responsible for loading the library into memory when the program is executed. Linking shared libraries leads to a smaller executable size due to only references being included. Unlike static libraries, a shared library can be shared by multiple programs resulting in saved memory usage. However, shared libraries can result in a slower startup time due to being linked at runtime.
Next, let's talk about how both types of libraries are created and linked.
Let's say we have two C source code files: "file1.c" and "file2.c", and we compile them to object files independently using GCC's "-c" flag. The result is two object files: "file1.o" and "file2.o".
On Linux, static libraries are stored with a ".a" file extension. On the surface, static libraries are nothing special. A static library is simply an archive of multiple object files. You can create a static library, "libmylibrary.a", with the Linux "ar" archiver utility:
ar rcs libmylibrary.a file1.o file2.o
The result is a static library named "libmylibrary.a".
The "rcs" flags are specifically used for creating static libraries. They instruct the archiver tool to replace or add files to the archive, create the archive if it does not exist, and build a symbol table for the object files for easier linking.
On Linux, shared libraries are stored with a ".so" file extension. For shared libraries, it is necessary to first compile "file1.c" and "file2.c" into object files as position-independent code. This is done with GCC's "-fPIC" flag and is required in order for the shared library to be loaded correctly into memory. A shared library's exact memory location is not known at compile time, so position-independent code allows the library to be loaded at any available address in the process's virtual memory space. This also allows the library's code to be shared by multiple different programs.
Here is how to produce position-independent code:
gcc -fPIC -c file1.c -o file1.o
gcc -fPIC -c file2.c -o file2.o
To actually create the shared library, we have to use GCC's "-shared" flag:
gcc -shared -o libmylibrary.so file1.o file2.o
Finally, to link either the static library or shared library with "main.o" and generate the executable we use the following GCC command:
gcc main.o -L. -lmylibrary -o main
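In this case, both the static and shared libraries were located in the current directory, which is why "-L." was used. If your library lives somewhere else, use the "-L" flag directly followed by its path. Note that if both "libmylibrary.a" and "libmylibrary.so" exist in the same directory, the GNU linker picks the shared library by default; pass "-static" (or name the archive file explicitly) to link the static one. Also keep in mind that "-L" only matters at link time: to actually run an executable that was linked against a shared library in a non-standard directory, the dynamic loader must be able to find the library, for example through the "LD_LIBRARY_PATH" environment variable.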
The lowercase "-l" flag was used to specify the name of the library.
Note: The common naming convention is to give your library file a "lib" prefix, and then drop both the prefix and the file extension when using the "-l" flag (so "libmylibrary.a" becomes "-lmylibrary"). If the library does not follow the "lib" prefix naming convention, you can link it by its exact file name in the following way:
gcc main.c -o main -L/path/to/mylibrary -l:mylibrary.a
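The naming rule is easy to get backwards, so here is a small C sketch of it. This is purely my own illustration: the function name "link_name_from_filename" is hypothetical, and it only handles plain ".a" and ".so" names (versioned names such as "libfoo.so.1" are deliberately rejected). The real lookup is done by the linker, not by code like this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Derives the name to pass to "-l" from a library file name that follows
 * the "lib<name>.a" / "lib<name>.so" convention.
 * Returns 0 on success, or -1 if the file name does not follow the
 * convention or `out` is too small.
 * Assumption: versioned names like "libfoo.so.1" are not handled here. */
int link_name_from_filename(const char *filename, char *out, size_t out_size)
{
    assert(filename != NULL && out != NULL);

    /* The file name must start with the "lib" prefix. */
    if (strncmp(filename, "lib", 3) != 0)
        return -1;
    const char *name = filename + 3;

    /* The file name must end in ".a" or ".so", with something before it. */
    const char *dot = strrchr(name, '.');
    if (dot == NULL || dot == name)
        return -1;
    if (strcmp(dot, ".a") != 0 && strcmp(dot, ".so") != 0)
        return -1;

    /* Copy only the part between the prefix and the extension. */
    size_t name_len = (size_t)(dot - name);
    if (name_len + 1 > out_size)
        return -1;
    memcpy(out, name, name_len);
    out[name_len] = '\0';
    return 0;
}
```

Passing "libmylibrary.a" or "libmylibrary.so" yields "mylibrary", which is exactly what goes after "-l". A name like "mylibrary.a" is rejected, which is the case where the "-l:mylibrary.a" form is needed instead.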
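To verify how the executable was linked on Linux, you can use the "file main" command, which reports whether the executable is statically or dynamically linked, or the "ldd main" command, which lists the shared libraries that a dynamically linked executable depends on. Keep in mind that linking against a static library alone does not make the whole executable statically linked: on a typical Linux system the C standard library is still linked dynamically unless you pass "-static".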
Optional Step: Strip Executable To Binary
On bare-metal or embedded devices such as microcontrollers, you may need to further convert the executable file to a binary or hex executable as these devices cannot directly read an ELF (or EXE) file. Keep in mind the linker produces an ELF file with full symbolic information and section metadata. You can instead use the objcopy utility to strip away that information and just produce a raw ".bin" or ".hex" file that a bare-metal device can execute.
To convert from ELF to binary using objcopy:
objcopy -O binary main.elf main.bin
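If you want to convince yourself that the output really is no longer an ELF file, you can check for the ELF identification bytes at the very start of the file. Here is a minimal C sketch of that check. The function name "elf_class_bits" is hypothetical, and the sketch only looks at the 4-byte magic (0x7f 'E' 'L' 'F') and the class byte that follows it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 32 or 64 for an ELF image of that class, or 0 if the buffer
 * does not start with a recognizable ELF identification. */
int elf_class_bits(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    assert(buf != NULL);

    /* An ELF file always begins with these four magic bytes. */
    if (len < 5 || memcmp(buf, magic, sizeof magic) != 0)
        return 0;

    /* The fifth byte (EI_CLASS) tells 32-bit and 64-bit files apart. */
    switch (buf[4]) {
    case 1:  return 32;   /* ELFCLASS32 */
    case 2:  return 64;   /* ELFCLASS64 */
    default: return 0;
    }
}
```

Feeding it the first few bytes of "main.elf" should report the class of the file, while the first bytes of "main.bin" should normally give 0 because the ELF header has been stripped away.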
If you then want to finally see the pure, raw binary/hex values, you can just use the "hexdump" command. Here is the "hexdump" command showing the first 256 bytes of "main.bin":
hexdump -n 256 main.bin
Raw Binary Output of main.bin:
0000000 6c2f 6269 3436 6c2f 2d64 696c 756e 2d78
0000010 3878 2d36 3436 732e 2e6f 0032 0000 0000
0000020 0004 0000 0020 0000 0005 0000 4e47 0055
0000030 0002 c000 0004 0000 0003 0000 0000 0000
0000040 8002 c000 0004 0000 0001 0000 0000 0000
0000050 0004 0000 0014 0000 0003 0000 4e47 0055
0000060 0aef 25ee 6705 e4f7 d365 56fc eb9c 8f2a
0000070 6ba4 6d9b 0004 0000 0010 0000 0001 0000
0000080 4e47 0055 0000 0000 0003 0000 0002 0000
0000090 0000 0000 0000 0000 0002 0000 0006 0000
00000a0 0001 0000 0006 0000 0000 0081 0000 0000
00000b0 0006 0000 0000 0000 65d1 6dce 0000 0000
00000c0 0000 0000 0000 0000 0000 0000 0000 0000
00000d0 0000 0000 0000 0000 0006 0000 0012 0000
00000e0 0000 0000 0000 0000 0000 0000 0000 0000
00000f0 0048 0000 0020 0000 0000 0000 0000 0000
0000100
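The first number on each line is the offset (in hexadecimal) into the file of the first of the 8 values that follow it. Those 8 values are 16-bit words, so each line shows 16 bytes of raw data. Note that by default hexdump prints each two-byte word in the machine's byte order, so on a little-endian machine such as x86-64 the two bytes of every word appear swapped compared to their order in the file (use "hexdump -C" to see the bytes in file order). Hexdump can be a pretty interesting utility to play around with to examine raw binary data. If you are interested in reading up on the binary data format, I recommend reading this article.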
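To show how one line of that output is put together, here is a small C sketch that formats up to 16 bytes in the same style as the dump above. It is my own illustration, not hexdump's actual implementation: the function name "format_hexdump_line" is hypothetical, and it always builds the 16-bit words little-endian, which is an assumption that matches x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Formats up to 16 bytes as a 7-digit hex offset followed by 16-bit words.
 * Each word is built little-endian (the second byte is the high byte),
 * which is an assumption matching x86-64. A lone trailing byte is
 * zero-padded. Returns the number of characters written (excluding the
 * terminator), or -1 if `out` is too small. */
int format_hexdump_line(unsigned long offset, const unsigned char *data,
                        size_t n, char *out, size_t out_size)
{
    assert(data != NULL && out != NULL);

    if (n > 16)
        n = 16;

    /* Offset of the first byte on this line, in hexadecimal. */
    int written = snprintf(out, out_size, "%07lx", offset);
    if (written < 0 || (size_t)written >= out_size)
        return -1;
    size_t used = (size_t)written;

    /* Combine each pair of bytes into one 16-bit word. */
    for (size_t i = 0; i < n; i += 2) {
        unsigned lo = data[i];
        unsigned hi = (i + 1 < n) ? data[i + 1] : 0u;
        written = snprintf(out + used, out_size - used, " %04x",
                           (hi << 8) | lo);
        if (written < 0 || (size_t)written >= out_size - used)
            return -1;
        used += (size_t)written;
    }
    return (int)used;
}
```

Calling it with offset 0 and the 16 ASCII bytes of "/lib64/ld-linux-" reproduces the first line of the dump above: the file starts with '/' (2f) followed by 'l' (6c), and the two bytes show up as the single word "6c2f".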
A Simple Example To Tie Everything Together
Lastly, if we take everything we learned, we can create a bash script that performs every step for us.
Here is my example of compiling the following project just with terminal commands.
The project is all located in the same directory. It consists of:
- "main.c"
- "file1.c"
- "file2.c"
- "myheader.h"
- "libmystaticlibrary.a" (A static library consisting of "libfile1.o" and "libfile2.o")
- "mylibraryheader.h" (A header for the static library's functions)
The purpose of this program is to simply print out a string from each file. This confirms that all of the object files were linked, the static library's functions were linked, and the header files were included properly.
// main.c
#include <stdio.h>
#include "myheader.h"
#include "mylibraryheader.h"

int main(void) {
    // Printing from other functions
    print_from_file1();
    print_from_file2();

    // Printing from myheader.h
    printf("%s\n", headerstring);

    // Printing from static library functions
    print_from_libfile1();
    print_from_libfile2();

    return 0;
}
// file1.c
#include <stdio.h>

void print_from_file1(void) {
    printf("Printing From file1\n");
}

// file2.c
#include <stdio.h>

void print_from_file2(void) {
    printf("Printing From file2\n");
}
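// myheader.h
#ifndef MYHEADER_H
#define MYHEADER_H

// "static" gives each source file that includes this header its own copy,
// which avoids multiple-definition linker errors.
static const char* headerstring = "This string is located in myheader.h";

// Function prototypes for file1.c and file2.c:
void print_from_file1(void);
void print_from_file2(void);

#endif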
// libfile1.c
#include <stdio.h>

void print_from_libfile1(void) {
    printf("Printing from libfile1\n");
}

// libfile2.c
#include <stdio.h>

void print_from_libfile2(void) {
    printf("Printing from libfile2\n");
}
// mylibraryheader.h
#ifndef MYLIBRARYHEADER_H
#define MYLIBRARYHEADER_H

// Function prototypes for libfile1.c and libfile2.c:
void print_from_libfile1(void);
void print_from_libfile2(void);

#endif
Here is the bash script that demonstrates every step of the build process from scratch and generates an executable file named "main":
#!/usr/bin/env bash
# build.sh
# Stop at the first command that fails
set -e
# Path to myheader.h and mylibraryheader.h ("." is current directory)
headerpath="."
# Path to libmystaticlibrary.a ("." is current directory)
librarypath="."
# Creating a static library from libfile1.c and libfile2.c:
gcc -c libfile1.c -o libfile1.o
gcc -c libfile2.c -o libfile2.o
ar rcs libmystaticlibrary.a libfile1.o libfile2.o
# Preprocessor Stage:
# Preprocess all source code files separately
gcc -I"$headerpath" -E main.c -o main.i
gcc -E file1.c -o file1.i
gcc -E file2.c -o file2.i
# Compiler Stage:
# Compile all preprocessed files separately
gcc -S main.i -o main.s
gcc -S file1.i -o file1.s
gcc -S file2.i -o file2.s
# Assembler Stage:
# Generate object files for each assembly file separately
gcc -c main.s -o main.o
gcc -c file1.s -o file1.o
gcc -c file2.s -o file2.o
# Linker Stage:
# Link object files together into an executable
# and include libmystaticlibrary.a
gcc main.o file1.o file2.o -L"$librarypath" -lmystaticlibrary -o main
Output:
$ ./main
Printing From file1
Printing From file2
This string is located in myheader.h
Printing from libfile1
Printing from libfile2
This output verifies that everything was included/linked properly.
If you would like to download this example yourself, you can get it from my GitHub.
And there you go! If you made it this far, I commend you. I hope this comprehensive guide taught you how to actually build a program using the GCC toolchain from scratch without any IDE or build system abstractions.
If you are like me, you enjoy learning from the ground up and that is what I tried to achieve with this article. Hopefully this clears up any frustrations about spending hours setting up IDE compiler settings or interpreting obscure build errors in your IDE. If all of the IDEs in the world magically disappear one day, may you find comfort in knowing that you do not need no god-forsaken IDE to compile a C program.
Written by

Mikhail Smirnov
I am an Electrical/Computer Engineer early in my career and I decided to start a blog as a passion project. As of right now, the projects and articles I plan on sharing here will be mostly related to engineering. My biggest interests are currently: Embedded Systems, Digital Signal Processing, Mathematical Modeling & Scientific Computing, and Music. I plan to dive into every one of those topics at some point!