In the Linux operating system, process creation is one of the fundamental aspects of how programs execute and interact with the system. Processes are instances of running programs, and understanding how they are created, managed, and terminated is crucial for developers and system administrators. In this blog, we will delve into the internals of process creation in Linux using fork() and exec(), and explore concepts like zombie processes and how they are managed using the wait() system call.

Process Creation: using fork()

On Linux, a new process is created with the fork() system call. When a process calls fork(), the kernel creates a child process that is a nearly identical copy of the caller (the parent). The child receives its own Process ID (PID) but inherits copies of most of the parent's state, including its open file descriptors, its environment variables, and the contents of its memory.

What Happens During a fork() Call?
- Memory Space Duplication: Conceptually, the child gets its own copy of the parent's memory, but Linux avoids copying it all up front by using a technique called Copy-On-Write (COW). Parent and child initially share the same physical pages, marked read-only; when either process writes to a shared page, the kernel makes a private copy of that page for the writing process.
- Separate Execution: Both the parent and the child continue executing from the point right after the fork() call. The return value of fork() tells them apart:
  - In the parent process, fork() returns the PID of the child. The parent needs this PID to manage the child, for example to wait for it to finish with waitpid() or to send it a signal if it is taking too long.
  - In the child process, fork() returns 0. The child does not need a PID from fork(): it can call getpid() for its own PID and getppid() for its parent's.
  - If fork() fails (for example, because a process limit has been reached), it returns -1 in the parent, no child is created, and errno is set.

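```
#include <stdio.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();  // on success, fork() returns in both parent and child

    if (pid == 0) {
        printf("I am the child process\n");
    } else if (pid > 0) {
        printf("I am the parent process\n");
    } else {
        perror("fork failed");  // fork() returned -1: no child was created
        return 1;
    }
    return 0;
}
```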

Replacing Process Memory: The exec() Family

After a process is created using fork(), the child often calls one of the functions in the exec() family to replace its memory space with a new program. Unlike fork(), which creates a new process, exec() loads a new program into the current process and starts its execution from the entry point.

For example, the execl() function might be used to replace the child process with another program:
```
 execl("/bin/ls", "ls", "-l", (char *) NULL);
```
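This call replaces the child's memory image with the /bin/ls program and runs it with the arguments "ls" and "-l". The second argument becomes argv[0] (by convention, the program's name), and the argument list must end with a null pointer cast to char *. No new process is created: the child keeps its PID, but it is now running ls. On success, execl() never returns; if it does return, the call failed, and it returns -1 with errno set.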
Zombie Processes

A zombie process is a process that has terminated (for example, by calling exit()) but whose exit status has not yet been read by its parent. When a child process terminates, it stays in the "zombie" state until the parent collects its termination status using the wait() or waitpid() system call.

When a process terminates, Linux must keep some information about it in the process table so that the parent can retrieve the termination status. Once the parent retrieves this information using wait(), the kernel removes the process entry from the process table.

A zombie is essentially a dead process: its memory and open files have already been released, and it uses no CPU time. What it still holds is its PID and its entry in the process table, and both are limited resources. If a parent keeps creating children without reaping them, zombies can accumulate until new fork() calls start to fail.
The wait() System Call

The wait() system call is used by a parent process to wait for one of its children to finish and to retrieve its exit status. When the parent calls wait(), it blocks until a child terminates (or returns immediately if a child has already terminated), then reaps that child: the exit status is handed to the parent and the zombie is removed from the process table.
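```
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    pid_t pid = fork();

    if (pid == 0) {
        // Child process
        printf("Child process executing\n");
        exit(0);  // Child exits; it remains a zombie until the parent reaps it
    } else if (pid > 0) {
        // Parent process
        printf("Parent waiting for child to terminate\n");
        wait(NULL);  // Blocks until the child exits, then reaps it
        printf("Child has been reaped\n");
    } else {
        perror("fork failed");
        return 1;
    }
    return 0;
}
```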
In the code above, the child exits and briefly becomes a zombie; the parent's wait() call then reaps it, so the zombie does not linger.
    
Once the parent process calls wait(), the zombie process is removed from the process table, freeing up its entry. There are two main ways zombie processes get cleaned up:
1. Using wait() or waitpid(): A parent process should always call wait() (or a related function such as waitpid()) for each child it creates. This ensures that terminated children are reaped promptly instead of lingering as zombies.
2. Reparenting and Orphan Processes: If a parent process terminates before calling wait(), its children become orphans and are reparented to the init process (PID 1, usually systemd on modern Linux distributions) or, if one has been designated, to an ancestor marked as a "child subreaper". The new parent calls wait() to reap them. Note that this only helps after the original parent has exited: a long-running parent that never calls wait() keeps its zombies around.

Understanding these internals is crucial for developing reliable and scalable applications on Linux. Using these system calls correctly, and in particular reaping every child process you create, helps you avoid the common pitfalls of process creation and termination, such as zombie processes that pile up in the process table.

Understanding Linux Process Creation Internals: fork(), exec(), Zombie Processes, and the Role of wait()

Aman Kumar
