[2] Understanding Processes
Table of contents
- What is a Process
- Process Creation: A Little More Detail
- Process States
- Data Structures for Process Management in Operating Systems
- Limited Direct Execution: Balancing Efficiency and Control in CPU Virtualization
- Introducing User and Kernel Modes
- The Trap Table
- Limited Direct Execution Protocol
- Switching Between Processes
- The Two Types of Register Saves/Restores
- Worried About Concurrency?
In this article, we explore what a process is, how processes are created, how their states are managed, how the OS switches between them, and how the CPU is virtualized in modern operating systems.
What is a Process
A process, as defined by the operating system, is simply an abstraction of a running program.
The Process API (Application Programming Interface) refers to the set of functions and mechanisms provided by an operating system that allow users and programs to interact with and manage processes. These interfaces are crucial for creating, controlling, and monitoring processes within the OS.
1. Create: Initializes and starts a new process based on user commands or application requests.
2. Destroy: Forcefully terminates a process, especially if it’s unresponsive or behaving unexpectedly.
3. Wait: Pauses the execution of a program until a specific process has completed.
4. Miscellaneous Control: Provides options to suspend (pause) and resume (continue) a process.
5. Status: Retrieves information about a process, such as its runtime duration or current state.
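On a POSIX system such as Linux, these operations map roughly onto real calls. fork() creates a process. waitpid() waits for it and reports its status. The SIGSTOP and SIGCONT signals suspend and resume it, and SIGKILL destroys it. The sketch below assumes a POSIX/Linux environment. The helper names create_child, wait_for_child, and suspend_resume_destroy are hypothetical and exist only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create: fork() a child that exits right away with exit_code.
   Returns the child's PID to the parent, or -1 if fork() failed. */
pid_t create_child(int exit_code) {
    pid_t pid = fork();
    if (pid == 0)
        _exit(exit_code);            /* child: terminate immediately */
    return pid;
}

/* Wait + Status: block until the child terminates, then report how it ended.
   Returns its exit code, or -1 if it did not exit normally. */
int wait_for_child(pid_t pid) {
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Miscellaneous control + Destroy: suspend, resume, then kill a child that
   would otherwise idle forever. Returns 1 if every step behaved as expected. */
int suspend_resume_destroy(void) {
    pid_t pid = fork();
    if (pid < 0)
        return 0;
    if (pid == 0) {
        for (;;)
            pause();                 /* child: wait for signals forever */
    }

    int status, ok = 1;
    kill(pid, SIGSTOP);                                   /* suspend */
    ok &= waitpid(pid, &status, WUNTRACED) == pid && WIFSTOPPED(status);
    kill(pid, SIGCONT);                                   /* resume */
    kill(pid, SIGKILL);                                   /* destroy */
    ok &= waitpid(pid, &status, 0) == pid                 /* reap the child */
          && WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
    return ok;
}
```

A caller might write `pid_t pid = create_child(7);` and then call `wait_for_child(pid)`, which returns 7 after the child has exited.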
Process Creation: A Little More Detail
This section looks at how the operating system transforms a program into a running process. Specifically, how does the OS get a program up and running? How does process creation actually work?
Loading Code and Static Data:
The OS begins by loading the program’s executable code and any static data (like initialized variables) from disk into the process’s memory space.
In older or simpler OSes, this loading is done eagerly (all at once before execution). In modern OSes, it’s done lazily, loading only the necessary parts of the program as needed during execution.
Memory Allocation for the Stack:
The OS allocates memory for the program’s runtime stack. The stack is used for storing local variables, function parameters, and return addresses.
It also initializes the stack with arguments, such as argc and argv, which are passed to the main() function in C programs.
Memory Allocation for the Heap:
The OS may allocate memory for the program’s heap, which is used for dynamic memory allocation.
The heap starts small but can grow as the program requests more memory using functions like malloc(). The program releases that memory explicitly by calling free().
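The following small sketch shows where these pieces appear in an ordinary C program, using only the standard library. The function receives argc and argv, which the OS placed on the initial stack before main() ran. It keeps a local total on the stack and copies the parsed values into heap memory that it must release itself. The function name sum_numeric_args is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Sum the numeric command-line arguments. argc/argv arrive via the stack the
   OS prepared; the copy in `values` lives on the heap until free(). */
long sum_numeric_args(int argc, char *argv[]) {
    int n = argc - 1;                        /* argv[0] is the program name */
    if (n <= 0)
        return 0;
    long *values = malloc((size_t)n * sizeof *values);   /* heap allocation */
    if (values == NULL)
        return -1;
    long total = 0;                          /* local variable: on the stack */
    for (int i = 0; i < n; i++) {
        values[i] = strtol(argv[i + 1], NULL, 10);
        total += values[i];
    }
    free(values);                            /* heap memory is released explicitly */
    return total;
}
```

In a real program, main() would pass its own argc and argv straight through to a function like this one.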
I/O Initialization:
The OS sets up basic I/O for the process. For example, in UNIX systems, each process starts with three default file descriptors (0, 1, and 2) for standard input, output, and error, which allow the process to interact with the terminal.
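On a POSIX system, these descriptors are available as STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO. The sketch below assumes a POSIX environment. It checks whether a descriptor is open with fcntl(F_GETFD), and it writes to standard output without opening anything first. The helper names fd_is_open and say are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if fd is an open descriptor in this process, 0 otherwise
   (fcntl fails with EBADF for a descriptor that is not open). */
int fd_is_open(int fd) {
    return fcntl(fd, F_GETFD) != -1;
}

/* Writes msg to the inherited standard-output descriptor (fd 1) with no
   prior open() call. Returns the number of bytes written, or -1. */
ssize_t say(const char *msg) {
    return write(STDOUT_FILENO, msg, strlen(msg));
}
```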
Starting Execution:
Once the OS has loaded the necessary code and data, allocated memory, and performed I/O setup, it begins program execution.
The OS transfers control to the process by jumping to the entry point of the program, typically the main() function, allowing the program to start running.
Process States
Processes can exist in three primary states, each reflecting their current status in relation to CPU execution and other system resources:
Running: The process is actively executing instructions on the CPU.
Ready: The process is prepared to run but is not currently executing due to OS scheduling decisions.
Blocked: The process cannot proceed because it is waiting for some event to occur, such as the completion of an I/O operation.
In the process state diagram, a process moves between the ready, running, and blocked states. A ready process is prepared to run but has not yet been scheduled. Once the OS schedules it, the process moves to the running state and executes instructions, and the OS can later deschedule it, moving it back to ready. If the process initiates an I/O operation, it becomes blocked until the operation completes. At that point it returns to the ready state and may be scheduled to run again. For example, as shown in the next figure, if Process0 starts an I/O request and becomes blocked, Process1 can then run. Once Process0’s I/O completes, it is moved back to the ready state. To keep the CPU busy and the system efficient, the OS must make several decisions, such as whether to run Process1 while Process0 is blocked and how to handle Process0 when its I/O completes.
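The three states and the transitions between them can be modeled as a tiny state machine. This is a simplified sketch and does not come from any real kernel. The enum, struct process, and the transition functions are hypothetical, and each transition asserts that it starts from a legal state. The io_example function replays the Process0/Process1 scenario described above.

```c
#include <assert.h>

enum state { READY, RUNNING, BLOCKED };

struct process {
    const char *name;
    enum state state;
};

/* Scheduled: ready -> running */
void schedule(struct process *p)   { assert(p->state == READY);   p->state = RUNNING; }
/* Descheduled (e.g., the OS picks someone else): running -> ready */
void deschedule(struct process *p) { assert(p->state == RUNNING); p->state = READY; }
/* I/O initiated: running -> blocked */
void start_io(struct process *p)   { assert(p->state == RUNNING); p->state = BLOCKED; }
/* I/O done: blocked -> ready (not straight back to running) */
void finish_io(struct process *p)  { assert(p->state == BLOCKED); p->state = READY; }

/* Process0 starts I/O, Process1 runs meanwhile, Process0's I/O completes. */
void io_example(struct process *p0, struct process *p1) {
    schedule(p0);     /* p0 running, p1 ready */
    start_io(p0);     /* p0 blocked on I/O */
    schedule(p1);     /* p1 uses the CPU while p0 waits */
    finish_io(p0);    /* p0 ready again; p1 keeps running */
}
```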
Data Structures for Process Management in Operating Systems
The operating system (OS) uses several key data structures to manage processes and to track their states and resources. Chief among them is a process list, which records every process in the system: those that are ready to run, the one currently running, and those that are blocked. When an I/O event completes, the OS uses this list to wake the correct blocked process and move it back to the ready state. The OS also keeps information about each individual process. In the xv6 kernel, for example, this includes the register context, which holds the contents of a process’s registers while it is stopped so they can be restored when it resumes. This technique is the basis of context switching. Processes can also be in states beyond running, ready, and blocked. For instance, a process may be in an initial state while it is being created. It may also be in a final state called the zombie state, where it has finished executing but has not yet been cleaned up. The zombie state allows the parent process to check the return code of the finished process. The parent eventually calls a function like wait() to collect that code, and the OS then removes the remaining references to the terminated process.
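The following is a simplified sketch of such a process list, loosely modeled on xv6's struct proc. The register fields, NPROC, and the proc_exit/proc_wait functions are simplifications for illustration and are not xv6's actual code. The sketch shows how a zombie keeps its exit status until the parent's wait() collects it and frees the slot.

```c
#include <assert.h>
#include <stddef.h>

/* EMBRYO = being created, SLEEPING = blocked, RUNNABLE = ready. */
enum procstate { UNUSED, EMBRYO, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

struct context {             /* registers saved while the process is stopped */
    unsigned int eip, esp, ebx, ecx, edx, esi, edi, ebp;
};

struct proc {
    int pid;
    enum procstate state;
    struct proc *parent;
    int xstate;              /* exit status kept for the parent */
    struct context context;
};

#define NPROC 8
static struct proc ptable[NPROC];   /* the OS's process list */

/* exit(): the process is finished but stays in the table as a zombie. */
void proc_exit(struct proc *p, int status) {
    p->xstate = status;
    p->state = ZOMBIE;
}

/* wait(): find a zombie child, hand back its status, free its slot.
   Returns the child's pid, or -1 if none is ready (a real kernel would
   put the parent to sleep while it still has live children). */
int proc_wait(struct proc *parent, int *status) {
    for (int i = 0; i < NPROC; i++) {
        struct proc *p = &ptable[i];
        if (p->parent == parent && p->state == ZOMBIE) {
            int pid = p->pid;
            *status = p->xstate;
            p->state = UNUSED;   /* cleanup: the slot can be reused */
            p->parent = NULL;
            return pid;
        }
    }
    return -1;
}
```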
Limited Direct Execution: Balancing Efficiency and Control in CPU Virtualization
Direct execution is a method where the operating system allows a user program to run directly on the CPU. This approach has a significant advantage: it is fast. Since the program runs natively on the hardware, it can execute instructions as quickly as possible.
When the OS starts a program, it creates a process, allocates memory, loads the program code, and begins execution at the program’s entry point, such as the main() function.
Sounds simple, no? But this approach gives rise to a few problems in our quest to virtualize the CPU. The first is simple: if we just run a program, how can the OS make sure the program doesn’t do anything that we don’t want it to do, while still running it efficiently? The second: when we are running a process, how does the operating system stop it from running and switch to another process, thus implementing the time sharing we require to virtualize the CPU?
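The unrestricted version of this protocol can be sketched with an ordinary function pointer that stands in for the program's entry point. The sketch assumes a toy setting where the "program" is a C function in the same binary. The names user_main and os_run_directly are hypothetical. The OS side does its setup, hands over the CPU, and gets it back only when the program returns.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef int (*entry_point)(int argc, char *argv[]);

/* A stand-in "user program": succeeds if its single argument is "hi". */
static int user_main(int argc, char *argv[]) {
    return (argc == 2 && strcmp(argv[1], "hi") == 0) ? 0 : 1;
}

/* Direct execution with no limits: set up memory, jump to the entry point,
   and clean up when (and only if) the program returns. */
int os_run_directly(entry_point entry, int argc, char *argv[]) {
    /* Set up the argument vector; C requires argv[argc] == NULL. */
    char **args = malloc(((size_t)argc + 1) * sizeof *args);
    if (args == NULL)
        return -1;
    memcpy(args, argv, (size_t)argc * sizeof *args);
    args[argc] = NULL;

    int status = entry(argc, args);   /* the program now owns the CPU */

    free(args);                        /* release the process's memory */
    return status;
}
```

Nothing in os_run_directly can stop user_main from doing something harmful or from looping forever. Those are exactly the two problems raised above.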
Introducing User and Kernel Modes
To address these challenges, operating systems use two distinct processor modes:
User Mode: In this mode, user programs run with restricted privileges. They cannot perform certain operations, such as accessing hardware directly or performing I/O operations. If a program tries to do something it’s not allowed to in user mode, the CPU raises an exception (an error), and the operating system can decide how to handle it, usually by terminating the offending program.
Kernel Mode: This mode is used by the operating system itself. Code running in kernel mode has full access to the system’s hardware and can execute any instruction. The operating system switches the CPU to kernel mode when it needs to perform critical tasks that require higher privileges, such as managing memory or handling I/O operations.
While user mode restricts access to certain operations, user programs still need to perform actions like reading from a file or allocating memory. To safely allow these operations, the operating system provides system calls.
System Call Mechanism: When a program needs to perform a restricted operation, it makes a system call. A system call is a special request to the operating system to perform a specific operation on behalf of the program.
Trap Instruction: The program uses a special instruction, often called a “trap,” to make a system call. This instruction causes the CPU to switch from user mode to kernel mode, where the operating system can safely execute the requested operation.
Return from Trap: After the operation is completed, the CPU switches back to user mode and returns control to the user program.
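On Linux, the same request can be issued in two ways. One is the C library's getpid() wrapper. The other is the generic syscall() function with the raw system-call number from <sys/syscall.h>. Both end up executing the architecture's trap instruction (for example, syscall on x86-64). This sketch is specific to Linux and glibc, and the helper names are hypothetical.

```c
#define _GNU_SOURCE            /* glibc: declares syscall() */
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Library wrapper: getpid() performs the trap on our behalf. */
long pid_via_wrapper(void) {
    return (long)getpid();
}

/* The same request through syscall(), using the raw Linux number. */
long pid_via_raw_syscall(void) {
    return syscall(SYS_getpid);
}

/* write(2) to standard output: I/O the process cannot do on its own. */
long write_raw(const char *msg) {
    return syscall(SYS_write, STDOUT_FILENO, msg, strlen(msg));
}
```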
The Trap Table
When a system call or other exceptional event (like a hardware interrupt) occurs, the operating system needs to know which code to execute in response. This is managed through a trap table:
Trap Table Setup: During the system’s boot process, the operating system sets up a trap table. This table is a list of memory addresses that tell the CPU where to find the appropriate code (trap handlers) for different types of events, such as system calls, hardware interrupts, or exceptions.
Trap Handling: When a system call is made, the CPU uses the trap table to find and execute the correct piece of code in kernel mode. For example, if a user program requests to read a file, the trap table directs the CPU to the operating system’s system-call handler. That handler uses the system-call number supplied by the program to run the file-read code.
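The trap table can be thought of as an array of handler addresses indexed by trap number, with system calls dispatched a second time by their own number. The sketch below simulates this in plain C. The trap numbers, system-call numbers, and handlers are all hypothetical. On real hardware, the OS installs the table with a privileged instruction rather than a simple assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical trap numbers and system-call numbers for this sketch. */
enum trap_no { TRAP_SYSCALL, TRAP_TIMER, TRAP_DIVIDE_ERROR, NTRAPS };
enum sysnum  { SYSNUM_GETPID, SYSNUM_READ, NSYSCALLS };

typedef long (*trap_handler)(long arg);

static trap_handler trap_table[NTRAPS];   /* installed once, at "boot" */

static long sys_getpid(void) { return 42; }  /* pretend the caller's PID is 42 */
static long sys_read(void)   { return 0; }   /* pretend we read 0 bytes (EOF) */
static long (*const syscall_table[NSYSCALLS])(void) = {
    [SYSNUM_GETPID] = sys_getpid,
    [SYSNUM_READ]   = sys_read,
};

/* One entry point for all system calls; the number selects the routine. */
static long handle_syscall(long num) {
    if (num < 0 || num >= NSYSCALLS)
        return -1;                         /* never trust user-supplied numbers */
    return syscall_table[num]();
}
static long handle_timer(long unused)        { (void)unused; return 0; }
static long handle_divide_error(long unused) { (void)unused; return -1; }

/* Boot: tell the "hardware" where each handler lives. */
void boot_install_trap_table(void) {
    trap_table[TRAP_SYSCALL]      = handle_syscall;
    trap_table[TRAP_TIMER]        = handle_timer;
    trap_table[TRAP_DIVIDE_ERROR] = handle_divide_error;
}

/* What the "hardware" does on a trap: look up the handler and jump to it. */
long trap(enum trap_no no, long arg) {
    assert(no < NTRAPS && trap_table[no] != NULL);
    return trap_table[no](arg);
}
```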
Limited Direct Execution Protocol
The concept of Limited Direct Execution combines the speed of direct execution with the control of user and kernel modes. Here’s how it works:
1. During boot, the operating system initializes the trap table and prepares the system to handle system calls and other events.
2. When a user program runs, the operating system sets up the necessary resources and starts the program in user mode.
3. When the program needs to perform a restricted operation, it makes a system call, causing the CPU to switch to kernel mode.
4. The operating system handles the system call, performs the requested operation, and then returns control to the user program, switching the CPU back to user mode.
This protocol ensures that user programs can run efficiently while the operating system maintains control over critical operations, providing a balance between performance and security.
Without these mechanisms, the operating system would lose control over the system, leading to potential security breaches, system instability, and inefficiency. By implementing user and kernel modes, system calls, and trap tables, the operating system can safely and efficiently manage the execution of user programs while protecting the system from malicious or faulty operations.
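The protocol depends on a mode bit that the hardware checks before it allows a privileged operation. Below is a simulation of that idea, not real hardware behavior. The mode variable, the pretend disk, and every function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum cpu_mode { USER_MODE, KERNEL_MODE };

static enum cpu_mode mode = KERNEL_MODE;  /* the CPU starts in kernel mode at boot */
static long disk_block;                   /* stand-in for a device only the OS may touch */

/* A privileged operation: the "hardware" refuses it in user mode
   (a real CPU would raise an exception here). */
static bool privileged_read_disk(long *out) {
    if (mode != KERNEL_MODE)
        return false;
    *out = disk_block;
    return true;
}

/* Boot (kernel mode): initialize devices and, in a real OS, the trap table. */
void boot(void) {
    mode = KERNEL_MODE;
    disk_block = 7;
}

/* Start the user program: drop to user mode before running its code. */
void start_user_program(void) { mode = USER_MODE; }

/* System call: trap into kernel mode, do the work, return-from-trap. */
long sys_read_disk(void) {
    long value = -1;
    mode = KERNEL_MODE;               /* trap */
    privileged_read_disk(&value);     /* OS performs the restricted operation */
    mode = USER_MODE;                 /* return-from-trap */
    return value;
}

/* User code trying to bypass the OS; returns true only if it succeeded. */
bool user_reads_disk_directly(void) {
    long value;
    return privileged_read_disk(&value);
}

enum cpu_mode current_mode(void) { return mode; }
```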
Switching Between Processes
The next problem with direct execution is switching between processes. At first this seems straightforward: the operating system should simply stop one process and start another. However, the task is more complex than it appears. When a process is running on the CPU, the OS itself is not running, and if the OS isn’t running, it cannot intervene to stop the current process or start a new one. This creates a paradoxical challenge. The OS must find a way to regain control of the CPU in order to switch processes, even though it is not running. Solving this problem is central to effective process management within the OS.
The Cooperative Approach: Waiting for System Calls
Historically, some systems have relied on a cooperative approach to process management. In this method, the OS trusts processes to behave reasonably and yield control periodically. For example, early versions of the Macintosh operating system and the old Xerox Alto system used this approach. Processes were expected to give up the CPU frequently by making system calls, which are requests to the OS to perform tasks like opening files, sending messages, or creating new processes. Such systems often also provided an explicit yield system call, which does nothing except hand control to the OS. Each system call transferred control back to the OS, allowing it to decide whether to continue running the same process or switch to another.
Applications also transfer control to the OS when they do something illegal. For example, if an application divides by zero or tries to access memory that it shouldn’t be able to access, it will generate a trap to the OS. The OS will then have control of the CPU again (and will likely terminate the offending process).
While this cooperative scheduling system allows the OS to regain control of the CPU, it has a significant flaw: what happens if a process never makes a system call or encounters an error? If a process enters an infinite loop, the OS has no way to regain control, leaving the system stuck unless rebooted. This limitation highlights the need for a more robust solution to ensure that the OS can manage the CPU even when processes are uncooperative.
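Below is a minimal simulation of cooperative scheduling. It assumes each "process" is a C function that does one slice of work and then returns, and returning plays the role of yielding. The names task_fn, count_to_three, and run_cooperative are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TASKS 8

/* A cooperative task does one slice of work and returns true while it still
   has work left. Returning is its way of yielding the CPU. */
typedef bool (*task_fn)(int *progress);

bool count_to_three(int *progress) { return ++*progress < 3; }

/* Round-robin "OS" loop. It regains control only when a task returns;
   a task that looped forever inside its slice would hang everything. */
int run_cooperative(task_fn tasks[], int n) {
    int progress[MAX_TASKS] = {0};
    bool done[MAX_TASKS] = {false};
    int remaining = n, slices = 0;
    assert(n >= 0 && n <= MAX_TASKS);
    while (remaining > 0) {
        for (int i = 0; i < n; i++) {
            if (done[i])
                continue;
            slices++;
            if (!tasks[i](&progress[i])) {   /* task reports it is finished */
                done[i] = true;
                remaining--;
            }
        }
    }
    return slices;
}
```

If a task never returned, the loop would never get control back. An infinite loop causes the same failure on a cooperative system.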
The Non-Cooperative Approach: The OS Takes Control
To address the shortcomings of the cooperative approach, modern operating systems utilize a non-cooperative method where the OS takes control of the CPU without relying on processes to yield it. The key to this approach is the use of a timer interrupt. A timer device can be programmed to raise an interrupt at regular intervals (e.g., every few milliseconds). When the interrupt occurs, the currently running process is paused, and the OS regains control of the CPU through a pre-configured interrupt handler. This mechanism allows the OS to intervene, stop the current process, and potentially start a new one.
As we discussed before with system calls, the OS must inform the hardware of which code to run when the timer interrupt occurs. During the system’s boot sequence, the OS configures the hardware by setting up the timer and instructing it on which code to execute when the timer interrupt occurs. This is a privileged operation, meaning only the OS can perform it. Once the timer is running, the OS can be assured that it will periodically regain control of the CPU, even if a process is uncooperative.
The timer can also be turned off (also a privileged operation), something we will discuss later when we understand concurrency in more detail.
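A user program cannot program the hardware timer, but POSIX interval timers offer a close analogy. This sketch assumes a POSIX system. setitimer() asks the kernel to deliver SIGALRM after 10 ms. The signal handler plays the role of the OS's interrupt handler and pulls control away from a loop that never yields. The 10 ms value and the function names are arbitrary choices for illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static sigjmp_buf back_to_os;               /* where the "OS" resumes */
static volatile sig_atomic_t interrupts;    /* timer interrupts delivered */

/* The "interrupt handler": count the tick and take control away from the loop. */
static void on_timer(int sig) {
    (void)sig;
    interrupts++;
    siglongjmp(back_to_os, 1);
}

/* Runs a loop that never yields and never makes a system call, and shows that
   a one-shot timer still pulls control back. Returns the number of ticks seen. */
int preempt_runaway_loop(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_timer;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);          /* "boot": install the handler */

    interrupts = 0;
    if (sigsetjmp(back_to_os, 1) == 0) {    /* 1 = also restore the signal mask */
        struct itimerval once = { .it_value = { .tv_sec = 0, .tv_usec = 10000 } };
        setitimer(ITIMER_REAL, &once, NULL);  /* start a one-shot 10 ms timer */
        volatile unsigned long spins = 0;
        for (;;)
            spins++;                        /* the uncooperative "process" */
    }

    /* Control is back in "OS" code: disarm the timer, restore the default. */
    struct itimerval off;
    memset(&off, 0, sizeof off);
    setitimer(ITIMER_REAL, &off, NULL);
    sa.sa_handler = SIG_DFL;
    sigaction(SIGALRM, &sa, NULL);
    return interrupts;
}
```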
Context Switching: Saving and Restoring Process States
Once the OS has regained control—whether through a system call, an illegal operation, or a timer interrupt—it must decide whether to continue running the current process or switch to a different one. This decision is made by a component of the OS known as the scheduler.
If the OS decides to switch processes, it performs a context switch. This involves saving the state (context) of the currently running process and restoring the state of the next process to be executed. The context includes the contents of the CPU registers, the program counter (PC) (which indicates the next instruction to execute), and the kernel stack pointer.
Here’s how a context switch works:
Saving the Current Process’s Context: The OS saves the general-purpose registers, the program counter, and the kernel stack pointer of the currently running process. These are typically stored in the process’s structure in memory.
Restoring the Next Process’s Context: The OS restores the saved registers, program counter, and stack pointer of the next process to be executed. When the OS finishes and returns from the trap, execution therefore resumes in the new process as if it had never stopped running.
Switching Stacks: The OS switches from the old process’s kernel stack to the new process’s kernel stack. This step is crucial because, while handling a trap, the OS runs on the kernel stack of the process that trapped. By switching stacks, the OS enters the switch code in the context of the interrupted process and leaves it in the context of the process about to run.
Returning to User Mode: Finally, the OS executes a return-from-trap instruction, which transitions the CPU back to user mode and resumes the execution of the new process.
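The save-registers, switch-stacks, restore-registers sequence can be imitated in user space with getcontext(), makecontext(), and swapcontext(). glibc still provides these functions even though POSIX.1-2008 removed them. swapcontext() saves the current registers, including the program counter and stack pointer, into one ucontext_t and then loads another. This is only an analogy for what the kernel does, and it assumes a glibc-style system. The two "processes" and the round-robin loop are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <ucontext.h>

#define NPROC 2
#define STEPS 3
#define STACK_SIZE (64 * 1024)

static ucontext_t scheduler_ctx;            /* saved context of the "OS" */
static ucontext_t proc_ctx[NPROC];          /* saved context of each process */
static char proc_stack[NPROC][STACK_SIZE];  /* each process has its own stack */
static char trace[NPROC * STEPS + 1];       /* order in which work ran */
static int trace_len;

/* A toy process: record one unit of work, then give the CPU back. */
static void process_body(int id) {
    for (int step = 0; step < STEPS; step++) {
        trace[trace_len++] = (char)('A' + id);
        /* Save this process's registers, restore the scheduler's. */
        swapcontext(&proc_ctx[id], &scheduler_ctx);
    }
    /* Returning resumes uc_link, i.e. the scheduler. */
}

/* Round-robin scheduler: context-switch into each process in turn. */
const char *run_round_robin(void) {
    trace_len = 0;
    for (int id = 0; id < NPROC; id++) {
        getcontext(&proc_ctx[id]);
        proc_ctx[id].uc_stack.ss_sp = proc_stack[id];
        proc_ctx[id].uc_stack.ss_size = sizeof proc_stack[id];
        proc_ctx[id].uc_link = &scheduler_ctx;
        makecontext(&proc_ctx[id], (void (*)(void))process_body, 1, id);
    }
    /* STEPS + 1 rounds: the last round lets each process run to completion. */
    for (int round = 0; round <= STEPS; round++)
        for (int id = 0; id < NPROC; id++)
            swapcontext(&scheduler_ctx, &proc_ctx[id]);
    trace[trace_len] = '\0';
    return trace;
}
```

Calling run_round_robin() produces the trace "ABABAB". Each switch saves one context and restores the other, and each toy process keeps its own stack across switches.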
The Two Types of Register Saves/Restores
During a context switch, there are two distinct types of register saves and restores:
Timer Interrupt Save/Restore: When a timer interrupt occurs, the hardware automatically saves the user registers of the running process onto its kernel stack. This allows the OS to safely regain control without losing the state of the interrupted process.
OS-Initiated Context Switch Save/Restore: When the OS decides to switch from one process to another, it explicitly saves the kernel registers of the current process into its process structure in memory. Then, it restores the registers of the new process from its process structure. This action effectively shifts the system’s context from the old process to the new one.
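The difference between the two saves is easier to see with a toy CPU. This simulation is not real kernel code. struct regs is a hypothetical three-register set. trapframe stands for the user registers that the hardware pushes onto the kernel stack, and context stands for the kernel registers that the OS's switch routine stores in the process structure. All addresses are made up.

```c
#include <assert.h>

/* A hypothetical, tiny register set; real CPUs have many more registers. */
struct regs { unsigned long pc, sp, a0; };

struct proc {
    struct regs trapframe;  /* user registers, saved by hardware on the kernel stack */
    struct regs context;    /* kernel registers, saved by the OS in the proc structure */
};

static struct regs cpu;     /* the simulated CPU's live registers */

/* Save #1 (hardware, implicit): on a timer interrupt the running process's
   user registers are saved, and the CPU enters the kernel's handler. */
void timer_interrupt(struct proc *cur, struct regs handler) {
    cur->trapframe = cpu;
    cpu = handler;
}

/* Save #2 (software, explicit): the OS's switch routine stores the old
   process's kernel registers and loads the new process's. */
void os_switch(struct proc *old, struct proc *new_) {
    old->context = cpu;
    cpu = new_->context;
}

/* Return-from-trap: restore the user registers of the process now running. */
void return_from_trap(struct proc *p) {
    cpu = p->trapframe;
}

/* Walk through one A -> B switch. Returns the user PC that B resumes at. */
unsigned long demo_switch(void) {
    struct proc a = {0}, b = {0};
    /* B was preempted earlier, so both of its saved register sets exist. */
    b.trapframe = (struct regs){ .pc = 0x2000, .sp = 0x7000, .a0 = 2 };
    b.context   = (struct regs){ .pc = 0x9000, .sp = 0x8000 };
    cpu = (struct regs){ .pc = 0x1000, .sp = 0x6000, .a0 = 1 };  /* A in user mode */

    timer_interrupt(&a, (struct regs){ .pc = 0x9100, .sp = 0x8800 });
    os_switch(&a, &b);      /* the scheduler picked B */
    return_from_trap(&b);   /* now running B in user mode */

    /* A's user state and kernel state are both preserved for later. */
    assert(a.trapframe.pc == 0x1000 && a.context.pc == 0x9100);
    return cpu.pc;
}
```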
Worried About Concurrency?
As you delve deeper into how operating systems manage processes and interrupts, you might wonder: What happens if a timer interrupt occurs during a system call? Or, what if an interrupt occurs while the kernel is already handling another interrupt? These are insightful questions, and they touch on a critical aspect of operating system design: concurrency.
Indeed, the operating system must be carefully designed to handle such scenarios. Imagine that while processing an interrupt, another one arrives. If not managed properly, this could lead to a chaotic situation where multiple tasks try to access shared resources simultaneously, potentially causing data corruption or system crashes.
To give you a glimpse of how these challenges are addressed, let’s explore some basic strategies that operating systems employ:
Disabling Interrupts During Critical Sections: One straightforward method to handle multiple interrupts is to disable further interrupts while one is being processed. This ensures that the CPU is not overwhelmed by multiple interrupts at once. However, this approach must be used judiciously. If interrupts are disabled for too long, it could result in lost interrupts, meaning that some events requiring immediate attention are missed. This can lead to system instability or performance issues.
Sophisticated Locking Mechanisms: Modern operating systems implement advanced locking schemes to protect internal data structures during concurrent access. This is particularly important in systems with multiple processors, where different parts of the kernel may be executing simultaneously on different CPUs. These locking mechanisms help ensure that one part of the system doesn’t inadvertently overwrite or corrupt the data another part is using.
While these solutions are effective, they also introduce complexity. Managing concurrency within the kernel is not just about preventing conflicts but also about doing so efficiently. Poorly implemented locking can lead to performance bottlenecks or subtle, hard-to-diagnose bugs.
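Kernel locks are too involved for a short example, but a user-space mutex shows the same idea. This sketch assumes POSIX threads. Four threads each increment a shared counter 100,000 times, and the mutex turns each read-modify-write into a critical section so that no update is lost. The thread and iteration counts are arbitrary.

```c
#include <assert.h>
#include <pthread.h>

#define NTHREADS 4
#define INCREMENTS 100000

static long shared_counter;                              /* shared data */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);     /* enter the critical section */
        shared_counter++;                       /* read-modify-write, not atomic */
        pthread_mutex_unlock(&counter_lock);   /* leave the critical section */
    }
    return NULL;
}

/* Runs all workers to completion and returns the final counter value. */
long run_locked_increments(void) {
    pthread_t threads[NTHREADS];
    shared_counter = 0;
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);
    return shared_counter;
}
```

With the lock in place, run_locked_increments() returns 400000. Without the lock and unlock calls, the total can come up short, because two threads may read the same old value before either one writes back.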
The topic of concurrency is vast and complex, and it’s a significant focus of study in operating systems. In the next part of this book, we will dive deeper into concurrency, exploring how operating systems handle these tricky situations in detail.
Written by
Mostafa Youssef
Software Engineer with a Bachelor’s in Computer Science. Competitive programmer with top 10% on leetcode.