COA: Computer Architecture and Organization

Table of contents
- 1. Primary Memory (Main Memory)
- 2. Secondary Memory (Storage)
- 3. Cache Memory
- 4. Registers (Inside CPU)
- 5. Virtual Memory
- 6. Buffer & Cache (I/O Memory)
- 1. RISC (Reduced Instruction Set Computer)
- 2. CISC (Complex Instruction Set Computer)
- Types of Addressing Modes
- Comparison Table
- Why Are Addressing Modes Important?
- Components of the CPU
- Working of CPU (Fetch-Decode-Execute Cycle)
- 1. Microprocessor (MPU)
- 2. Microcontroller (MCU)
- Comparison Table: Microprocessor vs. Microcontroller
- Types of Parallelism
- Measuring Degree of Parallelism
- Conclusion
- 3. Types of Pipeline Hazards
- 4. Types of Pipelines
- 5. Advantages of Pipelining
- 6. Disadvantages of Pipelining
- Conclusion
- 1. Mapping in Cache Memory
- 2. Mapping in Virtual Memory (Page Table Mapping)
- 3. Mapping in Memory Addressing
- Cache Memory in Detail
- Cache Miss and Its Types
1. Primary Memory (Main Memory)
Primary memory is directly accessible by the CPU and is volatile (it loses its data when power is off).
(a) RAM (Random Access Memory)
Volatile: Data is lost when power is off.
Fastest among main memory types.
Types of RAM:
SRAM (Static RAM) – Faster, used in cache memory, expensive.
DRAM (Dynamic RAM) – Slower than SRAM, used as main memory.
(b) ROM (Read-Only Memory)
Non-volatile: Data remains even after power loss.
Stores firmware, bootloader, BIOS, etc.
Types of ROM:
PROM (Programmable ROM) – Can be programmed once.
EPROM (Erasable PROM) – Can be erased using UV light.
EEPROM (Electrically Erasable PROM) – Can be erased electronically.
Flash Memory – A faster form of EEPROM, used in USB drives and SSDs.
2. Secondary Memory (Storage)
Used for long-term data storage, non-volatile.
HDD (Hard Disk Drive)
SSD (Solid State Drive)
USB Flash Drive
Memory Cards (SD cards, microSD, etc.)
Optical Discs (CD, DVD, Blu-ray)
3. Cache Memory
Small, ultra-fast memory inside the CPU.
Stores frequently used instructions for quick access.
Levels of Cache:
L1 Cache (Primary Cache) – Fastest but smallest, located in the CPU core.
L2 Cache (Secondary Cache) – Slightly larger but slower than L1.
L3 Cache (Shared Cache) – Shared across CPU cores, larger but slower than L2.
4. Registers (Inside CPU)
Fastest memory in a system.
Used to store temporary data, instructions, and addresses.
Types of Registers:
Accumulator (Stores intermediate results)
Program Counter (Holds next instruction address)
Instruction Register (Stores the current instruction)
Stack Pointer (Points to stack memory)
5. Virtual Memory
A portion of secondary storage used as an extension of RAM.
Used when RAM is full (swap memory in OS).
Slower than physical RAM.
6. Buffer & Cache (I/O Memory)
Buffer: Temporary storage for I/O operations (e.g., keyboard buffer).
Disk Cache: Stores frequently accessed disk data to speed up operations.
Summary Table:
| Memory Type | Volatility | Speed | Size | Purpose |
| --- | --- | --- | --- | --- |
| Registers | Volatile | Fastest | Few bytes | CPU internal operations |
| Cache (L1, L2, L3) | Volatile | Very fast | KBs – MBs | CPU instruction storage |
| RAM (SRAM, DRAM) | Volatile | Fast | GBs | Main system memory |
| ROM (EEPROM, Flash, etc.) | Non-volatile | Slow | MBs – GBs | Firmware, BIOS |
| Virtual Memory | Volatile | Slow | Varies | Expands RAM using disk |
| HDD/SSD (Storage) | Non-volatile | Slowest | TBs | Long-term data storage |
Swap Memory (Swap Space)
Swap memory (or swap space) is a portion of the hard drive or SSD that is used as virtual memory when the system's RAM is full.
How Swap Memory Works
When RAM is full, the operating system moves some inactive data from RAM to the swap space.
This frees up RAM for active tasks.
If the data in swap space is needed again, the OS swaps it back into RAM.
Why is Swap Space Used?
Prevents system crashes when RAM runs out.
Allows running large applications that exceed available RAM.
Helps in multitasking by storing less-used processes.
Types of Swap Memory
Swap Partition – A dedicated partition on the hard drive for swap.
Swap File – A special file on the hard drive used for swap (more flexible).
Disadvantages of Swap Space
Slower than RAM (because SSD/HDD speeds are much lower than RAM).
Excessive swapping (thrashing) can slow down the system significantly.
How Much Swap Space is Needed?
< 4 GB RAM → Swap = 2x RAM
4–8 GB RAM → Swap = same as RAM
> 8 GB RAM → Swap = optional, but 2–4 GB is recommended for hibernation.
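As a quick illustration, here is a minimal C sketch of the sizing rule above. The thresholds and the 4 GB figure for the large-RAM case are taken from this rule of thumb, not universal constants:

```c
#include <stdio.h>

/* Suggested swap size (in GB) following the rule of thumb above:
   under 4 GB RAM -> 2x RAM; 4-8 GB -> same as RAM; above 8 GB ->
   4 GB (upper end of the 2-4 GB hibernation recommendation). */
static int suggested_swap_gb(int ram_gb) {
    if (ram_gb < 4)  return ram_gb * 2;
    if (ram_gb <= 8) return ram_gb;
    return 4; /* optional; sized for hibernation head-room */
}

int main(void) {
    int sizes[] = {2, 4, 8, 16, 32};
    for (int i = 0; i < 5; i++)
        printf("RAM %2d GB -> swap %2d GB\n", sizes[i], suggested_swap_gb(sizes[i]));
    return 0;
}
```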
RISC (Reduced Instruction Set Computer) and CISC (Complex Instruction Set Computer)
RISC and CISC are two different types of CPU architectures used in computer processors. They define how instructions are designed and executed by the CPU.
1. RISC (Reduced Instruction Set Computer)
RISC is a simplified CPU architecture that uses a small set of simple instructions. The goal is to make execution fast and efficient by processing instructions in a single clock cycle.
Characteristics of RISC:
Simple Instructions – Each instruction performs only a single operation.
Fixed Instruction Length – Makes it easier to pipeline instructions.
More Registers – Reduces memory access and improves speed.
Load/Store Architecture – Data is loaded into registers first before being processed.
Efficient Pipelining – Due to uniform instruction size and simpler operations.
Examples of RISC Processors:
ARM (used in smartphones and embedded systems)
MIPS
PowerPC
RISC-V
2. CISC (Complex Instruction Set Computer)
CISC is a more complex CPU architecture that supports a wide range of instructions, including multi-step operations. The goal is to reduce the number of instructions needed to complete a task, even if they take multiple clock cycles.
Characteristics of CISC:
Complex Instructions – A single instruction can perform multiple operations (e.g., fetching, decoding, executing).
Variable Instruction Length – Some instructions take more bits than others.
Fewer Registers – Relies more on accessing memory.
Memory-to-Memory Operations – Instructions can directly manipulate data in memory.
Difficult to Pipeline – Because of variable instruction sizes.
Examples of CISC Processors:
Intel x86 (used in most desktops and laptops)
AMD processors
IBM System/360
Key Differences Between RISC and CISC
| Feature | RISC (Reduced Instruction Set Computer) | CISC (Complex Instruction Set Computer) |
| --- | --- | --- |
| Instruction Set | Small, simple instructions | Large, complex instructions |
| Execution Speed | Faster (typically one instruction per cycle) | Slower (multiple cycles per instruction) |
| Memory Access | Load/store architecture | Memory-to-memory operations |
| Registers | More registers | Fewer registers, more memory usage |
| Code Size | Larger (more instructions needed) | Smaller (one instruction does more work) |
| Pipelining | Easier to implement | More difficult to implement |
| Power Consumption | Lower (efficient execution) | Higher (more complex processing) |
Modern CPUs: A Hybrid Approach
Today, most processors use a mix of RISC and CISC.
For example, Intel and AMD processors use CISC architecture (x86) but internally translate instructions into RISC-like micro-operations for faster execution.
Which is Better?
RISC is better for mobile devices, embedded systems, and low-power applications (e.g., smartphones, IoT).
CISC is better for high-performance computing, desktops, and software compatibility (e.g., Windows PCs, gaming laptops).
Both architectures have their advantages, and modern CPUs blend elements from both to optimize performance.
Addressing Mode in Computer Architecture
Addressing mode defines how an operand (data) is specified in an instruction. It determines where the CPU should fetch data from (register, memory, or immediate value) when executing instructions.
Types of Addressing Modes
Different CPUs use various addressing modes based on their architecture (RISC or CISC). Below are the common types:
1. Immediate Addressing Mode
The operand (data) is directly given in the instruction.
Fastest because no memory access is needed.
Example:
```assembly
MOV A, #10   ; Load value 10 into register A
```
Here, `#10` is the immediate value.
2. Register Addressing Mode
The operand is stored in a register inside the CPU.
Faster than memory access since registers are inside the processor.
Example:
```assembly
MOV A, B     ; Copy the value from register B to register A
```
Here, both operands are registers.
3. Direct Addressing Mode
The instruction contains the memory address where the data is stored.
Example:
```assembly
MOV A, 2000H ; Load data from memory address 2000H into A
```
Here, `2000H` is a memory location.
4. Indirect Addressing Mode
The instruction contains a register that holds the address of the operand in memory.
Used for accessing large data structures like arrays.
Example:
```assembly
MOV A, (R1)  ; Load data from the memory address stored in R1 into A
```
Here, R1 holds the memory address, not the actual data.
5. Indexed Addressing Mode
The operand's address is calculated using a base address + an index register.
Useful for accessing arrays and tables.
Example:
```assembly
MOV A, (Base + Index) ; Load data from (Base address + Index offset)
```
If Base = 1000H and Index = 05H, then data is loaded from 1005H.
6. Register Indirect Addressing Mode
Similar to indirect addressing, but only registers are used to store addresses.
Example:
```assembly
MOV A, [BX]  ; Load data from the memory address stored in BX
```
Here, BX contains the memory location of the operand.
7. Relative Addressing Mode
The operand's address is given as an offset relative to the current instruction address.
Used in branching and looping instructions.
Example:
```assembly
JMP +5       ; Jump 5 bytes ahead of the current instruction
```
Useful for loops, conditional jumps, and function calls.
Comparison Table
| Addressing Mode | How It Works | Speed | Example |
| --- | --- | --- | --- |
| Immediate | Operand is inside the instruction | Fastest | `MOV A, #10` |
| Register | Operand is in a register | Fast | `MOV A, B` |
| Direct | Operand is at a memory address | Medium | `MOV A, 2000H` |
| Indirect | Register holds the memory address | Medium | `MOV A, (R1)` |
| Indexed | Uses base + index for the address | Slow | `MOV A, (Base+Index)` |
| Register Indirect | Register holds the address of the operand | Medium | `MOV A, [BX]` |
| Relative | Address is relative to the current instruction | Slow | `JMP +5` |
Why Are Addressing Modes Important?
Efficient Memory Usage: Helps optimize memory access.
Faster Execution: Reducing memory accesses speeds up processing.
Code Optimization: Allows writing shorter and more flexible instructions.
Supports Complex Data Structures: Enables handling arrays, stacks, and loops efficiently.
Different CPU architectures (RISC & CISC) use these addressing modes based on their design.
What is a CPU?
The Central Processing Unit (CPU) is the primary component of a computer that carries out instructions from programs by performing arithmetic, logical, control, and input/output operations. It is often called the "brain" of the computer.
Components of the CPU
1. Arithmetic Logic Unit (ALU)
The ALU is responsible for performing arithmetic (addition, subtraction, multiplication, and division) and logical operations (AND, OR, NOT, XOR).
It processes data based on instructions provided by the control unit.
2. Control Unit (CU)
The Control Unit directs operations within the CPU and coordinates communication between different components.
It fetches, decodes, and executes instructions from memory.
It controls the flow of data between the CPU, memory, and I/O devices.
3. Registers
Registers are small, high-speed storage units inside the CPU that temporarily hold data and instructions.
Common types of registers include:
Accumulator (ACC): Stores intermediate arithmetic and logic operation results.
Program Counter (PC): Keeps track of the next instruction's address.
Instruction Register (IR): Holds the current instruction being executed.
Memory Address Register (MAR): Stores the address of the memory location to be accessed.
Memory Data Register (MDR): Holds data that is being transferred to or from memory.
Stack Pointer (SP): Points to the top of the stack in memory.
4. Cache Memory
Cache Memory is a small-sized, high-speed memory located close to the CPU.
It stores frequently accessed data and instructions to reduce memory access time.
It has multiple levels:
L1 (Level 1) Cache: Fastest but smallest, located inside the CPU core.
L2 (Level 2) Cache: Larger but slower than L1, may be inside or outside the CPU.
L3 (Level 3) Cache: Shared among multiple cores, larger and slower than L2.
5. Buses
Buses are communication pathways that transfer data between components of the CPU and other hardware.
There are three main types:
Data Bus: Transfers actual data.
Address Bus: Carries memory addresses.
Control Bus: Sends control signals between CPU and other components.
6. Clock (System Clock)
The clock synchronizes the operations of the CPU by generating electrical pulses at regular intervals.
The speed is measured in Hertz (Hz) (e.g., 3.5 GHz means 3.5 billion cycles per second).
7. Execution Unit
Some modern CPUs include an Execution Unit with multiple ALUs and floating-point units (FPUs) for faster processing.
Working of CPU (Fetch-Decode-Execute Cycle)
Fetch: The CPU fetches the instruction from memory (RAM) using the Program Counter (PC).
Decode: The Control Unit deciphers the instruction.
Execute: The instruction is processed by the ALU or other components.
Store: The result is stored in a register or memory.
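To make the cycle concrete, here is a toy C sketch of a CPU running this loop. The 2-byte instruction format and the opcodes (`OP_LOAD`, `OP_ADD`, `OP_HALT`) are invented for illustration:

```c
#include <stdio.h>

/* Hypothetical 2-byte instructions: opcode, operand. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2 };

int main(void) {
    unsigned char mem[] = { OP_LOAD, 10, OP_ADD, 5, OP_HALT, 0 };
    int pc = 0, acc = 0;                     /* Program Counter, Accumulator */

    for (;;) {
        unsigned char opcode  = mem[pc];     /* Fetch: read instruction at PC */
        unsigned char operand = mem[pc + 1];
        pc += 2;                             /* PC now points at the next instruction */

        switch (opcode) {                    /* Decode + Execute */
        case OP_LOAD: acc = operand;  break; /* load immediate into accumulator */
        case OP_ADD:  acc += operand; break; /* ALU add */
        case OP_HALT: printf("ACC = %d\n", acc); return 0; /* result stored in ACC */
        }
    }
}
```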
Conclusion
The CPU is an essential part of a computer, handling all processing tasks. With advancements in technology, modern CPUs include multiple cores, hyper-threading, and advanced cache mechanisms for improved performance.
Microprocessor vs. Microcontroller
Both microprocessors (MPUs) and microcontrollers (MCUs) are integrated circuits used in electronic systems, but they serve different purposes.
1. Microprocessor (MPU)
A microprocessor is the central unit of a computer system that performs computation and processing but requires external components to function fully.
Key Features:
CPU only: Contains only the Central Processing Unit (CPU); external components (RAM, ROM, I/O ports, timers) must be connected separately.
Powerful Processing: Used in general-purpose computing tasks, capable of handling complex operations.
High-Speed Execution: Can run at high clock speeds, making it suitable for high-performance applications.
No Built-in Memory or Peripherals: Requires external RAM, ROM, and I/O devices for operation.
Flexible but Costly: More flexible due to modular design but increases system cost and complexity.
Examples of Microprocessors:
Intel: 8086, Pentium, Core i3/i5/i7/i9
AMD: Ryzen, Athlon
ARM Processors (used in mobile phones)
Applications of Microprocessors:
Computers & Laptops (Intel, AMD processors)
Smartphones (ARM-based processors)
High-end Embedded Systems (Industrial automation, servers)
2. Microcontroller (MCU)
A microcontroller is a compact integrated circuit that includes a CPU, memory (RAM/ROM), and I/O peripherals on a single chip. It is designed for specific embedded applications.
Key Features:
All-in-One System: Contains CPU, RAM, ROM (Flash Memory), I/O ports, and timers on a single chip.
Power-Efficient: Operates at lower power compared to microprocessors, making it ideal for battery-powered applications.
Limited Processing Power: Optimized for specific control-oriented applications rather than general-purpose computing.
Cost-Effective: Since everything is integrated, the total cost is lower than a microprocessor-based system.
Examples of Microcontrollers:
8-bit: Intel 8051, Atmel ATmega328 (used in Arduino)
16-bit: MSP430 (Texas Instruments)
32-bit: ARM Cortex-M series (STM32, ESP32, Raspberry Pi Pico)
Applications of Microcontrollers:
Embedded Systems (IoT devices, smart home systems)
Appliances (Microwaves, washing machines)
Automotive Systems (Engine control, ABS braking)
Medical Devices (Pacemakers, glucose monitors)
Comparison Table: Microprocessor vs. Microcontroller
| Feature | Microprocessor (MPU) | Microcontroller (MCU) |
| --- | --- | --- |
| Components | Only CPU; needs external RAM, ROM, I/O | CPU, RAM, ROM, I/O built in |
| Processing Power | High | Moderate |
| Power Consumption | High (not power efficient) | Low (optimized for battery use) |
| Speed | Fast (GHz range) | Slower (MHz range) |
| Cost | Expensive (due to external components) | Cost-effective (integrated design) |
| Application | Computers, servers, smartphones | Embedded systems, IoT devices, home appliances |
Summary
A microprocessor is a powerful, general-purpose processing unit that requires external components for complete functionality.
A microcontroller is an all-in-one system on a chip (SoC) designed for specific embedded applications, integrating CPU, memory, and peripherals.
Degree of Parallelism in Computer Organization and Architecture (COA)
The degree of parallelism refers to the number of operations or tasks that can be executed simultaneously in a computing system. It is a key factor in determining the performance and efficiency of processors, especially in parallel computing and pipelined architectures.
Types of Parallelism
1. Instruction-Level Parallelism (ILP)
Refers to the ability of a processor to execute multiple instructions simultaneously.
Achieved through techniques like pipelining, superscalar execution, and out-of-order execution.
Measured by the number of instructions executed per cycle (IPC).
Example:
A 5-stage pipeline (Fetch, Decode, Execute, Memory, Write-back) allows 5 instructions to be in different stages at the same time, increasing ILP.
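The benefit is easy to quantify with the classic ideal-pipeline timing formula: n instructions on a k-stage machine take n × k cycles sequentially, but only k + (n − 1) cycles when pipelined (ignoring hazards). A small C sketch of the arithmetic:

```c
#include <stdio.h>

/* Ideal pipeline timing, no hazards:
   sequential: n instructions * k stages = n*k cycles
   pipelined:  k cycles to fill, then one instruction per cycle = k + (n-1) */
int main(void) {
    int k = 5;                       /* pipeline depth */
    int n = 100;                     /* instruction count */
    int seq  = n * k;
    int pipe = k + (n - 1);
    printf("sequential: %d cycles, pipelined: %d cycles, speedup: %.2fx\n",
           seq, pipe, (double)seq / pipe);
    return 0;
}
```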
2. Data-Level Parallelism (DLP)
Involves processing multiple data elements simultaneously using a single instruction.
Used in vector processors, SIMD (Single Instruction Multiple Data), and GPUs.
Example:
A SIMD processor (like in GPUs) can apply the same operation (e.g., addition) to multiple data points at once.
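A minimal C sketch of the idea: one operation applied across many data elements. A vectorizing compiler can map such a loop to SIMD instructions (the SSE/NEON mention in the comment is illustrative; the exact instructions depend on the target):

```c
#include <stdio.h>

/* Data-level parallelism in miniature: the same ADD applied to every
   element. A vectorizing compiler can turn this loop into SIMD
   instructions (e.g., one SSE/NEON add handling several floats at once). */
int main(void) {
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[8];
    for (int i = 0; i < 8; i++)
        c[i] = a[i] + b[i];          /* one operation, many data elements */
    for (int i = 0; i < 8; i++)
        printf("%.0f ", c[i]);       /* prints 9 eight times */
    printf("\n");
    return 0;
}
```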
3. Task-Level Parallelism (TLP)
Focuses on executing different tasks (processes or threads) in parallel.
Used in multithreading and multiprocessing.
Requires multi-core processors or hyper-threading to work efficiently.
Example:
A quad-core CPU can run 4 independent threads at the same time, increasing the degree of task parallelism.
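A minimal C sketch using POSIX threads: four independent workers that a quad-core CPU can schedule onto four cores at once (compile with -lpthread):

```c
#include <stdio.h>
#include <pthread.h>

/* Task-level parallelism: four independent threads, which a quad-core
   CPU can run simultaneously, one per core. */
static void *worker(void *arg) {
    long id = (long)arg;
    printf("thread %ld running\n", id);
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (long i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);    /* wait for all tasks to finish */
    return 0;
}
```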
4. Memory-Level Parallelism (MLP)
Describes how many memory operations can be performed in parallel.
Achieved using caches, memory interleaving, and multiple memory banks.
Example:
A system with dual-channel memory can access two memory modules simultaneously, increasing throughput.
Measuring Degree of Parallelism
The degree of parallelism is often quantified by:
Pipeline depth (an n-stage pipeline → up to n instructions in flight).
Number of functional units (multiple ALUs → increased ILP/DLP).
Number of threads or cores (TLP).
Memory bandwidth (number of simultaneous memory accesses).
Conclusion
The higher the degree of parallelism, the better the performance, as more tasks are executed simultaneously. However, achieving high parallelism requires efficient hardware support, optimized software, and proper workload balancing.
Pipeline in Computer Organization & Architecture
What is Pipelining?
Pipelining is a technique used in modern processors to improve instruction throughput by overlapping the execution of multiple instructions. Instead of executing one instruction at a time (sequential execution), pipelining breaks down instruction execution into smaller stages and processes multiple instructions simultaneously.
A basic RISC pipeline has the following stages:
Fetch (IF) – Fetch the instruction from memory.
Decode (ID) – Decode the instruction and identify the required registers.
Execute (EX) – Perform the arithmetic/logic operation using the ALU.
Memory (MEM) – Access memory if needed (load/store instructions).
Write Back (WB) – Store the result back into a register.
3. Types of Pipeline Hazards
Although pipelining increases efficiency, it also introduces problems known as hazards:
1. Structural Hazards
Occur when hardware resources (memory, ALU, registers) are not sufficient for parallel execution.
Example: If the processor has a single memory unit, it cannot fetch and write data at the same time.
Solution: Use separate instruction and data memory (Harvard architecture).
2. Data Hazards
Occur when an instruction depends on the result of a previous instruction that has not yet completed.
Example:
```assembly
ADD R1, R2, R3 ; R1 = R2 + R3
SUB R4, R1, R5 ; R4 = R1 - R5 (R1 is not ready yet!)
```
Solution:
Data Forwarding: Forward result from EX/MEM stage to a later instruction.
Stalling: Insert NOP (No Operation) until data is available.
3. Control Hazards
Occur due to branching instructions (e.g., IF-ELSE, loops). The CPU doesn't know which instruction to fetch next until the branch is resolved.
Example:
```assembly
BEQ R1, R2, LABEL ; Branch if R1 == R2
```
Solution:
Branch Prediction: Predict the outcome of the branch and continue execution.
Delayed Branching: Execute some instructions before the actual branch decision.
4. Types of Pipelines
1. Arithmetic Pipeline
Used in floating-point operations and complex calculations.
Example: Computing A × (B + C) using separate addition and multiplication units.
2. Instruction Pipeline
Used in CPU instruction execution (Fetch → Decode → Execute → Memory → Write Back).
Most modern CPUs use instruction pipelining.
3. Superpipelining
Uses more pipeline stages (e.g., a 10-stage pipeline instead of 5-stage) to increase instruction throughput.
4. Superscalar Pipelining
Executes multiple instructions per cycle by having multiple execution units (e.g., multiple ALUs, FPUs).
Example: Modern Intel & AMD processors use superscalar execution.
5. Advantages of Pipelining
β
Increases instruction throughput (more instructions executed per unit time).
β
Efficient CPU utilization (reduces idle time of hardware components).
β
Improves overall system performance (better than sequential execution).
6. Disadvantages of Pipelining
Increased complexity (requires extra logic for hazard handling).
Not all instructions benefit equally (e.g., branch instructions cause stalls).
Resource conflicts (if not enough execution units are available).
Conclusion
Pipelining is a fundamental technique in modern processor design that significantly improves performance by overlapping instruction execution. However, it introduces hazards that must be managed through techniques like data forwarding, branch prediction, and pipeline stalls.
Mapping in Computer Architecture
Mapping in Computer Organization & Architecture (COA) refers to how data (such as memory locations, cache blocks, or virtual addresses) is organized, stored, and accessed efficiently. It is commonly used in cache memory, virtual memory, and memory addressing.
1. Mapping in Cache Memory
Cache memory is a small, fast memory located between the main memory (RAM) and the CPU to speed up data access. Since the cache is smaller than RAM, mapping techniques are used to decide where a memory block should be placed in the cache.
Types of Cache Mapping Techniques:
1. Direct Mapping
Each main memory block is mapped to exactly one cache block.
Simple and fast but has a high chance of cache conflicts.
Example:
If cache size = 8 blocks, a memory block X is placed in cache block (X mod 8).
Pros: Simple, low hardware cost.
Cons: High conflict rate (frequent replacement).
2. Fully Associative Mapping
Any memory block can be stored in any cache block.
Uses a replacement policy (e.g., Least Recently Used, LRU) to decide which block to evict when the cache is full.
Pros: Eliminates conflict misses.
Cons: Expensive (requires complex hardware to search every block).
3. Set-Associative Mapping
A compromise between direct mapping and fully associative mapping.
The cache is divided into sets, and a memory block can be placed in any block of a specific set.
Example: 2-way set-associative cache → each set contains 2 blocks.
Pros: Reduces conflicts, balances cost and performance.
Cons: More complex than direct mapping.
2. Mapping in Virtual Memory (Page Table Mapping)
Virtual memory allows programs to use more memory than physically available by mapping virtual addresses to physical addresses. This is done using a page table.
Paging: Virtual memory is divided into pages, and physical memory is divided into frames. Each virtual page is mapped to a physical frame in RAM.
Types of Page Table Mapping:
One-Level Page Table: Simple, but the table grows very large for big address spaces.
Multilevel Page Table: Uses multiple levels to save memory space.
Inverted Page Table: Reduces memory overhead by keeping one entry per physical frame rather than per virtual page.
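The sketch below walks through a one-level lookup with assumed toy parameters (4 KB pages, an 8-entry table with made-up frame numbers); real systems use the multilevel schemes above, but the address arithmetic is the same:

```c
#include <stdio.h>

/* One-level page-table lookup: split a virtual address into
   (page number, offset), then map the page to a physical frame. */
#define PAGE_SIZE 4096
#define NUM_PAGES 8

int main(void) {
    /* page_table[v] = physical frame holding virtual page v (toy values) */
    unsigned page_table[NUM_PAGES] = {3, 7, 0, 2, 5, 1, 6, 4};

    unsigned vaddr  = 0x2ABC;                      /* example virtual address */
    unsigned vpage  = vaddr / PAGE_SIZE;           /* virtual page number = 2 */
    unsigned offset = vaddr % PAGE_SIZE;           /* offset within the page */
    unsigned frame  = page_table[vpage];           /* frame 0 in this toy table */
    unsigned paddr  = frame * PAGE_SIZE + offset;  /* final physical address */

    printf("vaddr 0x%X -> page %u, offset 0x%X -> paddr 0x%X\n",
           vaddr, vpage, offset, paddr);
    return 0;
}
```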
3. Mapping in Memory Addressing
1. Memory Interleaving (Mapping Multiple Memory Banks)
Used in high-speed computers to allow parallel access to memory.
Divides memory into multiple banks and accesses them simultaneously.
Example: Instead of accessing one word at a time, four memory banks allow fetching four words at once.
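A tiny C sketch of low-order interleaving, assuming 4 banks: consecutive word addresses rotate across the banks, so a burst of sequential accesses lands on different banks and can proceed in parallel:

```c
#include <stdio.h>

/* Low-order interleaving: word address -> (bank, row within bank).
   Sequential words 0,1,2,3 land in banks 0,1,2,3 and can be fetched together. */
int main(void) {
    int num_banks = 4;
    for (unsigned addr = 0; addr < 8; addr++) {
        unsigned bank = addr % num_banks;   /* which bank holds this word */
        unsigned row  = addr / num_banks;   /* position inside that bank  */
        printf("word %u -> bank %u, row %u\n", addr, bank, row);
    }
    return 0;
}
```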
2. I/O Address Mapping
Memory-Mapped I/O: I/O devices share the same address space as memory.
Isolated I/O: I/O devices have a separate address space.
Random Access Memory (RAM) – Types and Working
What is RAM?
Random Access Memory (RAM) is a type of volatile memory that temporarily stores data and instructions that the CPU needs while performing tasks. Unlike storage devices (HDDs, SSDs), RAM is much faster but loses data when power is turned off.
Types of RAM
RAM can be classified into two main types:
1. SRAM (Static RAM)
Stores data using flip-flops, which do not need to be refreshed constantly.
Faster and more reliable than DRAM.
Used in cache memory (L1, L2, L3) for processors.
Expensive and consumes more power.
2. DRAM (Dynamic RAM)
Stores data using capacitors and transistors, requiring periodic refreshing.
Slower but cheaper and has higher storage capacity than SRAM.
Used as main memory in computers.
Types of DRAM:
SDRAM (Synchronous DRAM) – Works in sync with the CPU clock for faster performance.
DDR SDRAM (Double Data Rate SDRAM) – Transfers data on both the rising and falling edges of the clock.
- DDR1, DDR2, DDR3, DDR4, DDR5 – Each generation improves speed and efficiency.
GDDR (Graphics DDR) – Optimized for GPUs, used in gaming and high-performance computing.
How Does RAM Work?
When you open a program, the CPU fetches data from storage (SSD/HDD) and loads it into RAM.
The CPU reads/writes data in RAM much faster than it would from storage.
As you work, RAM continuously updates with new instructions and data.
When you close the program or shut down the computer, the data in RAM is lost.
Cache Memory in Detail
1. What is Cache Memory?
Cache memory is a small, high-speed memory located closer to the CPU than RAM. It stores frequently accessed data and instructions to reduce the time required to fetch them from main memory (RAM).
Speed: Much faster than RAM but smaller in size.
Purpose: Reduces the CPU's waiting time by providing quick access to frequently used data.
Location: Inside the CPU or between the CPU and RAM.
Why Do We Need Cache Memory?
The CPU operates much faster than RAM.
If the CPU waits for data from RAM every time, the system will slow down.
Cache stores recently used data, reducing delays and improving efficiency.
2. Cache Memory Hierarchy (Levels of Cache)
Modern processors have multiple cache levels to balance speed and size.
| Cache Level | Location | Speed | Size | Purpose |
| --- | --- | --- | --- | --- |
| L1 Cache | Inside CPU core | Fastest | Small (32 KB – 128 KB) | Stores most frequently used data & instructions |
| L2 Cache | Inside or near CPU | Slower than L1 but faster than RAM | Medium (256 KB – 8 MB) | Acts as a backup for L1 |
| L3 Cache | Shared among cores | Slower than L2 | Large (4 MB – 64 MB) | Reduces access to RAM |
| L4 Cache (if present) | Outside CPU (on motherboard) | Slowest cache | Very large (up to 128 MB) | Used in high-end processors |
Example: A Core i7 processor may have:
L1 = 64KB per core
L2 = 512KB per core
L3 = 12MB shared among all cores
3. Cache Mapping Techniques
Cache memory is much smaller than RAM, so we need efficient mapping techniques to determine where memory blocks will be placed in the cache.
1. Direct Mapping
Each block in main memory is mapped to one fixed cache block.
Uses the formula:
$$\text{Cache Block Index} = \text{Main Memory Block Address} \bmod \text{Number of Cache Blocks}$$
Simple and fast but suffers from cache conflicts.
Example:
If memory blocks 5 and 13 map to the same cache block, one will overwrite the other.
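A minimal C sketch of the index calculation, assuming an 8-block cache as in the example; blocks 5 and 13 collide on the same index:

```c
#include <stdio.h>

/* Direct mapping: cache index = memory block address mod cache blocks.
   With 8 cache blocks, blocks 5 and 13 both map to index 5 and conflict. */
int main(void) {
    int cache_blocks = 8;
    int addrs[] = {5, 13, 21, 100};
    for (int i = 0; i < 4; i++)
        printf("memory block %3d -> cache block %d\n",
               addrs[i], addrs[i] % cache_blocks);
    return 0;
}
```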
2. Fully Associative Mapping
Any memory block can be stored in any cache block.
Requires complex hardware for searching the cache.
Reduces conflict but is costly.
Example:
If block 5 is needed, the CPU searches all cache blocks and places it in any available block.
3. Set-Associative Mapping
A compromise between direct mapping and fully associative mapping.
Cache is divided into sets, and each set has multiple blocks.
A memory block can be placed in any block of a specific set.
Example:
2-way set-associative → Each set has 2 blocks.
4-way set-associative → Each set has 4 blocks.
Balances performance and cost.
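Here is a small C sketch of a 2-way set-associative lookup under toy parameters (4 sets, 2 ways). Note that blocks 5 and 13, which collide in the direct-mapped example above, can now coexist in the same set:

```c
#include <stdio.h>
#include <stdbool.h>

/* 2-way set-associative lookup: a block is confined to one set
   (block mod number-of-sets) but may sit in either way of that set. */
#define NUM_SETS 4
#define WAYS     2

int cache[NUM_SETS][WAYS];               /* stored block tags, -1 = empty */

bool lookup(int block) {
    int set = block % NUM_SETS;          /* pick the set */
    for (int w = 0; w < WAYS; w++)       /* search both ways of that set */
        if (cache[set][w] == block)
            return true;                 /* hit */
    return false;                        /* miss */
}

int main(void) {
    for (int s = 0; s < NUM_SETS; s++)
        for (int w = 0; w < WAYS; w++)
            cache[s][w] = -1;
    cache[1][0] = 5;                     /* block 5 lives in set 1, way 0 */
    cache[1][1] = 13;                    /* block 13 coexists in set 1, way 1 */
    printf("block 5:  %s\n", lookup(5)  ? "hit" : "miss");
    printf("block 13: %s\n", lookup(13) ? "hit" : "miss");
    printf("block 9:  %s\n", lookup(9)  ? "hit" : "miss");
    return 0;
}
```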
4. Cache Replacement Policies
When the cache is full, a replacement policy decides which block to remove.
1. Least Recently Used (LRU)
The block that has gone unused the longest is replaced.
Most common method.
2. First-In, First-Out (FIFO)
The oldest inserted block is removed.
Simple but may replace frequently used blocks.
3. Random Replacement
Removes a random block.
Used when no clear pattern exists.
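A compact C sketch of LRU on a tiny 3-entry, fully associative cache: each entry carries a last-used timestamp, and on a miss the stalest entry is evicted. The access trace is made up for illustration:

```c
#include <stdio.h>

#define CACHE_SIZE 3

int blocks[CACHE_SIZE];                       /* block held per slot (-1 = empty) */
int last_used[CACHE_SIZE];                    /* timestamp of each slot's last use */
int now = 0;

void access_block(int b) {
    now++;
    int lru = 0;
    for (int i = 0; i < CACHE_SIZE; i++) {
        if (blocks[i] == b) {                 /* hit: refresh the timestamp */
            last_used[i] = now;
            printf("block %d: hit\n", b);
            return;
        }
        if (last_used[i] < last_used[lru])    /* remember the stalest slot */
            lru = i;
    }
    if (blocks[lru] < 0)
        printf("block %d: miss (cold)\n", b);
    else
        printf("block %d: miss, evicting block %d\n", b, blocks[lru]);
    blocks[lru] = b;                          /* replace the LRU victim */
    last_used[lru] = now;
}

int main(void) {
    for (int i = 0; i < CACHE_SIZE; i++) { blocks[i] = -1; last_used[i] = 0; }
    int trace[] = {1, 2, 3, 1, 4, 2};         /* 4 evicts 2 (the LRU block) */
    for (int i = 0; i < 6; i++)
        access_block(trace[i]);
    return 0;
}
```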
5. Cache Performance: Hit & Miss
1. Cache Hit
When the required data is found in the cache.
Fast access β CPU efficiency increases.
2. Cache Miss
When the required data is not found in the cache.
The data must be fetched from RAM, slowing down the CPU.
Cache Hit Ratio:
$$\text{Hit Ratio} = \frac{\text{Cache Hits}}{\text{Total Cache Accesses}}$$
A higher hit ratio means better performance.
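A quick worked example of the formula in C, using assumed sample counts (950 hits out of 1000 accesses):

```c
#include <stdio.h>

/* Hit ratio = hits / total accesses; miss ratio = 1 - hit ratio.
   The counts here are assumed sample values for illustration. */
int main(void) {
    int hits = 950, accesses = 1000;
    double hit_ratio  = (double)hits / accesses;
    double miss_ratio = 1.0 - hit_ratio;
    printf("hit ratio = %.2f, miss ratio = %.2f\n", hit_ratio, miss_ratio);
    return 0;
}
```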
6. Cache Write Policies
When the CPU modifies data, it must update both the cache and main memory.
1. Write-Through
Updates both cache and main memory immediately.
Ensures consistency but is slower.
2. Write-Back
Updates only the cache first.
Writes to main memory only when needed.
Faster but requires extra logic.
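The contrast is easiest to see in code. A minimal C sketch with a single cache line: write-through touches memory on every store, while write-back only marks the line dirty and defers the memory write until eviction:

```c
#include <stdio.h>
#include <stdbool.h>

int cache_val, mem_val;       /* one cache line and its backing memory word */
bool dirty;

void write_through(int v) {
    cache_val = v;
    mem_val   = v;            /* memory updated on every write: slower, consistent */
}

void write_back(int v) {
    cache_val = v;
    dirty = true;             /* memory is now stale until eviction */
}

void evict(void) {
    if (dirty) {              /* write-back defers this single memory write */
        mem_val = cache_val;
        dirty = false;
    }
}

int main(void) {
    write_through(10);
    printf("write-through: cache=%d mem=%d\n", cache_val, mem_val);
    write_back(20);
    printf("write-back:    cache=%d mem=%d (stale)\n", cache_val, mem_val);
    evict();
    printf("after evict:   cache=%d mem=%d\n", cache_val, mem_val);
    return 0;
}
```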
7. Advantages of Cache Memory
β
Increases CPU speed by reducing memory access time.
β
Improves system performance by reducing dependency on RAM.
β
Minimizes CPU waiting time and enhances multitasking.
8. Disadvantages of Cache Memory
Expensive compared to RAM.
Limited size (can't store all data).
Complex management (cache mapping, replacement policies).
9. Summary
| Feature | Cache Memory | RAM |
| --- | --- | --- |
| Speed | Very fast | Slower |
| Size | Small (KB to MB) | Large (GB) |
| Location | Inside or near the CPU | External to the CPU |
| Purpose | Stores frequently used data | Stores all running processes |
| Cost | Expensive | Cheaper |
Conclusion
Cache memory boosts CPU performance by reducing the time needed to access frequently used data. Different mapping techniques, replacement policies, and write policies ensure efficiency.
Cache Miss and Its Types
1. What is a Cache Miss?
A cache miss occurs when the CPU requests data that is not found in the cache, forcing it to fetch the data from main memory (RAM). This results in increased memory access time, slowing down performance.
Cache Performance Formula
$$\text{Hit Ratio} = \frac{\text{Number of Cache Hits}}{\text{Total Cache Accesses}}, \qquad \text{Miss Ratio} = 1 - \text{Hit Ratio}$$
A higher miss ratio means more time is spent accessing RAM, reducing performance.
2. Types of Cache Misses (Three C's Model)
Cache misses are classified into three main categories:
1. Compulsory Miss (Cold Start Miss)
Occurs when data is accessed for the first time and is not yet in the cache.
Happens even with an empty cache because the data has never been loaded before.
Example:
- If a program starts and accesses memory block 100 for the first time, it will result in a compulsory miss.
How to Reduce It?
Use prefetching (load data before it's needed).
Increase block size (bring more data in one go).
2. Conflict Miss (Collision Miss)
Happens when multiple memory blocks map to the same cache block, causing conflicts in cache.
Common in Direct Mapping and Set-Associative Mapping.
Example:
- If Block 5 and Block 13 map to the same cache location, loading Block 13 will replace Block 5, causing a miss when Block 5 is needed again.
How to Reduce It?
Use Fully Associative Mapping (any block can go anywhere).
Increase the number of cache sets (higher associativity).
3. Capacity Miss
Occurs when the cache size is too small to hold all the needed data, forcing frequent replacements.
Even if associativity is high, if the cache is full, old data must be replaced, leading to misses.
Example:
- A program working with large data arrays (bigger than the cache size) will frequently evict old blocks.
How to Reduce It?
Increase cache size.
Optimize software to reuse data efficiently.
3. Additional Types of Misses
Some modern architectures also consider:
4. Coherence Miss (in Multi-Core CPUs)
Happens in multi-processor systems when one core updates data that another core has in its cache.
The second core must invalidate or reload its cache copy.
Managed by cache coherence protocols like MESI (Modified, Exclusive, Shared, Invalid).
How to Reduce It?
Use efficient cache coherence protocols.
Minimize frequent memory updates across multiple cores.
4. Summary of Cache Miss Types
| Miss Type | Cause | Solution |
| --- | --- | --- |
| Compulsory Miss | First-time access | Prefetching, larger block size |
| Conflict Miss | Multiple blocks map to the same location | Higher associativity, fully associative mapping |
| Capacity Miss | Cache is too small | Increase cache size, optimize data reuse |
| Coherence Miss | Multi-core inconsistency | Cache coherence protocols |
Conclusion
Cache misses reduce CPU performance by forcing memory access from RAM. Optimizing cache size, mapping strategies, and prefetching techniques can significantly improve efficiency.
Cache Hit and Cache Miss
1. Cache Hit
A cache hit occurs when the CPU requests data, and the required data is found in the cache. Since the cache is much faster than RAM, a cache hit results in quick data access, improving performance.
Example:
Suppose data from memory block 100 is stored in the cache.
When the CPU requests data from block 100, it finds it in the cache.
Result: Fast access, no need to fetch from RAM.
Formula for Hit Ratio:
$$\text{Hit Ratio} = \frac{\text{Number of Cache Hits}}{\text{Total Cache Accesses}}$$
A higher hit ratio means better performance.
2. Cache Miss
A cache miss occurs when the CPU requests data, but the required data is not found in the cache. The CPU must then fetch the data from RAM, which is slower.
Example:
Suppose the CPU requests data from memory block 200, but it is not in the cache.
The CPU fetches it from RAM and stores it in the cache for future use.
Result: Slow access due to fetching from RAM.
Formula for Miss Ratio:
$$\text{Miss Ratio} = 1 - \text{Hit Ratio}$$
A higher miss ratio means poor performance.
RAM & Its Types
1. Which type of RAM needs to be constantly refreshed?
Answer: (b) DRAM
Explanation: DRAM (Dynamic RAM) needs periodic refreshing because it stores data using capacitors, which lose charge over time.
2. What is the primary difference between SRAM and DRAM?
Answer: SRAM (Static RAM) is faster and doesn't require refreshing, whereas DRAM (Dynamic RAM) is slower and needs constant refreshing.
Explanation: SRAM stores data using flip-flops, making it more expensive but faster. DRAM uses capacitors, making it cheaper but requiring refresh cycles.
3. Which type of RAM is used in cache memory due to its high speed?
Answer: SRAM
Explanation: Cache memory needs very fast access, and SRAM is faster than DRAM, making it the preferred choice.
4. What is the role of RAM in computer performance?
Answer: RAM stores data and instructions temporarily, allowing faster access than storage devices (HDD/SSD), improving system speed.
5. Which memory type is volatile?
Answer: (d) Both b and c (SRAM and DRAM)
Explanation: Volatile memory loses data when power is turned off. Both SRAM and DRAM require power to retain data, while ROM is non-volatile.
Mapping Techniques in Cache Memory
6. Which of the following is NOT a cache mapping technique?
Answer: (c) Hybrid Mapping
Explanation: The three main cache mapping techniques are Direct Mapping, Fully Associative Mapping, and Set-Associative Mapping.
7. In direct mapping, where is a main memory block stored in cache?
Answer: (b) A specific block determined by an index
Explanation: In Direct Mapping, each main memory block maps to only one cache block using the formula:
$$\text{Cache Block Index} = \text{Memory Block Address} \bmod \text{Number of Cache Blocks}$$
8. How does set-associative mapping differ from direct mapping?
Answer: Set-Associative Mapping allows a memory block to be stored in any block within a set, reducing conflict misses compared to Direct Mapping.
9. Which mapping technique minimizes conflict misses the most?
Answer: Fully Associative Mapping
Explanation: Since a memory block can be stored anywhere in the cache, Fully Associative Mapping avoids conflict misses.
10. Why is fully associative mapping less common despite reducing conflict misses?
Answer: It requires complex hardware for searching all cache blocks, making it expensive and slower for large caches.
Cache Memory
11. Which level of cache is shared among multiple CPU cores?
Answer: (c) L3 Cache
Explanation: L1 and L2 are core-specific, while L3 is shared among multiple cores, improving communication between them.
12. What is the main purpose of cache memory?
Answer: Cache memory speeds up CPU operations by storing frequently accessed data, reducing the need to fetch from RAM.
13. What does the term "cache hit" mean?
Answer: A cache hit occurs when the CPU finds the requested data in the cache, resulting in fast access.
14. Which cache write policy updates both the cache and main memory simultaneously?
Answer: (a) Write-Through
Explanation: Write-Through ensures data consistency but is slower. Write-Back updates RAM only when necessary, improving performance.
15. Why is cache memory faster than RAM?
Answer: Cache is built using SRAM, which has a faster access time than the DRAM used in RAM.
Types of Misses in Cache
16. Which type of cache miss occurs even when the cache is empty?
Answer: (a) Compulsory Miss
Explanation: A Compulsory (or Cold Start) Miss happens when data is accessed for the first time and isn't in the cache yet.
17. How can capacity misses be reduced?
Answer: Increase cache size or optimize data reuse to fit within cache capacity.
18. What is a conflict miss, and in which mapping technique is it most common?
Answer: A conflict miss happens when multiple memory blocks map to the same cache block, causing unnecessary evictions. It is most common in Direct Mapping.
19. Why do coherence misses occur in multi-core processors?
Answer: When one CPU core updates a shared memory block, other cores must invalidate or update their cached copies to maintain consistency.
20. What is the formula for hit ratio?
Answer:
$$\text{Hit Ratio} = \frac{\text{Cache Hits}}{\text{Total Cache Accesses}}$$
Explanation: A higher hit ratio means better cache performance, reducing delays in fetching data from RAM.
Bonus Questions for Discussion
How does increasing cache size affect performance?
Why does Direct Mapping suffer from high conflict misses?
How do modern CPUs balance between L1, L2, and L3 caches?
How does Write-Back policy improve efficiency compared to Write-Through?
What is the impact of cache coherence protocols in multi-core systems?