How Databases Organize Data on Disks: A Simplified Guide
Before getting into how databases actually organize data, you first need to understand a few prerequisites.
Disk Block:
When data is stored on disk-based storage devices, it is stored in blocks. A disk block is the smallest unit of data that can be read from or written to the disk in a single operation.
It's a fixed-size chunk of data, typically 4 KB in modern systems, though the exact size depends on the disk and file system being used.
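If you're curious what block size your own system reports, here's a minimal Python sketch (assuming a Unix-like OS; note that the filesystem's preferred block size may differ from the physical disk's sector size):

```python
import os

# Ask the OS for the block size of the filesystem backing a given path.
# f_bsize is the filesystem's preferred I/O block size, which may differ
# from the physical disk's sector size (often 512 B or 4 KB).
stats = os.statvfs("/")
print(f"Filesystem block size: {stats.f_bsize} bytes")
```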
Pages:
The basic unit of data for relational databases is a record (row).
These records are organized into relations (tables).
Whenever we need to search, modify, add, or delete a row in a table, we first bring the related data into main memory (RAM). After performing the operation, any modified data is eventually written back to storage. This is where pages come in.
A page is the unit of data a database transfers between disk and memory in a single operation. It's a fixed-size chunk of data and is usually larger than a disk block. Page sizes are typically powers of 2, such as 4 KB, 8 KB, or 16 KB.
Most of the time, a database can't hold all of its data in memory, so it employs a buffer pool that keeps a set of active pages in memory, typically evicted using an LRU (Least Recently Used) policy. The buffer pool continuously swaps pages in and out based on the access pattern, operations known as page-in and page-out.
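To make page-in and page-out concrete, here is a minimal Python sketch of an LRU buffer pool. The class and method names are just illustrative; real buffer pools also track dirty pages, pin counts, and latches.

```python
from collections import OrderedDict

# A minimal sketch of an LRU buffer pool: it only models the eviction order.
class BufferPool:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pages = OrderedDict()  # page_id -> page data, oldest first

    def get_page(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)                  # mark as most recently used
            return self.pages[page_id]
        if len(self.pages) >= self.capacity:
            evicted_id, _ = self.pages.popitem(last=False)   # page-out the LRU page
            print(f"page-out: {evicted_id}")
        data = self.read_page_from_disk(page_id)             # page-in from disk
        self.pages[page_id] = data
        return data

    def read_page_from_disk(self, page_id):
        print(f"page-in:  {page_id}")
        return f"<contents of page {page_id}>"  # placeholder for a real disk read

pool = BufferPool(capacity=2)
pool.get_page(1)   # page-in: 1
pool.get_page(2)   # page-in: 2
pool.get_page(1)   # already cached, no I/O
pool.get_page(3)   # pool is full -> page-out: 2, page-in: 3
```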
Database paging and OS virtual memory paging:
Database paging and OS virtual memory paging are both memory management techniques, but they operate at different levels.
Database paging, used by DBMSs like Oracle, moves data between memory and disk in database-specific units (e.g., 8KB pages) to handle large datasets efficiently.
OS paging, implemented by systems like Linux, swaps fixed-size memory pages (typically 4KB) between RAM and disk to extend available memory for all processes.
For instance, a database might page out rarely accessed table data, while the OS might page out inactive parts of a running application. Both aim to optimize performance and resource utilization, but database paging is application-specific and OS paging is a system-wide mechanism.
Let's consider a database system storing customer information:
Disk Block: Assume the disk block size is 4 KB and each customer record is 200 bytes long. One disk block can store up to 20 customer records (4 KB / 200 bytes ≈ 20).
Database Page: The database system uses an 8 KB page size. Each page can store up to 40 customer records (8 KB / 200 bytes ≈ 40).
When the database needs to read or write customer data:
It operates on whole pages (8 KB).
Each page read or write operation involves two disk blocks (8 KB / 4 KB = 2).
So, if the database needs to update a single customer record:
It reads the entire 8 KB page containing that record into memory.
This involves reading two 4 KB disk blocks.
The system modifies the record in memory.
When writing back, it writes the entire 8 KB page, which again involves two 4 KB disk block write operations.
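Here is the same arithmetic as a quick Python sketch, assuming fixed-size 200-byte records and ignoring the page and record headers that real databases reserve (which lower the real counts slightly):

```python
# Capacity arithmetic for the customer example: 4 KB blocks, 8 KB pages,
# fixed 200-byte records.
BLOCK_SIZE = 4 * 1024   # 4 KB
PAGE_SIZE = 8 * 1024    # 8 KB
RECORD_SIZE = 200       # bytes

records_per_block = BLOCK_SIZE // RECORD_SIZE   # 20
records_per_page = PAGE_SIZE // RECORD_SIZE     # 40
blocks_per_page = PAGE_SIZE // BLOCK_SIZE       # 2 disk blocks per page I/O

print(f"Records per block:   {records_per_block}")
print(f"Records per page:    {records_per_page}")
print(f"Blocks per page I/O: {blocks_per_page}")
```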
Heap:
A heap, in the context of databases, is the data structure in which a table's actual data is stored as a series of pages. It serves as the physical storage for the table and contains all of its rows and columns.
Heaps are stored on disk using a series of disk pages. Each disk page is a fixed-size unit of storage (typically 4KB, 8KB, or 16KB, depending on the database system). The heap consists of a collection of these pages, which together store all the records of a table.
These records are stored on disk pages in no particular order. They might happen to be stored in the order they were inserted, but this isn't guaranteed. Searching for specific data can therefore be slow, requiring a full heap scan, as sketched below.
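As an illustration, here is a minimal Python sketch of a full heap scan over in-memory stand-ins for heap pages; every page has to be read and every record inspected until the match is found:

```python
# Hypothetical heap: a list of pages, each page a list of records.
heap = [
    [{"id": 7, "name": "Alice"}, {"id": 3, "name": "Bob"}],    # page 0
    [{"id": 9, "name": "Carol"}, {"id": 1, "name": "Dave"}],   # page 1
]

def full_heap_scan(heap, customer_id):
    pages_read = 0
    for page in heap:            # every page must be fetched from disk
        pages_read += 1
        for record in page:      # every record on the page is checked
            if record["id"] == customer_id:
                return record, pages_read
    return None, pages_read

record, pages_read = full_heap_scan(heap, 1)
print(record, "found after reading", pages_read, "pages")
```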
This is where indexes come into play. Indexes are data structures, separate from the heap, that provide optimized search mechanisms.
Indexes:
An index, in the context of database systems, is a data structure separate from the heap. It contains the values of the column on which the index is defined, along with a “pointer” to the location in the heap where the actual record is stored.
However, indexes come with downsides: they require additional storage space and increase database write time. Whenever an indexed record is created, updated, or deleted, its corresponding index entries must also be created, updated, or deleted.
You actually end up with two write operations in the file system: one for the table data and another for the index data (including any reordering it requires). A table without an index therefore allows for quicker writes. With two indexes, you would end up with three write operations, and so on.
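Here is a minimal Python sketch of the idea, using a plain dictionary in place of the B-tree most databases actually use (all names are hypothetical). Notice that each insert performs one write to the heap plus one write to the index:

```python
# A single-column index: a mapping from the indexed value to a
# (page, slot) "pointer" into the heap.
heap = [[] for _ in range(4)]      # 4 empty heap pages
index_on_id = {}                   # customer id -> (page_no, slot_no)

def insert_customer(record, page_no):
    heap[page_no].append(record)                       # write #1: heap (table data)
    slot_no = len(heap[page_no]) - 1
    index_on_id[record["id"]] = (page_no, slot_no)     # write #2: index maintenance

def find_by_id(customer_id):
    page_no, slot_no = index_on_id[customer_id]        # index lookup: no heap scan
    return heap[page_no][slot_no]                      # fetch only the pointed-to page

insert_customer({"id": 42, "name": "Alice"}, page_no=2)
print(find_by_id(42))   # {'id': 42, 'name': 'Alice'}
```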
So, before creating indexes on the database, you should analyze your system. Consider whether your system is read-heavy or write-heavy, and evaluate other factors before making a decision.
I hope these internal concepts help you visualize how a database actually stores its data.