Important Database Concepts (Part-3)


Welcome to the third part of our DBMS blog series! In this section, we’ll delve into more advanced and performance-critical concepts that form the backbone of modern database systems. From Indexes that speed up data retrieval to Concurrency Control that ensures consistency in multi-user environments, we’ll also explore the CAP Theorem, Cursors, and Functional Dependencies—each playing a key role in efficient and reliable data management.
Views in DBMS
A View is a virtual table derived from one or more base tables through a SQL query. Unlike a regular table created using CREATE TABLE, a view does not store data physically on disk—it dynamically presents data each time it's queried. Think of it as a stored SELECT statement.
For example, when you run:
CREATE VIEW emp_details AS
SELECT emp_id, emp_name, dept FROM employee
WHERE dept = 'HR';
This creates a virtual table emp_details that only shows employees from the HR department, including just emp_id, emp_name, and dept—without storing any new data.
Use Case
Imagine you’re the admin of a college database. You want to give students access to their details but without exposing the full student table. Instead of granting them direct access, you can create a view that only shows specific columns or filtered data, ensuring both security and simplicity.
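A minimal sketch of that idea, assuming a hypothetical student table with columns (roll_no, name, dept, grade, phone, address) and a student_role; GRANT syntax varies slightly by DBMS:
-- The view exposes only non-sensitive columns
CREATE VIEW student_public AS
SELECT roll_no, name, dept
FROM student;
-- Grant access to the view instead of the base table
GRANT SELECT ON student_public TO student_role;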
Purpose of Views
Simplicity: Predefine complex queries (joins, filters) into a view to make data access easier for users.
Data Abstraction: Hide complex relationships among tables behind a simplified view.
Security: Restrict access by exposing only specific columns/rows instead of entire tables.
Types of Views
Simple View:
Based on a single table, without aggregates or GROUP BY. Usually updatable.
Complex View:
Involves multiple tables, aggregates, or clauses like DISTINCT and GROUP BY. Often not updatable.
Materialized View (Snapshot / Indexed View):
Stores a physical copy of the query result. Great for improving performance on expensive queries but consumes disk space. Typically refreshed periodically.
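As a sketch, in PostgreSQL-style syntax (names are illustrative; Oracle and SQL Server use different commands), a materialized view is created and refreshed like this:
-- Stores the aggregated result physically
CREATE MATERIALIZED VIEW dept_salary_summary AS
SELECT dept, AVG(salary) AS avg_salary
FROM employee
GROUP BY dept;
-- Re-computes and stores the result when the base data has changed
REFRESH MATERIALIZED VIEW dept_salary_summary;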
Views can be read-only or updatable. Updates on an updatable view will reflect in the base table. However, most complex views are read-only.
-- Example
CREATE VIEW CustomerOrders AS
SELECT C.CustomerID, C.Name, O.OrderID, O.OrderDate
FROM Customers C
JOIN Orders O ON C.CustomerID = O.CustomerID
WHERE O.OrderDate >= '2023-01-01';
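Once defined, the view is queried like any table; the underlying join runs behind the scenes (the name value here is illustrative):
SELECT * FROM CustomerOrders WHERE Name = 'Alice';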
Indexes in DBMS
An Index is a performance optimization tool that speeds up data retrieval operations in large databases. It functions like the index of a book — helping the DBMS quickly find data without scanning the entire table.
Purpose of Indexes
Boost SELECT query performance by enabling faster search.
Improve operations using ORDER BY, GROUP BY, or join conditions.
Enforce uniqueness in columns when needed.
Drawbacks of Indexes
Consume extra disk space.
Slow down INSERT, UPDATE, and DELETE operations, since the index must be updated whenever data changes.
Types of Indexes
Primary Index:
Automatically created on the primary key. Enforces uniqueness and often serves as a clustered index.
Secondary Index:
Created on non-primary-key columns to improve lookup performance.
CREATE INDEX index_name ON Students(name);
Clustered Index:
Determines the physical order of rows in the table. Only one allowed per table.
CREATE CLUSTERED INDEX idx_rollno ON Students(RollNo);
Non-Clustered Index:
Creates a separate structure pointing to the data. Multiple non-clustered indexes can exist per table.
CREATE NONCLUSTERED INDEX idx_name ON Students(name);
Unique Index:
Ensures that all values in a column are distinct. Ideal for columns like email or username.
CREATE UNIQUE INDEX idx_email ON Users(Email);
Bitmap Index:
Best suited for columns with low cardinality (few distinct values, like gender or status). Efficient for read-heavy workloads.
Composite Index:
Created on multiple columns to speed up queries involving all of those columns.
CREATE INDEX idx_name_dob ON Students(Name, DOB);
Full-Text Index:
Optimized for text searches (e.g., LIKE, MATCH ... AGAINST). Used for searching in large text fields like blogs or articles.
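As a sketch, MySQL-style syntax looks like this (table and column names are illustrative; other systems expose full-text search differently):
-- Build the full-text index, then query with MATCH ... AGAINST
CREATE FULLTEXT INDEX idx_content ON Articles(Content);
SELECT Title FROM Articles
WHERE MATCH(Content) AGAINST('database indexing');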
Transactions and ACID Properties in DBMS
🔹 What is a Transaction?
A transaction is a sequence of one or more database operations — such as READ, WRITE, UPDATE, or DELETE — that are executed as a single unit of work. A transaction ensures that either all operations are successfully completed or none of them are. There's no in-between.
Example: Consider a bank transfer from Account A to Account B. This involves:
Debiting amount from Account A
Crediting the same amount to Account B
Both operations must succeed together. If one fails, the transaction must roll back to preserve accuracy.
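A minimal sketch of that transfer, assuming a hypothetical accounts(account_id, balance) table (transaction syntax varies slightly by DBMS, e.g., START TRANSACTION in MySQL):
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 500 WHERE account_id = 'A'; -- debit A
UPDATE accounts SET balance = balance + 500 WHERE account_id = 'B'; -- credit B
COMMIT; -- make both changes permanent
-- On any failure, issue ROLLBACK instead, undoing both updates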
Purpose of Transactions
The primary goal of transactions is to maintain data integrity and consistency, especially in multi-user environments where concurrent operations could otherwise lead to conflicts or incorrect results.
ACID Properties
ACID is an acronym that defines the four key properties every reliable transaction must satisfy to ensure correctness and stability:
1. Atomicity — “All or Nothing”
Atomicity ensures that a transaction is indivisible. It either completes entirely or does not happen at all.
If any part of the transaction fails, the entire operation is rolled back, and the database is restored to its previous consistent state.
🔹 Why it matters: Prevents partial updates (e.g., debiting Account A but failing to credit Account B).
2. Consistency — “Valid to Valid State”
Consistency guarantees that a transaction will bring the database from one valid state to another, ensuring that all rules, constraints, and integrity conditions are maintained.
🔹 Why it matters: Prevents data anomalies and ensures that no transaction violates referential integrity or constraints like UNIQUE, NOT NULL, and foreign key dependencies.
3. Isolation — “Concurrent but Independent”
Isolation ensures that transactions executing concurrently do not interfere with each other. Each transaction behaves as though it is running in isolation, unaware of the other in-progress operations.
🔹 Why it matters: Prevents problems like:
Dirty reads — reading uncommitted changes
Non-repeatable reads — data changing mid-transaction
Phantom reads — new rows appearing during execution
4. Durability — “Once Committed, It’s Forever”
Durability ensures that once a transaction is committed, the changes are permanent, even in the event of system crashes, power failures, or hardware issues.
🔹 Why it matters: Ensures that committed data is saved to non-volatile storage and can be recovered after failure.
Concurrency Control in DBMS
In a multi-user database environment, multiple transactions often run simultaneously. When these transactions try to access or modify the same data concurrently, it can lead to inconsistencies and conflicts if not managed properly.
This is where Concurrency Control comes in — it ensures that concurrent transactions execute in a way that preserves database consistency and avoids problems such as:
Lost updates
Dirty reads
Non-repeatable reads
Phantom reads
Mechanisms for Concurrency Control
Locks
Locks are used to synchronize access to data items. Before accessing a data item, a transaction must request a lock:
If granted, it proceeds.
If not, it waits until the resource is released.
🔸 Types of Locks
Shared Lock (S-Lock / Read Lock):
Allows multiple transactions to read a data item concurrently, but no transaction can write to it while the shared lock is held.
Exclusive Lock (X-Lock / Write Lock):
Grants exclusive access to a data item. Only one transaction can hold this lock, preventing others from reading or writing the same item.
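Many engines expose these locks through explicit locking reads; a sketch using the accounts example (exact behavior and syntax vary by DBMS):
-- Acquires an exclusive (write) lock on the matching row
SELECT balance FROM accounts WHERE account_id = 'A' FOR UPDATE;
-- Acquires a shared (read) lock in PostgreSQL and MySQL 8
SELECT balance FROM accounts WHERE account_id = 'A' FOR SHARE;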
Two-Phase Locking Protocol (2PL)
2PL is a concurrency control protocol that ensures serializability, the highest level of isolation. It prevents anomalies like dirty reads, lost updates, and inconsistent data by dividing the transaction into two distinct phases:
Growing Phase:
The transaction acquires all required locks. No locks are released.
Shrinking Phase:
The transaction starts releasing locks. No new locks can be acquired.
-- Transaction T1
LOCK-X(A) -- Growing Phase
READ A
WRITE A
LOCK-X(B)
WRITE B
UNLOCK(A) -- Shrinking Phase starts
UNLOCK(B)
-- Transaction T2:
LOCK-S(B) -- Must wait: T1 holds the exclusive lock on B until its shrinking phase
READ B
UNLOCK(B)
Since T1 holds an exclusive lock on B, T2 must wait until T1 releases it. This guarantees serializability.
Deadlock
A deadlock occurs when two or more transactions are waiting indefinitely for each other to release locks.
Deadlock Conditions
Mutual Exclusion — Resources can’t be shared.
Hold and Wait — A transaction holds a resource and waits for others.
No Preemption — A resource can’t be forcibly taken away.
Circular Wait — A closed chain of transactions exists, each waiting for the next.
Deadlocks can be avoided by requiring all resource requests to be declared in advance, allowing the system to avoid unsafe states.
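A classic circular-wait sketch, in the same pseudocode style as the 2PL example above:
-- Transaction T1             -- Transaction T2
LOCK-X(A)                     LOCK-X(B)
LOCK-X(B) -- waits for T2     LOCK-X(A) -- waits for T1
-- T1 waits on B, T2 waits on A: a circular wait, so neither can proceed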
Isolation Levels
Isolation levels define how visible one transaction’s changes are to others. Lower isolation means higher concurrency but greater risk of anomalies.
Common Anomalies
Dirty Read — Reading uncommitted changes from another transaction.
Non-Repeatable Read — Re-reading data returns different values due to external modifications.
Phantom Read — Re-running a query returns a different set of rows due to inserts/deletes by other transactions.
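The SQL standard names four levels (READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE), each ruling out progressively more of these anomalies. They are typically set per transaction; support and defaults vary by DBMS:
-- Strongest level: prevents dirty, non-repeatable, and phantom reads
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- Weaker levels (e.g., READ COMMITTED) allow more concurrency but admit more anomalies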
Cursors in SQL
In SQL, Cursors are special database objects that allow row-by-row processing of a result set. While most SQL operations work on entire sets of rows at once (set-based operations), there are situations where individual row-level processing is required — especially when logic involves conditional updates or sequential computation.
Example use case: Increase the salary of all employees earning less than 5000 by 10%. When custom logic or procedural steps are involved for each row, this cannot be done directly with a single set-based statement.
Example: Updating Salary Using Cursor
DECLARE
-- Declare cursor
CURSOR emp_cursor IS
SELECT emp_id, emp_name, salary
FROM employees
WHERE salary < 5000;
-- Variables to hold fetched data
v_emp_id employees.emp_id%TYPE;
v_emp_name employees.emp_name%TYPE;
v_salary employees.salary%TYPE;
BEGIN
-- Open cursor
OPEN emp_cursor;
-- Loop through each row
LOOP
-- Fetch data into variables
FETCH emp_cursor INTO v_emp_id, v_emp_name, v_salary;
-- Exit loop when no more rows
EXIT WHEN emp_cursor%NOTFOUND;
-- Update salary by 10%
UPDATE employees
SET salary = v_salary * 1.1
WHERE emp_id = v_emp_id;
END LOOP;
-- Close cursor
CLOSE emp_cursor;
-- Commit changes
COMMIT;
END;
Breakdown of the Flow
Declare Cursor:
emp_cursor selects employees whose salary is less than 5000.
Open Cursor:
Executes the query and stores the result set in memory.
Fetch & Process Rows:
Each row is fetched into variables, and a conditional operation (in this case, a salary increment) is applied.
Loop Control:
Continues until all rows are processed, using EXIT WHEN emp_cursor%NOTFOUND.
Close Cursor:
Releases memory and resources.
Commit:
Saves all updates permanently to the database.
When to Use Cursors?
Use cursors when:
You need row-level operations or complex logic that can’t be expressed using set-based SQL.
You’re performing conditional updates based on intermediate values.
You’re inside a PL/SQL block or stored procedure needing procedural control.
However, avoid overusing cursors — they are generally slower and more resource-intensive than set-based alternatives.
Drawbacks of Cursors
Performance Overhead: Row-by-row operations are slower than set-based ones.
Network Latency: Each FETCH might result in a network round trip.
Resource Locks: Can hold locks for long durations, affecting concurrency.
Scalability Issues: Cursor-heavy applications are harder to scale for large datasets.
Alternatives to Cursors
UPDATE with JOIN: Efficient for updating multiple rows based on conditions.
DELETE with JOIN: To remove rows based on relationships.
Temporary Tables / CTEs: Store and transform intermediate data without explicit loops.
Set-based SQL: Use whenever possible for faster and more optimized operations.
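For the salary example from the cursor section, the entire loop collapses into one set-based statement:
-- Equivalent to the cursor example: one statement, no row-by-row processing
UPDATE employees
SET salary = salary * 1.1
WHERE salary < 5000;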
SQL Set Operators: UNION, INTERSECT, MINUS (EXCEPT)
In SQL, set operators are used to combine the results of two or more SELECT queries. They allow you to perform set-based operations like union, intersection, and difference on query results.
Requirements for Set Operators
Before using these operators, the SELECT statements must be union-compatible, meaning:
They must return the same number of columns.
The corresponding columns must have compatible data types.
The order of columns must be the same in both queries.
UNION
The UNION operator combines the result sets of two SELECT queries and removes duplicate rows from the final output.
-- Syntax
SELECT column1, column2 FROM table1
UNION
SELECT column1, column2 FROM table2;
Use when you want to merge datasets and eliminate duplicates automatically.
UNION ALL
If you want to include duplicate rows, use UNION ALL. This is faster than UNION because it skips the duplicate-removal step.
-- Example
SELECT Name FROM Customers
UNION ALL
SELECT Name FROM Employees;
INTERSECT
The INTERSECT operator returns only the rows that are common to both SELECT statements.
-- Syntax
SELECT column1, column2 FROM table1
INTERSECT
SELECT column1, column2 FROM table2;
Not supported in some databases like MySQL, but you can simulate it using INNER JOIN or EXISTS (see the sketch after the example below).
-- Example
SELECT Name FROM Customers
INTERSECT
SELECT Name FROM Employees;
-- Returns names that exist in both tables
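In MySQL, the same result can be approximated with EXISTS; a sketch, assuming Name is the comparison column:
-- Simulating INTERSECT in MySQL
SELECT DISTINCT c.Name
FROM Customers c
WHERE EXISTS (SELECT 1 FROM Employees e WHERE e.Name = c.Name);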
MINUS / EXCEPT
The MINUS (or EXCEPT) operator returns the difference between two result sets.
It returns rows from the first query that are not present in the second.
-- Syntax (for EXCEPT in SQL Server, PostgreSQL)
SELECT column1, column2 FROM table1
EXCEPT
SELECT column1, column2 FROM table2;
🔴 In Oracle, the equivalent operator is MINUS.
-- Syntax (Oracle)
SELECT column1, column2 FROM table1
MINUS
SELECT column1, column2 FROM table2;
Functional Dependencies in DBMS
A Functional Dependency (FD) is a constraint between two sets of attributes in a relation. It describes a relationship between columns such that the value of one attribute (or set of attributes) uniquely determines the value of another attribute.
What is a Functional Dependency?
A functional dependency is denoted as:
X → Y
Where:
X is the determinant (a set of one or more attributes).
Y is the dependent attribute that is functionally determined by X.
Meaning: If two rows have the same value for X, they must also have the same value for Y.
Key Characteristics
Uniqueness:
The value of X must uniquely determine the value of Y.
Not Bi-directional:
If X → Y, it does not mean that Y → X.
Trivial Dependency:
A dependency is trivial if Y ⊆ X.
Example: {StudentID, StudentName} → StudentName is trivial because StudentName is already part of the determinant.
Non-Trivial Dependency:
If Y is not a subset of X, it's a non-trivial dependency.
Example: StudentID → StudentName
Types of Functional Dependencies
1. Full Functional Dependency
A dependency X → Y is fully functional if Y depends on the entire set X, and not on any subset of it.
Example: Consider a relation with attributes {StudentID, CourseID, Grade}. If Grade is determined by both StudentID and CourseID together, then:
{StudentID, CourseID} → Grade
This is a full functional dependency.
2. Partial Functional Dependency
A dependency is partial if the dependent attribute is determined by a part of a composite key, not the entire key.
Example: Given the dependency {StudentID, CourseID} → StudentName, if StudentID alone determines StudentName, then this is a partial dependency.
3. Transitive Functional Dependency
A dependency is transitive when one attribute depends indirectly on another through a third attribute.
Example:
If:
StudentID → Department
Department → DepartmentName
Then:
StudentID → DepartmentName
Transitive dependencies are a key reason for redundancy in data and are usually eliminated during Normalization (especially in 3NF).
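As a sketch, eliminating the transitive dependency above means splitting the relation in two (column types are illustrative):
-- DepartmentName depended on StudentID only via Department
CREATE TABLE DepartmentInfo (
    Department     VARCHAR(50) PRIMARY KEY,
    DepartmentName VARCHAR(100)
);
CREATE TABLE Student (
    StudentID  INT PRIMARY KEY,
    Department VARCHAR(50) REFERENCES DepartmentInfo(Department)
);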
Why Functional Dependencies Matter?
Functional dependencies are foundational for:
Normalization: Breaking down tables to eliminate redundancy and anomalies.
Identifying candidate keys.
Designing efficient, consistent database schemas.
CAP Theorem — The Core Trade-Off in Distributed Systems
The CAP Theorem, also known as Brewer’s Theorem, states that a distributed data system can provide only two out of the following three guarantees at any given time:
Consistency (C)
Availability (A)
Partition Tolerance (P)
You can choose two, but not all three simultaneously in the presence of network partitions.
What Do These Properties Mean?
Consistency (C):
Every read receives the most recent write or an error.
→ All nodes show the same data at the same time, no matter which node is accessed.
Availability (A):
Every request gets a non-error response, regardless of how up-to-date the data is.
→ The system remains responsive, even if it's serving stale data.
Partition Tolerance (P):
The system continues to function even when network communication fails between nodes.
→ Ensures the system can survive network partitions, which are inevitable in distributed environments.
The Trade-Offs in CAP
Since it’s impossible to achieve all three properties simultaneously during a network failure, systems must make a trade-off between them:
CP (Consistency + Partition Tolerance)
Prioritizes data correctness over availability.
If a node is isolated, it may reject requests to avoid inconsistency.
Example: MongoDB (when configured for strong consistency)
Use Case: Financial apps like banking, where accuracy is critical and it's acceptable to delay transactions during a partition.
AP (Availability + Partition Tolerance)
Prioritizes availability, even if data may temporarily be inconsistent.
Nodes continue to accept reads/writes, and consistency is eventually restored (eventual consistency).
Example: Cassandra, DynamoDB
Use Case: Social media or messaging platforms where availability is more important, and it's okay if users see different like counts briefly.
CA (Consistency + Availability)
Cannot tolerate network partitions.
Assumes a perfectly reliable network, which is unrealistic in distributed systems.
Example: PostgreSQL, in non-distributed or single-node setups.
Use Case: Internal tools or applications in a local network without partition risks.
The CAP theorem is not about picking the “best” — it’s about understanding what your application needs the most:
Need strong consistency? Be ready to sacrifice availability during partitions.
Need high uptime? Accept temporary inconsistencies.
Need both but working on a single node? You’re in the CA zone.
CAP helps you make informed architectural decisions based on the needs of your application.
Conclusion
In this third installment of our DBMS blog series, we explored several foundational yet powerful concepts that shape how databases perform in real-world applications. From understanding how views and indexes optimize query efficiency, to delving deep into transactions, concurrency control, and the ACID properties that ensure data reliability — each concept plays a critical role in designing robust and scalable systems.
We also discussed cursors for row-wise data processing, analyzed the impact of SQL set operators, and examined functional dependencies, which form the backbone of normalization. Finally, we wrapped up with the CAP theorem, a cornerstone in distributed system design that forces us to make tough but necessary architectural trade-offs.
🚀 Stay tuned for Part 4, where we'll take these concepts a step further and see how they're applied in modern database systems.
Until then, keep learning, keep querying!