Important Database Concepts (Part-3)

Pratyush Pragyey
15 min read

Welcome to the third part of our DBMS blog series! In this section, we’ll delve into more advanced and performance-critical concepts that form the backbone of modern database systems. From Indexes that speed up data retrieval to Concurrency Control that ensures consistency in multi-user environments, we’ll also explore the CAP Theorem, Cursors, and Functional Dependencies—each playing a key role in efficient and reliable data management.

Views in DBMS

A View is a virtual table derived from one or more base tables through a SQL query. Unlike a regular table created using CREATE TABLE, a view does not store data physically on disk—it dynamically presents data each time it's queried. Think of it as a stored SELECT statement.

For example, when you run:

CREATE VIEW emp_details AS 
SELECT emp_id, emp_name, dept FROM employee 
WHERE dept = 'HR';

This creates a virtual table emp_details that only shows employees from the HR department, including just emp_id, emp_name, and dept—without storing any new data.

Use Case

Imagine you’re the admin of a college database. You want to give students access to their details but without exposing the full student table. Instead of granting them direct access, you can create a view that only shows specific columns or filtered data, ensuring both security and simplicity.

Purpose of Views

  • Simplicity: Predefine complex queries (joins, filters) into a view to make data access easier for users.

  • Data Abstraction: Hide complex relationships among tables behind a simplified view.

  • Security: Restrict access by exposing only specific columns/rows instead of entire tables.

Types of Views

  • Simple View:
    Based on a single table without aggregates or GROUP BY. Usually updatable.

  • Complex View:
    Involves multiple tables, aggregates, or clauses like DISTINCT, GROUP BY. Often not updatable.

  • Materialized View (Snapshot / Indexed View):
    Stores a physical copy of the query result. Great for improving performance on expensive queries but consumes disk space. Typically refreshed periodically.

Views can be read-only or updatable. Updates made through an updatable view are reflected in the base table. However, most complex views are read-only.

-- Example
CREATE VIEW CustomerOrders AS 
SELECT C.CustomerID, C.Name, O.OrderID, O.OrderDate 
FROM Customers C 
JOIN Orders O ON C.CustomerID = O.CustomerID 
WHERE O.OrderDate >= '2023-01-01';
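To make the "virtual table" behavior concrete, here is a small sketch using Python's built-in sqlite3 module (the table and sample rows are illustrative, reusing the emp_details view from above): the view stores nothing itself, so it always reflects the current base-table data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (emp_id INTEGER, emp_name TEXT, dept TEXT)")
cur.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [(1, "Asha", "HR"), (2, "Ravi", "IT"), (3, "Meera", "HR")])

# The view stores no rows of its own; its SELECT re-runs on every query.
cur.execute("""CREATE VIEW emp_details AS
               SELECT emp_id, emp_name, dept FROM employee
               WHERE dept = 'HR'""")

hr_names = [r[0] for r in cur.execute(
    "SELECT emp_name FROM emp_details ORDER BY emp_id")]
print(hr_names)  # ['Asha', 'Meera']

# A newly inserted HR row shows up in the view automatically:
cur.execute("INSERT INTO employee VALUES (4, 'Divya', 'HR')")
hr_count = cur.execute("SELECT COUNT(*) FROM emp_details").fetchone()[0]
print(hr_count)  # 3
```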

Indexes in DBMS

An Index is a performance optimization tool that speeds up data retrieval operations in large databases. It functions like the index of a book — helping the DBMS quickly find data without scanning the entire table.

Purpose of Indexes

  • Boost SELECT query performance by enabling faster search.

  • Improve operations using ORDER BY, GROUP BY, or join conditions.

  • Enforce uniqueness in columns when needed.

Drawbacks of Indexes

  • Consume extra disk space.

  • Slow down INSERT, UPDATE, and DELETE operations, since every index must be updated whenever the data changes.

Types of Indexes

  • Primary Index:
    Automatically created on the primary key. Enforces uniqueness and often serves as a clustered index.

  • Secondary Index:
    Created on non-primary-key columns to improve lookup performance.

CREATE INDEX index_name ON Students(name);

  • Clustered Index:
    Determines the physical order of rows in the table. Only one is allowed per table.

CREATE CLUSTERED INDEX idx_rollno ON Students(RollNo);

  • Non-Clustered Index:
    Creates a separate structure pointing to the data. Multiple non-clustered indexes can exist per table.

CREATE NONCLUSTERED INDEX idx_name ON Students(name);

  • Unique Index:
    Ensures that all values in a column are distinct. Ideal for columns like email or username.

CREATE UNIQUE INDEX idx_email ON Users(Email);

  • Bitmap Index:
    Best suited for columns with low cardinality (few distinct values, like gender or status). Efficient for read-heavy workloads.

  • Composite Index:
    Created on multiple columns to speed up queries that filter on all of those columns.

CREATE INDEX idx_name_dob ON Students(Name, DOB);

  • Full-Text Index:
    Optimized for text searches (e.g., LIKE, MATCH ... AGAINST). Used for searching large text fields like blogs or articles.
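As an illustrative sketch (using Python's sqlite3 and a hypothetical Students table), you can watch the query planner switch from a full table scan to an index search the moment an index exists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Students (RollNo INTEGER PRIMARY KEY, name TEXT, DOB TEXT)")
cur.executemany("INSERT INTO Students (name, DOB) VALUES (?, ?)",
                [("student%d" % i, "2000-01-01") for i in range(1000)])

query = "EXPLAIN QUERY PLAN SELECT * FROM Students WHERE name = 'student500'"

# Without an index, the planner must scan every row.
plan_before = str(cur.execute(query).fetchall())
print(plan_before)  # plan mentions SCAN

cur.execute("CREATE INDEX idx_name ON Students(name)")

# With the index, the planner seeks directly to the matching rows.
plan_after = str(cur.execute(query).fetchall())
print(plan_after)   # plan mentions USING INDEX idx_name
```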

Transactions and ACID Properties in DBMS

🔹 What is a Transaction?

A transaction is a sequence of one or more database operations — such as READ, WRITE, UPDATE, or DELETE—that are executed as a single unit of work. A transaction ensures that either all operations are successfully completed or none of them are. There's no in-between.

Example: Consider a bank transfer from Account A to Account B. This involves:

  • Debiting amount from Account A

  • Crediting the same amount to Account B

Both operations must succeed together. If one fails, the transaction must roll back to preserve accuracy.
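The transfer above can be sketched with Python's sqlite3 module; the account names and amounts are made up, and a CHECK constraint stands in for a failed debit. Because both updates run in one transaction, a failure anywhere undoes everything:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE accounts (
    name TEXT PRIMARY KEY,
    balance INTEGER CHECK (balance >= 0))""")
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 50)")
conn.commit()

def transfer(amount):
    try:
        with conn:  # begins a transaction; commits on success, rolls back on error
            # Credit first, then debit, so a failed debit demonstrates that
            # the already-applied credit is undone as well.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = 'B'",
                         (amount,))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = 'A'",
                         (amount,))
    except sqlite3.IntegrityError:
        pass  # CHECK(balance >= 0) failed: the whole transaction was rolled back

transfer(30)   # succeeds: A = 70, B = 80
transfer(500)  # would overdraw A: both updates are rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'A': 70, 'B': 80}
```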

Purpose of Transactions

The primary goal of transactions is to maintain data integrity and consistency, especially in multi-user environments where concurrent operations could otherwise lead to conflicts or incorrect results.

ACID Properties

ACID is an acronym that defines the four key properties every reliable transaction must satisfy to ensure correctness and stability:

1. Atomicity — “All or Nothing”

Atomicity ensures that a transaction is indivisible. It either completes entirely or does not happen at all.
If any part of the transaction fails, the entire operation is rolled back, and the database is restored to its previous consistent state.

🔹 Why it matters: Prevents partial updates (e.g., debiting Account A but failing to credit Account B).

2. Consistency — “Valid to Valid State”

Consistency guarantees that a transaction will bring the database from one valid state to another, ensuring that all rules, constraints, and integrity conditions are maintained.

🔹 Why it matters: Prevents data anomalies and ensures that no transaction violates referential integrity, constraints like UNIQUE, NOT NULL, or foreign key dependencies.

3. Isolation — “Concurrent but Independent”

Isolation ensures that transactions executing concurrently do not interfere with each other. Each transaction behaves as though it is running in isolation, unaware of the other in-progress operations.

🔹 Why it matters: Prevents problems like:

  • Dirty reads — reading uncommitted changes

  • Non-repeatable reads — data changing mid-transaction

  • Phantom reads — new rows appearing during execution

4. Durability — “Once Committed, It’s Forever”

Durability ensures that once a transaction is committed, the changes are permanent, even in the event of system crashes, power failures, or hardware issues.

🔹 Why it matters: Ensures that committed data is saved to non-volatile storage and can be recovered after failure.

Concurrency Control in DBMS

In a multi-user database environment, multiple transactions often run simultaneously. When these transactions try to access or modify the same data concurrently, it can lead to inconsistencies and conflicts if not managed properly.

This is where Concurrency Control comes in — it ensures that concurrent transactions execute in a way that preserves database consistency and avoids problems such as:

  • Lost updates

  • Dirty reads

  • Non-repeatable reads

  • Phantom reads
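The lost-update anomaly can be replayed deterministically in a few lines of Python; the balances are made up, and threading.Lock stands in for a database lock. Two interleaved read-modify-write sequences lose one deposit; serializing them with a lock does not:

```python
import threading

# Deterministic replay of a lost update: T1 and T2 interleave their
# read-modify-write steps on the same balance, and one deposit vanishes.
balance = 100
t1_read = balance          # T1 reads 100
t2_read = balance          # T2 reads 100, before T1 writes
balance = t1_read + 50     # T1 writes 150
balance = t2_read + 50     # T2 writes 150: T1's deposit is lost
lost_update_result = balance
print(lost_update_result)  # 150, even though two deposits of 50 were made

# With a lock, the whole read-modify-write runs as one unit, so nothing is lost.
balance = 100
lock = threading.Lock()

def deposit(times):
    global balance
    for _ in range(times):
        with lock:         # exclusive access, like an X-lock on the row
            balance += 50

threads = [threading.Thread(target=deposit, args=(10,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance)  # 100 + 2 * 10 * 50 = 1100
```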

Mechanisms for Concurrency Control

Locks

Locks are used to synchronize access to data items. Before accessing a data item, a transaction must request a lock:

  • If granted, it proceeds.

  • If not, it waits until the resource is released.

🔸 Types of Locks

  • Shared Lock (S-Lock / Read Lock)
    Allows multiple transactions to read a data item concurrently, but no transaction can write to it while the shared lock is held.

  • Exclusive Lock (X-Lock / Write Lock)
    Grants exclusive access to a data item. Only one transaction can hold this lock, preventing others from reading or writing the same item.

Two-Phase Locking Protocol (2PL)

2PL is a concurrency control protocol that ensures serializability, the highest level of isolation. It prevents anomalies like dirty reads, lost updates, and inconsistent data by dividing the transaction into two distinct phases:

  • Growing Phase:
    The transaction acquires all required locks.
    No locks are released.

  • Shrinking Phase:
    The transaction starts releasing locks.
    No new locks can be acquired.

-- Transaction T1
LOCK-X(A)         -- Growing Phase
READ A
WRITE A
LOCK-X(B)
WRITE B
UNLOCK(A)         -- Shrinking Phase starts
UNLOCK(B)

-- Transaction T2
LOCK-S(B)         -- Must wait: T1 holds an exclusive lock on B until its
                  -- shrinking phase, so T2 cannot read B before T1 finishes,
                  -- ensuring serializability
READ B
UNLOCK(B)

Since T1 holds an exclusive lock on B, T2 must wait. This guarantees serializability.

Deadlock

A deadlock occurs when two or more transactions are waiting indefinitely for each other to release locks.

Deadlock Conditions

  1. Mutual Exclusion — Resources can’t be shared.

  2. Hold and Wait — A transaction holds a resource and waits for others.

  3. No Preemption — A resource can’t be forcibly taken away.

  4. Circular Wait — A closed chain of transactions exists, each waiting for the next.

Deadlocks can be avoided by requiring all resource requests to be declared in advance, allowing the system to avoid unsafe states.
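Another common avoidance strategy is to break the circular-wait condition by always acquiring locks in a fixed global order. A minimal sketch in Python (the lock names are illustrative): both threads request the locks in opposite orders, but the helper normalizes the acquisition order, so the program always terminates.

```python
import threading

lock_a, lock_b = threading.Lock(), threading.Lock()
done = 0

def use_both(first, second):
    global done
    # Acquire locks in one global order (here: by object id) regardless of
    # the order the caller asked for -- this breaks the "circular wait"
    # condition, so a deadlock can never form.
    lo, hi = sorted((first, second), key=id)
    for _ in range(10000):
        with lo:
            with hi:
                done += 1  # critical section touching both resources

t1 = threading.Thread(target=use_both, args=(lock_a, lock_b))
t2 = threading.Thread(target=use_both, args=(lock_b, lock_a))
t1.start(); t2.start()
t1.join(); t2.join()
print(done)  # 20000 -- both threads finished; no deadlock
```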

Isolation Levels

Isolation levels define how visible one transaction’s changes are to others. Lower isolation means higher concurrency but greater risk of anomalies.

Common Anomalies

  • Dirty Read — Reading uncommitted changes from another transaction.

  • Non-Repeatable Read — Re-reading data returns different values due to external modifications.

  • Phantom Read — Re-running a query returns a different set of rows due to inserts/deletes by other transactions.

Cursors in SQL

In SQL, Cursors are special database objects that allow row-by-row processing of a result set. While most SQL operations work on entire sets of rows at once (set-based operations), there are situations where individual row-level processing is required — especially when logic involves conditional updates or sequential computation.

Example Use Case: Increase the salary of all employees earning less than 5000 by 10%.
This cannot be done with a single set-based statement when custom logic or procedural steps are involved for each row.

Example: Updating Salary Using Cursor

DECLARE
  -- Declare cursor
  CURSOR emp_cursor IS
    SELECT emp_id, emp_name, salary
    FROM employees
    WHERE salary < 5000;
  -- Variables to hold fetched data
  v_emp_id    employees.emp_id%TYPE;
  v_emp_name  employees.emp_name%TYPE;
  v_salary    employees.salary%TYPE;
BEGIN
  -- Open cursor
  OPEN emp_cursor;
  -- Loop through each row
  LOOP
    -- Fetch data into variables
    FETCH emp_cursor INTO v_emp_id, v_emp_name, v_salary;
    -- Exit loop when no more rows
    EXIT WHEN emp_cursor%NOTFOUND;
    -- Update salary by 10%
    UPDATE employees
    SET salary = v_salary * 1.1
    WHERE emp_id = v_emp_id;
  END LOOP;
  -- Close cursor
  CLOSE emp_cursor;
  -- Commit changes
  COMMIT;
END;

Breakdown of the Flow

  • Declare Cursor:
    emp_cursor selects employees whose salary is less than 5000.

  • Open Cursor:
    Executes the query and stores the result set in memory.

  • Fetch & Process Rows:
    Each row is fetched into variables. A conditional operation (in this case, salary increment) is applied.

  • Loop Control:
    Continues until all rows are processed using EXIT WHEN emp_cursor%NOTFOUND.

  • Close Cursor:
    Releases memory and resources.

  • Commit:
    Saves all updates permanently to the database.

When to Use Cursors?

Use cursors when:

  • You need row-level operations or complex logic that can’t be expressed using set-based SQL.

  • You’re performing conditional updates based on intermediate values.

  • You’re inside a PL/SQL block or stored procedure needing procedural control.

However, avoid overusing cursors — they are generally slower and more resource-intensive than set-based alternatives.

Drawbacks of Cursors

  • Performance Overhead: Row-by-row operations are slower than set-based ones.

  • Network Latency: Each FETCH might result in a network round trip.

  • Resource Locks: Can hold locks for long durations, affecting concurrency.

  • Scalability Issues: Cursor-heavy applications are harder to scale for large datasets.

Alternatives to Cursors

  • UPDATE with JOIN: Efficient for updating multiple rows based on conditions.

  • DELETE with JOIN: To remove rows based on relationships.

  • Temporary Tables / CTEs: Store and transform intermediate data without explicit loops.

  • Set-based SQL: Use whenever possible for faster and more optimized operations.
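For the salary example above, the set-based alternative is a single UPDATE statement. A minimal sketch using Python's sqlite3 (the sample data is made up) shows one declarative statement replacing the entire cursor loop:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER, emp_name TEXT, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [(1, "Asha", 4000), (2, "Ravi", 6000), (3, "Meera", 4500)])

# One declarative statement does the work of the whole cursor loop:
conn.execute("UPDATE employees SET salary = salary * 1.1 WHERE salary < 5000")

salaries = [(r[0], r[1]) for r in conn.execute(
    "SELECT emp_id, ROUND(salary, 2) FROM employees ORDER BY emp_id")]
print(salaries)  # [(1, 4400.0), (2, 6000.0), (3, 4950.0)]
```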

SQL Set Operators: UNION, INTERSECT, MINUS (EXCEPT)

In SQL, set operators are used to combine the results of two or more SELECT queries. They allow you to perform set-based operations like union, intersection, and difference on query results.

Requirements for Set Operators

Before using these operators, the SELECT statements must be union-compatible, meaning:

  • They must return the same number of columns.

  • The corresponding columns must have compatible data types.

  • The order of columns must be the same in both queries.

UNION

The UNION operator combines the result sets of two SELECT queries and removes duplicate rows from the final output.

-- Syntax
SELECT column1, column2 FROM table1
UNION
SELECT column1, column2 FROM table2;

Use when you want to merge datasets and eliminate duplicates automatically.

UNION ALL

If you want to include duplicate rows, use UNION ALL.
This is faster than UNION because it skips the duplicate removal step.

-- Example
SELECT Name FROM Customers
UNION ALL
SELECT Name FROM Employees;

INTERSECT

The INTERSECT operator returns only the rows that are common to both SELECT statements.

-- Syntax
SELECT column1, column2 FROM table1
INTERSECT
SELECT column1, column2 FROM table2;

Not supported in some databases (e.g., MySQL before version 8.0.31), but you can simulate it using INNER JOIN or EXISTS.

-- Example
SELECT Name FROM Customers
INTERSECT
SELECT Name FROM Employees;
-- Returns names that exist in both tables

MINUS / EXCEPT

The MINUS (or EXCEPT) operator returns the difference between two result sets.
It returns rows from the first query that are not present in the second.

-- Syntax (for EXCEPT in SQL Server, PostgreSQL)
SELECT column1, column2 FROM table1
EXCEPT
SELECT column1, column2 FROM table2;

🔴 In Oracle, the equivalent operator is MINUS.

-- Syntax (Oracle)
SELECT column1, column2 FROM table1
MINUS
SELECT column1, column2 FROM table2;
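All three operators can be tried directly in SQLite, which supports UNION, UNION ALL, INTERSECT, and EXCEPT. A small sketch with Python's sqlite3 (the names are made up; results are sorted for a deterministic display):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (Name TEXT);
    CREATE TABLE Employees (Name TEXT);
    INSERT INTO Customers VALUES ('Asha'), ('Ravi'), ('Meera');
    INSERT INTO Employees VALUES ('Ravi'), ('Divya');
""")

def q(sql):
    return sorted(r[0] for r in conn.execute(sql))

union     = q("SELECT Name FROM Customers UNION     SELECT Name FROM Employees")
union_all = q("SELECT Name FROM Customers UNION ALL SELECT Name FROM Employees")
common    = q("SELECT Name FROM Customers INTERSECT SELECT Name FROM Employees")
minus     = q("SELECT Name FROM Customers EXCEPT    SELECT Name FROM Employees")

print(union)      # ['Asha', 'Divya', 'Meera', 'Ravi'] -- duplicates removed
print(union_all)  # ['Asha', 'Divya', 'Meera', 'Ravi', 'Ravi']
print(common)     # ['Ravi']
print(minus)      # ['Asha', 'Meera']
```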

Functional Dependencies in DBMS

A Functional Dependency (FD) is a constraint between two sets of attributes in a relation. It describes a relationship between columns such that the value of one attribute (or set of attributes) uniquely determines the value of another attribute.

What is a Functional Dependency?

A functional dependency is denoted as:

X → Y

Where:

  • X is the determinant (a set of one or more attributes).

  • Y is the dependent attribute that is functionally determined by X.

Meaning: If two rows have the same value for X, they must also have the same value for Y.

Key Characteristics

  • Uniqueness:
    The value of X must uniquely determine the value of Y.

  • Not Bi-directional:
    If X → Y, it does not mean that Y → X.

  • Trivial Dependency:
    A dependency is trivial if Y ⊆ X.
    Example: {StudentID, StudentName} → StudentName is trivial because StudentName is already part of the determinant.

  • Non-Trivial Dependency:
    If Y is not a subset of X, it's a non-trivial dependency.
    Example: StudentID → StudentName

Types of Functional Dependencies

1. Full Functional Dependency

A dependency X → Y is fully functional if Y depends on the entire set X, and not on any subset of it.

Example: Consider a relation with attributes {StudentID, CourseID, Grade}.
If Grade is determined by both StudentID and CourseID together, then:

{StudentID, CourseID} → Grade

This is a full functional dependency.

2. Partial Functional Dependency

A dependency is partial if the dependent attribute is determined by a part of a composite key, not the entire key.

Example: In a relation with composite key {StudentID, CourseID}, if StudentID alone determines StudentName, then {StudentID, CourseID} → StudentName is a partial dependency.

3. Transitive Functional Dependency

A dependency is transitive when one attribute depends indirectly on another through a third attribute.

Example: If:

StudentID → Department  
Department → DepartmentName

Then:

StudentID → DepartmentName

Transitive dependencies are a key reason for redundancy in data and are usually eliminated during Normalization (especially in 3NF).
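Whether a dependency X → Y actually holds in a given set of rows can be checked mechanically: group the rows by their X-values and verify each group carries exactly one Y-value. A small sketch (sample data made up, reusing the attribute names from above):

```python
def holds(rows, X, Y):
    """Check whether the functional dependency X -> Y holds in `rows`:
    every distinct X-value must map to exactly one Y-value."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in X)
        y = tuple(row[a] for a in Y)
        if seen.setdefault(x, y) != y:
            return False  # same X, different Y: the dependency is violated
    return True

rows = [
    {"StudentID": 1, "Department": "CS", "DepartmentName": "Computer Science"},
    {"StudentID": 2, "Department": "EE", "DepartmentName": "Electrical Eng."},
    {"StudentID": 3, "Department": "CS", "DepartmentName": "Computer Science"},
]

print(holds(rows, ["StudentID"], ["Department"]))       # True
print(holds(rows, ["Department"], ["DepartmentName"]))  # True
# Transitivity: the two dependencies above imply StudentID -> DepartmentName.
print(holds(rows, ["StudentID"], ["DepartmentName"]))   # True
# Not bi-directional: DepartmentName does not determine StudentID.
print(holds(rows, ["DepartmentName"], ["StudentID"]))   # False
```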

Why Functional Dependencies Matter?

Functional dependencies are foundational for:

  • Normalization: Breaking down tables to eliminate redundancy and anomalies.

  • Identifying candidate keys.

  • Designing efficient, consistent database schemas.

CAP Theorem — The Core Trade-Off in Distributed Systems

The CAP Theorem, also known as Brewer’s Theorem, states that a distributed data system can provide only two out of the following three guarantees at any given time:

  1. Consistency (C)

  2. Availability (A)

  3. Partition Tolerance (P)

You can choose two, but not all three simultaneously, in the presence of network partitions.

What Do These Properties Mean?

  • Consistency (C)
    Every read receives the most recent write or an error.
    → All nodes show the same data at the same time, no matter which node is accessed.

  • Availability (A)
    Every request gets a non-error response, regardless of how up-to-date the data is.
    → The system remains responsive, even if it’s serving stale data.

  • Partition Tolerance (P)
    The system continues to function even when network communication fails between nodes.
    → Ensures the system can survive network partitions, which are inevitable in distributed environments.

The Trade-Offs in CAP

Since it’s impossible to achieve all three properties simultaneously during a network failure, systems must make a trade-off between them:

CP (Consistency + Partition Tolerance)

  • Prioritizes data correctness over availability.

  • If a node is isolated, it may reject requests to avoid inconsistency.

  • Example: MongoDB (when configured for strong consistency)

Use Case: Financial apps like banking, where accuracy is critical and it’s acceptable to delay transactions during a partition.

AP (Availability + Partition Tolerance)

  • Prioritizes availability, even if data may temporarily be inconsistent.

  • Nodes continue to accept reads/writes, and consistency is eventually restored (eventual consistency).

  • Example: Cassandra, DynamoDB

Use Case: Social media or messaging platforms, where availability is more important and it’s okay if users briefly see different like counts.

CA (Consistency + Availability)

  • Cannot tolerate network partitions.

  • Assumes a perfectly reliable network, which is unrealistic in distributed systems.

  • Example: PostgreSQL, in non-distributed or single-node setups.

Use Case: Internal tools or applications in a local network without partition risks.

The CAP theorem is not about picking the “best” — it’s about understanding what your application needs the most:

  • Need strong consistency? Be ready to sacrifice availability during partitions.

  • Need high uptime? Accept temporary inconsistencies.

  • Need both but working on a single node? You’re in the CA zone.

CAP helps you make informed architectural decisions based on the needs of your application.

Conclusion

In this third installment of our DBMS blog series, we explored several foundational yet powerful concepts that shape how databases perform in real-world applications. From understanding how views and indexes optimize query efficiency, to delving deep into transactions, concurrency control, and the ACID properties that ensure data reliability — each concept plays a critical role in designing robust and scalable systems.

We also discussed cursors for row-wise data processing, analyzed the impact of SQL set operators, and examined functional dependencies, which form the backbone of normalization. Finally, we wrapped up with the CAP theorem, a cornerstone in distributed system design that forces us to make tough but necessary architectural trade-offs.

🚀 Stay tuned for Part 4, where we’ll take these concepts a step further and see how they’re applied in modern database systems.

Until then, keep learning, keep querying!
