Concurrency Problems in Database Systems
Before beginning, it's important to introduce some abbreviations that will be used:
R(A): This refers to reading the value of A from the database.
W(A): This refers to writing or updating the value of A with a new value.
What is Concurrency?
It's the ability of a system to execute multiple tasks at the same time.
How can that lead to problems in Database Systems?
These problems occur when multiple users or processes try to access and modify the same data at the same time. To better understand, let's explore some scenarios.
Suppose there is a bank account that has an initial balance of $500.
There are 2 user requests on the same account at the same time.
User "A" wants to deposit $100 in the account.
User "B" wants to withdraw $50 from the account.
The first step for both requests is to read the value of the account balance.
suppose that User "A" reads it then User "B" read it.
both of them now read $500 (The account balance).
User "A" wants to deposit $100, so it sends a request to the database to update the balance to $600.
User "B" wants to withdraw $50, so it sends a request to the database to update the balance to $450.
There are 2 scenarios here.
if User A's update is processed first, the balance will be updated to $600, and then User B's update will be processed, updating the balance to $450. now the final value in the database is $450.
if User B's update is processed first, the balance will be updated to $450, and then User A's update will be processed, updating the balance to $600. now the final value in the database is $600.
Since the true value of the balance after both requests are done should be $550. but in both scenarios the stored value is wrong!
Let's establish some context before we can discuss the conflicts.
Every transaction consists of three main operations:
BEGIN -> beginning the transaction.
operate -> performing the desired operations.
COMMIT or ABORT -> means failed or success transaction.
COMMIT and ABORT are the outcomes (return) of a transaction in Database Systems.
If COMMIT: all the transaction modifications are saved to the Database.
If ABORT: all transaction changes are undone. it's like the transaction never happened.
That's called Automicity: all actions in the transaction happen or none happen. "All or Nothing"
Let's take an example but first, let's define what does transaction mean.
A transaction is a sequence of read-and-write operations. ex: one transaction might involve reading a database attribute, updating it, and then reading it again. All of these operations are performed as a single entity. Why?
Let's say that there is a transaction that contains transferring $100 from account A to account B.
Here are the transaction's operations one by one.
R(A), W(A), R(B), W(B)
BEGIN -> That means transaction starts.
R(A) -> suppose that's done successfully.
A = A-100 -> decrease A by $100.
W(A) -> suppose that updating A also successfull.
** suppose now there is a power failure happened **
ABORT or COMMIT??
Now the system should ABORT this transaction.but before
aborting it. it should (and will) undo the operations done.
like updating A. because the transaction failed so User A
money should be returned to him.
And That's the reason to be "All or Nothing"
Let's Identify the conflicting operations that might happen:
- Write-Read Conflicts.
suppose there are a 2 transactions (T1 and T2) on attribute
"A" at the same time
let's say that the system schedules them like that:
T1 T2
BEGIN
R(A) -> $10
W(A) -> $20
BEGIN
R(A) -> $20
A = A * 2
W(A) -> $40
COMMIT
ABORT
Here transaction T1 read $10 then updated it to $20. but
before it commits. transaction T2 Read "A" $20 and updated
it with it's doubled value $40. then T1 ABORTED the
transaction for any reason power failure or deadlock
detection ( we will discuss it later). so T2 has read an
aborted value and doubled it. and the correct answer for
it was to read $10 and double it to $20. That's
a Write-Read conflict.
Write-Write conflicts:
supppose there are a 2 transactions (T1 and T2) on attribute
"A" and "B" at the same time
let's say that the system schedules them like that:
T1 T2
BEGIN
W(A) -> $10
BEGIN
W(A) -> $20
W(B) -> Diaa
COMMIT
W(B) -> Ahmed
COMMIT
Here the final value of the attributes will depend on the
schedule. based on that schedule the final values will
be A = $20 and B = "Ahmed"
That's a conflict because $20 came from T2. but Ahmed came
from T1. If the user which requested T2 Read A and B again
he will be confused! A is his value but B is the other one!.
You may wonder why we don't execute one transaction and then the other, rather than running them simultaneously. The reason for this is that doing so would ignore the benefits of multitasking and reduce parallelism, ultimately leading to decreased performance. By running transactions in parallel, we can take advantage of available resources and increase efficiency.
Conclusion
Concurrency problems can occur when multiple users or processes attempt to access and manipulate the same data simultaneously. These problems can include write-read conflicts, write-write conflicts, and other types of conflicts that can lead to inconsistent or incorrect results. Concurrency problems can be difficult to detect and can cause significant issues in systems that rely on accurate and up-to-date data. It is important to be aware of the potential for concurrency problems. There are various techniques and solutions available to address concurrency problems. These solutions include locking mechanisms, optimistic concurrency control, transaction isolation levels, and other techniques. we can explore these solutions in more detail in the next articles.
Subscribe to my newsletter
Read articles from Diaa Badr directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by