Know About RPO (Recovery Point Objective)

akhil kv

Welcome to this article, where I'm going to step through a concept that I think should be mandatory knowledge for any solutions architect and that is really useful for anyone else working in IT: the Recovery Point Objective, known as RPO.

RPO is defined as the maximum amount of data, generally expressed in time, that can be lost during a disaster recovery situation before that loss exceeds what the organization can tolerate.

Generally, if you're a solutions architect helping a client, the client will give you their required RPO value (often alongside its companion metric, the Recovery Time Objective or RTO). In some cases, you might need to work with key stakeholders within the business to determine an appropriate value. Either way, getting it wrong can have serious negative consequences for the business.

Let's understand RPO with a Scenario

I want you to consider an animal rescue business with animals arriving to be fostered 24/7/365. They perform intake vet exams, and that data is stored on on-premises systems that staff need to refer to constantly throughout the day. At some point, let's say 2:00 AM, a server fails. For this example, let's assume it is a single server that stores all of the organization's data and that there is no redundancy. This is a terrible situation, but it's all too common for cash-strapped charities.

If an organization tells you that it has an RPO of six hours, it means the organization cannot tolerate more than six hours of data loss when recovering from a disaster like this server failure. Different organizations will have different RPO values. Banks, logically, can tolerate almost no data loss because they deal with customer money, whereas an online store might be able to tolerate some data loss because it can, in theory, recreate orders in other ways.

Understanding how data can be lost during disaster recovery scenarios is key to understanding how to implement a given RPO requirement.

Let's consider this scenario: every six hours, starting at 3:00 PM on day 1, the business takes a full backup of the server that later fails. So normally we have a backup at 3:00 PM, one at 9:00 PM, one at 3:00 AM, and one at 9:00 AM: four backups every 24 hours, spaced six hours apart. To recover data from the failed server, we need to restore a backup, ideally the most recent one, assuming no backups have failed. Successful backups are known as recovery points. In the case of full backups, each successful backup is one recovery point.
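To make the schedule concrete, here is a minimal Python sketch that lists the recovery points produced by a full backup every six hours starting at 3:00 PM on day 1. The function name `backup_schedule` and the calendar date used for "day 1" are hypothetical; only the times of day come from the scenario, and the sketch assumes every backup succeeds.

```python
from datetime import datetime, timedelta

def backup_schedule(start: datetime, interval: timedelta, count: int) -> list[datetime]:
    """Return `count` backup times spaced `interval` apart, beginning at `start`."""
    return [start + i * interval for i in range(count)]

# Hypothetical calendar date for "day 1"; only the times of day matter here.
day1_3pm = datetime(2024, 1, 1, 15, 0)
points = backup_schedule(day1_3pm, timedelta(hours=6), count=4)

for p in points:
    day = (p.date() - day1_3pm.date()).days + 1  # day 1, day 2, ...
    print(f"day {day} {p:%H:%M}")
# day 1 15:00
# day 1 21:00
# day 2 03:00
# day 2 09:00
```

With full backups, each of these times is a recovery point on its own, provided the backup at that time actually succeeded.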

If you use full backups and incremental backups, restoring a single incremental backup (i.e., using that one recovery point) may require the most recent full backup plus every incremental backup between that full backup and the incremental you want to restore. So a recovery point may need more than one backup.

Back to our scenario: if the server fails at 2:00 AM, the data loss is the time between 2:00 AM and the most recent recovery point, in this case 9:00 PM the previous day. That represents five hours of lost data. If the failure had occurred right after the 9:00 PM backup finished, we'd have almost no data loss. If it had occurred one hour later, at 3:00 AM, before that backup completed, we would have lost six hours of data.
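The data-loss arithmetic above, and the restore chain for incremental backups, can be sketched in Python as follows. This is an illustration under stated assumptions, not a backup tool: `data_loss` and `restore_chain` are hypothetical helpers, the dates are made up, and a backup scheduled at the exact instant of the failure is treated as not having completed.

```python
from datetime import datetime, timedelta

def data_loss(recovery_points: list[datetime], failure: datetime) -> timedelta:
    """Time between the failure and the most recent recovery point before it."""
    # Strict '<': a backup scheduled at the failure instant is assumed unfinished.
    earlier = [p for p in recovery_points if p < failure]
    if not earlier:
        raise ValueError("no recovery point precedes the failure")
    return failure - max(earlier)

def restore_chain(backups: list[tuple[str, str]], target: int) -> list[tuple[str, str]]:
    """Backups needed to restore backups[target].

    `backups` is chronological, each entry (label, kind) with kind "full" or "incr".
    The chain is the latest full backup at or before the target plus every
    incremental after it, up to and including the target.
    """
    for start in range(target, -1, -1):
        if backups[start][1] == "full":
            return backups[start:target + 1]
    raise ValueError("no full backup precedes the target")

# Full backups from the scenario (hypothetical calendar dates).
points = [datetime(2024, 1, 1, 15), datetime(2024, 1, 1, 21),
          datetime(2024, 1, 2, 3), datetime(2024, 1, 2, 9)]

print(data_loss(points, datetime(2024, 1, 2, 2)))      # 5:00:00 (2:00 AM failure)
print(data_loss(points, datetime(2024, 1, 1, 21, 1)))  # 0:01:00 (just after 9:00 PM)
print(data_loss(points, datetime(2024, 1, 2, 3)))      # 6:00:00 (3:00 AM, backup unfinished)

# Hypothetical mixed schedule: one full backup followed by incrementals.
mixed = [("day 1 15:00", "full"), ("day 1 21:00", "incr"),
         ("day 2 03:00", "incr"), ("day 2 09:00", "incr")]
print([label for label, _ in restore_chain(mixed, 2)])
# ['day 1 15:00', 'day 1 21:00', 'day 2 03:00']
```

The restore chain shows why an incremental recovery point is only usable if the full backup and every incremental before it in the chain also succeeded.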

Now, the maximum loss of data for this type of scenario is the time between two successful backups.
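Because the worst case is a failure just before the next backup completes, the maximum loss equals the widest gap between consecutive successful backups. A minimal sketch, assuming a hypothetical `worst_case_loss` helper and made-up dates, shows how a single failed backup doubles that gap in the six-hour schedule.

```python
from datetime import datetime, timedelta

def worst_case_loss(successful_backups: list[datetime]) -> timedelta:
    """Largest gap between consecutive successful backups.

    A failure just before the next successful backup completes loses
    everything since the previous one, so this gap is the maximum data loss.
    """
    times = sorted(successful_backups)
    if len(times) < 2:
        raise ValueError("need at least two successful backups")
    return max(later - earlier for earlier, later in zip(times, times[1:]))

on_schedule = [datetime(2024, 1, 1, 15), datetime(2024, 1, 1, 21),
               datetime(2024, 1, 2, 3), datetime(2024, 1, 2, 9)]
# Same schedule, but the 3:00 AM backup failed and is not a recovery point.
one_failed = [t for t in on_schedule if t.hour != 3]

print(worst_case_loss(on_schedule))  # 6:00:00
print(worst_case_loss(one_failed))   # 12:00:00
```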

An RPO of six hours means, at minimum, a backup every six hours. But to cope with random backup failures, you'll generally want backups to run more frequently than strictly required: in this example, maybe once every three hours, or even once an hour. Lower RPOs generally require more frequent backups, which has historically meant higher costs for backup systems, both for media and for licensing, management overhead, and other associated processes.

So RPO is a value that is generally given to you by an organization, or one you might have to work with the organization to identify, and it states the maximum data loss, measured in time, that the business can tolerate. Different businesses will have different RPOs, and sometimes even different RPOs for different systems within a single organization. A bank might have super low RPOs for its financial systems but tolerate a much higher one for its website.
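The advice to back up more often than the RPO strictly requires can be checked with a small Python sketch. It assumes (a simplification not stated in the article) that a failed backup is simply skipped until the next scheduled run, so each consecutive failure widens the gap between successful backups by one interval; `meets_rpo` and `max_interval` are hypothetical names.

```python
from datetime import timedelta

def meets_rpo(interval: timedelta, rpo: timedelta, tolerated_failures: int = 0) -> bool:
    """True if the worst-case gap stays within the RPO despite
    `tolerated_failures` consecutive failed backups (no retries assumed)."""
    worst_gap = (tolerated_failures + 1) * interval
    return worst_gap <= rpo

def max_interval(rpo: timedelta, tolerated_failures: int = 0) -> timedelta:
    """Longest backup interval that still meets the RPO under that assumption."""
    return rpo / (tolerated_failures + 1)

rpo = timedelta(hours=6)
for hours in (6, 3, 1):
    interval = timedelta(hours=hours)
    print(f"every {hours}h: meets RPO={meets_rpo(interval, rpo)}, "
          f"survives one failure={meets_rpo(interval, rpo, tolerated_failures=1)}")
# every 6h: meets RPO=True, survives one failure=False
# every 3h: meets RPO=True, survives one failure=True
# every 1h: meets RPO=True, survives one failure=True

print(max_interval(rpo, tolerated_failures=1))  # 3:00:00
```

Under this model, three hours is the longest interval that still meets a six-hour RPO when any single backup fails. Real systems with retries or failure alerting may behave differently, so treat this as a rough sizing aid.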

Summary

  • RPO is the maximum amount of data, measured in time, that a business can afford to lose after a failure such as a server outage.

  • The worst-case data loss is the time between two successful backups, so that gap must not exceed the RPO.

  • A lower RPO requires more frequent backups, which in turn means higher costs.

  • RPO varies with each business's needs, depending on how critical its operations are.

  • Aim for the best possible RPO in the interest of the company.

Please refer to this link for RTO (Recovery Time Objective), another important term.
