11/19/2024 | News release | Distributed by Public on 11/19/2024 12:37
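When planning for disaster recovery, two key availability metrics determine your recovery objectives and the maximum risk that your organization can endure after a disaster occurs. These metrics are:
Recovery point objective (RPO): the maximum amount of data loss an organization can tolerate after a disaster happens.
Recovery time objective (RTO): the maximum time allowed between a resource failure and the restoration of critical resources, processes, and systems.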
RPOs and RTOs are the metrics organizations use to determine backup and recovery objectives and how well those objectives were met after a disaster occurs.
In this article, I'll look at the roles that RTOs and RPOs play in disaster recovery (DR), high availability (HA), and business continuity (BC). All three are factors in unplanned downtime, which can be very costly for businesses.
Illustration: the direct costs of downtime, measured in millions of USD. Source: The Hidden Cost of Downtime, 2024.
RPOs and RTOs inform the business decisions that help determine the software, hardware, processes, personnel, and other resources needed to restore operations after a disaster. Both metrics measure timeframes.
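As illustrated in the timeline below, RPOs and RTOs together look backward and forward from the point in time of a disaster: the RPO looks back to the most recent point from which data can be recovered, and the RTO looks forward to the point by which operations must be restored.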
An organization may designate several different RPO and RTO metrics for different items.
RTO and RPO illustrated on a timeline, before and after a disaster occurs. (Original image source.)
As defined in our introduction, an RPO is the maximum amount of data loss an organization can tolerate after a disaster happens.
Recovery Point Objectives measure how much data can be lost, and not restored, after an incident occurs. RPOs set goals for minimizing data loss during an outage, and they tell recovery personnel which earlier states of the data (backups) must be available for restoration.
RPOs are usually expressed in minutes or hours, designating how much data, measured as time before the incident, may be lost when systems go down unexpectedly. For example, a typical RPO may state that:
Systems will be recovered with no more than 15 minutes of data loss.
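A 15-minute RPO has a direct consequence for backup scheduling. If backups (or replication snapshots) run on a fixed interval, a failure just before the next backup finishes loses roughly one full interval of data, plus the time the backup takes to complete. The minimal sketch below checks a schedule against an RPO under that simplifying assumption; the function names and the backup-duration parameter are illustrative and not taken from any particular backup product.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta, backup_duration: timedelta) -> timedelta:
    """Worst-case data loss for backups started every `interval`.

    Assumes each backup captures a point-in-time snapshot when it starts and
    becomes restorable only when it finishes `backup_duration` later. A failure
    just before the next backup finishes loses everything since the previous
    snapshot: one interval plus one backup duration.
    """
    return interval + backup_duration


def meets_rpo(interval: timedelta, backup_duration: timedelta, rpo: timedelta) -> bool:
    """True if the schedule's worst-case data loss stays within the RPO."""
    return worst_case_data_loss(interval, backup_duration) <= rpo


rpo = timedelta(minutes=15)  # "no more than 15 minutes of data loss"

# Hourly backups cannot meet a 15-minute RPO.
print(meets_rpo(timedelta(hours=1), timedelta(minutes=5), rpo))     # False
# Backups started every 10 minutes that take 5 minutes to finish just fit.
print(meets_rpo(timedelta(minutes=10), timedelta(minutes=5), rpo))  # True
```

Real schedules also have to account for failed or skipped backups, so a schedule that only just fits the RPO on paper leaves no margin.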
RPO requirements help drive system backup and disaster recovery business decisions for items such as:
Short for "recovery time objective," an RTO defines the maximum time period from when a resource failure occurs to when critical resources, processes, and systems must be restored and reactivated.
Recovery Time Objectives set a target for:
RTOs specify the desired timeframe for restoring critical resources, processes, and systems, and they should not be taken lightly. Recent high-profile ransomware attacks have shown that, without proper recovery processes, organizations can be disabled for days, if not weeks, while systems are restored.
RTOs should be determined based on organizational needs for:
You'll need to review your list of critical and non-critical resources. RPOs, RTOs, and DR/HA/BC plans should also be reviewed on a regular basis so that the metrics are updated for new apps and processes.
RTOs are expressed in minutes, hours, or days. A typical RTO may state, for example:
Critical solutions will be completely restored within four hours while non-critical solutions may be restored within three days of the incident.
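A tiered RTO like this one translates into concrete restoration deadlines once an incident starts. The sketch below assumes two hypothetical tier names that mirror the example statement and computes the latest time each tier must be back online; the incident timestamp is made up for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical tier names; the RTO values mirror the example statement above.
RTO_BY_TIER = {
    "critical": timedelta(hours=4),
    "non-critical": timedelta(days=3),
}


def restore_deadlines(incident_at: datetime) -> dict[str, datetime]:
    """Latest time by which each tier must be restored to meet its RTO."""
    return {tier: incident_at + rto for tier, rto in RTO_BY_TIER.items()}


# Hypothetical incident start time.
incident = datetime(2024, 3, 4, 9, 0)
for tier, deadline in restore_deadlines(incident).items():
    print(f"{tier}: restore by {deadline:%Y-%m-%d %H:%M}")
# critical: restore by 2024-03-04 13:00
# non-critical: restore by 2024-03-07 09:00
```

Keeping the tier-to-RTO mapping in one place makes it easy to reuse in runbooks and to update when the RTO review adds new apps or processes.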
Similar to RPOs, RTOs help drive system backup and disaster recovery strategies for items such as:
RPOs and RTOs are also useful for other purposes outside of disaster recovery planning.
SLAs and related contracts or leases. Service Level Agreements (SLAs), leases, and other contracts may contain RPO/RTO numbers for:
You may see RPO and RTO numbers appear in data center leases, cloud backup contracts, and other contract items where business risk could occur if systems or data are not available.
Alongside other availability metrics. RPOs and RTOs are also used with other availability metrics, such as Maximum Acceptable Outage (MAO), in Business Impact Analysis (BIA) planning. BIAs help organizations to:
Use RPOs and RTOs as key inputs when developing your disaster recovery, high availability, and business continuity strategies, and include them in the scripts, runbooks, and other documentation that support those strategies.
You'll also reference RPOs and RTOs in management and operational reporting for auditing and accountability, and for planning how other line-of-business functions will respond during a disaster.
When testing disaster recovery plans, it is helpful to record a third metric: Recovery Time Actual (RTA).
RTA measures the actual time it takes to activate your DR/HA/BC solution after a disaster. Unlike RPOs and RTOs, which are objectives, RTA is a measured benchmark: comparing it against your RTO shows how effective your restoration strategy is likely to be during an actual disaster recovery.
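Comparing RTA with RTO is simple arithmetic on two timestamps from a DR test. The sketch below is one way to record it; the function names and the test timestamps are hypothetical, and it assumes the system in question has a 4-hour RTO.

```python
from datetime import datetime, timedelta


def recovery_time_actual(disaster_at: datetime, recovered_at: datetime) -> timedelta:
    """RTA: elapsed time between the (simulated) disaster and recovery."""
    return recovered_at - disaster_at


def evaluate_against_rto(rta: timedelta, rto: timedelta) -> str:
    """Compare a measured RTA with the RTO objective."""
    if rta <= rto:
        return f"met RTO with {rto - rta} to spare"
    return f"missed RTO by {rta - rto}"


# Hypothetical DR test timestamps for a critical system with a 4-hour RTO.
rta = recovery_time_actual(datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 14, 30))
print(rta)                                            # 5:30:00
print(evaluate_against_rto(rta, timedelta(hours=4)))  # missed RTO by 1:30:00
```

A missed RTO in a test is a signal to revisit either the recovery procedure or the objective itself before a real disaster forces the question.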
Performing regular disaster recovery tests also helps you gauge RPO effectiveness by showing whether your backup and restore procedures can meet both your RPO and your RTO.
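On the RPO side, a DR test can measure the data-loss window it actually achieved: the time between the simulated failure and the newest backup that completed before it. The sketch below assumes `backup_times` lists only backups the test restored successfully; the function name and the backup log are hypothetical.

```python
from datetime import datetime, timedelta


def actual_data_loss(failure_at: datetime, backup_times: list[datetime]) -> timedelta:
    """Data-loss window: time since the newest backup completed before the failure.

    Assumes every entry in `backup_times` is a backup the test restored successfully.
    """
    usable = [t for t in backup_times if t <= failure_at]
    if not usable:
        raise ValueError("no restorable backup completed before the failure")
    return failure_at - max(usable)


# Hypothetical backup log and simulated failure time from a DR test.
backups = [
    datetime(2024, 3, 4, 8, 40),
    datetime(2024, 3, 4, 8, 50),
    datetime(2024, 3, 4, 9, 0),
]
loss = actual_data_loss(datetime(2024, 3, 4, 9, 12), backups)
print(loss)                           # 0:12:00
print(loss <= timedelta(minutes=15))  # True: within a 15-minute RPO
```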