Getting Started - Laying the Groundwork (RTOs and RPOs)
Disaster Recovery Tutorials
Written by Gareth Eagar   

The first step in developing a disaster recovery plan is to work with the business to determine the objectives for your plan. This includes determining metrics such as the Recovery Time Objective and the Recovery Point Objective for critical business processes, which you can then map to your IT environment. This tutorial gets you started with determining the objectives for your plan.


In this tutorial:

Recovery Time Objective

Recovery Point Objective


Working with the business

The first step in preparing a Disaster Recovery plan is to work with the business to determine various Business Continuity objectives. While Disaster Recovery, as we are looking at it, refers to the recovery of the systems required for the business to continue operations, it is the business that needs to determine the priorities and requirements for recovery of those systems.

In larger organizations these objectives should be contained in the Business Continuity (BC) plan, if one has already been developed. If there is no existing Business Continuity plan, then the business should develop a BC plan before the DR plan is developed.

This aspect of the planning needs to be driven by the business. In smaller organizations, the person developing the disaster recovery plan may also have to push the business to determine the recovery objectives. Even so, it is essential that the business itself sets the objectives, as they are ultimately a reflection of the organization's risk profile.

The primary values that need to be determined by the business for development of the Disaster Recovery plan are:

- The Recovery Time Objective

- The Recovery Point Objective

At this point you may be thinking that these objectives will be set for the systems that you plan to recover. While they will ultimately relate to systems, these objectives need to be specified for business processes, not systems.

When determining these values the business cannot be expected to know about any specific system – the business knows about the processes that make the business run, such as the process to take an order from a customer, the process to generate an invoice, the process to accept payment from a customer or to generate payslips for staff.

Your job as the Disaster Recovery planner is then to take each business process and map it to the underlying systems. For example, the process of accepting an order from a customer may depend on the following systems (and by systems, we include other IT-related infrastructure):

- The server running the ERP system (such as Baan or SAP)

- A server running the database that the ERP system uses (the Oracle, DB2 or MySQL server for example)

- The printer at the warehouse that prints the order detail slip

- The data line that links the production data center (where the ERP and database system sits) and the warehouse where the printer is

- The related networking infrastructure such as the LAN hubs / switches and the router for the WAN data lines

While ultimately the Disaster Recovery plan will contain procedures for individual system recovery which could be used in the event of a single system failing, you need to always plan for a wider scale disaster where multiple systems are affected simultaneously. This is why it is critical to understand the flows of data and systems involved for each business process.
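The mapping described above can be sketched in code. The following is a minimal illustration, not a prescribed method: the process names, system names and hour values are hypothetical, and the rule that each system inherits the tightest objective of any process depending on it is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessProcess:
    name: str
    rto_hours: float  # set by the business
    rpo_hours: float  # set by the business
    systems: tuple    # mapped by the Disaster Recovery planner


def strictest_objectives(processes):
    """Return {system: (rto_hours, rpo_hours)}.

    Assumption of this sketch: a system must meet the tightest RTO and
    RPO of any business process that depends on it.
    """
    result = {}
    for proc in processes:
        for system in proc.systems:
            rto, rpo = result.get(system, (float("inf"), float("inf")))
            result[system] = (min(rto, proc.rto_hours),
                              min(rpo, proc.rpo_hours))
    return result


# Hypothetical figures, for illustration only.
processes = [
    BusinessProcess(
        "accept customer order", 4, 0,
        ("erp-server", "database-server", "warehouse-printer",
         "wan-link", "lan-switches"),
    ),
    BusinessProcess(
        "generate payslips", 24, 24,
        ("erp-server", "database-server"),
    ),
]

objectives = strictest_objectives(processes)
for system, (rto, rpo) in sorted(objectives.items()):
    print(f"{system}: RTO {rto}h, RPO {rpo}h")
```

Here the ERP and database servers are shared by both processes, so they take on the tighter order-taking objectives rather than the more relaxed payroll ones.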

While it is the responsibility of the business to set the RTO and RPO, we'll look at these concepts in a bit more detail, as it is essential that you understand them, and in smaller organizations the DR planner may be the person driving the business to develop these objectives.

The Recovery Time Objective

Essentially, the Recovery Time Objective (RTO) is a measure of how long the business can survive without the systems needed to run a specific business process. This may range from zero (i.e. the underlying systems always need to be available) to days or weeks (if there are sufficient manual processes in place for the process to continue for that length of time without the systems).

While some processes may only be run at certain times – such as a payroll system generating payslips and making salary payments – these systems may still have a very low RTO, since a disaster can occur at any point in time. For example, if the payroll system became unavailable just after payday, the system may not be required again for close to 30 days. However, if a disaster occurred the day before the monthly salary run, then the system would need to be up within 24 hours.

When determining the RTO for a business process, it is important to ensure that the worst case scenario is planned for – i.e. plan for the payroll system going down just before payroll processing.
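The payroll example can be worked through as a small calculation. This sketch assumes a simplified 30-day payroll cycle and whole-day granularity; the function name and constants are illustrative only.

```python
CYCLE_DAYS = 30  # assumed length of the payroll cycle


def hours_until_next_run(days_since_last_run, cycle_days=CYCLE_DAYS):
    """Hours between a disaster and the next payroll run."""
    if not 0 <= days_since_last_run < cycle_days:
        raise ValueError("days_since_last_run must be within the cycle")
    return (cycle_days - days_since_last_run) * 24


# Disaster just after payday: the system is not needed for 30 days.
just_after_payday = hours_until_next_run(0)

# Worst case across every possible day in the cycle: the day before
# the run, which leaves only 24 hours to recover.
worst_case_hours = min(hours_until_next_run(d) for d in range(CYCLE_DAYS))

print(just_after_payday, worst_case_hours)
```

The RTO should be based on the worst-case figure, not on the average or the most convenient day.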

Another important point to consider when determining the RTO for a business process is the amount of time that would be required to catch up on lost processing time. For example, there may be a manual system that could be used to process customer orders if the IT system was down. However, once the system becomes available again, all the manual transactions would need to be captured. This would have to take place while staff continued to capture current orders, meaning that overtime or additional staff may be required to catch up on processing manual transactions.
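A rough way to estimate the catch-up time is to compare the backlog built up during the outage with the spare capture capacity available once the system is back. The sketch below assumes a constant order rate and that a manual order takes the same effort to capture as a live one; all figures are hypothetical.

```python
def catch_up_hours(outage_hours, orders_per_hour, capture_capacity_per_hour):
    """Hours needed after recovery to capture the manual backlog while
    new orders keep arriving.

    Returns None when there is no spare capacity, meaning the backlog
    would never be cleared without overtime or additional staff.
    """
    backlog = outage_hours * orders_per_hour
    spare_capacity = capture_capacity_per_hour - orders_per_hour
    if spare_capacity <= 0:
        return None
    return backlog / spare_capacity


# Hypothetical example: an 8-hour outage at 50 orders per hour, with
# staff able to capture 60 orders per hour once the system returns.
print(catch_up_hours(8, 50, 60))  # 400 orders / 10 spare per hour

# Staff already working at full capacity: the backlog cannot be cleared.
print(catch_up_hours(8, 50, 50))
```

In the first case an 8-hour outage takes 40 working hours to catch up on, which is why catch-up time belongs in the RTO discussion.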

For some systems, and in environments that run around the clock, it may become very difficult or nearly impossible to catch up on the capturing of the manual transactions, or at the very least it will add significant time to full recovery. It is therefore critical that this is considered when setting the RTO. This is also related to the other metric we need to determine – the Recovery Point Objective.

The Recovery Point Objective

The Recovery Point Objective (RPO) is a measure of how much data the business can afford to lose when a disaster occurs. The data actually lost in a disaster is effectively the difference in time between when the disaster happens and when the last backup occurred, and the RPO sets the upper limit on that window.

If a disaster occurs at 4pm and the last backup took place at 8pm the night before, then all transactions that took place in those 20 hours will be lost. Depending on the system and the organization, these transactions will most likely need to be recaptured and the business needs to plan for how the information required to recapture those transactions will be determined.
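The arithmetic in this example can be checked with a few lines of code. The calendar dates below are arbitrary, and the 24-hour RPO is a hypothetical target used only to show the comparison.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup, disaster):
    """Time between the last backup and the disaster."""
    if disaster < last_backup:
        raise ValueError("disaster cannot precede the last backup")
    return disaster - last_backup


last_backup = datetime(2024, 1, 1, 20, 0)  # 8pm the night before
disaster = datetime(2024, 1, 2, 16, 0)     # 4pm the next day

window = data_loss_window(last_backup, disaster)
rpo = timedelta(hours=24)  # hypothetical objective set by the business
meets_rpo = window <= rpo

print(window, meets_rpo)
```

With a nightly backup the window can approach 24 hours just before the next backup runs, so a backup schedule only satisfies an RPO if its longest possible gap is within the objective.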

The amount of time that would be required to recapture those transactions must also be considered.

In the case of an email server, the business may determine that losing up to 24 hours' worth of emails is acceptable, and may accept that the lost emails will never be recovered.

For a transaction system that captures a high volume of customer orders, the business may decide that it would not be possible to find the information required to recapture those orders and that it is critical that no customer order be missed and therefore that no data can be lost. This will effectively set an objective of zero loss for the RPO.

Keeping it real

While it is up to the business to set these objectives, it is up to the Disaster Recovery planner to give the business the information they require to keep these objectives real.

What is meant by this is that the business may initially tell the Disaster Recovery planner that they require an RTO and RPO of zero for a certain business process: no data loss and no time loss. It is then up to the Disaster Recovery planner to give the business an estimate of the cost of providing that level of high availability, which may cause the business to rethink and settle on a more realistic objective.

Ultimately the first reaction of most business unit heads would be to say that they cannot accept any data loss or time loss for their processes. However once they understand the cost that would be involved to provide that level of high availability cover, they may look more realistically at what plans they could put in place to manually capture transactions for a period of time while the system was unavailable and how they could recapture a certain amount of lost transactions.

In reality, there is a good chance that the first objective stated by the business may require a cost that the business cannot accept. When this happens, the business needs to seriously consider their risk profile. Some businesses will be willing to take a higher risk to keep the costs lower, while other businesses will have a lower risk profile and be able to justify the higher costs.

There are no right or wrong numbers here. What is required, though, is that the business seriously considers the implications of a disaster through proper consultation with the system users, and that they put in the effort to determine whether there are short-term alternatives to the automated IT process in the event of a disaster. Then, once the cost implications have been considered, they need to make a decision based on the organization's risk profile and be prepared to justify that decision to the organization's stakeholders in the event of a disaster (stakeholders include stockholders, staff, customers, government departments, etc.).

A final note on this topic: the board of directors of the organization must be made aware of the risk that the business is willing to accept once the RTO and RPO have been set, and before implementation of the plans that are built around these objectives. It is critical that the highest levels of the organization are aware of the risks and that they are willing to accept responsibility for the risks the business is exposed to in the event of a disaster.
