Disaster Recovery Documentation Overview
Disaster Recovery Tutorials
Written by Gareth Eagar   

Once you have selected a Disaster Recovery strategy and laid the foundations for Disaster Recovery in your organization, you need to put together the documentation that will make up your Disaster Recovery Plan. For all but the smallest organizations this will be more than one document, and these documents will need to be reviewed and updated regularly. In this tutorial we give an overview of the DR documentation you'll be putting together.

What's in a DR plan?

The documentation that will form your Disaster Recovery Plan (DRP) is going to consist of multiple documents that all need to be 'living' documents - that is, they can't be documents that you create once and forget about. These documents should be almost continually changing in line with changes in your IT environment.

These DR documents will also need to be managed carefully, which includes both controlling changes to the documents and storing them in such a way that they are easily available to a remote recovery team at the time of a disaster incident. We'll cover management of DR documents in detail in a separate tutorial.

The following are the primary types of documents that you'll need to create:

The Master Plan

The Master Plan is effectively going to be your WAR plan - it's not going to be a technical document but rather it will guide you through the recovery process, all the way from declaring a disaster to eventually moving out of recovery mode and back to normal production operations. This document will reference all the other Disaster Recovery documentation that you create.

Recovery Site Guide

The objective of the Recovery Site Guide is to provide all relevant information relating to your recovery site (whether it is your own site or a commercial DR site) to the people who will be attending the recovery site. This will include a diagram showing the layout of the recovery site, contact details for recovery site staff, extension lists for the phone system, health and safety information for the site, access control and security details, a list of restaurants, hotels and shops close to the recovery site, etc.

It is important that this document is made available to everyone who will be attending the recovery site (including users who will be based at the recovery site during a disaster event, end users who will be involved with application testing on-site, auditors who will be monitoring test events, etc.).

Technical Recovery Procedures

You will have multiple technical recovery procedure documents, which will be the detailed technical procedures that will guide your system administrators through the process of rebuilding your IT infrastructure or switching over to your backup site.

These documents will either be for a single system, or for a small group of systems (such as a dependency group). It is important to ensure that these procedures do not become bloated by attempting to cover too many systems in a single document. As some notes will be made in the documentation during recovery events, it is important to avoid a situation where each member of a recovery team has a copy of the same document even though they are each recovering different systems. It is far cleaner to have each member of the recovery team work with documentation specific to the system they are recovering at that point in time.

The recovery procedures should cover all the technical steps required to recover your systems or to switch over to the stand-by systems, as well as the steps to perform basic verification of the recovery (this is not detailed system testing such as ensuring data integrity, but more along the lines of ensuring that services which should start do so, that users can connect and log in to the system, that the databases are consistent and available, etc.).
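The basic verification described above lends itself to a small script that the recovery team can run at the end of a procedure. The following is a minimal sketch only, not a prescribed tool: the function names, the host name passed in and the port numbers (22 for SSH, 5432 for a PostgreSQL database) are assumptions made for illustration, and your own procedures would list the services that actually matter for each system.

```python
import socket
from typing import Callable, Dict


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def run_checks(checks: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run each named check; a check that raises counts as a failure."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


def format_report(results: Dict[str, bool]) -> str:
    """Render one PASS/FAIL line per check, in the order the checks were given."""
    return "\n".join(
        f"[{'PASS' if ok else 'FAIL'}] {name}" for name, ok in results.items()
    )


def build_checks(host: str) -> Dict[str, Callable[[], bool]]:
    """Hypothetical check list for one recovered system (ports are assumptions)."""
    return {
        "users can reach SSH": lambda: port_is_open(host, 22),
        "database is accepting connections": lambda: port_is_open(host, 5432),
    }
```

A recovery team member would call `print(format_report(run_checks(build_checks(host))))` with the address of the recovered system and note the result in their copy of the procedure. A reachable port only shows that a service is listening; it does not replace the detailed application testing carried out later by end users.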

Support / Reference Guides

The Technical Recovery Procedures discussed in the previous section should generally contain procedures only, not other supporting documentation such as system configuration information, license key details, support contact information, etc.

All information that is common to multiple systems (such as details of hardware and software support, agent license keys, etc.) should be contained in a separate document. If this common information is repeated in the procedures for each system, you will have to update it in multiple places whenever it changes, and you run the risk of missing an update and ending up with contradictory information across documents.

In certain cases, the support and reference guide may even contain some recovery steps that are common to all systems (such as the installation of the backup agent). This is a personal choice - some people prefer all technical steps needed to recover a system to be in a single document, while others prefer to update common procedures in only one place. As a general rule though, the team responsible for recovering a system should not have to refer to more than two documents during recovery.

For example, the DR project lead may give each recovery team two documents - one would be the Technical Recovery Procedures document specifically relating to the system (or dependency group of systems) they are recovering, and the other would be a copy of the support and reference guide that applies to all systems. The Technical Recovery Procedures may then refer the recovery team to the support and reference guide for instructions on how to install the backup agent (which is common to all systems), for getting license key information and for getting information on additional support contacts (hardware and software support).
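If your DR documents are tracked in a structured form, the relationship described above can be checked mechanically. The sketch below is one possible way to model it, under stated assumptions: the class names, the section keys and the placeholder contents are all hypothetical. Each procedure lists the sections of the shared guide it points to, and a helper reports any reference that the guide does not actually contain.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReferenceGuide:
    """Shared support and reference guide: one copy of each common item."""
    sections: Dict[str, str] = field(default_factory=dict)


@dataclass
class RecoveryProcedure:
    """Technical Recovery Procedures for one system or dependency group."""
    system: str
    steps: List[str]
    guide_refs: List[str] = field(default_factory=list)  # section keys in the guide


def missing_refs(proc: RecoveryProcedure, guide: ReferenceGuide) -> List[str]:
    """Return the guide sections the procedure points to that do not exist."""
    return [ref for ref in proc.guide_refs if ref not in guide.sections]


# Hypothetical example data; the contents are placeholders, not real details.
guide = ReferenceGuide(sections={
    "backup-agent-install": "<common installation steps maintained here>",
    "license-keys": "<license key details maintained here>",
})
procedure = RecoveryProcedure(
    system="example-system",
    steps=["Rebuild the server", "Install the backup agent (see guide)"],
    guide_refs=["backup-agent-install", "support-contacts"],
)

# "support-contacts" is referenced but not yet written in the guide.
print(missing_refs(procedure, guide))
```

Running a check like this whenever either document changes helps catch a procedure that sends the recovery team to a section of the guide that has been renamed or removed.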
