Even the best-laid plans can fail, which is why disaster recovery testing is important. The primary goal of disaster recovery testing is to ensure that the solutions established in a disaster recovery plan will actually work when needed. Testing disaster recovery solutions reveals whether the fallback systems are as foolproof as required. Will connectivity to the data center and data storage resume as quickly as needed? What about access to applications and other computing resources?
Ongoing testing of a disaster recovery plan is important for another reason: a plan is only as good as the team members who carry out its disaster recovery protocols. Changes in personnel, varying skill levels among the members of the disaster recovery team, and changes to the organization's hardware and software infrastructure all affect the effectiveness of the overall plan. Regular disaster recovery testing ensures that the plan remains viable as these factors change over time.
As part of any test of disaster recovery solutions, make sure that employees have access to the disaster recovery document that outlines scope, timeline, and communications. During the test, issue an alert, follow the documented procedures, and analyze the impact on hardware, networks, software, data, and business processes. Disaster recovery testing can take many forms, including:
- Paper test: the disaster recovery team reads through the recovery plan and analyzes the viability of its policies, procedures, benchmarks, and checklists
- Walkthrough test: a group of employees walks through the recovery plan to pinpoint any issues that should be addressed and modifications that may need to be made
- Simulation: as in a fire drill, disaster recovery teams practice carrying out the plan in real life to confirm that it is sufficient
- Parallel test: failover recovery systems are tested while primary systems remain online, to ensure that each can perform real business transactions in support of key processes and applications
- Cutover test: similar to a parallel test, except that primary systems are actually disconnected during the test to confirm that fallback systems could handle full production workloads in the event of a disaster
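Whichever form a test takes, its analysis comes back to the question of whether connectivity, storage, and applications resumed as quickly as needed. The Python sketch below shows one possible way to record recovery times measured during a test and flag any component that missed its target. The `RecoveryResult` structure, the function names, and every number in it are hypothetical illustrations; real targets would come from the organization's own recovery plan.

```python
from dataclasses import dataclass


@dataclass
class RecoveryResult:
    """Outcome for one component during a test (hypothetical structure)."""
    component: str          # e.g. "data center connectivity"
    target_minutes: float   # maximum acceptable recovery time (assumed target)
    actual_minutes: float   # recovery time measured during the test


def find_missed_targets(results):
    """Return the results whose measured recovery time exceeded the target."""
    return [r for r in results if r.actual_minutes > r.target_minutes]


def summarize(results):
    """Build a short plain-text report for the post-test analysis."""
    missed = find_missed_targets(results)
    met = len(results) - len(missed)
    lines = [f"{met} of {len(results)} components met their targets"]
    for r in missed:
        overrun = r.actual_minutes - r.target_minutes
        lines.append(
            f"MISSED: {r.component} took {r.actual_minutes:g} min "
            f"({overrun:g} min over the {r.target_minutes:g} min target)"
        )
    return "\n".join(lines)


# Illustrative numbers only; they are not taken from any real test.
drill = [
    RecoveryResult("data center connectivity", 30, 22),
    RecoveryResult("data storage", 60, 75),
    RecoveryResult("applications", 120, 110),
]
print(summarize(drill))
```

Keeping a record like this from one test to the next would also make it easier to see whether changes in personnel or infrastructure have lengthened recovery times.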