The world is an unpredictable place, and even though the online world seems as though it’s always up and running, disasters can strike. Disaster Recovery (DR) is a form of security planning that ensures an organization is as insulated as possible from the effects of a major negative event that interrupts any number of areas of operation.
What is the Definition of Disaster Recovery?
Simply put, Disaster Recovery consists of a plan to resume normal business processes after disaster strikes. A DR plan helps the organization regain access to data, networking technology infrastructure, power, connectivity, and hardware and software. While these factors play into the planning processes for cloud-based disaster recovery, a strong disaster recovery plan includes solutions for facility damage and destruction as well. As such, disaster recovery activities extend to include logistical considerations ranging from finding alternate work locations and sourcing computers for employees to transportation for the workforce.
A Disaster has Happened – Now What?
Disasters can take various forms, including hurricanes, tornadoes, and earthquakes that negatively impact daily onsite operations and prevent access to critical applications. However, disasters can also take the form of hacks that put cybersecurity at risk and even massive power outages that take the physical business location offline and block the availability of applications and networking technology. In any case, when a disaster strikes, there is a short period of time in which action must be taken to ensure the impact on normal operations is limited.
When a disaster does strike, it’s time to implement the disaster recovery plan. A good DR plan starts with a disaster recovery team of IT professionals with well-defined roles and responsibilities. The team usually consists of in-house IT professionals, a third-party IT provider, or a mix of both for companies that employ hybrid solutions. It is the goal of the disaster recovery team to understand the broader IT infrastructure at risk and the mission-critical data or other resources. This group can determine what hardware, software, and systems were impacted by a disaster and can coordinate with any outside provider of cloud-based services or applications to ensure normal operations resume as quickly as possible.
What is a Disaster Recovery Plan?
A disaster recovery plan is the specific, documented plan to respond to unplanned incidents, be they natural disasters, power outages, or cyber intrusion. A DR plan consists of step-by-step processes that minimize the impact of a disaster to help an organization recover its mission-critical functions as quickly as possible. The average disaster recovery plan includes an analysis of business processes and continuity needs.
In the process of developing a plan to recover, an organization should complete a business impact analysis (BIA) and a risk analysis (RA). These two steps serve to establish a recovery time objective (RTO) and a recovery point objective (RPO). The RTO is the maximum tolerable length of time that systems, applications, and other operations can be down. The RPO is the maximum age of the files that must be restored from backup storage in order for normal operations to resume.
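To make the two objectives concrete, here is a minimal Python sketch that records an RTO and RPO for one system and checks a single incident against them. The class name, function name, and sample figures are illustrative assumptions, not values from any standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    """Targets a BIA and RA might produce for one system (hypothetical structure)."""
    rto: timedelta  # maximum tolerable downtime
    rpo: timedelta  # maximum tolerable age of the restored data


def evaluate_incident(objectives, outage_start, service_restored, last_good_backup):
    """Return (rto_met, rpo_met) for a single outage."""
    downtime = service_restored - outage_start
    # Anything written after the last good backup and before the outage is lost.
    data_loss_window = outage_start - last_good_backup
    return downtime <= objectives.rto, data_loss_window <= objectives.rpo


# Illustrative numbers only: a 4-hour RTO and a 1-hour RPO.
objectives = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=1))
outage_start = datetime(2024, 1, 10, 9, 0)
rto_met, rpo_met = evaluate_incident(
    objectives,
    outage_start=outage_start,
    service_restored=outage_start + timedelta(hours=3),
    last_good_backup=outage_start - timedelta(hours=2),
)
print(f"RTO met: {rto_met}, RPO met: {rpo_met}")
```

In this made-up incident the service returns within the RTO, but the last backup is two hours old, so the one-hour RPO is missed. The two objectives are measured independently.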
There are various types of disaster recovery plan. These are typically tailored to a particular environment and include:
- Virtual disaster recovery plan: virtualization plans provide opportunities to respond more quickly and efficiently to disasters using virtual machines.
- Network disaster recovery plan: this type of plan focuses strictly on networks and connectivity, such as restoring network performance following a disaster.
- Cloud disaster recovery plan: these plans focus on concepts as simple as file backup or those as complex as complete replication.
- Data center recovery plan: this plan focuses extensively on data recovery, data center facilities, and technology infrastructure.
Why You Need a Disaster Recovery Plan
Planning for a disaster is something every business should do. Even the smallest data loss, such as 100 files or fewer, can result in downtime and lost revenue. However, cost is just one of the many reasons that organizations should undertake the planning process to develop a disaster response. The modern cyber environment is full of vulnerabilities for a business, from virtual outages to loss of physical infrastructure in a data center. When something such as a security measure fails and exposes a company to risk such as a loss of high availability, the possible outcomes include productivity losses, financial costs, and permanent data loss.
With the right disaster recovery solutions in place, businesses can minimize the negative impact on access to resources, connectivity to a data center, and data storage reliability.
What Components Should a Disaster Recovery Plan Include?
Planning for disaster recovery is not as simple as just authoring a document that offers guidelines for restoring access to data storage, connectivity to the data center, and access to applications. The most important thing to do is to identify the risk or risks facing regular business operations. This is the point of performing the business impact and risk analyses. Too many companies focus disaster recovery planning only on worst-case scenarios, which often results in tunnel vision that ignores the disaster that is most likely to happen. Proper planning starts by focusing the recovery plan on managing the impacts of any disaster, ensuring all stakeholders understand the situation and their roles, and restoring business continuity as quickly as possible.
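One common way to avoid worst-case tunnel vision is to weigh each risk by both its likelihood and its impact. The sketch below ranks risks by expected annual loss (probability multiplied by cost). The `Risk` class, the function name, and every figure are made up for illustration; real estimates would come from the BIA and RA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    annual_probability: float  # estimated chance of occurring in a given year (0 to 1)
    impact_cost: float         # estimated loss if the event occurs


def rank_by_expected_loss(risks):
    """Sort risks by probability x impact, highest expected loss first."""
    return sorted(
        risks,
        key=lambda risk: risk.annual_probability * risk.impact_cost,
        reverse=True,
    )


# Made-up figures for illustration only.
risks = [
    Risk("Regional earthquake", 0.01, 2_000_000),
    Risk("Extended power outage", 0.30, 150_000),
    Risk("Ransomware attack", 0.10, 600_000),
]

ranked = rank_by_expected_loss(risks)
for risk in ranked:
    expected = risk.annual_probability * risk.impact_cost
    print(f"{risk.name}: expected annual loss {expected:,.0f}")
```

With these assumed numbers, the earthquake has the largest single impact but ranks last, because the more likely events carry a higher expected loss. Expected loss is only one possible weighting; an organization may still choose to plan for rare, catastrophic events.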
The planning process includes the creation of a disaster recovery team, as referenced earlier, and creating emergency response action items. The latter lays out the roles and responsibilities of each member of the disaster recovery team. Using the information on threats and vulnerabilities gathered during the BIA and RA processes, disaster recovery planning can then focus on the following key components of the plan:
- Establishing the scope of the activity
- Gathering network infrastructure documents pertinent to the disaster at hand
- Identifying the serious threats and vulnerabilities, as well as the critical assets
- Reviewing previous unplanned incidents and outages to discover how each was handled
- Identifying the existing disaster recovery strategies
- Building an emergency response team
- Review by management
- Disaster recovery testing
- Updating the plan as needed
It is critical that companies understand that disaster recovery planning, and the plan itself, is an ongoing process. Such solutions and plans are only as good as the information that forms the foundation of knowledge. As risks evolve, the plan has to evolve to remain relevant.
Who is Responsible for the Disaster Recovery Plan?
Disaster recovery planning should involve the entire team within an organization. While upper management should have the final say in approving a plan, everyone from management to entry-level employees should understand how the disaster recovery plan impacts their roles and what responsibilities it might put at their feet in helping the company as a whole recover. When it comes to the implementation of the actual plan, this is where the disaster recovery team comes into play.
What is Disaster Recovery Testing?
Even the best-laid plans can fail. This is why disaster recovery testing is important. The primary goal of disaster recovery testing is to ensure that the solutions established in a disaster recovery plan will actually work when needed. Testing disaster recovery solutions reveals whether the fallback systems are as foolproof as required. Will connectivity to the data center and data storage resume as quickly as needed? What about access to applications and other computing resources? Ongoing testing of a disaster recovery plan is important for another reason.
A plan is only as good as the individual team members involved in implementing disaster recovery protocols. Changes in personnel, varying skill levels of individuals involved in the disaster recovery team, and changes to the hardware and software infrastructures within the organization all impact the effectiveness of the overall plan. Regular disaster recovery testing ensures that the plan remains viable as these factors change with time.
As a part of any testing of disaster recovery solutions, be sure that employees have the disaster recovery document available that outlines scope, timeline, and communications. Issue an alert, follow procedures, and analyze the impact on hardware, networks, software, data, and business processes. Disaster recovery testing can take on many forms, including:
- Paper test: the disaster recovery team reads through the recovery plan and analyzes the viability of its policies, procedures, benchmarks, and checklists
- Walkthrough test: a group of employees walks through the recovery plan to pinpoint any issues that should be addressed and modifications that may need to be made
- Simulation: similar to a fire drill in nature, disaster recovery teams practice implementing the plan in real life to ensure it is sufficient
- Parallel test: failover recovery systems are tested to ensure that each performs real business transactions to support key processes and applications
- Cutover test: this type of test is similar to a parallel test, except primary systems are actually disconnected during the test to ensure that full production workloads could be handled by fallback systems in the event of a disaster
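Parts of a simulation or parallel test can be automated. As one hedged illustration, the Python sketch below checks that a file restored from backup is byte-for-byte identical to the original by comparing SHA-256 checksums. The file names are hypothetical, and the two `shutil.copy2` calls are stand-ins for whatever backup and restore tooling an organization actually uses.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches_source(source, backup, restore_target):
    """Back up `source`, restore it to `restore_target`, and compare checksums."""
    shutil.copy2(source, backup)          # stand-in for the real backup step
    shutil.copy2(backup, restore_target)  # stand-in for the real restore step
    return sha256_of(source) == sha256_of(restore_target)


# Run the check in a throwaway directory with sample data.
with tempfile.TemporaryDirectory() as workdir:
    workdir = Path(workdir)
    source = workdir / "orders.db"
    source.write_bytes(b"sample business data")
    restore_ok = restore_matches_source(
        source, workdir / "orders.bak", workdir / "orders.restored"
    )
    print("Restore verified:", restore_ok)
```

A check like this only proves that the data survives the round trip. It does not replace the team-level tests above, which exercise people, procedures, and failover systems.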
What is the Difference Between Disaster Recovery and Business Continuity?
Disaster recovery planning and business continuity planning are similar, but are not the same concept. The two terms are often used incorrectly in an attempt to refer to the same thing. Disaster recovery planning, and testing of that plan, create specific steps an IT organization needs to take to recover the systems necessary in supporting normal business operations. On the other hand, business continuity planning lays out a broader plan of action to ensure the products and services a business offers remain available to customers at all times. This includes a BIA, RA, and a greater business continuity strategy. A disaster recovery plan can, however, be a part of the business continuity plan. The two should not be viewed as synonymous though.
What are the Differences Between Backups and Disaster Recovery?
Another commonly misinterpreted term in the disaster recovery orbit is backup. Backups and disaster recovery are related concepts, but are not the same in practice. A backup is a copy of data kept so that it can be restored later; backup methods include off-site tape and cloud storage. Disaster recovery is a broader, more comprehensive plan to address operational needs, access to applications, and connectivity to data center resources.
What are RTO and RPO in Disaster Recovery?
Recovery Time Objective (RTO) and Recovery Point Objective (RPO) were addressed briefly above, but both are too important to the planning and implementation of a disaster recovery plan to leave at that. Recovery Time Objective refers to the amount of time that technology and systems can remain down before data is recovered and normal functions resume. This varies for any given business based on its industry, but in most cases businesses focus on discovering the maximum allowable RTO before the negative impacts of a disaster grow exponentially.
Recovery Point Objective refers to the maximum amount of data, measured as time since the last backup, that the business can afford to lose. It therefore determines how frequently a system must be backed up. This is important because the disaster recovery process includes restoring data from the most recent backup point. If that point is too far in the past, vital data could be permanently lost. Based on vital operations, an RPO should be established that puts the business at the least risk of damaging, permanent data loss.
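Because a failure just before the next backup loses everything written since the previous one, the longest gap between backups bounds the worst-case data loss. The sketch below, using hypothetical function names and a made-up backup history, finds that gap and compares it with an RPO.

```python
from datetime import datetime, timedelta


def worst_backup_gap(backup_times):
    """Return the longest interval between consecutive backups."""
    ordered = sorted(backup_times)
    if len(ordered) < 2:
        raise ValueError("need at least two backups to measure a gap")
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


def schedule_meets_rpo(backup_times, rpo):
    """True if no gap between consecutive backups exceeds the RPO."""
    return worst_backup_gap(backup_times) <= rpo


# Hypothetical history: backups every six hours, then one missed run.
backups = [
    datetime(2024, 3, 1, 0, 0),
    datetime(2024, 3, 1, 6, 0),
    datetime(2024, 3, 1, 12, 0),
    datetime(2024, 3, 2, 0, 0),
]
gap = worst_backup_gap(backups)
meets_rpo = schedule_meets_rpo(backups, rpo=timedelta(hours=6))
print(f"Worst gap: {gap}, meets 6-hour RPO: {meets_rpo}")
```

Here the single missed run stretches the worst gap to twelve hours, so the schedule fails an assumed six-hour RPO even though most backups ran on time.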
What’s the Difference Between RTO and RPO?
The primary difference between RTO and RPO is the purpose of each. RTO is the broader metric of the two and focuses on the business as a whole, including all of the systems and technology involved in daily operations. RPO, on the other hand, is focused strictly on data and the company’s resiliency in the face of potential data loss.
What are the Five Major Elements of a Typical Disaster Recovery Plan?
In the face of critical systems failure during a disaster, it is important that the right elements are included in a disaster recovery plan to ensure business operations return to normal as quickly as possible. There are a lot of elements that could be included in a plan, but the following are some of the critical elements that should be included in the average disaster recovery plan to help avoid failure:
- Plan goals: what is the point of the disaster recovery plan? What systems are most critical and how will those be dealt with?
- Communication plan: this includes both an outline for effective communication between all employees and a list of role assignments so everyone is in the best position to aid in disaster recovery.
- Data continuity: systems in place to ensure that all items critical to normal operations are supported, whether that’s access to a data center or how to fulfill outgoing shipments.
- Authentication tools: make sure that the disaster recovery team has access to authentication tools, such as passwords and software licensing information to aid in the swift resumption of operations.
- Understanding of geographic risk factors: though no one can predict exactly when a disaster will strike, there are geographic risk factors (hurricanes in the Gulf of Mexico, earthquakes along the Pacific Coast) that can be planned for.
Disaster recovery is something that no business should ignore. It is important to remember in today’s virtual environment that a disaster doesn’t have to strike an organization directly to have a devastating impact. For example, if a distant data center that supports business operations goes offline, the business can still suffer. All possible threats to normal business operations need to be understood and those risks taken into account when developing a disaster recovery plan.