Disaster Recovery and Continuous Operations Plan

Executive Summary

Reblaze has led the development and implementation of an IT-centric Disaster Recovery Plan (DRP) focused on the timely recovery of the critical IT applications and infrastructure that support its core business.

To guide this effort, a Risk Assessment and a Business Impact Analysis (BIA) were conducted to identify potential threats and to establish prioritized timeframes for recovering Reblaze’s core processes and systems.

Disaster Recovery Procedures

Reblaze has documented crisis management and recovery procedures, together with processes for ongoing DR training, testing, and maintenance.

Third Party Dependencies

Reblaze relies on cloud vendors to supply all critical IT services, such as servers and infrastructure.

Plan Maintenance

The COO will be responsible for maintaining the DR Plan by coordinating the following activities:

Regular evaluation of the Disaster Recovery Plan (at least annually) to ensure it still aligns with business needs.
Regular (annual) testing of DR procedures to measure their accuracy and efficiency.
Reviewing changes to the IT infrastructure and applications that may impact DR procedures.
Reviewing organizational changes that may impact DR team assignments and/or procedures.

Any questions regarding Reblaze’s disaster recovery process should be directed to Yaniv Yagolnitzer, Chief of Operations.

Plan Scope

Overview

Business continuity planning is the process by which organizations address unplanned business interruptions with the goal of recovering critical business functions in a timely manner. Outlined in this document are the recovery procedures utilized by the Support Team and Management. Additionally, this document identifies the personnel comprising each team and the required resources to successfully complete the response and recovery efforts.

Assumptions

This plan was designed based on the following assumptions:

The people, skills, and financial resources necessary to develop, implement, maintain, and regularly test the plan are available internally.
Reblaze has significant flexibility as to where it houses its core employees, including a work-from-home strategy.
Active client data is available offsite and is viable and recoverable.

Recovery Objectives

As outlined in the Business Impact Analysis, the following recovery time objectives (RTOs) have been established for critical business processes and support functions. (Note: the RTOs and recovery point objectives (RPOs) identified by the Reblaze management team are forward-looking; they represent what the organization is moving toward as additional technologies are applied and should not necessarily be considered Reblaze’s current capabilities.)

Disaster Recovery Strategy

The Reblaze BC/DR strategy focuses on two primary goals:

Recover the organization’s technology so that employees have the tools they need to execute core job functions.
Ensure that customer cloud environments are not disrupted.

Recovery Of System Access

Reblaze has engaged third-party cloud vendors (Google, AWS, and Azure) to house all customer and organization-supporting applications, databases, and systems. These vendors provide a high-availability environment with the requisite power redundancies (UPS and backup generators) and physical security.
Provided the key cloud systems are available, employee teams can work from a variety of locations, including from home.
An event that affects a Reblaze office location has minimal impact on the organization. Employees at that location will work from home until an alternate work location is identified.
Unanswered calls to Reblaze offices and the support team are automatically routed to an external call center that holds the cell phone numbers of on-call personnel. If an office is evacuated, calls are likewise routed automatically to this hotline.

Recovery Of Customer Environment

In case of a disaster, we need to ensure that each customer’s firewall (FW) is open to Reblaze’s backup IP addresses.
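As an illustration of the firewall check described above, the following Python sketch verifies that every backup IP address falls within at least one range the customer’s firewall allows. The function name `missing_backup_ips` and all addresses are hypothetical placeholders (taken from the RFC 5737 documentation ranges), not Reblaze’s actual backup IPs or tooling.

```python
import ipaddress


def missing_backup_ips(backup_ips, allowed_cidrs):
    """Return the backup IPs that no allowed CIDR range covers.

    Hypothetical inputs: lists of strings such as "203.0.113.10"
    and "203.0.113.0/24".
    """
    networks = [ipaddress.ip_network(cidr, strict=False) for cidr in allowed_cidrs]
    missing = []
    for ip in backup_ips:
        addr = ipaddress.ip_address(ip)
        # An address is covered if it lies inside any allowlisted network.
        # Membership tests between IPv4 and IPv6 objects return False,
        # so a mixed allowlist is handled without errors.
        if not any(addr in net for net in networks):
            missing.append(ip)
    return missing


if __name__ == "__main__":
    # Placeholder values only.
    backups = ["203.0.113.10", "198.51.100.25"]
    allowlist = ["203.0.113.0/24"]
    print(missing_backup_ips(backups, allowlist))
```

An empty result would indicate that the customer’s allowlist already covers every backup address; any returned address is one the customer would need to open before a failover.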

There are several options available for clients when a disaster happens:

Multi-zone
Multi-region
Multi-vendor

Multi-zone (default):

This is the default option. It ensures that, within a single region, servers are split across zones, with at least one server in each zone.

Multi-region:

This option ensures that if a region fails, traffic is served from a different region. That is, if an entire region of the vendor in use fails, we can quickly route traffic to another region (for example, if the entire Singapore region fails, we can quickly switch to Taiwan or Tokyo). This option can be set as always active or on demand.

Multi-vendor:

With this option, Reblaze switches to an entirely different cloud vendor (for example, from AWS to GCP or Azure, or vice versa). This option can be set as always active or on demand.
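The three options form an escalating scope of failover. The Python sketch below illustrates one way to express that escalation: stay within the current region while any zone there is healthy, move to another region of the same vendor only if multi-region is enabled, and switch vendors only if multi-vendor is enabled. The data model, names, region labels, and selection order are illustrative assumptions, not Reblaze’s actual failover logic, and the sketch does not model the always-active versus on-demand distinction.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Site:
    """Hypothetical deployment target: one zone in one region of one vendor."""
    vendor: str
    region: str
    zone: str
    healthy: bool = True


@dataclass
class DrOptions:
    """DR options a customer has enabled; multi-zone is always on by default."""
    multi_region: bool = False
    multi_vendor: bool = False


def choose_site(primary: Site, candidates: list[Site], opts: DrOptions) -> Optional[Site]:
    """Pick a healthy site, preferring the narrowest failover scope.

    Assumed order: same vendor and region (multi-zone), then the same
    vendor in another region (multi-region), then another vendor
    (multi-vendor). Returns None if no enabled option has a healthy site.
    """
    healthy = [s for s in [primary, *candidates] if s.healthy]

    # 1. Multi-zone (default): any healthy zone in the primary region.
    for s in healthy:
        if s.vendor == primary.vendor and s.region == primary.region:
            return s
    # 2. Multi-region: all primary-region zones are down, so any healthy
    #    site at the same vendor is necessarily in another region.
    if opts.multi_region:
        for s in healthy:
            if s.vendor == primary.vendor:
                return s
    # 3. Multi-vendor: any healthy site at a different vendor.
    if opts.multi_vendor:
        for s in healthy:
            if s.vendor != primary.vendor:
                return s
    return None


if __name__ == "__main__":
    primary = Site("aws", "singapore", "zone-a", healthy=False)
    others = [
        Site("aws", "singapore", "zone-b", healthy=False),
        Site("aws", "tokyo", "zone-a"),
        Site("gcp", "taiwan", "zone-a"),
    ]
    print(choose_site(primary, others, DrOptions(multi_region=True)))
```

With both Singapore zones down and only multi-region enabled, the sketch selects the Tokyo site, mirroring the Singapore-to-Tokyo example in the text.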

Crisis Management

IT Crisis Management is composed of the following processes:

Identify and communicate disaster events (Disaster Declaration).
Identify and establish the Emergency Operations Center (EOC).

Disaster Declaration

Any loss of operational capability will be considered an “Event” and should be noted and acted on immediately.

Disaster Declaration Authority

The following individuals are authorized to declare a disaster and should be the first to be notified once an event occurs. Once notified, they will initiate calls to the relevant Reblaze team members. Because Reblaze is a lean organization, a formal call tree is not required at this time; the headcount and number of required calls are manageable. The person(s) identifying the event must notify management immediately.

Because Reblaze is a small organization, there are few reasonable alternate lead options. If a disaster occurs and the primary team lead is unavailable, other Crisis Management Team (CMT) leads will take on the requisite responsibilities.

The DR Recovery Coordinator will keep a copy of this plan. A phone list of all personnel will be kept with the plan.

Disaster Declaration Decision

Using information gathered during the preliminary damage assessment, the CMT Leader should estimate the duration of the disruption. If the disruption has lasted more than one hour, is expected to last more than two hours, and the Reblaze office or one of Reblaze’s cloud environments is determined to be critically impacted, a disaster should be declared.
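The declaration rule can be read as a simple boolean check. The Python sketch below assumes the reading that the elapsed-duration threshold (over one hour) and the expected-duration threshold (over two hours) must both be met, and that either the office or a cloud environment must be critically impacted. The `Assessment` fields and the `should_declare_disaster` function are hypothetical names used only for illustration; the actual decision remains a judgment call by the CMT Leader.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """Hypothetical inputs gathered during the preliminary damage assessment."""
    elapsed_hours: float          # how long the disruption has lasted so far
    expected_total_hours: float   # CMT Leader's estimate of its total duration
    office_critically_impacted: bool
    cloud_env_critically_impacted: bool


def should_declare_disaster(a: Assessment) -> bool:
    """Return True when both duration thresholds and the impact condition hold."""
    duration_met = a.elapsed_hours > 1 and a.expected_total_hours > 2
    impact_met = a.office_critically_impacted or a.cloud_env_critically_impacted
    return duration_met and impact_met


if __name__ == "__main__":
    # A 90-minute outage expected to last 4 hours, affecting a cloud environment.
    print(should_declare_disaster(Assessment(1.5, 4.0, False, True)))
```

Writing the rule this way makes each threshold explicit, which helps when reviewing the plan annually or adjusting the thresholds.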

Responsibility for declaring a disaster rests with the CMT Leader. In the CMT Leader’s absence, the responsibility falls to the alternate CMT Leader. The main goal of this disaster recovery plan is the timely recovery of critical Reblaze IT systems. It is therefore important to declare a disaster as soon as possible so that recovery efforts can begin.

Emergency Operations Center

The Emergency Operations Center is a gathering place for the CMT to assess the events that have occurred, formulate an appropriate response, and direct the recovery. Because Reblaze is a small organization, only two EOCs will exist: (1) the Reblaze office itself and (2) a virtual EOC, run either through an established WhatsApp group (Salamandra) or a conference bridge.

Crisis Management Team

The Crisis Management Team is responsible for overseeing emergency response and recovery activities during a business interruption or disaster. The team works closely with the Disaster Assessment and Salvage, Business Resumption, and IT Recovery Teams to stabilize the situation, ensure the safety of Reblaze personnel, limit the impact of business interruptions and disasters, and recover critical business functions. The IT Crisis Management Team has oversight responsibilities for the entire response and recovery process.

Crisis Management Team Members

CMT Roles and Responsibilities

Below are descriptions of the roles and responsibilities of the CMT members. Several CMT procedures require team collaboration. These procedures include:

Conduct initial and secondary situation briefing; decide on reaction level.
Assess organization-wide impact taking into account all interdependencies.
Communicate anticipated impacts for all locations affected by outage.
Estimate downtime for key business functions.
Validate business continuity planning assumptions; implement contingency plans for invalid assumptions.
Make the decision to rebuild, restore or repair the facility or move to a new location.
Open a conference call line for the EOC.
Establish a status reporting process and disseminate instructions to all employees. Identify the frequency of status reports and how they should be submitted (e-mail, phone, WhatsApp, etc.).
Establish a specific client communication strategy.

Crisis Management Team Response Procedures

The IT Crisis Management Team will consult the following procedures during a business interruption or disaster.

Policy Management

Policies are to be reviewed annually and revised if needed. Policy revisions should be approved in writing by the COO, who will seek approval from the CEO if needed. Revisions are tracked in the Change Record and Approval Tables above. Pertinent changes should be communicated to the affected staff members within a reasonable time of the revision release.

