Importance of Business Continuity and Disaster Recovery Planning
The goal of BCDR planning is to achieve high availability for your business applications, services, and data. For example, suppose your application is hosted in a data center located in a zone prone to natural disasters such as earthquakes, floods, or tsunamis. When such an event occurs, there is a high chance of losing data and applications permanently, or even the whole data center, with no way to recover them if the organization does not have the right BCDR plan.
To support business continuity, your company must have disaster recovery plans that cover the mixture of virtualized and physical systems at the data center.
Business Continuity and Disaster Recovery (BCDR)
Whenever systems are unavailable, your company can lose revenue. Most applications and services come with an SLA (Service Level Agreement), so the company might also face financial penalties for breaking agreements on the availability of the services it provides.
BCDR plans are documented steps that the company prepares, covering the scope and the actions to be taken when a disaster or outage happens. Each outage is assessed on its own merit. For example, a disaster recovery plan comes into action when a whole data center has a power outage, an internet outage, or a similar failure.
In the example scenario, a natural disaster such as an earthquake or tsunami has damaged the communication lines to the data center or region where your application is hosted, so your application is unusable until that region is back up.
A disaster of this size might bring services down for more than 24 hours, possibly for days, so a full BCDR plan must be invoked to get the service back online.
As part of the BCDR plan for your applications:
- Identify the recovery time objectives (RTOs)
- Identify the recovery point objectives (RPOs)
Both objectives help you determine the maximum number of hours your business can tolerate being without specified services, and what the data recovery process should be.
Recovery Time Objective (RTO)
An RTO is a measure of the maximum amount of time your business can survive after a disaster before normal service is restored. For example:
- Your RTO is 12 hours, which means that operations can continue for 12 hours without the business's core services functioning.
- If the downtime were 24 hours, your business would be seriously harmed.
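The RTO comparison above can be sketched in a few lines of Python. This is only an illustration: the function name `rto_met` and the 8-hour outage are assumptions made up for this example, while the 12-hour RTO and 24-hour downtime come from the bullets above.

```python
from datetime import timedelta


def rto_met(downtime: timedelta, rto: timedelta) -> bool:
    """Return True if service was restored within the RTO (hypothetical helper)."""
    return downtime <= rto


rto = timedelta(hours=12)                   # RTO from the example above
short_outage = timedelta(hours=8)           # assumed value, for illustration only
long_outage = timedelta(hours=24)           # downtime from the example above

print(rto_met(short_outage, rto))           # restored in time
print(rto_met(long_outage, rto))            # RTO exceeded
```

In a real plan the downtime would come from monitoring or incident records rather than hard-coded values.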
Recovery Point Objective (RPO)
An RPO is a measure of the maximum amount of data loss that's acceptable during a disaster. A business can typically decide to do a backup every 24 hours, every 12 hours, or even in real time. In case of a disaster, there is usually some data loss. For example:
- Let's say your backup runs every 24 hours, at midnight.
- A disaster happens at 9:00 AM the following day.
- So 9 hours of data would be lost.
- If your application's RPO was 12 hours, it would be fine, because only 9 hours have passed (3 hours still remaining).
- If the RPO was 4 hours, there would be a problem and the business would be damaged (5 hours over in this scenario).
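The arithmetic in this scenario can be checked with a small Python sketch. The calendar date and the function names (`data_loss_hours`, `rpo_margin_hours`) are assumptions chosen for illustration; only the midnight backup, the 9:00 AM disaster, and the 12-hour and 4-hour RPOs come from the example.

```python
from datetime import datetime


def data_loss_hours(last_backup: datetime, disaster: datetime) -> float:
    """Hours of data written since the last backup, which would be lost."""
    return (disaster - last_backup).total_seconds() / 3600


def rpo_margin_hours(last_backup: datetime, disaster: datetime, rpo_hours: float) -> float:
    """Positive: hours still remaining within the RPO. Negative: hours over the RPO."""
    return rpo_hours - data_loss_hours(last_backup, disaster)


# Hypothetical date; only the times of day matter for the example.
last_backup = datetime(2024, 1, 1, 0, 0)    # backup at midnight
disaster = datetime(2024, 1, 1, 9, 0)       # disaster at 9:00 AM

print(data_loss_hours(last_backup, disaster))        # 9.0
print(rpo_margin_hours(last_backup, disaster, 12))   # 3.0  (within the RPO)
print(rpo_margin_hours(last_backup, disaster, 4))    # -5.0 (5 hours over the RPO)
```

A negative margin is a signal that the backup interval is too long for the chosen RPO and should be shortened.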
Bonus Tips:
Whenever you are planning to host your application or services (including all components such as the frontend, web API, and database):
- You should host in two different physical regions (like East US and West US).
- That way you will have a primary and a secondary region.
- So your services can at least remain up and running if a disaster occurs in one region.
- See the diagram below for an example of a web application and related services (BCDR).
Credit: Microsoft Docs (MSDN). Figure from MSDN.
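The primary/secondary region idea from the tips above could be sketched roughly as follows. This is a minimal illustration, not how any particular cloud provider implements failover: the region names, the `pick_active_region` function, and the health-check callable are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def pick_active_region(
    regions: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first healthy region in priority order, or None if all are down."""
    for region in regions:
        if is_healthy(region):
            return region
    return None


# Hypothetical region names, listed as primary first, then secondary.
regions = ["east-us", "west-us"]

# Simulate a disaster in the primary region (assumed health data).
down = {"east-us"}
active = pick_active_region(regions, lambda r: r not in down)
print(active)  # west-us
```

In practice the health check would be a real probe of each region's endpoints, and traffic routing is usually handled by a managed DNS or load-balancing service rather than application code.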