business-continuity

Installation
SKILL.md

Business Continuity Planning

Develop and maintain business continuity capabilities including Business Impact Analysis, communication plans, recovery procedures, and testing schedules for organizational resilience.

When to Use

  • Developing a formal Business Continuity Plan (BCP) for the organization
  • Conducting a Business Impact Analysis (BIA) to prioritize recovery efforts
  • Establishing communication plans for crisis scenarios
  • Defining recovery procedures for critical business processes
  • Scheduling and conducting BCP exercises and tests
  • Meeting compliance requirements for continuity planning (SOC 2, ISO 27001, HIPAA, FedRAMP)

BCP Framework

bcp_phases:
  1_governance:
    actions:
      - Obtain executive sponsorship and funding
      - Assign BCP coordinator and team
      - Define BCP scope and policy
      - Establish BCP committee with cross-functional representation
    deliverables:
      - BCP policy statement
      - BCP team charter and roster
      - Scope document

  2_analysis:
    actions:
      - Conduct Business Impact Analysis (BIA)
      - Perform risk assessment for continuity threats
      - Identify critical business processes and dependencies
      - Determine recovery priorities and resource requirements
    deliverables:
      - BIA report
      - Risk assessment report
      - Critical process inventory

  3_strategy:
    actions:
      - Select recovery strategies for each critical process
      - Identify alternate work arrangements (remote, alternate site)
      - Define technology recovery strategies (DR plan)
      - Establish vendor and supply chain contingencies
    deliverables:
      - Recovery strategy document
      - Technology recovery plan
      - Alternate site arrangements

  4_plan_development:
    actions:
      - Write detailed recovery procedures
      - Develop communication plans (internal and external)
      - Create emergency response procedures
      - Document roles, responsibilities, and contact information
    deliverables:
      - Business Continuity Plan document
      - Communication plan
      - Emergency response procedures
      - Contact lists and call trees

  5_testing:
    actions:
      - Develop test plan and schedule
      - Conduct exercises (tabletop, functional, full-scale)
      - Evaluate results and identify gaps
      - Update plans based on lessons learned
    deliverables:
      - Test plan
      - Exercise reports
      - Updated BCP based on findings

  6_maintenance:
    actions:
      - Review and update BCP annually (minimum)
      - Update after significant organizational changes
      - Refresh BIA when business processes change
      - Maintain training and awareness program
    deliverables:
      - Annual BCP review record
      - Updated BIA (if changes occurred)
      - Training completion records

Business Impact Analysis Template

bia_template:
  process_assessment:
    process_name: ""
    process_owner: ""
    department: ""
    description: ""

    criticality_classification:
      mission_critical:
        max_tolerable_downtime: "0-4 hours"
        description: "Failure causes immediate, severe impact to customers or revenue"
        examples:
          - Payment processing
          - Authentication and authorization
          - Core API serving customer requests
          - Order fulfillment

      essential:
        max_tolerable_downtime: "4-24 hours"
        description: "Failure causes significant degradation but not complete loss"
        examples:
          - Customer support systems
          - Reporting and dashboards
          - Email and notifications
          - Billing and invoicing

      important:
        max_tolerable_downtime: "1-3 days"
        description: "Failure causes inconvenience and workarounds are available"
        examples:
          - Internal collaboration tools
          - Analytics and BI platforms
          - HR self-service systems
          - Knowledge base

      non_essential:
        max_tolerable_downtime: "3-7 days"
        description: "Failure has minimal operational impact"
        examples:
          - Development and test environments
          - Training platforms
          - Archive systems

    impact_categories:
      financial:
        revenue_loss_per_hour: ""
        penalty_or_fine_risk: ""
        recovery_cost_estimate: ""

      operational:
        affected_employees: ""
        affected_customers: ""
        workaround_available: "yes/no"
        workaround_description: ""

      reputational:
        customer_visibility: "high/medium/low"
        media_attention_risk: "high/medium/low"
        regulatory_reporting_required: "yes/no"

      legal_regulatory:
        compliance_impact: ""
        contractual_sla_breach: "yes/no"
        sla_penalty_details: ""

    dependencies:
      technology:
        - system: ""
          rto: ""
          rpo: ""
          dr_strategy: ""
      people:
        - role: ""
          minimum_staff: ""
          remote_capable: "yes/no"
      vendors:
        - vendor: ""
          service: ""
          sla: ""
          alternative: ""
      facilities:
        - location: ""
          alternative: ""

    recovery_requirements:
      rto: ""
      rpo: ""
      minimum_recovery_level: "Description of minimum acceptable service"
      full_recovery_target: "Time to full normal operations"

Communication Plan

communication_plan:
  activation_criteria:
    - Event affecting multiple critical systems
    - Physical facility unavailable
    - Pandemic or workforce availability crisis
    - Major vendor/partner outage
    - Cybersecurity incident with operational impact

  internal_communication:
    executive_notification:
      who: "CEO, CTO, CFO, VP Engineering, VP Operations"
      when: "Within 15 minutes of BCP activation"
      method: "Phone call (primary), SMS (secondary)"
      message_template: |
        BUSINESS CONTINUITY EVENT ACTIVATED
        Incident: [Brief description]
        Impact: [Systems/processes affected]
        Status: [Current state]
        Next update: [Time]
        Bridge call: [Number/link]

    team_notification:
      who: "All affected department leads and their teams"
      when: "Within 30 minutes of BCP activation"
      method: "Slack/Teams (primary), Email (secondary), SMS (tertiary)"
      message_template: |
        BCP ACTIVATED - [Event Type]
        What happened: [Description]
        What is affected: [Systems/services]
        What to do: [Immediate actions for your team]
        Status updates: [Channel/frequency]
        Questions: Contact [BCP coordinator]

    all_staff_notification:
      who: "All employees"
      when: "Within 1 hour of BCP activation"
      method: "Email, Slack/Teams announcement, intranet"
      content: "Situation summary, impact on work, expectations"

    status_updates:
      frequency: "Every 2 hours during active event, daily after stabilization"
      channel: "Dedicated Slack channel, email distribution list"
      content: "Current status, actions taken, next steps, timeline"

  external_communication:
    customers:
      who: "Affected customers"
      when: "Within 2 hours of BCP activation (if customer-facing impact)"
      method: "Status page update, email, in-app notification"
      message_template: |
        We are currently experiencing [issue description].
        Impact: [What customers may notice]
        Status: We are actively working to resolve this.
        Updates: Follow our status page at status.example.com
        ETA: [Estimated resolution time or "investigating"]

    regulatory:
      who: "Applicable regulatory bodies"
      when: "Per regulatory requirements (e.g., 72 hours for GDPR breach)"
      method: "Formal notification per regulatory procedure"

    media:
      who: "Press inquiries"
      when: "Only if media attention occurs"
      method: "Prepared statement through communications team"
      rule: "All media inquiries routed to designated spokesperson"

    vendors_partners:
      who: "Critical vendors and business partners"
      when: "Within 4 hours if partner services affected"
      method: "Direct contact via relationship manager"

  contact_lists:
    maintenance: "Updated quarterly"
    storage: "Accessible offline (printed, mobile app, cloud-independent)"
    includes:
      - BCP team members (name, role, phone, email, alternate phone)
      - Executive team
      - Department leads
      - Key vendor contacts
      - Regulatory contacts
      - Legal counsel
      - Insurance broker
      - PR/communications firm

Recovery Procedures

recovery_procedures:
  immediate_response:
    step_1: "Incident commander assesses situation and declares BCP activation"
    step_2: "Notify BCP team and establish command structure"
    step_3: "Activate communication plan"
    step_4: "Assess damage and determine scope of disruption"
    step_5: "Initiate appropriate recovery procedures based on scenario"

  scenario_specific:
    data_center_or_region_outage:
      - Activate DR failover procedures
      - Redirect traffic to DR region
      - Verify service restoration
      - Communicate status to stakeholders
      - Plan return to primary when available

    cybersecurity_incident:
      - Engage incident response team
      - Contain the threat (isolate affected systems)
      - Assess data impact and potential breach
      - Activate forensic investigation
      - Restore from known-good backups if needed
      - Notify legal and regulatory as required

    pandemic_workforce_disruption:
      - Activate remote work procedures
      - Verify VPN and remote access capacity
      - Redistribute critical functions if staff unavailable
      - Implement shift rotations to maintain coverage
      - Assess vendor ability to maintain service levels

    key_vendor_failure:
      - Assess impact on dependent business processes
      - Activate vendor contingency plan
      - Engage alternate vendor if available
      - Implement manual workarounds as needed
      - Communicate impact to affected stakeholders

    facility_unavailable:
      - Account for all personnel safety
      - Activate alternate work site arrangements
      - Redirect mail and deliveries
      - Set up temporary communication channels
      - Assess timeline for facility restoration

  stabilization:
    - Monitor recovered services continuously
    - Address any residual issues
    - Begin planning return to normal operations
    - Continue stakeholder communication
    - Document all actions and decisions

  return_to_normal:
    - Develop return-to-normal plan
    - Execute failback procedures (if DR was activated)
    - Verify data consistency and integrity
    - Restore standard operating procedures
    - Conduct post-event review
    - Update BCP based on lessons learned

Testing Schedule and Types

testing_schedule:
  tabletop_exercise:
    frequency: "Quarterly"
    duration: "2-3 hours"
    participants: "BCP team, department leads, executive sponsor"
    format: "Facilitated discussion of a scenario"
    scenarios_to_rotate:
      - Major cloud provider region outage
      - Ransomware attack on production systems
      - Key employee unavailability (bus factor scenario)
      - Critical vendor goes out of business
      - Office building inaccessible
    output: "Exercise report with findings and action items"

  functional_exercise:
    frequency: "Semi-annually"
    duration: "4-8 hours"
    participants: "BCP team, IT operations, affected departments"
    format: "Execute specific recovery procedures without full disruption"
    examples:
      - "Activate remote work for one department for a day"
      - "Failover a non-production database and verify application connectivity"
      - "Execute communication plan and verify contact list accuracy"
      - "Restore a critical system from backup in an isolated environment"
    output: "Functional test report with measured recovery times"

  full_scale_exercise:
    frequency: "Annually"
    duration: "1-2 days"
    participants: "All BCP team members, IT, communications, management"
    format: "Simulate a major disruption and execute full recovery"
    includes:
      - "Activate BCP command structure"
      - "Execute DR failover for production systems"
      - "Activate communication plan"
      - "Operate from alternate arrangements for set period"
      - "Execute failback and return to normal"
    output: "Full exercise report with comprehensive metrics and lessons learned"

  testing_metrics:
    - "Time to activate BCP command structure"
    - "Time to complete communication notifications"
    - "Contact list accuracy (% reachable)"
    - "Actual RTO vs. target RTO per system"
    - "Actual RPO vs. target RPO per system"
    - "Number of issues identified"
    - "Number of runbook corrections needed"

BCP Maintenance Checklist

bcp_maintenance_checklist:
  quarterly:
    - [ ] Contact lists verified and updated
    - [ ] Tabletop exercise conducted
    - [ ] BCP team roster reviewed
    - [ ] Vendor contact information verified
    - [ ] Communication channels tested

  semi_annually:
    - [ ] Functional exercise conducted
    - [ ] Recovery procedures reviewed for accuracy
    - [ ] Technology dependencies verified
    - [ ] Vendor continuity capabilities confirmed

  annually:
    - [ ] Full-scale exercise conducted
    - [ ] Business Impact Analysis refreshed
    - [ ] Risk assessment updated
    - [ ] BCP document fully reviewed and updated
    - [ ] Executive review and sign-off obtained
    - [ ] Training completed for all BCP team members
    - [ ] Lessons learned from all exercises incorporated

  triggered_by_change:
    - [ ] New critical business process added
    - [ ] Major organizational restructuring
    - [ ] Technology platform migration
    - [ ] New regulatory requirement
    - [ ] Significant vendor change
    - [ ] Actual disruption event (post-event update)

Best Practices

  • Secure executive sponsorship: BCP without leadership commitment will not be taken seriously
  • Base recovery priorities on Business Impact Analysis, not assumptions or technical preferences
  • Test the communication plan independently: it fails more often than the technology recovery
  • Maintain contact lists as if your primary systems are unavailable (offline copies, mobile access)
  • Conduct tabletop exercises quarterly at minimum: they are low-cost and high-value for identifying gaps
  • Include non-IT scenarios in planning (pandemic, facility loss, key personnel unavailability)
  • Define clear activation criteria so there is no ambiguity about when to invoke the BCP
  • Keep the BCP document practical and actionable, not a shelf document written for auditors
  • Update the BCP after every significant organizational or technology change
  • Review and incorporate lessons from every exercise and every real event into the plan
Weekly Installs
31
GitHub Stars
18
First Seen
5 days ago