
disaster-recovery

npx machina-cli add skill BagelHole/DevOps-Security-Agent-Skills/disaster-recovery --openclaw

Disaster Recovery

Implement disaster recovery strategies and procedures.

DR Metrics

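recovery_metrics:
  RTO:
    name: Recovery Time Objective
    definition:
      - Maximum acceptable downtime
      - How long it may take to restore service
  RPO:
    name: Recovery Point Objective
    definition:
      - Maximum acceptable data loss
      - How much data can be lost, measured as the window of writes before the failure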
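To make the two definitions concrete, here is a minimal Python sketch that measures the RTO and RPO actually achieved in a test or incident and compares them with targets. It assumes you can determine three timestamps: when the failure occurred, when the last recoverable data point was captured, and when service was restored. The function names and example times are hypothetical.

```python
from datetime import datetime, timedelta


def achieved_rpo(failure_at: datetime, last_recovery_point: datetime) -> timedelta:
    """Data-loss window: writes made after the last recoverable point are lost."""
    return failure_at - last_recovery_point


def achieved_rto(failure_at: datetime, service_restored_at: datetime) -> timedelta:
    """Downtime window: time from the failure until service is restored."""
    return service_restored_at - failure_at


def meets_objectives(failure_at: datetime, last_recovery_point: datetime,
                     service_restored_at: datetime,
                     rto_target: timedelta, rpo_target: timedelta) -> dict:
    """Compare measured RTO/RPO against the agreed targets."""
    rpo = achieved_rpo(failure_at, last_recovery_point)
    rto = achieved_rto(failure_at, service_restored_at)
    return {
        "rpo": rpo,
        "rto": rto,
        "rpo_met": rpo <= rpo_target,
        "rto_met": rto <= rto_target,
    }


if __name__ == "__main__":
    # Hypothetical drill: 5 minutes of data lost, service back after 45 minutes.
    failure = datetime(2024, 1, 10, 12, 0)
    print(meets_objectives(
        failure_at=failure,
        last_recovery_point=failure - timedelta(minutes=5),
        service_restored_at=failure + timedelta(minutes=45),
        rto_target=timedelta(hours=1),
        rpo_target=timedelta(minutes=15),
    ))
```

Recording these measured values after each drill is what the "Measure actual RTO/RPO" checklist item calls for.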

DR Strategies

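| Strategy          | RTO              | RPO       | Relative cost |
|-------------------|------------------|-----------|---------------|
| Backup & Restore  | Hours            | Hours     | $             |
| Pilot Light       | Minutes to hours | Minutes   | $$            |
| Warm Standby      | Minutes          | Seconds   | $$$           |
| Multi-Site Active | Near-zero        | Near-zero | $$$$          |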
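Choosing a strategy from the table amounts to picking the cheapest option whose recovery bounds satisfy both targets. The Python sketch below encodes that rule. The table is qualitative, so the numeric bounds here are illustrative assumptions rather than AWS figures; replace them with values measured for your own workload. `Strategy`, `STRATEGIES`, and `cheapest_strategy` are hypothetical names.

```python
from datetime import timedelta
from typing import NamedTuple, Optional


class Strategy(NamedTuple):
    name: str
    worst_rto: timedelta  # assumed upper bound on recovery time
    worst_rpo: timedelta  # assumed upper bound on data loss
    relative_cost: int    # 1 = $ ... 4 = $$$$, as in the table


# Illustrative bounds only (assumptions), loosely following the table's ranges.
STRATEGIES = [
    Strategy("Backup & Restore", timedelta(hours=24), timedelta(hours=24), 1),
    Strategy("Pilot Light", timedelta(hours=4), timedelta(minutes=15), 2),
    Strategy("Warm Standby", timedelta(minutes=30), timedelta(seconds=30), 3),
    Strategy("Multi-Site Active", timedelta(minutes=1), timedelta(seconds=1), 4),
]


def cheapest_strategy(rto_target: timedelta, rpo_target: timedelta) -> Optional[Strategy]:
    """Return the lowest-cost strategy whose assumed bounds meet both targets, or None."""
    candidates = [
        s for s in STRATEGIES
        if s.worst_rto <= rto_target and s.worst_rpo <= rpo_target
    ]
    return min(candidates, key=lambda s: s.relative_cost, default=None)
```

Under these assumed bounds, an 8-hour RTO with a 1-hour RPO selects Pilot Light. Tightening the RTO to 1 hour and the RPO to 1 minute moves the choice to Warm Standby. A `None` result means no strategy, as modelled, meets the targets.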

AWS Multi-Region

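# Cross-region RDS read replica, created in the DR region (us-west-2).
# A cross-region source must be given as its ARN, not its bare identifier;
# replace 123456789012 with your AWS account ID.
aws rds create-db-instance-read-replica \
  --db-instance-identifier dr-replica \
  --source-db-instance-identifier arn:aws:rds:us-east-1:123456789012:db:prod-db \
  --source-region us-east-1 \
  --region us-west-2

# S3 cross-region replication (versioning must already be enabled on both
# the source and the destination bucket)
aws s3api put-bucket-replication \
  --bucket source-bucket \
  --replication-configuration file://replication.json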
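The `put-bucket-replication` call reads its rules from `replication.json`, which the snippet does not show. This minimal Python sketch generates one plausible version: a single enabled rule that replicates every object (empty prefix filter) to a destination bucket. The IAM role ARN and destination bucket name are hypothetical placeholders. The role must already grant S3 permission to replicate on your behalf.

```python
import json


def build_replication_config(role_arn: str, destination_bucket: str,
                             rule_id: str = "dr-replicate-all") -> dict:
    """Build an S3 replication configuration that copies all objects to one bucket."""
    return {
        "Role": role_arn,  # IAM role S3 assumes to replicate objects
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": ""},  # empty prefix matches every object
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


if __name__ == "__main__":
    config = build_replication_config(
        role_arn="arn:aws:iam::123456789012:role/s3-replication-role",  # hypothetical
        destination_bucket="dr-bucket-us-west-2",  # hypothetical
    )
    print(json.dumps(config, indent=2))
```

Redirecting the printed output to `replication.json` produces the file the CLI command references.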

DR Testing

dr_test_schedule:
  tabletop: Quarterly
  component_failover: Monthly
  full_failover: Annually

test_checklist:
  - Verify backup integrity
  - Test failover procedures
  - Validate data consistency
  - Measure actual RTO/RPO
  - Document lessons learned
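One common way to cover "Verify backup integrity" is to record a SHA-256 checksum when each backup is written and recompute it during the test. The Python sketch below does that against a hypothetical manifest that maps backup file names to expected digests. A matching checksum only shows the file is unchanged since the manifest was written. Restoring it is still needed to validate data consistency.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large backups are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backups(manifest: Dict[str, str], backup_dir: Path) -> List[str]:
    """Return names of backups that are missing or whose checksum does not match."""
    failures = []
    for name, expected in manifest.items():
        path = backup_dir / name
        if not path.is_file() or sha256_of(path) != expected:
            failures.append(name)
    return failures
```

An empty list means every backup in the manifest is present and unaltered. Any returned name should fail the DR test.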

Source

https://github.com/BagelHole/DevOps-Security-Agent-Skills/blob/main/compliance/continuity/disaster-recovery/SKILL.md

Overview

This skill guides the implementation of disaster recovery strategies and runbooks to meet RTO and RPO targets. It covers DR metrics, strategy options, AWS multi-region options, testing, and maintaining up-to-date runbooks for business continuity.

How This Skill Works

Define DR metrics (RTO and RPO) to establish acceptable downtime and data loss. Choose a DR strategy (Backup & Restore, Pilot Light, Warm Standby, or Multi-Site Active) and configure backups, replication, and automated failover. Regularly run DR tests and update runbooks based on lessons learned.
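Automated failover usually hinges on a health-check rule that decides when the primary is really down. This minimal Python sketch fails over only after a configurable number of consecutive failed checks, which filters out single transient errors at the cost of a slightly longer detection time (and therefore RTO). Route 53 health checks expose a comparable failure-threshold setting. The class name, the "primary"/"dr" site labels, and the choice to leave failback as a manual runbook step are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class FailoverController:
    """Switch to the DR site after `threshold` consecutive failed health checks."""
    threshold: int = 3
    consecutive_failures: int = 0
    active_site: str = "primary"

    def record_check(self, healthy: bool) -> str:
        """Feed one health-check result and return the site that should serve traffic."""
        if self.active_site != "primary":
            # Already failed over; failback is assumed to be a manual runbook step.
            return self.active_site
        if healthy:
            self.consecutive_failures = 0  # any success resets the streak
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.threshold:
                self.active_site = "dr"
        return self.active_site
```

With `threshold=3`, the sequence fail, fail, pass, fail, fail stays on the primary. A third consecutive failure then switches traffic to the DR site.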

When to Use It

  • Planning for business continuity and resilience in case of outages.
  • Assessing data loss risk and aligning RPO targets with business needs.
  • Implementing cross-region resiliency using AWS multi-region patterns.
  • Validating DR readiness through scheduled testing (tabletop, component failover, full failover).
  • Documenting and updating DR runbooks after tests and incidents.

Quick Start

  1. Define DR metrics (RTO, RPO) and select a DR strategy (Backup & Restore, Pilot Light, Warm Standby, or Multi-Site Active).
  2. Implement backups or replication and automate failover for the chosen strategy.
  3. Schedule regular DR tests (tabletop, component failover, full failover) and update runbooks after each test.
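For the second step, one way to check that a backup or snapshot cadence supports the chosen RPO is to compute the worst-case data loss from the actual recovery-point timestamps. A failure just before a new recovery point is taken loses everything written since the previous one. So the worst case is the largest gap between consecutive points, including the gap from the newest point to now. The Python sketch below computes that; the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def worst_case_data_loss(recovery_points: List[datetime], now: datetime) -> timedelta:
    """Largest window of writes that could be lost, given existing recovery points."""
    if not recovery_points:
        raise ValueError("no recovery points: nothing can be restored")
    points = sorted(recovery_points) + [now]
    # Largest gap between consecutive points, including newest point -> now.
    return max(later - earlier for earlier, later in zip(points, points[1:]))
```

With weekly backups the result is at least seven days. Compare it with the RPO target (`worst_case_data_loss(points, now) <= rpo_target`) to see whether the cadence is sufficient.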

Best Practices

  • Regular DR testing
  • Automate failover where possible
  • Document all procedures
  • Update runbooks after tests
  • Consider cross-region replication (AWS Multi-Region) to improve resilience

Example Use Cases

  • A company uses Backup & Restore with weekly backups, accepting an RTO of hours and up to a week of data loss in exchange for the lowest cost.
  • An application implements Pilot Light with a cross-region RDS replica to enable quick restoration in another region.
  • A fintech app deploys Warm Standby to achieve minimal downtime and conducts frequent failover checks.
  • An enterprise runs Multi-Site Active across two Regions with near-zero RTO/RPO for critical services.
  • DR testing schedules include tabletop (quarterly), component failover (monthly), and full failover (annually) with a lessons-learned process.
