ops-manager
npx machina-cli add skill karim-bhalwani/agent-skills-collection/ops-manager --openclaw
Ops Manager Skill - DevOps, Deployment & Documentation
Overview
The Ops Manager skill ensures that software is deployable, maintainable, and well-documented. It bridges the gap between code and production.
Focus Areas
1. DevOps & Deployment
- Infrastructure-as-Code (IaC): Terraform, Docker Compose, and environment configuration.
- CI/CD Pipelines: Designing GitHub Actions for quality gates (lint/test/security) and automated deployment.
- Safe Releases: Blue-Green and Canary deployment strategies. Mandatory rollback procedures.
- Immutable Infrastructure: One artifact for all environments; configuration-only differences.
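The "one artifact, configuration-only differences" idea can be sketched in Python as follows. This is a minimal illustration under assumptions, not part of the skill itself: the `Settings` class and the variable names `APP_ENV`, `API_BASE_URL`, and `LOG_LEVEL` are hypothetical, and only non-secret values are read from the environment.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Runtime configuration: the only thing that differs between environments."""
    environment: str
    api_base_url: str
    log_level: str


def load_settings(env=None):
    """Build Settings from a mapping of environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    return Settings(
        environment=env.get("APP_ENV", "local"),
        # Required value: a missing key raises KeyError at startup, not at first use.
        api_base_url=env["API_BASE_URL"],
        log_level=env.get("LOG_LEVEL", "INFO"),
    )


# The same code (and therefore the same build artifact) serves both environments;
# only the injected configuration changes.
staging = load_settings({
    "APP_ENV": "staging",
    "API_BASE_URL": "https://staging.example.com",
})
production = load_settings({
    "APP_ENV": "production",
    "API_BASE_URL": "https://api.example.com",
    "LOG_LEVEL": "WARNING",
})
```

Failing fast on a missing required value keeps a misconfigured environment from starting at all, which is usually easier to roll back than a partially working one.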
2. Technical Documentation (Diátaxis)
- Tutorials: Learning-oriented guides.
- How-To Guides: Task-oriented problem solving.
- Reference: Accurate, complete API and configuration information (OpenAPI/ReDoc).
- Explanations: Understanding-oriented conceptual docs.
3. Monitoring & Reliability
- Observability: Logging (RFC 5424), Metrics (Prometheus), and Tracing (OpenTelemetry).
- Health Checks: Defining Docker HEALTHCHECKs and API /health endpoints.
- Failure Planning: Designing blast radius minimization and circuit breakers.
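A circuit breaker, as mentioned under Failure Planning, might look roughly like the sketch below. It is one common variant (count consecutive failures, reject calls while open, allow a single trial call after a timeout), not an implementation prescribed by the skill. The class names, the threshold of 3, and the 30-second timeout are all assumptions.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock          # injectable so the breaker can be tested without sleeping
        self._failures = 0           # consecutive failures seen so far
        self._opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call rejected")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call re-opens the circuit immediately.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Wrapping calls to a flaky downstream service as `breaker.call(fetch, url)` limits the blast radius: while the circuit is open, callers fail quickly instead of piling up on a dependency that is already struggling.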
When to Use
- Configuring local/staging/production environments.
- Automating testing and deployment workflows.
- Updating documentation for developers or users.
- Planning releases and rollback strategies.
Outputs & Deliverables
- Primary Output: Deployment plan, technical documentation, and IaC templates (Dockerfile, CI yaml)
- Secondary Output: Monitoring and reliability configurations
- Success Criteria: Documented deployment steps, passing CI/CD pipeline, verified health checks
- Quality Gate: guardian review and production readiness approval before release
Constraints
- NO application business logic. Infrastructure only.
- NO direct database migrations without backup/rollback plan.
- All IaC must be version-controlled and tested.
Common Pitfalls
- Missing Rollback Plans: Deploying without a rollback procedure is reckless. Every deployment needs a documented "undo" plan.
- Hardcoded Secrets: Plain environment variables are not a secret store; their values can end up in logs and debug output. Use proper secret management (AWS Secrets Manager, HashiCorp Vault).
- Insufficient Monitoring: Deploying without health checks and alerts invites undetected failures. Always deploy observability alongside the service.
- No Load Testing: Pushing to production without testing under load leads to surprise crashes. Simulate expected peak traffic.
- Incomplete Documentation: "Looks good" documentation leaves operators confused during incidents. Use Diátaxis: Tutorials, How-Tos, Reference, Explanations.
- Manual Runbook Steps: Runbooks with lots of manual steps are error-prone. Automate as much as possible.
Integration Points
| Phase | Input From | Output To | Context |
|---|---|---|---|
| Requirements | architect, implementer | Deployment strategy | Understand performance and scale requirements |
| IaC Development | Tech stack decisions | Infrastructure templates | Generate Dockerfile, CI yaml, env templates |
| Documentation | API and service details | Technical docs | Create README, deployment guide, runbooks |
| Security Gate | Deployment ready | guardian | Security review before production deployment |
| Monitoring Setup | Application requirements | Observability config | Logging, metrics, tracing, health checks |
Source
git clone https://github.com/karim-bhalwani/agent-skills-collection/blob/main/skills/ops-manager/SKILL.md
Overview
Ops Manager focuses on making software deployable, maintainable, and well-documented. It bridges code and production through Infrastructure-as-Code, CI/CD pipelines, release planning, and comprehensive technical documentation.
How This Skill Works
It designs infrastructure templates (Terraform, Docker Compose) and builds GitHub Actions workflows with quality gates (lint/test/security). It defines safe release strategies (Blue-Green, Canary) and immutable infrastructure, producing deployment plans, IaC templates, and documentation. It also sets up observability, health checks, and a governance gate via guardian before production release.
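To make the Canary strategy with a mandatory rollback path more concrete, here is a hedged sketch of the decision a deployment plan could encode for each observation window. The `WindowStats` type, the function name, and the thresholds (500 requests, one percentage point of extra error rate) are illustrative assumptions rather than values taken from the skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowStats:
    """Request and error counts observed during one monitoring window."""
    requests: int
    errors: int

    @property
    def error_rate(self):
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(baseline, canary, min_requests=500, max_extra_error_rate=0.01):
    """Return "wait", "promote", or "rollback" for one observation window."""
    if canary.requests < min_requests:
        return "wait"  # not enough canary traffic to judge yet
    if canary.error_rate > baseline.error_rate + max_extra_error_rate:
        return "rollback"  # canary is clearly worse than the current release
    return "promote"


stable = WindowStats(requests=10_000, errors=50)   # 0.5% baseline error rate
decision = canary_decision(stable, WindowStats(requests=1_000, errors=40))
```

Comparing the canary against the live baseline, instead of against a fixed number, keeps the gate meaningful when the whole system is having a bad day.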
When to Use It
- Configuring local, staging, and production environments
- Automating testing, validation, and deployment workflows
- Updating developer/user documentation and runbooks
- Planning releases and rollback strategies with safe deploys
- Setting up monitoring, health checks, and reliability configurations
Quick Start
- Step 1: Define scope (IaC, CI/CD, deployment docs) and establish constraints (no app logic, no risky migrations without backup).
- Step 2: Create versioned IaC templates (Terraform, Dockerfiles) and a CI YAML workflow with lint/test/security gates.
- Step 3: Add health checks, observability, and a deployment plan; route for guardian review and production readiness.
Best Practices
- Version-control and test all IaC templates (Terraform, Dockerfiles, CI YAML)
- Define explicit rollback plans and undo procedures for every deployment
- Integrate health checks and observability (RFC 5424 logging, Prometheus metrics, OpenTelemetry tracing)
- Adopt immutable infrastructure and separate environment configuration from code
- Manage secrets securely; avoid hardcoded values; use secret stores (e.g., AWS Secrets Manager, Vault)
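For the RFC 5424 logging practice above, the following sketch shows how a single syslog line is laid out: priority, version, timestamp, host, application, process ID, message ID, structured data, then the message. The helper name and the choice of the `local0` facility are assumptions; a real service would more likely rely on a logging library or agent than format lines by hand.

```python
from datetime import datetime, timezone

# Severity codes defined by RFC 5424 (subset).
SEVERITY = {"error": 3, "warning": 4, "info": 6, "debug": 7}
FACILITY_LOCAL0 = 16  # assumption: log under the local0 facility


def rfc5424_line(severity, hostname, app_name, message, *,
                 procid="-", msgid="-", timestamp=None):
    """Format one RFC 5424 line; "-" is the NILVALUE for absent fields."""
    pri = FACILITY_LOCAL0 * 8 + SEVERITY[severity]
    ts = (timestamp or datetime.now(timezone.utc)).isoformat(timespec="milliseconds")
    # Structured data is left empty ("-") in this sketch.
    return f"<{pri}>1 {ts} {hostname} {app_name} {procid} {msgid} - {message}"


line = rfc5424_line(
    "info", "web-1", "api", "service started",
    timestamp=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
# line == "<134>1 2024-01-02T03:04:05.000+00:00 web-1 api - - - service started"
```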
Example Use Cases
- Create a Terraform + Docker Compose IaC package with a GitHub Actions pipeline that enforces lint, test, and security gates
- Implement Blue-Green or Canary deployment strategies with automated rollback procedures in the deployment plan
- Define Docker HEALTHCHECKs and expose a /health endpoint for service reliability checks
- Produce Diátaxis-style documentation: Tutorials, How-Tos, Reference (OpenAPI), and Explanations
- Configure monitoring and alerting using Prometheus, OpenTelemetry, and structured RFC 5424 logs
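As one possible shape for the /health endpoint in the use cases above, here is a small sketch using only the Python standard library. The handler name, the JSON payload, and port 8080 are assumptions; a real service would usually add the route to its existing web framework and check its own dependencies before reporting healthy.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def health_response(path):
    """Return (status_code, body_bytes) for a GET request on `path`."""
    if path == "/health":
        return 200, json.dumps({"status": "ok"}).encode("utf-8")
    return 404, json.dumps({"status": "not found"}).encode("utf-8")


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = health_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep frequent probe requests out of the access log in this sketch


def make_server(host="127.0.0.1", port=8080):
    """Create (but do not start) an HTTP server exposing /health."""
    return ThreadingHTTPServer((host, port), HealthHandler)
```

Calling `make_server().serve_forever()` starts the server. A Dockerfile `HEALTHCHECK` instruction could then probe the endpoint, for example with `curl -f http://localhost:8080/health`, assuming curl is available in the image and the server binds to an address reachable inside the container.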