What is the Ops Manager skill used for?

It focuses on DevOps, CI/CD, IaC, release planning, deployment documentation, and monitoring, not on implementing business logic.

What outputs should I expect?

A deployment plan, IaC templates (Dockerfile, CI YAML), technical documentation, and monitoring/reliability configurations.

How is release quality enforced?

A guardian review and production readiness approval act as the quality gate before any release.

ops-manager

npx machina-cli add skill karim-bhalwani/agent-skills-collection/ops-manager --openclaw

Ops Manager Skill - DevOps, Deployment & Documentation

Overview

The Ops Manager skill ensures that software is deployable, maintainable, and well-documented. It bridges the gap between code and production.

Focus Areas

1. DevOps & Deployment

Infrastructure-as-Code (IaC): Terraform, Docker Compose, and environment configuration.
CI/CD Pipelines: Designing GitHub Actions workflows with quality gates (lint/test/security) and automated deployment.
Safe Releases: Blue-Green and Canary deployment strategies. Mandatory rollback procedures.
Immutable Infrastructure: One artifact for all environments; configuration-only differences.
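
As a rough illustration of the canary strategy with a mandatory rollback path, the sketch below compares a canary's error rate against the current baseline and returns one of three decisions. The names `CanaryStats` and `canary_decision`, and the thresholds `min_requests` and `max_rate_increase`, are assumptions made for this example; they are not defined by the skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanaryStats:
    """Request counts observed for one deployment during the evaluation window."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Avoid division by zero when no traffic has been observed yet.
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(baseline: CanaryStats, canary: CanaryStats,
                    min_requests: int = 500,
                    max_rate_increase: float = 0.01) -> str:
    """Return "wait", "promote", or "rollback" for a canary release.

    The thresholds are hypothetical defaults, not recommended values.
    """
    if canary.requests < min_requests:
        return "wait"  # not enough traffic to judge the canary
    if canary.error_rate > baseline.error_rate + max_rate_increase:
        return "rollback"  # canary is measurably worse than the baseline
    return "promote"
```

A deployment plan could call `canary_decision` at the end of each evaluation window and trigger the documented rollback procedure whenever it returns `"rollback"`.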

2. Technical Documentation (Diátaxis)

Tutorials: Learning-oriented guides.
How-To Guides: Task-oriented problem solving.
Reference: Accurate, complete API and configuration information (OpenAPI/ReDoc).
Explanations: Understanding-oriented conceptual docs.

3. Monitoring & Reliability

Observability: Logging (RFC 5424), Metrics (Prometheus), and Tracing (OpenTelemetry).
Health Checks: Defining Docker HEALTHCHECKs and API /health endpoints.
Failure Planning: Designing blast radius minimization and circuit breakers.
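
One possible shape for the circuit breaker mentioned under Failure Planning is sketched below. It is a minimal, single-threaded version; the class names, the `failure_threshold` and `reset_timeout` parameters, and the injectable `clock` are all assumptions for illustration rather than part of the skill.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    """Stops calling a failing dependency until a cool-down period has passed."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock          # injectable so the timing can be tested
        self._failures = 0
        self._opened_at = None       # None means the circuit is closed

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"       # cool-down elapsed; allow one trial call
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("circuit is open; call skipped")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call reopens the circuit immediately; otherwise
            # the circuit opens once the failure threshold is reached.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Wrapping each outbound dependency in its own breaker keeps a single failing service from consuming every worker, which is one way to limit the blast radius.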

When to Use

Configuring local/staging/production environments.
Automating testing and deployment workflows.
Updating documentation for developers or users.
Planning releases and rollback strategies.

Outputs & Deliverables

Primary Output: Deployment plan, technical documentation, and IaC templates (Dockerfile, CI YAML)
Secondary Output: Monitoring and reliability configurations
Success Criteria: Documented deployment steps, a passing CI/CD pipeline, and verified health checks
Quality Gate: guardian review and production readiness approval before release

Constraints

NO application business logic; infrastructure only.
NO direct database migrations without a backup and rollback plan.
All IaC must be version-controlled and tested.
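
The migration constraint can be enforced mechanically before anything touches the database. The sketch below is one way to do that; `MigrationPlan`, its fields, and `missing_safeguards` are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MigrationPlan:
    """Hypothetical description of a database migration awaiting approval."""
    name: str
    backup_reference: Optional[str] = None   # e.g. an identifier for a verified backup
    rollback_script: Optional[str] = None    # path to the script that undoes the change


def missing_safeguards(plan: MigrationPlan) -> List[str]:
    """List the safeguards the plan lacks; an empty list means it may proceed."""
    missing = []
    if not plan.backup_reference:
        missing.append("backup")
    if not plan.rollback_script:
        missing.append("rollback plan")
    return missing
```

A CI job could fail the pipeline whenever `missing_safeguards` returns a non-empty list, so the constraint does not depend on reviewers remembering it.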

Common Pitfalls

Missing Rollback Plans: Deploying without a rollback procedure is reckless. Every deployment needs a documented "undo" plan.
Hardcoded Secrets: Never commit secrets to code or templates. Plain environment variables are a weak substitute, because their values can leak into logs, crash dumps, and process listings. Use proper secret management (AWS Secrets Manager, HashiCorp Vault).
Insufficient Monitoring: Deploying without health checks and alerts lets failures go undetected. Ship observability with every deployment.
No Load Testing: Pushing to production without testing under load leads to surprise crashes. Simulate expected peak traffic.
Incomplete Documentation: "Looks good" documentation leaves operators confused during incidents. Use Diátaxis: Tutorials, How-Tos, Reference, Explanations.
Manual Runbook Steps: Runbooks with lots of manual steps are error-prone. Automate as much as possible.
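
Because secrets tend to leak through log output, a logging filter that masks them can serve as a second line of defense alongside a proper secret store. The sketch below is a minimal version using only the standard library; the key names matched by `SECRET_PATTERN` and the `RedactSecretsFilter` class are assumptions for illustration, and a pattern like this will not catch every secret.

```python
import logging
import re

# Matches "key=value" or "key: value" pairs whose key name looks sensitive.
# The list of key fragments is an assumption; extend it for your own services.
SECRET_PATTERN = re.compile(
    r"([\w-]*(?:password|token|api[_-]?key|secret)[\w-]*)(\s*[=:]\s*)(\S+)",
    re.IGNORECASE,
)


def redact(text: str) -> str:
    """Replace the value of any sensitive-looking key with a placeholder."""
    return SECRET_PATTERN.sub(r"\1\2[REDACTED]", text)


class RedactSecretsFilter(logging.Filter):
    """Logging filter that masks secrets in the fully formatted message."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so secrets passed as arguments are covered too.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True  # always keep the record, only its content changes
```

Attaching the filter to a handler with `handler.addFilter(RedactSecretsFilter())` applies the masking to every record that handler emits.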

Integration Points

Phase	Input From	Output To	Context
Requirements	`architect`, `implementer`	Deployment strategy	Understand performance and scale requirements
IaC Development	Tech stack decisions	Infrastructure templates	Generate Dockerfile, CI YAML, env templates
Documentation	API and service details	Technical docs	Create README, deployment guide, runbooks
Security Gate	Deployment ready	`guardian`	Security review before production deployment
Monitoring Setup	Application requirements	Observability config	Logging, metrics, tracing, health checks

Source

git clone https://github.com/karim-bhalwani/agent-skills-collection/blob/main/skills/ops-manager/SKILL.md

Overview

Ops Manager focuses on making software deployable, maintainable, and well-documented. It bridges code and production through Infrastructure-as-Code, CI/CD pipelines, release planning, and comprehensive technical documentation.

How This Skill Works

It designs infrastructure templates (Terraform, Docker Compose) and builds GitHub Actions workflows with quality gates (lint/test/security). It defines safe release strategies (Blue-Green, Canary) and immutable infrastructure, producing deployment plans, IaC templates, and documentation. It also sets up observability, health checks, and a governance gate via guardian before production release.

When to Use It

Configuring local, staging, and production environments
Automating testing, validation, and deployment workflows
Updating developer/user documentation and runbooks
Planning releases and rollback strategies with safe deploys
Setting up monitoring, health checks, and reliability configurations

Quick Start

Step 1: Define scope (IaC, CI/CD, deployment docs) and establish constraints (no app logic, no risky migrations without backup).
Step 2: Create versioned IaC templates (Terraform, Dockerfiles) and a CI YAML workflow with lint/test/security gates.
Step 3: Add health checks, observability, and a deployment plan; route for guardian review and production readiness.
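
For the health checks in Step 3, a `/health` endpoint can be as small as the standard-library sketch below. The function names, the response shape (`status` plus per-dependency `checks`), and the default port are assumptions for this example, not something the skill prescribes.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple


def health_report(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """Run each named check and return an HTTP status code with a JSON-ready body."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "fail"
        except Exception:
            results[name] = "fail"  # a crashing check counts as a failure
    healthy = all(value == "ok" for value in results.values())
    body = {"status": "ok" if healthy else "degraded", "checks": results}
    return (200 if healthy else 503), body


def make_handler(checks: Dict[str, Callable[[], bool]]):
    """Build a request handler class that serves GET /health."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/health":
                self.send_error(404)
                return
            status, body = health_report(checks)
            payload = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return HealthHandler


def serve(checks: Dict[str, Callable[[], bool]], port: int = 8080) -> None:
    """Block forever serving the health endpoint on the given port."""
    HTTPServer(("0.0.0.0", port), make_handler(checks)).serve_forever()
```

Returning 503 when any dependency check fails lets a container health check or load balancer probe treat the instance as unhealthy without parsing the body.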

Best Practices

Version-control and test all IaC templates (Terraform, Dockerfiles, CI YAML)
Define explicit rollback plans and undo procedures for every deployment
Integrate health checks and observability (RFC 5424 logging, Prometheus metrics, OpenTelemetry tracing)
Adopt immutable infrastructure and separate environment configuration from code
Manage secrets securely; avoid hardcoded values; use secret stores (e.g., AWS Secrets Manager, Vault)
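
Separating environment configuration from code usually means the same artifact reads its settings at startup and fails fast when one is missing. The sketch below shows that idea for non-secret settings only; the variable names `APP_ENV`, `API_BASE_URL`, and `LOG_LEVEL` and the `Settings` class are hypothetical, and real secrets would come from a secret store rather than from this loader.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Non-secret settings that differ between local, staging, and production."""
    environment: str
    api_base_url: str
    log_level: str = "INFO"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping of variables, failing fast on gaps."""
    required = ("APP_ENV", "API_BASE_URL")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("missing required configuration: " + ", ".join(missing))
    return Settings(
        environment=env["APP_ENV"],
        api_base_url=env["API_BASE_URL"],
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing the mapping as a parameter keeps the loader testable, since a plain dictionary can stand in for the process environment.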

Example Use Cases

Create a Terraform + Docker Compose IaC package with a GitHub Actions pipeline that enforces lint, test, and security gates
Implement Blue-Green or Canary deployment strategies with automated rollback procedures in the deployment plan
Define Docker HEALTHCHECKs and expose a /health endpoint for service reliability checks
Produce Diátaxis-style documentation: Tutorials, How-Tos, Reference (OpenAPI), and Explanations
Configure monitoring and alerting using Prometheus, OpenTelemetry, and structured RFC 5424 logs
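
To make the RFC 5424 logging use case concrete, the sketch below formats a single syslog line with the header fields in the order the RFC defines them, using `-` for fields that are not set. The function name, its defaults, and the severity lookup table are assumptions for this example; structured data is always emitted as `-` here, and the timestamp is expected to be timezone-aware UTC.

```python
from datetime import datetime, timezone

# Syslog severity codes as numbered in RFC 5424.
SEVERITY = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3,
    "warning": 4, "notice": 5, "info": 6, "debug": 7,
}


def rfc5424_line(message, severity="info", facility=1, hostname="-",
                 app_name="-", procid="-", msgid="-", timestamp=None):
    """Format one log line: <PRI>VERSION TIMESTAMP HOSTNAME APP PROCID MSGID SD MSG."""
    pri = facility * 8 + SEVERITY[severity]  # priority combines facility and severity
    ts = timestamp or datetime.now(timezone.utc)
    # Render UTC as a trailing "Z" with millisecond precision.
    ts_text = ts.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    return f"<{pri}>1 {ts_text} {hostname} {app_name} {procid} {msgid} - {message}"
```

For example, facility 16 (local0) with severity `notice` yields a priority of 133, so a line for that combination starts with `<133>1`.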
