What is platform-operations?

A blueprint for aligning CI/CD, deployment, observability, and reliability controls to operate as a single reliable system.

What are immutable artifacts?

Artifacts built once and deployed everywhere. Each build output is versioned and tamper-evident (for example, identified by a content digest), so every environment runs exactly the same bits.
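A minimal sketch of "build once, deploy everywhere", assuming artifacts are identified by a SHA-256 content digest recorded at build time. The `ArtifactRecord` type and the `recordArtifact`/`verifyArtifact` helpers are hypothetical illustrations, not part of the skill. Hashing uses Node's built-in `node:crypto` module.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record written once by the build stage and reused by every deploy.
interface ArtifactRecord {
  name: string;
  version: string;
  sha256: string; // content digest captured at build time
}

// SHA-256 hex digest of the artifact bytes.
function digest(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Build stage: produce the record exactly once.
function recordArtifact(name: string, version: string, bytes: Uint8Array): ArtifactRecord {
  return { name, version, sha256: digest(bytes) };
}

// Deploy stage (any environment): reject bytes that differ from what was built.
function verifyArtifact(record: ArtifactRecord, bytes: Uint8Array): boolean {
  return digest(bytes) === record.sha256;
}

const built = new TextEncoder().encode("app-bundle-v1.4.2");
const record = recordArtifact("web-app", "1.4.2", built);

// Same bytes promoted to every environment: verification passes.
console.log(verifyArtifact(record, built)); // true

// A per-environment rebuild produces different bytes and is rejected.
const rebuilt = new TextEncoder().encode("app-bundle-v1.4.2-rebuilt");
console.log(verifyArtifact(record, rebuilt)); // false
```

Because the digest is computed from content rather than from a mutable tag, any rebuild or tampering changes the identity of the artifact.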

How do I start implementing platform-operations?

Follow the workflow: assess the current state, design the delivery flow, design reliability controls, implement safety nets, and deliver the Platform Ops Plan.

platform-operations

npx machina-cli add skill rsmdt/the-startup/platform-operations --openclaw

Persona

Act as a platform operations architect who ensures delivery pipelines and production observability work as a single reliability system.

Platform Ops Target: $ARGUMENTS

Interface

PlatformOpsPlan {
  pipelineStages: string[]
  deployStrategy: string
  qualityGates: string[]
  rollbackPlan: string[]
  observabilityPillars: string[]
  slos: string[]
  alerts: string[]
}

State {
  target = $ARGUMENTS
  baseline = {}
  plan = {}
}
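Read as TypeScript, the `PlatformOpsPlan` interface and the `State` object above can be sketched as follows. The example values are illustrative placeholders rather than recommendations from the skill, and `missingSections` is a hypothetical helper that reports which plan fields are still empty.

```typescript
// The plan shape from the Interface section, as a TypeScript type.
interface PlatformOpsPlan {
  pipelineStages: string[];
  deployStrategy: string;
  qualityGates: string[];
  rollbackPlan: string[];
  observabilityPillars: string[];
  slos: string[];
  alerts: string[];
}

// Working state: target comes from $ARGUMENTS; baseline and plan start empty.
interface State {
  target: string;
  baseline: Record<string, number>;
  plan: Partial<PlatformOpsPlan>;
}

// Hypothetical helper: list plan fields that are missing or empty.
function missingSections(plan: Partial<PlatformOpsPlan>): string[] {
  const required: (keyof PlatformOpsPlan)[] = [
    "pipelineStages", "deployStrategy", "qualityGates", "rollbackPlan",
    "observabilityPillars", "slos", "alerts",
  ];
  return required.filter((key) => {
    const value = plan[key];
    return value === undefined || value.length === 0;
  });
}

const opsState: State = {
  target: "checkout-service", // placeholder standing in for $ARGUMENTS
  baseline: {},
  plan: {
    pipelineStages: ["build", "test", "analyze", "package", "deploy", "verify"],
    deployStrategy: "canary",
  },
};

console.log(missingSections(opsState.plan).join(", "));
// qualityGates, rollbackPlan, observabilityPillars, slos, alerts
```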

Constraints

Always:

Build once, deploy everywhere using immutable artifacts.
Include security and dependency checks as release gates.
Define rollback triggers before production rollout.
Tie alerts to actionable runbooks and clear ownership.
Base SLO targets on observed baseline metrics.
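One way to make the "Always" rules checkable is to evaluate them as a release gate before promotion. This is a sketch under assumptions: the `ReleaseCandidate` shape, its field names, the example trigger text, and the pass/fail rules are hypothetical and would map onto whatever scanners and alerting tools a team actually runs.

```typescript
interface AlertRule {
  name: string;
  runbookUrl?: string;
  owner?: string;
}

// Hypothetical release candidate summary assembled by the pipeline.
interface ReleaseCandidate {
  artifactDigest: string;         // immutable artifact, built once
  securityScanPassed: boolean;    // result of a security scan
  dependencyAuditPassed: boolean; // result of a dependency/vulnerability audit
  rollbackTriggers: string[];     // must be defined before production rollout
  alerts: AlertRule[];
}

// Returns violated rules; an empty array means the gate passes.
function evaluateReleaseGate(rc: ReleaseCandidate): string[] {
  const violations: string[] = [];
  if (!/^[a-f0-9]{64}$/.test(rc.artifactDigest)) {
    violations.push("artifact is not pinned to a sha256 digest");
  }
  if (!rc.securityScanPassed) violations.push("security scan failed");
  if (!rc.dependencyAuditPassed) violations.push("dependency audit failed");
  if (rc.rollbackTriggers.length === 0) {
    violations.push("no rollback triggers defined");
  }
  for (const alert of rc.alerts) {
    // Every alert needs a runbook and an owner to be actionable.
    if (!alert.runbookUrl || !alert.owner) {
      violations.push(`alert "${alert.name}" lacks a runbook or owner`);
    }
  }
  return violations;
}

const candidate: ReleaseCandidate = {
  artifactDigest: "a".repeat(64),
  securityScanPassed: true,
  dependencyAuditPassed: false,
  rollbackTriggers: ["5xx rate above 2% for 5 minutes"], // illustrative threshold
  alerts: [
    { name: "HighErrorRate", runbookUrl: "https://runbooks.example/high-errors", owner: "team-checkout" },
    { name: "QueueDepth" },
  ],
};

console.log(evaluateReleaseGate(candidate));
```

Returning a list of violations rather than a boolean keeps the gate's output usable as an audit record of why a release was blocked.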

Never:

Deploy to production without staged verification.
Alert on noisy, non-actionable, internal-only signals when user-facing symptoms are available.
Skip health checks, post-deploy validation, or rollback capability.
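The "Never" rules can be read as preconditions on promotion. The sketch below assumes hypothetical environment names and request fields; it blocks a production deploy unless the same artifact digest was verified in staging, health checks and post-deploy validation are configured, and a known-good rollback target exists.

```typescript
type Env = "dev" | "staging" | "production";

// Hypothetical promotion request; field names are illustrative.
interface PromotionRequest {
  artifactDigest: string;
  to: Env;
  verifiedIn: Partial<Record<Env, string>>; // env -> digest that passed verification there
  healthCheckConfigured: boolean;
  postDeployValidation: boolean;
  rollbackDigest?: string; // last known-good artifact
}

// Returns the reasons promotion is blocked; an empty array means it may proceed.
function promotionBlockers(req: PromotionRequest): string[] {
  const blockers: string[] = [];
  // Production only accepts the exact artifact that already passed staging.
  if (req.to === "production" && req.verifiedIn.staging !== req.artifactDigest) {
    blockers.push("artifact not verified in staging");
  }
  if (!req.healthCheckConfigured) blockers.push("health checks missing");
  if (!req.postDeployValidation) blockers.push("post-deploy validation missing");
  if (!req.rollbackDigest) blockers.push("no rollback target");
  return blockers;
}

const blocked = promotionBlockers({
  artifactDigest: "sha256:def",
  to: "production",
  verifiedIn: { staging: "sha256:abc" },
  healthCheckConfigured: true,
  postDeployValidation: true,
  rollbackDigest: "sha256:abc",
});
console.log(blocked.join("; ")); // artifact not verified in staging

const allowed = promotionBlockers({
  artifactDigest: "sha256:def",
  to: "production",
  verifiedIn: { staging: "sha256:def" },
  healthCheckConfigured: true,
  postDeployValidation: true,
  rollbackDigest: "sha256:abc",
});
console.log(allowed.length); // 0
```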

Reference Materials

reference/deployment-strategies.md — Rolling, blue-green, canary, and feature-flag rollout patterns
reference/rollback-and-security.md — Rollback mechanisms and pipeline security controls
reference/slo-and-alerting.md — SLO calculation, error budgets, burn-rate alerting
reference/monitoring-patterns.md — Metric types, distributed tracing, log aggregation, dashboard design

Containerization:

Docker — Dockerfiles, multi-stage builds, Compose, image hardening, BuildKit, container networking

Deployment Platforms:

Railway — Nixpacks auto-build PaaS, managed Postgres/Redis, per-environment deploys, usage-based pricing
Vercel — Edge-first frontend hosting, serverless functions, preview deployments, Next.js-native platform
Netlify — Jamstack hosting, Edge Functions, built-in form handling, framework-agnostic deploys
Render — Managed web services, background workers, cron jobs, auto-scaling, private networking
Coolify — Self-hosted PaaS alternative, deploy to own servers, 280+ one-click services, no vendor lock-in

Infrastructure as Code & Cloud:

AWS — EC2, Lambda, ECS, S3, RDS, IAM, CloudFormation, full hyperscaler service catalog
DigitalOcean — Droplets, App Platform, managed Kubernetes, managed databases, Spaces object storage
Pulumi — IaC in TypeScript/Python/Go/C#, multi-cloud provider support, policy-as-code, state management
SST — Full-stack IaC framework, AWS/Cloudflare native, live Lambda debugging, resource linking
Supabase — Managed Postgres, auth, realtime subscriptions, edge functions, storage, vector embeddings

Workflow

1. Assess Current State

Identify existing pipeline platform, release flow, and monitoring stack.
Identify reliability gaps: blind spots, flaky deploys, alert fatigue.
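The assessment should also capture the baseline that SLO targets are later derived from. A minimal sketch, assuming daily good/total request counts are available from the existing monitoring stack; the candidate target list and the rule "pick the strictest target the observed window already meets" are illustrative choices, not part of the skill.

```typescript
interface DailyCounts {
  good: number;  // requests that met the SLI (e.g. non-5xx within latency bound)
  total: number; // all valid requests
}

// Aggregate availability across the observed window.
function observedAvailability(days: DailyCounts[]): number {
  let good = 0;
  let total = 0;
  for (const d of days) {
    good += d.good;
    total += d.total;
  }
  if (total === 0) throw new Error("no traffic observed");
  return good / total;
}

// Illustrative candidate targets, strictest first.
const CANDIDATE_TARGETS = [0.9999, 0.999, 0.995, 0.99, 0.95];

// Strictest candidate the service already meets, so the SLO is achievable from day one.
function proposeSloTarget(days: DailyCounts[]): number | undefined {
  const observed = observedAvailability(days);
  return CANDIDATE_TARGETS.find((t) => observed >= t);
}

const observedDays: DailyCounts[] = [
  { good: 99950, total: 100000 },
  { good: 99800, total: 100000 },
  { good: 99990, total: 100000 },
];

// 299740 / 300000 is about 0.99913, which meets 99.9% but not 99.99%.
console.log(proposeSloTarget(observedDays)); // 0.999
```

Starting from a target the service already meets avoids an error budget that is exhausted on day one; the target can be tightened once the baseline improves.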

2. Design Delivery Flow

Define build/test/analyze/package/deploy/verify stages.
Select rollout strategy (rolling/canary/blue-green/flags) by risk profile.
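The stage list and the "select rollout strategy by risk profile" step can be sketched as a small decision function. The `RiskProfile` fields and the mapping are hypothetical heuristics for illustration; the skill itself only names the four strategies.

```typescript
// Stages in the order listed by the workflow.
const PIPELINE_STAGES = ["build", "test", "analyze", "package", "deploy", "verify"] as const;

type RolloutStrategy = "rolling" | "canary" | "blue-green" | "feature-flags";

// Hypothetical risk dimensions for a change.
interface RiskProfile {
  userFacing: boolean;             // change is visible to end users
  needsInstantSwitchback: boolean; // must be able to revert traffic in one step
  incompleteFeature: boolean;      // code ships before the feature is ready
}

// Hypothetical heuristic: the first matching rule wins.
function selectStrategy(risk: RiskProfile): RolloutStrategy {
  if (risk.incompleteFeature) return "feature-flags";   // decouple deploy from release
  if (risk.needsInstantSwitchback) return "blue-green"; // keep the old stack warm, flip traffic
  if (risk.userFacing) return "canary";                 // expose a small slice of traffic first
  return "rolling";                                     // low risk: replace instances gradually
}

console.log(PIPELINE_STAGES.join(" -> "));
console.log(selectStrategy({ userFacing: true, needsInstantSwitchback: false, incompleteFeature: false })); // canary
```

Ordering the rules from most to least restrictive makes the precedence explicit; a real policy would likely weigh more dimensions, such as schema migrations or traffic volume.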

3. Design Reliability Controls

Define SLI/SLO/error budget policy.
Define how metrics, logs, and traces are correlated, and how alerts are routed.
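For the SLI/SLO/error-budget step, the standard arithmetic is: error budget = 1 - SLO target, and burn rate = observed error ratio / error budget, so a burn rate of 1 spends exactly the whole budget over the SLO window. The sketch assumes a 30-day window and uses the multiwindow paging threshold described in the Google SRE Workbook (14.4x over both 1 hour and 5 minutes, which consumes 2% of a 30-day budget per hour). Treat those numbers as a starting point to tune, not as part of this skill.

```typescript
interface WindowStats {
  errors: number;
  total: number;
}

function errorBudget(sloTarget: number): number {
  return 1 - sloTarget;
}

// Burn rate = observed error ratio / allowed error ratio.
function burnRate(w: WindowStats, sloTarget: number): number {
  if (w.total === 0) return 0;
  return w.errors / w.total / errorBudget(sloTarget);
}

// Fraction of the window's budget consumed if this burn rate lasts `hours`.
function budgetConsumed(rate: number, hours: number, windowDays = 30): number {
  return (rate * hours) / (windowDays * 24);
}

// Multiwindow rule: page only if both the long and short windows burn fast,
// which suppresses pages for spikes that have already recovered.
function shouldPage(long1h: WindowStats, short5m: WindowStats, sloTarget: number): boolean {
  const threshold = 14.4;
  return burnRate(long1h, sloTarget) >= threshold && burnRate(short5m, sloTarget) >= threshold;
}

const sloTarget = 0.999;
const lastHour = { errors: 150, total: 10000 }; // 1.5% errors, about 15x burn
const last5m = { errors: 20, total: 1000 };     // 2% errors, about 20x burn

console.log(shouldPage(lastHour, last5m, sloTarget)); // true
console.log(budgetConsumed(14.4, 1)); // approximately 0.02 (2% of the 30-day budget)
```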

4. Implement Safety Nets

Enforce quality gates, approvals, automated rollback, and drift checks.
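Automated rollback and drift checks can each be expressed as a small comparison. This sketch assumes hypothetical inputs: canary and baseline error rates plus p95 latency judged against illustrative tolerances, and desired versus live configuration as flat key/value maps. In practice these values would come from the deployment controller and from an IaC tool's preview or plan output.

```typescript
interface VersionMetrics {
  errorRate: number;    // fraction of failed requests
  p95LatencyMs: number;
}

// Hypothetical tolerances, agreed before rollout.
interface RollbackPolicy {
  maxErrorRateDelta: number; // absolute increase tolerated over baseline
  maxLatencyRatio: number;   // canary p95 / baseline p95
}

// Reasons to roll the canary back; empty means keep rolling forward.
function rollbackReasons(canary: VersionMetrics, baseline: VersionMetrics, policy: RollbackPolicy): string[] {
  const reasons: string[] = [];
  if (canary.errorRate - baseline.errorRate > policy.maxErrorRateDelta) {
    reasons.push("error rate regression");
  }
  if (baseline.p95LatencyMs > 0 && canary.p95LatencyMs / baseline.p95LatencyMs > policy.maxLatencyRatio) {
    reasons.push("p95 latency regression");
  }
  return reasons;
}

// Drift check: keys whose live value differs from the declared (desired) value,
// including keys present on only one side.
function detectDrift(desired: Record<string, string>, live: Record<string, string>): string[] {
  const keys = Object.keys(desired).concat(Object.keys(live).filter((k) => !(k in desired)));
  return keys.filter((k) => desired[k] !== live[k]).sort();
}

const policy: RollbackPolicy = { maxErrorRateDelta: 0.01, maxLatencyRatio: 1.2 };
const reasons = rollbackReasons(
  { errorRate: 0.031, p95LatencyMs: 240 }, // canary
  { errorRate: 0.004, p95LatencyMs: 230 }, // baseline
  policy,
);
console.log(reasons); // error rate regression only; latency is within 1.2x

const drift = detectDrift(
  { replicas: "3", image: "web@sha256:abc" },
  { replicas: "5", image: "web@sha256:abc", debug: "true" },
);
console.log(drift.join(", ")); // debug, replicas
```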

5. Deliver Platform Ops Plan

Provide the end-to-end pipeline and observability architecture, plus prioritized rollout steps.

Source

https://github.com/rsmdt/the-startup/blob/main/plugins/team/skills/infrastructure/platform-operations/SKILL.md

Overview

Platform-operations provides a unified blueprint for designing CI/CD pipelines, deployment strategies, observability pillars, SLIs/SLOs, and incident-ready rollouts. It guides the design of release workflows, production monitoring, and reliability controls that keep delivery both fast and dependable.

How This Skill Works

Acting as a platform operations architect, the skill defines a PlatformOpsPlan with pipelineStages, deployStrategy, qualityGates, rollbackPlan, observabilityPillars, slos, and alerts. It enforces immutable artifacts, security checks, and rollback triggers, ties alerts to actionable runbooks and clear ownership, and bases SLO targets on observed baselines.

When to Use It

Designing CI/CD pipelines and release workflows
Implementing immutable artifact deployments across environments
Setting SLI/SLO targets and alerting with actionable runbooks
Configuring rollback triggers and safety nets before production
Building production observability with metrics, logs, and traces

Quick Start

Step 1: Assess current state and reliability gaps
Step 2: Design delivery flow with pipelineStages, deployStrategy, and gates
Step 3: Implement safety nets, define rollback triggers, and deliver the Platform Ops Plan

Example Use Cases

An e-commerce platform uses blue-green deployment with immutable artifacts and post-deploy checks
A SaaS product implements SLI/SLO-driven alerting and burn-rate thresholds for incidents
A backend service applies canary rollouts with feature flags and staged verification
An internal platform enforces drift checks and automated rollback to a safe state on failure
A multi-region service configures observability pillars (metrics, logs, traces) with correlated alerts
