llmops-platform-engineering
npx machina-cli add skill BagelHole/DevOps-Security-Agent-Skills/llmops-platform-engineering --openclaw
LLMOps Platform Engineering
Design and operate an internal LLM platform that supports rapid experimentation without compromising reliability, cost, or compliance.
Outcomes
- Standardized path from experiment to production
- Safe model rollout with quality and safety gates
- Repeatable infra modules for inference, vector DB, and observability
- Clear ownership model across platform, app, and security teams
Reference Architecture
- Control Plane: model registry, prompt/version catalog, policy checks, eval pipeline.
- Data Plane: inference gateway, vector database, cache, feature store.
- Ops Plane: telemetry, alerting, SLO dashboards, cost analytics.
- Security Plane: IAM boundaries, secret rotation, content filters, audit logs.
Golden Delivery Workflow
- Train/fine-tune or onboard provider model.
- Register artifact and metadata (license, intended use, constraints).
- Run automated eval suite (quality + safety + latency + cost).
- Deploy canary behind gateway with strict traffic policy.
- Promote after SLO and business KPI thresholds pass.
- Keep rollback target hot for fast reversion.
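The promotion decision in the workflow above can be sketched as a single gate function over the eval suite's output. The field names and threshold values here are hypothetical placeholders; real gates would come from the product's SLO and budget definitions.

```python
from dataclasses import dataclass

@dataclass
class EvalReport:
    """Hypothetical summary produced by the automated eval suite."""
    quality_score: float       # task success rate, 0..1
    safety_violations: int     # count of failed safety tests
    p95_latency_ms: float
    cost_per_request_usd: float

@dataclass
class PromotionPolicy:
    """Hypothetical gate thresholds; real values are product-specific."""
    min_quality: float = 0.90
    max_safety_violations: int = 0
    max_p95_latency_ms: float = 1200.0
    max_cost_per_request_usd: float = 0.02

def may_promote(report: EvalReport, policy: PromotionPolicy) -> bool:
    """Promote the canary only if every gate passes; otherwise hold."""
    return (
        report.quality_score >= policy.min_quality
        and report.safety_violations <= policy.max_safety_violations
        and report.p95_latency_ms <= policy.max_p95_latency_ms
        and report.cost_per_request_usd <= policy.max_cost_per_request_usd
    )
```

Keeping the gate a pure function over a report makes the same check reusable in CI and in the runtime canary controller.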
CI/CD Design for AI Services
- Build immutable containers with pinned dependencies and model hashes.
- Use environment promotion: dev -> stage -> prod.
- Fail deployment if:
  - regression evals drop below baseline,
  - safety tests exceed risk threshold,
  - p95 latency exceeds SLO budget.
- Store deployment evidence for audits (commit SHA, eval report, approver).
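The audit evidence in the last bullet can be captured as a small record that hashes the eval report so it cannot be silently altered. This is a sketch of one plausible schema, not a prescribed format.

```python
import hashlib
import json
from datetime import datetime, timezone

def deployment_evidence(commit_sha: str, eval_report: dict, approver: str) -> dict:
    """Build a tamper-evident audit record for a deployment (illustrative schema)."""
    report_json = json.dumps(eval_report, sort_keys=True)
    return {
        "commit_sha": commit_sha,
        "eval_report_sha256": hashlib.sha256(report_json.encode()).hexdigest(),
        "approver": approver,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
```

Storing only the report's hash alongside a pointer to the full report keeps the audit log compact while still detecting after-the-fact edits.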
Operational SLOs
- Availability: 99.9% for synchronous inference endpoints.
- Latency: p95 under product-specific target (for example, <1200ms).
- Cost: per-request and per-tenant budget ceilings.
- Quality: task success rate and groundedness thresholds.
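The latency SLO above can be checked with a nearest-rank p95 over a window of request samples. The 1200 ms budget below is the example figure from the SLO list, not a universal target.

```python
import math

def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank p95 over a window of request latencies."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))  # nearest-rank method
    return ordered[rank - 1]

def within_slo(latencies_ms: list[float], budget_ms: float = 1200.0) -> bool:
    """True if the window's p95 stays inside the latency budget."""
    return p95(latencies_ms) <= budget_ms
```

In practice the window would be a rolling sample from the gateway's telemetry rather than a full list, but the gate condition is the same.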
Platform Guardrails
- Enforce tenant quotas and model allow-lists.
- Require structured output contracts for automation paths.
- Default to low-risk model settings for critical workflows.
- Disable unconstrained tool execution in production.
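The first two guardrails can be sketched as a single admission check at the gateway: reject disallowed models, then enforce the tenant's quota. Tenant names, model names, and quota values below are all illustrative, and a real implementation would back the counters with shared state rather than a process-local dict.

```python
# Illustrative allow-lists and quotas; real values live in the control plane.
ALLOWED_MODELS: dict[str, set[str]] = {"tenant-a": {"model-small", "model-fast"}}
QUOTAS: dict[str, int] = {"tenant-a": 1000}  # requests per window
usage: dict[str, int] = {}

def admit(tenant: str, model: str) -> bool:
    """Admit a request only if the model is allow-listed and quota remains."""
    if model not in ALLOWED_MODELS.get(tenant, set()):
        return False
    if usage.get(tenant, 0) >= QUOTAS.get(tenant, 0):
        return False
    usage[tenant] = usage.get(tenant, 0) + 1
    return True
```

Checking the allow-list before the quota means a blocked model never consumes a tenant's budget.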
Tooling Stack (Example)
- Orchestration: Argo Workflows / GitHub Actions / Airflow.
- Model Registry: MLflow / custom metadata DB.
- Gateway: LiteLLM / Envoy-based API gateway.
- Observability: OpenTelemetry + Prometheus + Grafana + Langfuse.
- Policy: OPA/Rego for deployment and runtime checks.
Incident Readiness
- Runbooks for model outage, provider timeout spikes, and cost surges.
- Chaos drills for provider failover and vector DB degradation.
- Pre-approved rollback path with one-command execution.
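The one-command rollback above can be sketched as an atomic swap of gateway routes, assuming the previous release is kept warm as described in the delivery workflow. Route and model names are hypothetical.

```python
# Illustrative gateway state: the previous release stays hot at 0% traffic.
routes = {"stable": "model-v2", "previous": "model-v1"}
weights = {"model-v2": 100, "model-v1": 0}

def rollback() -> str:
    """Shift 100% of traffic to the hot rollback target in one call."""
    routes["stable"], routes["previous"] = routes["previous"], routes["stable"]
    weights[routes["stable"]] = 100
    weights[routes["previous"]] = 0
    return routes["stable"]
```

Because the target is already warm, the swap avoids a cold-start penalty during an incident; the demoted release stays registered for forward-fix redeployment.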
Related Skills
- ai-pipeline-orchestration - Orchestrate ingestion and inference workflows
- agent-evals - Build evaluation gates for releases
- llm-gateway - Route and control LLM traffic
Source
git clone https://github.com/BagelHole/DevOps-Security-Agent-Skills
Skill file: devops/ai/llmops-platform-engineering/SKILL.md
Overview
Design and operate an internal LLM platform that supports rapid experimentation without compromising reliability, cost, or compliance. It standardizes the path from experiment to production and delivers safe model rollout with quality and safety gates, plus repeatable infra modules for inference, vector databases, and observability, while clarifying ownership across platform, app, and security teams.
How This Skill Works
It stacks four planes—Control (model registry, prompt/version catalog, policy checks, eval pipeline), Data (inference gateway, vector DB, cache, feature store), Ops (telemetry, alerting, SLO dashboards, cost analytics), and Security (IAM boundaries, secret rotation, content filters, audit logs)—to govern the lifecycle. The Golden Delivery Workflow trains, fine-tunes, or onboards models; registers artifacts; runs automated evals; deploys canaries behind a strictly policed gateway; promotes after SLO/KPI thresholds pass; and keeps a hot rollback target for fast reversion.
When to Use It
- You need a standardized path from experiment to production with governance and audits
- You require safe model rollout with quality, safety, latency, and cost gates
- You operate multi-cloud or self-hosted inference with repeatable infrastructure modules
- You must maintain clear ownership across platform, app, and security teams
- You want incident readiness with runbooks, chaos drills, and fast rollback
Quick Start
- Step 1: Set up the Control Plane with a model registry, prompt/version catalog, and policy checks
- Step 2: Implement CI/CD with immutable containers, pinned dependencies, and environment promotion (dev -> stage -> prod)
- Step 3: Configure the Golden Delivery Workflow: automated evals, canary deployment behind a gateway, and SLO/KPI gating before promotion
Best Practices
- Centralize control with a model registry, prompt/version catalog, and policy checks
- Build immutable containers with pinned dependencies and model hashes
- Promote deployments across environments (dev -> stage -> prod) using automated gates for regressions, safety, and latency
- Deploy canaries behind gateways and monitor SLOs and KPI thresholds
- Enforce platform guardrails: tenant quotas, model allow-lists, structured output contracts, and a hot rollback path
Example Use Cases
- A financial services organization uses a policy-driven LLMOps platform with audit logs and strict rollout gates for customer-facing chatbots
- An enterprise leverages a vector database and feature store to support retrieval-augmented generation with governance
- A SaaS provider uses an MLflow-based model registry and eval pipelines to promote models from development to production
- A multi-tenant application enforces per-tenant budgets and latency targets while keeping strict gateway-controlled traffic
- Chaos drills and runbooks validate provider failover, vector DB degradation, and fast rollback readiness