What is infra used for?

Provision and manage cloud infrastructure using Terraform, configure Kubernetes deployments, and handle network and IAM resources

What must you confirm before applying?

Cloud provider and whether a Terraform state file exists for the target environment

How should the code be structured?

Modular Terraform architecture with root and modules for network compute and database, and a remote state backend

infra

npx machina-cli add skill mrsknetwork/supernova/infra --openclaw

Files (1)

SKILL.md

5.6 KB

Infrastructure Engineering

Purpose

Infrastructure code has the highest blast radius of any code in a project. A misconfigured security group can expose a database publicly. A missing IAM boundary can allow privilege escalation. This skill treats infrastructure as code with the same rigor as application code: reviewed, versioned, and tested before apply.

SOP: Infrastructure Provisioning

Step 1 - Stack Discovery (Critical Gate)

Before writing a single .tf file, confirm:

What is the cloud provider? (AWS, GCP, Azure, DigitalOcean)
Does a Terraform state file already exist? (terraform.tfstate, or remote state in S3/GCS?)
What environments need to be provisioned? (development, staging, production - or just one?)
Is there an existing VPC or network topology that must be matched?

Never run terraform apply on an existing environment without first running terraform plan and reviewing the output.

Step 2 - Terraform Module Structure

Organize infrastructure into modules, not one monolithic main.tf:

infra/
├── main.tf          # Root module - calls child modules
├── variables.tf     # Root input variables
├── outputs.tf       # Root outputs (VPC ID, DB endpoint, etc.)
├── versions.tf      # Provider version constraints
├── backend.tf       # Remote state backend config
└── modules/
    ├── network/     # VPC, subnets, route tables, NAT
    ├── compute/     # EC2, ECS, or GKE cluster
    └── database/    # RDS/CloudSQL, parameter groups, subnet groups

Step 3 - Remote State Backend

Never use local state for team or production environments. Configure remote state before any terraform apply:

# backend.tf (AWS S3 backend)
terraform {
  backend "s3" {
    bucket         = "myapp-terraform-state"
    key            = "production/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-lock"  # prevents concurrent applies
  }
}

The S3 bucket and DynamoDB table must be created manually (or via a bootstrap script) before terraform init.

Step 4 - VPC Network Design

# modules/network/main.tf
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "${var.project}-${var.environment}-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]  # app and DB tiers
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]  # load balancers only

  enable_nat_gateway     = true
  single_nat_gateway     = var.environment != "production"  # cost optimization for non-prod
  enable_dns_hostnames   = true
}

Subnet rules: Application servers and databases live in private subnets. Only load balancers and bastion hosts live in public subnets. The database subnet must have no route to the internet.

Step 5 - IAM Least Privilege

Every service gets its own IAM role. No service runs with AdministratorAccess:

# IAM role for the FastAPI application
resource "aws_iam_role" "app" {
  name = "${var.project}-app-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{ Effect = "Allow", Principal = { Service = "ecs-tasks.amazonaws.com" }, Action = "sts:AssumeRole" }]
  })
}

resource "aws_iam_role_policy" "app" {
  role = aws_iam_role.app.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      { Effect = "Allow", Action = ["s3:GetObject", "s3:PutObject"], Resource = "arn:aws:s3:::${var.uploads_bucket}/*" },
      { Effect = "Allow", Action = ["secretsmanager:GetSecretValue"], Resource = var.db_secret_arn },
    ]
  })
}

Audit IAM permissions quarterly. Remove unused policies.

Step 6 - Kubernetes Manifests (If Using K8s)

# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  labels: { app: api }
spec:
  replicas: 2
  selector:
    matchLabels: { app: api }
  template:
    metadata:
      labels: { app: api }
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
      containers:
        - name: api
          image: ghcr.io/myorg/api:{{ IMAGE_TAG }}
          ports: [{ containerPort: 8000 }]
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef: { name: app-secrets, key: database-url }
          livenessProbe:
            httpGet: { path: /health, port: 8000 }
            initialDelaySeconds: 10
          readinessProbe:
            httpGet: { path: /health, port: 8000 }
            initialDelaySeconds: 5
          resources:
            requests: { cpu: "100m", memory: "256Mi" }
            limits: { cpu: "500m", memory: "512Mi" }

Always define resources.requests and resources.limits. A pod without limits can consume all node resources.

Step 7 - Deployment Verification Checklist

After every terraform apply or kubectl apply, verify:

terraform plan showed only expected changes (no unexpected destroys).
Health check endpoint returns 200 within 2 minutes of deploy.
Logs show no new ERROR entries in the first 5 minutes post-deploy.
DB connection count is within expected range.
Previous version is available for rollback: kubectl rollout undo deployment/api.

Source

git clone https://github.com/mrsknetwork/supernova/blob/main/skills/infra/SKILL.mdView on GitHub

Overview

Infra is a code driven approach to provisioning and managing cloud infrastructure using Terraform, configuring Kubernetes deployments, and handling network and IAM resources. It treats infrastructure like application code - reviewed, versioned, and tested before apply to minimize blast radius.

How This Skill Works

It follows a gate based provisioning workflow: discover provider and existing state, organize code into modular Terraform roots, and enforce a remote state backend with locking. It applies VPC design and IAM least privilege to ensure safe changes and predictable deployments.

When to Use It

Provision new cloud environments (AWS/GCP/Azure)
Set up Kubernetes clusters (EKS/GKE/AKS)
Write or refactor Terraform modules
Configure complex network topology (VPCs, subnets, NAT, routing)
Audit or modify existing infrastructure with plan review and state checks

Quick Start

Step 1: Identify provider and check for an existing state file
Step 2: Initialize modules and backend with terraform init and backend config
Step 3: Run terraform plan, review the output, then terraform apply

Best Practices

Confirm provider and existing Terraform state before any apply
Organize infra into modules (network, compute, database)
Use a remote state backend with locking
Design VPCs with private subnets for apps and DB and public subnets for load balancers
Apply IAM least privilege with per service roles; avoid AdministratorAccess

Example Use Cases

Deploy a multi environment VPC and compute cluster using modular Terraform
Set up an EKS/GKE/AKS cluster with proper IAM roles and security groups
Implement remote state in S3 with DynamoDB lock for production
Create cross region network topology with public and private subnets and NAT gateways
Create per service IAM roles with limited permissions for an application

Frequently Asked Questions

Add this skill to your agents