What tools are included in the monitoring stack?

Sentry for error tracking, structlog for structured logs, Prometheus for metrics, and uptime monitoring via BetterStack or UptimeRobot.

Where should I start with Sentry?

Start with the backend (Step 1) to integrate Sentry into your FastAPI app, then add the frontend (Step 2) for Next.js, setting the DSN and environment for each.

When should I add Prometheus?

Add Prometheus once you have enough traffic to justify dashboards; start with core metrics like latency, throughput, and error rate, then expand.

monitoring

npx machina-cli add skill mrsknetwork/supernova/monitoring --openclaw


Monitoring Engineering

Purpose

Vibe-coders ship apps and have no idea what's happening in production: users hit errors nobody sees, and slow endpoints go unnoticed. This skill sets up the minimum viable observability layer so that when something breaks, you know about it before a user complains.

Monitoring Stack

Layer	Tool	What it catches
Error tracking	Sentry	Exceptions, stack traces, user context
Structured logs	structlog	Request logs, audit events, debug traces
App metrics	Prometheus + Grafana	Latency, throughput, error rates
Uptime	BetterStack / UptimeRobot	Is the server even responding?

Start with Sentry (10 min setup, immediate value). Add Prometheus when you have enough traffic to need dashboards.

SOP: Sentry Integration

Step 1 - Backend (FastAPI)

uv pip install "sentry-sdk[fastapi]"

# main.py
import sentry_sdk
from sentry_sdk.integrations.fastapi import FastApiIntegration
from sentry_sdk.integrations.sqlalchemy import SqlalchemyIntegration

from app.config import settings  # hypothetical: your project's settings module

# Initialize before creating the FastAPI app so startup errors are captured too
sentry_sdk.init(
    dsn=settings.SENTRY_DSN,            # from Sentry dashboard
    environment=settings.ENV,           # "production", "staging", "development"
    traces_sample_rate=0.1,             # capture 10% of requests for performance tracing
    profiles_sample_rate=0.1,           # profile 10% of the traced requests
    integrations=[FastApiIntegration(), SqlalchemyIntegration()],
    send_default_pii=False,             # don't auto-attach PII such as IP addresses and cookies
)

Sentry auto-captures unhandled exceptions from this point. No code changes needed to track errors.
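A flat 10% sample rate also traces the constant pings from load balancers and uptime monitors. The Sentry SDK accepts a traces_sampler callable in place of traces_sample_rate; below is a minimal sketch that skips noisy paths. NOISY_PATHS is hypothetical, and reading the request path from an "asgi_scope" key is an assumption about what the FastAPI/ASGI integration puts in the sampling context, so check it against your sentry-sdk version.

```python
# Sketch of a custom Sentry traces_sampler, assumed to be wired up as
# sentry_sdk.init(..., traces_sampler=traces_sampler) instead of traces_sample_rate.
# Assumption: for ASGI apps the sampling context carries the request scope
# under "asgi_scope"; verify the keys against your sentry-sdk version.
from typing import Any

# Hypothetical set of paths that are polled constantly and not worth tracing
NOISY_PATHS = {"/health", "/metrics"}
DEFAULT_RATE = 0.1  # same 10% as traces_sample_rate in the init call


def traces_sampler(sampling_context: dict[str, Any]) -> float:
    """Return the probability (0.0 to 1.0) of tracing this transaction."""
    # Respect an upstream sampling decision (distributed tracing) if there is one
    parent = sampling_context.get("parent_sampled")
    if parent is not None:
        return 1.0 if parent else 0.0
    scope = sampling_context.get("asgi_scope") or {}
    if scope.get("path") in NOISY_PATHS:
        return 0.0  # never trace health checks or metric scrapes
    return DEFAULT_RATE
```

Honoring parent_sampled first keeps distributed traces intact when an upstream service already decided to trace a request.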

Manually capture handled errors with context:

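from uuid import UUID

import sentry_sdk
import stripe
from fastapi import HTTPException

async def process_payment(order_id: UUID, user: User):  # User: your ORM model
    order = await get_order(order_id)                   # hypothetical repository helper
    try:
        await stripe_service.charge(order)              # stripe_service: your payment wrapper
    except stripe.error.CardError as e:
        # Attach who was affected, then report the handled exception
        sentry_sdk.set_user({"id": str(user.id), "email": user.email})
        sentry_sdk.capture_exception(e)
        raise HTTPException(402, "Payment declined") from e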
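send_default_pii=False only stops the SDK from attaching PII on its own; data you attach explicitly, such as the email passed to set_user in the handler above, is still sent by the SDK. If your policy is stricter, a before_send hook can strip it before the event leaves the process. A minimal sketch, where the allow-list is a hypothetical policy and scrub_user_pii is an illustrative name:

```python
# Sketch of a Sentry before_send hook, assumed to be wired up via
# sentry_sdk.init(..., before_send=scrub_user_pii). Sentry calls it with the
# event dict and a hint dict; returning the event sends it, returning None drops it.
from typing import Any, Optional

# Hypothetical policy: only the opaque user id may leave the process
ALLOWED_USER_KEYS = {"id"}


def scrub_user_pii(event: dict[str, Any], hint: dict[str, Any]) -> Optional[dict[str, Any]]:
    user = event.get("user")
    if isinstance(user, dict):
        # Drop email, username, ip_address, etc. that were set via sentry_sdk.set_user()
        event["user"] = {k: v for k, v in user.items() if k in ALLOWED_USER_KEYS}
    return event
```

Keeping only the user id still lets you count affected users in Sentry while the email stays in your own database.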

Step 2 - Frontend (Next.js)

npx @sentry/wizard@latest -i nextjs

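The wizard creates sentry.client.config.ts, sentry.server.config.ts, and sentry.edge.config.ts (recent versions may put the client setup in instrumentation-client.ts instead) and patches next.config.js. After running it, check the client config: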

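// sentry.client.config.ts
import * as Sentry from "@sentry/nextjs";

Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  environment: process.env.NODE_ENV,
  tracesSampleRate: 0.1,
  integrations: [Sentry.replayIntegration()],  // session replay needs this integration (SDK v8+)
  replaysSessionSampleRate: 0,     // don't record sessions that have no errors
  replaysOnErrorSampleRate: 1.0,   // always capture session replay on error
});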

Add a global error boundary so errors thrown in the root layout are reported:

// app/global-error.tsx
"use client";
import * as Sentry from "@sentry/nextjs";
import { useEffect } from "react";

export default function GlobalError({ error, reset }: { error: Error & { digest?: string }; reset: () => void }) {
  // Report once per error instead of on every render
  useEffect(() => {
    Sentry.captureException(error);
  }, [error]);

  return (
    <html><body>
      <h2>Something went wrong</h2>
      <button onClick={() => reset()}>Try again</button>
    </body></html>
  );
}

Step 3 - Structured Logging (structlog)

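# logging_config.py
import logging

import structlog


def configure_logging(debug: bool = False) -> None:
    # Pretty console output in development, one JSON object per line in production
    if debug:
        renderers = [structlog.dev.ConsoleRenderer()]  # renders exc_info itself
    else:
        renderers = [structlog.processors.format_exc_info, structlog.processors.JSONRenderer()]

    structlog.configure(
        processors=[
            structlog.contextvars.merge_contextvars,   # request-scoped context (e.g. request_id)
            structlog.stdlib.add_log_level,
            structlog.stdlib.add_logger_name,
            structlog.processors.TimeStamper(fmt="iso"),
            *renderers,
        ],
        logger_factory=structlog.stdlib.LoggerFactory(),
        wrapper_class=structlog.stdlib.BoundLogger,
        cache_logger_on_first_use=True,
    )
    # format="%(message)s" stops stdlib from prefixing the JSON with "INFO:name:"
    logging.basicConfig(format="%(message)s", level=logging.DEBUG if debug else logging.INFO)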

# In main.py lifespan:
configure_logging(debug=settings.DEBUG)
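The merge_contextvars processor is what lets a request ID bound once at the start of a request show up on every log line in that request. In a real app you would call structlog.contextvars.bind_contextvars(request_id=...) in middleware; the stdlib-only sketch below re-implements the idea with contextvars to show why it stays correct when asyncio interleaves requests. It is a simplified illustration, not structlog's implementation, and bind, merge_context, and handle_request are hypothetical names.

```python
import asyncio
import contextvars
import uuid
from typing import Any

# Context-local log context; asyncio gives each task its own copy of the context.
# The default dict is never mutated: bind() always sets a fresh dict.
_log_context = contextvars.ContextVar("log_context", default={})


def bind(**kwargs: Any) -> None:
    """Add keys to the current task's context without touching other tasks."""
    _log_context.set({**_log_context.get(), **kwargs})


def merge_context(event_dict: dict[str, Any]) -> dict[str, Any]:
    """Processor: context values first, explicit event keys win on conflict."""
    return {**_log_context.get(), **event_dict}


async def handle_request(path: str) -> dict[str, Any]:
    bind(request_id=str(uuid.uuid4()), path=path)  # done once, e.g. in middleware
    await asyncio.sleep(0)  # yield so the two requests interleave
    return merge_context({"event": "request_handled"})


async def main() -> list[dict[str, Any]]:
    # gather() wraps each coroutine in a Task with a copied context
    return await asyncio.gather(handle_request("/a"), handle_request("/b"))


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

Each request's log record carries its own request_id and path even though both requests ran concurrently on one event loop.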

Usage:

import structlog

log = structlog.get_logger()

# Always log structured key-value pairs, never f-strings for log lines
log.info("order_created", order_id=str(order.id), user_id=str(user.id), amount=order.total)
log.error("payment_failed", order_id=str(order_id), error_code=e.code, exc_info=True)

In production, logs are emitted as JSON, so any log aggregator (Datadog, CloudWatch, Loki) can search them by field name.
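Because each line is a single JSON object, searching by field reduces to parsing the line and comparing keys, which is roughly what an aggregator's query does at scale. A stdlib-only sketch; find_events is a hypothetical helper, and the sample lines are made up to match the shape of the log calls above.

```python
import json
from typing import Any, Iterable, Iterator


def find_events(lines: Iterable[str], **criteria: Any) -> Iterator[dict[str, Any]]:
    """Yield parsed log records whose fields match every key=value in criteria."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines (e.g. output from other libraries)
        if not isinstance(record, dict):
            continue  # valid JSON but not a log record
        if all(record.get(key) == value for key, value in criteria.items()):
            yield record


# Made-up sample lines in the shape produced by the JSONRenderer config
SAMPLE = [
    '{"order_id": "o-1", "user_id": "u-7", "amount": 30, "event": "order_created", "level": "info"}',
    '{"order_id": "o-1", "error_code": "card_declined", "event": "payment_failed", "level": "error"}',
    "plain text line from some other library",
]

if __name__ == "__main__":
    for rec in find_events(SAMPLE, order_id="o-1", level="error"):
        print(rec["event"], rec["error_code"])
```

This only works because order_id and level are separate fields; the same data baked into an f-string message would need fragile regex matching.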

Step 4 - Prometheus Metrics (FastAPI)

uv pip install prometheus-fastapi-instrumentator

# main.py
from prometheus_fastapi_instrumentator import Instrumentator

Instrumentator().instrument(app).expose(app, endpoint="/metrics")

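This exposes /metrics with request counts (labeled by handler, method, and status) and latency histograms per endpoint; error rate is derived from the status label at query time. Point Prometheus at /metrics to scrape it, and use Grafana to visualize the results.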
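Because Prometheus counters only ever increase, an error rate has to come from the change over a time window (what PromQL's rate() computes), not from the raw totals. A sketch of that arithmetic on two scrapes of the request counter, assuming status codes are grouped into classes like "2xx" and "5xx" and counting only 5xx as errors; error_ratio is a hypothetical helper and the numbers are made up.

```python
from typing import Mapping


def error_ratio(before: Mapping[str, float], after: Mapping[str, float]) -> float:
    """Share of requests in the window that returned a 5xx.

    `before` and `after` map a status class ("2xx", "4xx", "5xx") to the
    value of a cumulative request counter at two scrape times.
    """
    # Counters are cumulative: requests in the window = later value - earlier value.
    # (Prometheus rate() also handles counter resets after a restart; this sketch does not.)
    deltas = {k: after.get(k, 0.0) - before.get(k, 0.0) for k in set(before) | set(after)}
    total = sum(deltas.values())
    if total <= 0:
        return 0.0  # no traffic in the window
    return deltas.get("5xx", 0.0) / total


if __name__ == "__main__":
    t0 = {"2xx": 10_000, "4xx": 200, "5xx": 50}
    t1 = {"2xx": 10_950, "4xx": 230, "5xx": 70}  # one scrape interval later
    print(f"{error_ratio(t0, t1):.2%}")  # 20 errors / 1000 requests -> 2.00%
```

Dividing the raw totals instead (70 / 11,250) would report about 0.6% and hide the fact that the last window is failing at 2%.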

Step 5 - Health Check Endpoint (Required for k8s + Uptime Monitoring)

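from fastapi import Depends
from fastapi.responses import JSONResponse
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession

# app, settings and get_db (your async session dependency) come from your project
@app.get("/health", include_in_schema=False)
async def health(db: AsyncSession = Depends(get_db)):
    # Check DB connection
    try:
        await db.execute(text("SELECT 1"))
        db_status = "ok"
    except Exception:
        db_status = "error"

    status = "ok" if db_status == "ok" else "degraded"
    # 503 when degraded, so status-code-based monitors and k8s probes see the failure
    return JSONResponse(
        status_code=200 if status == "ok" else 503,
        content={"status": status, "db": db_status, "version": settings.APP_VERSION},
    )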

Point BetterStack / UptimeRobot at https://your-api.com/health with a 1-minute check interval.
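Uptime services mostly judge an endpoint by its HTTP status and response time, so a 200 carrying "degraded" in its body looks healthy to a monitor that only checks the status code. A stdlib-only sketch of a single probe you could run from cron or CI, checking both the status and the body's status field; interpret and check_health are hypothetical names, and the JSON shape assumes the /health response above.

```python
import json
import urllib.request


def interpret(status_code: int, body: bytes) -> bool:
    """Healthy only if the HTTP status is 200 and the JSON body says "ok"."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and bad UTF-8
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_health(url: str, timeout: float = 5.0) -> bool:
    """One probe, roughly like a single uptime-monitor check."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return interpret(resp.status, resp.read())
    except OSError:
        # URLError (DNS/connection failures), HTTPError (4xx/5xx) and timeouts
        # all subclass OSError; any of them means "down"
        return False


if __name__ == "__main__":
    print(check_health("https://your-api.com/health"))
```

The timeout matters as much as the status: a server that accepts connections but never answers should count as down.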

Step 6 - Alerts

Configure in Sentry dashboard:

Alert when error rate > 1% of requests
Alert when latency P95 > 2 seconds
Alert on any new error type (first occurrence)

All alerts route to Slack or email. Production incidents should wake someone up.
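The first two thresholds are window statistics: P95 is the latency that 95% of requests in the window stay at or under, and the error rate is failed requests over all requests. Sentry evaluates these for you; the sketch below only makes the definitions concrete. It uses the nearest-rank percentile and counts 5xx responses as failures, both assumptions (Sentry's own aggregation may differ), and triggered_alerts is a hypothetical helper.

```python
from typing import Sequence, Tuple

ERROR_RATE_THRESHOLD = 0.01  # alert when more than 1% of requests fail
P95_THRESHOLD_S = 2.0        # alert when P95 latency exceeds 2 seconds


def p95(durations: Sequence[float]) -> float:
    """Nearest-rank 95th percentile: smallest value with >= 95% of samples at or below it."""
    ordered = sorted(durations)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in exact integer arithmetic
    return ordered[rank - 1]


def triggered_alerts(samples: Sequence[Tuple[float, int]]) -> list[str]:
    """samples: (duration_seconds, http_status) for each request in the window."""
    if not samples:
        return []
    durations = [d for d, _ in samples]
    failures = sum(1 for _, status in samples if status >= 500)
    alerts = []
    if failures / len(samples) > ERROR_RATE_THRESHOLD:
        alerts.append("error_rate")
    if p95(durations) > P95_THRESHOLD_S:
        alerts.append("p95_latency")
    return alerts
```

A single 5-second outlier among 100 fast requests does not move P95, which is why percentile alerts are less noisy than alerting on the maximum.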

Source

https://github.com/mrsknetwork/supernova/blob/main/skills/monitoring/SKILL.md (View on GitHub)

Overview

This skill sets up a minimum viable observability stack for production apps, combining Sentry error tracking, structured logging with structlog, application metrics via Prometheus, and uptime monitoring. It helps you detect, diagnose, and alert on failures before users notice.

How This Skill Works

It instruments the app across backend, frontend, logging, metrics, and uptime. Start with Sentry for error tracking on the backend (and on the frontend via Next.js), then layer in structured logging with structlog and expose key metrics with Prometheus. Uptime checks verify that the server is responding and raise an alert when it is not.

When to Use It

Preparing an app for production
Debugging production issues with error traces and logs
When there is no visibility into what's failing in the live app
Setting up dashboards for latency, throughput, and error rates
Triggering alerts for uptime and responsiveness

Quick Start

Step 1: Instrument Backend with Sentry (pip install sentry-sdk[fastapi], initialize with DSN, environment, and integrations)
Step 2: Instrument Frontend with Sentry (run the Next.js wizard and review the generated sentry.client.config.ts and sentry.server.config.ts)
Step 3: Add structlog, Prometheus metrics, and uptime checks (configure structlog, expose Prometheus metrics, and set up BetterStack/UptimeRobot alerts)

Best Practices

Start with Sentry for error tracking; configure environment and DSN; use a reasonable traces_sample_rate to balance cost and visibility
Keep send_default_pii = False in Sentry to avoid exposing sensitive data
Instrument critical code paths with structlog to include context like request IDs and user info
Expose metrics (latency, throughput, error rates) with Prometheus and visualize in Grafana
Set up uptime monitoring (BetterStack/UptimeRobot) and configure alerts for non-responsive endpoints

Example Use Cases

An API endpoint raises an unhandled exception; Sentry captures the error with stack trace and user context for rapid triage
Frontend navigation errors are reported by Sentry in Next.js, enabling quick user-impact analysis
Structured logs with structlog include request_id, user_id, and actions to aid security audits
Prometheus metrics, visualized in Grafana, reveal latency increasing under load and trigger alerts
Uptime checks detect a non-responsive server and trigger outage alerts, reducing MTTR
