What is SEV1 and how quickly should we respond?

SEV1 means the service is down and all users are affected. Respond immediately, with all hands involved.

How often should updates be provided during an incident?

Provide clear, factual updates at a regular cadence, including what's happening, who is affected, what we're doing, and when the next update will come.

incident-response

npx machina-cli add skill anthropics/knowledge-work-plugins/incident-response --openclaw

Files (1)

SKILL.md

1.4 KB

Incident Response

Guide incident response from detection through resolution and postmortem.

Severity Classification

Level	Criteria	Response Time
SEV1	Service down, all users affected	Immediate, all-hands
SEV2	Major feature degraded, many users affected	Within 15 min
SEV3	Minor feature issue, some users affected	Within 1 hour
SEV4	Cosmetic or low-impact issue	Next business day
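
The table above can be encoded as a small lookup, sketched here in Python. The names `Severity`, `RESPONSE_TARGET`, and `classify` are hypothetical, and the three boolean inputs are an illustrative simplification of the Criteria column, not something the skill itself defines.

```python
from datetime import timedelta
from enum import Enum
from typing import Optional


class Severity(Enum):
    SEV1 = 1  # Service down, all users affected
    SEV2 = 2  # Major feature degraded, many users affected
    SEV3 = 3  # Minor feature issue, some users affected
    SEV4 = 4  # Cosmetic or low-impact issue


# Response targets taken from the table. "Immediate" is modeled as a zero
# delay; "next business day" has no fixed duration, so it is stored as None.
RESPONSE_TARGET: dict[Severity, Optional[timedelta]] = {
    Severity.SEV1: timedelta(0),
    Severity.SEV2: timedelta(minutes=15),
    Severity.SEV3: timedelta(hours=1),
    Severity.SEV4: None,
}


def classify(service_down: bool,
             major_feature_degraded: bool,
             minor_feature_issue: bool) -> Severity:
    """Pick the most severe level whose (simplified) criterion is met."""
    if service_down:
        return Severity.SEV1
    if major_feature_degraded:
        return Severity.SEV2
    if minor_feature_issue:
        return Severity.SEV3
    return Severity.SEV4
```

Checking the most severe condition first means an incident that matches several rows is always assigned the highest applicable level.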

Response Framework

Triage: Classify severity, identify scope, assign incident commander
Communicate: Status page, internal updates, customer comms if needed
Mitigate: Stop the bleeding first, root cause later
Resolve: Implement fix, verify, confirm resolution
Postmortem: Blameless review, 5 whys, action items
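
One way to picture the five steps is as an ordered sequence of phases. The sketch below is illustrative only: `Phase` and `next_phase` are hypothetical names, and the strictly linear order is a simplification, since communication usually continues while mitigation and resolution are under way.

```python
from enum import IntEnum
from typing import Optional


class Phase(IntEnum):
    TRIAGE = 1       # Classify severity, identify scope, assign incident commander
    COMMUNICATE = 2  # Status page, internal updates, customer comms if needed
    MITIGATE = 3     # Stop the bleeding first, root cause later
    RESOLVE = 4      # Implement fix, verify, confirm resolution
    POSTMORTEM = 5   # Blameless review, 5 whys, action items


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase that follows `current`, or None after the postmortem."""
    if current is Phase.POSTMORTEM:
        return None
    return Phase(current + 1)
```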

Communication Templates

Provide clear, factual updates at a regular cadence. Include: what's happening, who's affected, what we're doing, and when the next update will come.
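
The four items listed above map naturally onto a small template. This is a minimal sketch, assuming a plain-text update; the `StatusUpdate` class, its field names, and the label wording are hypothetical and not defined by the skill.

```python
from dataclasses import dataclass


@dataclass
class StatusUpdate:
    whats_happening: str
    whos_affected: str
    what_were_doing: str
    next_update: str

    def render(self) -> str:
        """Format the update as four labeled lines, one per required item."""
        return "\n".join([
            f"What's happening: {self.whats_happening}",
            f"Who's affected: {self.whos_affected}",
            f"What we're doing: {self.what_were_doing}",
            f"Next update: {self.next_update}",
        ])


# Example with made-up incident details.
update = StatusUpdate(
    whats_happening="Checkout requests are failing.",
    whos_affected="All users.",
    what_were_doing="Rolling back the latest deploy.",
    next_update="In 30 minutes.",
)
print(update.render())
```

Making every field required means an update cannot be constructed without stating when the next one will arrive.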

Postmortem Format

Blameless. Focus on systems and processes. Include timeline, root cause analysis (5 whys), what went well, what went poorly, and action items with owners and due dates.
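
The postmortem contents described above can be captured as a record with a simple completeness check. This is a sketch under stated assumptions: `ActionItem`, `Postmortem`, and `missing_sections` are hypothetical names, and the check covers only empty sections and action items that lack an owner.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    description: str
    owner: str
    due: date


@dataclass
class Postmortem:
    timeline: list[str]
    five_whys: list[str]    # successive "why?" answers leading to the root cause
    went_well: list[str]
    went_poorly: list[str]
    action_items: list[ActionItem]


def missing_sections(pm: Postmortem) -> list[str]:
    """List empty sections, then any action items that have no owner."""
    problems = [name for name, value in vars(pm).items() if not value]
    problems += [
        f"action item without owner: {item.description}"
        for item in pm.action_items
        if not item.owner.strip()
    ]
    return problems
```

An empty result means every section is filled in and every action item has an owner; the due date is enforced by the `ActionItem` constructor, which requires it.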

Source

git clone https://github.com/anthropics/knowledge-work-plugins

The skill file is at engineering/skills/incident-response/SKILL.md in that repository.

Overview

Incident Response guides you from detection to resolution and postmortem. It defines severity levels (SEV1-SEV4), outlines a 5-step response framework, and provides communication templates to keep stakeholders informed.

How This Skill Works

Start with triage: classify severity, identify scope, and assign an incident commander. Then work through the rest of the framework (Communicate, Mitigate, Resolve, Postmortem), ending with a blameless postmortem that uses a 5 whys analysis to drive improvements.

When to Use It

You detect production downtime affecting all users (SEV1).
A major feature is degraded and impacts many users (SEV2).
A minor feature issue affects some users (SEV3).
Cosmetic or low-impact issues require monitoring and a scheduled fix (SEV4).
Any reported incident requires triage and an incident commander to coordinate response.

Quick Start

Step 1: Detect the incident, classify its severity, identify its scope, and appoint the incident commander.
Step 2: Communicate clearly (status page, internal updates, customer updates if needed) and implement immediate mitigations.
Step 3: Resolve the issue, verify the fix, confirm resolution, and conduct a blameless postmortem with action items.

Best Practices

Classify severity and scope early and assign an incident commander to own the response.
Provide clear, regular communications (status page, internal updates, customer updates if needed).
Mitigate first to stop the bleeding; investigate the root cause afterward.
Follow the 5-step response framework: Triage, Communicate, Mitigate, Resolve, Postmortem.
Conduct a blameless postmortem with a timeline, a 5 whys analysis, what went well, what went poorly, and action items with owners and due dates.

Example Use Cases

E-commerce checkout outage (SEV1): all users affected; incident commander appointed; rapid mitigation followed by root-cause analysis and postmortem.
Search indexing slowdown (SEV2): degraded feature quality for many users; internal updates and customer communications maintained throughout.
Payment gateway downstream failure (SEV2/SEV1): failed transactions across regions; immediate mitigation and a postmortem with action items.
Daily batch job delay due to database lock (SEV3): partial user impact; scheduled fix with blameless review afterward.
Frontend feature toggle causes UI jitter (SEV4): cosmetic impact; monitor and plan a targeted fix with minimal disruption.
