What is hotfix-triage?

A skill that rapidly classifies production issues, identifies root causes from logs and stack traces, and routes each hotfix along the fastest suitable path.

Which tools are used and how are they applied?

Read for logs, Grep for error patterns, Glob for file discovery, Bash for reproduction, and Write for triage reporting and hand-off to the fix path.

When is human approval required?

High-risk fixes require a human approval breakpoint before proceeding.

hotfix-triage

npx machina-cli add skill a5c-ai/babysitter/hotfix-triage --openclaw

Files (1)

SKILL.md

995 B

Hotfix Triage

Capabilities

Rapidly classifies production issues by severity, locates root causes from logs and stack traces, and routes to the appropriate fix path. Simple fixes skip planning entirely for maximum speed.

Tool Use Instructions

Use Read to examine log files and error reports
Use Grep to search for error patterns and stack trace references in code
Use Glob to find affected source files
Use Bash to reproduce issues and check system state
Use Write to generate triage reports
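
The Read and Grep steps above can be approximated in plain Python for illustration. This is only a sketch under stated assumptions: the pattern names, the `scan_log` helper, and the sample log are hypothetical and are not part of the skill or its tools.

```python
import re
from collections import Counter

# Hypothetical error patterns; a real triage would tune these to the service.
ERROR_PATTERNS = {
    "timeout": re.compile(r"\btimed? ?out\b", re.IGNORECASE),
    "pool_exhausted": re.compile(r"connection pool exhausted", re.IGNORECASE),
    "null_reference": re.compile(r"NullPointerException|TypeError: Cannot read"),
}

# Matches Python-style stack frames such as: File "app/db.py", line 42
FRAME_RE = re.compile(r'File "([^"]+)", line (\d+)')


def scan_log(text):
    """Count pattern hits and collect (file, line) frames found in the log text."""
    hits = Counter()
    frames = []
    for line in text.splitlines():
        for name, pattern in ERROR_PATTERNS.items():
            if pattern.search(line):
                hits[name] += 1
        for path, lineno in FRAME_RE.findall(line):
            frames.append((path, int(lineno)))
    return hits, frames


# Made-up log excerpt used only to exercise the sketch.
SAMPLE_LOG = '''\
2024-01-01T00:00:01 ERROR connection pool exhausted after 30s
  File "app/db.py", line 42, in acquire
2024-01-01T00:00:02 ERROR request timed out
'''

hits, frames = scan_log(SAMPLE_LOG)
```

The pattern counts hint at the failure class, while the extracted frames give the file paths that a Glob-style search would then try to locate in the source tree.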

Process Integration

Used in maestro-hotfix.js (Triage and Root Cause)
Agent: Hotfix Specialist
Severity levels: critical (immediate), high (same-day), medium (next-sprint)
Simple fixes skip planning phase
High-risk fixes require human approval breakpoint
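
The routing rules listed above can be sketched as a small decision function. The severity names and response targets come from the list; the `Issue` fields, the step names, and the choice to treat "simple" and "high-risk" as independent flags are assumptions made for illustration, not the actual logic of maestro-hotfix.js.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # Response targets taken from the severity levels listed above.
    CRITICAL = "immediate"
    HIGH = "same-day"
    MEDIUM = "next-sprint"


@dataclass
class Issue:
    severity: Severity
    simple_fix: bool  # hypothetical flag: the fix is small and well understood
    high_risk: bool   # hypothetical flag: the fix could cause further damage


def route(issue):
    """Return the ordered steps an issue passes through before the fix ships."""
    steps = ["triage", "root-cause"]
    if not issue.simple_fix:
        steps.append("planning")        # simple fixes skip this phase
    if issue.high_risk:
        steps.append("human-approval")  # breakpoint before proceeding
    steps.append("fix")
    return steps
```

For example, `route(Issue(Severity.CRITICAL, simple_fix=True, high_risk=False))` yields the shortest path, with neither a planning step nor an approval step.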

Source

https://github.com/a5c-ai/babysitter/blob/main/plugins/babysitter/skills/babysit/process/methodologies/maestro/skills/hotfix-triage/SKILL.md

Overview

Hotfix Triage speeds up production issue handling by classifying severity, pinpointing root causes from logs and stack traces, and routing each fix along the fastest suitable path. Critical issues are handled immediately, high-severity issues the same day, and medium-severity issues in the next sprint; simple fixes skip the planning phase to save time. Integration with maestro-hotfix.js keeps the triage flow consistent.

How This Skill Works

Operators use Read to inspect logs and error reports, Grep to identify error patterns and stack trace references, and Glob to locate affected source files. Bash is used to reproduce issues and verify system state; Write generates structured triage reports that feed into the hotfix routing pipeline.

When to Use It

An urgent production outage needs immediate triage and routing to the fastest fix path
An issue escalates to critical or high severity and calls for a rapid decision
A root cause must be pinpointed from logs and stack traces
A simple fix can take the fast path with minimal planning
A high-risk fix needs a human approval breakpoint

Quick Start

Step 1: Use Read to load recent production logs and error reports.
Step 2: Run Grep to identify error patterns and stack trace references, use Glob to locate affected source files, and use Bash to reproduce the issue.
Step 3: Generate a triage report with Write and route through maestro-hotfix.js for fast-path handling (or escalate for approval if high-risk).
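
The report produced in Step 3 could take many shapes; the source does not specify one. The sketch below assumes a simple JSON document, and every field name, the `build_report` helper, and the example values are hypothetical. How maestro-hotfix.js actually consumes a report is not described here.

```python
import json


def build_report(summary, severity, root_cause, affected_files, high_risk):
    """Assemble a triage report as a plain dict (field names are hypothetical)."""
    return {
        "summary": summary,
        "severity": severity,
        "root_cause": root_cause,
        "affected_files": sorted(affected_files),
        # High-risk fixes stop at a human approval breakpoint; others go fast-path.
        "next_step": "human-approval" if high_risk else "fast-path",
    }


# Illustrative values only, loosely based on the connection pool use case.
report = build_report(
    summary="Connection pool exhausted under load",
    severity="critical",
    root_cause="Connections not released on error path",
    affected_files=["app/db.py", "app/api.py"],
    high_risk=False,
)
text = json.dumps(report, indent=2)
```

Keeping the report as structured data rather than free text makes the routing decision (`next_step`) easy for a downstream step to read without parsing prose.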

Best Practices

Start with Read to collect the latest logs and error reports
Use Grep to surface error patterns and stack trace references
Use Glob to map all affected source files and modules
Reproduce the issue in Bash to validate the state before applying changes
Document triage decisions and routing paths with Write; escalate high-risk fixes for human approval

Example Use Cases

Critical outage caused by connection pool exhaustion; triaged quickly, root cause identified from stack traces, and routed to a fast-path hotfix.
Service degradation with high-severity impact; triage isolates the root cause from logs, enabling a same-day fix without a full planning phase.
Memory leak symptom detected under load; triage confirms the set of affected source files with Glob and directs the issue to a rapid patch.
Non-deterministic error during peak traffic; triage uses Bash reproduction to confirm the system state and assigns an expedited fix path.
High-risk code change flagged; triage triggers a human approval breakpoint before proceeding with the fix.
