hotfix-triage
npx machina-cli add skill a5c-ai/babysitter/hotfix-triage --openclaw
Hotfix Triage
Capabilities
Rapidly classifies production issues by severity, locates root causes from logs and stack traces, and routes to the appropriate fix path. Simple fixes skip planning entirely for maximum speed.
Tool Use Instructions
- Use Read to examine log files and error reports
- Use Grep to search for error patterns and stack trace references in code
- Use Glob to find affected source files
- Use Bash to reproduce issues and check system state
- Use Write to generate triage reports
Process Integration
- Used in maestro-hotfix.js (Triage and Root Cause) - Agent: Hotfix Specialist
- Severity levels: critical (immediate), high (same-day), medium (next-sprint)
- Simple fixes skip planning phase
- High-risk fixes require a human approval breakpoint
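The routing rules in this list can be sketched in a few lines. This is only an illustration and not the logic in maestro-hotfix.js: the `TriageResult` type, the `simple_fix` and `high_risk` flags, the phase names, and the `route` function are all hypothetical. Only the three severity levels and their response windows come from the list above.

```python
from dataclasses import dataclass
from typing import List

# Response windows taken from the severity list above.
RESPONSE_WINDOW = {
    "critical": "immediate",
    "high": "same-day",
    "medium": "next-sprint",
}


@dataclass
class TriageResult:
    severity: str     # "critical", "high", or "medium"
    simple_fix: bool  # hypothetical flag: the fix is small and well understood
    high_risk: bool   # hypothetical flag: the fix touches risky code paths


def route(result: TriageResult) -> List[str]:
    """Return the ordered phases for a triaged issue (illustrative only)."""
    if result.severity not in RESPONSE_WINDOW:
        raise ValueError(f"unknown severity: {result.severity!r}")
    phases = []
    if not result.simple_fix:
        phases.append("planning")        # simple fixes skip this phase
    if result.high_risk:
        phases.append("human-approval")  # breakpoint before the fix proceeds
    phases.append("fix")
    return phases
```

Under these assumptions, a simple low-risk critical issue goes straight to the fix, while a complex high-risk one passes through planning and then the approval breakpoint.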
Source
git clone https://github.com/a5c-ai/babysitter/blob/main/plugins/babysitter/skills/babysit/process/methodologies/maestro/skills/hotfix-triage/SKILL.md
Overview
Hotfix Triage accelerates production issue handling by quickly classifying severity, pinpointing root causes from logs and stack traces, and routing fixes along the fastest path. It supports immediate, same-day, and next-sprint remediation, and can skip planning for simple fixes to maximize speed. Integration with maestro-hotfix.js ensures a consistent triage flow.
How This Skill Works
Operators use Read to inspect logs and error reports, Grep to identify patterns and stack references, and Glob to locate affected sources. Bash is used to reproduce issues and verify system state; Write generates structured triage reports that feed into the hotfix routing pipeline.
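The Grep step described above amounts to pulling file references out of stack traces and seeing which files recur. A minimal sketch follows, assuming Node.js-style frames of the form `at fn (path:line:col)`. The frame format, the sample log, and the helper names `frames_in` and `most_referenced_files` are assumptions for illustration and are not part of the skill.

```python
import re
from collections import Counter
from typing import List, Tuple

# Matches Node.js-style frames such as "at acquire (/app/src/db/pool.js:42:13)".
# The frame format is an assumption; adjust the pattern for other runtimes.
FRAME = re.compile(r"\((?P<path>[^()\s]+):(?P<line>\d+):\d+\)")


def frames_in(log_text: str) -> List[Tuple[str, int]]:
    """Return (file path, line number) for every stack frame found in the log."""
    return [(m.group("path"), int(m.group("line"))) for m in FRAME.finditer(log_text)]


def most_referenced_files(log_text: str, top: int = 3) -> List[Tuple[str, int]]:
    """Rank files by how often they appear in stack frames."""
    counts = Counter(path for path, _ in frames_in(log_text))
    return counts.most_common(top)


# Hypothetical log excerpt, made up for this example.
SAMPLE_LOG = """\
Error: timeout acquiring connection
    at acquire (/app/src/db/pool.js:42:13)
    at query (/app/src/db/client.js:17:9)
Error: timeout acquiring connection
    at acquire (/app/src/db/pool.js:42:13)
"""

print(most_referenced_files(SAMPLE_LOG))
```

Files that dominate the ranking are reasonable first candidates for the Glob and Read steps, though frequency alone does not prove a root cause.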
When to Use It
- Urgent production outage requiring immediate triage and routing to the fastest fix path
- Severity escalates to critical or high and requires rapid decision-making
- Root-cause analysis using logs and stack traces to pinpoint failure sources
- Routing simple fixes to fast-path with minimal planning
- High-risk fixes that require a human approval breakpoint
Quick Start
- Step 1: Use Read to load recent production logs and error reports.
- Step 2: Run Grep to identify error patterns and stack trace references, Glob to locate affected sources, and Bash to reproduce the issue.
- Step 3: Generate a triage report with Write and route through maestro-hotfix.js for fast-path handling (or escalate for approval if high-risk).
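The skill does not specify what the triage report from Step 3 looks like. One plausible shape is a small JSON document, sketched below. Every field name and the `build_triage_report` helper are hypothetical and would need to match whatever maestro-hotfix.js actually expects.

```python
import json
from typing import List


def build_triage_report(
    severity: str,
    root_cause: str,
    affected_files: List[str],
    high_risk: bool,
) -> str:
    """Serialize triage findings as JSON (field names are assumptions)."""
    report = {
        "severity": severity,
        "root_cause": root_cause,
        "affected_files": sorted(affected_files),
        # High-risk fixes are flagged so the pipeline can pause for approval.
        "requires_human_approval": high_risk,
    }
    return json.dumps(report, indent=2)


# Hypothetical values for illustration.
print(
    build_triage_report(
        severity="critical",
        root_cause="connection pool exhaustion",
        affected_files=["src/db/pool.js", "src/db/client.js"],
        high_risk=False,
    )
)
```

Keeping the report machine-readable lets the routing step act on `severity` and `requires_human_approval` without parsing free text.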
Best Practices
- Start with Read to collect the latest logs and error reports
- Use Grep to surface error patterns and stack trace references
- Use Glob to map all affected source files and modules
- Reproduce the issue in Bash to validate the state before applying changes
- Document triage decisions and routing paths with Write; escalate high-risk fixes for human approval
Example Use Cases
- Critical outage caused by connection pool exhaustion; triaged quickly, root cause identified from stack traces, and routed to a fast-path hotfix.
- Service degradation with high-severity impact; triage isolates the root cause from logs, enabling same-day fix without a full planning phase.
- Memory leak symptom detected under load; triage confirms source file set with Glob and directs to a rapid patch.
- Non-deterministic error during peak traffic; triage uses Bash reproduction to confirm the state and assigns an expedited fix path.
- High-risk code change flagged; triage triggers a human approval breakpoint before proceeding with the fix.