npx machina-cli add skill anombyte93/atlas-session-lifecycle/stepback --openclaw

Stepback - Strategic Reassessment
Core principle: If you've tried 2+ fixes for the same class of problem, you're probably patching symptoms. Stop. Zoom out. Research the architecture.
When to Trigger
```dot
digraph stepback {
  "Fix attempt failed?" [shape=diamond];
  "Is this the 2nd+ attempt?" [shape=diamond];
  "Same root system?" [shape=diamond];
  "Keep debugging" [shape=box];
  "STOP — Run /stepback" [shape=box, style=bold];
  "Fix attempt failed?" -> "Is this the 2nd+ attempt?" [label="yes"];
  "Fix attempt failed?" -> "Keep debugging" [label="no, first try"];
  "Is this the 2nd+ attempt?" -> "Same root system?" [label="yes"];
  "Is this the 2nd+ attempt?" -> "Keep debugging" [label="no"];
  "Same root system?" -> "STOP — Run /stepback" [label="yes"];
  "Same root system?" -> "Keep debugging" [label="different systems"];
}
```
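The flowchart above can be sketched as a small predicate. This is an illustrative helper, not part of the skill; the field names are assumptions:

```javascript
// Decide whether to trigger /stepback, following the flowchart above.
// `attempts` holds failed fix attempts; `rootSystem` labels the system
// each one touched. Field names are illustrative, not a real API.
function shouldStepback(attempts) {
  if (attempts.length < 2) return false; // first try: keep debugging
  const systems = new Set(attempts.map((a) => a.rootSystem));
  return systems.size === 1; // all failures share one root system: stop
}
```

Two failed fixes against the same system are enough to stop; failures across unrelated systems keep you in normal debugging.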
Symptoms that demand /stepback:
- Same error returns after a "fix"
- Fix A reveals error B reveals error C (cascade)
- Locally it works, deployed it doesn't
- Multiple components broken by the same underlying cause
- Fixes feel like whack-a-mole
Proactive triggers (don't wait for failure):
- Touching deployment config, build pipeline, or infrastructure
- Changing middleware, routing, or auth at the platform level
- Modifying next.config.js, vercel.json, Dockerfile, or CI/CD config
Execution
STEP 1 — Stop and Inventory
List every fix attempt so far in this session. For each:
- What symptom it addressed
- Whether it worked or revealed a new symptom
- What assumption it was based on
Present this to the user as a table:
| # | Fix | Symptom | Result | Assumption |
|---|-----|---------|--------|------------|
| 1 | Added route to middleware whitelist | 405 on webhook | Still 405 | Middleware was blocking |
| 2 | Replaced parseBody with manual HMAC | 500 crash | Still 500 | parseBody incompatible |
| 3 | ... | ... | ... | ... |
STEP 2 — Find the Common Thread
Ask: "What system do ALL of these symptoms share?"
Don't look at each symptom individually. Look for:
- Shared infrastructure (build system, deployment platform, DNS)
- Shared config (one file that affects everything)
- Shared assumption (e.g., "the code is running" when it isn't)
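One way to make the common thread explicit is to tally which systems every symptom implicates. A sketch with made-up labels:

```javascript
// Return the systems implicated by EVERY symptom — the prime suspects.
function commonThread(symptoms) {
  const counts = {};
  for (const s of symptoms) {
    for (const sys of s.touches) counts[sys] = (counts[sys] || 0) + 1;
  }
  return Object.keys(counts).filter((sys) => counts[sys] === symptoms.length);
}

// Only "deployment" is touched by all three symptoms below.
commonThread([
  { symptom: "405 on webhook", touches: ["middleware", "deployment"] },
  { symptom: "500 crash", touches: ["route handler", "deployment"] },
  { symptom: "CSP warnings", touches: ["headers", "deployment"] },
]); // ["deployment"]
```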
STEP 3 — Research the Architecture
Spawn a research agent (Perplexity) with THREE queries:
- "Common causes of [shared symptom pattern] in [platform/framework]"
- "[Platform] [framework] deployment issues [year]"
- "[Specific error pattern] works locally fails in production [platform]"
This is mandatory. No skipping research because "I think I know."
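The three queries can be built mechanically from the bracketed slots. The parameter names below are mine, not part of the skill:

```javascript
// Fill the three mandatory query templates from STEP 3.
function researchQueries({ symptomPattern, platform, framework, errorPattern, year }) {
  return [
    `Common causes of ${symptomPattern} in ${platform} ${framework}`,
    `${platform} ${framework} deployment issues ${year}`,
    `${errorPattern} works locally fails in production ${platform}`,
  ];
}
```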
STEP 4 — Test the Broadest Hypothesis First
Instead of testing your specific broken feature, test whether the ENTIRE CLASS of features works:
Before: "Does /api/revalidate work?"
After: "Does ANY API route work on production?"
If the broader test fails, you've found a systemic issue — fix THAT, not the individual symptom.
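A hedged sketch of the class-level check: classify smoke-test results for the whole route class rather than one endpoint (the result shape is an assumption):

```javascript
// If EVERY route in the class fails, the problem is systemic:
// fix the platform, not the individual feature.
function diagnose(results) {
  if (results.length === 0) return "healthy"; // nothing to judge
  const failing = results.filter((r) => !r.ok);
  if (failing.length === results.length) return "systemic";
  return failing.length > 0 ? "feature-specific" : "healthy";
}
```

If `/api/revalidate`, `/api/health`, and every other route all fail, debugging the webhook handler is wasted effort.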
STEP 5 — Present the Choice
Show the user:
**Symptom-level fix:** [what you've been doing]
**Architecture-level fix:** [what research suggests]
**Effort comparison:** [patch N symptoms vs fix 1 root cause]
**Recommendation:** [which approach and why]
Let the user decide. Don't assume redesign is always better — sometimes patching IS correct.
Red Flags — You're Symptom-Patching If:
| Signal | What It Means |
|---|---|
| "Let me just try one more thing" | You're guessing, not diagnosing |
| "This fix should definitely work" | You said that last time |
| "It works locally" | Local ≠ production. Check the deployment pipeline |
| "I'll add this to the whitelist/allowlist" | You're growing a list instead of questioning the filter |
| "The code is correct, something else is wrong" | Check if the code is even RUNNING |
| 3+ commits with "fix:" in a row | Pattern detected — step back |
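The last red flag is mechanically checkable. A rough heuristic over recent commit messages (the threshold and "fix:" prefix are assumptions from the table above):

```javascript
// Flag 3+ consecutive commit messages starting with "fix:".
function fixStreak(messages) {
  let streak = 0;
  for (const msg of messages) {
    streak = msg.startsWith("fix:") ? streak + 1 : 0;
    if (streak >= 3) return true;
  }
  return false;
}
```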
The Outsider Test
When running /stepback, pretend you're a new consultant reviewing this system for the first time. Ask:
- "What is different about this setup compared to a standard one?"
- "What would a fresh install look like?"
- "What was added/customized that might be causing this?"
The answer is almost always in the DELTA between standard and custom.
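The delta idea can be sketched as a config comparison against a vanilla project. Both objects here are illustrative, not real defaults:

```javascript
// List keys where the project's config differs from a fresh install's.
function configDelta(custom, defaults) {
  return Object.keys(custom).filter((k) => custom[k] !== defaults[k]);
}

// The customized key stands out as the suspect.
configDelta(
  { reactStrictMode: true, outputFileTracingRoot: "/home/anombyte/Atlas/Atlas_Website" },
  { reactStrictMode: true }
); // ["outputFileTracingRoot"]
```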
Real-World Example
Session 2026-02-13 — Atlas Website webhook
5 fix attempts over 2 hours: middleware whitelist, parseBody replacement, CSP headers, dead component cleanup, deployment protection check. All were real issues but none were THE issue.
The outsider test revealed: outputFileTracingRoot in next.config.js was set to a hardcoded local path (/home/anombyte/Atlas/Atlas_Website). On Vercel's build server, this path doesn't exist, so Next.js silently generated ZERO serverless functions. No API routes worked — not just the webhook.
One line removed. Everything worked.
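Reconstructed from the session notes above (not the actual file), the offending config looked roughly like this:

```javascript
// next.config.js — sketch of the bug described above.
module.exports = {
  // Hardcoded local path: absent on Vercel's build server, so Next.js
  // silently emitted zero serverless functions. Removing it fixed everything.
  outputFileTracingRoot: "/home/anombyte/Atlas/Atlas_Website",
};
```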
The 5 "fixes" were real improvements (cleaner middleware, better CSP, removed dead code) — but they would have taken 10 minutes as planned cleanup, not 2 hours of confused debugging.
Source
https://github.com/anombyte93/atlas-session-lifecycle/blob/main/skills/stepback/SKILL.md

Overview
Stepback triggers after 2+ fixes for the same class of problem, or when several symptoms point to a shared system. It prioritizes architectural reassessment over symptom-patching and applies proactively whenever infrastructure-level configuration is involved.
How This Skill Works
Follow five execution steps: inventory all fixes, identify the common thread among symptoms, conduct architecture research using a dedicated agent (Perplexity) with THREE queries, test the broadest class of features, and present a recommended path for the user to choose. This shifts focus from isolated fixes to root-cause repair and ensures infra/config changes are considered.
When to Use It
- Same error returns after 2+ fix attempts (symptom-patching).
- Multiple symptoms share a common root system or platform.
- Local success but production deployment fails after fixes.
- Fixes seem to patch symptoms rather than address root cause (whack-a-mole).
- Proactively when changes touch deployment config, build pipelines, or infrastructure.
Quick Start
- Step 1: Stop and inventory all fix attempts—list symptoms, outcomes, and assumptions.
- Step 2: Find the common thread across all symptoms and identify the shared system.
- Step 3: Spawn a research agent (Perplexity) with THREE queries to verify architecture-level causes.
Best Practices
- Inventory every fix attempt in the current session with symptom, outcome, and underlying assumption.
- Identify the common thread or shared root system across all symptoms.
- Spawn a research agent (Perplexity) with THREE queries to investigate architecture-level causes.
- Test the broadest class of features first to reveal systemic issues (e.g., any API route works in production).
- Present a recommended path (architecture-level fix vs patch) and let the user decide.
Example Use Cases
- After multiple fixes to a webhook, 405 persists; Stepback reveals a middleware-level block caused by a route change.
- Local tests pass but production shows 500 after fixes; Stepback uncovers deployment/configuration mismatch in the pipeline.
- A fix reveals B and then C across services; Stepback identifies a shared authentication/authorization service as the root cause.
- Platform-level changes to middleware trigger failures across several components; Stepback advocates architecture-level adjustments.
- Modifying next.config.js, vercel.json, or Dockerfile leads to widespread API failures; Stepback directs an architecture review.