qa-run
npx machina-cli add skill ajaywadhara/agentic-sdlc-plugin/qa-run --openclaw
Arguments: $FEATURE (or "all" for entire suite)
Read CLAUDE.md before doing anything else. Ensure the dev server is running before proceeding.
━━━ BROWSER AGENT (Playwright MCP — runs first) ━━━
Before writing ANY test file, explore the live application using Playwright MCP. This is non-negotiable for web applications. You MUST see the real app first.
STEP A — NAVIGATE EVERY SCREEN:
For each screen related to $FEATURE:
1. browser_navigate to the screen URL
2. browser_snapshot — capture the accessibility tree (This gives you the REAL selectors, roles, and accessible names. Never guess selectors. Always get them from the live app.)
3. browser_take_screenshot — save to qa/browser-tests/$FEATURE/
4. Compare what you see against docs/SCREENS.md and wireframes/
5. Log any discrepancies immediately
STEP B — TEST EVERY INTERACTION:
On each screen:
1. browser_click every button — verify correct result
2. browser_type into every input — verify it accepts input
3. browser_select_option on every dropdown
4. browser_press_key Tab through the page — verify focus order
5. browser_press_key Enter on focused buttons — verify activation
6. For forms: submit with valid data, empty data, and invalid data
STEP C — TEST THE HAPPY PATH LIVE:
Read P0 acceptance criteria from docs/PRD.md. Execute each Given/When/Then by actually doing it in the browser:
- browser_navigate to start
- browser_type / browser_click / browser_select_option to perform actions
- browser_verify_text_visible / browser_verify_element_visible for assertions
- browser_take_screenshot at each step
STEP D — RESPONSIVE CHECK:
For the 3 most important screens:
browser_resize width=1440 height=900 → browser_take_screenshot (desktop)
browser_resize width=768 height=1024 → browser_take_screenshot (tablet)
browser_resize width=375 height=812 → browser_take_screenshot (mobile)
Verify: no overflow, no cut-off content, touch targets ≥ 44px on mobile
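The touch-target rule in STEP D can be checked mechanically once element sizes are known. The sketch below is one possible way to do that, not part of the skill itself: the TargetBox shape and the undersizedTouchTargets helper are hypothetical names, and the measurements are assumed to come from something like Playwright's locator.boundingBox().

```typescript
// Hypothetical shape for a measured element; in practice the numbers
// might come from locator.boundingBox() in Playwright.
interface TargetBox {
  name: string;
  width: number;
  height: number;
}

// The three viewports named in STEP D.
const VIEWPORTS = {
  desktop: { width: 1440, height: 900 },
  tablet: { width: 768, height: 1024 },
  mobile: { width: 375, height: 812 },
} as const;

// Minimum touch-target edge on mobile, per STEP D.
const MIN_TOUCH_TARGET_PX = 44;

// Returns the names of targets whose width or height is below the minimum.
function undersizedTouchTargets(
  boxes: TargetBox[],
  min: number = MIN_TOUCH_TARGET_PX
): string[] {
  return boxes
    .filter((box) => box.width < min || box.height < min)
    .map((box) => box.name);
}
```

Any name returned by undersizedTouchTargets would be logged as a discrepancy for the mobile viewport.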
STEP E — HEALTH CHECK:
browser_console_messages — flag any JavaScript errors or warnings
browser_network_requests — flag any failed requests (4xx/5xx)
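As a rough illustration of the filtering STEP E asks for, the sketch below reduces console and network records to the entries worth flagging. The ConsoleEntry and NetworkEntry shapes are simplified assumptions made for this example; the actual output of browser_console_messages and browser_network_requests may be structured differently.

```typescript
// Hypothetical, simplified record shapes for this sketch only.
interface ConsoleEntry {
  type: "log" | "info" | "warning" | "error";
  text: string;
}

interface NetworkEntry {
  method: string;
  url: string;
  status: number;
}

interface HealthReport {
  consoleIssues: ConsoleEntry[];
  failedRequests: NetworkEntry[];
}

// Keeps console errors and warnings, and requests that ended in 4xx or 5xx.
function healthCheck(
  consoleEntries: ConsoleEntry[],
  networkEntries: NetworkEntry[]
): HealthReport {
  return {
    consoleIssues: consoleEntries.filter(
      (entry) => entry.type === "error" || entry.type === "warning"
    ),
    failedRequests: networkEntries.filter(
      (request) => request.status >= 400 && request.status <= 599
    ),
  };
}
```

An empty HealthReport means the screen passed the health check; anything else belongs in exploration.md.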
STEP F — GENERATE INITIAL TEST FILES:
Use browser_generate_playwright_test to create .spec.ts files from your session.
Save to: tests/e2e/$FEATURE-browser.spec.ts
These become the foundation that the Engineer Agent refines below.
Output: qa/browser-tests/$FEATURE/exploration.md (Summary of what was found: working elements, broken elements, missing elements, selectors discovered, accessibility tree findings)
━━━ ANALYST AGENT ━━━
Read the source code for $FEATURE. Read qa/plans/ for any existing test coverage on this feature. Read qa/browser-tests/$FEATURE/exploration.md for live browser findings.
Map every testable surface:
- Every user-facing interaction (clicks, inputs, form submissions, keyboard nav)
- Every API call this feature makes and its possible response shapes
- Every UI state: loading, empty, error, success, partial data
- Every data-testid attribute or accessible role present in the DOM
- Every validation rule (client-side and server-side)
- Every route or navigation this feature triggers
Output: qa/plans/$FEATURE.md
━━━ PLANNER AGENT ━━━
Read qa/plans/$FEATURE.md. Assign priority and write a Given/When/Then for each:
P0 — "If this breaks, the product is unusable" (auth flows, data saving, core feature paths)
P1 — "If this breaks, a significant feature is degraded" (secondary flows, important edge cases)
P2 — "Edge case — good to have covered" (unusual inputs, rare states, nice-to-have validation)
Also include for each screen:
- Empty state scenario (user has no data yet)
- Error state scenario (network fails, server returns 500)
- Mobile viewport scenario (at least for P0 items)
Output: qa/plans/$FEATURE-prioritized.md
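One way to picture an entry in the prioritized plan is as a small record with a priority and a Given/When/Then triple. The sketch below is illustrative only: the Scenario type and renderPlan helper are hypothetical, and the text layout it produces is an assumption, since the skill does not prescribe a file format.

```typescript
type Priority = "P0" | "P1" | "P2";

// Hypothetical record for one planned scenario.
interface Scenario {
  priority: Priority;
  title: string;
  given: string;
  when: string;
  then: string;
}

// Renders scenarios grouped by priority, P0 first, as plain text.
function renderPlan(scenarios: Scenario[]): string {
  const order: Priority[] = ["P0", "P1", "P2"];
  const blocks: string[] = [];
  for (const priority of order) {
    for (const s of scenarios) {
      if (s.priority !== priority) continue;
      blocks.push(
        `[${s.priority}] ${s.title}\n` +
          `  Given ${s.given}\n` +
          `  When ${s.when}\n` +
          `  Then ${s.then}`
      );
    }
  }
  return blocks.join("\n\n");
}
```

Sorting by priority keeps the P0 items, which decide the quality gate outright, at the top of the plan.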
━━━ ENGINEER AGENT ━━━
CRITICAL: The dev server must be running. Use Playwright MCP to navigate the actual, running application before writing any test.
For each scenario in qa/plans/$FEATURE-prioritized.md:
- Navigate to the relevant route using Playwright MCP
- Confirm the element you intend to target is visible and accessible
- Note the exact accessible role, label, or testId
- Then write the Playwright test
Write all tests to: tests/e2e/$FEATURE.spec.ts
Playwright rules — these are absolute, no exceptions:
ALLOWED: getByRole('button', { name: 'Save' })
ALLOWED: getByLabel('Email address')
ALLOWED: getByText('No transactions yet')
ALLOWED: getByTestId('transaction-list')
FORBIDDEN: page.$('.save-btn')
FORBIDDEN: page.$('#submit')
FORBIDDEN: page.$x('//button[@class="primary"]')
FORBIDDEN: page.waitForTimeout(3000) <- use expect().toBeVisible() instead
Every test must:
- Have a descriptive name explaining what it verifies
- Assert a specific, meaningful outcome (not just "doesn't crash")
- Use proper async/await throughout
- Clean up any data it creates (use beforeEach/afterEach hooks)
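A spec that follows these rules might look like the sketch below. It reuses the locator examples from the ALLOWED list; the /transactions route is a hypothetical placeholder, and the relative URL assumes a baseURL is set in the Playwright config. Neither test creates data, so no afterEach cleanup is shown.

```typescript
import { test, expect } from "@playwright/test";

// Hypothetical route; replace with one confirmed during live exploration.
// A relative path assumes `use.baseURL` is configured in playwright.config.ts.
const TRANSACTIONS_URL = "/transactions";

test.describe("transactions", () => {
  test.beforeEach(async ({ page }) => {
    await page.goto(TRANSACTIONS_URL);
  });

  test("shows the empty-state message when the user has no transactions", async ({ page }) => {
    // Locators come from the accessibility tree, never from CSS or XPath.
    await expect(page.getByTestId("transaction-list")).toBeVisible();
    await expect(page.getByText("No transactions yet")).toBeVisible();
  });

  test("keeps the typed email address in the labelled field", async ({ page }) => {
    const email = page.getByLabel("Email address");
    await email.fill("user@example.com");
    // Web-first assertion: retries until the value matches or the timeout expires.
    await expect(email).toHaveValue("user@example.com");
    await expect(page.getByRole("button", { name: "Save" })).toBeEnabled();
  });
});
```

Each test name states the outcome it verifies, every action is awaited, and waiting is done through expect() rather than fixed timeouts.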
━━━ SENTINEL AGENT ━━━
Read tests/e2e/$FEATURE.spec.ts line by line.
BLOCK (stop QA loop, return to Engineer) if any of these exist:
- Any CSS or XPath selector, i.e. a selector string containing ".", "#", or "//" (for example page.$('.save-btn'))
- Any action missing an await keyword
- Any test block with zero assertions (expect() calls)
- Any page.waitForTimeout() greater than 2000ms
- Any test that only navigates and clicks with no assertion
WARN (flag but do not block) for:
- Test names that don't clearly describe the scenario
- Missing afterEach cleanup for data-creating tests
- Tests that could affect each other's state
Output: qa/audits/$FEATURE-audit.md
If blockers found: list exact line numbers. Return to Engineer. If no blockers: proceed.
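Some of the BLOCK rules lend themselves to a simple line scan. The sketch below is a heuristic under stated assumptions, not the Sentinel's actual method: auditSpec and its regular expressions are hypothetical, it works line by line rather than parsing TypeScript, and it does not attempt the zero-assertion check, which would need a real parser.

```typescript
interface Finding {
  line: number; // 1-based line number in the spec file
  rule: string;
}

// page.$(), page.$$(), page.$x(), or locator() given a CSS/XPath-looking string.
const RAW_SELECTOR = /\bpage\.\$\$?x?\(|\.locator\(\s*['"](\.|#|\/\/)/;
// Captures the millisecond argument of waitForTimeout.
const TIMEOUT_CALL = /waitForTimeout\(\s*(\d+)\s*\)/;
// A few common actions that return promises and must be awaited.
const ACTION_CALL = /\.(goto|click|fill|press|check|selectOption)\(/;

// Scans spec source text and reports blocking findings with line numbers.
function auditSpec(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    const line = index + 1;
    const trimmed = text.trim();
    if (RAW_SELECTOR.test(trimmed)) {
      findings.push({ line, rule: "CSS or XPath selector" });
    }
    const timeout = TIMEOUT_CALL.exec(trimmed);
    if (timeout && Number(timeout[1]) > 2000) {
      findings.push({ line, rule: "waitForTimeout greater than 2000ms" });
    }
    // Heuristic: a statement that starts with "page." and performs an action
    // was written without a leading await.
    if (trimmed.startsWith("page.") && ACTION_CALL.test(trimmed)) {
      findings.push({ line, rule: "action missing await" });
    }
  });
  return findings;
}
```

The line numbers in each Finding map directly onto the "list exact line numbers" requirement for the audit file.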
━━━ EXECUTION ━━━
Run: npx playwright test tests/e2e/$FEATURE.spec.ts --reporter=json
Save full output to: qa/runs/$FEATURE-latest.json
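Once the JSON output is saved, the failures the Healer needs can be pulled out of it. The sketch below assumes a subset of the Playwright JSON report shape (nested suites, each holding specs with a title and an ok flag); the interfaces are a simplification written for this example and should be checked against a real report before relying on them. failedSpecTitles is a hypothetical helper name.

```typescript
// Assumed subset of the JSON reporter output; verify against a real report.
interface JsonSpec {
  title: string;
  ok: boolean;
}

interface JsonSuite {
  title: string;
  specs?: JsonSpec[];
  suites?: JsonSuite[];
}

interface JsonReport {
  suites: JsonSuite[];
}

// Walks the nested suites and returns "suite > ... > spec" for each failed spec.
function failedSpecTitles(report: JsonReport): string[] {
  const failed: string[] = [];
  const visit = (suite: JsonSuite, trail: string[]): void => {
    const path = [...trail, suite.title];
    for (const spec of suite.specs ?? []) {
      if (!spec.ok) failed.push([...path, spec.title].join(" > "));
    }
    for (const child of suite.suites ?? []) visit(child, path);
  };
  for (const suite of report.suites) visit(suite, []);
  return failed;
}
```

The report object would typically come from JSON.parse on the contents of qa/runs/$FEATURE-latest.json. An empty result means the Healer is skipped and the Expander runs.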
━━━ HEALER AGENT (runs only if failures exist) ━━━
For each failed test:
- Read the full error message and attached screenshot
- Navigate to the failing page using Playwright MCP to inspect current state
- Make a determination:
BROKEN TEST (the test is wrong):
-> The page structure changed, selector no longer exists, or the expected text changed (not a regression, just drift)
-> Fix: update the selector or assertion to match current reality
-> Re-run the specific test
-> If fixed: continue
CONFIRMED BUG (the application is wrong):
-> The feature is not behaving as the PRD acceptance criteria describe
-> Do NOT fix the test to hide the bug
-> Create: qa/bugs/$FEATURE-[timestamp].md with:
- Which test failed
- What the expected behaviour is (from PRD)
- What the actual behaviour is
- Screenshot path
- Steps to reproduce
-> STOP the QA loop
-> Report: "Bug confirmed in $FEATURE. QA loop stopped. Run /build $FEATURE with this bug report to fix."
Maximum 3 fix attempts per test before treating as confirmed bug.
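The Healer's decision, including the three-attempt cap, can be summarised as a small function. This is a sketch of the rule as written above rather than a prescribed implementation: the FailureContext fields and the triage name are hypothetical, and whether the app matches the PRD remains a judgment the agent makes after inspecting the live page.

```typescript
type Verdict = "BROKEN_TEST" | "CONFIRMED_BUG";

// Maximum fix attempts per test before the failure is treated as a bug.
const MAX_FIX_ATTEMPTS = 3;

// Hypothetical inputs to the decision.
interface FailureContext {
  // True when the live app behaves as the PRD acceptance criteria describe.
  appMatchesPrd: boolean;
  // Number of fix attempts already made on this test.
  fixAttempts: number;
}

function triage(ctx: FailureContext): Verdict {
  // The application is wrong: never bend the test to hide it.
  if (!ctx.appMatchesPrd) return "CONFIRMED_BUG";
  // The test looks wrong, but repeated fixes have not helped.
  if (ctx.fixAttempts >= MAX_FIX_ATTEMPTS) return "CONFIRMED_BUG";
  return "BROKEN_TEST";
}
```

A CONFIRMED_BUG verdict stops the loop and produces the bug report; a BROKEN_TEST verdict leads to one more selector or assertion fix and a re-run.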
━━━ EXPANDER AGENT (runs only if all tests pass) ━━━
Review qa/plans/$FEATURE-prioritized.md and tests/e2e/$FEATURE.spec.ts.
Find gaps — scenarios not yet covered. Look specifically for:
- What happens when the user submits an empty form?
- What happens at maximum input length (e.g. 10,000 character input)?
- What happens if the user navigates away mid-flow and returns?
- What happens if the user hits browser back/forward?
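- What happens on a very slow connection? (throttle the network in Playwright, e.g. by delaying responses with page.route or, in Chromium, by emulating network conditions over a CDP session)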
- What happens if the user is not authenticated and tries this feature?
- What happens with special characters or emoji in text inputs?
Add 3-5 new tests to tests/e2e/$FEATURE.spec.ts.
Append the new scenarios to qa/plans/$FEATURE-prioritized.md.
Run the full suite again: npx playwright test tests/e2e/$FEATURE.spec.ts
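Several of the gap questions above come down to feeding unusual strings into text inputs. The sketch below builds a small set of such values; edgeCaseInputs and the EdgeInput shape are hypothetical, and the specific special-character and emoji strings are only examples.

```typescript
// Hypothetical record pairing a readable label with the value to type.
interface EdgeInput {
  label: string;
  value: string;
}

// Builds input values for the empty, maximum-length, special-character,
// and emoji cases. The default length follows the 10,000-character example.
function edgeCaseInputs(maxLength: number = 10000): EdgeInput[] {
  return [
    { label: "empty", value: "" },
    { label: "max length", value: "a".repeat(maxLength) },
    { label: "special characters", value: "<>&\"'\\/;--" },
    { label: "emoji", value: "🙂🚀✨" },
  ];
}
```

Each entry could drive one parameterised test that fills the field and asserts on the validation message or saved value.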
━━━ SNAPSHOT AGENT ━━━
For every page involved in $FEATURE, capture screenshots at three viewports:
- Desktop: 1440 x 900
- Tablet: 768 x 1024
- Mobile: 375 x 812
Save to:
qa/visual-baselines/$FEATURE/[screen]-desktop.png
qa/visual-baselines/$FEATURE/[screen]-tablet.png
qa/visual-baselines/$FEATURE/[screen]-mobile.png
FIRST RUN BEHAVIOUR: These screenshots ARE the baseline. Save them. Document in qa/visual-baselines/$FEATURE/README.md:
- Date baseline was created
- What build/commit this represents
- Any known intentional visual quirks
SUBSEQUENT RUN BEHAVIOUR:
Run: npx playwright test --project=visual
Compare each screenshot against baseline.
If pixel difference > 2%: flag as visual regression.
Save diff images to: qa/visual-reports/$FEATURE-[date]-diff.png
A visual regression is treated the same as a test failure.
TO INTENTIONALLY UPDATE BASELINE:
Run: npx playwright test --project=visual --update-snapshots
Commit new baseline files.
Document what changed and why in qa/visual-baselines/$FEATURE/README.md.
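The baseline naming scheme and the 2% rule can be expressed in a few lines. This is a minimal sketch assuming the pixel difference is measured as differing pixels divided by total pixels; baselinePath and isVisualRegression are hypothetical helpers, not something the skill defines.

```typescript
type ViewportName = "desktop" | "tablet" | "mobile";

// A screenshot regresses when more than 2% of its pixels differ.
const VISUAL_DIFF_THRESHOLD = 0.02;

// Builds the baseline file path for one screen at one viewport.
function baselinePath(feature: string, screen: string, viewport: ViewportName): string {
  return `qa/visual-baselines/${feature}/${screen}-${viewport}.png`;
}

// Compares the share of differing pixels against the threshold.
function isVisualRegression(diffPixels: number, totalPixels: number): boolean {
  if (totalPixels <= 0) {
    throw new RangeError("totalPixels must be positive");
  }
  return diffPixels / totalPixels > VISUAL_DIFF_THRESHOLD;
}
```

When the comparison is done with Playwright's own toHaveScreenshot assertion, its maxDiffPixelRatio option set to 0.02 would express the same threshold.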
━━━ QUALITY GATE ━━━
Calculate score:
P0 tests: All must pass. Any P0 failure = score 0 = STOP HERE.
P0 passing: 40 points
P1 passing: [passing / total] x 30 points
P2 passing: [passing / total] x 15 points
Visual match: All snapshots match baseline = 15 points
Any visual regression = 0 points for this category
TOTAL POSSIBLE: 100 points
Score < 85: FAIL
-> Write full report to qa/QUALITY_LOG.md
-> Output to user: which tests failed, which snapshots regressed, what the likely causes are
-> "Run /build $FEATURE with this report to address failures."
Score >= 85: PASS
-> Append to qa/QUALITY_LOG.md: date, feature, score, test count
-> "QA passed for $FEATURE. Score: [X]/100. Proceed to CI."
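The scoring rules above translate directly into arithmetic. The sketch below is one reading of them: the Tally and GateInput shapes are hypothetical, and awarding full points to a tier with zero tests is an assumption, since the rules do not say what happens when a tier is empty.

```typescript
interface Tally {
  passing: number;
  total: number;
}

interface GateInput {
  p0: Tally;
  p1: Tally;
  p2: Tally;
  visualMatch: boolean; // true when every snapshot matches its baseline
}

const PASS_THRESHOLD = 85;

// Proportional points for a tier. Assumption: an empty tier earns full weight.
function points(tally: Tally, weight: number): number {
  return tally.total === 0 ? weight : (tally.passing * weight) / tally.total;
}

function qualityScore(input: GateInput): number {
  // Any P0 failure zeroes the whole score.
  if (input.p0.passing < input.p0.total) return 0;
  return 40 + points(input.p1, 30) + points(input.p2, 15) + (input.visualMatch ? 15 : 0);
}

function gatePasses(score: number): boolean {
  return score >= PASS_THRESHOLD;
}
```

For example, with all P0 tests passing, 8 of 10 P1 tests, 2 of 4 P2 tests, and matching snapshots, the score is 40 + 24 + 7.5 + 15 = 86.5, which passes. The same run with one visual regression drops to 71.5 and fails.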
Source
https://github.com/ajaywadhara/agentic-sdlc-plugin/blob/main/skills/qa-run/SKILL.md
Overview
qa-run orchestrates an eight-agent QA loop for web apps. It starts with live browser exploration via Playwright MCP, then moves through analysis, planning, test writing, auditing, healing, coverage expansion, and visual snapshots. A run passes only if the quality gate scores 85 or higher out of 100.
How This Skill Works
The Browser Agent uses Playwright MCP to navigate every screen, capture the live accessibility tree, and record real selectors. The Analyst reads the feature code and exploration notes and maps every testable surface; the Planner then assigns P0-P2 priorities and writes a Given/When/Then for each. The Engineer writes the Playwright tests, the Sentinel audits them, and the Healer, Expander, and Snapshot agents handle failures, coverage gaps, and visual baselines before the quality gate scores the run.
When to Use It
- Validating a new or updated web feature across all screens ($FEATURE or all).
- Ensuring selectors and accessibility are accurate by testing against the live app.
- Generating and refining end-to-end tests from plan artifacts.
- Performing health checks and audits to surface errors, warnings, or missing elements.
- Expanding coverage to multiple breakpoints and device sizes.
Quick Start
- Step 1: Read CLAUDE.md and ensure the dev server is running.
- Step 2: Run qa-run with a feature ($FEATURE) or 'all'.
- Step 3: Review outputs in qa/browser-tests/$FEATURE/exploration.md and qa/plans/$FEATURE.md, then refine coverage and re-run to meet the 85+ gate.
Best Practices
- Explore the live app with Playwright MCP before writing tests; never guess selectors.
- Capture and compare accessibility trees, roles, and visible text; base tests on real app state.
- Log discrepancies immediately and check them against docs/SCREENS.md and wireframes/ to resolve gaps.
- Test across 3 device sizes (desktop, tablet, mobile) to ensure responsive reliability.
- Review qa/browser-tests and qa/plans outputs to iteratively improve coverage.
Example Use Cases
- Rolling out a new dashboard feature: qa-run documents live surfaces, generates initial tests, and the run clears the 85-point quality gate with a score of 92 or higher.
- Auth flow validation: planner creates Given/When/Then scenarios for login, signup, and password reset.
- Regression guard: discovery uncovers mismatches between docs and live UI, prompting fixes and updated exploration.md.
- Test file generation: initial .spec.ts files are produced for engineer refinement and faster delivery.
- Responsive QA: 3-device snapshots help fix layout issues before PR merge.