Why must acceptance tests run in the main Claude Code session?

To ensure deterministic results and observe real tool interactions without concurrency issues; sub-agents can miss lifecycle events and advisory messages.

What indicates a pass or fail for a test?

Pass is observed when the hook behavior matches the expected outcome (e.g., blocking an destructive command, showing advisory context, or confirming context in system-reminders). Fail if the behavior deviates or the command executes contrary to expectations.

How do I filter tests to run only certain handlers?

Use the filter parameter or handler-substring syntax (e.g., /acceptance-test blocking-only or /acceptance-test DestructiveGit) to target specific test sets or handlers.

acceptance-test

npx machina-cli add skill Edmonds-Commerce-Limited/claude-code-hooks-daemon/acceptance-test --openclaw

Files (1)

SKILL.md

6.4 KB

/acceptance-test - Acceptance Testing Skill

Description

Execute acceptance tests for hooks daemon handlers via real Claude Code tool calls in the main thread. Tests verify real-world handler behaviour by invoking actual tools (Bash, Write, Edit, Read) and observing hook responses in system-reminders.

CRITICAL: Main Thread Only

Acceptance tests MUST be executed in the main Claude Code session. Sub-agent strategies (parallel Haiku batches) are unreliable and forbidden:

Sub-agents run out of context on large test suites
Sub-agents cannot use Write/Edit tools (PreToolUse:Write tests fail)
Lifecycle events (SessionStart, SessionEnd) only fire in the main session
Advisory messages in system-reminders are only visible to the main session
The v2.9.0 incident proved async agents create race conditions with release gates

The ONLY valid acceptance test is a real tool call in the main thread with the hook response observed directly.

Usage

# Test all handlers (default)
/acceptance-test
/acceptance-test all

# Test only blocking handlers
/acceptance-test blocking-only

# Test only advisory handlers
/acceptance-test advisory-only

# Test only context injection handlers
/acceptance-test context-only

# Test specific handler by name substring
/acceptance-test DestructiveGit
/acceptance-test npm

Parameters

filter (optional): Test filter
- all (default): Run all tests
- blocking-only: Only blocking handlers (test_type=blocking)
- advisory-only: Only advisory handlers (test_type=advisory)
- context-only: Only context injection handlers (test_type=context)
- <handler-substring>: Filter by handler name (e.g., "Git", "npm", "TDD")

What It Does

Restarts daemon to ensure latest code loaded
Generates test playbook from handler code via generate-playbook
Executes each test sequentially in the main thread using real tool calls
Observes hook responses in system-reminders and tool output
Records pass/fail for each test based on observed behaviour
Reports summary with pass/fail/skip counts

Execution Method

Tests are executed one at a time via real Claude Code tool calls:

BLOCKING tests (expected: deny):

# Use Bash tool with the test command
Bash: echo "git reset --hard HEAD"
# Observe: PreToolUse hook blocking error in output
# Result: PASS if blocked, FAIL if command executed

ADVISORY tests (expected: allow with context):

# Use Bash tool with the test command
Bash: echo "git stash"
# Observe: PreToolUse hook additional context in system-reminder
# Result: PASS if advisory shown, FAIL if no advisory

WRITE/EDIT tests (expected: deny or allow):

# Use Write tool with test content
Write: file_path="/tmp/test.py" content="x = 1  # type: ignore"
# Observe: PreToolUse:Write hook blocking error
# Result: PASS if blocked, FAIL if write succeeded

CONTEXT tests (expected: allow with context):

# Verified by observing system-reminders during normal tool use
# SessionStart, PostToolUse, UserPromptSubmit already fire every turn
# Result: PASS if system-reminder shows handler active

Test Types

BLOCKING Tests:

Command is blocked by handler (deny decision)
Verifies destructive operations are prevented
Examples: git reset --hard, rm -rf, sudo pip, sed -i

ADVISORY Tests:

Command succeeds but handler adds advisory context
Verifies informational messages appear in system-reminders
Examples: npm install warnings, git stash warnings, TDD reminders

CONTEXT Tests:

Handler injects context into system-reminder
Verified by observing system-reminders during session
Examples: SessionStart hook active, PostToolUse hook active

Lifecycle Event Handling

Tests for lifecycle events (SessionStart, SessionEnd, PreCompact, Stop, SubagentStop, Notification, PermissionRequest, Status) fire automatically during normal session usage. They are verified by:

SessionStart: Confirmed at session start via system-reminder
PostToolUse: Confirmed after every tool call via system-reminder
UserPromptSubmit: Confirmed on every user message via system-reminder
Others: Confirmed by daemon loading without errors (checked via daemon status and daemon logs)

These are marked PASS (verified via session) rather than individually executed.

Output

On success:

Acceptance Tests Complete!

Results Summary:
   Total tests: 107
   Passed: 62 (direct tool call verification)
   Skipped: 45 (lifecycle events - verified via session/daemon load)
   Failed: 0

All tests passed! Handlers working correctly.

On failure:

Acceptance Tests FAILED!

Results Summary:
   Total tests: 107
   Passed: 59
   Failed: 3
   Skipped: 45

Failed Tests:
   1. RubyQaSuppression - rubocop:disable not blocked
   2. BritishEnglish - advisory not shown
   3. MarkdownOrganization - memory path blocked

Investigation needed - fix handler bugs before release.
Enter FAIL-FAST cycle: TDD fix -> QA -> restart -> retest ALL from beginning.

Time Investment

Full suite: 15-30 minutes (sequential main-thread execution)
Blocking only: 5-10 minutes
Single handler: 1-2 minutes

Sequential execution is slower than parallel but 100% reliable.

Requirements

Daemon running: Must have hooks daemon active
Main thread: Tests MUST run in the main Claude Code session (NOT sub-agents)
Real tool calls: Each test must use actual Bash/Write/Edit/Read tools

Integration with Release Process

This skill is used during the release workflow:

/release
  -> Stage 4: Acceptance Testing Gate (BLOCKING)
     -> Main Claude executes /acceptance-test all
     -> Tests run sequentially in main thread
     -> Each test is a real tool call with observed response
     -> If ANY test fails -> ABORT release, enter FAIL-FAST cycle

See: CLAUDE/development/RELEASING.md for complete release workflow

Documentation

Playbook generation: generate-playbook CLI command Test definitions: Each handler's get_acceptance_tests() method Generating docs: CLAUDE/AcceptanceTests/GENERATING.md

Version

Introduced in: v2.9.0 (Plan 00044) Updated in: v2.10.0 - Switched from parallel sub-agent to main-thread sequential execution

Source

git clone https://github.com/Edmonds-Commerce-Limited/claude-code-hooks-daemon/blob/main/.claude/skills/acceptance-test/SKILL.md

View on GitHub

Overview

This skill runs acceptance tests for hooks daemon handlers by invoking real Claude Code tool calls in the main thread. It restarts the daemon, generates a test playbook from handler code, and observes system-reminder output to determine pass/fail. The process ensures reliable, real-world validation of blocking, advisory, and context-injection handlers.

How This Skill Works

Tests are executed sequentially in the main Claude Code session using actual tool calls. The workflow restarts the daemon to load the latest code, generates a test playbook from the handler code, runs each test one at a time, and records pass/fail based on observed hook responses and tool output in system-reminders.

When to Use It

Test all handlers to validate end-to-end behavior in a fresh environment
Test only blocking handlers to ensure destructive operations are denied
Test only advisory handlers to verify contextual advisories appear in system-reminders
Test only context injection handlers to confirm context is propagated
Test a specific handler by name substring (e.g., 'Git', 'npm')

Quick Start

Step 1: Run /acceptance-test (or /acceptance-test all) to start tests
Step 2: Acceptance workflow restarts the daemon and generates a playbook from handler code
Step 3: Tests run sequentially in the main thread; review system-reminders and results

Best Practices

Always restart the daemon before running acceptance tests to load the latest code
Run tests in the main Claude Code session; sub-agents should not be used for acceptance
Generate the test playbook automatically from handler code using generate-playbook
Observe system-reminders and tool output to determine pass/fail per test
Document any discrepancies and include a clear summary report of pass/fail/skip counts

Example Use Cases

Blocking test: bash tool with 'git reset --hard HEAD' and expect a PreToolUse block
Advisory test: bash tool with 'git stash' to verify advisory context appears in system-reminders
Write test: Write tool attempting to modify /tmp/test.py and expect a block or allowed write per policy
Context test: observe SessionStart and PostToolUse reminders to confirm context injection
Substring filter test: run /acceptance-test DestructiveGit or /acceptance-test npm to target specific handlers

Frequently Asked Questions

Add this skill to your agents