strict-enforcement
Strict Enforcement Verification Methodology
When to use this skill
Use this skill when you need to:
- Run the claudikins-kernel:verify command
- Validate implementation before shipping
- Decide pass/fail verdicts
- Check code integrity after changes
- Enforce cross-command gates
Core Philosophy
"Evidence before assertions. Always." - Verification philosophy
Never claim code works without seeing it work. Tests passing is not enough. Claude must SEE the output.
The Three Laws
- See it working - Screenshots, curl responses, CLI output. Actual evidence.
- Human checkpoint - No auto-shipping. Human reviews evidence and decides.
- Exit code 2 gates - Verification failures block claudikins-kernel:ship. No exceptions.
Verification Phases
Phase 1: Automated Quality Checks
Run the automated checks first. Fast feedback.
| Check | Command Pattern | What It Catches |
|---|---|---|
| Tests | npm test / pytest / cargo test | Logic errors, regressions |
| Lint | npm run lint / ruff / clippy | Style issues, common bugs |
| Types | tsc / mypy / cargo check | Type mismatches, interface drift |
| Build | npm run build / cargo build | Compilation errors, bundling issues |
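The Phase 1 sequence can be sketched as a fail-fast wrapper. This is a sketch, not this skill's actual implementation: `run_check` wraps any command, and the `true` placeholders stand in for your project's real test/lint/type/build commands (e.g. `npm test`, `npm run lint`, `tsc --noEmit`, `npm run build`).

```sh
#!/usr/bin/env bash
# Phase 1 sketch: run automated checks in order, stopping at the first
# failure for fast feedback. The `true` placeholders below are stand-ins
# for real commands -- substitute your project's own.

run_check() {
  local name="$1"; shift
  if "$@" > /dev/null 2>&1; then
    echo "$name: PASS"
  else
    echo "$name: FAIL"
    return 1
  fi
}

run_check "tests" true &&
run_check "lint"  true &&
run_check "types" true &&
run_check "build" true
```

Chaining with `&&` gives the fast-feedback property: a failing check short-circuits everything after it.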
Flaky Test Detection (C-12):
```
Test fails?
├── Re-run failed tests
├── Pass 2nd time?
│   └── Yes → STOP: [Accept flakiness] [Fix tests] [Abort]
└── Fail 2nd time?
    ├── Run isolated
    └── Still fail? → STOP: [Fix] [Skip] [Abort]
```
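The C-12 decision tree can be sketched as a small helper, invoked only after a test command has already failed once; a single re-run classifies the failure. The command passed in is an assumption — use whatever test invocation just failed.

```sh
#!/usr/bin/env bash
# C-12 sketch: called after an initial test failure. One re-run decides:
# pass -> flaky (STOP and let the human choose), fail -> persistent
# (run isolated next).

classify_failure() {
  if "$@" > /dev/null 2>&1; then
    echo "flaky"        # passed 2nd time
  else
    echo "persistent"   # failed 2nd time
  fi
}
```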
Phase 2: Output Verification (catastrophiser)
This is the feedback loop that makes Claude's code actually work.
| Project Type | Verification Method | Evidence |
|---|---|---|
| Web app | Start server, screenshot, test flows | Screenshots, console logs |
| API | Curl endpoints, check responses | Status codes, response bodies |
| CLI | Run commands, capture output | stdout, stderr, exit codes |
| Library | Run examples, check results | Output values, test coverage |
| Service | Check logs, verify health endpoint | Log patterns, health responses |
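For the CLI row, evidence capture can be sketched as a wrapper that files stdout, stderr, and the exit code. The `.claude/evidence/` layout is an assumption borrowed from the evidence paths used in the human-checkpoint report.

```sh
#!/usr/bin/env bash
# Phase 2 sketch (CLI row): run a command and file stdout, stderr, and
# the exit code as evidence. The .claude/evidence/ path is an assumption.

capture_cli_evidence() {
  local name="$1"; shift
  local dir=".claude/evidence"
  mkdir -p "$dir"
  "$@" > "$dir/$name.out" 2> "$dir/$name.err"
  local code=$?
  echo "$code" > "$dir/$name.exit"
  echo "$name: exit $code"
}
```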
Fallback Hierarchy (A-3):
If primary method unavailable, fall back:
- Start server + screenshot (preferred for web)
- Curl endpoints (preferred for API)
- Run CLI commands (preferred for CLI)
- Run tests only (fallback)
- Code review only (last resort)
Timeout: 30 seconds per verification method (CMD-30).
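The fallback walk and the CMD-30 timeout can be sketched together: try each method for at most 30 seconds, dropping to the next on failure or expiry (coreutils `timeout` exits 124 when the time runs out). The method commands here are `true` placeholders, not real verification steps.

```sh
#!/usr/bin/env bash
# A-3 + CMD-30 sketch: walk the fallback hierarchy, giving each method at
# most 30 seconds via coreutils timeout. Arguments to try_method are
# placeholders for the real verification commands.

try_method() {
  local name="$1"; shift
  if timeout 30 "$@" > /dev/null 2>&1; then
    echo "verified via: $name"
    return 0
  fi
  return 1
}

try_method "screenshot" true ||
try_method "curl"       true ||
try_method "tests-only" true ||
echo "fell through to code review (last resort)"
```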
Phase 3: Code Simplification (Optional)
After verification passes, optionally run cynic for polish.
Prerequisites:
- Phase 2 (catastrophiser) must PASS
- Human approves: "Run cynic for polish pass?"
cynic Rules:
- Preserve exact behaviour (tests MUST still pass)
- Remove unnecessary abstraction
- Improve naming clarity
- Delete dead code
- Flatten nested conditionals
If tests fail after simplification:
- Log failure reasons
- Show human
- Proceed anyway (A-5) with caveat
See cynic-rollback.md for recovery patterns.
Phase 4: Klaus Escalation
If stuck during verification:
```
Is mcp__claudikins-klaus available? (E-16)
├── No →
│   Offer: [Manual review] [Ask Claude differently] (E-17)
│   Fallback: [Accept with uncertainty] [Max retries, abort] (E-18)
└── Yes →
    Spawn klaus via SubagentStop hook
```
Phase 5: Human Checkpoint
The final gate. Present comprehensive evidence.
```
Verification Report
-------------------
Tests: ✓ 47/47 passed
Lint:  ✓ 0 issues
Types: ✓ 0 errors
Build: ✓ success

Evidence:
- Screenshot: .claude/evidence/login-flow.png
- API test: POST /api/auth → 200 OK
- CLI test: mycli --help → exit 0

[Ready to Ship] [Needs Work] [Accept with Caveats]
```
Human decides. If approved, set unlock_ship = true.
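Recording the checkpoint outcome can be sketched with jq. The state path and field names mirror the verify-state.json example in State Tracking, but treat the exact layout as an assumption.

```sh
#!/usr/bin/env bash
# Phase 5 sketch: write the human's decision into the verify state with jq.
# State path and field names are assumptions mirroring verify-state.json.

STATE=".claude/verify-state.json"
mkdir -p "$(dirname "$STATE")"
[ -f "$STATE" ] || echo '{}' > "$STATE"

record_decision() {
  local decision="$1"   # ready_to_ship | needs_work | accept_with_caveats
  local unlock=false
  [ "$decision" = "ready_to_ship" ] && unlock=true
  jq --arg d "$decision" --argjson u "$unlock" \
     '.human_checkpoint.decision = $d | .unlock_ship = $u' \
     "$STATE" > "$STATE.tmp" && mv "$STATE.tmp" "$STATE"
}
```

Only `ready_to_ship` flips `unlock_ship` to true; any other decision leaves the ship gate closed.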
Rationalizations to Resist
Agents under pressure find excuses. These are all violations:
| Excuse | Reality |
|---|---|
| "Tests pass, that's good enough" | Tests aren't enough. SEE it working. Screenshots, curl, output. |
| "I'll verify after shipping" | Verify BEFORE ship. That's the whole point. |
| "The type checker caught everything" | Types don't catch runtime issues. Get evidence. |
| "Screenshot failed but it probably works" | "Probably" isn't evidence. Fix the screenshot or use fallback. |
| "Human checkpoint is just a formality" | Human checkpoint is the gate. No auto-shipping. |
| "Code review is enough for this change" | Code review is last resort fallback. Try harder. |
| "Tests are flaky, I'll ignore the failure" | Flaky tests hide real failures. Fix or explicitly accept with caveat. |
| "Exit code 2 is too strict" | Exit code 2 exists to block bad ships. Pass properly. |
All of these mean: Get evidence. Human decides. No shortcuts.
Red Flags — STOP and Reassess
If you're thinking any of these, you're about to violate the methodology:
- "It should work because..."
- "The tests pass so..."
- "I'm confident that..."
- "It worked before..."
- "The types check so..."
- "I'll just skip verification this once"
- "Human will approve anyway"
- "Evidence isn't necessary for this change"
All of these mean: STOP. Get evidence. Present to human. Let them decide.
Exit Code 2 Pattern (CRITICAL)
The verify-gate.sh hook enforces the gate:
```sh
# Both conditions MUST be true
STATE=".claude/verify-state.json"   # path assumed; see State Tracking below

ALL_PASSED=$(jq -r '.all_checks_passed' "$STATE")
HUMAN_APPROVED=$(jq -r '.human_checkpoint.decision' "$STATE")

if [ "$ALL_PASSED" != "true" ]; then
  exit 2  # Blocks claudikins-kernel:ship
fi

if [ "$HUMAN_APPROVED" != "ready_to_ship" ]; then
  exit 2  # Blocks claudikins-kernel:ship
fi
```
File Manifest (C-6):
At verification completion, generate SHA256 hashes of all source files:
```sh
find . -type f \( -name '*.ts' -o -name '*.py' -o -name '*.rs' \) \
  | sort | xargs -r sha256sum > .claude/verify-manifest.txt
```

(`-type f` keeps directories out of the hash list; `sort` makes the manifest stable across runs so it can be diffed.)
This lets claudikins-kernel:ship detect if code was modified after verification.
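The ship-side check can be sketched with `sha256sum -c`, which re-hashes every file listed in the manifest and fails on any mismatch. One caveat: it will not notice files *added* after verification; regenerate and diff the full manifest if that matters. The manifest path is the one used above.

```sh
#!/usr/bin/env bash
# C-6 sketch: at ship time, re-hash everything listed in the manifest.
# Any modified or deleted file makes the check fail.

manifest_clean() {
  sha256sum -c --quiet .claude/verify-manifest.txt
}

# Usage at ship time (exit 2 keeps the gate convention):
#   manifest_clean || { echo "ERROR: code changed after verification" >&2; exit 2; }
```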
Cross-Command Gate (C-14)
claudikins-kernel:verify requires claudikins-kernel:execute to have completed:
```sh
if [ ! -f "$EXECUTE_STATE" ]; then
  echo "ERROR: claudikins-kernel:execute has not been run"
  exit 2
fi
```
This enforces the claudikins-kernel:outline → claudikins-kernel:execute → claudikins-kernel:verify → claudikins-kernel:ship flow.
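The same gate generalises to every step of the flow. A sketch, with state-file paths and the helper name as assumptions:

```sh
#!/usr/bin/env bash
# C-14 sketch: each command checks that its predecessor's state file
# exists before running. Returns 2 to match the gate convention.

require_state() {
  local file="$1" prev="$2"
  if [ ! -f "$file" ]; then
    echo "ERROR: $prev has not been run" >&2
    return 2
  fi
}

# e.g. at the top of verify:
#   require_state "$EXECUTE_STATE" "claudikins-kernel:execute" || exit $?
```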
Agent Integration
| Agent | Role | When |
|---|---|---|
| catastrophiser | See code working | Phase 2: Output verification |
| cynic | Polish pass | Phase 3: Simplification (optional) |
Both agents run with context: fork and background: true.
See agent-integration.md for coordination patterns.
State Tracking
verify-state.json
```json
{
  "session_id": "verify-2026-01-16-1100",
  "execute_session_id": "execute-2026-01-16-1030",
  "branch": "execute/task-1-auth-middleware",
  "phases": {
    "test_suite": { "status": "PASS", "count": 47 },
    "lint": { "status": "PASS", "issues": 0 },
    "type_check": { "status": "PASS", "errors": 0 },
    "output_verification": { "status": "PASS", "agent": "catastrophiser" },
    "code_simplification": { "status": "PASS", "agent": "cynic" }
  },
  "all_checks_passed": true,
  "human_checkpoint": {
    "decision": "ready_to_ship",
    "caveats": []
  },
  "unlock_ship": true,
  "verified_manifest": "sha256:...",
  "verified_commit_sha": "abc123..."
}
```
Anti-Patterns
Don't do these:
- Trusting test results without seeing code run
- Skipping output verification because "tests pass"
- Auto-approving verification without human checkpoint
- Modifying code after verification passes
- Ignoring flaky test warnings
- Proceeding when lint/type checks fail
Edge Case Handling
| Situation | Reference |
|---|---|
| Tests hang or timeout | test-timeout-handling.md |
| Auto-fix breaks code | lint-fix-validation.md |
| Primary verification fails | verification-method-fallback.md |
| Type-check results unclear | type-check-confidence.md |
| cynic breaks tests | cynic-rollback.md |
| Large project state | verify-state-compression.md |
References
Full documentation in this skill's references/ folder:
- verification-checklist.md - Complete verification checklist
- red-flags.md - Common rationalisation patterns
- agent-integration.md - How catastrophiser and cynic coordinate
- advanced-verification.md - Complex verification scenarios
- test-timeout-handling.md - When tests hang (S-13)
- lint-fix-validation.md - Validating auto-fix safety (S-14)
- verification-method-fallback.md - Fallback strategies (S-15)
- type-check-confidence.md - Interpreting type results (S-16)
- cynic-rollback.md - Rolling back failed simplifications (S-17)
- verify-state-compression.md - State management for large projects (S-18)
Source
https://github.com/elb-pr/claudikins-kernel/blob/main/skills/strict-enforcement/SKILL.md
Overview
This skill enforces strict verification by requiring tangible evidence of code working before shipping. It centers on claudikins-kernel:verify, mandates visible outputs (screenshots, curl responses, CLI output), and blocks shipping without human-reviewed evidence. The methodology embodies the three laws: see it working, human checkpoint, and exit-code gates.
How This Skill Works
Begin with Phase 1 automated quality checks (tests, lint, types, build). Phase 2 requires concrete evidence of functionality tailored to the project type (web, API, CLI, library, service). Phase 3 optionally refines code after verification. Phase 4 escalates to klaus when verification stalls, and Phase 5 puts a human checkpoint and exit-code gates in front of shipping, so nothing auto-ships without approval.
When to Use It
- Run the claudikins-kernel:verify command to start the verification workflow
- Validate implementation before shipping changes to production
- Decide pass/fail verdicts based on visible evidence rather than tests alone
- Check code integrity after changes and enforce cross-command gates
- Enforce formal gates where failures block claudikins-kernel:ship
Quick Start
- Step 1: Run claudikins-kernel:verify to initiate Phase 1 checks
- Step 2: Collect evidence for the project type (screenshots, curl outputs, logs, CLI output)
- Step 3: Present evidence to a human reviewer; if gaps exist, iterate until evidence passes and gate the ship decision
Best Practices
- Require concrete evidence (screenshots, curl responses, CLI output, logs) before claiming success
- Capture stdout, stderr, and exit codes for all verification steps
- Archive evidence with clear references to PRs and changes for auditing
- Do not auto-ship; enforce a human checkpoint to review evidence
- Apply Phase 1–3 verification consistently across web, API, CLI, library, and service projects
Example Use Cases
- Web app: Start server, capture screenshots, and test user flows to verify live behavior
- API: Curl endpoints and validate status codes and response bodies against expectations
- CLI: Run commands and record stdout/stderr and exit codes to prove functionality
- Library: Run examples and confirm outputs, ensuring test coverage mirrors usage
- Service: Check health endpoints and log patterns to confirm operational readiness