Get the FREE Ultimate OpenClaw Setup Guide →

critique

npx machina-cli add skill synaptiai/agent-capability-standard/critique --openclaw
Files (1)
SKILL.md
12.2 KB

Intent

Systematically analyze a target (plan, code, design, document) to identify failure modes, edge cases, ambiguities, and potential exploits. Produce actionable findings with severity ratings and mitigation recommendations.

Success criteria:

  • All major failure modes identified
  • Edge cases documented with reproduction scenarios
  • Ambiguities flagged with clarification requests
  • Security/exploit risks rated appropriately
  • Each finding has actionable recommendation

Compatible schemas:

  • schemas/output_schema.yaml

Inputs

ParameterRequiredTypeDescription
targetYesstring or objectPlan, code, design, or document to critique
focus_areasNoarraySpecific areas to examine (security, performance, logic, etc.)
severity_thresholdNostringMinimum severity to report: low, medium, high, critical
contextNoobjectAdditional context about the target
adversarialNobooleanEnable adversarial/red-team thinking (default: true)

Procedure

  1. Understand the target: Analyze what is being critiqued

    • Read and comprehend the target fully
    • Identify the purpose and intended behavior
    • Note stated assumptions and constraints
    • Understand success criteria
  2. Identify failure modes: Find ways the target could fail

    • What happens under unexpected inputs?
    • What if dependencies are unavailable?
    • What if timing assumptions are wrong?
    • What if scale exceeds expectations?
  3. Enumerate edge cases: Find boundary conditions

    • Empty inputs, null values, missing fields
    • Maximum/minimum values
    • Concurrent access scenarios
    • Unicode, special characters, injection patterns
  4. Detect ambiguities: Find unclear specifications

    • Terms with multiple interpretations
    • Missing definitions
    • Implicit assumptions
    • Contradictory requirements
  5. Assess exploit paths: Identify security vulnerabilities

    • Input validation gaps
    • Authentication/authorization bypasses
    • Data exposure risks
    • Privilege escalation opportunities
  6. Rate severity: Classify each finding

    • CRITICAL: System compromise or data loss
    • HIGH: Significant functionality or security impact
    • MEDIUM: Moderate impact, workarounds exist
    • LOW: Minor issues, cosmetic or edge-case
  7. Recommend mitigations: Provide actionable fixes

    • Specific changes to address each finding
    • Priority ordering for fixes
    • Estimated effort for each mitigation

Output Contract

Return a structured object:

findings:
  - id: string  # Unique finding identifier
    type: failure_mode | edge_case | ambiguity | exploit | gap
    description: string  # Clear description of the issue
    severity: low | medium | high | critical
    location: string  # Where in target (file:line, section, etc.)
    evidence: string  # Supporting evidence
    reproduction: string | null  # How to reproduce
    recommendation: string  # Suggested fix
risk_summary:
  overall_risk: low | medium | high | critical
  mitigations_possible: boolean
  urgent_issues: array[string]  # Finding IDs requiring immediate attention
  risk_distribution:
    critical: integer
    high: integer
    medium: integer
    low: integer
recommendations:
  - finding_ref: string  # Finding ID
    suggestion: string  # Detailed fix suggestion
    effort: low | medium | high
    priority: integer  # 1 = highest priority
patterns_identified:
  - pattern: string  # Recurring issue pattern
    affected_findings: array[string]
    systemic_fix: string  # How to address pattern
confidence: 0..1
evidence_anchors: ["file:line", "section:name"]
assumptions: []

Field Definitions

FieldTypeDescription
findingsarrayIndividual issues discovered
risk_summaryobjectOverall risk assessment
recommendationsarrayPrioritized fix suggestions
patterns_identifiedarrayRecurring issues with systemic fixes
confidencenumber0.0-1.0 based on analysis completeness
evidence_anchorsarrayReferences to target locations
assumptionsarrayAssumptions made during analysis

Examples

Example 1: Plan Critique

Input:

target:
  id: "deploy-plan-001"
  goal: "Deploy new authentication service"
  steps:
    - id: "step-1"
      action: "Stop old auth service"
    - id: "step-2"
      action: "Deploy new auth service"
    - id: "step-3"
      action: "Run smoke tests"
focus_areas: ["availability", "rollback", "security"]

Output:

findings:
  - id: "CRT-001"
    type: failure_mode
    description: "Zero-downtime deployment not guaranteed. Stopping old service before new is verified creates outage window."
    severity: high
    location: "step-1 to step-2 transition"
    evidence: "step-1 stops service, step-2 starts new - gap exists"
    reproduction: "Execute plan; observe 503 errors between steps"
    recommendation: "Use blue-green deployment: start new service, verify health, then route traffic"
  - id: "CRT-002"
    type: gap
    description: "No rollback step defined. If step-3 smoke tests fail, no recovery procedure."
    severity: critical
    location: "plan.steps - missing rollback"
    evidence: "No rollback_action in any step"
    reproduction: "Smoke tests fail; manual intervention required"
    recommendation: "Add rollback step: 'If step-3 fails, restart old auth service'"
  - id: "CRT-003"
    type: ambiguity
    description: "'Smoke tests' not defined. Unclear what is being tested or pass criteria."
    severity: medium
    location: "step-3"
    evidence: "No verification_criteria specified"
    reproduction: null
    recommendation: "Define specific test cases: auth flow, token validation, error handling"
  - id: "CRT-004"
    type: exploit
    description: "No security verification step. New service could have misconfigured permissions."
    severity: high
    location: "plan - missing security check"
    evidence: "No step validates auth service security posture"
    reproduction: "Deploy service with open admin endpoint; not caught by smoke tests"
    recommendation: "Add security scan step before traffic routing"
risk_summary:
  overall_risk: critical
  mitigations_possible: true
  urgent_issues: ["CRT-002", "CRT-001"]
  risk_distribution:
    critical: 1
    high: 2
    medium: 1
    low: 0
recommendations:
  - finding_ref: "CRT-002"
    suggestion: "Add rollback step with automatic trigger on smoke test failure"
    effort: low
    priority: 1
  - finding_ref: "CRT-001"
    suggestion: "Rewrite as blue-green deployment with health check before cutover"
    effort: medium
    priority: 2
  - finding_ref: "CRT-004"
    suggestion: "Add security scan after deployment, before enabling traffic"
    effort: medium
    priority: 3
  - finding_ref: "CRT-003"
    suggestion: "Define smoke test checklist with pass/fail criteria"
    effort: low
    priority: 4
patterns_identified:
  - pattern: "Missing safety gates"
    affected_findings: ["CRT-001", "CRT-002", "CRT-004"]
    systemic_fix: "Adopt CAVR pattern: checkpoint before each mutation, verify after each step"
confidence: 0.9
evidence_anchors:
  - "plan:step-1:stops-service"
  - "plan:steps:no-rollback"
assumptions:
  - "Auth service is critical for all user operations"
  - "Downtime of auth service blocks all authenticated requests"

Evidence pattern: Each finding references specific plan location; recommendations are actionable.


Example 2: Code Critique (Security Focus)

Input:

target:
  file: "src/api/auth.py"
  code: |
    def login(username, password):
        user = db.query(f"SELECT * FROM users WHERE username='{username}'")
        if user and user.password == password:
            return create_token(user.id)
        return None
focus_areas: ["security"]
adversarial: true

Output:

findings:
  - id: "CRT-001"
    type: exploit
    description: "SQL injection vulnerability. Username is interpolated directly into query string."
    severity: critical
    location: "auth.py:2"
    evidence: "f-string query with unescaped username input"
    reproduction: "username=\"admin'--\", password=\"anything\" bypasses auth"
    recommendation: "Use parameterized query: db.query('SELECT * FROM users WHERE username=?', (username,))"
  - id: "CRT-002"
    type: exploit
    description: "Plain-text password comparison. Passwords stored in cleartext or compared unsafely."
    severity: critical
    location: "auth.py:3"
    evidence: "Direct comparison: user.password == password"
    reproduction: "Database compromise exposes all passwords"
    recommendation: "Use bcrypt or argon2: bcrypt.checkpw(password, user.password_hash)"
  - id: "CRT-003"
    type: failure_mode
    description: "No rate limiting. Brute force attacks possible."
    severity: high
    location: "auth.py:login function"
    evidence: "No attempt tracking or delay mechanism"
    reproduction: "Script 1000 login attempts per second"
    recommendation: "Add rate limiting: max 5 attempts per minute per IP/username"
  - id: "CRT-004"
    type: edge_case
    description: "Empty username/password not handled. Could cause unexpected behavior."
    severity: low
    location: "auth.py:1"
    evidence: "No input validation at function entry"
    reproduction: "login('', '') - behavior undefined"
    recommendation: "Add input validation: if not username or not password: return None"
risk_summary:
  overall_risk: critical
  mitigations_possible: true
  urgent_issues: ["CRT-001", "CRT-002"]
  risk_distribution:
    critical: 2
    high: 1
    medium: 0
    low: 1
recommendations:
  - finding_ref: "CRT-001"
    suggestion: "IMMEDIATE: Replace f-string query with parameterized query"
    effort: low
    priority: 1
  - finding_ref: "CRT-002"
    suggestion: "IMMEDIATE: Migrate to bcrypt password hashing"
    effort: medium
    priority: 2
  - finding_ref: "CRT-003"
    suggestion: "Add rate limiting middleware before login endpoint"
    effort: medium
    priority: 3
  - finding_ref: "CRT-004"
    suggestion: "Add input validation decorator"
    effort: low
    priority: 4
patterns_identified:
  - pattern: "OWASP Top 10 violations"
    affected_findings: ["CRT-001", "CRT-002"]
    systemic_fix: "Security code review and SAST integration in CI"
confidence: 0.95
evidence_anchors:
  - "auth.py:2:sql-injection"
  - "auth.py:3:plaintext-password"
assumptions:
  - "This code handles real user authentication"
  - "Database contains actual user credentials"

Verification

  • All focus areas were examined
  • Severity ratings are justified
  • Each finding has a recommendation
  • Critical issues are highlighted
  • Patterns are identified for systemic fixes

Verification tools: Read (for code/doc analysis), Grep (for pattern search)

Safety Constraints

  • mutation: false
  • requires_checkpoint: false
  • requires_approval: false
  • risk: low

Capability-specific rules:

  • Never test exploits on production systems
  • Report all critical findings immediately
  • Do not disclose security findings to unauthorized parties
  • If finding reveals active exploit, escalate immediately
  • Be thorough but avoid false positives

Composition Patterns

Commonly follows:

  • plan - Critique plans before execution
  • inspect - Understand target before critique
  • retrieve - Gather context for critique

Commonly precedes:

  • plan - Revise plan based on critique findings
  • mitigate - Address identified risks
  • verify - Validate fixes for findings
  • audit - Record critique and responses

Anti-patterns:

  • Never skip critique for high-risk plans
  • Never proceed with critical findings unaddressed
  • Avoid superficial critique of security-sensitive targets

Workflow references:

  • See reference/composition_patterns.md#debug-code-change for critique in debugging
  • See reference/composition_patterns.md#capability-gap-analysis for planning context

Source

git clone https://github.com/synaptiai/agent-capability-standard/blob/main/skills/critique/SKILL.mdView on GitHub

Overview

Systematically analyzes a target (plan, code, or design) to identify failure modes, edge cases, ambiguities, and exploit paths. It yields actionable findings with severity ratings and concrete mitigations. This is essential during reviews, security audits, stress tests, and assumption validation.

How This Skill Works

The agent reads and understands the target, then enumerates failure modes, edge cases, ambiguities, and potential exploits across relevant focus areas. It assigns severities (CRITICAL to LOW), then proposes prioritized mitigations with clear recommendations and effort estimates. Adversarial thinking can be enabled to explore harder-to-find issues.

When to Use It

  • Review a proposal or design before acceptance
  • Audit API security and access controls
  • Stress-test logic under edge-case and timing scenarios
  • Validate assumptions in specifications or documents
  • Evaluate code changes in revisions or pull requests

Quick Start

  1. Step 1: Provide the target plus optional focus areas and severity threshold.
  2. Step 2: Run the critique pass to enumerate failure modes, edge cases, ambiguities, and exploits with evidence and locations.
  3. Step 3: Review findings and apply prioritized mitigations with estimated effort.

Best Practices

  • Define precise focus areas (security, performance, logic, etc.) before critique.
  • Capture reproduction steps for edge cases and failure modes.
  • Rate severity consistently and justify the rationale behind each rating.
  • Flag ambiguities with explicit clarification requests and examples.
  • Provide actionable mitigations with concrete effort estimates and priority.

Example Use Cases

  • Auditing a security proposal for potential exploits and bypasses.
  • Stress-testing a payment processing flow for race conditions and timing bugs.
  • Reviewing API authentication/authorization logic for misconfigurations.
  • Validating data export/import design for edge-case handling and leakage risks.
  • Evaluating code changes in a PR for hidden ambiguities and failure modes.

Frequently Asked Questions

Add this skill to your agents
Sponsor this space

Reach thousands of developers