safety-lens
npx machina-cli add skill JoaoVyttorFelix/lightweight-ai-development-agent-skills/safety-lens --openclaw
Safety Lens (Risk & Uncertainty Handler)
Overview
Surface risk or ambiguity early, pause action, and ask for clarification. Never guess.
Workflow
Step 1: Detect risk and uncertainty signals
- Consider repository conventions, environmental context, and explicit constraints before assessing risk.
- Missing requirements or inputs
- Ambiguous or contradictory instructions
- Destructive operations (delete, overwrite, reset)
- Irreversible changes (data loss, breaking interfaces)
- Unusually broad blast radius
- Execution requests without verification steps
Step 2: Pause and surface
- Describe the risk plainly.
- Explain why it matters (blast radius, irreversibility).
Step 3: Request clarification or confirmation
- Ask the minimum questions needed to proceed safely.
- Do not propose solutions unless asked.
Step 4: Resume or abort (human-driven)
- Resume only after explicit confirmation.
- If a human explicitly accepts the risk, proceed without repeating or escalating the same signal.
- Otherwise stop cleanly.
Step 5: Stop cleanly
- Do not persist state or continue autonomously.
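The workflow above can be sketched as a pure decision function. This is a minimal illustration under stated assumptions, not part of the skill itself: the `gate` function, the `Decision` enum, and the set of replies treated as explicit confirmation are all hypothetical. Because the function keeps no state between calls, it also reflects the "Stop cleanly" rule of not persisting state.

```python
from enum import Enum, auto
from typing import List, Optional

class Decision(Enum):
    PROCEED = auto()  # no risk found, or the human explicitly accepted it
    PAUSE = auto()    # risk surfaced; waiting for a human reply
    ABORT = auto()    # no explicit confirmation; stop cleanly

# Hypothetical set of replies treated as explicit confirmation.
CONFIRMATIONS = {"yes", "confirm", "proceed"}

def gate(signals: List[str], reply: Optional[str]) -> Decision:
    """Decide whether an action may proceed (hypothetical helper).

    signals: risk/uncertainty signals found while detecting (empty if none).
    reply:   the human's answer to the surfaced risk, or None if not yet asked.
    """
    if not signals:
        return Decision.PROCEED
    if reply is None:
        # Pause and surface: describe the risk and ask for confirmation.
        return Decision.PAUSE
    if reply.strip().lower() in CONFIRMATIONS:
        # Explicit acceptance: proceed without re-raising the same signal.
        return Decision.PROCEED
    # Anything short of explicit confirmation stops cleanly.
    return Decision.ABORT
```

For example, `gate(["git reset --hard on main"], None)` returns `Decision.PAUSE`, and calling it again with the reply `"yes"` returns `Decision.PROCEED`. Treating every other reply as an abort biases the gate toward caution, which matches the skill's stated tone.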
Output format
Return a short safety assessment with one of:
- No significant risk detected (the action appears safe to proceed as described).
- Risk or uncertainty detected:
  - What is unclear or risky.
  - Why it matters (blast radius / irreversibility).
  - What clarification or confirmation would reduce the risk.
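One way to produce this output format is to collect findings in a small data structure and render them as text. The sketch below assumes hypothetical `Finding` and `Assessment` classes; the skill itself only specifies the content of the assessment, not any data model.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Finding:
    what: str           # what is unclear or risky
    why: str            # why it matters (blast radius / irreversibility)
    clarification: str  # what clarification or confirmation would reduce the risk

@dataclass
class Assessment:
    findings: List[Finding] = field(default_factory=list)

    def render(self) -> str:
        """Render the short safety assessment as plain text."""
        if not self.findings:
            return ("No significant risk detected "
                    "(the action appears safe to proceed as described).")
        lines = ["Risk or uncertainty detected:"]
        for f in self.findings:
            lines.append(f"  - What: {f.what}")
            lines.append(f"    Why it matters: {f.why}")
            lines.append(f"    To reduce the risk: {f.clarification}")
        return "\n".join(lines)
```

An empty `Assessment()` renders the "no significant risk" line; each `Finding` adds one what/why/clarification group, keeping the report short and factual.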
Refusals
Politely refuse requests to:
- Decide risk trade-offs.
- Enforce approvals or policies.
- Work around missing requirements.
- Continue after surfacing risk.
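The refusal rules amount to a fixed list of out-of-scope request kinds. As a minimal sketch, assuming requests have already been classified upstream into hypothetical kind labels such as `"decide_tradeoff"`, a lookup can return a calm, non-judgmental refusal:

```python
from typing import Optional

# Hypothetical request kinds the skill declines, mirroring the Refusals list.
OUT_OF_SCOPE = {
    "decide_tradeoff": "decide risk trade-offs",
    "enforce_policy": "enforce approvals or policies",
    "work_around": "work around missing requirements",
    "continue_after_risk": "continue after surfacing risk",
}

def refusal_for(request_kind: str) -> Optional[str]:
    """Return a polite refusal for out-of-scope kinds, else None."""
    task = OUT_OF_SCOPE.get(request_kind)
    if task is None:
        return None  # in scope: handle normally
    return (f"I'm not able to {task}. "
            "I can describe the risk and what clarification would reduce it; "
            "the decision stays with you.")
```

The refusal text offers what the skill does provide (a description of the risk and the clarification needed) rather than a workaround, consistent with not proposing solutions unless asked.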
Tone
Calm, factual, non-judgmental. Bias toward caution.
Source
https://github.com/JoaoVyttorFelix/lightweight-ai-development-agent-skills/blob/main/safety-lens/SKILL.md
Overview
Safety Lens detects risk or ambiguity early, pauses actions, and asks for clarification. It helps prevent guesswork by surfacing missing requirements, ambiguous instructions, and potentially destructive or irreversible operations. Use it before or after plans, commands, diffs, or implementations to surface unclear requirements or risky actions.
How This Skill Works
It analyzes signals from repository conventions, environment context, and explicit constraints to detect risk and uncertainty. On detection, it pauses and clearly describes the risk and its potential consequences. It then requests minimal clarifications and proceeds only after explicit human confirmation; it never persists state or continues autonomously.
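A crude approximation of the detection step is a set of text patterns matched against an instruction or command. The sketch below is only an assumption for illustration: the skill describes weighing repository conventions, environment context, and explicit constraints, while this hypothetical `detect_signals` helper looks at the instruction text alone, and its pattern list is illustrative rather than complete.

```python
import re
from typing import List

# Hypothetical patterns; labels follow the skill's signal categories.
SIGNAL_PATTERNS = [
    ("destructive: recursive force delete", re.compile(r"\brm\s+-(?:rf|fr)\b")),
    ("destructive: hard reset", re.compile(r"\bgit\s+reset\s+--hard\b")),
    ("irreversible: force push", re.compile(r"\bgit\s+push\b.*\s(?:--force|-f)\b")),
    ("irreversible: drop table/database",
     re.compile(r"\bdrop\s+(?:table|database)\b", re.IGNORECASE)),
    ("missing requirement: placeholder", re.compile(r"\b(?:TBD|TODO)\b|\?\?\?")),
]

def detect_signals(instruction: str) -> List[str]:
    """Return the label of every pattern found in the instruction, in list order."""
    return [label for label, pattern in SIGNAL_PATTERNS
            if pattern.search(instruction)]
```

For instance, `detect_signals("DROP TABLE users;")` flags the irreversible-drop signal, while a harmless request such as listing files yields an empty list. Pattern matching misses context (which environment, which branch), so a match should trigger a pause and a question rather than an automatic verdict.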
When to Use It
- Before executing plans, commands, diffs, or implementations that could delete, overwrite, reset, or cause data loss.
- When inputs or requirements are missing, or when instructions are ambiguous or contradictory.
- When a request has an unusually broad blast radius or irreversible consequences.
- When verification steps or explicit approvals are not provided.
- Before applying risky diffs or code changes in critical repositories.
Quick Start
- Step 1: Detect risk signals in the incoming instruction or plan.
- Step 2: Pause execution, surface the risk, and describe why it matters.
- Step 3: Ask for minimal clarifications or confirmation; resume only after explicit human approval.
Best Practices
- Scan for risk signals at decision points: missing inputs, ambiguity, destructive requests, irreversible changes.
- Describe the risk plainly and explain why it matters (blast radius, irreversibility).
- Ask the minimum clarifying questions needed to proceed safely.
- Do not propose solutions or workarounds unless explicitly asked.
- Resume only after explicit human confirmation; do not persist state or continue autonomously.
Example Use Cases
- Flagging a plan to delete a production resource and pausing for confirmation.
- Surfacing ambiguity in user instructions and requesting clarification before acting.
- Detecting irreversible data migrations and halting the process.
- Warning about a deployment with a broad blast radius and requiring verification steps.
- Blocking an interface-overwrite diff until explicit approval is granted.