constitutional-ai-prompts
npx machina-cli add skill a5c-ai/babysitter/constitutional-ai-prompts --openclawFiles (1)
SKILL.md
1.2 KB
Constitutional AI Prompts Skill
Capabilities
- Design constitutional AI principles
- Implement self-critique and revision prompts
- Create harmlessness guidelines
- Design refusal patterns for unsafe requests
- Implement red-team testing prompts
- Create ethics-aware response frameworks
Target Processes
- system-prompt-guardrails
- content-moderation-safety
Implementation Details
Constitutional Patterns
- Critique-Revision: Self-evaluate and improve responses
- Principle Adherence: Follow defined ethical principles
- Harmlessness Focus: Prioritize safe responses
- Helpfulness Balance: Balance helpfulness with safety
- Transparency: Acknowledge limitations
Configuration Options
- Constitutional principles list
- Critique prompts
- Revision guidelines
- Refusal templates
- Escalation triggers
Best Practices
- Define clear constitutional principles
- Balance helpfulness and safety
- Test with adversarial inputs
- Document refusal patterns
- Regular principle review
Dependencies
- langchain-core
Source
git clone https://github.com/a5c-ai/babysitter/blob/main/plugins/babysitter/skills/babysit/process/specializations/ai-agents-conversational/skills/constitutional-ai-prompts/SKILL.mdView on GitHub Overview
This skill designs and enforces constitutional AI principles to guide LLM behavior. It includes self-critique and revision prompts, harmlessness guidelines, refusal templates, red-team prompts, and ethics-aware response frameworks to keep outputs safe and aligned.
How This Skill Works
It defines a set of constitutional patterns (Critique-Revision, Principle Adherence, Harmlessness Focus, Helpfulness Balance, Transparency) and configures prompts, guidelines, and escalation triggers. The system combines these with a guardrail-focused workflow and tests against adversarial inputs using the prescribed templates.
When to Use It
- When designing system prompts and guardrails for aligned LLM behavior
- During content moderation and safety reviews
- In red-team testing to identify unsafe responses
- When updating ethical principles or refusal templates
- When calibrating escalation and transparency prompts for difficult requests
Quick Start
- Step 1: Define constitutional principles list
- Step 2: Implement critique and revision prompts
- Step 3: Add refusal templates and escalation triggers
Best Practices
- Define clear constitutional principles
- Balance helpfulness and safety
- Test with adversarial inputs
- Document refusal patterns
- Regular principle review
Example Use Cases
- Design a system prompt that refuses hateful content while offering safe alternatives
- Use Critique-Revision prompts to improve a controversial response
- Apply Harmlessness Focus to rephrase medical advice for lay audiences
- Implement a refusal template that escalates unsafe queries to human review
- Run red-team prompts to surface edge-case safety violations and iterate
Frequently Asked Questions
Add this skill to your agents