constitutional-ai-prompts

npx machina-cli add skill a5c-ai/babysitter/constitutional-ai-prompts --openclaw

Constitutional AI Prompts Skill

Capabilities

  • Design constitutional AI principles
  • Implement self-critique and revision prompts
  • Create harmlessness guidelines
  • Design refusal patterns for unsafe requests
  • Implement red-team testing prompts
  • Create ethics-aware response frameworks

Target Processes

  • system-prompt-guardrails
  • content-moderation-safety

Implementation Details

Constitutional Patterns

  1. Critique-Revision: Self-evaluate and improve responses
  2. Principle Adherence: Follow defined ethical principles
  3. Harmlessness Focus: Prioritize safe responses
  4. Helpfulness Balance: Balance helpfulness with safety
  5. Transparency: Acknowledge limitations
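
The Critique-Revision pattern can be sketched as a small loop. This is a minimal illustration, not the skill's actual implementation: the `model` callable, the prompt wording, the `PRINCIPLES` list, and the "reply OK" convention are all assumptions made for the example.

```python
from typing import Callable, List

# Hypothetical principles; the skill does not prescribe specific wording.
PRINCIPLES: List[str] = [
    "Avoid content that could help someone cause harm.",
    "Acknowledge uncertainty and limitations.",
]


def critique_and_revise(
    model: Callable[[str], str],
    user_request: str,
    principles: List[str] = PRINCIPLES,
    max_rounds: int = 2,
) -> str:
    """Draft a response, then critique and revise it against each principle.

    `model` is any function mapping a prompt string to a completion string
    (an assumption; plug in whichever LLM client you use).
    """
    response = model(f"Respond to the request:\n{user_request}")
    for _ in range(max_rounds):
        revised = response
        for principle in principles:
            critique = model(
                "Critique the response against the principle.\n"
                f"Principle: {principle}\nResponse: {revised}\n"
                "Reply with 'OK' if it complies."
            )
            if critique.strip().upper() == "OK":
                continue  # this principle is satisfied; check the next one
            revised = model(
                "Rewrite the response so that it addresses the critique.\n"
                f"Critique: {critique}\nResponse: {revised}"
            )
        if revised == response:  # no principle triggered a revision; stop early
            break
        response = revised
    return response
```

Capping the loop with `max_rounds` keeps cost bounded when the model never reports full compliance.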

Configuration Options

  • Constitutional principles list
  • Critique prompts
  • Revision guidelines
  • Refusal templates
  • Escalation triggers
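
One way to hold these options together is a single configuration object. The sketch below is an assumption about shape only: the class name `ConstitutionConfig`, its field names, and the default prompt strings are hypothetical and do not come from the skill itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConstitutionConfig:
    """Hypothetical container for the five configuration options."""

    principles: List[str]
    critique_prompt: str = "Identify where the response conflicts with: {principle}"
    revision_guideline: str = "Rewrite the response to resolve the critique."
    refusal_templates: Dict[str, str] = field(default_factory=dict)
    escalation_triggers: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A constitution with no principles cannot guide anything.
        if not self.principles:
            raise ValueError("at least one principle is required")

    def system_prompt(self) -> str:
        """Render the principles as a numbered system-prompt preamble."""
        lines = ["Follow these principles:"]
        lines += [f"{i}. {p}" for i, p in enumerate(self.principles, start=1)]
        return "\n".join(lines)


config = ConstitutionConfig(
    principles=["Be harmless.", "Be transparent about limitations."],
    refusal_templates={"hate": "I can't help with that request."},
    escalation_triggers=["self-harm"],
)
print(config.system_prompt())
```

Keeping the options in one object makes a periodic principle review a matter of diffing a single definition.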

Best Practices

  • Define clear constitutional principles
  • Balance helpfulness and safety
  • Test with adversarial inputs
  • Document refusal patterns
  • Review principles regularly

Dependencies

  • langchain-core

Source

https://github.com/a5c-ai/babysitter/blob/main/plugins/babysitter/skills/babysit/process/specializations/ai-agents-conversational/skills/constitutional-ai-prompts/SKILL.md

Overview

This skill designs and enforces constitutional AI principles to guide LLM behavior. It includes self-critique and revision prompts, harmlessness guidelines, refusal templates, red-team prompts, and ethics-aware response frameworks to keep outputs safe and aligned.

How This Skill Works

The skill defines five constitutional patterns (Critique-Revision, Principle Adherence, Harmlessness Focus, Helpfulness Balance, Transparency) and configures the principles list, critique prompts, revision guidelines, refusal templates, and escalation triggers. These are combined in a guardrail-focused workflow, and the resulting behavior is tested against adversarial inputs.
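
Testing against adversarial inputs could look like the harness below. It is a rough sketch under stated assumptions: `assistant` stands in for the guarded system, and the `REFUSAL_MARKERS` substring check is a deliberately crude, hypothetical way to detect a refusal. A production setup would more likely use a classifier or human review.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical phrases that signal a refusal; tune these to your templates.
REFUSAL_MARKERS = ("i can't help", "i cannot help")


def is_refusal(response: str) -> bool:
    """Return True if the response contains a known refusal phrase."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def red_team(
    assistant: Callable[[str], str],
    adversarial_prompts: Iterable[str],
) -> List[Tuple[str, str]]:
    """Return (prompt, response) pairs where the assistant did not refuse."""
    failures: List[Tuple[str, str]] = []
    for prompt in adversarial_prompts:
        response = assistant(prompt)
        if not is_refusal(response):
            failures.append((prompt, response))
    return failures
```

Every pair returned is a candidate safety violation to review before the next iteration on principles or refusal templates.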

When to Use It

  • When designing system prompts and guardrails for aligned LLM behavior
  • During content moderation and safety reviews
  • In red-team testing to identify unsafe responses
  • When updating ethical principles or refusal templates
  • When calibrating escalation and transparency prompts for difficult requests

Quick Start

  1. Define constitutional principles list
  2. Implement critique and revision prompts
  3. Add refusal templates and escalation triggers
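
For the last step, refusal templates and escalation triggers can be wired into a small routing function. The sketch assumes a request has already been labeled with a `category` by some upstream classifier (not shown); the category names, template text, and `Decision` type are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical refusal templates keyed by request category.
REFUSAL_TEMPLATES: Dict[str, str] = {
    "hate": "I can't help with content that demeans people.",
}

# Hypothetical categories that should go to human review.
ESCALATION_TRIGGERS: Tuple[str, ...] = ("self-harm",)


@dataclass(frozen=True)
class Decision:
    action: str   # one of "answer", "refuse", "escalate"
    message: str  # text for the user, or a note for the reviewer


def route(category: Optional[str]) -> Decision:
    """Choose an action for a request based on its safety category."""
    # Escalation is checked first so that a trigger always wins over a template.
    if category in ESCALATION_TRIGGERS:
        return Decision("escalate", f"Needs human review: {category}")
    if category in REFUSAL_TEMPLATES:
        return Decision("refuse", REFUSAL_TEMPLATES[category])
    return Decision("answer", "")
```

Uncategorized requests fall through to a normal answer, so the quality of the upstream classifier determines how much this guardrail actually catches.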

Example Use Cases

  • Design a system prompt that refuses hateful content while offering safe alternatives
  • Use Critique-Revision prompts to improve a controversial response
  • Apply Harmlessness Focus to rephrase medical advice for lay audiences
  • Implement a refusal template that escalates unsafe queries to human review
  • Run red-team prompts to surface edge-case safety violations and iterate
