What is constitutional AI?

A framework that defines ethical principles to guide AI behavior and ensures adherence through critique, revision, and guardrails.

How do I implement guardrails?

Define principles, craft critique-revision prompts, ready refusal templates, and set escalation triggers.

How do you test safety?

Use adversarial inputs and red-team prompts to reveal gaps and iterate on guidelines.

constitutional-ai-prompts

npx machina-cli add skill a5c-ai/babysitter/constitutional-ai-prompts --openclaw

Files (1)

SKILL.md

1.2 KB

Constitutional AI Prompts Skill

Capabilities

Design constitutional AI principles
Implement self-critique and revision prompts
Create harmlessness guidelines
Design refusal patterns for unsafe requests
Implement red-team testing prompts
Create ethics-aware response frameworks

Target Processes

system-prompt-guardrails
content-moderation-safety

Implementation Details

Constitutional Patterns

Critique-Revision: Self-evaluate and improve responses
Principle Adherence: Follow defined ethical principles
Harmlessness Focus: Prioritize safe responses
Helpfulness Balance: Balance helpfulness with safety
Transparency: Acknowledge limitations

Configuration Options

Constitutional principles list
Critique prompts
Revision guidelines
Refusal templates
Escalation triggers

Best Practices

Define clear constitutional principles
Balance helpfulness and safety
Test with adversarial inputs
Document refusal patterns
Regular principle review

Dependencies

langchain-core

Source

git clone https://github.com/a5c-ai/babysitter/blob/main/plugins/babysitter/skills/babysit/process/specializations/ai-agents-conversational/skills/constitutional-ai-prompts/SKILL.md

View on GitHub

Overview

This skill designs and enforces constitutional AI principles to guide LLM behavior. It includes self-critique and revision prompts, harmlessness guidelines, refusal templates, red-team prompts, and ethics-aware response frameworks to keep outputs safe and aligned.

How This Skill Works

It defines a set of constitutional patterns (Critique-Revision, Principle Adherence, Harmlessness Focus, Helpfulness Balance, Transparency) and configures prompts, guidelines, and escalation triggers. The system combines these with a guardrail-focused workflow and tests against adversarial inputs using the prescribed templates.

When to Use It

When designing system prompts and guardrails for aligned LLM behavior
During content moderation and safety reviews
In red-team testing to identify unsafe responses
When updating ethical principles or refusal templates
When calibrating escalation and transparency prompts for difficult requests

Quick Start

Step 1: Define constitutional principles list
Step 2: Implement critique and revision prompts
Step 3: Add refusal templates and escalation triggers

Best Practices

Define clear constitutional principles
Balance helpfulness and safety
Test with adversarial inputs
Document refusal patterns
Regular principle review

Example Use Cases

Design a system prompt that refuses hateful content while offering safe alternatives
Use Critique-Revision prompts to improve a controversial response
Apply Harmlessness Focus to rephrase medical advice for lay audiences
Implement a refusal template that escalates unsafe queries to human review
Run red-team prompts to surface edge-case safety violations and iterate

Frequently Asked Questions

Add this skill to your agents