guardrails-ai-setup
npx machina-cli add skill a5c-ai/babysitter/guardrails-ai-setup --openclawguardrails-ai-setup
Configure Guardrails AI validation framework to ensure LLM outputs meet quality, safety, and structural requirements. Implement validators for input sanitization, output format enforcement, and safety constraints.
Overview
Guardrails AI provides:
- Input validation before LLM calls
- Output validation after LLM responses
- Structured output enforcement (JSON, XML, etc.)
- Pre-built validators from Guardrails Hub
- Custom validator creation
- Automatic retry and correction mechanisms
Capabilities
Input Validation
- Sanitize user inputs
- Detect prompt injection attempts
- Validate input formats and lengths
- Check for PII before processing
Output Validation
- Enforce structured output schemas
- Validate content accuracy
- Check for harmful content
- Verify factual consistency
Safety Constraints
- Content moderation
- Toxicity detection
- Bias checking
- Hallucination detection
Integration Features
- LangChain integration
- Streaming support
- Automatic retries
- Correction strategies
Usage
Basic Setup
from guardrails import Guard
from guardrails.hub import ValidJson, ToxicLanguage, DetectPII
# Create guard with validators
guard = Guard().use_many(
ValidJson(),
ToxicLanguage(on_fail="fix"),
DetectPII(on_fail="fix")
)
# Use with LLM
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4")
result = guard(
llm,
prompt="Generate a product description for a laptop",
max_tokens=500
)
print(result.validated_output)
Schema-Based Validation
from guardrails import Guard
from pydantic import BaseModel, Field
from typing import List
class ProductReview(BaseModel):
"""Schema for product review output."""
rating: int = Field(ge=1, le=5, description="Rating from 1-5")
summary: str = Field(max_length=200, description="Brief summary")
pros: List[str] = Field(min_items=1, max_items=5)
cons: List[str] = Field(min_items=1, max_items=5)
recommendation: bool
# Create guard from schema
guard = Guard.from_pydantic(ProductReview)
result = guard(
llm,
prompt="""Analyze this product and provide a structured review:
Product: Wireless Noise-Canceling Headphones
Price: $299
Features: 30hr battery, ANC, Bluetooth 5.3
""",
)
# Result is a validated ProductReview instance
review = result.validated_output
print(f"Rating: {review.rating}")
print(f"Summary: {review.summary}")
Using Guardrails Hub Validators
from guardrails import Guard
from guardrails.hub import (
CompetitorCheck,
ProfanityFree,
ReadingTime,
RestrictToTopic,
SensitiveTopic,
ToxicLanguage,
ValidJson,
ValidLength
)
# Install validators from hub
# guardrails hub install hub://guardrails/toxic_language
# Compose multiple validators
guard = Guard().use_many(
ValidJson(on_fail="reask"),
ToxicLanguage(threshold=0.8, on_fail="fix"),
ProfanityFree(on_fail="fix"),
ValidLength(min=100, max=1000, on_fail="reask"),
RestrictToTopic(
valid_topics=["technology", "software"],
on_fail="reask"
)
)
Custom Validators
from guardrails import Validator, register_validator
from guardrails.validators import ValidationResult
@register_validator(name="custom/no-urls", data_type="string")
class NoURLs(Validator):
"""Validator that checks for URLs in text."""
def validate(self, value: str, metadata: dict) -> ValidationResult:
import re
url_pattern = r'https?://\S+'
if re.search(url_pattern, value):
return ValidationResult(
outcome="fail",
error_message="Text contains URLs which are not allowed",
fix_value=re.sub(url_pattern, "[URL REMOVED]", value)
)
return ValidationResult(outcome="pass")
# Use custom validator
guard = Guard().use(NoURLs(on_fail="fix"))
Prompt Injection Defense
from guardrails import Guard
from guardrails.hub import DetectPromptInjection
# Create input guard for prompt injection
input_guard = Guard().use(
DetectPromptInjection(
on_fail="exception",
threshold=0.9
)
)
def safe_chat(user_input: str) -> str:
# Validate input first
try:
input_guard.validate(user_input)
except Exception as e:
return "I cannot process that request."
# Process safe input
return llm.invoke(user_input)
Integration with NeMo Guardrails
from guardrails import Guard
from nemoguardrails import LLMRails, RailsConfig
# Combine Guardrails AI with NeMo Guardrails
config = RailsConfig.from_path("./config")
rails = LLMRails(config)
# Use Guardrails AI for structured output
output_guard = Guard.from_pydantic(OutputSchema)
async def guarded_chat(user_input: str) -> dict:
# NeMo handles dialogue safety
response = await rails.generate_async(
messages=[{"role": "user", "content": user_input}]
)
# Guardrails AI validates structure
validated = output_guard.validate(response["content"])
return validated.validated_output
Task Definition
const guardrailsAISetupTask = defineTask({
name: 'guardrails-ai-setup',
description: 'Configure Guardrails AI validation for LLM application',
inputs: {
outputSchema: { type: 'object', required: false },
validators: { type: 'array', required: true },
onFailStrategy: { type: 'string', default: 'reask' }, // 'reask', 'fix', 'exception', 'filter'
maxRetries: { type: 'number', default: 3 },
enableInputValidation: { type: 'boolean', default: true },
enableOutputValidation: { type: 'boolean', default: true }
},
outputs: {
guardConfigured: { type: 'boolean' },
validatorsInstalled: { type: 'array' },
artifacts: { type: 'array' }
},
async run(inputs, taskCtx) {
return {
kind: 'skill',
title: 'Configure Guardrails AI validation',
skill: {
name: 'guardrails-ai-setup',
context: {
outputSchema: inputs.outputSchema,
validators: inputs.validators,
onFailStrategy: inputs.onFailStrategy,
maxRetries: inputs.maxRetries,
enableInputValidation: inputs.enableInputValidation,
enableOutputValidation: inputs.enableOutputValidation,
instructions: [
'Install Guardrails AI package and hub validators',
'Define output schema if structured output needed',
'Configure selected validators with failure strategies',
'Set up input validation for prompt injection defense',
'Configure output validation for content safety',
'Implement retry logic with correction strategies',
'Test validation pipeline with sample inputs/outputs',
'Document validation rules and expected behaviors'
]
}
},
io: {
inputJsonPath: `tasks/${taskCtx.effectId}/input.json`,
outputJsonPath: `tasks/${taskCtx.effectId}/result.json`
}
};
}
});
Applicable Processes
- system-prompt-guardrails
- prompt-injection-defense
- content-moderation-safety
- chatbot-design-implementation
External Dependencies
- guardrails-ai Python package
- Guardrails Hub account (for hub validators)
- LLM provider (OpenAI, Anthropic, etc.)
- Optional: NeMo Guardrails for dialogue safety
References
- Guardrails AI GitHub
- Guardrails AI Documentation
- Guardrails Hub
- NVIDIA NeMo Guardrails
- OpenAI Guardrails Python
Related Skills
- SK-SAF-001 content-moderation-api
- SK-SAF-003 nemo-guardrails
- SK-SAF-004 prompt-injection-detector
- SK-SAF-005 pii-redaction
Related Agents
- AG-SAF-001 safety-auditor
- AG-SAF-002 prompt-injection-defender
- AG-PE-001 system-prompt-engineer
Source
git clone https://github.com/a5c-ai/babysitter/blob/main/plugins/babysitter/skills/babysit/process/specializations/ai-agents-conversational/skills/guardrails-ai-setup/SKILL.mdView on GitHub Overview
Configure the Guardrails AI validation framework to ensure LLM outputs meet quality, safety, and structural requirements. It supports input validation before LLM calls, output validation after responses, and strict enforcement of structured outputs, with pre-built validators from Guardrails Hub and support for custom validators and automatic retry/correction.
How This Skill Works
Create a Guard instance with a stack of validators (e.g., ValidJson, ToxicLanguage, DetectPII). The guard validates inputs before invoking the LLM and validates outputs after, enforcing structured schemas and applying correction strategies or retries as needed. It can integrate with LangChain, support streaming, and handle automatic retries.
When to Use It
- Sanitize and validate user input before passing it to an LLM
- Enforce structured output formats like JSON or XML for downstream systems
- Apply safety checks such as toxicity, PII detection, and content moderation on both input and output
- Leverage LangChain integration for end-to-end workflows and streaming support
- Enable automatic retries or corrections when validation fails
Quick Start
- Step 1: Import Guard and validators and create a Guard instance
- Step 2: Compose validators with use_many(ValidJson(), ToxicLanguage(on_fail='fix'), DetectPII(on_fail='fix'))
- Step 3: Wrap your LLM call with guard(...) and read result.validated_output
Best Practices
- Start with core validators (ValidJson, DetectPII, ToxicLanguage) and iterate by adding more as needed
- Use schema-based validation (Guard.from_pydantic) for critical outputs to ensure correctness
- Tune on_fail behaviors (reask, fix) to balance user experience with safety requirements
- Test validators against edge cases like prompt injections and malformed outputs
- Monitor results and adjust validators/schemas based on real-world feedback
Example Use Cases
- Chat assistant: validate JSON output and detect PII before returning results
- Product review extraction: schema-based validation with a Pydantic model
- Hub validators: profanity-free and topic-restricted responses
- Custom validator NoURLs: block URLs in generated text
- LangChain integration: end-to-end safe workflows with streaming and retries