content-moderation-api
Install: npx machina-cli add skill a5c-ai/babysitter/content-moderation-api --openclaw
Content Moderation API Skill
Capabilities
- Integrate OpenAI Moderation API
- Set up Perspective API for toxicity detection
- Configure moderation thresholds
- Implement content filtering pipelines
- Design moderation response handling
- Create moderation logging and reporting
Target Processes
- content-moderation-safety
- system-prompt-guardrails
Implementation Details
Moderation APIs
- OpenAI Moderation: Hate, violence, self-harm, sexual content
- Perspective API: Toxicity, insult, profanity, threat
- Azure Content Safety: Text and image moderation
- LlamaGuard: Open-source safety classifier
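As an illustration of what a provider call looks like, here is a hedged sketch of querying the Perspective API through `google-api-python-client`'s Comment Analyzer discovery service. The attribute names follow Perspective's conventions; the `summary_score` helper and the sample response are illustrative, not part of any published schema.

```python
# Sample of the shape Perspective returns, used by the helper below.
SAMPLE_RESPONSE = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.82}}
    }
}

def summary_score(response: dict, attribute: str) -> float:
    """Extract an attribute's summary score from a Perspective response."""
    return response["attributeScores"][attribute]["summaryScore"]["value"]

def analyze_with_perspective(text: str, api_key: str) -> dict:
    """Send one comment to Perspective; requires a valid API key."""
    from googleapiclient import discovery
    client = discovery.build(
        "commentanalyzer", "v1alpha1",
        developerKey=api_key,
        discoveryServiceUrl=(
            "https://commentanalyzer.googleapis.com/"
            "$discovery/rest?version=v1alpha1"),
    )
    body = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}, "THREAT": {}},
    }
    return client.comments().analyze(body=body).execute()
```

The other providers follow the same pattern: send text, get back per-category scores to compare against your thresholds.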
Configuration Options
- API credentials and endpoints
- Category thresholds
- Action policies (block, warn, flag)
- Logging configuration
- Fallback behavior
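The options above might be collected into a single configuration object along these lines; every key name here is illustrative, not a fixed schema.

```python
# Hypothetical moderation configuration -- names and values are
# illustrative assumptions, not part of any published schema.
MODERATION_CONFIG = {
    "providers": {
        "openai": {"endpoint": "https://api.openai.com/v1/moderations"},
        "perspective": {"endpoint": "https://commentanalyzer.googleapis.com"},
    },
    "thresholds": {        # category score above which an action fires
        "hate": 0.40,
        "violence": 0.50,
        "self_harm": 0.30,
        "toxicity": 0.70,
    },
    "actions": {           # policy applied when a threshold is crossed
        "hate": "block",
        "violence": "block",
        "self_harm": "flag",
        "toxicity": "warn",
    },
    "logging": {"level": "INFO", "destination": "moderation.log"},
    "fallback": "flag",    # action taken when a provider call fails
}
```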
Best Practices
- Set appropriate thresholds for each category
- Handle edge cases gracefully
- Log every moderation decision
- Review thresholds regularly
- Apply multi-layer moderation
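Multi-layer moderation can be sketched as chaining several classifiers and letting the most severe verdict win. The layers below are stand-ins for real provider calls; the function name and the borderline-review rule are illustrative assumptions.

```python
from typing import Callable, Dict, List

# Each layer maps text to category -> score in [0, 1]; real layers would
# wrap OpenAI Moderation, Perspective, LlamaGuard, etc.
Classifier = Callable[[str], Dict[str, float]]

def multi_layer_moderate(text: str,
                         layers: List[Classifier],
                         thresholds: Dict[str, float]) -> str:
    """Run every layer; any hard hit blocks, borderline scores flag."""
    verdict = "allow"
    for classify in layers:
        for category, score in classify(text).items():
            limit = thresholds.get(category, 1.0)
            if score >= limit:
                return "block"      # any hard hit short-circuits
            if score >= 0.5 * limit:
                verdict = "flag"    # borderline: keep for human review
    return verdict

# Stub layers standing in for real providers.
layer_a = lambda t: {"hate": 0.9 if "slur" in t else 0.0}
layer_b = lambda t: {"toxicity": 0.3}
```

Running layers in sequence also gives you a natural fallback: if one provider is down, the remaining layers still produce a verdict.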
Dependencies
- openai
- google-api-python-client (Perspective API)
- azure-ai-contentsafety
Source
https://github.com/a5c-ai/babysitter/blob/main/plugins/babysitter/skills/babysit/process/specializations/ai-agents-conversational/skills/content-moderation-api/SKILL.md
Overview
This skill integrates multiple moderation APIs (OpenAI Moderation, Perspective, Azure Content Safety, and LlamaGuard) to detect and filter unsafe content. It supports configurable thresholds and action policies, plus logging and reporting for compliance and audits, making it easy to enforce safe interactions across chat, forums, and user-generated content.
How This Skill Works
Content is sent to a suite of moderation APIs that classify it for hate, violence, self-harm, toxicity, profanity, and more. Based on per-category thresholds, the system applies an action (block, warn, or flag) and routes the decision to logs and dashboards. Fallback behavior and centralized policy management keep enforcement consistent.
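The threshold-to-action routing described above can be sketched as follows; category names, policies, and the severity ordering are illustrative assumptions, and the `print` stands in for the logging/reporting sink.

```python
import json
from typing import Dict

# Illustrative severity ordering: pick the strongest triggered policy.
ACTION_SEVERITY = {"allow": 0, "warn": 1, "flag": 2, "block": 3}

def route(scores: Dict[str, float],
          thresholds: Dict[str, float],
          policies: Dict[str, str]) -> dict:
    """Apply the most severe policy among categories that crossed a threshold."""
    action = "allow"
    triggered = []
    for category, score in scores.items():
        if score >= thresholds.get(category, 1.0):
            triggered.append(category)
            candidate = policies.get(category, "flag")
            if ACTION_SEVERITY[candidate] > ACTION_SEVERITY[action]:
                action = candidate
    decision = {"action": action, "triggered": triggered, "scores": scores}
    print(json.dumps(decision))  # stand-in for the logging/reporting sink
    return decision

decision = route({"hate": 0.8, "profanity": 0.2},
                 {"hate": 0.4, "profanity": 0.6},
                 {"hate": "block", "profanity": "warn"})
```

Emitting the full decision record, not just the action, is what makes later audits and threshold reviews possible.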
When to Use It
- Moderating user-generated messages in chat applications to block hate speech and self-harm content.
- Enforcing guardrails around system prompts and assistant outputs to prevent unsafe instructions.
- Filtering text and image content with Azure Content Safety to handle multimedia submissions.
- Tuning thresholds per product or community to balance safety and user experience.
- Auditing, logging, and reporting moderation decisions for compliance and reviews.
Quick Start
- Step 1: Acquire API keys and endpoints for OpenAI Moderation, Perspective API, Azure Content Safety, and (optionally) LlamaGuard.
- Step 2: Configure category thresholds and action policies (block, warn, flag), logging, and fallback behavior.
- Step 3: Integrate into your moderation pipeline, enable multi-layer moderation, and test with edge-case content samples.
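The steps above might be wired together along these lines, using the official `openai` v1 Python client for the provider call; the function names and the local threshold helper are illustrative.

```python
# Minimal sketch of the Quick Start wiring. `check_with_openai` needs a
# real key and network access; `crosses_threshold` is a pure local check.
def check_with_openai(text: str) -> bool:
    """Return True if the OpenAI Moderation endpoint flags the text."""
    from openai import OpenAI  # reads OPENAI_API_KEY from the environment
    client = OpenAI()
    resp = client.moderations.create(
        model="omni-moderation-latest", input=text)
    return resp.results[0].flagged

def crosses_threshold(scores: dict, thresholds: dict) -> bool:
    """Did any provider category score cross its configured threshold?"""
    return any(scores.get(c, 0.0) >= t for c, t in thresholds.items())

# Usage (needs network + a real key):
#   if check_with_openai(message) or crosses_threshold(
#           perspective_scores, {"toxicity": 0.7}):
#       apply_action("block")   # hypothetical action handler
```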
Example Use Cases
- Moderating comments on a social app or forum
- Moderation in a customer support chatbot
- Detecting self-harm, violence, or hate in messages
- Reviewing user reviews or feedback for policy violations
- Image and text moderation for user-submitted media via Azure Content Safety