content-moderation-api

npx machina-cli add skill a5c-ai/babysitter/content-moderation-api --openclaw
Files (1)
SKILL.md
1.2 KB

Content Moderation API Skill

Capabilities

  • Integrate OpenAI Moderation API
  • Set up Perspective API for toxicity detection
  • Configure moderation thresholds
  • Implement content filtering pipelines
  • Design moderation response handling
  • Create moderation logging and reporting

Target Processes

  • content-moderation-safety
  • system-prompt-guardrails

Implementation Details

Moderation APIs

  1. OpenAI Moderation: Hate, violence, self-harm, sexual content
  2. Perspective API: Toxicity, insult, profanity, threat
  3. Azure Content Safety: Text and image moderation
  4. LlamaGuard: Open-source safety classifier
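As an illustration of how results from the first of these APIs might be consumed, here is a minimal sketch. The response shape below is an illustrative dict, not a verbatim SDK object; check field names against the current OpenAI Moderation API reference before relying on them.

```python
# A real call would look roughly like:
#   from openai import OpenAI
#   resp = OpenAI().moderations.create(model="omni-moderation-latest", input=text)
# Here we use an illustrative response dict so the logic is self-contained.

sample_result = {
    "flagged": True,
    "category_scores": {
        "hate": 0.91,
        "violence": 0.12,
        "self-harm": 0.03,
        "sexual": 0.01,
    },
}

def flagged_categories(result: dict, threshold: float = 0.5) -> list[str]:
    """Return the categories whose score meets or exceeds the threshold."""
    return sorted(
        cat for cat, score in result["category_scores"].items()
        if score >= threshold
    )

print(flagged_categories(sample_result))  # ['hate']
```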

Configuration Options

  • API credentials and endpoints
  • Category thresholds
  • Action policies (block, warn, flag)
  • Logging configuration
  • Fallback behavior
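One way to express the options above is a small config structure; the field names and default values here are illustrative, not part of any published schema for this skill.

```python
from dataclasses import dataclass, field

@dataclass
class ModerationConfig:
    # Per-category score thresholds above which the action policy fires.
    thresholds: dict = field(default_factory=lambda: {
        "hate": 0.4, "violence": 0.5, "self_harm": 0.3, "sexual": 0.6,
    })
    # Action policy per category: "block", "warn", or "flag".
    actions: dict = field(default_factory=lambda: {
        "hate": "block", "violence": "warn",
        "self_harm": "block", "sexual": "flag",
    })
    log_decisions: bool = True     # logging configuration
    fallback_action: str = "flag"  # applied when a moderation API call fails

cfg = ModerationConfig()
print(cfg.actions["hate"])  # block
```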

Best Practices

  • Set thresholds appropriate to your audience and risk tolerance
  • Handle edge cases (API failures, ambiguous content) gracefully
  • Log every moderation decision for auditability
  • Review and tune thresholds on a regular schedule
  • Layer multiple moderation services for defense in depth

Dependencies

  • openai
  • google-api-python-client (Perspective API)
  • azure-ai-contentsafety

Source

git clone https://github.com/a5c-ai/babysitter.git
# Skill file: plugins/babysitter/skills/babysit/process/specializations/ai-agents-conversational/skills/content-moderation-api/SKILL.md

Overview

This skill integrates multiple moderation APIs (OpenAI Moderation, Perspective, Azure Content Safety, and LlamaGuard) to detect and filter unsafe content. It supports configurable thresholds and action policies, plus logging and reporting to support compliance and audits. This makes it easy to enforce safe interactions across chat, forums, and user-generated content.

How This Skill Works

Content is sent to a suite of moderation APIs that classify for hate, violence, self-harm, toxicity, profanity, and more. Based on category thresholds, the system applies actions (block, warn, flag) and routes content to logs and dashboards. It supports fallback behavior and centralized policy management for consistency.
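The threshold-to-action step described above can be sketched as follows; the category names, policy values, and severity ordering are illustrative assumptions, not the skill's actual implementation.

```python
def decide(scores: dict, thresholds: dict, actions: dict,
           fallback: str = "flag") -> str:
    """Pick the most severe action triggered by any category score.

    Assumed severity order: block > warn > flag > allow.
    """
    severity = {"allow": 0, "flag": 1, "warn": 2, "block": 3}
    decision = "allow"
    for category, score in scores.items():
        # Unknown categories never trigger (threshold defaults to 1.0).
        if score >= thresholds.get(category, 1.0):
            action = actions.get(category, fallback)
            if severity[action] > severity[decision]:
                decision = action
    return decision

thresholds = {"hate": 0.4, "toxicity": 0.7}
actions = {"hate": "block", "toxicity": "warn"}
print(decide({"hate": 0.9, "toxicity": 0.2}, thresholds, actions))  # block
print(decide({"hate": 0.1, "toxicity": 0.1}, thresholds, actions))  # allow
```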

When to Use It

  • Moderating user-generated messages in chat applications to block hate speech and self-harm content.
  • Enforcing guardrails around system prompts and assistant outputs to prevent unsafe instructions.
  • Filtering text and image content with Azure Content Safety to handle multimedia submissions.
  • Tuning thresholds per product or community to balance safety and user experience.
  • Auditing, logging, and reporting moderation decisions for compliance and reviews.

Quick Start

  1. Acquire API keys and endpoints for OpenAI Moderation, Perspective API, Azure Content Safety, and (optionally) LlamaGuard.
  2. Configure category thresholds and action policies (block, warn, flag), logging, and fallback behavior.
  3. Integrate into your moderation pipeline, enable multi-layer moderation, and test with edge-case content samples.
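Putting the steps together, a multi-layer pipeline might run each moderation backend in turn and fail safe to the configured fallback action when a backend errors. Every backend function here is a stand-in for illustration, not a real SDK call.

```python
def openai_check(text: str) -> dict:
    """Stand-in for an OpenAI Moderation call (returns category scores)."""
    return {"hate": 0.8 if "hateword" in text else 0.0}

def perspective_check(text: str) -> dict:
    """Stand-in for a Perspective API call, simulating an outage."""
    raise TimeoutError("simulated outage")

def moderate(text: str, backends, threshold: float = 0.5,
             fallback: str = "flag") -> str:
    """Run all backends and return the most severe resulting action."""
    decisions = []
    for backend in backends:
        try:
            scores = backend(text)
        except Exception:
            decisions.append(fallback)  # fail safe, never fail open
            continue
        decisions.append(
            "block" if any(s >= threshold for s in scores.values()) else "allow"
        )
    order = {"allow": 0, "flag": 1, "warn": 2, "block": 3}
    return max(decisions, key=order.__getitem__)

print(moderate("hello hateword", [openai_check, perspective_check]))  # block
print(moderate("hello", [openai_check, perspective_check]))           # flag
```

The key design choice is that a failed backend contributes the fallback action rather than being skipped, so an outage degrades to manual review instead of silently letting content through.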

Example Use Cases

  • Moderating comments on a social app or forum
  • Moderation in a customer support chatbot
  • Detecting self-harm, violence, or hate in messages
  • Reviewing user reviews or feedback for policy violations
  • Image and text moderation for user-submitted media via Azure Content Safety
