pii-redaction
npx machina-cli add skill a5c-ai/babysitter/pii-redaction --openclaw
PII Redaction Skill
Capabilities
- Detect personally identifiable information
- Implement redaction strategies
- Configure detection for various PII types
- Design reversible anonymization
- Implement compliance logging
- Create audit trails for PII handling
Target Processes
- content-moderation-safety
- system-prompt-guardrails
Implementation Details
PII Types
- Names: Person names, usernames
- Contact: Email, phone, address
- Financial: Credit cards, bank accounts
- Government IDs: SSN, passport, driver's license
- Medical: Health information
- Custom: Organization-specific PII
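A minimal detection sketch for a few of the structured types above, assuming simple regular expressions are an acceptable first pass. The pattern set, the `detect_pii` function, and the labels are hypothetical and are not taken from this skill's implementation. Names, addresses, and medical information usually need an NER model rather than regexes.

```python
import re

# Hypothetical patterns for illustration; real detectors need broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def luhn_valid(number: str) -> bool:
    """Luhn checksum, used to discard digit runs that are not card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return len(digits) >= 13 and checksum % 10 == 0


def detect_pii(text: str):
    """Return (label, start, end, value) tuples ordered by position."""
    findings = []
    for label, pattern in PII_PATTERNS.items():
        for m in pattern.finditer(text):
            if label == "CREDIT_CARD" and not luhn_valid(m.group()):
                continue
            findings.append((label, m.start(), m.end(), m.group()))
    return sorted(findings, key=lambda f: f[1])
```

The Luhn check shows one way to trade a little complexity for fewer false positives on long digit runs such as order numbers.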
Redaction Methods
- Masking ([REDACTED])
- Pseudonymization (fake values)
- Tokenization (reversible)
- Encryption
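The first three methods can be compared side by side in a short sketch, shown here for email addresses only. The `mask`, `pseudonymize`, and `TokenVault` names are hypothetical, and the salt and token format are assumptions. Encryption is left out because the Python standard library has no authenticated encryption primitive; a real deployment would use a vetted cryptography library.

```python
import hashlib
import re
import secrets

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask(text: str) -> str:
    """Masking: replace each match with a fixed placeholder (irreversible)."""
    return EMAIL.sub("[REDACTED]", text)


def pseudonymize(text: str, salt: str = "demo-salt") -> str:
    """Pseudonymization: replace each match with a stable fake value."""
    def fake(m):
        digest = hashlib.sha256((salt + m.group()).encode()).hexdigest()[:8]
        return f"user_{digest}@example.invalid"
    return EMAIL.sub(fake, text)


class TokenVault:
    """Tokenization: swap values for random tokens and keep the mapping."""

    def __init__(self):
        self._by_token = {}
        self._by_value = {}

    def tokenize(self, text: str) -> str:
        def swap(m):
            value = m.group()
            if value not in self._by_value:
                token = f"<PII_{secrets.token_hex(4)}>"
                self._by_value[value] = token
                self._by_token[token] = value
            return self._by_value[value]
        return EMAIL.sub(swap, text)

    def detokenize(self, text: str) -> str:
        """Restore original values; only possible with access to the vault."""
        for token, value in self._by_token.items():
            text = text.replace(token, value)
        return text
```

In this sketch the vault lives in memory. Reversibility in practice depends on storing the mapping securely and restricting who can call `detokenize`.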
Configuration Options
- PII types to detect
- Detection sensitivity
- Redaction method
- Language support
- Custom patterns
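One possible way to group these options is a small validated config object. The `RedactionConfig` class, its field names, and its defaults are hypothetical. In particular, treating "detection sensitivity" as a minimum confidence score between 0 and 1 is an assumption, not something the skill specifies.

```python
from dataclasses import dataclass, field
from typing import Dict, List

ALLOWED_METHODS = {"mask", "pseudonymize", "tokenize", "encrypt"}


@dataclass
class RedactionConfig:
    # PII types to detect (labels are illustrative)
    pii_types: List[str] = field(default_factory=lambda: ["EMAIL", "PHONE"])
    # Assumed meaning: minimum detector confidence required to redact
    sensitivity: float = 0.5
    # One of the redaction methods listed above
    method: str = "mask"
    # Languages the detector should handle
    languages: List[str] = field(default_factory=lambda: ["en"])
    # Organization-specific patterns: label -> regular expression source
    custom_patterns: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        if not 0.0 <= self.sensitivity <= 1.0:
            raise ValueError("sensitivity must be between 0 and 1")
        if self.method not in ALLOWED_METHODS:
            raise ValueError(f"unknown redaction method: {self.method}")
```

Validating at construction time means a misspelled method name fails when the pipeline starts instead of silently leaving text unredacted.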
Best Practices
- Comprehensive PII coverage
- Regular pattern updates
- Audit logging
- Compliance alignment
- Testing with diverse data
Dependencies
- presidio-analyzer
- presidio-anonymizer
- spacy
Source
git clone https://github.com/a5c-ai/babysitter/blob/main/plugins/babysitter/skills/babysit/process/specializations/ai-agents-conversational/skills/pii-redaction/SKILL.md
Overview
The pii-redaction skill detects personally identifiable information in conversational data and applies configurable redaction strategies to protect user privacy. It supports reversible anonymization, audit logging, and language-aware configuration to satisfy privacy and compliance needs.
How This Skill Works
The skill runs a PII detector over the input to flag sensitive spans for each configured type, then applies the configured redaction method (masking, pseudonymization, tokenization, or encryption) to those spans. Each redaction is logged for compliance. Detection can be configured per language and extended with custom patterns for organization-specific PII.
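The detect, redact, and log flow can be sketched as follows, assuming masking as the method and an in-memory list as the audit sink. The `redact_with_audit` function and the record fields are hypothetical. The raw value is deliberately kept out of the log so that the audit trail does not become a second copy of the PII.

```python
import re
from datetime import datetime, timezone

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact_with_audit(text: str, audit_log: list) -> str:
    """Mask every match and append one audit record per redaction."""
    redacted = text
    for label, pattern in PATTERNS.items():
        def replace(m, label=label):
            audit_log.append({
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "pii_type": label,
                "method": "mask",
                "length": len(m.group()),  # size only, never the raw value
            })
            return f"[{label}]"
        redacted = pattern.sub(replace, redacted)
    return redacted
```

A production pipeline would write these records to durable, access-controlled storage instead of a Python list.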
When to Use It
- Moderate user-generated chat content to redact emails, phone numbers, and names before storage or display.
- Configure system prompts and guardrails to prevent PII leakage to users or agents.
- Store or transmit chat logs under regulatory requirements, with an audit trail of what was redacted.
- Handle financial or health-related conversations by applying redaction or anonymization to sensitive fields.
- Test and validate redaction coverage with diverse data patterns and languages to ensure robustness.
Quick Start
- Step 1: Define PII types to detect and set the detection sensitivity (Names, Contact, Financial, Government IDs, Medical, Custom).
- Step 2: Choose a redaction method (masking, pseudonymization, tokenization, or encryption) and wire it into the redaction pipeline.
- Step 3: Enable compliance logging and run tests with diverse data to verify coverage, reversibility where configured, and audit trails.
Best Practices
- Comprehensive PII coverage by enumerating all relevant types (Names, Contact, Financial, Government IDs, Medical, Custom).
- Regular pattern updates to keep detection effective against new formats and aliases.
- Audit logging for every redaction event and reversible anonymization activity.
- Compliance alignment with regional data protection laws and internal policies.
- Testing with diverse data, languages, and edge cases to ensure reliable performance.
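Testing with diverse data can start from a small labeled set and a recall check, as in the sketch below. The `evaluate` helper, the sample cases, and the email-only detector are all made up for illustration. A real suite would cover every configured PII type and language.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def detect_emails(text: str):
    return EMAIL.findall(text)


# (input text, expected detections); samples are invented for illustration
CASES = [
    ("Write to jane.doe@example.com today", ["jane.doe@example.com"]),
    ("Plus addressing: bob+news@mail.example.org", ["bob+news@mail.example.org"]),
    ("Non-ASCII name: josé@example.com", ["josé@example.com"]),
    ("No PII here, just an @mention", []),
]


def evaluate(detector, cases):
    """Return recall over expected items and the count of false positives."""
    expected_total = found_total = false_positives = 0
    for text, expected in cases:
        detected = detector(text)
        expected_total += len(expected)
        found_total += sum(1 for e in expected if e in detected)
        false_positives += sum(1 for d in detected if d not in expected)
    recall = found_total / expected_total if expected_total else 1.0
    return {"recall": recall, "false_positives": false_positives}
```

Including negative cases such as the `@mention` line keeps the check sensitive to over-redaction as well as missed PII.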
Example Use Cases
- Redacting emails and phone numbers from customer support transcripts before analytics or storage.
- Masking names and usernames in content moderation logs to protect user identity.
- Pseudonymizing patient identifiers in medical chatbot conversations for research or support.
- Tokenizing or encrypting sensitive fields like account numbers in logs for secure storage.
- Defining and applying custom patterns for organization-specific PII such as employee IDs.
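The last use case, custom patterns, might look like the sketch below. The `EMP-` plus six digits format and the `redact_custom` helper are hypothetical. Presidio, listed under Dependencies, provides pattern-based recognizers for the same purpose, but this example uses only the standard library.

```python
import re

# Hypothetical employee ID format: "EMP-" followed by exactly six digits.
CUSTOM_PATTERNS = {
    "EMPLOYEE_ID": re.compile(r"\bEMP-\d{6}\b"),
}


def redact_custom(text: str, patterns: dict) -> str:
    """Replace each custom-pattern match with its bracketed label."""
    for label, pattern in patterns.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Anchoring the pattern with word boundaries and an exact digit count keeps shorter look-alikes such as `EMP-12` untouched.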