llm-classifier
npx machina-cli add skill a5c-ai/babysitter/llm-classifier --openclaw
LLM Classifier Skill
Capabilities
- Implement zero-shot classification with LLMs
- Design few-shot classification prompts
- Configure structured output for labels
- Implement confidence scoring
- Design classification taxonomies
- Handle multi-label classification
Target Processes
- intent-classification-system
- dialogue-flow-design
Implementation Details
Classification Patterns
- Zero-Shot: No examples, description-based
- Few-Shot: Example-based classification
- Structured Output: JSON schema for labels
- Chain-of-Thought: Reasoning before classification
- Ensemble: Multiple prompts/models
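The zero-shot and few-shot patterns differ only in whether labeled examples are included in the prompt. The sketch below illustrates that difference; the `build_prompt` helper and the prompt wording are assumptions made for illustration, not part of the skill itself.

```python
from typing import Dict, Sequence, Tuple


def build_prompt(
    text: str,
    labels: Dict[str, str],
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Build a zero-shot prompt (no examples) or a few-shot prompt (with examples)."""
    lines = ["Classify the message into exactly one of these labels:"]
    # Label descriptions carry the whole task definition in zero-shot mode.
    for name, description in labels.items():
        lines.append(f"- {name}: {description}")
    # Few-shot mode adds labeled (message, label) pairs before the real input.
    if examples:
        lines.append("")
        lines.append("Examples:")
        for example_text, example_label in examples:
            lines.append(f"Message: {example_text}")
            lines.append(f"Label: {example_label}")
    lines.append("")
    lines.append(f"Message: {text}")
    lines.append("Label:")
    return "\n".join(lines)
```

Calling `build_prompt(text, labels)` yields the zero-shot form; passing a list of `(message, label)` pairs as the third argument yields the few-shot form with the same label block.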
Configuration Options
- LLM model selection
- Label descriptions
- Example selection strategy
- Output format specification
- Confidence calibration
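One way to keep these options together is a small validated configuration object. This is only a sketch: the `ClassifierConfig` class, its field names, and its defaults are hypothetical and are not defined by the skill.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ClassifierConfig:
    """Hypothetical container for the configuration options listed above."""

    model: str                      # provider-specific model identifier
    labels: Dict[str, str]          # label name -> description
    examples: List[Tuple[str, str]] = field(default_factory=list)  # (text, label)
    max_examples: int = 4           # cap applied by the example selection strategy
    multi_label: bool = False       # whether several labels may be returned
    confidence_threshold: float = 0.7  # assumed default, tune on validation data

    def __post_init__(self) -> None:
        if not self.labels:
            raise ValueError("at least one label is required")
        # Every few-shot example must point at a label that exists in the taxonomy.
        unknown = {label for _, label in self.examples if label not in self.labels}
        if unknown:
            raise ValueError(f"examples use undefined labels: {sorted(unknown)}")
        if not 0.0 <= self.confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must be in [0, 1]")
```

Validating at construction time means a mislabeled example or an out-of-range threshold fails before any model call is made.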
Best Practices
- Clear label descriptions
- Representative examples
- Consistent output format
- Calibrate confidence scores
- Test with edge cases
Dependencies
- langchain-core
- LLM provider
Source
git clone https://github.com/a5c-ai/babysitter/blob/main/plugins/babysitter/skills/babysit/process/specializations/ai-agents-conversational/skills/llm-classifier/SKILL.md
Overview
The LLM Classifier Skill enables zero-shot or few-shot intent labeling with structured JSON output and confidence scores. It supports multi-label classification, customizable label descriptions, and ensemble patterns, making it suitable for flexible dialogue routing and intent detection in complex conversations.
How This Skill Works
It constructs prompts in zero-shot mode (label descriptions only, no examples) or few-shot mode (with labeled examples), optionally adding chain-of-thought reasoning or an ensemble of prompts or models. The output is JSON that conforms to a label schema, with optional confidence scores, so downstream routing and decision logic can parse it reliably.
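A structured reply is only useful if it is checked against the taxonomy before routing. The sketch below assumes a reply shape of `{"labels": [{"name": ..., "confidence": ...}]}`; that shape and the `parse_classification` helper are illustrative assumptions, not the skill's defined schema.

```python
import json
from typing import Dict, List, Optional


def parse_classification(raw: str, allowed: List[str]) -> Dict[str, Optional[float]]:
    """Parse a model reply into {label: confidence or None} and validate it."""
    # json.loads raises json.JSONDecodeError (a ValueError) on malformed output.
    data = json.loads(raw)
    parsed: Dict[str, Optional[float]] = {}
    for item in data["labels"]:
        name = item["name"]
        if name not in allowed:
            raise ValueError(f"label {name!r} is not in the taxonomy")
        # Confidence is optional; None means the model did not report one.
        confidence = item.get("confidence")
        if confidence is not None and not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence for {name!r} is outside [0, 1]")
        parsed[name] = confidence
    return parsed
```

Rejecting unknown labels outright, rather than passing them through, keeps a hallucinated label from reaching the routing logic.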
When to Use It
- Designing an intent classifier for a conversational agent with a flexible or expanding taxonomy.
- Handling multi-label classification where a single utterance maps to multiple intents (e.g., greeting + product inquiry).
- Calibrating confidence scores to determine when to trigger automated responses vs. human review.
- Comparing different LLM providers or prompts by enforcing a consistent output format across models.
- Designing a dialogue flow that relies on labeled intents and their descriptions for routing decisions.
Quick Start
- Step 1: Define a taxonomy with clear label descriptions and decide if multi-label is required.
- Step 2: Choose zero-shot or few-shot pattern, craft prompts, and specify the JSON output and confidence fields.
- Step 3: Run with langchain-core and your LLM provider, test on edge cases, calibrate confidence, and iterate on prompts and labels.
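The three steps can be sketched end to end with the model call abstracted behind a plain callable. Everything named here is an assumption for illustration: the `LABELS` taxonomy, the `classify` helper, the reply format, and `fake_llm`, which stands in for a real provider call so the example runs offline.

```python
import json
from typing import Callable, Dict, List

# Step 1: a small taxonomy with a description per label (hypothetical labels).
LABELS: Dict[str, str] = {
    "greeting": "The user says hello or opens the conversation.",
    "product-question": "The user asks about a product or its availability.",
    "billing": "The user asks about charges, invoices, or payments.",
}


def classify(text: str, labels: Dict[str, str], llm: Callable[[str], str]) -> List[dict]:
    """Step 2 and 3: build a zero-shot multi-label prompt, call the model, parse JSON."""
    label_lines = "\n".join(f"- {name}: {desc}" for name, desc in labels.items())
    prompt = (
        "Classify the message. Choose every label that applies.\n"
        f"{label_lines}\n"
        'Reply with JSON only: {"labels": [{"name": "<label>", "confidence": <0..1>}]}\n'
        f"Message: {text}"
    )
    reply = json.loads(llm(prompt))
    # Keep only labels that exist in the taxonomy; unknown names are dropped.
    return [item for item in reply["labels"] if item["name"] in labels]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned reply."""
    return json.dumps({"labels": [
        {"name": "greeting", "confidence": 0.95},
        {"name": "product-question", "confidence": 0.8},
        {"name": "made-up-label", "confidence": 0.5},
    ]})


result = classify("Hi, do you sell phone chargers?", LABELS, fake_llm)
```

In real use, `fake_llm` would be replaced by a function that sends the prompt to the chosen provider and returns the raw text of the reply.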
Best Practices
- Create clear, unambiguous label descriptions to improve consistency across prompts.
- Use representative examples that cover common and edge-case phrases for robust few-shot prompts.
- Maintain a consistent JSON output format to simplify downstream parsing and routing.
- Calibrate confidence scores and set actionable thresholds for automation vs. escalation.
- Test with edge cases and conflicting intents to ensure reliable disambiguation.
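Setting an actionable threshold usually needs a labeled validation set. The sketch below shows one simple approach, assuming you have `(confidence, was_correct)` pairs from past predictions: pick the lowest cutoff at which accepted predictions reach a target precision. The function names and the 0.95 target are illustrative assumptions.

```python
from typing import List, Optional, Tuple


def pick_threshold(
    scored: List[Tuple[float, bool]],
    target_precision: float = 0.95,
) -> Optional[float]:
    """Return the lowest confidence cutoff whose accepted predictions meet the target."""
    # Try each observed confidence value as a cutoff, from lowest to highest.
    for cutoff in sorted({confidence for confidence, _ in scored}):
        accepted = [correct for confidence, correct in scored if confidence >= cutoff]
        precision = sum(accepted) / len(accepted)
        if precision >= target_precision:
            return cutoff
    return None  # no cutoff reaches the target; escalate everything


def route(confidence: float, threshold: Optional[float]) -> str:
    """Automate when confidence clears the threshold, otherwise escalate."""
    if threshold is not None and confidence >= threshold:
        return "automate"
    return "escalate"
```

Choosing the lowest qualifying cutoff maximizes the share of traffic that is automated while still meeting the precision target on the validation data.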
Example Use Cases
- E-commerce chatbot routing: classify each utterance as 'order-status', 'returns', or 'shipping-info' to direct it to the right workflow.
- Multi-label utterance: an input is tagged with both 'greeting' and 'product-question' for contextual handling.
- Edge-case classification: when a user mentions 'refund' in a complaint, classify the utterance and escalate it if needed.
- Ensemble scenario: combine prompts from multiple models to stabilize frequent intents like 'billing' or 'tech-support'.
- Confidence-calibrated routing: low-confidence intents trigger human review while high-confidence intents proceed automatically.
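For the ensemble scenario, one common way to combine predictions is a majority vote, with the share of agreeing votes serving as a rough agreement score. This is a minimal sketch under that assumption; the skill does not specify how ensemble outputs are merged, and `majority_vote` is a hypothetical helper.

```python
from collections import Counter
from typing import List, Tuple


def majority_vote(predictions: List[str]) -> Tuple[str, float]:
    """Return the most frequent label and the fraction of votes it received."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    # On a tie, most_common returns the label that was encountered first.
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predictions)
```

The agreement fraction is not a calibrated probability, but a low value is a reasonable signal to send the utterance to human review.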