What is the difference between zero-shot and few-shot classification in this skill?

Zero-shot relies on label descriptions without examples, while few-shot includes labeled examples to guide predictions and improve accuracy.
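As a minimal sketch of the difference, the hypothetical helper below builds a zero-shot prompt from label descriptions alone and becomes few-shot when labeled examples are passed in. The function name, the prompt wording, and the sample labels are illustrative assumptions, not part of the skill.

```python
from typing import Optional


def build_prompt(
    text: str,
    labels: dict[str, str],
    examples: Optional[list[tuple[str, str]]] = None,
) -> str:
    """Build a classification prompt (hypothetical helper, not the skill's API)."""
    lines = ["Classify the message into exactly one of these labels:"]
    for name, description in labels.items():
        lines.append(f"- {name}: {description}")
    # Few-shot mode: append labeled examples only when they are provided.
    for example_text, example_label in examples or []:
        lines.append(f"Message: {example_text}")
        lines.append(f"Label: {example_label}")
    lines.append(f"Message: {text}")
    lines.append("Label:")
    return "\n".join(lines)


# Illustrative labels; the descriptions are assumptions.
LABELS = {
    "order-status": "The user asks where an order is.",
    "returns": "The user wants to send an item back.",
}

zero_shot = build_prompt("Where is my package?", LABELS)
few_shot = build_prompt(
    "Where is my package?",
    LABELS,
    examples=[("I want to send this back", "returns")],
)
```

The two prompts differ only in the example block, which makes it easy to compare both modes on the same inputs.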

How is the output formatted and used?

The output is a structured JSON object that conforms to a fixed schema and lists labels (with optional confidence scores), so downstream routing and decision logic can parse it deterministically.
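A rough sketch of how such output might drive routing is shown below. The {"labels": [...]} shape, the handler table, and the fallback behavior are assumptions made for illustration; the skill itself does not specify them.

```python
import json

# Hypothetical handlers keyed by label; real workflows would replace these.
HANDLERS = {
    "order-status": lambda text: f"order-status workflow: {text}",
    "returns": lambda text: f"returns workflow: {text}",
}


def route(raw_output: str, text: str) -> str:
    """Parse the model's JSON reply and dispatch on the first known label."""
    try:
        result = json.loads(raw_output)
    except json.JSONDecodeError:
        return f"fallback: {text}"
    labels = result.get("labels", []) if isinstance(result, dict) else []
    if not isinstance(labels, list):
        labels = []
    for label in labels:
        # Ignore anything that is not a known string label.
        if isinstance(label, str) and label in HANDLERS:
            return HANDLERS[label](text)
    return f"fallback: {text}"
```

Malformed replies and unknown labels both fall through to the same fallback, so the router never raises on bad model output.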

How do I calibrate confidence scores effectively?

Choose a confidence calibration method, adjust decision thresholds, and include representative and edge-case examples in prompts so that scores reflect real-world uncertainty.
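One simple way to adjust a threshold is to pick the lowest confidence cutoff that reaches a target precision on a labeled validation set. The sketch below assumes you already have (confidence, was_correct) pairs; the sample data is invented for illustration, and this is threshold tuning rather than a full calibration method.

```python
def pick_threshold(scored, target_precision):
    """Return the lowest cutoff whose accepted predictions meet the target
    precision, or None if no cutoff does."""
    candidates = sorted({confidence for confidence, _ in scored})
    for threshold in candidates:
        accepted = [ok for confidence, ok in scored if confidence >= threshold]
        precision = sum(accepted) / len(accepted)
        if precision >= target_precision:
            return threshold
    return None


# Invented validation results: (model confidence, prediction was correct).
VALIDATION = [
    (0.95, True),
    (0.90, True),
    (0.80, False),
    (0.70, True),
    (0.60, False),
]

cutoff = pick_threshold(VALIDATION, target_precision=0.9)
```

Predictions below the chosen cutoff would then be candidates for human review rather than automation.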

llm-classifier

npx machina-cli add skill a5c-ai/babysitter/llm-classifier --openclaw

Files (1)

SKILL.md

1.2 KB

LLM Classifier Skill

Capabilities

Implement zero-shot classification with LLMs
Design few-shot classification prompts
Configure structured output for labels
Implement confidence scoring
Design classification taxonomies
Handle multi-label classification

Target Processes

intent-classification-system
dialogue-flow-design

Implementation Details

Classification Patterns

Zero-Shot: No examples, description-based
Few-Shot: Example-based classification
Structured Output: JSON schema for labels
Chain-of-Thought: Reasoning before classification
Ensemble: Multiple prompts/models
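The ensemble pattern can be sketched as a majority vote over predictions from several prompts or models. The function below is a hypothetical illustration; using the agreement ratio as a confidence proxy is an assumption, not something the skill prescribes.

```python
from collections import Counter


def ensemble_vote(predictions: list[str]) -> tuple[str, float]:
    """Return the most common label and the share of voters that chose it."""
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predictions)


# Three hypothetical voters (different prompts or models) for one utterance.
label, agreement = ensemble_vote(["billing", "billing", "tech-support"])
```

A low agreement ratio is a cheap signal that the utterance is ambiguous and may deserve a closer look.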

Configuration Options

LLM model selection
Label descriptions
Example selection strategy
Output format specification
Confidence calibration
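The skill lists an example selection strategy without describing one. As one possible strategy (an assumption, not the skill's method), the sketch below picks the few-shot examples that share the most words with the incoming message; embedding similarity is a common alternative.

```python
def select_examples(query, pool, k=2):
    """Return the k pool examples with the highest word overlap with the query.

    pool is a list of (text, label) pairs. Ties keep their original order.
    """
    query_tokens = set(query.lower().split())

    def overlap(example):
        return len(query_tokens & set(example[0].lower().split()))

    return sorted(pool, key=overlap, reverse=True)[:k]


# Invented example pool for illustration.
POOL = [
    ("where is my order", "order-status"),
    ("i want a refund for this", "returns"),
    ("how long does shipping take", "shipping-info"),
]

chosen = select_examples("where is my refund", POOL, k=2)
```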

Best Practices

Clear label descriptions
Representative examples
Consistent output format
Calibrate confidence scores
Test with edge cases

Dependencies

langchain-core
LLM provider

Source

git clone https://github.com/a5c-ai/babysitter.git

The skill file is located at plugins/babysitter/skills/babysit/process/specializations/ai-agents-conversational/skills/llm-classifier/SKILL.md in that repository.

Overview

The LLM Classifier Skill enables zero-shot or few-shot intent labeling with structured JSON output and confidence scores. It supports multi-label classification, customizable label descriptions, and ensemble patterns, making it suitable for flexible dialogue routing and intent detection in complex conversations.

How This Skill Works

It constructs prompts in zero-shot mode (no examples) or few-shot mode (with labeled examples), optionally adding chain-of-thought reasoning or combining several prompts or models in an ensemble. The output is a JSON object that follows a fixed schema of labels with optional confidence scores, which downstream routing and decision logic can parse reliably.
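When chain-of-thought is enabled, the reasoning has to be separated from the final structured answer. The sketch below assumes a convention in which the model writes its reasoning first and puts the JSON object on the last line; the instruction text and the parser are hypothetical, not taken from the skill.

```python
import json

# Hypothetical instruction appended to a classification prompt.
COT_SUFFIX = (
    "First explain your reasoning in a few sentences. "
    "Then, on the final line, output only a JSON object such as "
    '{"labels": ["order-status"], "confidence": 0.8}.'
)


def parse_cot_reply(reply: str) -> dict:
    """Return the JSON object found on the last non-empty line of the reply."""
    lines = [line.strip() for line in reply.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty reply")
    # json.JSONDecodeError subclasses ValueError, so callers can catch one type.
    result = json.loads(lines[-1])
    if not isinstance(result, dict) or "labels" not in result:
        raise ValueError("final line is not a label object")
    return result


sample_reply = (
    "The user is asking how long delivery takes.\n"
    '{"labels": ["shipping-info"], "confidence": 0.8}'
)
parsed = parse_cot_reply(sample_reply)
```

Keeping the reasoning out of the parsed result means downstream code sees the same shape whether or not chain-of-thought is used.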

When to Use It

Designing an intent classifier for a conversational agent with a flexible or expanding taxonomy.
Handling multi-label classification, where a single utterance maps to multiple intents (e.g., greeting + product inquiry).
Calibrating confidence scores to determine when to trigger automated responses vs. human review.
Comparing different LLM providers or prompts by enforcing a consistent output format across models.
Designing a dialogue flow that relies on labeled intents and their descriptions for routing decisions.

Quick Start

Step 1: Define a taxonomy with clear label descriptions and decide if multi-label is required.
Step 2: Choose zero-shot or few-shot pattern, craft prompts, and specify the JSON output and confidence fields.
Step 3: Run with langchain-core, test on edge cases, calibrate confidence, and iterate on prompts and labels.
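The three steps can be sketched end to end as follows. To stay runnable without a provider, the example takes the model as a plain callable and uses a fake one; in practice that callable might wrap a langchain-core chat model. The taxonomy, the prompt wording, and the reply shape are all assumptions.

```python
import json
from typing import Callable

# Step 1: a small multi-label taxonomy (illustrative descriptions).
TAXONOMY = {
    "greeting": "The user says hello or opens the conversation.",
    "product-question": "The user asks about a product's features or availability.",
}


def classify(text: str, llm: Callable[[str], str]) -> dict[str, float]:
    """Steps 2 and 3: build a zero-shot prompt, call the model, parse the reply."""
    prompt = (
        "Assign every label that applies to the message.\n"
        + "\n".join(f"- {name}: {desc}" for name, desc in TAXONOMY.items())
        + '\nReply with JSON like {"labels": {"<label>": <confidence 0-1>}}.\n'
        + f"Message: {text}"
    )
    reply = json.loads(llm(prompt))
    # Drop any label the model invented that is not in the taxonomy.
    return {
        label: float(score)
        for label, score in reply["labels"].items()
        if label in TAXONOMY
    }


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned reply."""
    return json.dumps(
        {"labels": {"greeting": 0.9, "product-question": 0.7, "made-up": 0.4}}
    )


result = classify("Hi, do you sell chargers?", fake_llm)
```

Filtering against the taxonomy guards downstream routing from labels the model hallucinates.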

Best Practices

Create clear, unambiguous label descriptions to improve consistency across prompts.
Use representative examples that cover common and edge-case phrases for robust few-shot prompts.
Maintain a consistent JSON output format to simplify downstream parsing and routing.
Calibrate confidence scores and set actionable thresholds for automation vs. escalation.
Test with edge cases and conflicting intents to ensure reliable disambiguation.
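Edge-case testing can be as simple as running a fixed list of tricky inputs through the classifier and collecting mismatches. The harness below is a hypothetical sketch; the keyword classifier is a deliberately naive stand-in for an LLM call so the example runs on its own.

```python
def run_edge_cases(classify, cases):
    """Return (text, expected, predicted) for every case the classifier gets wrong."""
    failures = []
    for text, expected in cases:
        predicted = set(classify(text))
        if predicted != set(expected):
            failures.append((text, sorted(expected), sorted(predicted)))
    return failures


def keyword_classifier(text):
    """Naive stand-in classifier used only to exercise the harness."""
    lowered = text.lower()
    labels = []
    if lowered.startswith(("hi", "hello")):
        labels.append("greeting")
    if "refund" in lowered:
        labels.append("returns")
    return labels


# Invented edge cases: empty input, mixed intents, shouting, misleading prefix.
EDGE_CASES = [
    ("", []),
    ("hi, I want a refund", ["greeting", "returns"]),
    ("REFUND!!!", ["returns"]),
    ("history of my order?", []),
]

failures = run_edge_cases(keyword_classifier, EDGE_CASES)
```

Here the last case fails because "history" starts with "hi", which is the kind of conflict an edge-case suite is meant to surface.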

Example Use Cases

E-commerce chatbot routing: classify each message as 'order-status', 'returns', or 'shipping-info' to direct it to the right workflow.
Multi-label utterance: an input is tagged with both 'greeting' and 'product-question' for contextual handling.
Edge-case classification: a user mentions 'refund' in a complaint; the classifier should label it with high confidence and escalate if needed.
Ensemble scenario: combine predictions from multiple prompts or models to stabilize frequent intents like 'billing' or 'tech-support'.
Confidence-calibrated routing: low-confidence intents trigger human review while high-confidence intents proceed automatically.
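Confidence-calibrated routing can be sketched as a split of per-label scores into an automated bucket and a human-review bucket. The function name and the 0.8 default threshold are assumptions for illustration; a real threshold should come from validation data.

```python
def triage(scores: dict[str, float], auto_threshold: float = 0.8) -> dict[str, list[str]]:
    """Split labels into those safe to automate and those needing review."""
    decision: dict[str, list[str]] = {"automate": [], "review": []}
    # Sort by label name so the output order is deterministic.
    for label, score in sorted(scores.items()):
        bucket = "automate" if score >= auto_threshold else "review"
        decision[bucket].append(label)
    return decision


# Invented scores for one utterance.
decision = triage({"billing": 0.92, "tech-support": 0.55})
```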
