What should I configure first?

Provide a trainset of 10-50 labeled examples, a validation metric, and reasonable max_bootstrapped_demos and max_labeled_demos; optionally set metric_threshold and teacher_settings.

How do I deploy the optimized program?

Call optimizer.compile to get compiled_qa, then call compiled_qa.save to persist the optimized demos for production.

dspy-bootstrap-fewshot

npx machina-cli add skill OmidZamani/dspy-skills/dspy-bootstrap-fewshot --openclaw

DSPy Bootstrap Few-Shot Optimizer

Goal

Automatically generate and select optimal few-shot demonstrations for your DSPy program using a teacher model.

When to Use

You have 10-50 labeled examples
Manual example selection is tedious or suboptimal
You want demonstrations with reasoning traces
You need quick optimization without extensive compute

Related Skills

For more data (200+ examples): dspy-miprov2-optimizer
For agentic systems: dspy-gepa-reflective
Measure improvements: dspy-evaluation-suite

Inputs

Input	Type	Description
`program`	`dspy.Module`	Your DSPy program to optimize
`trainset`	`list[dspy.Example]`	Training examples
`metric`	`callable`	Evaluation function
`metric_threshold`	`float`	Numerical threshold for accepting demos (optional)
`max_bootstrapped_demos`	`int`	Max teacher-generated demos (default: 4)
`max_labeled_demos`	`int`	Max direct labeled demos (default: 16)
`max_rounds`	`int`	Max bootstrapping attempts per example (default: 1)
`teacher_settings`	`dict`	Configuration for teacher model (optional)

Outputs

Output	Type	Description
`compiled_program`	`dspy.Module`	Optimized program with demos

Workflow

Phase 1: Setup

import dspy
from dspy.teleprompt import BootstrapFewShot

# Configure LMs
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

Phase 2: Define Program and Metric

class QA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought("question -> answer")
    
    def forward(self, question):
        return self.generate(question=question)

def validate_answer(example, pred, trace=None):
    # Case-insensitive substring match: the gold answer must appear in the prediction
    return example.answer.lower() in pred.answer.lower()
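
To see how this metric behaves before handing it to the optimizer, it can be called on stand-in objects. The sketch below is an illustration only: it uses types.SimpleNamespace in place of real dspy.Example and dspy.Prediction objects (an assumption, so that it runs without an LM), and the sample answers are made up.

```python
from types import SimpleNamespace

def validate_answer(example, pred, trace=None):
    # Same check as above: the gold answer must appear inside the prediction.
    return example.answer.lower() in pred.answer.lower()

# Stand-ins for dspy.Example / dspy.Prediction (hypothetical sample data).
gold = SimpleNamespace(answer="Paris")
good = SimpleNamespace(answer="The capital of France is Paris.")
bad = SimpleNamespace(answer="The capital of France is Lyon.")

hit = validate_answer(gold, good)
miss = validate_answer(gold, bad)

# Substring matching is lenient: a short gold answer also matches
# when it is only part of a longer token in the prediction.
short_gold = SimpleNamespace(answer="4")
lenient = validate_answer(short_gold, SimpleNamespace(answer="The result is 42."))

print(hit, miss, lenient)  # prints: True False True
```

If that leniency matters for your task, a stricter comparison (for example exact match after normalization) may be a better fit.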

Phase 3: Compile

optimizer = BootstrapFewShot(
    metric=validate_answer,
    max_bootstrapped_demos=4,
    max_labeled_demos=4,
    teacher_settings={'lm': dspy.LM("openai/gpt-4o")}
)

# trainset must be a list of dspy.Example objects with inputs marked via with_inputs(...)
compiled_qa = optimizer.compile(QA(), trainset=trainset)
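
The compile call above expects trainset to already exist. Here is a minimal sketch of building one, assuming question/answer pairs like the hypothetical ones below. Each pair becomes a dspy.Example, and with_inputs("question") marks which field is the input so that the remaining field is treated as the label. The names labeled_pairs and build_trainset are invented for this illustration.

```python
# Hypothetical labeled data; replace with your own 10-50 examples.
labeled_pairs = [
    ("What is the capital of France?", "Paris"),
    ("How many legs does a spider have?", "8"),
    ("Which gas do plants absorb during photosynthesis?", "Carbon dioxide"),
]

def build_trainset(pairs):
    """Convert (question, answer) pairs into DSPy examples."""
    # Imported inside the function so the data above can be checked
    # without DSPy installed.
    import dspy

    return [
        dspy.Example(question=question, answer=answer).with_inputs("question")
        for question, answer in pairs
    ]
```

The result of build_trainset(labeled_pairs) can then be passed as trainset to optimizer.compile.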

Phase 4: Use and Save

# Use optimized program
result = compiled_qa(question="What is photosynthesis?")

# Save for production (state-only, recommended)
compiled_qa.save("qa_optimized.json", save_program=False)
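
A state-only save stores the optimized demos but not the program's class, so loading it back means recreating the program and then calling its load method. The helper below is a small sketch of that pattern. The names load_optimized and program_factory are invented here, and the sketch assumes the factory returns an object with a load(path) method, as dspy.Module does.

```python
def load_optimized(program_factory, path="qa_optimized.json"):
    """Recreate a program and restore its saved state from `path`."""
    program = program_factory()  # e.g. the QA class defined in Phase 2
    program.load(path)           # restores the demos written by save(...)
    return program
```

In a serving process this might be called once at startup, for example qa = load_optimized(QA), after dspy.configure has set the LM.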

Production Example

import dspy
from dspy.teleprompt import BootstrapFewShot
from dspy.evaluate import Evaluate
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class ProductionQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.cot = dspy.ChainOfThought("question -> answer")
    
    def forward(self, question: str):
        try:
            return self.cot(question=question)
        except Exception as e:
            logger.error(f"Generation failed: {e}")
            return dspy.Prediction(answer="Unable to answer")

def robust_metric(example, pred, trace=None):
    if not pred.answer or pred.answer == "Unable to answer":
        return 0.0
    return float(example.answer.lower() in pred.answer.lower())

def optimize_with_bootstrap(trainset, devset):
    """Full optimization pipeline with validation."""
    
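    # Baseline
    baseline = ProductionQA()
    evaluator = Evaluate(devset=devset, metric=robust_metric, num_threads=4)
    # Evaluate reports a percentage (0-100). Depending on the DSPy version it
    # is a plain number or a result object with a .score attribute, so unwrap it.
    baseline_result = evaluator(baseline)
    baseline_score = getattr(baseline_result, "score", baseline_result)
    logger.info(f"Baseline: {baseline_score:.2f}%")
    
    # Optimize
    optimizer = BootstrapFewShot(
        metric=robust_metric,
        max_bootstrapped_demos=4,
        max_labeled_demos=4
    )
    
    compiled = optimizer.compile(baseline, trainset=trainset)
    optimized_result = evaluator(compiled)
    optimized_score = getattr(optimized_result, "score", optimized_result)
    logger.info(f"Optimized: {optimized_score:.2f}%")
    
    # Keep the compiled program only if it beats the baseline on the dev set
    if optimized_score > baseline_score:
        compiled.save("production_qa.json", save_program=False)
        return compiled
    
    logger.warning("Optimization didn't improve; keeping baseline")
    return baseline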

Best Practices

Quality over quantity - 10 excellent examples beat 100 noisy ones
Use a stronger teacher - e.g., GPT-4 as the teacher for a GPT-3.5 student
Validate with held-out set - Always test on unseen data
Start with 4 demos - More isn't always better
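
One way to keep a held-out set is to shuffle the labeled examples once with a fixed seed and carve off a slice for evaluation. This is a sketch using only the standard library. The function name split_examples and the 70/30 ratio are assumptions, not DSPy defaults, and range(20) stands in for real examples.

```python
import random

def split_examples(examples, dev_fraction=0.3, seed=0):
    """Shuffle a copy of `examples` and return (trainset, devset)."""
    if not 0.0 < dev_fraction < 1.0:
        raise ValueError("dev_fraction must be between 0 and 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    dev_size = max(1, round(len(shuffled) * dev_fraction))
    return shuffled[dev_size:], shuffled[:dev_size]

# Placeholder items; in practice these would be dspy.Example objects.
trainset, devset = split_examples(range(20))
```

The resulting trainset and devset could then be passed to optimize_with_bootstrap from the production example.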

Limitations

Requires labeled training data
Teacher model costs can add up
May not generalize to very different inputs
Limited exploration compared to MIPROv2

Official Documentation

DSPy Documentation: https://dspy.ai/
DSPy GitHub: https://github.com/stanfordnlp/dspy
BootstrapFewShot API: https://dspy.ai/api/optimizers/BootstrapFewShot/
Optimization Guide: https://dspy.ai/learn/optimization/optimizers/

Source

https://github.com/OmidZamani/dspy-skills/blob/master/skills/dspy-bootstrap-fewshot/SKILL.md

Overview

This DSPy skill automatically generates and selects few-shot demonstrations for your DSPy program using a teacher model. It is intended for small labeled sets (10-50 examples) and produces demonstrations with reasoning traces without requiring extensive compute.

How This Skill Works

Configure an LM, define your DSPy program and a validation metric, then call compile on a BootstrapFewShot optimizer with your trainset. The teacher generates candidate demos, those that pass the metric are kept, and compile returns an optimized program (compiled_program in the Outputs table) that you can call directly or save.
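
The bootstrapping idea can be illustrated without DSPy. The following is a simplified sketch, not BootstrapFewShot's actual implementation: teacher, metric, and the dictionary-based examples are stand-ins invented for the illustration. It runs the teacher on each training example, keeps a demo only when the metric accepts the teacher's output, and stops once max_bootstrapped_demos demos have been collected.

```python
def bootstrap_demos(trainset, teacher, metric, max_bootstrapped_demos=4,
                    metric_threshold=None):
    """Collect teacher outputs that the metric accepts, up to a cap."""
    demos = []
    for example in trainset:
        if len(demos) >= max_bootstrapped_demos:
            break
        prediction = teacher(example["question"])
        score = metric(example, prediction)
        # With a threshold, a numeric score must reach it;
        # otherwise any truthy score passes.
        if metric_threshold is not None:
            accepted = score >= metric_threshold
        else:
            accepted = bool(score)
        if accepted:
            demos.append({"question": example["question"], **prediction})
    return demos

# Toy stand-ins: a "teacher" that only knows one answer, and a substring metric.
def toy_teacher(question):
    known = {"Capital of France?": "Paris"}
    return {"reasoning": "looked it up", "answer": known.get(question, "unknown")}

def toy_metric(example, prediction):
    return example["answer"].lower() in prediction["answer"].lower()

toy_trainset = [
    {"question": "Capital of France?", "answer": "Paris"},
    {"question": "Capital of Peru?", "answer": "Lima"},
]
demos = bootstrap_demos(toy_trainset, toy_teacher, toy_metric)
```

Here only the first example yields a demo, because the toy teacher's answer to the second question fails the metric. The kept demo carries the teacher's reasoning field along with the answer, which is the sense in which bootstrapped demonstrations include reasoning traces.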

When to Use It

You have 10-50 labeled examples
Manual example selection is tedious or suboptimal
You want demonstrations with reasoning traces
You need quick optimization without extensive compute
You need automatic demonstration generation for a DSPy program

Quick Start

Step 1: Configure LM and prepare trainset, metric, and options
Step 2: Instantiate BootstrapFewShot with metric and demo limits
Step 3: Call compile to get compiled_qa and save if needed

Best Practices

Use a metric that aligns with final task success and set metric_threshold if appropriate
Start with conservative counts (max_bootstrapped_demos and max_labeled_demos) and adjust based on results
Ensure trainset covers diverse cases to avoid bias in demos
Build the program from dspy.ChainOfThought modules when you want reasoning traces in the demonstrations, for example when interpretability matters
Validate improvements with a separate devset or evaluation suite

Example Use Cases

QA DSPy program improved by bootstrapping 4 demos to boost accuracy
Code-generation DSPy workflow enhanced with reasoning-trace demonstrations
Visual QA tasks optimized with 10-50 labeled examples and bootstrapped demos
Production pipeline uses a robust_metric and saves the optimized program
Teacher model configured via teacher_settings to balance latency and quality
