# DSPy MIPROv2 Optimizer

```shell
npx machina-cli add skill OmidZamani/dspy-skills/dspy-miprov2-optimizer --openclaw
```
## Goal
Jointly optimize instructions and few-shot demonstrations using Bayesian Optimization for maximum performance.
## When to Use
- You have 200+ training examples
- You can afford longer optimization runs (40+ trials)
- You need state-of-the-art performance
- Both instructions and demos need tuning
## Related Skills
- For limited data (10-50 examples): dspy-bootstrap-fewshot
- For agentic systems: dspy-gepa-reflective
- Measure improvements: dspy-evaluation-suite
## Inputs

| Input | Type | Description |
|---|---|---|
| `program` | `dspy.Module` | Program to optimize |
| `trainset` | `list[dspy.Example]` | 200+ training examples |
| `metric` | `callable` | Evaluation function |
| `auto` | `str` | `"light"`, `"medium"`, or `"heavy"` |
| `num_trials` | `int` | Optimization trials (40+) |
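The `metric` input is any callable that scores a prediction against a gold example. Below is a minimal sketch of such a callable, using plain objects in place of `dspy.Example`/`dspy.Prediction` (both just need an `.answer` attribute here); the field name `answer` is an assumption matching the signatures used later in this document.

```python
from types import SimpleNamespace

def exact_match_metric(example, prediction, trace=None):
    """Return True when the predicted answer matches the gold answer.

    DSPy optimizers call metrics as metric(example, prediction, trace);
    a truthy result tells the optimizer a candidate (or bootstrapped
    demonstration) is acceptable.
    """
    return example.answer.strip().lower() == prediction.answer.strip().lower()

# Stand-ins for dspy.Example / dspy.Prediction:
gold = SimpleNamespace(answer="Paris")
pred = SimpleNamespace(answer=" paris ")
print(exact_match_metric(gold, pred))  # True
```

Any function with this shape can be passed as `metric` in the calls below; `dspy.evaluate.answer_exact_match` is a built-in with the same contract.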
## Outputs

| Output | Type | Description |
|---|---|---|
| `compiled_program` | `dspy.Module` | Fully optimized program |
## Workflow

### Three-Stage Process

1. **Bootstrap** - Generate candidate demonstrations
2. **Propose** - Create grounded instruction candidates
3. **Search** - Bayesian optimization over combinations
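The Search stage can be pictured as trying (instruction, demo set) combinations and keeping the best scorer. The toy sketch below uses random sampling and a dummy scorer purely for illustration; it is not MIPROv2's actual implementation, which replaces the sampling with a Bayesian surrogate model over trial history.

```python
import random

def toy_search(instructions, demo_sets, score_fn, num_trials=10, seed=0):
    """Toy stand-in for the Search stage: sample (instruction, demo set)
    combinations and keep the best-scoring one. MIPROv2 replaces this
    random sampling with Bayesian optimization."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(num_trials):
        combo = (rng.choice(instructions), rng.choice(demo_sets))
        s = score_fn(combo)
        if s > best_score:
            best, best_score = combo, s
    return best, best_score

# Hypothetical candidates and a dummy scorer (prefers longer combinations):
instrs = ["Answer concisely.", "Answer step by step, citing the context."]
demos = [("demo_a",), ("demo_a", "demo_b")]
best, score = toy_search(instrs, demos, lambda c: len(c[0]) + len(c[1]), num_trials=8)
```

In the real optimizer, `score_fn` is your metric evaluated over a (mini)batch of examples, which is why each trial costs real LLM calls.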
### Phase 1: Setup

```python
import dspy
from dspy.teleprompt import MIPROv2

lm = dspy.LM('openai/gpt-4o-mini')
dspy.configure(lm=lm)
```
### Phase 2: Define Program

```python
class RAGAgent(dspy.Module):
    def __init__(self):
        super().__init__()  # required for dspy.Module subclasses
        self.retrieve = dspy.Retrieve(k=3)
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.generate(context=context, question=question)
```
### Phase 3: Optimize

```python
from dspy.teleprompt import MIPROv2

optimizer = MIPROv2(
    metric=dspy.evaluate.answer_exact_match,
    auto="medium",  # balanced optimization
    num_threads=24
)
compiled = optimizer.compile(RAGAgent(), trainset=trainset)
```
## Auto Presets

| Preset | Trials | Use Case |
|---|---|---|
| `"light"` | ~10 | Quick iteration |
| `"medium"` | ~40 | Production optimization |
| `"heavy"` | ~100+ | Maximum performance |
## Production Example

```python
import dspy
from dspy.teleprompt import MIPROv2
from dspy.evaluate import Evaluate
import json
import logging

logger = logging.getLogger(__name__)

class ReActAgent(dspy.Module):
    def __init__(self, tools):
        super().__init__()  # required for dspy.Module subclasses
        self.react = dspy.ReAct("question -> answer", tools=tools)

    def forward(self, question):
        return self.react(question=question)

def search_tool(query: str) -> list[str]:
    """Search knowledge base."""
    results = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')(query, k=3)
    return [r['long_text'] for r in results]

def optimize_agent(trainset, devset):
    """Full MIPROv2 optimization pipeline."""
    agent = ReActAgent(tools=[search_tool])

    # Baseline evaluation
    evaluator = Evaluate(
        devset=devset,
        metric=dspy.evaluate.answer_exact_match,
        num_threads=8
    )
    baseline = evaluator(agent)
    logger.info(f"Baseline: {baseline:.2%}")

    # MIPROv2 optimization
    optimizer = MIPROv2(
        metric=dspy.evaluate.answer_exact_match,
        auto="medium",
        num_threads=24,
        # Custom settings
        num_candidates=15,
        max_bootstrapped_demos=4,
        max_labeled_demos=8
    )
    compiled = optimizer.compile(agent, trainset=trainset)

    optimized = evaluator(compiled)
    logger.info(f"Optimized: {optimized:.2%}")

    # Save with metadata
    compiled.save("agent_mipro.json")
    metadata = {
        "baseline_score": baseline,
        "optimized_score": optimized,
        "improvement": optimized - baseline,
        "num_train": len(trainset),
        "num_dev": len(devset)
    }
    with open("optimization_metadata.json", "w") as f:
        json.dump(metadata, f, indent=2)

    return compiled, metadata
```
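The metadata written above makes it easy to gate deployment on a measured improvement rather than promoting every compiled program. A minimal sketch, assuming the metadata dict shape produced by `optimize_agent` and a hypothetical improvement threshold:

```python
def should_deploy(metadata, min_improvement=0.02):
    """Gate deployment on optimization metadata: only promote the
    compiled program if it beat the baseline by at least
    `min_improvement` (absolute score difference)."""
    return metadata["improvement"] >= min_improvement

# Example metadata as written by optimize_agent (values are hypothetical):
meta = {"baseline_score": 0.41, "optimized_score": 0.47, "improvement": 0.06,
        "num_train": 300, "num_dev": 100}
print(should_deploy(meta))                    # True
print(should_deploy({"improvement": 0.005}))  # False
```

A threshold guards against shipping a program whose gain is within evaluation noise; pick `min_improvement` based on your dev-set size.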
## Instruction-Only Mode

```python
from dspy.teleprompt import MIPROv2

# Disable demos for pure instruction optimization
optimizer = MIPROv2(
    metric=metric,
    auto="medium",
    max_bootstrapped_demos=0,
    max_labeled_demos=0
)
```
## Best Practices

- **Data quantity matters** - 200+ examples for best results
- **Use auto presets** - Start with `"medium"`, adjust based on results
- **Parallel threads** - Use `num_threads=24` or higher if available
- **Monitor costs** - Track API usage during optimization
- **Save intermediate** - Bayesian search saves progress
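Measuring real gains also requires a held-out dev set, as in the production example above. A minimal split helper, using stand-in integers in place of `dspy.Example` objects (the 75/25 ratio and seed are arbitrary choices, not a DSPy requirement):

```python
import random

def train_dev_split(examples, dev_fraction=0.25, seed=13):
    """Shuffle once and split into train/dev so the optimization
    metric is measured on data the optimizer never saw."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - dev_fraction))
    return shuffled[:cut], shuffled[cut:]

examples = list(range(300))  # stand-ins for dspy.Example objects
trainset, devset = train_dev_split(examples)
print(len(trainset), len(devset))  # 225 75
```

Pass `trainset` to `optimizer.compile(...)` and reserve `devset` for the `Evaluate` harness.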
## Limitations
- High computational cost (many LLM calls)
- Requires substantial training data
- Optimization time: hours for "heavy" preset
- Memory intensive for large candidate sets
## Official Documentation
- DSPy Documentation: https://dspy.ai/
- DSPy GitHub: https://github.com/stanfordnlp/dspy
- MIPROv2 API: https://dspy.ai/api/optimizers/MIPROv2/
- Optimizers Guide: https://dspy.ai/learn/optimization/optimizers/
## Source

- [SKILL.md on GitHub](https://github.com/OmidZamani/dspy-skills/blob/master/skills/dspy-miprov2-optimizer/SKILL.md)

## Overview
This skill jointly optimizes DSPy instructions and few-shot demonstrations with Bayesian optimization via MIPROv2. It targets datasets of 200+ training examples and longer optimization runs to reach state-of-the-art results, producing a fully optimized compiled_program.
## How This Skill Works

It uses a three-stage workflow: Bootstrap generates candidate demonstrations, Propose creates grounded instruction candidates, and Search runs Bayesian optimization over their combinations. You configure MIPROv2 with a metric, an auto preset, and multi-threading, then call compile on your DSPy program with the provided trainset. The process yields a fully optimized dspy.Module ready for evaluation and deployment.
## When to Use It
- You have 200+ training examples and want scalable optimization.
- You can afford longer optimization runs (40+ trials) for higher performance.
- You need state-of-the-art DSPy performance on a complex task.
- Both instructions and demos require tuning for the best results.
- You want to maximize performance from a DSPy program with substantial training data.
## Quick Start

1. Set up MIPROv2 with your LM (configure your language model and dspy).
2. Define your DSPy program class (e.g., RAGAgent with retrieve and generate components).
3. Run the optimization: create MIPROv2 with a metric and an auto preset, then compile with your trainset to obtain the compiled program.
## Best Practices
- Ensure your trainset contains 200+ high-quality examples before starting.
- Choose an auto preset (light, medium, or heavy) aligned to compute budget.
- Allocate enough trials (40+ for production-grade optimization).
- Run a baseline evaluation and compare against the optimized compiled_program.
- Monitor the metric you pass to MIPROv2 (e.g., answer_exact_match) for meaningful gains.
## Example Use Cases
- Optimizing a RAGAgent by jointly tuning its retrieve+generate prompts using 200+ examples.
- Tuning a ReActAgent setup with tools and a designed demo set in production.
- Running a 40+ trial MIPROv2 optimization to maximize answer_exact_match on a QA task.
- Using auto='heavy' and 24+ threads for maximum-performance tuning on large datasets.
- Evaluating improvements with an explicit baseline versus the MIPROv2-optimized program.