# DSPy MIPROv2 Optimizer

```shell
npx machina-cli add skill OmidZamani/dspy-skills/dspy-miprov2-optimizer --openclaw
```
## Goal
Jointly optimize instructions and few-shot demonstrations using Bayesian Optimization for maximum performance.
## When to Use
- You have 200+ training examples
- You can afford longer optimization runs (40+ trials)
- You need state-of-the-art performance
- Both instructions and demos need tuning
## Related Skills
- For limited data (10-50 examples): dspy-bootstrap-fewshot
- For agentic systems: dspy-gepa-reflective
- Measure improvements: dspy-evaluation-suite
## Inputs

| Input | Type | Description |
|---|---|---|
| `program` | `dspy.Module` | Program to optimize |
| `trainset` | `list[dspy.Example]` | 200+ training examples |
| `metric` | `callable` | Evaluation function |
| `auto` | `str` | `"light"`, `"medium"`, or `"heavy"` |
| `num_trials` | `int` | Optimization trials (40+) |
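The `metric` input is any callable that scores a prediction against a gold example. Below is a minimal sketch of such a callable, using plain objects in place of `dspy.Example`/`dspy.Prediction` (both just need an `.answer` attribute here); the field name `answer` is an assumption matching the signatures used later in this document.

```python
from types import SimpleNamespace

def exact_match_metric(example, prediction, trace=None):
    """Return True when the predicted answer matches the gold answer.

    DSPy optimizers call metrics as metric(example, prediction, trace);
    a truthy result tells the optimizer a candidate (or bootstrapped
    demonstration) is acceptable.
    """
    return example.answer.strip().lower() == prediction.answer.strip().lower()

# Stand-ins for dspy.Example / dspy.Prediction:
gold = SimpleNamespace(answer="Paris")
pred = SimpleNamespace(answer=" paris ")
print(exact_match_metric(gold, pred))  # True
```

Any function with this shape can be passed as `metric` in the calls below; `dspy.evaluate.answer_exact_match` is a built-in with the same contract.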
## Outputs

| Output | Type | Description |
|---|---|---|
| `compiled_program` | `dspy.Module` | Fully optimized program |
## Workflow

### Three-Stage Process

1. **Bootstrap** - Generate candidate demonstrations
2. **Propose** - Create grounded instruction candidates
3. **Search** - Bayesian optimization over combinations
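The Search stage can be pictured as trying (instruction, demo set) combinations and keeping the best scorer. The toy sketch below uses random sampling and a dummy scorer purely for illustration; it is not MIPROv2's actual implementation, which replaces the sampling with a Bayesian surrogate model over trial history.

```python
import random

def toy_search(instructions, demo_sets, score_fn, num_trials=10, seed=0):
    """Toy stand-in for the Search stage: sample (instruction, demo set)
    combinations and keep the best-scoring one. MIPROv2 replaces this
    random sampling with Bayesian optimization."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(num_trials):
        combo = (rng.choice(instructions), rng.choice(demo_sets))
        s = score_fn(combo)
        if s > best_score:
            best, best_score = combo, s
    return best, best_score

# Hypothetical candidates and a dummy scorer (prefers longer combinations):
instrs = ["Answer concisely.", "Answer step by step, citing the context."]
demos = [("demo_a",), ("demo_a", "demo_b")]
best, score = toy_search(instrs, demos, lambda c: len(c[0]) + len(c[1]), num_trials=8)
```

In the real optimizer, `score_fn` is your metric evaluated over a (mini)batch of examples, which is why each trial costs real LLM calls.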
### Phase 1: Setup

```python
import dspy
from dspy.teleprompt import MIPROv2

lm = dspy.LM('openai/gpt-4o-mini')
dspy.configure(lm=lm)
```
### Phase 2: Define Program

```python
class RAGAgent(dspy.Module):
    def __init__(self):
        super().__init__()  # required for dspy.Module subclasses
        self.retrieve = dspy.Retrieve(k=3)
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.generate(context=context, question=question)
```
### Phase 3: Optimize

```python
from dspy.teleprompt import MIPROv2

optimizer = MIPROv2(
    metric=dspy.evaluate.answer_exact_match,
    auto="medium",  # balanced optimization
    num_threads=24
)
compiled = optimizer.compile(RAGAgent(), trainset=trainset)
```
## Auto Presets

| Preset | Trials | Use Case |
|---|---|---|
| `"light"` | ~10 | Quick iteration |
| `"medium"` | ~40 | Production optimization |
| `"heavy"` | ~100+ | Maximum performance |
## Production Example

```python
import dspy
from dspy.teleprompt import MIPROv2
from dspy.evaluate import Evaluate
import json
import logging

logger = logging.getLogger(__name__)

class ReActAgent(dspy.Module):
    def __init__(self, tools):
        super().__init__()  # required for dspy.Module subclasses
        self.react = dspy.ReAct("question -> answer", tools=tools)

    def forward(self, question):
        return self.react(question=question)

def search_tool(query: str) -> list[str]:
    """Search knowledge base."""
    results = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')(query, k=3)
    return [r['long_text'] for r in results]

def optimize_agent(trainset, devset):
    """Full MIPROv2 optimization pipeline."""
    agent = ReActAgent(tools=[search_tool])

    # Baseline evaluation
    evaluator = Evaluate(
        devset=devset,
        metric=dspy.evaluate.answer_exact_match,
        num_threads=8
    )
    baseline = evaluator(agent)
    logger.info(f"Baseline: {baseline:.2%}")

    # MIPROv2 optimization
    optimizer = MIPROv2(
        metric=dspy.evaluate.answer_exact_match,
        auto="medium",
        num_threads=24,
        # Custom settings
        num_candidates=15,
        max_bootstrapped_demos=4,
        max_labeled_demos=8
    )
    compiled = optimizer.compile(agent, trainset=trainset)

    optimized = evaluator(compiled)
    logger.info(f"Optimized: {optimized:.2%}")

    # Save with metadata
    compiled.save("agent_mipro.json")
    metadata = {
        "baseline_score": baseline,
        "optimized_score": optimized,
        "improvement": optimized - baseline,
        "num_train": len(trainset),
        "num_dev": len(devset)
    }
    with open("optimization_metadata.json", "w") as f:
        json.dump(metadata, f, indent=2)

    return compiled, metadata
```
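The metadata written above makes it easy to gate deployment on a measured improvement rather than promoting every compiled program. A minimal sketch, assuming the metadata dict shape produced by `optimize_agent` and a hypothetical improvement threshold:

```python
def should_deploy(metadata, min_improvement=0.02):
    """Gate deployment on optimization metadata: only promote the
    compiled program if it beat the baseline by at least
    `min_improvement` (absolute score difference)."""
    return metadata["improvement"] >= min_improvement

# Example metadata as written by optimize_agent (values are hypothetical):
meta = {"baseline_score": 0.41, "optimized_score": 0.47, "improvement": 0.06,
        "num_train": 300, "num_dev": 100}
print(should_deploy(meta))                    # True
print(should_deploy({"improvement": 0.005}))  # False
```

A threshold guards against shipping a program whose gain is within evaluation noise; pick `min_improvement` based on your dev-set size.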
## Instruction-Only Mode

```python
from dspy.teleprompt import MIPROv2

# Disable demos for pure instruction optimization
optimizer = MIPROv2(
    metric=metric,
    auto="medium",
    max_bootstrapped_demos=0,
    max_labeled_demos=0
)
```
## Best Practices

- **Data quantity matters** - 200+ examples for best results
- **Use auto presets** - Start with `"medium"`, adjust based on results
- **Parallel threads** - Use `num_threads=24` or higher if available
- **Monitor costs** - Track API usage during optimization
- **Save intermediate** - Bayesian search saves progress
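Measuring real gains also requires a held-out dev set, as in the production example above. A minimal split helper, using stand-in integers in place of `dspy.Example` objects (the 75/25 ratio and seed are arbitrary choices, not a DSPy requirement):

```python
import random

def train_dev_split(examples, dev_fraction=0.25, seed=13):
    """Shuffle once and split into train/dev so the optimization
    metric is measured on data the optimizer never saw."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - dev_fraction))
    return shuffled[:cut], shuffled[cut:]

examples = list(range(300))  # stand-ins for dspy.Example objects
trainset, devset = train_dev_split(examples)
print(len(trainset), len(devset))  # 225 75
```

Pass `trainset` to `optimizer.compile(...)` and reserve `devset` for the `Evaluate` harness.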
## Limitations
- High computational cost (many LLM calls)
- Requires substantial training data
- Optimization time: hours for "heavy" preset
- Memory intensive for large candidate sets
## Official Documentation
- DSPy Documentation: https://dspy.ai/
- DSPy GitHub: https://github.com/stanfordnlp/dspy
- MIPROv2 API: https://dspy.ai/api/optimizers/MIPROv2/
- Optimizers Guide: https://dspy.ai/learn/optimization/optimizers/
## Source

- [SKILL.md on GitHub](https://github.com/OmidZamani/dspy-skills/blob/master/skills/dspy-miprov2-optimizer/SKILL.md)

## Overview
This skill jointly optimizes DSPy instructions and few-shot demonstrations with Bayesian optimization via MIPROv2. It targets datasets of 200+ training examples and longer optimization runs to reach state-of-the-art results, producing a fully optimized compiled_program.
## How This Skill Works

It uses a three-stage workflow: Bootstrap generates candidate demonstrations, Propose creates grounded instruction candidates, and Search runs Bayesian optimization over their combinations. You configure MIPROv2 with a metric, an auto preset, and multi-threading, then call compile on your DSPy program with the provided trainset. The process yields a fully optimized dspy.Module ready for evaluation and deployment.
## When to Use It
- You have 200+ training examples and want scalable optimization.
- You can afford longer optimization runs (40+ trials) for higher performance.
- You need state-of-the-art DSPy performance on a complex task.
- Both instructions and demos require tuning for the best results.
- You want to maximize performance from a DSPy program with substantial training data.
## Quick Start

1. Set up MIPROv2 with your LM (configure your language model and dspy).
2. Define your DSPy program class (e.g., RAGAgent with retrieve and generate components).
3. Run the optimization: create MIPROv2 with a metric and an auto preset, then compile with your trainset to obtain the compiled program.
## Best Practices
- Ensure your trainset contains 200+ high-quality examples before starting.
- Choose an auto preset (light, medium, or heavy) aligned to compute budget.
- Allocate enough trials (40+ for production-grade optimization).
- Run a baseline evaluation and compare against the optimized compiled_program.
- Monitor the metric you pass to MIPROv2 (e.g., answer_exact_match) for meaningful gains.
## Example Use Cases
- Optimizing a RAGAgent by jointly tuning its retrieve+generate prompts using 200+ examples.
- Tuning a ReActAgent setup with tools and a designed demo set in production.
- Running a 40+ trial MIPROv2 optimization to maximize answer_exact_match on a QA task.
- Using auto='heavy' and 24+ threads for maximum-performance tuning on large datasets.
- Evaluating improvements with an explicit baseline versus the MIPROv2-optimized program.