What is BootstrapFinetune?

An experimental DSPy optimizer that distills a teacher program into fine-tuned student weights for production efficiency.

What inputs are required?

The teacher program (dspy.Module), a training set (list[dspy.Example]), an optional metric (callable), and training hyperparameters (train_kwargs).

How do I deploy the finetuned model?

Save the finetuned model using finetuned.save and then load it by recreating the architecture (e.g., loaded = TeacherQA(); loaded.load(path)); run inference with loaded.

dspy-finetune-bootstrap

Flagged

{"isSafe":false,"isSuspicious":true,"riskLevel":"medium","findings":[{"category":"suspicious_url","severity":"medium","description":"Uses an IP-based external retriever endpoint over HTTP (http://20.102.90.50:2017/wiki17_abstracts). Non-HTTPS and uses an IP address, which can be vulnerable to MITM and may indicate external data retrieval. Review legitimacy, data handling, and access controls.","evidence":"colbert = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')"}],"summary":"The content contains a suspicious external URL: an IP-based HTTP endpoint used for a retriever. No explicit malicious commands or payloads detected, but the external endpoint warrants review for security and data privacy considerations."}

npx machina-cli add skill OmidZamani/dspy-skills/dspy-finetune-bootstrap --openclaw

Files (1)

SKILL.md

7.2 KB

DSPy BootstrapFinetune Optimizer

Goal

Distill a DSPy program into fine-tuned model weights for efficient production deployment.

When to Use

You have a working DSPy program with a large model
Need to reduce inference costs
Want faster responses (smaller model)
Deploying to resource-constrained environments

Inputs

Input	Type	Description
`program`	`dspy.Module`	Teacher program to distill
`trainset`	`list[dspy.Example]`	Training examples
`metric`	`callable`	Validation metric (optional)
`train_kwargs`	`dict`	Training hyperparameters

Outputs

Output	Type	Description
`finetuned_program`	`dspy.Module`	Program with fine-tuned weights
`model_path`	`str`	Path to saved model

Workflow

Phase 1: Prepare Teacher Program

import dspy

# Configure with strong teacher model
dspy.configure(lm=dspy.LM("openai/gpt-4o"))

class TeacherQA(dspy.Module):
    def __init__(self):
        self.cot = dspy.ChainOfThought("question -> answer")
    
    def forward(self, question):
        return self.cot(question=question)

Phase 2: Enable Experimental Features & Generate Training Traces

BootstrapFinetune is experimental and requires enabling the flag:

import dspy
from dspy.teleprompt import BootstrapFinetune

# Enable experimental features
dspy.settings.experimental = True

optimizer = BootstrapFinetune(
    metric=lambda gold, pred, trace=None: gold.answer.lower() in pred.answer.lower(),
    train_kwargs={
        'learning_rate': 5e-5,
        'num_train_epochs': 3,
        'per_device_train_batch_size': 4,
        'warmup_ratio': 0.1
    }
)

Phase 3: Fine-tune Student Model

finetuned = optimizer.compile(
    TeacherQA(),
    trainset=trainset
)

Phase 4: Deploy

# Save the fine-tuned model (saves state-only by default)
finetuned.save("finetuned_qa_model.json")

# Load and use (must recreate architecture first)
loaded = TeacherQA()
loaded.load("finetuned_qa_model.json")
result = loaded(question="What is machine learning?")

Production Example

import dspy
from dspy.teleprompt import BootstrapFinetune
from dspy.evaluate import Evaluate
import logging
import os

# Enable experimental features
dspy.settings.experimental = True

logger = logging.getLogger(__name__)

class ClassificationSignature(dspy.Signature):
    """Classify text into categories."""
    text: str = dspy.InputField()
    label: str = dspy.OutputField(desc="Category: positive, negative, neutral")

class TextClassifier(dspy.Module):
    def __init__(self):
        self.classify = dspy.Predict(ClassificationSignature)
    
    def forward(self, text):
        return self.classify(text=text)

def classification_metric(gold, pred, trace=None):
    """Exact label match."""
    gold_label = gold.label.lower().strip()
    pred_label = pred.label.lower().strip() if pred.label else ""
    return gold_label == pred_label

def finetune_classifier(trainset, devset, output_dir="./finetuned_model"):
    """Full fine-tuning pipeline."""
    
    # Configure teacher (strong model)
    dspy.configure(lm=dspy.LM("openai/gpt-4o"))
    
    teacher = TextClassifier()
    
    # Evaluate teacher
    evaluator = Evaluate(devset=devset, metric=classification_metric, num_threads=8)
    teacher_score = evaluator(teacher)
    logger.info(f"Teacher score: {teacher_score:.2%}")

    # Fine-tune (train_kwargs passed to constructor)
    optimizer = BootstrapFinetune(
        metric=classification_metric,
        train_kwargs={
            'learning_rate': 2e-5,
            'num_train_epochs': 3,
            'per_device_train_batch_size': 8,
            'gradient_accumulation_steps': 2,
            'warmup_ratio': 0.1,
            'weight_decay': 0.01,
            'logging_steps': 10,
            'save_strategy': 'epoch',
            'output_dir': output_dir
        }
    )

    finetuned = optimizer.compile(
        teacher,
        trainset=trainset
    )
    
    # Evaluate fine-tuned model
    student_score = evaluator(finetuned)
    logger.info(f"Student score: {student_score:.2%}")

    # Save (state-only as JSON)
    finetuned.save(os.path.join(output_dir, "final_model.json"))

    return {
        "teacher_score": teacher_score,
        "student_score": student_score,
        "model_path": os.path.join(output_dir, "final_model.json")
    }

# For RAG fine-tuning
class RAGClassifier(dspy.Module):
    """RAG pipeline that can be fine-tuned."""
    
    def __init__(self, num_passages=3):
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.classify = dspy.ChainOfThought("context, text -> label")
    
    def forward(self, text):
        context = self.retrieve(text).passages
        return self.classify(context=context, text=text)

def finetune_rag_classifier(trainset, devset):
    """Fine-tune a RAG-based classifier."""

    # Configure retriever and LM
    colbert = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
    dspy.configure(
        lm=dspy.LM("openai/gpt-4o"),
        rm=colbert
    )

    rag = RAGClassifier()

    # Fine-tune (train_kwargs in constructor)
    optimizer = BootstrapFinetune(
        metric=classification_metric,
        train_kwargs={
            'learning_rate': 1e-5,
            'num_train_epochs': 5
        }
    )

    finetuned = optimizer.compile(
        rag,
        trainset=trainset
    )

    return finetuned

Training Arguments Reference

Argument	Description	Typical Value
`learning_rate`	Learning rate	1e-5 to 5e-5
`num_train_epochs`	Training epochs	3-5
`per_device_train_batch_size`	Batch size	4-16
`gradient_accumulation_steps`	Gradient accumulation	2-8
`warmup_ratio`	Warmup proportion	0.1
`weight_decay`	L2 regularization	0.01
`max_grad_norm`	Gradient clipping	1.0

Best Practices

Strong teacher - Use GPT-4 or Claude as teacher
Quality data - Teacher traces are only as good as training examples
Validate improvement - Compare student to teacher on held-out set
Start with more epochs - Fine-tuning often needs 3-5 epochs
Monitor overfitting - Track validation loss during training

Limitations

Requires access to model weights (not API-only models)
Training requires GPU resources
Student may not match teacher quality on all inputs
Fine-tuning takes hours/days depending on data size
Model size reduction may cause capability loss

Official Documentation

DSPy Documentation: https://dspy.ai/
DSPy GitHub: https://github.com/stanfordnlp/dspy
BootstrapFinetune API: https://dspy.ai/api/optimizers/BootstrapFinetune/
Fine-tuning Guide: https://dspy.ai/tutorials/classification_finetuning/

Source

git clone https://github.com/OmidZamani/dspy-skills/blob/master/skills/dspy-finetune-bootstrap/SKILL.mdView on GitHub

Overview

This skill distills a DSPy program into fine-tuned weights using BootstrapFinetune. It enables production-ready, smaller models with lower inference costs while preserving performance through teacher-student training and distillation.

How This Skill Works

Provide a teacher DSPy module and a trainset, then run the BootstrapFinetune optimizer with a validation metric and train hyperparameters. The process compiles a student model, returning finetuned_program and model_path for deployment.

When to Use It

You have a DSPy program with a large model and want a lighter student version
You need to reduce inference costs without retraining from scratch
You want faster responses from a smaller model in production
Deploying to resource-constrained environments such as edge or mobile
You aim to apply model distillation / teacher-student training to a DSPy program

Quick Start

Step 1: Prepare a teacher DSPy program and enable experimental BootstrapFinetune features (set dspy.settings.experimental = True)
Step 2: Instantiate BootstrapFinetune with a metric and train_kwargs tuned for your data
Step 3: Compile the finetuned model with Teacher and trainset (finetuned = optimizer.compile(TeacherQA(), trainset=trainset)); finetuned.save(finetuned_model.json)

Best Practices

Use a strong teacher model (e.g., a capable LM) and ensure task-aligned training traces
Enable experimental BootstrapFinetune features by setting dspy.settings.experimental = True
Tune train_kwargs (learning_rate, num_train_epochs, per_device_train_batch_size, warmup_ratio) to your data
Provide a meaningful validation metric to guide distillation and monitor overfitting
Save the finetuned model to a clear path and validate loading/deployment workflows

Example Use Cases

Distill a DSPy QA classifier from a large teacher to a compact student for on-device QA
Distill a text classification DSPy module to reduce latency in production
Benchmark speedups and accuracy between teacher and finetuned student
Save finetuned model as a deployment-ready file and load it for inference
Use Evaluate to compare teacher versus finetuned student performance

Frequently Asked Questions

Add this skill to your agents