dspy-haystack-integration

Install:

npx machina-cli add skill OmidZamani/dspy-skills/dspy-haystack-integration --openclaw

DSPy + Haystack Integration

Goal

Use DSPy's optimization capabilities to automatically improve prompts in Haystack pipelines.

When to Use

  • You have existing Haystack pipelines
  • Manual prompt tuning is tedious
  • Need data-driven prompt optimization
  • Want to combine Haystack components with DSPy optimization

Inputs

Input               Type                  Description
haystack_pipeline   Pipeline              Existing Haystack pipeline
trainset            list[dspy.Example]    Training examples
metric              callable              Evaluation function

Outputs

Output               Type        Description
optimized_prompt     str         DSPy-optimized prompt
optimized_pipeline   Pipeline    Updated Haystack pipeline

Workflow

Phase 1: Build Initial Haystack Pipeline

from haystack import Pipeline
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders import PromptBuilder
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Set up document store (documents: a list of Haystack Document objects
# prepared beforehand)
doc_store = InMemoryDocumentStore()
doc_store.write_documents(documents)

# Initial generic prompt (Jinja template; the retriever supplies `documents`)
initial_prompt = """
Answer the question using the context below.

Context:
{% for document in documents %}
{{ document.content }}
{% endfor %}

Question: {{question}}
Answer:
"""

# Build pipeline
pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=doc_store))
pipeline.add_component("prompt_builder", PromptBuilder(template=initial_prompt))
pipeline.add_component("generator", OpenAIGenerator(model="gpt-4o-mini"))

pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "generator.prompt")

Phase 2: Create DSPy RAG Module

import dspy

class HaystackRAG(dspy.Module):
    """DSPy module wrapping Haystack retriever."""
    
    def __init__(self, retriever, k=3):
        super().__init__()
        self.retriever = retriever
        self.k = k
        self.generate = dspy.ChainOfThought("context, question -> answer")
    
    def forward(self, question):
        # Use Haystack retriever
        results = self.retriever.run(query=question, top_k=self.k)
        context = [doc.content for doc in results['documents']]
        
        # Use DSPy for generation
        pred = self.generate(context=context, question=question)
        return dspy.Prediction(context=context, answer=pred.answer)

Phase 3: Define Custom Metric

from haystack.components.evaluators import SASEvaluator

# Haystack semantic answer similarity (SAS) evaluator
sas_evaluator = SASEvaluator(model="sentence-transformers/all-MiniLM-L6-v2")
sas_evaluator.warm_up()  # load the embedding model before the first run

def mixed_metric(example, pred, trace=None):
    """Combine semantic accuracy with conciseness."""
    
    # Semantic similarity (Haystack SAS)
    sas_result = sas_evaluator.run(
        ground_truth_answers=[example.answer],
        predicted_answers=[pred.answer]
    )
    semantic_score = sas_result['score']
    
    # Conciseness penalty
    word_count = len(pred.answer.split())
    conciseness = 1.0 if word_count <= 20 else max(0, 1 - (word_count - 20) / 50)
    
    return 0.7 * semantic_score + 0.3 * conciseness
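
To see how the conciseness term behaves, here is the penalty factored out as a stand-alone function (same formula as above: full credit up to 20 words, then linear decay reaching zero at 70 words):

```python
def conciseness(word_count):
    # Full credit up to 20 words, then linear decay reaching 0 at 70 words.
    return 1.0 if word_count <= 20 else max(0, 1 - (word_count - 20) / 50)

# conciseness(10) == 1.0, conciseness(45) == 0.5, conciseness(70) == 0.0
```

Tune the 20-word threshold and 50-word decay window to your domain; legal or academic answers may warrant a longer budget.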

Phase 4: Optimize with DSPy

from dspy.teleprompt import BootstrapFewShot

lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# Create DSPy module with Haystack retriever
rag_module = HaystackRAG(retriever=pipeline.get_component("retriever"))

# Optimize
optimizer = BootstrapFewShot(
    metric=mixed_metric,
    max_bootstrapped_demos=4,
    max_labeled_demos=4
)

compiled = optimizer.compile(rag_module, trainset=trainset)

Phase 5: Extract and Apply Optimized Prompt

After optimization, extract the optimized prompt and apply it to your Haystack pipeline.

See Prompt Extraction Guide for detailed steps on:

  • Extracting prompts from compiled DSPy modules
  • Mapping DSPy demos to Haystack templates
  • Building optimized Haystack pipelines
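
As a minimal sketch of the demo-to-template mapping, bootstrapped demos can be rendered as in-prompt worked examples ahead of the base template. The helper name and the demo dict fields below are illustrative, not part of the skill's API; with real DSPy modules the demos live on the compiled predictor, and the exact attribute path varies by DSPy version.

```python
# Hedged sketch: render bootstrapped few-shot demos into a Haystack-style
# prompt template. `demos` is assumed to be a list of dicts with
# "question"/"answer" keys extracted from the compiled DSPy module.
def demos_to_template(demos, base_template):
    """Prepend each demo as an in-prompt worked example."""
    shots = [
        f"Question: {d['question']}\nAnswer: {d['answer']}"
        for d in demos
    ]
    return "\n\n".join(shots + [base_template])

demos = [
    {"question": "What is DSPy?", "answer": "A framework for programming LMs."},
]
base = "Context: {{context}}\nQuestion: {{question}}\nAnswer:"
optimized_template = demos_to_template(demos, base)
```

The resulting string can be passed straight to PromptBuilder(template=...) when rebuilding the Haystack pipeline.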

Production Example

For a complete production-ready implementation, see HaystackDSPyOptimizer.

This class provides:

  • Wrapper for Haystack retrievers in DSPy modules
  • Automatic optimization with BootstrapFewShot
  • Prompt extraction and Haystack pipeline rebuilding
  • Complete usage example with document store setup

Best Practices

  1. Match retrievers - Use the same retriever instance in the DSPy module as in the Haystack pipeline
  2. Custom metrics - Combine Haystack evaluators with DSPy optimization
  3. Prompt extraction - Carefully map DSPy demos to the Haystack template format
  4. Test both - Validate both the DSPy module and the final Haystack pipeline

Limitations

  • Prompt template conversion can be tricky
  • Some Haystack features don't map directly to DSPy
  • Requires maintaining two codebases initially
  • Complex pipelines may need custom integration

Source

git clone https://github.com/OmidZamani/dspy-skills

The skill lives at skills/dspy-haystack-integration/SKILL.md in that repository.

Overview

This skill integrates DSPy with Haystack to automatically improve prompts and optimize Haystack pipelines. It takes an existing Haystack pipeline, a training set of DSPy examples, and a scoring metric to produce an optimized prompt and an updated pipeline, enabling data-driven, concise prompts within RAG workflows.

How This Skill Works

Start with an existing Haystack pipeline and define a training set and metric. Build a DSPy module that wraps the Haystack retriever and generates answers, then define a custom metric that combines semantic quality with conciseness. Run DSPy optimization to produce an optimized_prompt and an updated optimized_pipeline that blends Haystack components with DSPy-generated prompts.

When to Use It

  • You have existing Haystack pipelines and want to improve prompts end-to-end.
  • Manual prompt tuning is tedious, error-prone, or inconsistent.
  • You want data-driven prompt optimization guided by a trainset and a defined metric.
  • You need to blend Haystack components (retriever, generator) with a DSPy-optimized prompt.
  • You want automatic prompt improvement integrated into an existing Haystack pipeline.

Quick Start

  1. Prepare haystack_pipeline, a trainset of dspy.Example, and a scoring metric.
  2. Build a DSPy module wrapping the Haystack retriever and a generator; define a mixed metric.
  3. Run the DSPy optimization to obtain optimized_prompt and an updated optimized_pipeline.

Best Practices

  • Use a representative trainset of dspy.Example that covers diverse questions and contexts.
  • Define a clear metric that balances semantic relevance with conciseness.
  • Start with a sensible initial_prompt to bootstrap the optimization process.
  • Iterate in small increments and monitor for overfitting to the trainset.
  • Thoroughly validate the optimized_pipeline on a held-out dataset before deployment.
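
The held-out validation step can be sketched framework-free; the evaluate helper and the stand-in objects below are illustrative (in practice dspy.Evaluate serves the same purpose against the compiled module):

```python
from types import SimpleNamespace

# Hedged sketch of held-out validation: average a metric over a devset.
# `module`, `devset`, and `metric` stand in for the compiled DSPy module,
# a held-out list of examples, and the mixed metric from this skill.
def evaluate(module, devset, metric):
    scores = [metric(ex, module(question=ex.question)) for ex in devset]
    return sum(scores) / max(len(scores), 1)

# Tiny illustration with stand-in objects (not real DSPy types):
def fake_module(question):
    return SimpleNamespace(answer="Paris")

devset = [SimpleNamespace(question="Capital of France?", answer="Paris")]

def exact_match(ex, pred, trace=None):
    return float(ex.answer == pred.answer)

score = evaluate(fake_module, devset, exact_match)  # → 1.0
```

Run the same evaluation on both the DSPy module and the rebuilt Haystack pipeline: a large gap between the two scores usually means the prompt extraction lost information.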

Example Use Cases

  • Haystack QA pipeline for customer support with DSPy-optimized prompts.
  • Academic literature QA pipeline using data-driven prompt refinement.
  • E-commerce product FAQ retrieval with concise, accurate answers.
  • Legal document search improved by optimized prompts in RAG setup.
  • Technical support chatbot leveraging DSPy to tune Haystack prompts.
