dspy-optimize-anything
npx machina-cli add skill OmidZamani/dspy-skills/dspy-optimize-anything --openclaw

GEPA optimize_anything
Goal
Optimize any artifact representable as text — code, prompts, agent architectures, vector graphics, configurations — using a single declarative API powered by GEPA's reflective evolutionary search.
When to Use
- Beyond prompt optimization — optimizing code, configs, SVGs, scheduling policies, etc.
- Single hard problems — circle packing, kernel generation, algorithm discovery
- Batch related problems — CUDA kernels, code generation tasks with cross-transfer
- Generalization — agent skills, policies, or prompts that must transfer to unseen inputs
- When you can express quality as a score and provide diagnostic feedback (ASI)
Inputs
| Input | Type | Description |
|---|---|---|
| seed_candidate | str \| dict[str, str] \| None | Starting artifact text, or None for seedless mode |
| evaluator | Callable | Returns score (higher = better), optionally with ASI dict |
| dataset | list \| None | Training examples (for multi-task and generalization modes) |
| valset | list \| None | Validation set (for generalization mode) |
| objective | str \| None | Natural language description of what to optimize for |
| background | str \| None | Domain knowledge and constraints |
| config | GEPAConfig \| None | Engine, reflection, and tracking settings |
Outputs
| Output | Type | Description |
|---|---|---|
| result.best_candidate | str \| dict | Best optimized artifact |
Workflow
Phase 1: Install
```shell
pip install gepa
```
Phase 2: Define Evaluator with ASI
The evaluator scores a candidate and returns Actionable Side Information (ASI) — diagnostic feedback that guides the LLM proposer during reflection.
Simple evaluator (score only):
```python
import gepa.optimize_anything as oa

def evaluate(candidate: str) -> float:
    score, diagnostic = run_my_system(candidate)
    oa.log(f"Error: {diagnostic}")  # captured as ASI
    return score
```
Rich evaluator (score + structured ASI):
```python
def evaluate(candidate: str) -> tuple[float, dict]:
    result = execute_code(candidate)
    return result.score, {
        "Error": result.stderr,
        "Output": result.stdout,
        "Runtime": f"{result.time_ms:.1f}ms",
    }
```
ASI can include open-ended text, structured data, multi-objectives (via `scores`), or images (via `gepa.Image`) for vision-capable LLMs.
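As a concrete sketch of multi-objective ASI, an evaluator might return several named sub-scores under a `scores` key alongside free-text feedback. The checks below and the exact key layout are illustrative assumptions, not the skill's prescribed schema:

```python
def evaluate_multi(candidate: str) -> tuple[float, dict]:
    # Hypothetical sub-metrics; substitute real measurements for your task.
    correctness = 1.0 if "return" in candidate else 0.0
    brevity = max(0.0, 1.0 - len(candidate) / 1000)
    overall = 0.7 * correctness + 0.3 * brevity
    return overall, {
        # Assumed layout: per-objective values for the proposer to reason over.
        "scores": {"correctness": correctness, "brevity": brevity},
        "Feedback": "Weighting correctness over brevity in the overall score.",
    }
```

The scalar return drives selection, while the per-objective breakdown tells the proposer which axis to improve next.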
Phase 3: Choose Optimization Mode
Mode 1 — Single-Task Search: Solve one hard problem. No dataset needed.
```python
result = oa.optimize_anything(
    seed_candidate="<your initial artifact>",
    evaluator=evaluate,
)
```
Mode 2 — Multi-Task Search: Solve a batch of related problems with cross-transfer.
```python
result = oa.optimize_anything(
    seed_candidate="<your initial artifact>",
    evaluator=evaluate,
    dataset=tasks,
)
```
Mode 3 — Generalization: Build a skill/prompt/policy that transfers to unseen problems.
```python
result = oa.optimize_anything(
    seed_candidate="<your initial artifact>",
    evaluator=evaluate,
    dataset=train,
    valset=val,
)
```
Seedless mode: Describe what you need instead of providing a seed.
```python
result = oa.optimize_anything(
    evaluator=evaluate,
    objective="Generate a Python function `reverse()` that reverses a string.",
)
```
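In seedless mode the evaluator carries the whole specification, so it must test candidates against the objective itself. A minimal sketch for the `reverse()` objective above (note it runs candidate code via `exec`, so isolate untrusted candidates):

```python
def evaluate_reverse(candidate: str) -> tuple[float, dict]:
    """Score a candidate that should define reverse() by running unit checks."""
    cases = [("abc", "cba"), ("", ""), ("racecar", "racecar")]
    namespace: dict = {}
    try:
        exec(candidate, namespace)  # WARNING: runs the candidate verbatim
        fn = namespace["reverse"]
        passed = sum(fn(s) == want for s, want in cases)
        return passed / len(cases), {"Passed": f"{passed}/{len(cases)}"}
    except Exception as e:
        return 0.0, {"Error": repr(e)}
```

The pass-rate score gives the proposer a gradient even before any candidate is fully correct, and the `Error` ASI explains outright failures.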
Phase 4: Use Results
```python
print(result.best_candidate)
```
Production Example
```python
import gepa.optimize_anything as oa
from gepa import Image
import logging

logger = logging.getLogger(__name__)

# ---------- SVG optimization with VLM feedback ----------
GOAL = "a pelican riding a bicycle"
VLM = "vertex_ai/gemini-3-flash-preview"

VISUAL_ASPECTS = [
    {"id": "overall", "criteria": f"Rate overall quality of this SVG ({GOAL}). SCORE: X/10"},
    {"id": "anatomy", "criteria": "Rate pelican accuracy: beak, pouch, plumage. SCORE: X/10"},
    {"id": "bicycle", "criteria": "Rate bicycle: wheels, frame, handlebars, pedals. SCORE: X/10"},
    {"id": "composition", "criteria": "Rate how convincingly the pelican rides the bicycle. SCORE: X/10"},
]

def evaluate(candidate, example):
    """Render SVG, score with a VLM, return (score, ASI)."""
    image = render_image(candidate["svg_code"])  # via cairosvg
    score, feedback = get_vlm_score_feedback(VLM, image, example["criteria"])
    return score, {
        "RenderedSVG": Image(base64_data=image, media_type="image/png"),
        "Feedback": feedback,
    }

result = oa.optimize_anything(
    seed_candidate={"svg_code": "<svg>...</svg>"},
    evaluator=evaluate,
    dataset=VISUAL_ASPECTS,
    background=f"Optimize SVG source code depicting '{GOAL}'. "
               "Improve anatomy, composition, and visual quality.",
)
logger.info(f"Best SVG:\n{result.best_candidate['svg_code']}")
```
```python
# ---------- Code optimization (single-task) ----------
def evaluate_solver(candidate: str) -> tuple[float, dict]:
    """Evaluate a Python solver for a mathematical optimization problem.

    NOTE: this executes candidate code directly; run it inside a sandbox
    (container, resource limits) before evaluating untrusted candidates.
    """
    import subprocess, json
    proc = subprocess.run(
        ["python", "-c", candidate],
        capture_output=True, text=True, timeout=30,
    )
    if proc.returncode != 0:
        oa.log(f"Runtime error: {proc.stderr}")
        return 0.0, {"Error": proc.stderr}
    try:
        output = json.loads(proc.stdout)
        return output["score"], {
            "Output": output.get("solution"),
            "Runtime": f"{output.get('time_ms', 0):.1f}ms",
        }
    except (json.JSONDecodeError, KeyError) as e:
        oa.log(f"Parse error: {e}")
        return 0.0, {"Error": str(e), "Stdout": proc.stdout}

result = oa.optimize_anything(
    evaluator=evaluate_solver,
    objective="Write a Python solver for the bin packing problem that "
              "minimizes the number of bins. Output JSON with 'score' and 'solution'.",
    background="Use first-fit-decreasing as a starting heuristic. "
               "Higher score = fewer bins used.",
)
print(result.best_candidate)
```
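The `objective` above implies a stdout contract: each candidate program must print one JSON object with `score` and `solution` fields. The toy candidate below honors that contract with a first-fit-decreasing packing; the item data and the 1/bins scoring convention are illustrative assumptions, not part of the skill:

```python
import json
import subprocess
import sys
import textwrap

# Toy candidate obeying the assumed stdout contract of evaluate_solver.
CANDIDATE = textwrap.dedent("""
    import json
    items, capacity = [6, 5, 4, 3, 2], 10
    bins = []
    for item in sorted(items, reverse=True):  # first-fit-decreasing
        for b in bins:
            if sum(b) + item <= capacity:
                b.append(item)
                break
        else:  # no bin had room: open a new one
            bins.append([item])
    # Fewer bins -> higher score (one possible convention).
    print(json.dumps({"score": 1.0 / len(bins), "solution": bins}))
""")

proc = subprocess.run([sys.executable, "-c", CANDIDATE],
                      capture_output=True, text=True, timeout=30)
out = json.loads(proc.stdout)  # {"score": 0.5, "solution": [[6, 4], [5, 3, 2]]}
```

Keeping the contract machine-checkable like this lets `evaluate_solver` turn malformed output into zero-score ASI instead of crashing the run.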
```python
# ---------- Agent architecture generalization ----------
def evaluate_agent(candidate: str, example: dict) -> tuple[float, dict]:
    """Run an agent architecture on a task and score it.

    NOTE: exec() runs candidate code in-process; sandbox untrusted candidates.
    """
    exec_globals = {}
    try:
        exec(candidate, exec_globals)  # may raise SyntaxError etc.
    except Exception as e:
        return 0.0, {"Error": str(e)}
    agent_fn = exec_globals.get("solve")
    if agent_fn is None:
        return 0.0, {"Error": "No `solve` function defined"}
    try:
        prediction = agent_fn(example["input"])
        correct = prediction == example["expected"]
        score = 1.0 if correct else 0.0
        feedback = "Correct" if correct else (
            f"Expected '{example['expected']}', got '{prediction}'"
        )
        return score, {"Prediction": prediction, "Feedback": feedback}
    except Exception as e:
        return 0.0, {"Error": str(e)}

result = oa.optimize_anything(
    seed_candidate="def solve(input):\n    return input",
    evaluator=evaluate_agent,
    dataset=train_tasks,
    valset=val_tasks,
    background="Discover a Python agent function `solve(input)` that "
               "generalizes across unseen reasoning tasks.",
)
print(result.best_candidate)
```
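Before spending optimization budget, it helps to smoke-test the evaluator on the seed by hand. A standalone sketch in the same shape as the agent evaluator above (task dicts with `input`/`expected` keys are assumed):

```python
def check_agent(candidate: str, example: dict) -> float:
    """Minimal re-statement of the agent evaluator for a quick smoke test."""
    namespace: dict = {}
    try:
        exec(candidate, namespace)  # runs untrusted code; sandbox in production
        prediction = namespace["solve"](example["input"])
    except Exception:
        return 0.0
    return 1.0 if prediction == example["expected"] else 0.0

seed = "def solve(input):\n    return input"
# The identity seed only scores on tasks whose answer equals the input.
easy = check_agent(seed, {"input": "abc", "expected": "abc"})  # 1.0
hard = check_agent(seed, {"input": "abc", "expected": "cba"})  # 0.0
```

If the seed scores zero everywhere, the proposer gets no signal; a seed that passes at least some tasks makes the search start from a usable gradient.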
Integration with DSPy
`optimize_anything` complements DSPy's built-in optimizers. Use DSPy optimizers (GEPA, MIPROv2, BootstrapFewShot) for DSPy programs, and `optimize_anything` for arbitrary text artifacts outside DSPy:
```python
import dspy
import gepa.optimize_anything as oa

# DSPy program optimization (use dspy.GEPA)
optimizer = dspy.GEPA(
    metric=gepa_metric,
    reflection_lm=dspy.LM("openai/gpt-4o"),
    auto="medium",
)
compiled = optimizer.compile(agent, trainset=trainset)

# Non-DSPy artifact optimization (use optimize_anything)
result = oa.optimize_anything(
    seed_candidate=my_config_yaml,
    evaluator=eval_config,
    background="Optimize Kubernetes scheduling policy for cost.",
)
```
Best Practices
- Rich ASI — The more diagnostic feedback you provide, the better the proposer can reason about improvements
- Use `oa.log()` — Route prints to the proposer as ASI instead of stdout
- Structured returns — Return `(score, dict)` tuples for multi-faceted diagnostics
- Seedless for exploration — Use `objective=` when the solution space is large and unfamiliar
- Background context — Provide domain knowledge via `background=` to constrain the search
- Generalization mode — Always provide `valset=` when the artifact must transfer to unseen inputs
- Images as ASI — Use `gepa.Image` to pass rendered outputs to vision-capable LLMs
Limitations
- Requires the `gepa` package (`pip install gepa`)
- Evaluator must be deterministic or low-variance for stable optimization
- Compute cost scales with the number of candidates explored
- Single-task mode does not generalize; use mode 3 with `valset` for transfer
- Currently powered by the GEPA backend; the API is backend-agnostic for future strategies
Source
https://github.com/OmidZamani/dspy-skills/blob/master/skills/dspy-optimize-anything/SKILL.md

Overview
dspy-optimize-anything exposes GEPA's optimize_anything API to optimize any text artifact—code, prompts, agent architectures, configurations, and more. It supports single hard problems, batch transfer, and generalization, using reflective evolutionary search guided by actionable diagnostic feedback (ASI).
How This Skill Works
Install GEPA, implement an evaluator that returns a score and optional ASI, and choose an optimization mode (Single-Task, Multi-Task, or Generalization). The optimizer iteratively proposes artifacts, evaluates them, and uses the feedback to steer toward higher-quality results, returning `result.best_candidate`.
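The loop just described can be sketched as a propose/evaluate/select cycle. In the toy below a deterministic string mutator stands in for the LLM proposer, and greedy tie-accepting selection stands in for GEPA's Pareto-frontier bookkeeping, so treat it as a control-flow illustration only:

```python
def optimize_sketch(seed, evaluate, propose, budget=30):
    """Toy propose/evaluate/select loop standing in for GEPA's search."""
    best = seed
    best_score, asi = evaluate(seed)
    for _ in range(budget):
        candidate = propose(best, asi)   # LLM reflection over ASI in real GEPA
        score, cand_asi = evaluate(candidate)
        if score >= best_score:          # accept ties to cross plateaus
            best, best_score, asi = candidate, score, cand_asi
    return best, best_score

# Toy task: evolve a 4-letter string toward "gepa", with ASI naming the
# mismatched positions (the "diagnostic feedback" of this toy problem).
TARGET = "gepa"

def evaluate(s):
    mismatches = [i for i in range(len(TARGET)) if s[i] != TARGET[i]]
    return 1 - len(mismatches) / len(TARGET), {"Mismatch": mismatches}

def propose(s, asi):
    # Deterministic stand-in proposer: cycle the first mismatched letter.
    if not asi["Mismatch"]:
        return s
    i, alphabet = asi["Mismatch"][0], "aegp"
    nxt = alphabet[(alphabet.index(s[i]) + 1) % len(alphabet)]
    return s[:i] + nxt + s[i + 1:]

best, score = optimize_sketch("aaaa", evaluate, propose)
```

The point of the sketch: the ASI (`Mismatch`) is what lets the proposer act on specifics rather than mutating blindly, which is exactly the role diagnostic feedback plays in the real system.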
Quick Start
- Step 1: Install GEPA and import the optimizer
- Step 2: Implement `evaluate(candidate)` returning `score` or `(score, ASI)`
- Step 3: Run `oa.optimize_anything` with `seed_candidate` or in seedless mode, and choose a mode
Best Practices
- Clearly define the objective and constraints in `objective`/`background`
- Build a rich evaluator that returns a score and optional ASI (structured when possible)
- Use seed_candidate for targeted problems; try seedless mode for needs-based prompts
- Choose the right mode (Single-Task, Multi-Task, Generalization) based on problem type and data
- Monitor ASI feedback and incorporate it to guide reflection and proposals
Example Use Cases
- SVG optimization with VLM feedback
- CUDA kernel optimization via multi-task search
- Circle packing solver developed as a single hard problem
- Code generation tasks with cross-transfer across related problems
- Generalizable prompts/policies that transfer to unseen inputs