dspy-optimize-anything
npx machina-cli add skill OmidZamani/dspy-skills/dspy-optimize-anything --openclaw

GEPA optimize_anything
Goal
Optimize any artifact representable as text — code, prompts, agent architectures, vector graphics, configurations — using a single declarative API powered by GEPA's reflective evolutionary search.
When to Use
- Beyond prompt optimization — optimizing code, configs, SVGs, scheduling policies, etc.
- Single hard problems — circle packing, kernel generation, algorithm discovery
- Batch related problems — CUDA kernels, code generation tasks with cross-transfer
- Generalization — agent skills, policies, or prompts that must transfer to unseen inputs
- When you can express quality as a score and provide diagnostic feedback (ASI)
Inputs
| Input | Type | Description |
|---|---|---|
| seed_candidate | str \| dict[str, str] \| None | Starting artifact text, or None for seedless mode |
| evaluator | Callable | Returns score (higher = better), optionally with ASI dict |
| dataset | list \| None | Training examples (for multi-task and generalization modes) |
| valset | list \| None | Validation set (for generalization mode) |
| objective | str \| None | Natural language description of what to optimize for |
| background | str \| None | Domain knowledge and constraints |
| config | GEPAConfig \| None | Engine, reflection, and tracking settings |
Outputs
| Output | Type | Description |
|---|---|---|
| result.best_candidate | str \| dict | Best optimized artifact |
Workflow
Phase 1: Install
```shell
pip install gepa
```
Phase 2: Define Evaluator with ASI
The evaluator scores a candidate and returns Actionable Side Information (ASI) — diagnostic feedback that guides the LLM proposer during reflection.
Simple evaluator (score only):
```python
import gepa.optimize_anything as oa

def evaluate(candidate: str) -> float:
    score, diagnostic = run_my_system(candidate)
    oa.log(f"Error: {diagnostic}")  # captured as ASI
    return score
```
Rich evaluator (score + structured ASI):
```python
def evaluate(candidate: str) -> tuple[float, dict]:
    result = execute_code(candidate)
    return result.score, {
        "Error": result.stderr,
        "Output": result.stdout,
        "Runtime": f"{result.time_ms:.1f}ms",
    }
```
ASI can include open-ended text, structured data, multi-objectives (via `scores`), or images (via `gepa.Image`) for vision-capable LLMs.
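As a concrete sketch of multi-objective ASI, an evaluator might return several named sub-scores under a `scores` key alongside free-text feedback. The checks below and the exact key layout are illustrative assumptions, not the skill's prescribed schema:

```python
def evaluate_multi(candidate: str) -> tuple[float, dict]:
    # Hypothetical sub-metrics; substitute real measurements for your task.
    correctness = 1.0 if "return" in candidate else 0.0
    brevity = max(0.0, 1.0 - len(candidate) / 1000)
    overall = 0.7 * correctness + 0.3 * brevity
    return overall, {
        # Assumed layout: per-objective values for the proposer to reason over.
        "scores": {"correctness": correctness, "brevity": brevity},
        "Feedback": "Weighting correctness over brevity in the overall score.",
    }
```

The scalar return drives selection, while the per-objective breakdown tells the proposer which axis to improve next.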
Phase 3: Choose Optimization Mode
Mode 1 — Single-Task Search: Solve one hard problem. No dataset needed.
```python
result = oa.optimize_anything(
    seed_candidate="<your initial artifact>",
    evaluator=evaluate,
)
```
Mode 2 — Multi-Task Search: Solve a batch of related problems with cross-transfer.
```python
result = oa.optimize_anything(
    seed_candidate="<your initial artifact>",
    evaluator=evaluate,
    dataset=tasks,
)
```
Mode 3 — Generalization: Build a skill/prompt/policy that transfers to unseen problems.
```python
result = oa.optimize_anything(
    seed_candidate="<your initial artifact>",
    evaluator=evaluate,
    dataset=train,
    valset=val,
)
```
Seedless mode: Describe what you need instead of providing a seed.
```python
result = oa.optimize_anything(
    evaluator=evaluate,
    objective="Generate a Python function `reverse()` that reverses a string.",
)
```
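In seedless mode the evaluator carries the whole specification, so it must test candidates against the objective itself. A minimal sketch for the `reverse()` objective above (note it runs candidate code via `exec`, so isolate untrusted candidates):

```python
def evaluate_reverse(candidate: str) -> tuple[float, dict]:
    """Score a candidate that should define reverse() by running unit checks."""
    cases = [("abc", "cba"), ("", ""), ("racecar", "racecar")]
    namespace: dict = {}
    try:
        exec(candidate, namespace)  # WARNING: runs the candidate verbatim
        fn = namespace["reverse"]
        passed = sum(fn(s) == want for s, want in cases)
        return passed / len(cases), {"Passed": f"{passed}/{len(cases)}"}
    except Exception as e:
        return 0.0, {"Error": repr(e)}
```

The pass-rate score gives the proposer a gradient even before any candidate is fully correct, and the `Error` ASI explains outright failures.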
Phase 4: Use Results
```python
print(result.best_candidate)
```
Production Example
```python
import gepa.optimize_anything as oa
from gepa import Image
import logging

logger = logging.getLogger(__name__)

# ---------- SVG optimization with VLM feedback ----------
GOAL = "a pelican riding a bicycle"
VLM = "vertex_ai/gemini-3-flash-preview"

VISUAL_ASPECTS = [
    {"id": "overall", "criteria": f"Rate overall quality of this SVG ({GOAL}). SCORE: X/10"},
    {"id": "anatomy", "criteria": "Rate pelican accuracy: beak, pouch, plumage. SCORE: X/10"},
    {"id": "bicycle", "criteria": "Rate bicycle: wheels, frame, handlebars, pedals. SCORE: X/10"},
    {"id": "composition", "criteria": "Rate how convincingly the pelican rides the bicycle. SCORE: X/10"},
]

def evaluate(candidate, example):
    """Render SVG, score with a VLM, return (score, ASI)."""
    image = render_image(candidate["svg_code"])  # via cairosvg
    score, feedback = get_vlm_score_feedback(VLM, image, example["criteria"])
    return score, {
        "RenderedSVG": Image(base64_data=image, media_type="image/png"),
        "Feedback": feedback,
    }

result = oa.optimize_anything(
    seed_candidate={"svg_code": "<svg>...</svg>"},
    evaluator=evaluate,
    dataset=VISUAL_ASPECTS,
    background=f"Optimize SVG source code depicting '{GOAL}'. "
               "Improve anatomy, composition, and visual quality.",
)
logger.info(f"Best SVG:\n{result.best_candidate['svg_code']}")
```
```python
# ---------- Code optimization (single-task) ----------
def evaluate_solver(candidate: str) -> tuple[float, dict]:
    """Evaluate a Python solver for a mathematical optimization problem.

    NOTE: this executes candidate code directly; run it inside a sandbox
    (container, resource limits) before evaluating untrusted candidates.
    """
    import subprocess, json
    proc = subprocess.run(
        ["python", "-c", candidate],
        capture_output=True, text=True, timeout=30,
    )
    if proc.returncode != 0:
        oa.log(f"Runtime error: {proc.stderr}")
        return 0.0, {"Error": proc.stderr}
    try:
        output = json.loads(proc.stdout)
        return output["score"], {
            "Output": output.get("solution"),
            "Runtime": f"{output.get('time_ms', 0):.1f}ms",
        }
    except (json.JSONDecodeError, KeyError) as e:
        oa.log(f"Parse error: {e}")
        return 0.0, {"Error": str(e), "Stdout": proc.stdout}

result = oa.optimize_anything(
    evaluator=evaluate_solver,
    objective="Write a Python solver for the bin packing problem that "
              "minimizes the number of bins. Output JSON with 'score' and 'solution'.",
    background="Use first-fit-decreasing as a starting heuristic. "
               "Higher score = fewer bins used.",
)
print(result.best_candidate)
```
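The `objective` above implies a stdout contract: each candidate program must print one JSON object with `score` and `solution` fields. The toy candidate below honors that contract with a first-fit-decreasing packing; the item data and the 1/bins scoring convention are illustrative assumptions, not part of the skill:

```python
import json
import subprocess
import sys
import textwrap

# Toy candidate obeying the assumed stdout contract of evaluate_solver.
CANDIDATE = textwrap.dedent("""
    import json
    items, capacity = [6, 5, 4, 3, 2], 10
    bins = []
    for item in sorted(items, reverse=True):  # first-fit-decreasing
        for b in bins:
            if sum(b) + item <= capacity:
                b.append(item)
                break
        else:  # no bin had room: open a new one
            bins.append([item])
    # Fewer bins -> higher score (one possible convention).
    print(json.dumps({"score": 1.0 / len(bins), "solution": bins}))
""")

proc = subprocess.run([sys.executable, "-c", CANDIDATE],
                      capture_output=True, text=True, timeout=30)
out = json.loads(proc.stdout)  # {"score": 0.5, "solution": [[6, 4], [5, 3, 2]]}
```

Keeping the contract machine-checkable like this lets `evaluate_solver` turn malformed output into zero-score ASI instead of crashing the run.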
```python
# ---------- Agent architecture generalization ----------
def evaluate_agent(candidate: str, example: dict) -> tuple[float, dict]:
    """Run an agent architecture on a task and score it.

    NOTE: exec() runs candidate code in-process; sandbox untrusted candidates.
    """
    exec_globals = {}
    try:
        exec(candidate, exec_globals)  # may raise SyntaxError etc.
    except Exception as e:
        return 0.0, {"Error": str(e)}
    agent_fn = exec_globals.get("solve")
    if agent_fn is None:
        return 0.0, {"Error": "No `solve` function defined"}
    try:
        prediction = agent_fn(example["input"])
        correct = prediction == example["expected"]
        score = 1.0 if correct else 0.0
        feedback = "Correct" if correct else (
            f"Expected '{example['expected']}', got '{prediction}'"
        )
        return score, {"Prediction": prediction, "Feedback": feedback}
    except Exception as e:
        return 0.0, {"Error": str(e)}

result = oa.optimize_anything(
    seed_candidate="def solve(input):\n    return input",
    evaluator=evaluate_agent,
    dataset=train_tasks,
    valset=val_tasks,
    background="Discover a Python agent function `solve(input)` that "
               "generalizes across unseen reasoning tasks.",
)
print(result.best_candidate)
```
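Before spending optimization budget, it helps to smoke-test the evaluator on the seed by hand. A standalone sketch in the same shape as the agent evaluator above (task dicts with `input`/`expected` keys are assumed):

```python
def check_agent(candidate: str, example: dict) -> float:
    """Minimal re-statement of the agent evaluator for a quick smoke test."""
    namespace: dict = {}
    try:
        exec(candidate, namespace)  # runs untrusted code; sandbox in production
        prediction = namespace["solve"](example["input"])
    except Exception:
        return 0.0
    return 1.0 if prediction == example["expected"] else 0.0

seed = "def solve(input):\n    return input"
# The identity seed only scores on tasks whose answer equals the input.
easy = check_agent(seed, {"input": "abc", "expected": "abc"})  # 1.0
hard = check_agent(seed, {"input": "abc", "expected": "cba"})  # 0.0
```

If the seed scores zero everywhere, the proposer gets no signal; a seed that passes at least some tasks makes the search start from a usable gradient.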
Integration with DSPy
`optimize_anything` complements DSPy's built-in optimizers. Use DSPy optimizers (GEPA, MIPROv2, BootstrapFewShot) for DSPy programs, and `optimize_anything` for arbitrary text artifacts outside DSPy:
```python
import dspy
import gepa.optimize_anything as oa

# DSPy program optimization (use dspy.GEPA)
optimizer = dspy.GEPA(
    metric=gepa_metric,
    reflection_lm=dspy.LM("openai/gpt-4o"),
    auto="medium",
)
compiled = optimizer.compile(agent, trainset=trainset)

# Non-DSPy artifact optimization (use optimize_anything)
result = oa.optimize_anything(
    seed_candidate=my_config_yaml,
    evaluator=eval_config,
    background="Optimize Kubernetes scheduling policy for cost.",
)
```
Best Practices
- Rich ASI — The more diagnostic feedback you provide, the better the proposer can reason about improvements
- Use `oa.log()` — Route prints to the proposer as ASI instead of stdout
- Structured returns — Return `(score, dict)` tuples for multi-faceted diagnostics
- Seedless for exploration — Use `objective=` when the solution space is large and unfamiliar
- Background context — Provide domain knowledge via `background=` to constrain the search
- Generalization mode — Always provide `valset=` when the artifact must transfer to unseen inputs
- Images as ASI — Use `gepa.Image` to pass rendered outputs to vision-capable LLMs
Limitations
- Requires the `gepa` package (`pip install gepa`)
- Evaluator must be deterministic or low-variance for stable optimization
- Compute cost scales with the number of candidates explored
- Single-task mode does not generalize; use mode 3 with `valset` for transfer
- Currently powered by the GEPA backend; the API is backend-agnostic for future strategies
Source
https://github.com/OmidZamani/dspy-skills/blob/master/skills/dspy-optimize-anything/SKILL.md

Overview
dspy-optimize-anything exposes GEPA's optimize_anything API to optimize any text artifact—code, prompts, agent architectures, configurations, and more. It supports single hard problems, batch transfer, and generalization, using reflective evolutionary search guided by actionable diagnostic feedback (ASI).
How This Skill Works
Install GEPA, implement an evaluator that returns a score and optional ASI, and choose an optimization mode (Single-Task, Multi-Task, or Generalization). The optimizer iteratively proposes artifacts, evaluates them, and uses the feedback to steer toward higher-quality results, returning `result.best_candidate`.
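The loop just described can be sketched as a propose/evaluate/select cycle. In the toy below a deterministic string mutator stands in for the LLM proposer, and greedy tie-accepting selection stands in for GEPA's Pareto-frontier bookkeeping, so treat it as a control-flow illustration only:

```python
def optimize_sketch(seed, evaluate, propose, budget=30):
    """Toy propose/evaluate/select loop standing in for GEPA's search."""
    best = seed
    best_score, asi = evaluate(seed)
    for _ in range(budget):
        candidate = propose(best, asi)   # LLM reflection over ASI in real GEPA
        score, cand_asi = evaluate(candidate)
        if score >= best_score:          # accept ties to cross plateaus
            best, best_score, asi = candidate, score, cand_asi
    return best, best_score

# Toy task: evolve a 4-letter string toward "gepa", with ASI naming the
# mismatched positions (the "diagnostic feedback" of this toy problem).
TARGET = "gepa"

def evaluate(s):
    mismatches = [i for i in range(len(TARGET)) if s[i] != TARGET[i]]
    return 1 - len(mismatches) / len(TARGET), {"Mismatch": mismatches}

def propose(s, asi):
    # Deterministic stand-in proposer: cycle the first mismatched letter.
    if not asi["Mismatch"]:
        return s
    i, alphabet = asi["Mismatch"][0], "aegp"
    nxt = alphabet[(alphabet.index(s[i]) + 1) % len(alphabet)]
    return s[:i] + nxt + s[i + 1:]

best, score = optimize_sketch("aaaa", evaluate, propose)
```

The point of the sketch: the ASI (`Mismatch`) is what lets the proposer act on specifics rather than mutating blindly, which is exactly the role diagnostic feedback plays in the real system.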
Quick Start
- Step 1: Install GEPA and import the optimizer
- Step 2: Implement `evaluate(candidate)` returning `score` or `(score, ASI)`
- Step 3: Run `oa.optimize_anything` with `seed_candidate` or in seedless mode, and choose a mode
Best Practices
- Clearly define the objective and constraints in `objective`/`background`
- Build a rich evaluator that returns a score and optional ASI (structured when possible)
- Use seed_candidate for targeted problems; try seedless mode for needs-based prompts
- Choose the right mode (Single-Task, Multi-Task, Generalization) based on problem type and data
- Monitor ASI feedback and incorporate it to guide reflection and proposals
Example Use Cases
- SVG optimization with VLM feedback
- CUDA kernel optimization via multi-task search
- Circle packing solver developed as a single hard problem
- Code generation tasks with cross-transfer across related problems
- Generalizable prompts/policies that transfer to unseen inputs