sql-optimization
SQL Optimization Skill
Workflow is evidence-driven: collect evidence with tools, then recommend based on evidence only.
Pre-flight Check
- HAS SQL: Conversation contains a SQL query -> Go to WORKFLOW step 2 (Collect Evidence).
- HAS QUERY_ID: Conversation contains a query_id -> Go to WORKFLOW step 2 (call collect_sql_optimization_evidence immediately).
- DISCOVERY REQUEST: User asks to optimize the slowest/heaviest queries but does not provide SQL/query_id -> Go to WORKFLOW step 1 (Discovery).
- NEITHER: Output ONLY a concise 1-sentence request for the SQL query or query_id (e.g. "Please provide the SQL query or query_id you'd like to optimize."). Do NOT ask for any other details (like version, table sizes, etc.). Then include the following UI trigger block in the response (must be present and unchanged; place it at the end of the reply):
{ "type": "optimization_skill_input" }
Discovery
- When candidates need to be found from system.query_log (slowest, most expensive, user-scoped, etc.), load the clickhouse-system-queries skill and follow its reference for system.query_log. Do NOT write ad-hoc SQL against system.query_log from this skill.
- Extract query_id from the discovery results for the next step (evidence collection).
Time Filtering
- time_window: Relative minutes from now (e.g., 60 = last hour).
- time_range: Absolute range { from: "ISO date", to: "ISO date" }.
- When calling collect_sql_optimization_evidence after discovery, you MUST pass the same time_window or time_range used in discovery.
Workflow
- Discovery (if needed): Load the clickhouse-system-queries skill and use its system.query_log reference to find candidates. Extract query_id from the results.
- Collect Evidence: Call collect_sql_optimization_evidence with query_id (preferred) or sql (and the same time params if coming from discovery).
- Analyze: Review evidence for optimization opportunities.
- Recommendations: Rank by Impact/Risk/Effort. Prefer low-risk query rewrites first.
- Validate: Use validate_sql for any proposed SQL changes. Add inline comments (-- comment) to highlight key changes.
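As a sketch of the annotation style, a proposed rewrite might look like the following (the table, columns, and change are hypothetical, not drawn from any collected evidence):

```sql
SELECT
    toDate(event_time) AS day,
    count() AS hits
FROM events
WHERE event_time >= now() - INTERVAL 1 DAY  -- added: bounded time range enables partition pruning
  AND user_id = 42                          -- moved before the heavier string filter
GROUP BY day
```

The comments mark exactly what changed and why, so the user can review each edit before the statement goes through validate_sql.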
Table Schema Evidence
- Use table_schema fields: columns, engine, partition_key, primary_key, sorting_key, secondary_indexes.
- Suggest secondary indexes only when evidence shows frequent WHERE filters on selective columns and the index type fits the predicate.
- Use minmax for range predicates on sorted columns.
- Use set for low-cardinality equality filters.
- Use bloom_filter for high-cardinality equality filters (e.g., trace_id, user_id).
- Use tokenbf_v1 for frequent token-based text search.
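In ClickHouse, each of these index types is added with ALTER TABLE ... ADD INDEX. A rough sketch (table and column names are hypothetical; the GRANULARITY and index parameters are illustrative starting points to tune per table):

```sql
-- minmax: range predicates on a column correlated with the sort order
ALTER TABLE events ADD INDEX idx_ts_minmax event_time TYPE minmax GRANULARITY 1;

-- set: low-cardinality equality filters (e.g., status has a handful of values)
ALTER TABLE events ADD INDEX idx_status_set status TYPE set(100) GRANULARITY 4;

-- bloom_filter: high-cardinality equality filters (0.01 = false-positive rate)
ALTER TABLE events ADD INDEX idx_trace_bf trace_id TYPE bloom_filter(0.01) GRANULARITY 4;

-- tokenbf_v1: token-based text search (filter size in bytes, hash count, seed)
ALTER TABLE events ADD INDEX idx_msg_tokenbf message TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4;

-- ADD INDEX only applies to new parts; backfill existing parts explicitly
ALTER TABLE events MATERIALIZE INDEX idx_trace_bf;
```

Any such DDL is itself a schema change, so it should be ranked by Impact/Risk/Effort like any other recommendation rather than applied by default.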
Rules
- Do NOT recommend based on assumptions. If evidence is missing, collect it with tools.
- If tools return NO meaningful evidence, output only a brief 3-5 sentence message explaining what's missing.
- Always validate proposed SQL with validate_sql before recommending.
- If discovery results include both query text and query_id, prefer query_id to avoid truncation issues.
- If the SQL appears incomplete (truncated/ellipsized/ends mid-clause), use query_id instead of sql.
- When both query_id and SQL are available, prefer query_id to reduce tokens and avoid truncation issues.
Source
https://github.com/FrankChen021/datastoria/blob/master/src/lib/ai/skills/sql-optimization/SKILL.md
Overview
SQL Optimization Skill follows an evidence-first workflow to identify bottlenecks. It collects evidence from tooling and system logs, then recommends changes strictly based on that evidence, with validation before application.
How This Skill Works
The workflow uses discovery to identify candidates, then collects evidence via collect_sql_optimization_evidence using a provided SQL or query_id. Then it analyzes the evidence to surface optimization opportunities and returns recommendations ranked by impact, risk, and effort, with changes validated by validate_sql before applying.
When to Use It
- When a user asks to optimize slowest/heaviest queries but does not provide SQL (trigger Discovery).
- When a SQL query is provided for optimization.
- When a query_id is provided, evidence collection starts immediately.
- When you need to identify candidates from system.query_log via Discovery and extract query_id.
- When you have proposed SQL changes and need to validate them before applying.
Quick Start
- Step 1: Provide the SQL query or query_id you'd like to optimize.
- Step 2: System collects evidence using collect_sql_optimization_evidence (use the same time_window/time_range as discovery if applicable).
- Step 3: Review the validated recommendations and apply changes after they pass validate_sql.
Best Practices
- Follow the pre-flight checks and trigger Discovery only when needed.
- Keep time filters consistent: pass the same time_window or time_range to evidence collection as used in discovery.
- Prefer query_id over SQL when both are available to avoid truncation issues.
- Annotate any proposed SQL with inline comments (-- comment) highlighting key changes.
- Rank recommendations by Impact, Risk, and Effort and start with low-risk rewrites; always validate with validate_sql.
Example Use Cases
- Slow analytics query over a large partitioned table; discovery yields a query_id; evidence flags missing partition pruning and suggests a filtered date range.
- Query with high-cardinality user_id filter; evidence recommends bloom_filter on user_id to speed lookups.
- Repeated full-table scans on a large table; evidence indicates need for partition pruning and selective column access.
- JOIN-heavy query causing excessive data shuffles; evidence supports reordering joins or adding a materialized view.
- Query plan regressed after a schema change; a validated rewrite improves performance without breaking results.
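The first use case above might produce a rewrite like this (schema and names are invented for illustration; assume logs is partitioned by a function of ts):

```sql
-- Before: no time filter, so every partition of the table is read
SELECT user_id, count() AS hits
FROM logs
GROUP BY user_id;

-- After: a bounded range on the partition-key column lets untouched partitions be pruned
SELECT user_id, count() AS hits
FROM logs
WHERE ts >= now() - INTERVAL 7 DAY  -- added: restricts the scan to recent partitions
GROUP BY user_id;
```

Note the rewrite narrows the result to the last 7 days, so it is only valid if the user's question is actually time-bounded; that is exactly the kind of semantic change validate_sql and the inline comments are meant to surface.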