sql-optimization
SQL Optimization Skill
Workflow is evidence-driven: collect evidence with tools, then recommend based on evidence only.
Pre-flight Check
- HAS SQL: Conversation contains a SQL query -> Go to WORKFLOW step 2 (Collect Evidence).
- HAS QUERY_ID: Conversation contains a query_id -> Go to WORKFLOW step 2 (call collect_sql_optimization_evidence immediately).
- DISCOVERY REQUEST: User asks to optimize the slowest/heaviest queries but does not provide SQL/query_id -> Go to WORKFLOW step 1 (Discovery).
- NEITHER: Output ONLY a concise 1-sentence request for the SQL query or query_id (e.g. "Please provide the SQL query or query_id you'd like to optimize."). Do NOT ask for any other details (like version, table sizes, etc.). Then include the following UI trigger block in the response (must be present and unchanged; place it at the end of the reply):
{ "type": "optimization_skill_input" }
Discovery
- When candidates need to be found from system.query_log (slowest, most expensive, user-scoped, etc.), load the clickhouse-system-queries skill and follow its reference for system.query_log. Do NOT write ad-hoc SQL against system.query_log from this skill.
- Extract query_id from the discovery results for the next step (evidence collection).
Time Filtering
- time_window: Relative minutes from now (e.g., 60 = last hour).
- time_range: Absolute range { from: "ISO date", to: "ISO date" }.
- When calling collect_sql_optimization_evidence after discovery, you MUST pass the same time_window or time_range used in discovery.
Workflow
- Discovery (if needed): Load the clickhouse-system-queries skill and use its system.query_log reference to find candidates. Extract query_id from the results.
- Collect Evidence: Call collect_sql_optimization_evidence with query_id (preferred) or sql (and the same time params if coming from discovery).
- Analyze: Review evidence for optimization opportunities.
- Recommendations: Rank by Impact/Risk/Effort. Prefer low-risk query rewrites first.
- Validate: Use validate_sql for any proposed SQL changes. Add inline comments (-- comment) to highlight key changes.
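As a sketch of the annotation style, a proposed rewrite might look like the following (the table, columns, and change are hypothetical, not drawn from any collected evidence):

```sql
SELECT
    toDate(event_time) AS day,
    count() AS hits
FROM events
WHERE event_time >= now() - INTERVAL 1 DAY  -- added: bounded time range enables partition pruning
  AND user_id = 42                          -- moved before the heavier string filter
GROUP BY day
```

The comments mark exactly what changed and why, so the user can review each edit before the statement goes through validate_sql.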
Table Schema Evidence
- Use table_schema fields: columns, engine, partition_key, primary_key, sorting_key, secondary_indexes.
- Suggest secondary indexes only when evidence shows frequent WHERE filters on selective columns and the index type fits the predicate.
- Use minmax for range predicates on sorted columns.
- Use set for low-cardinality equality filters.
- Use bloom_filter for high-cardinality equality filters (e.g., trace_id, user_id).
- Use tokenbf_v1 for frequent token-based text search.
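In ClickHouse, each of these index types is added with ALTER TABLE ... ADD INDEX. A rough sketch (table and column names are hypothetical; the GRANULARITY and index parameters are illustrative starting points to tune per table):

```sql
-- minmax: range predicates on a column correlated with the sort order
ALTER TABLE events ADD INDEX idx_ts_minmax event_time TYPE minmax GRANULARITY 1;

-- set: low-cardinality equality filters (e.g., status has a handful of values)
ALTER TABLE events ADD INDEX idx_status_set status TYPE set(100) GRANULARITY 4;

-- bloom_filter: high-cardinality equality filters (0.01 = false-positive rate)
ALTER TABLE events ADD INDEX idx_trace_bf trace_id TYPE bloom_filter(0.01) GRANULARITY 4;

-- tokenbf_v1: token-based text search (filter size in bytes, hash count, seed)
ALTER TABLE events ADD INDEX idx_msg_tokenbf message TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4;

-- ADD INDEX only applies to new parts; backfill existing parts explicitly
ALTER TABLE events MATERIALIZE INDEX idx_trace_bf;
```

Any such DDL is itself a schema change, so it should be ranked by Impact/Risk/Effort like any other recommendation rather than applied by default.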
Rules
- Do NOT recommend based on assumptions. If evidence is missing, collect it with tools.
- If tools return NO meaningful evidence, output only a brief 3-5 sentence message explaining what's missing.
- Always validate proposed SQL with validate_sql before recommending.
- If discovery results include both query text and query_id, prefer query_id to avoid truncation issues.
- If the SQL appears incomplete (truncated/ellipsized/ends mid-clause), use query_id instead of sql.
- When both query_id and SQL are available, prefer query_id to reduce tokens and avoid truncation issues.
Source
https://github.com/FrankChen021/datastoria/blob/master/src/lib/ai/skills/sql-optimization/SKILL.md
Overview
SQL Optimization Skill follows an evidence-first workflow to identify bottlenecks. It collects evidence from tooling and system logs, then recommends changes strictly based on that evidence, with validation before application.
How This Skill Works
The workflow uses discovery to identify candidates, then collects evidence via collect_sql_optimization_evidence using a provided SQL or query_id. Then it analyzes the evidence to surface optimization opportunities and returns recommendations ranked by impact, risk, and effort, with changes validated by validate_sql before applying.
When to Use It
- When a user asks to optimize slowest/heaviest queries but does not provide SQL (trigger Discovery).
- When a SQL query is provided for optimization.
- When a query_id is provided, evidence collection starts immediately.
- When you need to identify candidates from system.query_log via Discovery and extract query_id.
- When you have proposed SQL changes and need to validate them before applying.
Quick Start
- Step 1: Provide the SQL query or query_id you'd like to optimize.
- Step 2: System collects evidence using collect_sql_optimization_evidence (use the same time_window/time_range as discovery if applicable).
- Step 3: Review the validated recommendations and apply changes after they pass validate_sql.
Best Practices
- Follow the pre-flight checks and trigger Discovery only when needed.
- Keep time filters consistent: pass the same time_window or time_range to evidence collection as used in discovery.
- Prefer query_id over SQL when both are available to avoid truncation issues.
- Annotate any proposed SQL with inline comments (-- comment) highlighting key changes.
- Rank recommendations by Impact, Risk, and Effort and start with low-risk rewrites; always validate with validate_sql.
Example Use Cases
- Slow analytics query over a large partitioned table; discovery yields a query_id; evidence flags missing partition pruning and suggests a filtered date range.
- Query with high-cardinality user_id filter; evidence recommends bloom_filter on user_id to speed lookups.
- Repeated full-table scans on a large table; evidence indicates need for partition pruning and selective column access.
- JOIN-heavy query causing excessive data shuffles; evidence supports reordering joins or adding a materialized view.
- Query plan regressed after a schema change; a validated rewrite improves performance without breaking results.
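The first use case above might produce a rewrite like this (schema and names are invented for illustration; assume logs is partitioned by a function of ts):

```sql
-- Before: no time filter, so every partition of the table is read
SELECT user_id, count() AS hits
FROM logs
GROUP BY user_id;

-- After: a bounded range on the partition-key column lets untouched partitions be pruned
SELECT user_id, count() AS hits
FROM logs
WHERE ts >= now() - INTERVAL 7 DAY  -- added: restricts the scan to recent partitions
GROUP BY user_id;
```

Note the rewrite narrows the result to the last 7 days, so it is only valid if the user's question is actually time-bounded; that is exactly the kind of semantic change validate_sql and the inline comments are meant to surface.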