Volcengine Observability Cls
By @cinience
npx machina-cli add skill @cinience/volcengine-observability-cls --openclaw
volcengine-observability-cls
Run structured log investigations and summarize actionable findings.
Execution Checklist
- Confirm project/logset/topic and time window.
- Build the query with filters, parsed fields, and aggregations.
- Execute and summarize top errors and anomaly dimensions.
- Return follow-up actions and reusable query templates.
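The query-building step in the checklist can be sketched in code. This is a minimal sketch assuming a search-and-analysis style syntax (`filters | SELECT ... GROUP BY ...`); the field names `status`, `error_code`, and `host` are illustrative assumptions, not a documented CLS schema:

```python
# Sketch: assemble a search-and-analysis statement from confirmed inputs.
# The "filter | SELECT ..." shape and the field names are assumptions
# for illustration, not a documented CLS schema.

def build_query(filters, group_by, limit=10):
    """Combine filter expressions with an aggregation clause."""
    filter_expr = " AND ".join(filters)
    dims = ", ".join(group_by)
    return (
        f"{filter_expr} | SELECT {dims}, count(*) AS cnt "
        f"GROUP BY {dims} ORDER BY cnt DESC LIMIT {limit}"
    )

query = build_query(
    filters=["status >= 500", "service = 'auth-service'"],
    group_by=["error_code", "host"],
)
print(query)
```

Keeping the statement as a returned string also satisfies the output requirement below: the exact query can be captured verbatim in the findings.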
Output Requirements
- Include query statement.
- Include affected services and counts.
- Include concrete remediation suggestions.
References
references/sources.md
Overview
Volcengine Observability CLS supports structured log investigations and distills them into actionable findings. It emphasizes error analysis, time-range queries, and aggregation dashboards to support incident diagnostics and root-cause exploration.
How This Skill Works
Confirm project/logset/topic and the time window. Build a query with filters, parsed fields, and aggregations; then execute and summarize top errors and anomaly dimensions, delivering follow-up actions and reusable templates.
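Once results are fetched, summarizing top errors and anomaly dimensions reduces to a counting aggregation. A sketch over already-parsed log records (the record fields here are assumptions, standing in for whatever your parsers extract):

```python
from collections import Counter

def summarize(records, dimension):
    """Count occurrences of one dimension across parsed log records
    and return values ranked by frequency (the anomaly candidates)."""
    counts = Counter(r[dimension] for r in records if dimension in r)
    return counts.most_common()

# Synthetic records standing in for query results.
records = [
    {"service": "auth-service", "error_code": "502"},
    {"service": "auth-service", "error_code": "504"},
    {"service": "cart-service", "error_code": "502"},
    {"service": "auth-service", "error_code": "502"},
]
print(summarize(records, "error_code"))  # leading entry: ("502", 3)
```

Running the same function over several dimensions (service, error code, host) is a quick way to spot which dimension concentrates the anomaly.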
When to Use It
- Investigate a sudden spike in errors or latency over a defined time window.
- Diagnose an incident using logs to identify root causes and affected services.
- Compare error distributions across services to spot anomaly dimensions.
- Create aggregation dashboards that visualize error counts by code, service, and region.
- Prepare remediation recommendations with concrete steps and templates for future runs.
Quick Start
- Step 1: Confirm project/logset/topic and time window.
- Step 2: Build the query with filters, parsed fields, and aggregations.
- Step 3: Execute the query and summarize top errors, anomaly dimensions, and remediation actions.
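The three steps above end in a report, and the Output Requirements section fixes what that report must contain. One hypothetical shape (the keys are illustrative, not a prescribed format):

```python
def make_finding(query, service_counts, remediation):
    """Package an investigation result so the query statement,
    affected services with counts, and remediation suggestions
    always travel together."""
    return {
        "query_statement": query,
        "affected_services": dict(service_counts),
        "remediation": remediation,
    }

finding = make_finding(
    query="status >= 500 | SELECT service, count(*) AS cnt GROUP BY service",
    service_counts=[("auth-service", 128), ("cart-service", 17)],
    remediation=["Add retry with backoff on upstream 502s"],
)
print(finding["affected_services"]["auth-service"])  # 128
```

A fixed shape like this also makes the findings easy to diff between investigation runs.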
Best Practices
- Always confirm project, logset, topic, and the exact time window before querying.
- Include relevant parsers for fields you need (service name, error code, host).
- Use precise filters and reason through anomaly dimensions to limit noise.
- Capture the query statement in outputs and reference it for audits.
- Provide concrete remediation suggestions and save reusable query templates.
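Confirming the exact time window is easiest when the bounds are computed explicitly rather than eyeballed. A sketch using UTC timestamps; epoch milliseconds is a common unit for log-service APIs, but confirm the unit your API expects:

```python
from datetime import datetime, timedelta, timezone

def time_window(end=None, minutes=30):
    """Return (start, end) of a lookback window as epoch milliseconds.
    Many log-service APIs take millisecond timestamps; verify the unit
    for the API you are calling."""
    end = end or datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    to_ms = lambda dt: int(dt.timestamp() * 1000)
    return to_ms(start), to_ms(end)

end = datetime(2024, 5, 1, 12, 45, tzinfo=timezone.utc)
start_ms, end_ms = time_window(end=end, minutes=30)
print(end_ms - start_ms)  # 1_800_000 ms == 30 minutes
```

Pinning the window in UTC up front avoids the classic audit problem of a query that cannot be re-run because its bounds were never recorded.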
Example Use Cases
- Incident: surge of 5xx errors on auth-service between 12:15-12:45 UTC.
- Time-range analysis of cart-service logs to identify a bottleneck causing delays.
- Aggregation dashboard snippet showing top error codes by service and region.
- Root-cause hypothesis generation with counts of error types and affected endpoints.
- Remediation plan: adjust log sampling rate and implement retry logic.
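The first incident above (the 5xx surge on auth-service between 12:15 and 12:45 UTC) can be reproduced offline against exported records. A sketch with synthetic data; the `ts`, `status`, and `endpoint` fields are assumptions:

```python
from collections import Counter
from datetime import datetime, timezone

def surge_report(records, start, end):
    """Restrict 5xx records to the incident window, then count by
    endpoint to rank root-cause candidates."""
    in_window = [
        r for r in records
        if start <= r["ts"] <= end and 500 <= r["status"] < 600
    ]
    return Counter(r["endpoint"] for r in in_window).most_common()

utc = timezone.utc
records = [
    {"ts": datetime(2024, 5, 1, 12, 20, tzinfo=utc), "status": 502, "endpoint": "/login"},
    {"ts": datetime(2024, 5, 1, 12, 30, tzinfo=utc), "status": 502, "endpoint": "/login"},
    {"ts": datetime(2024, 5, 1, 12, 40, tzinfo=utc), "status": 504, "endpoint": "/token"},
    {"ts": datetime(2024, 5, 1, 13, 10, tzinfo=utc), "status": 500, "endpoint": "/login"},  # outside window
]
report = surge_report(
    records,
    start=datetime(2024, 5, 1, 12, 15, tzinfo=utc),
    end=datetime(2024, 5, 1, 12, 45, tzinfo=utc),
)
print(report)  # [("/login", 2), ("/token", 1)]
```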