# Log Analysis

Techniques for analyzing logs across different platforms and formats.
## When to Use This Skill

Use this skill when:

- Investigating errors or issues
- Searching for patterns in logs
- Correlating events across systems
- Building log queries
## Universal Patterns
### Search Basics

```bash
# Simple search
grep "ERROR" app.log

# Case insensitive
grep -i "error" app.log

# With line numbers
grep -n "ERROR" app.log

# With context (3 lines before/after)
grep -C 3 "ERROR" app.log
grep -B 3 -A 3 "ERROR" app.log

# Count occurrences
grep -c "ERROR" app.log
```
### Multiple Patterns

```bash
# OR - match any
grep -E "ERROR|WARN|FATAL" app.log

# AND - match all (same line)
grep "ERROR" app.log | grep "database"

# NOT - exclude pattern
grep "ERROR" app.log | grep -v "expected"
```
### Time-Based Filtering

```bash
# Current hour (matches the hour prefix of the timestamp)
grep "$(date '+%Y-%m-%d %H')" app.log

# Date range (prints nothing if the start pattern never appears)
awk '/2024-01-15 10:00/,/2024-01-15 11:00/' app.log

# Recent entries (tail)
tail -1000 app.log | grep "ERROR"
```
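The `/start/,/end/` range above is all-or-nothing: if the exact start pattern never occurs, awk prints nothing. A more forgiving sketch, assuming ISO-style `YYYY-MM-DD HH:MM:SS` timestamps in the first two fields, compares timestamps as strings (lexicographic order matches time order for this format):

```bash
# Print lines whose timestamp falls in [start, end]; string comparison
# is safe because ISO-style timestamps sort lexicographically.
awk -v start="2024-01-15 10:00" -v end="2024-01-15 11:00" \
    '{ts = $1 " " $2} ts >= start && ts <= end' app.log
```

Because the bounds are plain strings, partial timestamps like `10:00` work as prefixes without needing an exact matching line.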
## Kubernetes Logs
### Pod Logs

```bash
# Current logs
kubectl logs <pod>

# Previous container (after crash)
kubectl logs <pod> --previous

# Follow live
kubectl logs -f <pod>

# Last N lines
kubectl logs --tail=100 <pod>

# Since time
kubectl logs --since=1h <pod>

# All containers in pod
kubectl logs <pod> --all-containers

# By label
kubectl logs -l app=nginx --all-containers
```
### Multi-Pod Logs

```bash
# All pods with label
kubectl logs -l app=myapp --all-containers --prefix

# Stern (better multi-pod tailing)
stern myapp -n namespace

# With regex
stern "myapp-.*" --since 1h
```
### Search in Logs

```bash
# Grep in kubectl logs
kubectl logs <pod> | grep -i error

# With timestamps
kubectl logs --timestamps <pod> | grep "ERROR"

# Recent errors
kubectl logs --since=1h <pod> | grep -E "ERROR|Exception"
```
## Docker Logs

```bash
# Basic logs
docker logs <container>

# Follow
docker logs -f <container>

# Tail
docker logs --tail 100 <container>

# Since time
docker logs --since 1h <container>

# With timestamps
docker logs -t <container>

# Search (docker logs writes to both stdout and stderr)
docker logs <container> 2>&1 | grep "ERROR"
```
## JSON Logs (jq)
### Basic Parsing

```bash
# Pretty print
cat log.json | jq .

# Extract field
cat log.json | jq '.message'

# Multiple fields
cat log.json | jq '{time: .timestamp, msg: .message}'
```
### Filtering

```bash
# Filter by field value
cat log.json | jq 'select(.level == "error")'

# Contains string
cat log.json | jq 'select(.message | contains("database"))'

# Multiple conditions
cat log.json | jq 'select(.level == "error" and .service == "api")'
```
### JSONL (JSON Lines)

```bash
# Each line is JSON
cat logs.jsonl | jq -c 'select(.level == "error")'

# Extract field from each line
cat logs.jsonl | jq -r '.message'

# Count by level
cat logs.jsonl | jq -r '.level' | sort | uniq -c | sort -rn
```
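For aggregations beyond per-line filters, `jq -s` slurps all JSONL records into one array so you can group and count inside jq. A sketch counting errors per service; the `.service` field name is an assumption, so substitute your own key:

```bash
# Count "error"-level records per service, most frequent first.
# -s slurps lines into an array; group_by sorts and buckets by key.
jq -s -r 'map(select(.level == "error"))
          | group_by(.service)
          | map("\(length) \(.[0].service)")
          | .[]' logs.jsonl | sort -rn
```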
## CloudWatch Logs

```bash
# Tail logs
aws logs tail /aws/lambda/function-name --follow

# Since time
aws logs tail /aws/lambda/function-name --since 1h

# Filter pattern
aws logs tail /aws/lambda/function-name --filter-pattern "ERROR"
```
### CloudWatch Insights

```bash
# Start query (date -d is GNU date; on macOS use: date -v-1H +%s)
aws logs start-query \
  --log-group-name /aws/lambda/function-name \
  --start-time $(date -d '1 hour ago' +%s) \
  --end-time $(date +%s) \
  --query-string '
    fields @timestamp, @message
    | filter @message like /ERROR/
    | sort @timestamp desc
    | limit 50
  '

# Get results (queries run asynchronously; re-run until status is Complete)
aws logs get-query-results --query-id <query-id>
```
## Common Patterns
### Error Aggregation

```bash
# Top error messages
grep "ERROR" app.log | sort | uniq -c | sort -rn | head -20

# Errors per hour (assumes "YYYY-MM-DD HH:MM:SS" at line start)
grep "ERROR" app.log | awk '{print $1, $2}' | cut -d: -f1 | uniq -c
```
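Counting raw lines undercounts recurring errors, because a unique timestamp makes every line unique. One variant, assuming the same `YYYY-MM-DD HH:MM:SS message` layout as above, strips the timestamp prefix first so identical messages collapse into one bucket:

```bash
# Strip "2024-01-15 10:00:01 "-style prefixes before counting,
# so repeats of the same message aggregate properly.
grep "ERROR" app.log | sed -E 's/^[0-9-]+ [0-9:]+ //' | \
    sort | uniq -c | sort -rn | head
```

The same idea extends to other per-line noise (request IDs, PIDs): normalize it away before `sort | uniq -c`.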
### Response Time Analysis

```bash
# Extract response times (assuming format: "response_time=123ms")
grep -oP 'response_time=\K\d+' app.log | \
  awk '{sum+=$1; count++} END {print "avg:", sum/count, "count:", count}'

# Slow requests (>=1000ms)
grep -P 'response_time=\d{4,}' app.log
```
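Averages hide outliers; a nearest-rank p95 is often more telling. A sketch under the same `response_time=123ms` format assumption as above:

```bash
# p95 latency: sort values numerically, then index 95% of the way in
grep -oP 'response_time=\K\d+' app.log | sort -n | \
    awk '{v[NR]=$1} END {i=int(NR*0.95); if (i<1) i=1; print "p95:", v[i]}'
```

Swap `0.95` for `0.99` or `0.5` to get p99 or the median from the same pipeline.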
### Status Code Analysis

```bash
# Count by status code
grep -oP 'status=\K\d+' app.log | sort | uniq -c | sort -rn

# 5xx errors
grep -P 'status=5\d\d' app.log
```
### IP/User Analysis

```bash
# Top IPs
grep -oP '\d+\.\d+\.\d+\.\d+' access.log | sort | uniq -c | sort -rn | head -10

# Requests per user
grep -oP 'user=\K\S+' app.log | sort | uniq -c | sort -rn
```
## Correlation Techniques
### By Request ID

```bash
# Find all logs for a request
grep "request_id=abc123" *.log

# Across pods
kubectl logs -l app=myapp --all-containers | grep "request_id=abc123"
```
### By Timestamp

```bash
# Events in a short window (from the first 10:30:4x line
# through the first 10:30:5x line)
awk '/2024-01-15 10:30:4/,/2024-01-15 10:30:5/' app.log
```
### Across Services

```bash
# Find related events
for service in api worker database; do
  echo "=== $service ==="
  grep "order_id=12345" /var/log/$service.log
done
```
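When each service's log lines start with a sortable timestamp (e.g. `YYYY-MM-DD HH:MM:SS`), the per-service matches can also be merged into a single time-ordered view instead of separate sections; the paths below are illustrative:

```bash
# -h suppresses filename prefixes so sort sees the timestamp first
grep -h "order_id=12345" /var/log/{api,worker,database}.log | sort
```

Drop `-h` if you want to keep the `filename:` prefix and eyeball which service each line came from (at the cost of timestamp-order sorting).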
## Log Formats
### Apache/Nginx Access Logs

```bash
# Status codes (field 9 in the common/combined log format)
awk '{print $9}' access.log | sort | uniq -c | sort -rn

# Response times (if configured as the last field)
awk '{print $NF}' access.log | sort -n | tail -20

# Top URLs
awk '{print $7}' access.log | sort | uniq -c | sort -rn | head -10
```
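The same field positions give a quick request-rate view. Assuming the default combined format, field 4 holds the timestamp as `[15/Jan/2024:10:30:45`; truncating it to minute resolution yields requests per minute:

```bash
# substr(.., 2, 17) drops the leading "[" and the trailing seconds,
# leaving "15/Jan/2024:10:30"; uniq -c then counts lines per minute.
awk '{print substr($4, 2, 17)}' access.log | uniq -c | tail -20
```

`uniq -c` without a preceding `sort` is fine here because access logs are already in chronological order.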
### Syslog

```bash
# By service
grep "sshd" /var/log/syslog

# Failed logins
grep "Failed password" /var/log/auth.log

# By severity
grep -E "(error|crit|alert|emerg)" /var/log/syslog
```
## Quick Reference

```bash
# Errors in the current hour
grep "$(date '+%Y-%m-%d %H')" app.log | grep -i error

# Top 10 error messages
grep -i error app.log | sort | uniq -c | sort -rn | head -10

# JSON logs: filter and format
cat logs.jsonl | jq -r 'select(.level=="error") | "\(.timestamp) \(.message)"'

# Kubernetes: errors across all pods
kubectl logs -l app=myapp --all-containers --since=1h | grep -i error

# AWS CloudWatch: recent errors
aws logs tail /aws/lambda/func --since 1h --filter-pattern "ERROR"
```
## Related Skills

- k8s-debug: For Kubernetes-specific log analysis
- docker-ops: For Docker log management
- incident-response: For correlating logs during incidents
## Source

https://github.com/agenticdevops/devops-execution-engine/blob/main/skills/log-analysis/SKILL.md

## Overview
Log Analysis provides cross-platform techniques for inspecting logs across formats and systems. It covers universal patterns (search basics, multi-pattern queries, time filtering) and per-platform tips for Kubernetes, Docker, CloudWatch, and JSON logs.
## How This Skill Works
It relies on common CLI tools such as grep, awk, and jq to perform searches, filtering, and field extraction. Platform-specific sections show how to gather logs (kubectl logs, docker logs, aws logs) and apply structured queries to uncover errors and patterns.
## When to Use It

- Investigating errors or issues
- Searching for patterns in logs
- Correlating events across systems
- Building log queries
- Auditing security or access events
## Quick Start

1. Identify the log source (local file, Kubernetes pod, Docker container, or JSON logs) and choose the appropriate tool (grep, `kubectl logs`, `docker logs`, or jq).
2. Run a basic search for ERROR or another keyword (e.g., `grep -i 'error' app.log` or `kubectl logs <pod> | grep -i error`).
3. Narrow results with time filters, context, or aggregation (`tail -n`, `--since`, or jq filters), and validate findings by repeating the query on related sources.
## Best Practices

- Start with simple searches, then add context with `-C`, `-B`, or `-A` to view surrounding lines
- Use `-E` for multiple patterns and pipes for AND/NOT semantics (`grep -E 'ERROR|WARN|FATAL'`)
- Filter by time with explicit timestamps or range expressions (date ranges, `--since`, or an awk range)
- Parse JSON logs with jq to normalize fields before querying (`cat log.json | jq '.message'`)
- Validate findings by cross-referencing sources (e.g., pod, container, and host logs) and reproducing the query
## Example Use Cases

- `grep "ERROR" app.log`
- `kubectl logs <pod> | grep -i error`
- `docker logs <container> 2>&1 | grep "ERROR"`
- `cat log.json | jq '.message'`
- `aws logs tail /aws/lambda/function-name --filter-pattern "ERROR"`