# Log Analysis

Techniques for analyzing logs across different platforms and formats.
## When to Use This Skill

Use this skill when:

- Investigating errors or issues
- Searching for patterns in logs
- Correlating events across systems
- Building log queries
## Universal Patterns
### Search Basics

```bash
# Simple search
grep "ERROR" app.log

# Case insensitive
grep -i "error" app.log

# With line numbers
grep -n "ERROR" app.log

# With context (3 lines before/after)
grep -C 3 "ERROR" app.log
grep -B 3 -A 3 "ERROR" app.log

# Count occurrences
grep -c "ERROR" app.log
```
### Multiple Patterns

```bash
# OR - match any
grep -E "ERROR|WARN|FATAL" app.log

# AND - match all (same line)
grep "ERROR" app.log | grep "database"

# NOT - exclude pattern
grep "ERROR" app.log | grep -v "expected"
```
### Time-Based Filtering

```bash
# Current hour (matches the hour prefix of the timestamp)
grep "$(date '+%Y-%m-%d %H')" app.log

# Date range (prints nothing if the start pattern never appears)
awk '/2024-01-15 10:00/,/2024-01-15 11:00/' app.log

# Recent entries (tail)
tail -1000 app.log | grep "ERROR"
```
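The `/start/,/end/` range above is all-or-nothing: if the exact start pattern never occurs, awk prints nothing. A more forgiving sketch, assuming ISO-style `YYYY-MM-DD HH:MM:SS` timestamps in the first two fields, compares timestamps as strings (lexicographic order matches time order for this format):

```bash
# Print lines whose timestamp falls in [start, end]; string comparison
# is safe because ISO-style timestamps sort lexicographically.
awk -v start="2024-01-15 10:00" -v end="2024-01-15 11:00" \
    '{ts = $1 " " $2} ts >= start && ts <= end' app.log
```

Because the bounds are plain strings, partial timestamps like `10:00` work as prefixes without needing an exact matching line.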
## Kubernetes Logs
### Pod Logs

```bash
# Current logs
kubectl logs <pod>

# Previous container (after crash)
kubectl logs <pod> --previous

# Follow live
kubectl logs -f <pod>

# Last N lines
kubectl logs --tail=100 <pod>

# Since time
kubectl logs --since=1h <pod>

# All containers in pod
kubectl logs <pod> --all-containers

# By label
kubectl logs -l app=nginx --all-containers
```
### Multi-Pod Logs

```bash
# All pods with label
kubectl logs -l app=myapp --all-containers --prefix

# Stern (better multi-pod tailing)
stern myapp -n namespace

# With regex
stern "myapp-.*" --since 1h
```
### Search in Logs

```bash
# Grep in kubectl logs
kubectl logs <pod> | grep -i error

# With timestamps
kubectl logs --timestamps <pod> | grep "ERROR"

# Recent errors
kubectl logs --since=1h <pod> | grep -E "ERROR|Exception"
```
## Docker Logs

```bash
# Basic logs
docker logs <container>

# Follow
docker logs -f <container>

# Tail
docker logs --tail 100 <container>

# Since time
docker logs --since 1h <container>

# With timestamps
docker logs -t <container>

# Search (docker logs writes to both stdout and stderr)
docker logs <container> 2>&1 | grep "ERROR"
```
## JSON Logs (jq)
### Basic Parsing

```bash
# Pretty print
cat log.json | jq .

# Extract field
cat log.json | jq '.message'

# Multiple fields
cat log.json | jq '{time: .timestamp, msg: .message}'
```
### Filtering

```bash
# Filter by field value
cat log.json | jq 'select(.level == "error")'

# Contains string
cat log.json | jq 'select(.message | contains("database"))'

# Multiple conditions
cat log.json | jq 'select(.level == "error" and .service == "api")'
```
### JSONL (JSON Lines)

```bash
# Each line is JSON
cat logs.jsonl | jq -c 'select(.level == "error")'

# Extract field from each line
cat logs.jsonl | jq -r '.message'

# Count by level
cat logs.jsonl | jq -r '.level' | sort | uniq -c | sort -rn
```
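For aggregations beyond per-line filters, `jq -s` slurps all JSONL records into one array so you can group and count inside jq. A sketch counting errors per service; the `.service` field name is an assumption, so substitute your own key:

```bash
# Count "error"-level records per service, most frequent first.
# -s slurps lines into an array; group_by sorts and buckets by key.
jq -s -r 'map(select(.level == "error"))
          | group_by(.service)
          | map("\(length) \(.[0].service)")
          | .[]' logs.jsonl | sort -rn
```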
## CloudWatch Logs

```bash
# Tail logs
aws logs tail /aws/lambda/function-name --follow

# Since time
aws logs tail /aws/lambda/function-name --since 1h

# Filter pattern
aws logs tail /aws/lambda/function-name --filter-pattern "ERROR"
```
### CloudWatch Insights

```bash
# Start query (date -d is GNU date; on macOS use: date -v-1H +%s)
aws logs start-query \
  --log-group-name /aws/lambda/function-name \
  --start-time $(date -d '1 hour ago' +%s) \
  --end-time $(date +%s) \
  --query-string '
    fields @timestamp, @message
    | filter @message like /ERROR/
    | sort @timestamp desc
    | limit 50
  '

# Get results (queries run asynchronously; re-run until status is Complete)
aws logs get-query-results --query-id <query-id>
```
## Common Patterns
### Error Aggregation

```bash
# Top error messages
grep "ERROR" app.log | sort | uniq -c | sort -rn | head -20

# Errors per hour (assumes "YYYY-MM-DD HH:MM:SS" at line start)
grep "ERROR" app.log | awk '{print $1, $2}' | cut -d: -f1 | uniq -c
```
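Counting raw lines undercounts recurring errors, because a unique timestamp makes every line unique. One variant, assuming the same `YYYY-MM-DD HH:MM:SS message` layout as above, strips the timestamp prefix first so identical messages collapse into one bucket:

```bash
# Strip "2024-01-15 10:00:01 "-style prefixes before counting,
# so repeats of the same message aggregate properly.
grep "ERROR" app.log | sed -E 's/^[0-9-]+ [0-9:]+ //' | \
    sort | uniq -c | sort -rn | head
```

The same idea extends to other per-line noise (request IDs, PIDs): normalize it away before `sort | uniq -c`.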
### Response Time Analysis

```bash
# Extract response times (assuming format: "response_time=123ms")
grep -oP 'response_time=\K\d+' app.log | \
  awk '{sum+=$1; count++} END {print "avg:", sum/count, "count:", count}'

# Slow requests (>=1000ms)
grep -P 'response_time=\d{4,}' app.log
```
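Averages hide outliers; a nearest-rank p95 is often more telling. A sketch under the same `response_time=123ms` format assumption as above:

```bash
# p95 latency: sort values numerically, then index 95% of the way in
grep -oP 'response_time=\K\d+' app.log | sort -n | \
    awk '{v[NR]=$1} END {i=int(NR*0.95); if (i<1) i=1; print "p95:", v[i]}'
```

Swap `0.95` for `0.99` or `0.5` to get p99 or the median from the same pipeline.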
### Status Code Analysis

```bash
# Count by status code
grep -oP 'status=\K\d+' app.log | sort | uniq -c | sort -rn

# 5xx errors
grep -P 'status=5\d\d' app.log
```
### IP/User Analysis

```bash
# Top IPs
grep -oP '\d+\.\d+\.\d+\.\d+' access.log | sort | uniq -c | sort -rn | head -10

# Requests per user
grep -oP 'user=\K\S+' app.log | sort | uniq -c | sort -rn
```
## Correlation Techniques
### By Request ID

```bash
# Find all logs for a request
grep "request_id=abc123" *.log

# Across pods
kubectl logs -l app=myapp --all-containers | grep "request_id=abc123"
```
### By Timestamp

```bash
# Events in a short window (from the first 10:30:4x line
# through the first 10:30:5x line)
awk '/2024-01-15 10:30:4/,/2024-01-15 10:30:5/' app.log
```
### Across Services

```bash
# Find related events
for service in api worker database; do
  echo "=== $service ==="
  grep "order_id=12345" /var/log/$service.log
done
```
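When each service's log lines start with a sortable timestamp (e.g. `YYYY-MM-DD HH:MM:SS`), the per-service matches can also be merged into a single time-ordered view instead of separate sections; the paths below are illustrative:

```bash
# -h suppresses filename prefixes so sort sees the timestamp first
grep -h "order_id=12345" /var/log/{api,worker,database}.log | sort
```

Drop `-h` if you want to keep the `filename:` prefix and eyeball which service each line came from (at the cost of timestamp-order sorting).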
## Log Formats
### Apache/Nginx Access Logs

```bash
# Status codes (field 9 in the common/combined log format)
awk '{print $9}' access.log | sort | uniq -c | sort -rn

# Response times (if configured as the last field)
awk '{print $NF}' access.log | sort -n | tail -20

# Top URLs
awk '{print $7}' access.log | sort | uniq -c | sort -rn | head -10
```
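The same field positions give a quick request-rate view. Assuming the default combined format, field 4 holds the timestamp as `[15/Jan/2024:10:30:45`; truncating it to minute resolution yields requests per minute:

```bash
# substr(.., 2, 17) drops the leading "[" and the trailing seconds,
# leaving "15/Jan/2024:10:30"; uniq -c then counts lines per minute.
awk '{print substr($4, 2, 17)}' access.log | uniq -c | tail -20
```

`uniq -c` without a preceding `sort` is fine here because access logs are already in chronological order.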
### Syslog

```bash
# By service
grep "sshd" /var/log/syslog

# Failed logins
grep "Failed password" /var/log/auth.log

# By severity
grep -E "(error|crit|alert|emerg)" /var/log/syslog
```
## Quick Reference

```bash
# Errors in the current hour
grep "$(date '+%Y-%m-%d %H')" app.log | grep -i error

# Top 10 error messages
grep -i error app.log | sort | uniq -c | sort -rn | head -10

# JSON logs: filter and format
cat logs.jsonl | jq -r 'select(.level=="error") | "\(.timestamp) \(.message)"'

# Kubernetes: errors across all pods
kubectl logs -l app=myapp --all-containers --since=1h | grep -i error

# AWS CloudWatch: recent errors
aws logs tail /aws/lambda/func --since 1h --filter-pattern "ERROR"
```
## Related Skills

- k8s-debug: For Kubernetes-specific log analysis
- docker-ops: For Docker log management
- incident-response: For correlating logs during incidents
## Source

https://github.com/agenticdevops/devops-execution-engine/blob/main/skills/log-analysis/SKILL.md

## Overview
Log Analysis provides cross-platform techniques for inspecting logs across formats and systems. It covers universal patterns (search basics, multi-pattern queries, time filtering) and per-platform tips for Kubernetes, Docker, CloudWatch, and JSON logs.
## How This Skill Works
It relies on common CLI tools such as grep, awk, and jq to perform searches, filtering, and field extraction. Platform-specific sections show how to gather logs (kubectl logs, docker logs, aws logs) and apply structured queries to uncover errors and patterns.
## When to Use It

- Investigating errors or issues
- Searching for patterns in logs
- Correlating events across systems
- Building log queries
- Auditing security or access events
## Quick Start

1. Identify the log source (local file, Kubernetes pod, Docker container, or JSON logs) and choose the appropriate tool (grep, `kubectl logs`, `docker logs`, or jq).
2. Run a basic search for ERROR or another keyword (e.g., `grep -i 'error' app.log` or `kubectl logs <pod> | grep -i error`).
3. Narrow results with time filters, context, or aggregation (`tail -n`, `--since`, or jq filters), and validate findings by repeating the query on related sources.
## Best Practices

- Start with simple searches, then add context with `-C`, `-B`, or `-A` to view surrounding lines
- Use `-E` for multiple patterns and pipes for AND/NOT semantics (`grep -E 'ERROR|WARN|FATAL'`)
- Filter by time with explicit timestamps or range expressions (date ranges, `--since`, or an awk range)
- Parse JSON logs with jq to normalize fields before querying (`cat log.json | jq '.message'`)
- Validate findings by cross-referencing sources (e.g., pod, container, and host logs) and reproducing the query
## Example Use Cases

- `grep "ERROR" app.log`
- `kubectl logs <pod> | grep -i error`
- `docker logs <container> 2>&1 | grep "ERROR"`
- `cat log.json | jq '.message'`
- `aws logs tail /aws/lambda/function-name --filter-pattern "ERROR"`