How can I minimize performance impact when adding telemetry?

Use non-blocking, asynchronous log writes; sample or batch entries when volume is high; keep the log payload compact so that work on the critical path stays fast.
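One way to keep log writes off the critical path, sketched in Python with the standard library's `QueueHandler` and `QueueListener`: request code only enqueues records, and a background thread does the I/O. The names `JsonFormatter`, `SampleFilter`, `build_async_logger`, the `sample_every` parameter, and the logger name are illustrative assumptions, not part of the skill.

```python
import json
import logging
import logging.handlers
import queue


class JsonFormatter(logging.Formatter):
    """Render each record as one compact JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {"level": record.levelname.lower(), "message": record.getMessage()}
        return json.dumps(entry, separators=(",", ":"))


class SampleFilter(logging.Filter):
    """Keep every Nth record below WARNING; always keep warnings and errors."""

    def __init__(self, every: int) -> None:
        super().__init__()
        self.every = every
        self.seen = 0  # not thread-safe; good enough for a sketch

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True
        self.seen += 1
        return self.seen % self.every == 0


def build_async_logger(sink: logging.Handler, sample_every: int = 1):
    """Return (logger, listener). Call listener.stop() at shutdown to flush."""
    log_queue: queue.Queue = queue.Queue(-1)  # unbounded: enqueueing never blocks
    sink.setFormatter(JsonFormatter())
    listener = logging.handlers.QueueListener(log_queue, sink)
    listener.start()  # a background thread performs the actual writes

    enqueue = logging.handlers.QueueHandler(log_queue)
    enqueue.addFilter(SampleFilter(sample_every))

    logger = logging.getLogger("app.telemetry")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(enqueue)
    return logger, listener
```

With `sample_every=2`, every second info-level record is dropped before it reaches the queue, while warnings and errors always pass through.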

How do agents query logs by a specific request?

Include a per-request requestId in all logs and provide examples like grep "requestId.*abc123" logs/app.json | jq . to fetch relevant entries.

How should development-only access to logs be secured?

Gate the dev log endpoint behind an environment check (e.g., NODE_ENV=development), restrict access to trusted agents or IPs, and document the access method in CLAUDE.md/AGENTS.md.

agent-telemetry

npx machina-cli add skill petekp/claude-code-setup/agent-telemetry --openclaw

Agent Telemetry

Make application runtime behavior queryable by coding agents through structured logging and telemetry endpoints.

Core Problem

Coding agents debugging issues often can't answer "what actually happened at runtime?" because:

Logs don't exist, or are unstructured console.log noise
Logs exist but there's no documented way for agents to query them
Agent docs (CLAUDE.md, AGENTS.md) don't mention how to access telemetry

Workflow

Phase 1: Audit Current State

Determine what telemetry already exists.

1. Check for logging infrastructure:

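# Find logging configuration and usage
# (the brace list sits outside the quotes so the shell expands it into one --include per extension)
grep -r "winston\|pino\|bunyan\|log4j\|slog\|Logger\|logging\.config" --include="*."{ts,js,py,rb,go,rs} -l .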

# Find log output configuration
grep -r "LOG_LEVEL\|LOG_FORMAT\|LOG_FILE\|OTEL_\|SENTRY_DSN" .env* config/ -l 2>/dev/null

2. Check for existing telemetry endpoints:

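# Health/debug/metrics endpoints
# (the brace list sits outside the quotes so the shell expands it into one --include per extension)
grep -r "health\|metrics\|debug\|status\|readiness\|liveness" --include="*."{ts,js,py,rb,go} -l src/ app/ 2>/dev/null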

3. Check agent docs for log access instructions:

# Do agent docs mention logs?
grep -ri "log\|telemetry\|debug\|observ" CLAUDE.md AGENTS.md .claude/*.md .cursor/*.md 2>/dev/null

4. Classify the result:

Finding	Action
No structured logging exists	Go to Phase 2
Logging exists but no agent access	Go to Phase 3
Logging + access exists but undocumented	Go to Phase 4
Everything in place	Validate and suggest improvements

Phase 2: Add Structured Logging

If no structured logging exists, add it. See references/logging-setup.md for framework-specific patterns.

Principles:

Use structured JSON logs, not string interpolation
Include correlation IDs for request tracing
Log at boundaries: incoming requests, outgoing calls, errors, state transitions
Use consistent field names: timestamp, level, message, requestId, userId, duration, error
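The principles above can be sketched with Python's standard `logging` module: a formatter that emits the consistent field names, with the correlation ID carried in a context variable so every log line in a request picks it up. `StructuredFormatter` and `request_id_var` are hypothetical names, and mapping Python's WARNING level to `warn` is an assumption made to match the schema used in this document.

```python
import contextvars
import json
import logging
from datetime import datetime, timezone

# Set once per request (for example in middleware); "-" marks logs outside a request.
request_id_var = contextvars.ContextVar("request_id", default="-")

_LEVEL_NAMES = {logging.WARNING: "warn"}  # assumption: schema uses info|warn|error
_OPTIONAL_FIELDS = ("userId", "duration", "error")


class StructuredFormatter(logging.Formatter):
    """Emit one JSON object per record with consistent field names."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": _LEVEL_NAMES.get(record.levelno, record.levelname.lower()),
            "message": record.getMessage(),
            "requestId": request_id_var.get(),
        }
        # Optional fields arrive through logging's `extra=` argument.
        for field in _OPTIONAL_FIELDS:
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        return json.dumps(entry)
```

A call such as `logger.info("charged card", extra={"userId": "u1", "duration": 45})` would then carry the optional fields, provided a handler using this formatter is attached to the logger.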

Where to add logging (priority order):

Request/response middleware (every request gets logged)
Error handlers (unhandled errors get captured with context)
External service calls (DB queries, API calls, queue operations)
Business logic decision points (state transitions, authorization decisions)

Minimum viable logging — add a request logger middleware that captures:

{timestamp, level, requestId, method, path, statusCode, duration, userId?}

This single addition makes most debugging possible.
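A minimal sketch of such a request logger, written as WSGI middleware with only the Python standard library. The class name `RequestLogger`, the `X-Request-ID` header, the `app.request_id` and `app.user_id` environ keys, and the choice of milliseconds for `duration` are all assumptions made for illustration.

```python
import json
import sys
import time
import uuid
from datetime import datetime, timezone


class RequestLogger:
    """WSGI middleware that writes one JSON line per request."""

    def __init__(self, app, stream=sys.stdout):
        self.app = app
        self.stream = stream

    def __call__(self, environ, start_response):
        started = time.perf_counter()
        # Reuse an incoming correlation ID if present, otherwise mint one.
        request_id = environ.get("HTTP_X_REQUEST_ID") or uuid.uuid4().hex
        environ["app.request_id"] = request_id  # hypothetical key for downstream code
        seen = {"status": 500}  # reported if the app raises before responding

        def recording_start_response(status, headers, exc_info=None):
            seen["status"] = int(status.split(" ", 1)[0])
            return start_response(status, headers, exc_info)

        try:
            return self.app(environ, recording_start_response)
        finally:
            self._write(environ, request_id, seen["status"], started)

    def _write(self, environ, request_id, status, started):
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": "error" if status >= 500 else "info",
            "requestId": request_id,
            "method": environ.get("REQUEST_METHOD", ""),
            "path": environ.get("PATH_INFO", ""),
            "statusCode": status,
            # Milliseconds until the app returned; streaming the body is not included.
            "duration": round((time.perf_counter() - started) * 1000, 2),
        }
        user_id = environ.get("app.user_id")  # hypothetical key set by auth code
        if user_id is not None:
            entry["userId"] = user_id
        self.stream.write(json.dumps(entry) + "\n")
```

Wrapping an existing application is then a one-liner, for example `application = RequestLogger(application, stream=open("logs/app.json", "a"))`.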

Phase 3: Expose Logs to Agents

Agents need a way to query logs without SSH access or cloud console dashboards. Provide at least one of:

Option A: Log file (simplest)

Write structured logs to a known file path that agents can read directly.

# Agent reads recent errors
tail -100 logs/app.json | jq 'select(.level == "error")'

# Agent reads logs for a specific request
grep "requestId.*abc123" logs/app.json | jq .
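Where `jq` is not available, the same queries can be approximated in a few lines of Python. This is a sketch under the assumption that the file holds one JSON object per line; `query_logs` and its parameters are hypothetical names. Unlike the `grep` pattern above, it compares the parsed `requestId` exactly, so `abc123` does not also match `xabc1234`, and it filters first and then keeps the last N matches.

```python
import json
from collections import deque


def query_logs(lines, level=None, request_id=None, path=None, last=100):
    """Return the last `last` entries that match every filter that was given."""
    matches = deque(maxlen=last)  # keeps only the most recent matches
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON lines instead of failing
        if not isinstance(entry, dict):
            continue
        if level is not None and entry.get("level") != level:
            continue
        if request_id is not None and entry.get("requestId") != request_id:
            continue
        if path is not None and entry.get("path") != path:
            continue
        matches.append(entry)
    return list(matches)
```

Typical use would be `with open("logs/app.json") as f: errors = query_logs(f, level="error")`.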

Option B: Dev log endpoint (recommended for web apps)

Add a development-only endpoint that returns recent log entries with filtering.

GET /__dev/logs?level=error&last=50
GET /__dev/logs?path=/api/users&last=20
GET /__dev/logs?requestId=abc-123

This endpoint must:

Only be available in development (NODE_ENV=development or equivalent)
Return JSON array of log entries
Support filtering by level, path, time range, and requestId
Limit response size (default 100 entries)

See references/dev-endpoint.md for implementation patterns by framework.
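Independent of any framework, the core of such an endpoint can be sketched as a pure function that takes the query string, the recent entries, and the current environment name. Everything here is an illustrative assumption rather than the pattern in references/dev-endpoint.md: the function name `dev_logs_response`, the `MAX_LIMIT` cap, and returning 404 outside development. Time-range filtering is left out to keep the sketch short.

```python
import json
from urllib.parse import parse_qs

DEFAULT_LIMIT = 100  # default response size from the requirements above
MAX_LIMIT = 500      # hypothetical hard cap on `last`
_FILTERS = ("level", "path", "requestId")


def dev_logs_response(query_string, entries, environment):
    """Return (status_code, json_body) for GET /__dev/logs."""
    if environment != "development":
        # Behave as if the route does not exist outside development.
        return 404, json.dumps({"error": "not found"})

    params = {key: values[-1] for key, values in parse_qs(query_string).items()}
    try:
        limit = int(params.get("last", DEFAULT_LIMIT))
    except ValueError:
        return 400, json.dumps({"error": "last must be an integer"})
    limit = max(1, min(limit, MAX_LIMIT))

    # Keep entries that match every filter present in the query string.
    wanted = {name: params[name] for name in _FILTERS if name in params}
    matching = [
        entry for entry in entries
        if all(entry.get(name) == value for name, value in wanted.items())
    ]
    return 200, json.dumps(matching[-limit:])
```

A route handler in any framework could call this with the request's query string, an in-memory buffer of recent entries (for example a `collections.deque` with a `maxlen`), and the value of the environment variable the project uses to mark development.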

Option C: CLI query tool

Wrap log access in a script agents can execute:

# Query recent errors
./scripts/query-logs.sh --level error --last 50

# Query by request path
./scripts/query-logs.sh --path /api/users --since "5 minutes ago"
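The flags shown above could be backed by a small Python program that the shell script invokes. The sketch below covers argument parsing and the `--since` filter only; `parse_since`, `parse_timestamp`, `build_parser`, `matches`, and the `--file` flag are hypothetical names, and only phrases of the form "N seconds/minutes/hours/days ago" are understood. Timestamps without a timezone are assumed to be UTC.

```python
import argparse
import re
from datetime import datetime, timedelta, timezone

_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_since(text, now):
    """Turn a phrase like '5 minutes ago' into an absolute cutoff time."""
    match = re.fullmatch(r"\s*(\d+)\s+(second|minute|hour|day)s?\s+ago\s*", text)
    if match is None:
        raise ValueError(f"unsupported --since value: {text!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return now - timedelta(seconds=amount * _UNIT_SECONDS[unit])


def parse_timestamp(value):
    """Parse an ISO-8601 timestamp; a trailing 'Z' is rewritten for older Pythons."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed


def build_parser():
    parser = argparse.ArgumentParser(prog="query-logs")
    parser.add_argument("--level")
    parser.add_argument("--path")
    parser.add_argument("--since")
    parser.add_argument("--last", type=int, default=100)
    parser.add_argument("--file", default="logs/app.json")  # hypothetical flag
    return parser


def matches(entry, args, now):
    """Return True when a parsed log entry passes every filter in args."""
    if args.level and entry.get("level") != args.level:
        return False
    if args.path and entry.get("path") != args.path:
        return False
    if args.since and parse_timestamp(entry["timestamp"]) < parse_since(args.since, now):
        return False
    return True
```

A `main` function would read `args.file` line by line, parse each line as JSON, keep the entries for which `matches` returns true, and print the last `args.last` of them.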

Choose based on project context:

Project Type	Best Option
Next.js / Express / Rails with local dev	Option B (dev endpoint)
CLI tool or background worker	Option A (log file)
Docker-based development	Option A (mounted log volume) or Option C
Monorepo with multiple services	Option C (unified query script)

Phase 4: Document in Agent Docs

This is critical. Without documentation, agents won't know telemetry exists.

Update CLAUDE.md (or equivalent agent doc) with a Debugging section:

## Debugging

### Querying Application Logs

Structured JSON logs are available at [location].

**Quick commands:**

```bash
# View recent errors
[command to view errors]

# View logs for a specific endpoint
[command to filter by path]

# View logs for a specific request
[command to filter by request ID]

# View logs from the last N minutes
[command to filter by time]
```

Log format:

{
  "timestamp": "ISO-8601",
  "level": "info|warn|error",
  "message": "Human-readable description",
  "requestId": "correlation-id",
  "method": "GET",
  "path": "/api/resource",
  "statusCode": 200,
  "duration": 45
}

Common debugging workflows:

User reports error → query by time range and error level
Flaky test → query by endpoint path during test run
Performance issue → query by path, sort by duration


Key rules for the documentation:

Include copy-pasteable commands (agents execute commands rather than read prose)
Show the log schema so agents know which fields to filter on
List 3-4 common debugging workflows with exact commands
Mention where the log config lives, for agents that need to adjust log levels

Phase 5: Validate

Test the full loop:

1. Trigger a request: hit an endpoint or run an operation
2. Query the logs: use the documented method to find the log entry
3. Verify agent usability: can an agent find the relevant log in fewer than three commands?
4. Check error capture: trigger an error and verify it appears with full context

If any step fails, iterate on the logging or documentation.

Anti-Patterns

Anti-Pattern	Why It's Bad	Do Instead
console.log("here")	No structure, no context, no filtering	Structured JSON with consistent fields
Logs only in cloud dashboard	Agents can't access Datadog/CloudWatch	Local file or dev endpoint
Log everything at debug level	Too noisy, can't find signal	Log at boundaries, use appropriate levels
Logging sensitive data	PII in logs is a liability	Redact tokens, passwords, PII
No request correlation	Can't trace a request across log lines	Add requestId to every log entry
Docs say "check the logs" with no how	Agent doesn't know where or how	Exact commands with examples
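For the sensitive-data row, redaction can happen just before an entry is serialized. The sketch below masks values by key name, recursing into nested objects and lists. The key list and the `[REDACTED]` placeholder are assumptions to adapt per project, and key-based masking does not catch PII embedded in free-text messages.

```python
# Hypothetical, non-exhaustive list of key names whose values must never be logged.
SENSITIVE_KEYS = {"password", "token", "authorization", "cookie", "secret", "api_key"}


def redact(value):
    """Return a copy of a log entry with sensitive values masked."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value
```

Because `redact` returns a new structure, the original request data stays untouched for the application itself.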

Source

https://github.com/petekp/claude-code-setup/blob/main/skills/agent-telemetry/SKILL.md

Overview

Agent telemetry makes runtime behavior queryable by coding agents through structured logging and telemetry endpoints. Agents can then find out what actually happened at runtime without SSH access or cloud console dashboards, which makes debugging faster.

How This Skill Works

Start with an audit of existing logging and telemetry (Phase 1). If structured logging is missing, add it (Phase 2): emit JSON logs with consistent fields such as timestamp, level, message, requestId, userId, duration, and error, and log at the critical boundaries: incoming requests, outgoing calls, errors, and state transitions. Then expose the logs to agents (Phase 3) through a log file (e.g., logs/app.json) or a development-only endpoint such as /__dev/logs that returns recent, filtered log entries. Finally, document the access method in the agent docs (Phase 4) and validate the full loop (Phase 5).

When to Use It

When asked to add telemetry or make logs accessible to agents
When debugging is difficult because logs are unstructured or noisy
When agent docs (CLAUDE.md, AGENTS.md) lack instructions for querying application logs
When setting up logging infrastructure for a new or existing web application
When an agent needs to understand runtime behavior but has no way to query logs

Quick Start

Step 1: Audit current state using Phase 1 commands to locate logging config, telemetry endpoints, and agent documentation
Step 2: If no structured logging exists, implement Phase 2 by adding a request/response middleware and emitting logs as JSON with fields like timestamp, level, requestId, method, path, statusCode, duration, userId
Step 3: Expose logs to agents via a log file or a development-only endpoint (disabled outside development), and verify agent access with sample queries

Best Practices

Use structured JSON logs with consistent fields: timestamp, level, message, requestId, userId, duration, error
Log at key boundaries: request/response, external service calls, errors, and state transitions
Include correlation IDs (requestId) for end-to-end tracing and cross-service correlation
Expose logs via a log file or a development-only endpoint gated by environment (e.g., NODE_ENV=development)
Document how agents can query logs in CLAUDE.md and AGENTS.md and provide clear examples

Example Use Cases

Add a request logger middleware in a Node/Express app that outputs JSON lines for every incoming HTTP request with fields like timestamp, level, requestId, method, path, statusCode, duration, userId
Instrument services with OpenTelemetry and structured JSON logs to keep field names and correlation IDs consistent across services
Create a development-only endpoint like GET /__dev/logs to fetch recent logs with filtering by level, path, time range, or requestId
Use tail -100 logs/app.json | jq 'select(.level == "error")' to surface recent errors for a given incident
Update CLAUDE.md and AGENTS.md to describe how agents should access logs and how to query telemetry data
