What is the core goal of CodeRecon?

To build a layered, actionable understanding of a codebase's architecture, from overview down to details, in support of security audits and rapid onboarding.

Which artifacts does CodeRecon produce?

A living Technology Map, architecture notes, and a phased drill-down of modules, functions, and security-critical logic.

Who should use CodeRecon?

Security auditors, code reviewers, and developers who need fast, reliable context on unfamiliar codebases.

Code Recon

npx machina-cli add skill eugenepyvovarov/mcpbundler-agent-skills-marketplace/code-recon --openclaw

CodeRecon - Deep Architectural Context Building

Build comprehensive architectural understanding through ultra-granular code analysis. Designed for security auditors, code reviewers, and developers who need to rapidly understand unfamiliar codebases before diving deep.

Overview

CodeRecon is a systematic approach to codebase reconnaissance that builds layered understanding from high-level architecture down to implementation details. Inspired by Trail of Bits' audit-context-building methodology.

Why CodeRecon?

Before you can find vulnerabilities, you need to understand:

How the system is architected
Where data flows
What the trust boundaries are
Where security-critical logic lives

This skill provides a structured methodology for building that context efficiently.

The Recon Pyramid

                    ┌─────────────┐
                    │   DETAILS   │  ← Implementation specifics
                   ─┼─────────────┼─
                  / │  FUNCTIONS  │  ← Key function analysis
                 /  ─┼─────────────┼─
                /   │   MODULES   │  ← Component relationships
               /    ─┼─────────────┼─
              /     │ ARCHITECTURE│  ← System structure
             /      ─┼─────────────┼─
            /       │   OVERVIEW  │  ← High-level understanding
           ─────────┴─────────────┴─────────

Start broad, go deep systematically.

Phase 1: Overview Reconnaissance

1.1 Project Identification

Gather basic project information:

# Check for documentation
ls -la README* ARCHITECTURE* SECURITY* CHANGELOG* docs/

# Identify build system
ls package.json Cargo.toml go.mod pyproject.toml Makefile

# Check for tests
ls -la test* spec* *_test* __tests__/

# Identify CI/CD
ls -la .github/workflows/ .gitlab-ci.yml Jenkinsfile .circleci/
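
The `ls` calls above print an error for every name that does not exist, which can bury the hits. A minimal sketch of a quieter check follows; the function name `list_present` is hypothetical and not part of the skill.

```shell
# Hypothetical helper: print only the marker files/directories that exist.
# Usage: list_present <project-dir> <name>...
list_present() {
    dir=$1
    shift
    for name in "$@"; do
        # -e is true for both files and directories
        if [ -e "$dir/$name" ]; then
            printf '%s\n' "$name"
        fi
    done
}

# Example (names taken from the checks above):
# list_present . package.json Cargo.toml go.mod pyproject.toml Makefile
```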

1.2 Technology Stack Detection

# Language distribution
find . -type f -name "*.py" | wc -l
find . -type f \( -name "*.js" -o -name "*.ts" \) | wc -l
find . -type f -name "*.go" | wc -l
find . -type f -name "*.rs" | wc -l
find . -type f -name "*.sol" | wc -l

# Framework indicators
grep -r "from flask" --include="*.py" | head -1
grep -r "from django" --include="*.py" | head -1
grep -r "express\|fastify" --include="*.js" | head -1
grep -r "anchor_lang" --include="*.rs" | head -1
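
Raw counts like those above are often inflated by vendored or generated code. The sketch below counts files per extension while pruning a few directories; `count_ext` and the list of pruned directory names are assumptions to adjust per project.

```shell
# Hypothetical helper: count files with a given extension, skipping
# directories that usually hold third-party or generated code.
# Usage: count_ext <project-dir> <extension-without-dot>
count_ext() {
    find "$1" \
        \( -name node_modules -o -name .git -o -name vendor -o -name target \) -prune \
        -o -type f -name "*.$2" -print | wc -l | tr -d ' '
}

# Example:
# count_ext . py
```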

1.3 Dependency Analysis

# Python dependencies
cat requirements.txt pyproject.toml setup.py 2>/dev/null | grep -E "^\s*[a-zA-Z]"

# Node.js dependencies
jq '.dependencies, .devDependencies' package.json

# Rust dependencies
grep -A 100 "\[dependencies\]" Cargo.toml

# Go dependencies
grep -E "^\s+[a-z]" go.mod
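
For the Technology Map it is often enough to have bare package names first and look up versions later. The following is a rough sketch for pip-style requirements files only; `req_names` is a hypothetical name, and the parsing ignores line continuations and URL-based requirements.

```shell
# Hypothetical helper: print bare package names from a requirements file.
# Usage: req_names <requirements-file>
req_names() {
    # drop comments and blank lines, skip pip options (-r, -e, --index-url),
    # then cut everything from the first extras/version/marker character
    sed -e 's/#.*//' \
        -e '/^[[:space:]]*$/d' \
        -e '/^[[:space:]]*-/d' \
        -e 's/^[[:space:]]*//' \
        -e 's/\[.*//' \
        -e 's/[<>=!~;[:space:]].*//' "$1"
}

# Example:
# req_names requirements.txt
```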

1.4 Create Technology Map

## Technology Map: [PROJECT NAME]

### Languages
| Language | Files | Lines | Primary Use |
|----------|-------|-------|-------------|
| Python | 150 | 25K | Backend API |
| TypeScript | 80 | 12K | Frontend |
| Solidity | 12 | 2K | Smart Contracts |

### Key Dependencies
| Package | Version | Purpose | Security Notes |
|---------|---------|---------|----------------|
| fastapi | 0.100.0 | Web framework | Recent CVEs: None |
| web3.py | 6.0.0 | Blockchain client | Check signing |
| pyjwt | 2.8.0 | JWT handling | Verify alg checks |

### Infrastructure
- Database: PostgreSQL 15
- Cache: Redis 7
- Message Queue: RabbitMQ
- Container: Docker + K8s
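
The Files and Lines columns of the Languages table can be filled in mechanically. This is a minimal sketch under the assumption that raw line counts are good enough (it counts comments and blank lines too); `lang_row` is a hypothetical name, and the Primary Use cell is left empty for the reviewer.

```shell
# Hypothetical helper: emit one row of the Languages table.
# Usage: lang_row <label> <project-dir> <extension-without-dot>
lang_row() {
    files=$(find "$2" -type f -name "*.$3" | wc -l | tr -d ' ')
    # concatenate all matching files and count their lines
    lines=$(find "$2" -type f -name "*.$3" -exec cat {} + | wc -l | tr -d ' ')
    printf '| %s | %s | %s | |\n' "$1" "$files" "$lines"
}

# Example:
# lang_row Python . py
```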

Phase 2: Architecture Mapping

2.1 Directory Structure Analysis

# Top-level structure
tree -L 2 -d

# Identify entry points
find . -name "main.py" -o -name "app.py" -o -name "index.ts" -o -name "main.go"

# Identify config
find . -name "config*" -o -name "settings*" -o -name ".env*"
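
`tree` is not installed everywhere. As a fallback, the sketch below lists each top-level directory with its file count, largest first, which gives a quick sense of where the code mass sits. `dir_summary` is a hypothetical name; hidden directories are skipped by the glob.

```shell
# Hypothetical helper: list top-level directories with their file counts.
# Usage: dir_summary <project-dir>
dir_summary() {
    for d in "$1"/*/; do
        [ -d "$d" ] || continue          # no subdirectories: glob stays literal
        name=$(basename "$d")
        count=$(find "$d" -type f | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$count" "$name"
    done | sort -rn
}

# Example:
# dir_summary .
```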

2.2 Component Identification

Look for common patterns:

project/
├── api/           # HTTP endpoints
├── auth/          # Authentication
├── core/          # Business logic
├── db/            # Database layer
├── models/        # Data models
├── services/      # External services
├── utils/         # Utilities
├── workers/       # Background jobs
└── tests/         # Test suite
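
A first-pass labelling of directories can be scripted from the layout above. The mapping here is only a sketch: `component_role` is a hypothetical name, and the extra aliases (`routes`, `controllers`, `database`, `jobs`, `test`) are assumptions about common naming, so every guess still needs to be confirmed against the code.

```shell
# Hypothetical helper: guess a component's role from its directory name.
# Usage: component_role <directory-name>
component_role() {
    case "$1" in
        api|routes|controllers) echo "HTTP endpoints" ;;
        auth)                   echo "Authentication" ;;
        core)                   echo "Business logic" ;;
        db|database)            echo "Database layer" ;;
        models)                 echo "Data models" ;;
        services)               echo "External services" ;;
        utils)                  echo "Utilities" ;;
        workers|jobs)           echo "Background jobs" ;;
        tests|test)             echo "Test suite" ;;
        *)                      echo "Unknown (inspect manually)" ;;
    esac
}

# Example:
# for d in */; do printf '%s\t%s\n' "${d%/}" "$(component_role "${d%/}")"; done
```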

2.3 Create Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                        CLIENTS                              │
│              (Web, Mobile, API Consumers)                   │
└─────────────────────────┬───────────────────────────────────┘
                          │ HTTPS
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                      API GATEWAY                            │
│                   (Rate Limiting, Auth)                     │
└─────────────────────────┬───────────────────────────────────┘
                          │
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐   ┌──────────┐   ┌──────────┐
    │  Auth    │   │  Core    │   │  Admin   │
    │ Service  │   │  API     │   │  API     │
    └────┬─────┘   └────┬─────┘   └────┬─────┘
         │              │              │
         └──────────────┼──────────────┘
                        │
          ┌─────────────┼─────────────┐
          ▼             ▼             ▼
    ┌──────────┐  ┌──────────┐  ┌──────────┐
    │ Database │  │  Cache   │  │ External │
    │(Postgres)│  │ (Redis)  │  │  APIs    │
    └──────────┘  └──────────┘  └──────────┘

2.4 Trust Boundary Identification

Map where trust levels change:

## Trust Boundaries

### Boundary 1: Internet → API Gateway
- **Type:** Network boundary
- **Controls:** TLS, Rate limiting, WAF
- **Risks:** DDoS, Injection, Auth bypass

### Boundary 2: API Gateway → Services
- **Type:** Authentication boundary
- **Controls:** JWT validation, Role checks
- **Risks:** Token forgery, Privilege escalation

### Boundary 3: Services → Database
- **Type:** Data access boundary
- **Controls:** Query parameterization, Connection pooling
- **Risks:** SQL injection, Data leakage

### Boundary 4: Services → External APIs
- **Type:** Third-party integration
- **Controls:** API keys, Request signing
- **Risks:** SSRF, Secret exposure
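
Boundary 4 is usually the easiest to locate in code, because outbound calls go through a small set of HTTP client functions. The sketch below assumes a Python codebase using `requests`, `httpx`, or `urllib`'s `urlopen`; `outbound_calls` is a hypothetical name and the client list is not exhaustive.

```shell
# Hypothetical helper: list Python call sites that make outbound HTTP requests.
# Usage: outbound_calls <project-dir>
outbound_calls() {
    grep -rnE '(requests|httpx)\.(get|post|put|delete|request)\(|urlopen\(' \
        --include='*.py' "$1"
}

# Example:
# outbound_calls .
```

Each hit is a candidate for the SSRF question: where does the URL argument come from?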

Phase 3: Module Deep Dive

3.1 Entry Point Analysis

For each entry point type:

# HTTP Routes - map all endpoints
grep -rn "@app.route\|@router\|@api_view" --include="*.py"
grep -rnE "app\.(get|post|put|delete)|router\.(get|post)" --include="*.ts"

# CLI Commands
grep -rn "@click.command\|argparse\|clap" --include="*.py" --include="*.rs"

# Event Handlers
grep -rn "@consumer\|@handler\|on_message" --include="*.py"

3.2 Create Entry Point Map

## Entry Points

### HTTP API
| Method | Path | Handler | Auth | Input |
|--------|------|---------|------|-------|
| POST | /api/login | auth.login | None | JSON body |
| GET | /api/users | users.list | JWT | Query params |
| POST | /api/transfer | tx.transfer | JWT + 2FA | JSON body |
| GET | /admin/logs | admin.logs | Admin JWT | Query params |

### WebSocket
| Event | Handler | Auth | Data |
|-------|---------|------|------|
| connect | ws.connect | JWT | None |
| message | ws.message | Session | JSON |

### Background Jobs
| Queue | Handler | Trigger | Data Source |
|-------|---------|---------|-------------|
| emails | email.send | API call | Database |
| reports | report.gen | Cron | Database |
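
The Method and Path columns of the HTTP table can be seeded from decorators. This is a sketch for Flask-style `@<name>.route("...")` decorators only, assuming one decorator per line with a double-quoted path; `flask_routes` is a hypothetical name. Handler, Auth, and Input still have to be filled in by reading the code.

```shell
# Hypothetical helper: turn Flask-style route decorators into table rows.
# Defaults the method to GET when no methods=[...] argument is present.
# Usage: flask_routes <project-dir>
flask_routes() {
    grep -rhE '@[A-Za-z_]+\.route\("' --include='*.py' "$1" |
    sed -E \
        -e 's/.*\.route\("([^"]*)".*methods=\[([^]]*)].*/| \2 | \1 |/' \
        -e 's/.*\.route\("([^"]*)".*/| GET | \1 |/' |
    tr -d '"'
}

# Example:
# flask_routes .
```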

3.3 Data Flow Tracing

For each critical endpoint, trace data flow:

POST /api/transfer
       │
       ▼
┌──────────────────┐
│ Request Parser   │ ← Validate JSON schema
│ (validation.py)  │
└────────┬─────────┘
         │ TransferRequest
         ▼
┌──────────────────┐
│ Auth Middleware  │ ← Verify JWT, extract user
│ (middleware.py)  │
└────────┬─────────┘
         │ User context
         ▼
┌──────────────────┐
│ Transfer Service │ ← Business logic
│ (transfer.py)    │
└────────┬─────────┘
         │
    ┌────┴────┐
    ▼         ▼
┌────────┐ ┌────────┐
│ DB     │ │External│
│ Write  │ │ API    │
└────────┘ └────────┘

Phase 4: Function-Level Analysis

4.1 Security-Critical Function Identification

Search for security-sensitive operations:

# Authentication
grep -rn "def login\|def authenticate\|def verify_token" --include="*.py"
grep -rn "function login\|authenticate\|verifyToken" --include="*.ts"

# Authorization
grep -rn "def is_authorized\|def check_permission\|@requires_role" --include="*.py"

# Cryptography
grep -rn "encrypt\|decrypt\|hash\|sign\|verify" --include="*.py"
grep -rn "crypto\.\|bcrypt\|argon2" --include="*.py"

# Database
grep -rn "execute\|query\|cursor" --include="*.py"
grep -rn "\.query\|\.execute\|\.raw" --include="*.ts"

# File Operations
grep -rnE "open\(.*\)|read|write|unlink" --include="*.py"
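
Broad patterns like these produce many hits. One way to prioritise reading is to rank files by the number of matching lines. A minimal sketch, with `rank_hits` as a hypothetical name; it assumes file paths contain no colons.

```shell
# Hypothetical helper: rank files by how many lines match a pattern.
# Usage: rank_hits <extended-regex> <project-dir> <file-glob>
rank_hits() {
    # grep -c prints "path:count" for every searched file
    grep -rcE -e "$1" --include="$3" "$2" |
    awk -F: '$NF > 0 { print $NF "\t" $1 }' | sort -rn
}

# Example:
# rank_hits 'execute|query|cursor' . '*.py'
```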

4.2 Function Documentation Template

For each critical function:

### Function: `transfer_funds()`

**Location:** `services/transfer.py:45`

**Purpose:** Execute fund transfer between accounts

**Parameters:**
| Name | Type | Source | Validation |
|------|------|--------|------------|
| from_account | str | JWT claim | UUID format |
| to_account | str | Request body | UUID format, exists check |
| amount | Decimal | Request body | > 0, <= balance |

**Returns:** TransferResult

**Side Effects:**
- Writes to `transactions` table
- Calls external payment API
- Emits `transfer_completed` event

**Security Considerations:**
- Requires authenticated user
- Rate limited to 10/minute
- Amount validated against balance
- Audit logged

**Potential Risks:**
- Race condition if concurrent transfers?
- What if external API fails mid-transfer?
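
The heading and Location field of this template can be generated rather than typed. The sketch below assumes a Python codebase and takes the first matching `def`; `function_stub` is a hypothetical name, and all remaining fields are left for manual analysis.

```shell
# Hypothetical helper: start a documentation stub for a Python function by
# locating its definition. Only the heading and location are filled in.
# Usage: function_stub <function-name> <project-dir>
function_stub() {
    loc=$(grep -rnE "^[[:space:]]*(async[[:space:]]+)?def[[:space:]]+${1}[[:space:]]*\(" \
        --include='*.py' "$2" | head -n 1 | cut -d: -f1,2)
    [ -n "$loc" ] || { echo "definition of $1 not found" >&2; return 1; }
    printf '### Function: `%s()`\n\n**Location:** `%s`\n' "$1" "$loc"
}

# Example:
# function_stub transfer_funds .
```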

4.3 Call Graph Analysis

transfer_funds()
├── validate_request()
│   └── check_uuid_format()
├── get_user_balance()
│   └── db.query()
├── check_rate_limit()
│   └── redis.get()
├── execute_transfer()     ← CRITICAL
│   ├── db.begin_transaction()
│   ├── update_balance()   ← State change
│   ├── external_api.send() ← External call
│   └── db.commit()
└── emit_event()
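
A first level of such a tree can be approximated with text tools before reaching for a real call-graph generator. The sketch below handles top-level Python functions with a single-line `def` only; `callees` is a hypothetical name. Being text-based, it also picks up call-like text in comments and strings and does not recurse into callees.

```shell
# Hypothetical helper: list names called inside a top-level Python function.
# Usage: callees <function-name> <python-file>
callees() {
    awk -v fn="$1" '
        $0 ~ ("^def " fn "\\(") { inside = 1; next }   # start after the def line
        inside && /^[^[:space:]#]/ { inside = 0 }      # next top-level statement
        inside { print }
    ' "$2" |
    grep -oE '[A-Za-z_][A-Za-z0-9_.]*\(' | sed 's/($//' | sort -u
}

# Example:
# callees transfer_funds services/transfer.py
```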

Phase 5: Detail Reconnaissance

5.1 Configuration Analysis

# Find all config loading
grep -rn "os.environ\|getenv\|config\." --include="*.py"
grep -rn "process.env\|config\." --include="*.ts"

# Check for hardcoded secrets
grep -rn "password\s*=\|secret\s*=\|api_key\s*=" --include="*.py"
grep -rn -e "-----BEGIN\|sk-\|pk_live_" .
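
A repository-wide search for short markers such as `sk-` tends to match ordinary words and files under `.git`. The sketch below tightens the patterns and skips a couple of directories. `scan_secrets`, the minimum length of 20 after `sk-`, and the excluded directory names are all assumptions, and this does not replace a dedicated secret scanner.

```shell
# Hypothetical helper: search for common secret markers while skipping
# version-control and dependency directories.
# Usage: scan_secrets <project-dir>
scan_secrets() {
    # -e keeps grep from parsing the leading dashes of the pattern as options;
    # -I skips binary files
    grep -rnIE -e '-----BEGIN [A-Z ]*PRIVATE KEY|sk-[A-Za-z0-9]{20,}|pk_live_[A-Za-z0-9]+' \
        --exclude-dir=.git --exclude-dir=node_modules "$1"
}

# Example:
# scan_secrets .
```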

5.2 Error Handling Review

# Find exception handling
grep -rn "except.*:" --include="*.py" -A 2
grep -rn "catch\s*(" --include="*.ts" -A 2

# Find error responses
grep -rn "return.*error\|raise.*Error" --include="*.py"
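
Among exception handlers, the ones that silently discard the error deserve attention first. The sketch below flags an `except ...:` line that is immediately followed by a bare `pass`; `swallowed_exceptions` is a hypothetical name, and handlers that swallow errors in other ways (for example an empty `return`) are not detected.

```shell
# Hypothetical helper: flag Python handlers that catch an exception and
# immediately pass. Prints "file:line" of each matching except clause.
# Usage: swallowed_exceptions <python-file>
swallowed_exceptions() {
    awk '
        prev_except && /^[[:space:]]*pass[[:space:]]*$/ { print FILENAME ":" prev_line }
        {
            prev_except = ($0 ~ /^[[:space:]]*except([[:space:]].*)?:[[:space:]]*$/)
            prev_line = NR
        }
    ' "$1"
}

# Example:
# find . -name '*.py' -exec sh -c '...' \;   # or call it per file in a loop
```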

5.3 Logging Analysis

# Find logging statements
grep -rn "logger\.\|logging\.\|console\.log" --include="*.py" --include="*.ts"

# Check what's being logged
grep -rn "log.*password\|log.*token\|log.*secret" --include="*.py"

Output: Context Document

Template

# [PROJECT NAME] - Security Context Document

## Executive Summary
[2-3 sentences on what this system does]

## Technology Stack
[From Phase 1]

## Architecture
[Diagram from Phase 2]

## Trust Boundaries
[From Phase 2.4]

## Entry Points
[Table from Phase 3.2]

## Critical Functions
[Analysis from Phase 4]

## Data Flows
[Diagrams from Phase 3.3]

## Security Controls
| Control | Implementation | Location | Notes |
|---------|----------------|----------|-------|
| Authentication | JWT | middleware/auth.py | RS256 signing |
| Authorization | RBAC | decorators/auth.py | Role-based |
| Input Validation | Pydantic | schemas/*.py | Type checking |
| Encryption | AES-256-GCM | utils/crypto.py | At-rest |

## Areas Requiring Focus
1. [High-risk area 1]
2. [High-risk area 2]
3. [High-risk area 3]

## Open Questions
- [ ] How is X handled when Y?
- [ ] What happens if Z fails?
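
An empty copy of this template can be generated at the start of an engagement so that findings have a place to land from the first phase. A minimal sketch; `new_context_doc` is a hypothetical name and it emits only the headings listed above.

```shell
# Hypothetical helper: write an empty context document with the section
# headings from the template above.
# Usage: new_context_doc <project-name> > context.md
new_context_doc() {
    printf '# %s - Security Context Document\n' "$1"
    for section in "Executive Summary" "Technology Stack" "Architecture" \
        "Trust Boundaries" "Entry Points" "Critical Functions" "Data Flows" \
        "Security Controls" "Areas Requiring Focus" "Open Questions"; do
        printf '\n## %s\n' "$section"
    done
}

# Example:
# new_context_doc "Payments API" > context.md
```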

Quick Start Commands

# Full recon script
./scripts/recon.sh /path/to/project

# Generate entry point map
./scripts/map-endpoints.sh /path/to/project

# Create call graph
./scripts/callgraph.sh /path/to/project

Skill Files

code-recon/
├── SKILL.md                        # This file
├── resources/
│   ├── recon-checklist.md          # Comprehensive checklist
│   └── question-bank.md            # Questions to answer
├── examples/
│   ├── web-app-recon/              # Web application example
│   └── smart-contract-recon/       # Smart contract example
├── templates/
│   └── context-document.md         # Output template
└── docs/
    └── advanced-techniques.md      # Deep dive techniques

Guidelines

Top-down approach - Start broad, go narrow
Document everything - Your notes are the deliverable
Question assumptions - Verify what docs say vs. what code does
Focus on trust boundaries - That's where bugs live
Time-box phases - Don't get stuck in the weeds early
Iterate - Revisit earlier phases as you learn more

Source

git clone https://github.com/eugenepyvovarov/mcpbundler-agent-skills-marketplace

SKILL.md: https://github.com/eugenepyvovarov/mcpbundler-agent-skills-marketplace/blob/main/code-recon/SKILL.md

Overview

CodeRecon is a systematic approach to codebase reconnaissance that builds layered understanding from high-level architecture down to implementation details. It is designed for security auditors, code reviewers, and developers who need to rapidly understand unfamiliar codebases before diving deep. Inspired by Trail of Bits' audit-context-building methodology, it emphasizes data flows, trust boundaries, and security-critical logic.

How This Skill Works

The method uses the Recon Pyramid to start with an overview and gradually descend into modules, functions, and details. Phase 1 (Overview Reconnaissance) covers project identity, tech stack, dependencies, and a technology map. Phase 2 (Architecture Mapping) covers directory structure, component patterns, and trust boundaries. Phases 3 to 5 then drill into entry points and data flows, security-critical functions, and configuration, error handling, and logging. Practitioners collect commands and patterns to build a living map of the system.

When to Use It

Starting a security audit on an unfamiliar codebase to locate data flows and trust boundaries
Preparing for a code review of a large codebase or monorepo with multiple tech stacks
Onboarding a new engineer who needs rapid contextual understanding of system architecture
Assessing security-critical logic before refactoring or adding new features
Documenting architecture and dependencies for compliance, risk assessment, or handover

Quick Start

Step 1: Run Phase 1 reconnaissance to identify project, tech stack, and dependencies
Step 2: Create a Technology Map and log key architecture decisions
Step 3: Move to Phase 2 Architecture Mapping to drill into directories and components

Best Practices

Start with the Overview Recon to establish context before touching code
Systematically map the system using the Recon Pyramid (overview -> architecture -> modules -> functions -> details)
Combine static checks with a technology map and dependency analysis
Validate findings against known security goals: data flows, trust boundaries, access controls
Keep the technology map living: update it after major changes and during onboarding

Example Use Cases

Auditing a microservices platform to identify inter-service data flows
Mapping a legacy monolith to understand data provenance and security boundaries
Scanning a Python/JS project to locate authentication and authorization logic
Preparing a Security Risk Assessment report with a technology map
Onboarding an engineer by sharing a concise architectural context document
