
binary-re-static-analysis

npx machina-cli add skill aiskillstore/marketplace/binary-re-static-analysis --openclaw

Static Analysis (Phases 2-3)

Purpose

Understand binary structure and logic without execution. Map functions, trace data flow, decompile critical code.

When to Use

  • After triage has established architecture and ABI
  • To understand specific functions identified as interesting
  • When dynamic analysis is impractical or risky
  • To build hypotheses before dynamic verification

Pre-Analysis: Compare Known I/O First

CRITICAL: Before diving into disassembly, check if known inputs/outputs exist.

⚠️ REQUIRES HUMAN APPROVAL - Get explicit approval before any execution, even for I/O comparison.

# SAFE: Use emulation for cross-arch binaries (after human approval)
# ARM32:
qemu-arm -L /usr/arm-linux-gnueabihf -- ./binary < input.txt > actual.txt

# ARM64:
qemu-aarch64 -L /usr/aarch64-linux-gnu -- ./binary < input.txt > actual.txt

# Docker-based (macOS/cross-arch - see dynamic-analysis Option D):
docker run --rm --platform linux/arm/v7 -v ~/samples:/work:ro \
  arm32v7/debian:bullseye-slim sh -c '/work/binary < /work/input.txt' > actual.txt

# x86-64 native (still requires approval):
./binary < input.txt > actual.txt

# Compare outputs:
diff expected.txt actual.txt
cmp -l expected.txt actual.txt | head -20  # Byte-level differences

# Record findings:
# - Where does output first diverge?
# - Does file size match? (logic bug vs truncation)
# - What pattern appears in corruption?

This step often reveals the bug category before any code analysis.
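The three questions recorded above can be answered mechanically once both outputs are on disk. The following is a minimal Python sketch, not part of the skill itself; the function names `first_divergence` and `summarize` are hypothetical. It only reads bytes that were already captured and never runs the binary.

```python
from typing import Optional


def first_divergence(expected: bytes, actual: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return offset
    if len(expected) != len(actual):
        # One output is a strict prefix of the other, so the
        # divergence starts where the shorter one ends.
        return min(len(expected), len(actual))
    return None


def summarize(expected: bytes, actual: bytes) -> dict:
    """Collect the facts worth recording before reading any disassembly."""
    return {
        "first_divergence": first_divergence(expected, actual),
        "size_match": len(expected) == len(actual),
        "size_delta": len(actual) - len(expected),
    }
```

In practice the two arguments would come from reading expected.txt and actual.txt in binary mode. A matching size with an early divergence hints at a logic bug, while a negative `size_delta` hints at truncation.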


Two-Stage Approach

Stage 1 (Light): Function enumeration, strings, imports - fast, broad coverage

Stage 2 (Deep): Targeted decompilation, CFG analysis - slow, focused

Stage 1: Light Analysis (radare2)

Analysis Depth Selection

Binary Size    Command                   Tradeoff
< 500KB        aaa                       Full analysis, may be slow
500KB - 5MB    aa; aac                   Functions + all call targets
> 5MB          aa + targeted af @addr    Fast, manual depth control
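The size thresholds in this table can be encoded as a small helper when scripting a session. This is a sketch under two assumptions that the source does not state: KB and MB are treated as binary units (1024-based), and a file of exactly 5MB falls into the middle tier. The function name `analysis_commands` is hypothetical.

```python
KIB = 1024  # assumption: the table's KB/MB are 1024-based


def analysis_commands(size_bytes: int) -> str:
    """Pick the r2 analysis command string for a binary of the given size."""
    if size_bytes < 500 * KIB:
        return "aaa"       # full analysis, may be slow
    if size_bytes <= 5 * KIB * KIB:
        return "aa; aac"   # functions plus all call targets
    # Large binary: basic pass only, then run `af @addr`
    # by hand on the addresses of interest.
    return "aa"
```

For a real file, `os.path.getsize(path)` would supply `size_bytes`.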

Session Setup

# Launch r2 with controlled analysis
r2 -q0 -e scr.color=false -e anal.timeout=120 -e anal.maxsize=67108864 binary

# Inside r2 (choose based on binary size):
aa       # Basic analysis
aac      # Also analyze all call targets (recommended for most binaries)

Critical settings:

  • anal.timeout=120 - Prevent runaway analysis
  • anal.maxsize=67108864 - 64MB max function size
  • Use aa; aac for medium binaries, aaa only for small ones

Handling Unanalyzed Call Targets

If axtj returns empty for known imports:

# The import may be called indirectly or analysis was too shallow
# Option 1: Deeper analysis
aac   # Analyze all calls

# Option 2: Manually create function at call target
af @0x8048abc

# Option 3: Re-check references to the import once deeper analysis has run
axtj @sym.imp.connect

Function Enumeration

# All functions as JSON
aflj

# Filter by name pattern (use the text listing: aflj emits a single JSON line,
# so ~ cannot filter it per function)
afl~main
afl~init
afl~network
afl~send
afl~recv

# Function count
afl~?
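
When the JSON form is needed, filtering is easier outside r2. Below is a minimal Python sketch that takes the text printed by aflj and returns the matching functions. The function name `functions_matching` is hypothetical, and the field names are assumptions about r2's JSON: the address key is read from "offset" and falls back to "addr", since the key may differ between r2 releases.

```python
import json


def functions_matching(aflj_text: str, needle: str) -> list:
    """Return (name, address) pairs for functions whose name contains needle."""
    matches = []
    for fn in json.loads(aflj_text):
        name = fn.get("name", "")
        # Assumption: address is stored under "offset" or "addr"
        # depending on the r2 version.
        addr = fn.get("offset", fn.get("addr"))
        if needle in name:
            matches.append((name, addr))
    return matches
```

The input string could be captured with something like `r2 -q -c 'aa; aflj' binary` redirected to a file.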

Cross-Reference Analysis

# Who calls this function?
axtj @sym.imp.connect

# What does this function call?
axfj @sym.main

# Data references to address
axtj @0x12345
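
Xref output becomes easier to read when call sites are grouped by the function that contains them. The sketch below does that for text printed by axtj. The field names "type", "from", and "fcn_name" are assumptions about the JSON layout and should be checked against the r2 version in use; the function name `callers_by_function` is hypothetical.

```python
import json
from collections import defaultdict


def callers_by_function(axtj_text: str) -> dict:
    """Group call-site addresses by the name of the containing function."""
    grouped = defaultdict(list)
    for ref in json.loads(axtj_text):
        if ref.get("type") != "CALL":
            continue  # skip data and jump references
        # References outside any analyzed function have no fcn_name.
        grouped[ref.get("fcn_name", "unknown")].append(ref["from"])
    return dict(grouped)
```

An "unknown" bucket in the result is a hint that the call site sits in code r2 has not yet turned into a function, which is the case that `af @addr` addresses.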

String-Function Correlation

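# Find the string and note its vaddr (iz prints one string per line, so ~ filters cleanly)
iz~api.vendor.com
# The vaddr points into a data section, so look up the code that references it
axtj @0xVADDR
# Then identify the containing function from the xref's "from" address
afi @0xXREF_FROM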

# Or search and map
"/j api"    # Search for string
axtj @@hit* # Xrefs to all hits

Import/Export Mapping

# Imports with addresses
iij

# Exports with addresses
iEj

# Symbols (if not stripped)
isj

Quick Disassembly

# Disassemble function as JSON
pdfj @sym.main

# Disassemble N instructions from address
pdj 20 @0x8400

# Print function summary
afi @sym.main

Stage 2: Deep Analysis

r2ghidra Availability Check

Before attempting decompilation, verify r2ghidra is installed:

# Check if r2ghidra is available
r2 -qc 'pdg?' - 2>/dev/null | grep -q Usage && echo "r2ghidra OK" || echo "SKIP: r2ghidra not installed"

# If missing, install with:
r2pm -ci r2ghidra

If r2ghidra unavailable: Rely on disassembly (pdf) and cross-reference analysis (axt/axf).

Targeted Decompilation (r2ghidra)

# Decompile specific function
pdgj @sym.target_function

# Or named function
pdgj @sym.main

Ghidra Headless (Large Binaries)

For complex functions or when r2ghidra struggles:

# Create analysis project and run script
analyzeHeadless /tmp/ghidra_proj proj \
  -import binary \
  -overwrite \
  -processor ARM:LE:32:v7 \
  -postScript ExportDecompilation.java sym.target_function \
  -deleteProject

Processor strings:

  • ARM 32-bit: ARM:LE:32:v7 or ARM:LE:32:Cortex
  • ARM 64-bit: AARCH64:LE:64:v8A
  • x86_64: x86:LE:64:default
  • MIPS LE: MIPS:LE:32:default
  • MIPS BE: MIPS:BE:32:default
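
For scripted runs, the processor strings listed above can be kept in a lookup table keyed by the triage results. This is a small sketch: the key layout (architecture, bits, endianness) and the function name `ghidra_language_id` are hypothetical, and only the first ARM 32-bit option from the list is included.

```python
GHIDRA_LANGUAGE_IDS = {
    ("arm", 32, "little"): "ARM:LE:32:v7",
    ("arm", 64, "little"): "AARCH64:LE:64:v8A",
    ("x86", 64, "little"): "x86:LE:64:default",
    ("mips", 32, "little"): "MIPS:LE:32:default",
    ("mips", 32, "big"): "MIPS:BE:32:default",
}


def ghidra_language_id(arch: str, bits: int, endian: str) -> str:
    """Map triage facts to the value passed to analyzeHeadless -processor."""
    key = (arch.lower(), bits, endian.lower())
    try:
        return GHIDRA_LANGUAGE_IDS[key]
    except KeyError:
        # Fail loudly rather than guess a language ID.
        raise ValueError(f"no processor string listed for {key}") from None
```

Raising on an unknown combination is deliberate: a wrong processor string produces plausible-looking but incorrect decompilation.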

Control Flow Analysis

# Basic blocks in function
afbj @sym.main

# Function call graph (dot format; agc is per-function, agC is the global call graph)
agcd @sym.main > callgraph.dot

# Control flow graph
agfd @sym.main > cfg.dot

Data Structure Recovery

# Analyze local variables
afvj @sym.main

# Stack frame layout
afvd @sym.main

# Global data references
adrj

Analysis Patterns

Pattern: Network Function Tracing

# Find all network-related calls
axtj @sym.imp.socket
axtj @sym.imp.connect
axtj @sym.imp.send
axtj @sym.imp.recv
axtj @sym.imp.SSL_read
axtj @sym.imp.SSL_write

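# Trace caller chain (run from the shell, not inside r2; each r2 call re-runs analysis)
for func in $(r2 -q -c 'aa; aflj' binary 2>/dev/null | jq -r '.[].name'); do
  r2 -q -c "aa; axfj @ $func" binary 2>/dev/null | grep -q 'socket\|connect' && echo "$func"
done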

Pattern: Configuration File Analysis

# Find file operations
axtj @sym.imp.open
axtj @sym.imp.fopen

# Trace string arguments
"/j /etc"
"/j .conf"
"/j .json"

# Check what functions reference these paths

Pattern: Crypto Identification

# Common crypto imports
axtj @sym.imp.EVP_EncryptInit
axtj @sym.imp.AES_encrypt
axtj @sym.imp.SHA256

# Hardcoded keys (check strings near crypto calls)
# Accepts both a {"strings": [...]} wrapper and a bare array from izj
izj | jq '(.strings? // .)[] | select(.length == 16 or .length == 32)'
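
The same length filter can be written in Python when jq is not available. This sketch assumes each string entry carries "string" and "length" fields, and it accepts either a bare list or a {"strings": [...]} wrapper because the top-level shape may vary between r2 versions. The function name `key_sized_strings` is hypothetical.

```python
import json

KEY_LENGTHS = {16, 32}  # common AES key / digest-sized candidates


def key_sized_strings(izj_text: str) -> list:
    """Return strings whose length matches a typical key size."""
    data = json.loads(izj_text)
    # Assumption: the output is either a bare list or wrapped in "strings".
    entries = data["strings"] if isinstance(data, dict) else data
    return [s["string"] for s in entries if s.get("length") in KEY_LENGTHS]
```

A length match is only a lead. Each candidate still needs to be checked against the xrefs of the crypto imports listed above.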

r2 JSON Commands Reference

Command       Output               Use Case
aflj          Functions list       Map code structure
axtj @addr    Xrefs TO address     Who uses this?
axfj @addr    Xrefs FROM address   What does it call?
pdfj @addr    Disassembly          Understand instructions
pdgj @addr    Decompilation        Pseudo-C output
afbj @addr    Basic blocks         Control flow
izj           Data strings         Configuration, URLs
iij           Imports              External dependencies
iEj           Exports              Public interface
afvj @addr    Local variables      Stack analysis

Output Format

Record analysis findings as structured facts:

{
  "functions_analyzed": [
    {
      "name": "sub_8400",
      "address": "0x8400",
      "size": 256,
      "calls": ["socket", "connect", "send"],
      "called_by": ["main", "init_network"],
      "strings_referenced": ["api.vendor.com"],
      "hypothesis": "network_initialization"
    }
  ],
  "call_graph": {
    "main": ["init_config", "init_network", "main_loop"],
    "init_network": ["sub_8400", "SSL_CTX_new"]
  },
  "data_flow": [
    {
      "source": "config_file_read",
      "through": ["parse_config", "extract_url"],
      "sink": "connect_to_server"
    }
  ]
}
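
Once findings are stored in this shape, the "call_graph" object can answer reachability questions directly. The following sketch inverts the graph and walks it breadth-first to list every function that can reach a target, directly or indirectly. The function name `transitive_callers` is hypothetical; the input mirrors the "call_graph" structure shown above.

```python
from collections import deque


def transitive_callers(call_graph: dict, target: str) -> set:
    """Return every function that reaches target through one or more calls."""
    # Invert caller -> callees into callee -> callers.
    callers = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    seen = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for caller in callers.get(node, ()):
            if caller not in seen:  # guards against recursion cycles
                seen.add(caller)
                queue.append(caller)
    return seen
```

With the example record, asking for the callers of "sub_8400" yields "init_network" and "main", which is the chain to verify during dynamic analysis.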

Knowledge Journaling

After static analysis, record findings for episodic memory:

[BINARY-RE:static] {filename} (sha256: {hash})

Functions analyzed: {count}
Decompilation performed: {yes|no}

Key functions:
  FACT: Function at {addr} calls {imports} (source: r2 axfj)
  FACT: Function at {addr} references string "{string}" (source: r2 axtj)
  FACT: Function {name} appears to {purpose} (source: decompilation)

Cross-references:
  FACT: {caller} calls {callee} (source: r2 axtj)

HYPOTHESIS UPDATE: {refined theory} (confidence: {new_value})
  Supporting: {fact_ids}
  Contradicting: {fact_ids}

New questions:
  QUESTION: {discovered unknown}

Answered questions:
  RESOLVED: {question} → {answer}
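
Filling the template by hand invites drift between entries, so it can help to render it from structured facts. The sketch below covers only the header, the FACT lines, and the hypothesis line of the template; the function name `render_journal` and its parameters are hypothetical.

```python
def render_journal(filename: str, sha256: str, facts: list,
                   hypothesis: str, confidence: float) -> str:
    """Render a partial journal entry from (text, source) fact pairs."""
    lines = [
        f"[BINARY-RE:static] {filename} (sha256: {sha256})",
        "",
        "Key functions:",
    ]
    for text, source in facts:
        # Every fact keeps its provenance so it can be re-checked later.
        lines.append(f"  FACT: {text} (source: {source})")
    lines.append("")
    lines.append(
        f"HYPOTHESIS UPDATE: {hypothesis} (confidence: {confidence:.2f})"
    )
    return "\n".join(lines)
```

The remaining sections (cross-references, new and answered questions) would follow the same pattern of one list per heading.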

Example Journal Entry

[BINARY-RE:static] thermostat_daemon (sha256: a1b2c3d4...)

Functions analyzed: 47
Decompilation performed: yes (function 0x8400)

Key functions:
  FACT: Function 0x8400 calls curl_easy_perform, curl_easy_setopt (source: r2 axfj)
  FACT: Function 0x8400 references string "api.thermco.com/telemetry" (source: r2 axtj)
  FACT: Function 0x9200 parses JSON using jsmn library (source: decompilation)
  FACT: Function 0x10800 is main loop, calls 0x8400 after sleep(30) (source: r2 pdf)

Cross-references:
  FACT: main calls init_config (0x9000) then main_loop (0x10800) (source: r2 axtj)
  FACT: main_loop calls send_telemetry (0x8400) in loop (source: r2 pdf)

HYPOTHESIS UPDATE: Telemetry client sending to api.thermco.com every 30 seconds (confidence: 0.85)
  Supporting: URL string, curl imports, sleep(30) in loop
  Contradicting: none

New questions:
  QUESTION: What data fields are included in telemetry payload?
  QUESTION: Is there any authentication/API key?

Answered questions:
  RESOLVED: "What endpoint?" → api.thermco.com/telemetry via HTTPS

Decision Points

After static analysis:

  1. Identified critical functions? → Ready for dynamic verification
  2. Unclear behavior? → Try dynamic analysis for runtime observation
  3. Crypto detected? → Document key handling, note for security review
  4. Anti-analysis patterns? → Consider Unicorn snippet emulation

Next Steps

  → binary-re-dynamic-analysis to verify hypotheses with runtime observation
  → binary-re-synthesis if sufficient understanding reached

Source

https://github.com/aiskillstore/marketplace/blob/main/skills/2389-research/binary-re-static-analysis/SKILL.md

Overview

This skill enables deep static analysis of binaries to map functions, track data flow, and decompile critical code without executing. It combines radare2 (r2) and Ghidra headless for function enumeration, cross-references (xrefs), decompilation, and control-flow graph (CFG) analysis.

How This Skill Works

It follows a Two-Stage Approach: Stage 1 Light Analysis using r2 for function enumeration, strings, and imports; Stage 2 Deep Analysis with targeted decompilation and CFG exploration. Commands like aflj, afl, axtj, axfj, and izj are used to build the function map, references, and string correlations, while Ghidra headless handles heavy decompilation when needed.

When to Use It

  • After triage confirms architecture and ABI
  • When you need to understand a small set of interesting functions
  • If dynamic analysis is impractical or risky
  • To build hypotheses before dynamic verification
  • When cross-referencing functions, imports, and strings to map behavior

Quick Start

  1. Prepare and obtain explicit approval for any execution; assess cross-arch emulation if needed
  2. Run Stage 1 light analysis with r2: r2 -q0 -e scr.color=false -e anal.timeout=120 -e anal.maxsize=67108864 binary; inside r2: aa; aac
  3. Map functions and references with aflj, axtj, axfj, and izj, then move to Stage 2 deep analysis: targeted decompilation with r2ghidra (pdgj) or Ghidra headless as needed

Best Practices

  • Obtain explicit human approval before executing any binary
  • Begin with Stage 1 light analysis (aa; aac for most binaries, aaa only for small ones) for broad coverage
  • Tune anal.timeout and anal.maxsize to prevent runaway analysis
  • Use cross-reference and string-function correlation (axtj, izj, afi) to locate the functions that matter
  • Document findings and corroborate with reference outputs before moving to dynamic testing

Example Use Cases

  • Map all functions in a 64MB+ binary and build a call graph with aflj and afl
  • Identify xrefs to critical routines using axtj and axfj to understand usage
  • Decompile a hotspot function to infer behavior and data-flow
  • Trace data flow from imports to sinks by following references (axtj @sym.impo.., axtj @@hit*)
  • Compare actual I/O against expected outputs (after approval) to classify bugs before reading any disassembly
