A function or API that consumes data in a way that could lead to code execution or other security flaws; see the dangerous-functions skill for lists.

How do I map data transformations?

Record each step the data takes between the source and the sink, including encoding/decoding, concatenations, and type changes.

Which tools help for tracing data flow?

Use IDE features (Find references, Go to definition, Find usages, Call hierarchy), grep, and static analysis approaches described in this skill.

Data Flow Tracing

npx machina-cli add skill allsmog/vuln-scout/data-flow-tracing --openclaw

Files (1)

SKILL.md

5.7 KB

Data Flow Tracing

Purpose

Guide the process of tracing user-controlled input from entry points (sources) through the application to security-sensitive functions (sinks). This is essential for confirming vulnerability exploitability.

When to Use

Activate this skill when:

Confirming if identified sinks receive user input
Mapping the path from source to sink
Understanding data transformations and filters
Determining if sanitization can be bypassed

Core Concepts

Sources (Input Entry Points)

HTTP Sources:

Language	Common Sources
PHP	`$_GET`, `$_POST`, `$_REQUEST`, `$_COOKIE`, `$_FILES`, `$_SERVER`
Java	`request.getParameter()`, `request.getHeader()`, `@RequestParam`
Python	`request.args`, `request.form`, `request.data`, `request.json`
Node.js	`req.query`, `req.body`, `req.params`, `req.headers`
.NET	`Request.QueryString`, `Request.Form`, `Request["param"]`

Other Sources:

Database queries (stored user data)
File contents (user-uploaded or modified)
Environment variables
External API responses

Sinks (Dangerous Functions)

Refer to the dangerous-functions skill for comprehensive sink lists.

Data Transformations

Track how data changes between source and sink:

Encoding/Decoding (base64, URL, HTML)
Concatenation with other strings
Array/object property access
Type conversions
String manipulations

Tracing Methodology

Step 1: Identify the Sink

Start from the dangerous function identified during code review.

Step 2: Find Direct Parameters

Identify what variables/parameters are passed to the sink.

Example: system($cmd);
Direct parameter: $cmd

Step 3: Trace Backwards

Follow each parameter to its origin:

Check function parameters
Check variable assignments
Check conditional branches
Check loop iterations
Check included/required files

Step 4: Identify Sources

Determine where user input enters:

$cmd = $_GET['command'];  // Direct source
$cmd = $row['command'];   // Database (check how it was stored)
$cmd = $config['cmd'];    // Config file (check if user-modifiable)

Step 5: Map Transformations

Document all changes to the data:

Source: $_GET['input']
  -> urldecode()
  -> str_replace(['../', '..\\'], '', $input)
  -> escapeshellarg()
  -> Sink: exec()

Step 6: Assess Exploitability

Consider:

Are filters/sanitization bypassable?
Is the full input controllable?
Are there alternative paths?

Tracing Techniques

Static Analysis (Manual)

Forward Tracing: Start from source, follow to sinks

$input = $_GET['x'];
$processed = process($input);
dangerous_function($processed);

Backward Tracing: Start from sink, trace to source

dangerous_function($var);
  <- $var = transform($data);
  <- $data = $_POST['param'];

Using IDE Features

Find all references to variable
Go to definition
Find usages
Call hierarchy

Using Grep

# Find where variable is assigned
grep -rn "\$varname\s*=" --include="*.php"

# Find where variable is used
grep -rn "\$varname" --include="*.php"

# Find function calls
grep -rn "functionName\s*(" --include="*.php"

Common Patterns

Direct Flow

$input = $_GET['cmd'];
system($input);  // Vulnerable

Database-Mediated Flow

// Store
$db->insert(['cmd' => $_POST['cmd']]);

// Later, retrieve and execute
$row = $db->query("SELECT cmd FROM jobs")->fetch();
system($row['cmd']);  // Vulnerable if original input wasn't sanitized

Configuration Flow

// Config loaded from user-modifiable file
$config = parse_ini_file('/var/www/config.ini');
system($config['backup_cmd']);  // Vulnerable if config is modifiable

Multi-File Flow

// file1.php
$_SESSION['cmd'] = $_GET['cmd'];

// file2.php
system($_SESSION['cmd']);  // Vulnerable

Sanitization Analysis

Identify Sanitization Functions

$input = htmlspecialchars($_GET['x']);  // XSS protection
$input = escapeshellarg($_GET['x']);    // Command injection protection
$input = intval($_GET['x']);            // Type casting
$input = preg_replace('/[^a-z]/', '', $_GET['x']);  // Whitelist

Assess Bypass Potential

Sanitization	Bypass Considerations
Blacklist	Missing characters, encoding
Whitelist	Logic errors, regex flaws
Type casting	Depends on sink requirements
Encoding	Double encoding, context
Length limits	Truncation attacks

Common Bypass Techniques

Case variations
Encoding (URL, Unicode, HTML)
Null bytes
Double encoding
Alternative representations

Documentation Template

When tracing, document findings:

## Finding: [Vulnerability Type]

### Sink
- File: path/to/file.php
- Line: 42
- Function: system($cmd)

### Source
- File: path/to/file.php  
- Line: 35
- Source: $_GET['command']

### Data Flow
1. $_GET['command'] received (line 35)
2. Passed to sanitize() function (line 36)
3. Concatenated with prefix (line 38)
4. Passed to system() (line 42)

### Sanitization
- sanitize() removes semicolons and pipes
- Bypass: Use newline (%0a) or $() syntax

### Exploitability
- Confirmed exploitable
- Payload: `valid_command%0awhoami`

Integration with Other Skills

Use dangerous-functions to identify sinks
Use vuln-patterns for exploitation techniques
Use exploit-techniques to develop PoC

Source

git clone https://github.com/allsmog/vuln-scout/blob/main/whitebox-pentest/skills/data-flow-tracing/SKILL.mdView on GitHub

Overview

Data Flow Tracing guides you to follow user-controlled input from entry points (sources) through the application to security-sensitive sinks. It helps confirm exploitability by revealing data paths, transformations, and whether sanitization can be bypassed.

How This Skill Works

It begins by identifying the sink and its direct parameters, then traces those values backward to their origins. The process emphasizes documenting data transformations (encoding, concatenation, type conversions) and uses static analysis, IDE features, and grep to map the flow and assess exploitability.

When to Use It

Confirming if identified sinks receive user input
Mapping the path from source to sink
Understanding data transformations and filters
Determining if sanitization can be bypassed
Tracing tainted data during whitebox pentesting

Quick Start

Step 1: Identify the sink in code review (dangerous function).
Step 2: Trace direct parameters to their origins and map all transformations.
Step 3: Assess exploitability and document findings with sources, sinks, and paths.

Best Practices

Identify the sink first during code review and map back to sources
Document all data transformations between source and sink
Leverage language-specific source lists to locate initial inputs
Test sanitization by attempting controlled bypass techniques in a safe test environment
Capture and annotate each step with code references and line numbers

Example Use Cases

Direct Flow: $input = $_GET['cmd']; system($input); // vulnerable
Database-Mediated Flow: Store in DB then read back and execute via system($row['cmd']);
Configuration Flow: Config value used in a command without proper validation
File Contents Flow: Read user-uploaded file contents and pass to a dangerous function
Environment/API Flow: Environment variable or external API response used to drive a shell command

Frequently Asked Questions

Add this skill to your agents

Data Flow Tracing

Data Flow Tracing

Purpose

When to Use

Core Concepts

Sources (Input Entry Points)

Sinks (Dangerous Functions)

Data Transformations

Tracing Methodology

Step 1: Identify the Sink

Step 2: Find Direct Parameters

Step 3: Trace Backwards

Step 4: Identify Sources

Step 5: Map Transformations

Step 6: Assess Exploitability

Tracing Techniques

Static Analysis (Manual)

Using IDE Features

Using Grep

Common Patterns

Direct Flow

Database-Mediated Flow

Configuration Flow

Multi-File Flow

Sanitization Analysis

Identify Sanitization Functions

Assess Bypass Potential

Common Bypass Techniques

Documentation Template

Integration with Other Skills

Source

Overview

How This Skill Works

When to Use It

Quick Start

Best Practices

Example Use Cases

Frequently Asked Questions

What is a sink?

How do I map data transformations?

Which tools help for tracing data flow?