cpg-analysis
npx machina-cli add skill allsmog/vuln-scout/cpg-analysis --openclaw
Code Property Graph (CPG) Analysis
What is a Code Property Graph?
A Code Property Graph (CPG) is a unified data structure that combines three representations of code:
- Abstract Syntax Tree (AST) - Structural representation
- Control Flow Graph (CFG) - Execution paths
- Program Dependence Graph (PDG) - Data and control dependencies
This combination enables semantic queries, such as tracking how a value flows across statements and function boundaries, that syntax-level pattern-matching tools cannot express.
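To make the "one graph, three layers" idea concrete, here is a toy model in plain Scala (no Joern required). The `ToyCpg` object, its `EdgeKind` names, and the tiny program are all hypothetical and far simpler than Joern's real schema. The point is only that a single query can use AST edges to scope the search to one method and PDG edges to follow data inside it.

```scala
// Hypothetical toy model, not Joern's schema: one node set carrying all three edge kinds.
object ToyCpg {
  sealed trait EdgeKind
  case object Ast extends EdgeKind // parent -> child in the syntax tree
  case object Cfg extends EdgeKind // statement -> statement that may execute next
  case object Pdg extends EdgeKind // definition -> use (data dependence)

  final case class Node(id: Int, code: String)
  final case class Edge(from: Int, to: Int, kind: EdgeKind)

  // Toy program:  def f(x) { y = x + 1; query(y) }
  val nodes: Map[Int, Node] = Map(
    1 -> Node(1, "def f(x)"),
    2 -> Node(2, "x"),
    3 -> Node(3, "y = x + 1"),
    4 -> Node(4, "query(y)")
  )

  val edges: List[Edge] = List(
    Edge(1, 2, Ast), Edge(1, 3, Ast), Edge(1, 4, Ast), // method contains parameter and statements
    Edge(3, 4, Cfg),                                   // y = x + 1 runs before query(y)
    Edge(2, 3, Pdg), Edge(3, 4, Pdg)                   // x flows into y, y flows into query
  )

  /** Successors of `id` along edges of one kind only. */
  def successors(id: Int, kind: EdgeKind): List[Int] =
    edges.collect { case Edge(`id`, to, `kind`) => to }

  /** Statements inside `method` (AST layer) that transitively depend on `param` (PDG layer). */
  def taintedStatements(method: Int, param: Int): Set[Int] = {
    val inMethod = successors(method, Ast).toSet
    @scala.annotation.tailrec
    def loop(frontier: List[Int], seen: Set[Int]): Set[Int] = frontier match {
      case Nil => seen
      case n :: rest =>
        val next = successors(n, Pdg).filterNot(seen.contains)
        loop(rest ++ next, seen ++ next)
    }
    loop(List(param), Set.empty).intersect(inMethod)
  }
}
```

Here `ToyCpg.taintedStatements(1, 2)` yields the nodes for `y = x + 1` and `query(y)`: the AST layer restricts the answer to method `f`, and the PDG layer supplies the dependency chain.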
When to Use CPG vs Pattern Matching
| Approach | Use When | Example |
|---|---|---|
| Pattern Matching (Semgrep) | Known vulnerability patterns, syntax-level issues | Finding dynamic code execution calls |
| CPG Analysis (Joern) | Data flow tracking, cross-function analysis | Proving request input reaches database query through 5 functions |
Rule of thumb: Use CPG when you need to prove data flows between points, especially across function boundaries.
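The rule of thumb can be illustrated with a toy comparison in plain Scala. The `PatternVsFlow` object, its `Stmt` type, and the four-statement program are hypothetical: `code` is the text a syntax-level matcher sees, while `defines`/`uses` stand in for the def-use facts a CPG records. The regex flags both `query(...)` calls; a forward taint pass over this straight-line code keeps only the call whose argument derives from request input.

```scala
// Hypothetical toy program: statement text for the pattern matcher,
// plus def-use facts of the kind a CPG's data-dependence layer records.
object PatternVsFlow {
  final case class Stmt(code: String, defines: Option[String], uses: Set[String], isSource: Boolean)

  val program: Vector[Stmt] = Vector(
    Stmt("id = req.query.id", Some("id"), Set.empty, isSource = true),
    Stmt("limit = 10", Some("limit"), Set.empty, isSource = false),
    Stmt("query(sqlUsers + id)", None, Set("id"), isSource = false),
    Stmt("query(sqlLogs + limit)", None, Set("limit"), isSource = false)
  )

  private val sinkCall = """\bquery\(""".r

  private def isSink(s: Stmt): Boolean = sinkCall.findFirstIn(s.code).isDefined

  /** Pattern matching: every statement whose text calls query(...). */
  def patternFindings: Vector[Int] = program.indices.filter(i => isSink(program(i))).toVector

  /** Data flow: only query(...) calls whose argument carries request data. */
  def flowFindings: Vector[Int] = {
    val (_, hits) = program.zipWithIndex.foldLeft((Set.empty[String], Vector.empty[Int])) {
      case ((tainted, found), (s, i)) =>
        val usesTainted = s.uses.exists(tainted.contains)
        val nextTainted = s.defines match {
          case Some(v) if s.isSource || usesTainted => tainted + v
          case Some(v)                              => tainted - v // clean reassignment clears taint
          case None                                 => tainted
        }
        (nextTainted, if (isSink(s) && usesTainted) found :+ i else found)
    }
    hits
  }
}
```

`patternFindings` reports statements 2 and 3, while `flowFindings` reports only statement 2, the call reached by `req.query.id`.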
Joern Overview
Joern is the primary tool this skill uses for CPG analysis. It:
- Parses source code into a CPG
- Provides the CPGQL query language (Scala-based)
- Supports JavaScript, TypeScript, Python, Java, C/C++, Go, PHP
Basic Joern Workflow
```bash
# 1. Parse the codebase into a CPG
joern-parse /path/to/code --output cpg.bin
# 2. Run an analysis script against the CPG
joern --script analysis.sc --params cpgFile=cpg.bin
# 3. Or start the Joern REPL interactively
joern
```
Then, at the Joern REPL prompt:
```scala
importCpg("cpg.bin")
cpg.method.name(".*login.*").l
```
CPGQL Query Language
CPGQL uses Scala syntax with CPG-specific operations.
Core Concepts
Nodes: Represent code elements
- cpg.method - All methods/functions
- cpg.call - All function calls
- cpg.parameter - Function parameters
- cpg.literal - Literal values
- cpg.identifier - Variable references
Traversals: Navigate the graph
- .name("pattern") - Filter by name (regex matched against the whole name)
- .code("pattern") - Filter by code content (regex)
- .argument - Get call arguments
- .caller - Get calling methods
- .callee - Get called methods
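As a rough analogy for these traversal steps, the sketch below applies name filtering, `.caller`, and `.callee` to a hypothetical call graph held in a plain Scala `Map`. It is not Joern's API, and `ToyTraversals` and its method names are invented for illustration. It assumes `.name` behaves like a whole-string, case-sensitive regex match, which is why a pattern such as `.*login.*` needs wildcards on both sides.

```scala
// Plain-Scala analogy of CPGQL traversals over a hypothetical call graph; not Joern's API.
object ToyTraversals {
  // caller -> callees
  val calls: Map[String, List[String]] = Map(
    "handleLogin" -> List("validate", "loginUser"),
    "adminLogin"  -> List("loginUser"),
    "loginUser"   -> List("query"),
    "validate"    -> Nil,
    "query"       -> Nil
  )

  def methods: List[String] = calls.keys.toList.sorted

  /** Like .name(pattern): keep methods whose whole name matches the regex (case-sensitive). */
  def name(pattern: String): List[String] = methods.filter(_.matches(pattern))

  /** Like .callee: methods called by `m`. */
  def callee(m: String): List[String] = calls.getOrElse(m, Nil)

  /** Like .caller: methods that call `m`. */
  def caller(m: String): List[String] = methods.filter(c => callee(c).contains(m))
}
```

Under these assumptions `name(".*login.*")` returns only `loginUser`, silently missing `adminLogin` and `handleLogin` because of the capital L, and `name("login")` returns nothing because no method is named exactly `login`.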
Data Flow: Track how data moves
- .reachableBy(sources) - Return the source nodes whose data reaches this point
- .reachableByFlows(sources) - Return the full source-to-sink paths
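The semantics of the two data-flow steps can be sketched as a backward search from a sink over data-dependence edges. This is a simplified, hypothetical model: `ToyReachability`, the node labels, and the edges are invented, and Joern's real engine works on its own CPG node types and follows calls across functions. Here `reachableByFlows` returns every path from a source to the sink, and `reachableBy` keeps just the source at the head of each path.

```scala
// Hypothetical data-dependence graph; a simplified model of reachableBy / reachableByFlows.
object ToyReachability {
  // node -> nodes that receive its value
  val flowsTo: Map[String, List[String]] = Map(
    "param:req"         -> List("id = req.query.id"),
    "id = req.query.id" -> List("sql = base + id"),
    "sql = base + id"   -> List("arg:query(sql)"),
    "param:limit"       -> List("arg:render(limit)")
  )

  // Reverse edges so the search can walk backwards from a sink.
  private val flowsFrom: Map[String, List[String]] =
    flowsTo.toList
      .flatMap { case (src, dsts) => dsts.map(dst => dst -> src) }
      .groupBy(_._1)
      .map { case (dst, pairs) => dst -> pairs.map(_._2) }

  /** All paths (source first, sink last) from any of `sources` to `sink`. */
  def reachableByFlows(sink: String, sources: Set[String]): List[List[String]] = {
    def back(node: String, path: List[String], seen: Set[String]): List[List[String]] = {
      val here  = if (sources.contains(node)) List(node :: path) else Nil
      val preds = flowsFrom.getOrElse(node, Nil).filterNot(seen.contains)
      here ++ preds.flatMap(p => back(p, node :: path, seen + p))
    }
    back(sink, Nil, Set(sink))
  }

  /** The sources whose data reaches `sink`. */
  def reachableBy(sink: String, sources: Set[String]): List[String] =
    reachableByFlows(sink, sources).map(_.head).distinct
}
```

With both parameters as sources, the `query(sql)` argument is reached only from `param:req`, via a four-node path, while the `render(limit)` argument is not reached from `param:req` at all.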
Common Query Patterns
Find all calls to a function:
```scala
cpg.call.name("query").l
```
Find parameters that reach dangerous sinks:
```scala
val sources = cpg.parameter.name("req.*|request.*")
val sinks = cpg.call.name("query|execute|run")
sinks.argument.reachableBy(sources).l
```
Get full data flow paths:
```scala
val sources = cpg.parameter.name("userInput")
val sinks = cpg.call.name("executeQuery")
sinks.argument.reachableByFlows(sources).p
```
Confidence Scoring
After CPG verification, assign each finding a confidence level:
| Verification Result | Confidence | Meaning |
|---|---|---|
| Data flow confirmed | HIGH (0.9+) | CPG confirms that source data reaches the sink |
| Partial flow found | MEDIUM (0.6-0.9) | Some path exists, manual review needed |
| No flow found | LOW (0.3-0.6) | May be false positive or complex flow |
| Verification failed | UNKNOWN | Query error, manual analysis required |
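One way to encode this table for report generation is a small Scala mapping. The `Confidence` object and its type names are hypothetical, and the 1.0 upper bound for HIGH is an assumption, since the table only says 0.9+.

```scala
// Hypothetical encoding of the confidence table above.
object Confidence {
  sealed trait Verification
  case object FlowConfirmed      extends Verification
  case object PartialFlow        extends Verification
  case object NoFlow             extends Verification
  case object VerificationFailed extends Verification

  sealed trait Level
  case object High    extends Level
  case object Medium  extends Level
  case object Low     extends Level
  case object Unknown extends Level

  /** A level plus its score band; None when the table gives no numeric score. */
  final case class Score(level: Level, band: Option[(Double, Double)])

  def score(v: Verification): Score = v match {
    case FlowConfirmed      => Score(High, Some((0.9, 1.0))) // upper bound 1.0 is assumed
    case PartialFlow        => Score(Medium, Some((0.6, 0.9)))
    case NoFlow             => Score(Low, Some((0.3, 0.6)))
    case VerificationFailed => Score(Unknown, None)
  }
}
```

Keeping the mapping in one place means a change to the bands only has to be made once, and the sealed traits let the compiler flag a verification result that has no level.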
Skill References
- references/cpgql-patterns.md - Common vulnerability query patterns
- references/joern-cheatsheet.md - Quick Joern/CPGQL reference
Related Skills
- data-flow-tracing - Manual source-to-sink analysis
- dangerous-functions - Sink identification by language
- vuln-patterns - Pattern-based vulnerability knowledge
Source
https://github.com/allsmog/vuln-scout/blob/main/whitebox-pentest/skills/cpg-analysis/SKILL.md
Overview
Code Property Graph (CPG) unifies AST, CFG, and PDG into a single model to enable semantic queries beyond pattern matching. This skill focuses on using Joern and CPGQL to perform data-flow verification and taint tracking, helping you prove how input moves through code and where vulnerabilities may arise.
How This Skill Works
Joern parses source code into a Code Property Graph and exposes CPGQL for traversing code elements and dependencies. Data-flow analysis is done with traversals like reachableBy and reachableByFlows to prove paths from sources (inputs) to sinks (vulnerable calls), across functions when needed.
When to Use It
- Proving that user input reaches a dangerous sink across multiple functions
- Performing taint tracking and data-flow verification with Joern
- Analyzing data flows in multi-language codebases (JS, Python, Java, C/C++, Go, PHP)
- When pattern matching misses complex paths and you need end-to-end flow proof
- Verifying full data-flow paths using reachableByFlows in CPGQL
Quick Start
- Step 1: joern-parse /path/to/code --output cpg.bin
- Step 2: joern --script analysis.sc --params cpgFile=cpg.bin
- Step 3 (alternative to Step 2): start joern, run importCpg("cpg.bin") in the REPL, then run queries such as cpg.call.name("query").l
Best Practices
- Start by parsing the codebase into a CPG with joern-parse /path/to/code --output cpg.bin
- Use Joern REPL or an analysis script to run queries against the CPG (e.g., joern --script analysis.sc --params cpgFile=cpg.bin)
- In queries, use .reachableBy(source) to find which sources reach a sink and .reachableByFlows(source) to retrieve the full data-flow chains
- Compare CPG-based results with pattern-based checks (e.g., Semgrep) when appropriate, to validate findings
- Keep a library of known sources (inputs) and sinks (dangerous calls) and update as code evolves
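The source and sink library could be kept as data and joined into the alternation regexes that `.name(...)` filters take, such as "req.*|request.*" in the query patterns earlier. The `TaintLibrary` object and the catalogue contents are hypothetical examples, not a vetted list of sources and sinks.

```scala
// Hypothetical per-language source/sink catalogue; extend it as the codebase evolves.
object TaintLibrary {
  final case class Catalogue(sources: List[String], sinks: List[String])

  val catalogues: Map[String, Catalogue] = Map(
    "javascript" -> Catalogue(sources = List("req.*", "request.*"), sinks = List("query", "execute", "run")),
    "python"     -> Catalogue(sources = List("request.*"), sinks = List("execute", "executemany"))
  )

  /** Join regex fragments into one alternation suitable for a .name(...) filter. */
  def alternation(patterns: List[String]): String = patterns.distinct.mkString("|")

  def sourceRegex(lang: String): Option[String] = catalogues.get(lang).map(c => alternation(c.sources))

  def sinkRegex(lang: String): Option[String] = catalogues.get(lang).map(c => alternation(c.sinks))
}
```

For example, `TaintLibrary.sinkRegex("javascript")` yields `Some("query|execute|run")`, the same sink pattern used in the query patterns, and an unknown language yields `None` rather than an empty regex.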
Example Use Cases
- Find all calls to a dangerous sink: cpg.call.name("query").l
- Trace whether request parameters reach a sink: val sources = cpg.parameter.name("req.*|request.*"); val sinks = cpg.call.name("query|execute|run"); sinks.argument.reachableBy(sources).l
- Get full data-flow paths from userInput to executeQuery: val sources = cpg.parameter.name("userInput"); val sinks = cpg.call.name("executeQuery"); sinks.argument.reachableByFlows(sources).p
- Basic Joern workflow: joern-parse /path/to/code --output cpg.bin, then either joern --script analysis.sc --params cpgFile=cpg.bin, or start joern and run importCpg("cpg.bin") followed by cpg.method.name(".*login.*").l in the REPL
- Explore core CPGQL concepts: cpg.method, cpg.call, cpg.parameter, cpg.literal, cpg.identifier and traversals like .name(), .code(), .caller, .callee