Dataflow Analysis
Perform intra-procedural dataflow analysis to track how data flows within functions.
When to use
- Track if a function parameter flows to a function call argument
- Track if a function call's output flows to another function call's argument
- Find taint propagation paths (e.g., user input reaching dangerous functions)
- Detect vulnerabilities like command injection, buffer overflows
Instructions
Using the VulHunt MCP tools, open the project (open_project) and run the following Lua query (query_project).
To perform dataflow analysis, use project:calls_matching{}:
local calls = project:calls_matching({
  to = <target_call>,
  where = function(caller)
    return caller:named("<function_name>") and caller:has_call(<target_call>)
  end,
  using = {
    -- Annotate caller parameters
    parameters = {var:named "first_param", _},
    -- Annotate callees
    callees = {
      ["malloc"] = {inputs = {var:named "size"}},
      ["strlen"] = {output = var:named "len", inputs = {_}},
      ["check_len"] = {inputs = {var:sanitised()}}
    }
  }
})

local results = {}
for _, c in ipairs(calls) do
  local entry = {
    caller_name = c.caller.name,
    call_address = c.call_address,
  }
  if c.inputs[1] and c.inputs[1].annotation then
    entry.arg1_annotation = c.inputs[1].annotation
    entry.arg1_source = c.inputs[1].origin.source_address
  end
  if c.inputs[2] and c.inputs[2].annotation then
    entry.arg2_annotation = c.inputs[2].annotation
    entry.arg2_source = c.inputs[2].origin.source_address
  end
  if c.output then
    entry.return_annotation = c.output.annotation
  end
  table.insert(results, entry)
end
return results
Possible values for <target_call>:
- A string, e.g. "system"
- An AddressValue
  - VulHunt APIs return addresses as AddressValue instances
  - Create one with AddressValue.new(<hex_addr>) (e.g., <hex_addr> = 0x1234)
- A regex, e.g. {matching = "<regex>", kind = "symbol"}
- A byte pattern, e.g. {matching = "41544155", kind = "bytes"}
Inputs and output are of type OperandInfo (see operand-info.md). Origins are of type OperandOrigin (see operand-origin.md).
Annotations
- var:named "x" - Tags a variable with the name "x". This tag follows the data through the function, allowing later checks on where it ends up (e.g., which function argument it flows into).
- _ - Placeholder for variables that don't need to be tracked.
- var:sanitised() - Stops taint propagation when a tainted variable flows through that function argument.
The annotation set in using appears in c.inputs[N].annotation in the results.
For example, if annotated with var:named "cmd", then c.inputs[1].annotation == "cmd"
indicates the first argument came from that tracked variable.
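The annotation check can be factored into a small helper. The following is a minimal sketch in plain Lua: `inputs_with_annotation` is a hypothetical helper name, and `mock_calls` only imitates the result shape described above (`inputs[N].annotation`, `inputs[N].origin.source_address`) so the snippet runs outside VulHunt. In a real query the addresses would be AddressValue instances rather than plain numbers.

```lua
-- Hypothetical helper: collect every call whose N-th argument carries
-- the given annotation. Works on any table shaped like the results above.
local function inputs_with_annotation(calls, index, name)
  local hits = {}
  for _, c in ipairs(calls) do
    local input = c.inputs and c.inputs[index]
    if input ~= nil and input.annotation == name then
      -- Defensive guard: this sketch does not assume origin is always set.
      local origin = input.origin
      table.insert(hits, {
        call_address = tostring(c.call_address),
        source_address = origin and tostring(origin.source_address) or nil,
      })
    end
  end
  return hits
end

-- Mock data standing in for real results (illustrative values only).
local mock_calls = {
  { call_address = 0x1000,
    inputs = { { annotation = "cmd", origin = { source_address = 0x0ff0 } } } },
  { call_address = 0x2000, inputs = { { annotation = nil } } },
  { call_address = 0x3000, inputs = {} },
}

local hits = inputs_with_annotation(mock_calls, 1, "cmd")
```

Only the first mock call is kept, since it is the only one whose first argument is annotated "cmd".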
Examples
Function parameter -> Function argument
Example 1: Buffer overflow via memcpy
C code snippet:
void vulnerable_function(int len, char *path) {
    char buffer[256];
    memcpy(buffer, path, len);
}
Lua query:
local calls = project:calls_matching{
  to = "memcpy",
  using = {
    parameters = {var:named "len", var:named "path"}
  }
}

local findings = {}
for _, call in ipairs(calls) do
  local len_src = call.inputs[3]
  local data_src = call.inputs[2]
  if (len_src ~= nil and len_src.annotation == "len") or
     (data_src ~= nil and data_src.annotation == "path") then
    table.insert(findings, {
      caller_address = tostring(call.caller_address),
      call_address = tostring(call.call_address),
    })
  end
end
return findings
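Because the check above uses `or`, a finding is reported when either the length or the source pointer comes from a caller parameter. For triage it may help to report the two conditions separately. A minimal sketch, assuming results shaped like those above: `classify_memcpy` is a hypothetical helper, and the mock call is illustrative only.

```lua
-- Hypothetical helper: describe which memcpy arguments are tied to the
-- caller's annotated parameters ("len" and "path" from the query above).
local function classify_memcpy(call)
  local data_src = call.inputs[2]  -- memcpy source pointer
  local len_src = call.inputs[3]   -- memcpy length
  local len_tracked = len_src ~= nil and len_src.annotation == "len"
  local data_tracked = data_src ~= nil and data_src.annotation == "path"
  if len_tracked and data_tracked then
    return "length and data from parameters"
  elseif len_tracked then
    return "length from parameter"
  elseif data_tracked then
    return "data from parameter"
  end
  return nil  -- neither argument is tracked
end

-- Mock call imitating memcpy(buffer, path, len) in vulnerable_function.
local mock_call = {
  inputs = {
    { annotation = nil },     -- buffer: not tracked
    { annotation = "path" },  -- path parameter
    { annotation = "len" },   -- len parameter
  },
}
local label = classify_memcpy(mock_call)
```

A call where both arguments are tracked, as in the C snippet, is usually the most interesting case because the caller controls both the data and the amount copied.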
Example 2: Command injection via snprintf -> system
C code snippet:
void vulnerable_function(char *cmd) {
    char buffer[256];
    snprintf(buffer, sizeof(buffer), "sh -c %s", cmd);
    system(buffer);
}
Lua query:
local calls = project:calls_matching{
  to = "system",
  where = function(caller)
    return caller:has_call("snprintf")
  end,
  using = {
    callees = {snprintf = {inputs = {var:named "cmd", _, _}}}
  }
}

local findings = {}
for _, call in ipairs(calls) do
  local src = call.inputs[1]
  if src ~= nil and src.annotation == "cmd" then
    table.insert(findings, {
      caller_address = tostring(call.caller_address),
      call_address = tostring(call.call_address),
    })
  end
end
return findings
Use cases
Command injection
Shell commands built from format strings
Find calls to system() where the argument was built using snprintf():
local calls = project:calls_matching{
  to = "system", -- system(cmd)
  where = function(caller)
    return caller:has_call("snprintf") -- snprintf(cmd, ...)
  end,
  using = {
    callees = {snprintf = {inputs = {var:named "cmd", _, _}}}
  }
}

local findings = {}
for _, call in ipairs(calls) do
  local src = call.inputs[1]
  if src ~= nil and src.annotation == "cmd" then
    table.insert(findings, {
      snprintf_address = tostring(src.origin.source_address),
      caller_name = tostring(call.caller.name),
      caller_address = tostring(call.caller_address),
      call_address = tostring(call.call_address),
    })
  end
end
return findings
NOTE: Only change the propagated value if the source changes.
Returns a JSON object containing:
- snprintf_address is the address of the call site to snprintf
- caller_name is the name of the function that makes the call
- caller_address is the address of the function that makes the call
- call_address is the address of the call site to system (the code block address where the call is made)
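When a query returns many findings, grouping them by caller can make review easier. A minimal sketch in plain Lua, assuming findings with the fields produced by the query above: `group_by_caller` is a hypothetical helper, and the sample names and addresses are made up for illustration.

```lua
-- Hypothetical helper: group call-site addresses by the calling function.
local function group_by_caller(findings)
  local groups = {}
  for _, f in ipairs(findings) do
    -- Fall back to the address (or a fixed label) when the name is missing.
    local key = f.caller_name or f.caller_address or "unknown"
    groups[key] = groups[key] or {}
    table.insert(groups[key], f.call_address)
  end
  return groups
end

-- Made-up findings, shaped like the entries built in the query above.
local sample = {
  { caller_name = "run_cmd", caller_address = "0x1000", call_address = "0x1040" },
  { caller_name = "run_cmd", caller_address = "0x1000", call_address = "0x1088" },
  { caller_name = "init", caller_address = "0x2000", call_address = "0x2010" },
}
local grouped = group_by_caller(sample)
```

Here `grouped` maps each caller to the list of its `system` call sites, so a function with several flagged calls appears once.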
References
- calls-matching-param.md - Input format for calls_matching
- calls-matching-table.md - Structure of the returned table from calls_matching
- regex-matcher.md - Regex matching utilities
URLs to additional documentation pages are available at https://vulhunt.re/llm.txt
Related Skills
- functions (/functions) - Use this skill to find target functions by name, address, or pattern before performing dataflow analysis
- call-sites (/call-sites) - To find where functions are called without tracking data flow, use this simpler skill instead
- decompiler (/decompiler) - View decompiled code to understand function logic before setting up complex dataflow annotations