# Taint Instrumentation Assistant

Instrument code to track untrusted and sensitive data flow for security vulnerability detection.

## Workflow

Follow these steps to add taint tracking instrumentation.
### 1. Identify Taint Sources and Sinks

Define what data to track and where violations occur.

**Taint sources** (untrusted/sensitive data origins):

- User input (HTTP parameters, form data, command-line args)
- File reads (configuration files, user uploads)
- Database queries (user-provided data)
- Network input (API responses, socket data)
- Environment variables

**Taint sinks** (dangerous operations):

- SQL queries (SQL injection risk)
- System commands (command injection risk)
- HTML output (XSS risk)
- File operations (path traversal risk)
- Eval/exec statements (code injection risk)
- Network output (data leak risk)
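The categories above can be enforced mechanically. A minimal, hedged sketch using decorators to mark sources and guard sinks (the `taint_source`/`taint_sink` names and the `TaintStr` wrapper are illustrative, not an existing library API):

```python
import functools

class TaintStr(str):
    """String subclass that can carry a taint label (plain str cannot)."""

def taint_source(label):
    """Decorator: tag a function's return value as tainted."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = TaintStr(fn(*args, **kwargs))
            result.taint = label
            return result
        return wrapper
    return decorator

def taint_sink(kind):
    """Decorator: reject tainted arguments at a dangerous operation."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for arg in args:
                if getattr(arg, 'taint', None):
                    raise ValueError(f"{kind}: tainted data from {arg.taint}")
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@taint_source("HTTP_PARAM")
def read_param():
    # Stand-in for reading a real HTTP parameter
    return "alice'; DROP TABLE users; --"

@taint_sink("SQL_QUERY")
def run_query(sql):
    return f"executed: {sql}"
```

With these in place, `run_query(read_param())` raises `ValueError`, while `run_query("SELECT 1")` runs normally.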
### 2. Instrument Taint Sources

Mark data from untrusted sources as tainted. Note that plain `str` objects cannot hold attributes, so values must be wrapped in an attribute-capable subclass before marking:

```python
class TaintStr(str):
    """String subclass that can carry a taint attribute."""

def mark_tainted(value, source):
    """Mark a value as tainted from a specific source."""
    tainted = TaintStr(value)
    tainted.__taint__ = source
    return tainted

# Example: HTTP parameter
user_input = request.GET['username']
user_input = mark_tainted(user_input, source="HTTP_PARAM")
```
### 3. Propagate Taint Through Operations

Track taint as data flows through the program:

```python
# Taint propagation for string operations
def tainted_concat(str1, str2):
    result = str1 + str2
    # If either input is tainted, the result is tainted;
    # wrap it so it can carry the taint attribute.
    if hasattr(str1, '__taint__') or hasattr(str2, '__taint__'):
        result = TaintStr(result)
        result.__taint__ = (getattr(str1, '__taint__', None)
                            or getattr(str2, '__taint__', None))
    return result
```
### 4. Check Taint at Sinks

Detect when tainted data reaches dangerous operations:

```python
# Check for tainted data at a SQL sink
def execute_query(query):
    if hasattr(query, '__taint__'):
        print(f"TAINT VIOLATION: Tainted data from {query.__taint__} used in SQL query")
        print(f"Query: {query}")
        # Optionally: raise an exception or log for analysis
    # Execute query...
```
### 5. Generate Instrumented Code

Produce code with complete taint tracking:

- Instrumented source code with taint tracking
- Taint policy configuration (sources and sinks)
- Violation report format
- Usage instructions
## Language-Specific Patterns

### Python

```python
# Taint tracking infrastructure
class SecurityError(Exception):
    """Raised when tainted data reaches a dangerous sink."""

class TaintedStr(str):
    """String subclass that carries taint information"""
    def __new__(cls, value, taint_source=None):
        instance = super().__new__(cls, value)
        instance.taint_source = taint_source
        return instance

    def __add__(self, other):
        result = TaintedStr(super().__add__(other))
        result.taint_source = self.taint_source or getattr(other, 'taint_source', None)
        return result

# Mark taint source
def get_user_input():
    user_data = input("Enter username: ")
    return TaintedStr(user_data, taint_source="USER_INPUT")

# Check taint sink
def execute_sql(query):
    if isinstance(query, TaintedStr) and query.taint_source:
        print("[TAINT VIOLATION] SQL Injection risk!")
        print(f"  Source: {query.taint_source}")
        print(f"  Query: {query}")
        raise SecurityError("Tainted data in SQL query")
    # Execute query...

# Example usage
username = get_user_input()
query = TaintedStr("SELECT * FROM users WHERE name = '") + username + TaintedStr("'")
execute_sql(query)  # Triggers violation
```
### Java

```java
// Taint tracking class
class TaintedString {
    private String value;
    private String taintSource;

    public TaintedString(String value, String taintSource) {
        this.value = value;
        this.taintSource = taintSource;
    }

    public String getValue() { return value; }
    public String getTaintSource() { return taintSource; }
    public boolean isTainted() { return taintSource != null; }

    public TaintedString concat(TaintedString other) {
        String newValue = this.value + other.value;
        String newSource = this.taintSource != null ? this.taintSource : other.taintSource;
        return new TaintedString(newValue, newSource);
    }
}

// Mark taint source
TaintedString getUserInput() {
    Scanner scanner = new Scanner(System.in);
    String input = scanner.nextLine();
    return new TaintedString(input, "USER_INPUT");
}

// Check taint sink
void executeSQL(TaintedString query) {
    if (query.isTainted()) {
        System.err.println("[TAINT VIOLATION] SQL Injection risk!");
        System.err.println("  Source: " + query.getTaintSource());
        System.err.println("  Query: " + query.getValue());
        throw new SecurityException("Tainted data in SQL query");
    }
    // Execute query...
}
```
### JavaScript

```javascript
// Taint tracking wrapper
class TaintedString {
    constructor(value, taintSource = null) {
        this.value = value;
        this.taintSource = taintSource;
    }

    concat(other) {
        const newValue = this.value + (other.value || other);
        const newSource = this.taintSource || other.taintSource;
        return new TaintedString(newValue, newSource);
    }

    toString() {
        return this.value;
    }
}

// Mark taint source
function getUserInput() {
    const input = prompt("Enter username:");
    return new TaintedString(input, "USER_INPUT");
}

// Check taint sink
function executeSQL(query) {
    if (query instanceof TaintedString && query.taintSource) {
        console.error("[TAINT VIOLATION] SQL Injection risk!");
        console.error(`  Source: ${query.taintSource}`);
        console.error(`  Query: ${query.value}`);
        throw new Error("Tainted data in SQL query");
    }
    // Execute query...
}
```
## Common Vulnerability Patterns

### SQL Injection Detection

```python
# Original vulnerable code
def login(username, password):
    query = f"SELECT * FROM users WHERE name='{username}' AND pass='{password}'"
    return db.execute(query)
```

```python
# Instrumented code
def login(username, password):
    # Mark inputs as tainted
    username = TaintedStr(username, "HTTP_PARAM:username")
    password = TaintedStr(password, "HTTP_PARAM:password")
    # Build query (taint propagates through TaintedStr.__add__)
    query = (TaintedStr("SELECT * FROM users WHERE name='") + username
             + TaintedStr("' AND pass='") + password + TaintedStr("'"))
    # Check at sink
    if isinstance(query, TaintedStr) and query.taint_source:
        print("[TAINT VIOLATION] SQL Injection detected!")
        print(f"  Tainted input: {query.taint_source}")
        print(f"  Query: {query}")
    return db.execute(str(query))
```
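The usual remediation for this pattern is a parameterized query, which keeps user data out of the SQL text entirely. A sketch using Python's built-in `sqlite3` placeholders (the table and rows here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, pass TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

def login(db, username, password):
    # Placeholders (?) send values separately from the SQL text,
    # so tainted input can never change the query structure.
    cur = db.execute(
        "SELECT * FROM users WHERE name=? AND pass=?", (username, password)
    )
    return cur.fetchone()

print(login(conn, "alice", "s3cret"))   # ('alice', 's3cret')
print(login(conn, "' OR '1'='1", "x"))  # None (injection attempt fails)
```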
### XSS Detection

```python
# Original vulnerable code
def render_greeting(name):
    return f"<h1>Hello, {name}!</h1>"
```

```python
# Instrumented code
def render_greeting(name):
    # Mark input as tainted
    name = TaintedStr(name, "HTTP_PARAM:name")
    # Build HTML (taint propagates)
    html = TaintedStr("<h1>Hello, ") + name + TaintedStr("!</h1>")
    # Check at sink (HTML output)
    if isinstance(html, TaintedStr) and html.taint_source:
        print("[TAINT VIOLATION] XSS risk detected!")
        print(f"  Tainted input: {html.taint_source}")
        print(f"  HTML: {html}")
    return str(html)
```
### Command Injection Detection

```python
# Original vulnerable code
def process_file(filename):
    os.system(f"cat {filename}")
```

```python
# Instrumented code
def process_file(filename):
    # Mark input as tainted
    filename = TaintedStr(filename, "USER_INPUT:filename")
    # Build command (taint propagates)
    command = TaintedStr("cat ") + filename
    # Check at sink (system command)
    if isinstance(command, TaintedStr) and command.taint_source:
        print("[TAINT VIOLATION] Command Injection risk!")
        print(f"  Tainted input: {command.taint_source}")
        print(f"  Command: {command}")
    os.system(str(command))
```
## Taint Policy Configuration

```python
# taint_policy.py
TAINT_SOURCES = {
    "HTTP_PARAM": ["request.GET", "request.POST", "request.args"],
    "USER_INPUT": ["input()", "sys.stdin.read()"],
    "FILE_READ": ["open().read()", "Path.read_text()"],
    "ENV_VAR": ["os.getenv()", "os.environ"],
}

TAINT_SINKS = {
    "SQL_QUERY": ["db.execute()", "cursor.execute()"],
    "SYSTEM_CMD": ["os.system()", "subprocess.call()"],
    "HTML_OUTPUT": ["render_template()", "HttpResponse()"],
    "FILE_WRITE": ["open().write()", "Path.write_text()"],
    "EVAL": ["eval()", "exec()"],
}

TAINT_ENABLED = True
REPORT_FORMAT = "detailed"  # or "summary"
```
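One hedged way to consume such a policy is a line scanner that flags configured sink calls. This is plain string matching on call names, a rough pre-filter rather than real dataflow analysis; `find_sink_calls` is illustrative, not part of any library:

```python
# Patterns keyed by sink kind; the trailing "(" narrows matches to calls.
TAINT_SINKS = {
    "SQL_QUERY": ["db.execute(", "cursor.execute("],
    "SYSTEM_CMD": ["os.system(", "subprocess.call("],
    "EVAL": ["eval(", "exec("],
}

def find_sink_calls(source_lines):
    """Return (line_number, sink_kind) pairs where a configured sink appears."""
    hits = []
    for lineno, line in enumerate(source_lines, start=1):
        for kind, patterns in TAINT_SINKS.items():
            if any(p in line for p in patterns):
                hits.append((lineno, kind))
    return hits

code = [
    "query = build_query(user)",
    "db.execute(query)",
    "os.system('cat ' + fname)",
]
print(find_sink_calls(code))  # [(2, 'SQL_QUERY'), (3, 'SYSTEM_CMD')]
```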
## Output Format

### Taint Violation Report

```markdown
## Taint Analysis Report

**File**: app.py
**Analysis Date**: 2024-02-17

### Violations Detected

#### Violation 1: SQL Injection Risk

- **Severity**: HIGH
- **Location**: app.py:45
- **Taint Source**: HTTP_PARAM:username
- **Taint Sink**: db.execute()
- **Data Flow**:
  1. User input from HTTP parameter 'username' (line 42)
  2. String concatenation in query building (line 44)
  3. Passed to db.execute() without sanitization (line 45)
- **Recommendation**: Use parameterized queries

#### Violation 2: XSS Risk

- **Severity**: MEDIUM
- **Location**: app.py:78
- **Taint Source**: HTTP_PARAM:comment
- **Taint Sink**: render_template()
- **Data Flow**:
  1. User input from HTTP parameter 'comment' (line 75)
  2. Embedded in HTML template (line 78)
- **Recommendation**: Use HTML escaping

### Summary

- Total violations: 2
- High severity: 1
- Medium severity: 1
- Low severity: 0
```
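A report like the one above can be assembled from structured violation records. A minimal sketch assuming one dict per violation (the field names are illustrative):

```python
def render_report(filename, violations):
    """Render a markdown taint report from violation dicts."""
    lines = ["## Taint Analysis Report", f"**File**: {filename}",
             "", "### Violations Detected"]
    for i, v in enumerate(violations, start=1):
        lines += [
            "",
            f"#### Violation {i}: {v['title']}",
            f"- **Severity**: {v['severity']}",
            f"- **Taint Source**: {v['source']}",
            f"- **Taint Sink**: {v['sink']}",
            f"- **Recommendation**: {v['recommendation']}",
        ]
    high = sum(1 for v in violations if v['severity'] == 'HIGH')
    lines += ["", "### Summary",
              f"- Total violations: {len(violations)}",
              f"- High severity: {high}"]
    return "\n".join(lines)

report = render_report("app.py", [{
    "title": "SQL Injection Risk", "severity": "HIGH",
    "source": "HTTP_PARAM:username", "sink": "db.execute()",
    "recommendation": "Use parameterized queries",
}])
```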
## Best Practices

- **Comprehensive source marking**: Mark all untrusted input sources
- **Complete propagation**: Track taint through all operations
- **Strict sink checking**: Verify all dangerous operations
- **Minimal false positives**: Use precise taint rules
- **Performance consideration**: Optimize for production use
- **Clear reporting**: Provide actionable violation reports
## Advanced Features

### Sanitization Tracking

```python
def sanitize_sql(value):
    """Remove taint after sanitization"""
    if isinstance(value, TaintedStr):
        # Sanitize, then return a regular (untainted) string
        sanitized = value.replace("'", "''")
        return str(sanitized)
    return value

# Usage
username = TaintedStr(user_input, "HTTP_PARAM")
safe_username = sanitize_sql(username)  # No longer tainted
query = f"SELECT * FROM users WHERE name='{safe_username}'"  # Safe
```
### Multi-Level Taint

```python
class TaintLevel:
    UNTAINTED = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3

class TaintedStr(str):
    # str subclasses must attach extra state in __new__, because
    # str.__new__ rejects additional positional arguments.
    def __new__(cls, value, taint_level=TaintLevel.UNTAINTED):
        instance = super().__new__(cls, value)
        instance.taint_level = taint_level
        return instance

# Different sources have different taint levels
public_data = TaintedStr(data, TaintLevel.LOW)
user_input = TaintedStr(user_data, TaintLevel.HIGH)
```
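With leveled taint, propagation conventionally takes the maximum of the operand levels, so a HIGH fragment dominates the combined value. A self-contained sketch (the `LeveledStr` name is illustrative):

```python
class TaintLevel:
    UNTAINTED, LOW, MEDIUM, HIGH = 0, 1, 2, 3

class LeveledStr(str):
    """String carrying a numeric taint level (sketch, not a library API)."""
    def __new__(cls, value, level=TaintLevel.UNTAINTED):
        instance = super().__new__(cls, value)
        instance.level = level
        return instance

    def __add__(self, other):
        # The combined level is the maximum of both operands' levels;
        # plain strings count as UNTAINTED.
        other_level = getattr(other, 'level', TaintLevel.UNTAINTED)
        return LeveledStr(str(self) + str(other), max(self.level, other_level))

greeting = LeveledStr("Hello, ", TaintLevel.LOW) + LeveledStr("world", TaintLevel.HIGH)
print(greeting.level)  # 3 (HIGH dominates)
```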
## Constraints

- **Preserve semantics**: Taint tracking shouldn't change program behavior
- **Minimal overhead**: Keep performance impact low
- **Complete coverage**: Track all taint propagation paths
- **Accurate detection**: Minimize false positives and negatives