Taint Instrumentation Assistant

Instrument code to track untrusted and sensitive data flow for security vulnerability detection.

Workflow

Follow these steps to add taint tracking instrumentation:

1. Identify Taint Sources and Sinks

Define what data to track and where violations occur:

Taint sources (untrusted/sensitive data origins):

  • User input (HTTP parameters, form data, command-line args)
  • File reads (configuration files, user uploads)
  • Database queries (user-provided data)
  • Network input (API responses, socket data)
  • Environment variables

Taint sinks (dangerous operations):

  • SQL queries (SQL injection risk)
  • System commands (command injection risk)
  • HTML output (XSS risk)
  • File operations (path traversal risk)
  • Eval/exec statements (code injection risk)
  • Network output (data leak risk)

2. Instrument Taint Sources

Mark data from untrusted sources as tainted:

# Mark user input as tainted
class Tainted(str):
    """Subclass of str: built-in str instances cannot hold new attributes"""

def mark_tainted(value, source):
    """Mark a value as tainted from a specific source"""
    tainted = Tainted(value)
    tainted.__taint__ = source
    return tainted

# Example: HTTP parameter
user_input = request.GET['username']
user_input = mark_tainted(user_input, source="HTTP_PARAM")

3. Propagate Taint Through Operations

Track taint as data flows through the program:

# Taint propagation for string operations
def tainted_concat(str1, str2):
    result = str1 + str2
    # If either input is tainted, the result is tainted
    if hasattr(str1, '__taint__') or hasattr(str2, '__taint__'):
        result = mark_tainted(
            result,
            getattr(str1, '__taint__', None) or getattr(str2, '__taint__', None),
        )
    return result

4. Check Taint at Sinks

Detect when tainted data reaches dangerous operations:

# Check for tainted data at SQL sink
def execute_query(query):
    if hasattr(query, '__taint__'):
        print(f"TAINT VIOLATION: Tainted data from {query.__taint__} used in SQL query")
        print(f"Query: {query}")
        # Optionally: raise exception or log for analysis
    # Execute query...

5. Generate Instrumented Code

Produce code with complete taint tracking:

  • Instrumented source code with taint tracking
  • Taint policy configuration (sources and sinks)
  • Violation report format
  • Usage instructions

Language-Specific Patterns

Python

# Taint tracking infrastructure
class SecurityError(Exception):
    """Raised when tainted data reaches a sensitive sink"""

class TaintedStr(str):
    """String wrapper that carries taint information"""
    def __new__(cls, value, taint_source=None):
        instance = super().__new__(cls, value)
        instance.taint_source = taint_source
        return instance

    def __add__(self, other):
        result = TaintedStr(super().__add__(other))
        result.taint_source = self.taint_source or getattr(other, 'taint_source', None)
        return result

    def __radd__(self, other):
        # Handles plain_str + tainted_str so taint survives either operand order
        result = TaintedStr(str(other) + str(self))
        result.taint_source = self.taint_source
        return result

# Mark taint source
def get_user_input():
    user_data = input("Enter username: ")
    return TaintedStr(user_data, taint_source="USER_INPUT")

# Check taint sink
def execute_sql(query):
    if isinstance(query, TaintedStr) and query.taint_source:
        print("[TAINT VIOLATION] SQL Injection risk!")
        print(f"  Source: {query.taint_source}")
        print(f"  Query: {query}")
        raise SecurityError("Tainted data in SQL query")
    # Execute query...

# Example usage
username = get_user_input()
query = TaintedStr("SELECT * FROM users WHERE name = '") + username + TaintedStr("'")
execute_sql(query)  # Triggers violation

Java

// Taint tracking class
class TaintedString {
    private String value;
    private String taintSource;

    public TaintedString(String value, String taintSource) {
        this.value = value;
        this.taintSource = taintSource;
    }

    public String getValue() { return value; }
    public String getTaintSource() { return taintSource; }
    public boolean isTainted() { return taintSource != null; }

    public TaintedString concat(TaintedString other) {
        String newValue = this.value + other.value;
        String newSource = this.taintSource != null ? this.taintSource : other.taintSource;
        return new TaintedString(newValue, newSource);
    }
}

// Mark taint source (requires: import java.util.Scanner;)
TaintedString getUserInput() {
    Scanner scanner = new Scanner(System.in);
    String input = scanner.nextLine();
    return new TaintedString(input, "USER_INPUT");
}

// Check taint sink
void executeSQL(TaintedString query) {
    if (query.isTainted()) {
        System.err.println("[TAINT VIOLATION] SQL Injection risk!");
        System.err.println("  Source: " + query.getTaintSource());
        System.err.println("  Query: " + query.getValue());
        throw new SecurityException("Tainted data in SQL query");
    }
    // Execute query...
}

JavaScript

// Taint tracking wrapper
class TaintedString {
    constructor(value, taintSource = null) {
        this.value = value;
        this.taintSource = taintSource;
    }

    concat(other) {
        const isTainted = other instanceof TaintedString;
        const newValue = this.value + (isTainted ? other.value : other);
        const newSource = this.taintSource || (isTainted ? other.taintSource : null);
        return new TaintedString(newValue, newSource);
    }

    toString() {
        return this.value;
    }
}

// Mark taint source
function getUserInput() {
    const input = prompt("Enter username:");
    return new TaintedString(input, "USER_INPUT");
}

// Check taint sink
function executeSQL(query) {
    if (query instanceof TaintedString && query.taintSource) {
        console.error("[TAINT VIOLATION] SQL Injection risk!");
        console.error(`  Source: ${query.taintSource}`);
        console.error(`  Query: ${query.value}`);
        throw new Error("Tainted data in SQL query");
    }
    // Execute query...
}

Common Vulnerability Patterns

SQL Injection Detection

# Original vulnerable code
def login(username, password):
    query = f"SELECT * FROM users WHERE name='{username}' AND pass='{password}'"
    return db.execute(query)

# Instrumented code
def login(username, password):
    # Mark inputs as tainted
    username = TaintedStr(username, "HTTP_PARAM:username")
    password = TaintedStr(password, "HTTP_PARAM:password")

    # Build query (taint propagates)
    query = TaintedStr("SELECT * FROM users WHERE name='") + username + TaintedStr("' AND pass='") + password + TaintedStr("'")

    # Check at sink
    if isinstance(query, TaintedStr) and query.taint_source:
        print("[TAINT VIOLATION] SQL Injection detected!")
        print(f"  Tainted input: {query.taint_source}")
        print(f"  Query: {query}")

    return db.execute(str(query))
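The recommended remediation for this pattern is a parameterized query, where user input travels as bound data rather than SQL text (a sketch using `sqlite3`-style `?` placeholders; the `db` handle and table layout are assumptions):

```python
import sqlite3

# Remediation sketch: parameterized queries keep user input out of the SQL text,
# so no tainted string ever reaches the SQL sink
def login_fixed(db, username, password):
    return db.execute(
        "SELECT * FROM users WHERE name=? AND pass=?", (username, password)
    ).fetchall()
```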

XSS Detection

# Original vulnerable code
def render_greeting(name):
    return f"<h1>Hello, {name}!</h1>"

# Instrumented code
def render_greeting(name):
    # Mark input as tainted
    name = TaintedStr(name, "HTTP_PARAM:name")

    # Build HTML (taint propagates)
    html = TaintedStr("<h1>Hello, ") + name + TaintedStr("!</h1>")

    # Check at sink (HTML output)
    if isinstance(html, TaintedStr) and html.taint_source:
        print("[TAINT VIOLATION] XSS risk detected!")
        print(f"  Tainted input: {html.taint_source}")
        print(f"  HTML: {html}")

    return str(html)
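The corresponding remediation is to escape the tainted value before it reaches the HTML sink (a sketch using the standard-library `html.escape`):

```python
import html

# Remediation sketch: escaping neutralizes any markup in the tainted value
def render_greeting_fixed(name):
    return f"<h1>Hello, {html.escape(name)}!</h1>"
```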

Command Injection Detection

# Original vulnerable code
def process_file(filename):
    os.system(f"cat {filename}")

# Instrumented code
def process_file(filename):
    # Mark input as tainted
    filename = TaintedStr(filename, "USER_INPUT:filename")

    # Build command (taint propagates)
    command = TaintedStr("cat ") + filename

    # Check at sink (system command)
    if isinstance(command, TaintedStr) and command.taint_source:
        print("[TAINT VIOLATION] Command Injection risk!")
        print(f"  Tainted input: {command.taint_source}")
        print(f"  Command: {command}")

    os.system(str(command))
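For the command-injection case, the usual remediation is to avoid the shell entirely and pass an argv list (a sketch using `subprocess.run`; a `cat` binary on the PATH is assumed):

```python
import subprocess

# Remediation sketch: an argv list passes the filename as a single argument,
# so shell metacharacters inside it are never interpreted
def process_file_fixed(filename):
    result = subprocess.run(
        ["cat", filename], capture_output=True, text=True, check=True
    )
    return result.stdout
```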

Taint Policy Configuration

# taint_policy.py
TAINT_SOURCES = {
    "HTTP_PARAM": ["request.GET", "request.POST", "request.args"],
    "USER_INPUT": ["input()", "sys.stdin.read()"],
    "FILE_READ": ["open().read()", "Path.read_text()"],
    "ENV_VAR": ["os.getenv()", "os.environ"],
}

TAINT_SINKS = {
    "SQL_QUERY": ["db.execute()", "cursor.execute()"],
    "SYSTEM_CMD": ["os.system()", "subprocess.call()"],
    "HTML_OUTPUT": ["render_template()", "HttpResponse()"],
    "FILE_WRITE": ["open().write()", "Path.write_text()"],
    "EVAL": ["eval()", "exec()"],
}

TAINT_ENABLED = True
REPORT_FORMAT = "detailed"  # or "summary"
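The `REPORT_FORMAT` flag might be consumed like this (a sketch; the `(source, sink, data)` violation tuple shape is an illustrative assumption, not part of the policy file):

```python
# Sketch: rendering violations according to REPORT_FORMAT
# (source, sink, data) tuple shape is an illustrative assumption
def format_report(violations, report_format="detailed"):
    if report_format == "summary":
        return f"{len(violations)} violation(s) detected"
    lines = []
    for i, (source, sink, data) in enumerate(violations, 1):
        lines.append(f"Violation {i}: {source} -> {sink}")
        lines.append(f"  Data: {data}")
    return "\n".join(lines)
```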

Output Format

Taint Violation Report

## Taint Analysis Report

**File**: app.py
**Analysis Date**: 2024-02-17

### Violations Detected

#### Violation 1: SQL Injection Risk
- **Severity**: HIGH
- **Location**: app.py:45
- **Taint Source**: HTTP_PARAM:username
- **Taint Sink**: db.execute()
- **Data Flow**:
  1. User input from HTTP parameter 'username' (line 42)
  2. String concatenation in query building (line 44)
  3. Passed to db.execute() without sanitization (line 45)
- **Recommendation**: Use parameterized queries

#### Violation 2: XSS Risk
- **Severity**: MEDIUM
- **Location**: app.py:78
- **Taint Source**: HTTP_PARAM:comment
- **Taint Sink**: render_template()
- **Data Flow**:
  1. User input from HTTP parameter 'comment' (line 75)
  2. Embedded in HTML template (line 78)
- **Recommendation**: Use HTML escaping

### Summary
- Total violations: 2
- High severity: 1
- Medium severity: 1
- Low severity: 0

Best Practices

  1. Comprehensive source marking: Mark all untrusted input sources
  2. Complete propagation: Track taint through all operations
  3. Strict sink checking: Verify all dangerous operations
  4. Minimal false positives: Use precise taint rules
  5. Performance consideration: Optimize for production use
  6. Clear reporting: Provide actionable violation reports

Advanced Features

Sanitization Tracking

def sanitize_sql(value):
    """Remove taint after sanitization"""
    if isinstance(value, TaintedStr):
        # Quote-doubling shown for illustration; prefer parameterized queries
        sanitized = value.replace("'", "''")
        return str(sanitized)  # Return regular string (untainted)
    return value

# Usage
username = TaintedStr(user_input, "HTTP_PARAM")
safe_username = sanitize_sql(username)  # No longer tainted
query = f"SELECT * FROM users WHERE name='{safe_username}'"  # Safe

Multi-Level Taint

class TaintLevel:
    UNTAINTED = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3

class TaintedStr(str):
    # Use __new__ (not __init__): str is immutable and rejects extra __init__ args
    def __new__(cls, value, taint_level=TaintLevel.UNTAINTED):
        instance = super().__new__(cls, value)
        instance.taint_level = taint_level
        return instance

# Different sources have different taint levels
public_data = TaintedStr(data, TaintLevel.LOW)
user_input = TaintedStr(user_value, TaintLevel.HIGH)
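When two tainted values combine, a common policy (an assumption, not the only option) is to keep the higher of the two levels:

```python
# TaintLevel as defined in the multi-level example above
class TaintLevel:
    UNTAINTED = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3

# Propagation rule sketch: a combined value carries the maximum taint level
# of its inputs; a missing attribute means untainted
def combined_level(a, b):
    return max(getattr(a, "taint_level", TaintLevel.UNTAINTED),
               getattr(b, "taint_level", TaintLevel.UNTAINTED))
```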

Constraints

  • Preserve semantics: Taint tracking shouldn't change program behavior
  • Minimal overhead: Keep performance impact low
  • Complete coverage: Track all taint propagation paths
  • Accurate detection: Minimize false positives and negatives