Variant Analysis

Security Notice

AUTHORIZED USE ONLY: This skill is for DEFENSIVE security analysis and authorized research:

Authorized security assessments with written permission
Proactive vulnerability discovery in owned codebases
Post-incident variant hunting after a CVE is reported
Security research with proper disclosure
Educational purposes in controlled environments

NEVER use for:

Scanning systems without authorization
Developing exploits for unauthorized use
Circumventing security controls
Any illegal activities

Step 1: Seed Vulnerability Analysis

Start from a known vulnerability (CVE, bug report, or code pattern):

Extract the Vulnerability Pattern

Identify the bug class: What type of vulnerability is it? (SQL injection, XSS, buffer overflow, TOCTOU, etc.)
Identify the source: Where does untrusted data enter? (user input, network, file, environment)
Identify the sink: Where does the data cause harm? (SQL query, HTML output, memory write, system call)
Identify missing sanitization: What check/transform is absent between source and sink?
Abstract the pattern: Generalize beyond the specific instance

Example Seed Analysis

CVE-2024-XXXX: SQL Injection in user search
- Bug class: CWE-89 (SQL Injection)
- Source: HTTP request parameter `q`
- Sink: String concatenation into SQL query
- Missing: Parameterized query or input sanitization
- Pattern: request.param → string concat → db.query()
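To make the seed concrete, here is a minimal sketch of what the vulnerable handler and its fix might look like. The function names, the `users` table, and the `fakeDb` stub are hypothetical; the stub only records each call so the difference is visible without a real database, and the `?` placeholder syntax is an assumption that varies by driver.

```javascript
// Hypothetical stub standing in for a real database client.
// It records the SQL text and bound parameters instead of executing anything.
const fakeDb = {
  calls: [],
  query(sql, params = []) {
    this.calls.push({ sql, params });
    return [];
  },
};

// Seed shape: request.param -> string concat -> db.query()
function searchUsersVulnerable(db, req) {
  const sql = "SELECT * FROM users WHERE name LIKE '%" + req.query.q + "%'";
  return db.query(sql);
}

// Fixed shape: the input travels as a bound parameter, never as SQL text.
function searchUsersFixed(db, req) {
  return db.query("SELECT * FROM users WHERE name LIKE ?", [`%${req.query.q}%`]);
}

const seedRequest = { query: { q: "x' OR '1'='1" } };
searchUsersVulnerable(fakeDb, seedRequest);
searchUsersFixed(fakeDb, seedRequest);
console.log(fakeDb.calls);
```

In the first recorded call the payload is part of the SQL text; in the second it appears only in the parameter list.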

Step 2: Pattern Generalization

Transform the seed into a query pattern:

Abstraction Levels

Level	Description	Example
Exact	Same function, same file	`searchUsers(req.query.q)`
Local	Same pattern, different function	Any `db.query("..."+userInput)`
Structural	Same dataflow shape	Any source-to-sink without sanitization
Semantic	Same bug class, any syntax	Any SQL injection variant
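The four levels can be illustrated with code shapes that a query at each level would need to catch. This is a sketch only: every function, table name, and the recording stub below are hypothetical, chosen to show why a query tuned to one level misses the next.

```javascript
// Records the SQL text it receives; hypothetical stand-in for a DB client.
function makeRecorder() {
  const seen = [];
  return { seen, query: (sql) => { seen.push(sql); return []; } };
}

// Exact: the seed itself.
function searchUsers(db, q) {
  return db.query("SELECT * FROM users WHERE name = '" + q + "'");
}

// Local: the same concatenation pattern in a different function.
function searchProducts(db, q) {
  return db.query("SELECT * FROM products WHERE title = '" + q + "'");
}

// Structural: the same source-to-sink flow, but through a template literal
// and an intermediate variable, so a syntactic match on `+` misses it.
function searchOrders(db, q) {
  const where = `customer = '${q}'`;
  const sql = `SELECT * FROM orders WHERE ${where}`;
  return db.query(sql);
}

// Semantic: the same bug class with different syntax (an injected ORDER BY column).
function listInvoices(db, sortColumn) {
  return db.query(["SELECT * FROM invoices", "ORDER BY", sortColumn].join(" "));
}

const recorder = makeRecorder();
const payload = "x' OR '1'='1";
searchUsers(recorder, payload);
searchProducts(recorder, payload);
searchOrders(recorder, payload);
listInvoices(recorder, "id; DROP TABLE invoices");
console.log(recorder.seen);
```

All four recorded statements contain attacker-controlled text, yet only the first two share the literal `"..." + input` syntax of the seed.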

CodeQL Pattern Template

/**
 * @name Variant of CVE-XXXX: [description]
 * @description Finds code structurally similar to [seed vulnerability]
 * @kind path-problem
 * @problem.severity error
 * @security-severity 8.0
 * @precision high
 * @id js/variant-cve-xxxx
 * @tags security
 *       external/cwe/cwe-089
 */

import javascript
import DataFlow::PathGraph

class UntrustedSource extends DataFlow::Node {
  UntrustedSource() {
    // Define sources: HTTP parameters, request body, etc.
    // RemoteFlowSource covers framework-modeled user input, including Express requests.
    this instanceof RemoteFlowSource
  }
}

class VulnerableSink extends DataFlow::Node {
  VulnerableSink() {
    // Define sinks: string concatenation in SQL context
    exists(DataFlow::CallNode call |
      call.getCalleeName() = "query" and
      this = call.getArgument(0)
    )
  }
}

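// Taint tracking (not plain data flow) is needed to follow input through string concatenation.
// This is the class-based API; recent CodeQL releases favor the module-based
// DataFlow::ConfigSig / TaintTracking::Global form.
class VariantConfig extends TaintTracking::Configuration {
  VariantConfig() { this = "VariantConfig" }

  override predicate isSource(DataFlow::Node source) {
    source instanceof UntrustedSource
  }

  override predicate isSink(DataFlow::Node sink) {
    sink instanceof VulnerableSink
  }

  override predicate isSanitizer(DataFlow::Node node) {
    // Known sanitizers that prevent the vulnerability: the call node is its own result
    node.(DataFlow::CallNode).getCalleeName() = ["escape", "sanitize", "parameterize"]
  }
}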

from VariantConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink.getNode(), source, sink,
  "Potential variant of CVE-XXXX: untrusted data flows to SQL query without sanitization."

Semgrep Pattern Template

rules:
  - id: variant-cve-xxxx-sql-injection
    message: >
      Potential variant of CVE-XXXX: User input flows into SQL query
      via string concatenation without parameterization.
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      cwe:
        - "CWE-89: Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection')"
      confidence: HIGH
      impact: HIGH
      category: security
      technology:
        - express
        - node.js
      references:
        - https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-XXXX
    patterns:
      - pattern-either:
          - pattern: |
              $DB.query("..." + $USERINPUT + "...")
          - pattern: |
              $DB.query(`...${$USERINPUT}...`)
          - pattern: |
              $QUERY = "..." + $USERINPUT + "..."
              ...
              $DB.query($QUERY)
      # pattern-not takes a single pattern, not a list
      - pattern-not: |
          $DB.query($QUERY, [...])
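    # No `fix:` key here: $QUERY is unbound in the first two patterns, and a safe
    # rewrite must also swap the concatenated input for a placeholder in the SQL text.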

Step 3: Variant Discovery

Run the Analysis

# CodeQL variant scan
codeql database analyze codeql-db \
  --format=sarifv2.1.0 \
  --output=variant-results.sarif \
  ./variant-queries/

# Semgrep variant scan
semgrep scan \
  --config=./variant-rules/ \
  --sarif --output=variant-semgrep.sarif

# Cross-repo CodeQL scan: `database analyze` takes one database per run, so loop over them
for db in codeql-db-repo-1 codeql-db-repo-2; do
  codeql database analyze "$db" \
    --format=sarifv2.1.0 \
    --output="variants-$db.sarif" \
    ./variant-queries/
done
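Both tools write SARIF, so one small script can flatten their output into a triage list. The sketch below assumes the standard SARIF 2.1.0 layout (`runs[].results[].locations[].physicalLocation`); the function name and the inline sample are hypothetical, and in practice the input would come from something like `JSON.parse(fs.readFileSync("variant-results.sarif", "utf8"))`.

```javascript
// Flatten a parsed SARIF 2.1.0 log into one row per result.
function summarizeSarif(sarif) {
  const rows = [];
  for (const run of sarif.runs ?? []) {
    for (const result of run.results ?? []) {
      // Only the first reported location is used in this sketch.
      const loc = result.locations?.[0]?.physicalLocation;
      rows.push({
        rule: result.ruleId ?? "(no rule id)",
        file: loc?.artifactLocation?.uri ?? "(unknown)",
        line: loc?.region?.startLine ?? null,
        message: result.message?.text ?? "",
      });
    }
  }
  return rows;
}

// Inline sample so the sketch runs without a real scan.
const sampleSarif = {
  version: "2.1.0",
  runs: [
    {
      results: [
        {
          ruleId: "js/variant-cve-xxxx",
          message: { text: "Potential variant of CVE-XXXX" },
          locations: [
            {
              physicalLocation: {
                artifactLocation: { uri: "src/api/products.js" },
                region: { startLine: 78 },
              },
            },
          ],
        },
      ],
    },
  ],
};

console.log(summarizeSarif(sampleSarif));
```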

Manual Pattern Search

When automated tools miss variants, use manual search:

# Search for the syntactic pattern (-E so that \+ matches a literal plus sign)
grep -rnE "db\.query.*\+" --include="*.js" --include="*.ts" .

# Search for the function call pattern
grep -rn "\.query\s*(" --include="*.js" --include="*.ts" . | grep -v "parameterized\|escape\|sanitize"

# AST-based search with ast-grep ($A and $B are metavariables; "..." would match only that literal string)
sg -p 'db.query($A + $B)' --lang js

Step 4: Variant Classification

Triage Each Variant

For each discovered instance, classify:

Factor	Question	Impact on Priority
Reachability	Can an attacker reach this code path?	Critical if reachable
Exploitability	Can the vulnerability be exploited?	Critical if exploitable
Impact	What damage can exploitation cause?	Based on CIA triad
Confidence	How certain is this a true positive?	HIGH/MEDIUM/LOW
Similarity	How structurally close to seed?	Higher = higher confidence
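One way to turn these factors into a consistent ordering is a small scoring function. The sketch below is illustrative only: the weights, thresholds, field names, and the 1 to 3 impact scale are all assumptions, not part of any standard. Similarity is not scored separately because, per the table, it feeds into the confidence rating.

```javascript
// Hypothetical weights; tune them to the team's own risk appetite.
const CONFIDENCE_WEIGHT = { HIGH: 1.0, MEDIUM: 0.6, LOW: 0.3 };

function triagePriority(variant) {
  // Code an attacker cannot reach goes straight to the lowest bucket.
  if (!variant.reachable) return "LOW";
  // impact: 1 (minor) to 3 (severe), estimated against the CIA triad.
  // Unconfirmed exploitability halves the score rather than zeroing it.
  const score =
    variant.impact *
    CONFIDENCE_WEIGHT[variant.confidence] *
    (variant.exploitable ? 1 : 0.5);
  if (score >= 2.5) return "CRITICAL";
  if (score >= 1.5) return "HIGH";
  if (score >= 0.8) return "MEDIUM";
  return "LOW";
}

console.log(
  triagePriority({ reachable: true, exploitable: true, impact: 3, confidence: "HIGH" })
);
```

A score like this only orders the queue; each finding still needs manual confirmation of reachability and exploitability.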

Variant Family Tracking

## Variant Family: CWE-89 SQL Injection

### Seed: CVE-XXXX (src/api/users.js:42)

- Pattern: request.param -> string concat -> db.query()

### Variants Found:

1. **V-001** src/api/products.js:78 (HIGH confidence)
   - Same pattern, different endpoint
   - Exploitable: YES
   - Fix: Use parameterized query

2. **V-002** src/api/orders.js:123 (MEDIUM confidence)
   - Similar pattern, additional transform
   - Exploitable: NEEDS INVESTIGATION
   - Fix: Use parameterized query

3. **V-003** src/legacy/search.js:45 (LOW confidence)
   - Partial match, may be sanitized upstream
   - Exploitable: UNLIKELY
   - Fix: Verify sanitization chain

Step 5: Remediation and Report

Variant Analysis Report

## Variant Analysis Report

**Seed**: [CVE/bug ID and description]
**Date**: YYYY-MM-DD
**Scope**: [repositories/directories analyzed]
**Tools**: CodeQL, Semgrep, manual review

### Executive Summary

- Variants found: X
- Critical: X | High: X | Medium: X | Low: X
- False positives: X
- Estimated remediation effort: X hours

### Variant Details

[For each variant: location, classification, remediation]

### Pattern Evolution

[How the pattern varies across the codebase]

### Recommendations

1. Fix all CRITICAL/HIGH variants immediately
2. Add regression tests for each variant
3. Add CI/CD checks to prevent pattern recurrence
4. Consider architectural changes to eliminate the bug class
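For recommendation 2, a regression test can assert the property the fix is meant to guarantee rather than a specific output. The following is a minimal sketch without a test framework; `searchProductsFixed`, the `products` table, and the spy database are hypothetical stand-ins for the patched code behind a variant such as V-001.

```javascript
// Hypothetical patched handler for a products search endpoint.
function searchProductsFixed(db, q) {
  return db.query("SELECT * FROM products WHERE title LIKE ?", [`%${q}%`]);
}

// Regression check: hostile input must arrive as a bound parameter
// and must never appear in the SQL text itself.
function assertParameterized(handler) {
  const calls = [];
  const spyDb = {
    query(sql, params = []) {
      calls.push({ sql, params });
      return [];
    },
  };
  const hostile = "'; DROP TABLE products; --";
  handler(spyDb, hostile);
  if (calls.length !== 1) throw new Error("expected exactly one query");
  if (calls[0].sql.includes(hostile)) throw new Error("input leaked into SQL text");
  if (!calls[0].params.some((p) => String(p).includes(hostile))) {
    throw new Error("input was not passed as a bound parameter");
  }
}

assertParameterized(searchProductsFixed);
console.log("regression check passed");
```

Running the same check against the unpatched handler should fail, which confirms the test actually detects the variant.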

Common Vulnerability Seed Patterns

Injection Variants

Seed Pattern	Variant Discovery Query
SQL injection via concatenation	`source -> string.concat -> db.query`
Command injection via interpolation	`source -> template.literal -> exec`
XSS via innerHTML	`source -> assignment -> innerHTML`
Path traversal via user path	`source -> path.join -> fs.read`
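When hunting variants of these flows, it helps to know what a valid barrier looks like so the query can exclude it. As one example, the path traversal row is broken by a containment check such as the sketch below; `resolveInside` and the `/srv/uploads` directory are hypothetical names.

```javascript
const path = require("node:path");

// Resolve a user-supplied path under baseDir and reject anything that escapes it.
function resolveInside(baseDir, userPath) {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, userPath);
  // path.relative returns ".." or a "../" prefix when target lies outside base.
  const rel = path.relative(base, target);
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error("path escapes base directory");
  }
  return target;
}

console.log(resolveInside("/srv/uploads", "reports/q1.txt"));
try {
  resolveInside("/srv/uploads", "../../etc/passwd");
} catch (err) {
  console.log("rejected:", err.message);
}
```

Call sites that reach `fs` reads without passing through a check of this kind are the variants the discovery query should report.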

Authentication Variants

Seed Pattern	Variant Discovery Query
Missing auth check	`route.handler without auth.middleware`
Weak comparison	`password == input (not timing-safe)`
Token reuse	`token.generate without uniqueness`
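For the weak comparison row, the safe counterpart in Node.js is `crypto.timingSafeEqual`. The sketch below shows one way to use it for secrets such as API tokens; `safeEqual` is a hypothetical helper name. Stored passwords should additionally go through a dedicated password hashing function, which this sketch does not cover.

```javascript
const { createHash, timingSafeEqual } = require("node:crypto");

// Compare two secrets without leaking the mismatch position through timing.
// Hashing first gives both buffers the same length, which timingSafeEqual requires.
function safeEqual(a, b) {
  const digestA = createHash("sha256").update(String(a)).digest();
  const digestB = createHash("sha256").update(String(b)).digest();
  return timingSafeEqual(digestA, digestB);
}

console.log(safeEqual("token-123", "token-123")); // true
console.log(safeEqual("token-123", "token-124")); // false
```

A variant query for this seed would flag `==` or `===` comparisons where one operand derives from request input and the other from a stored secret.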

Related Skills

static-analysis - CodeQL and Semgrep with SARIF output
semgrep-rule-creator - Custom vulnerability detection rules
differential-review - Security-focused diff analysis
insecure-defaults - Hardcoded credentials and fail-open detection
security-architect - STRIDE threat modeling

Agent Integration

security-architect (primary): Threat modeling and vulnerability assessment
code-reviewer (secondary): Pattern-aware code review
penetration-tester (secondary): Exploit verification for variants

Iron Laws

ALWAYS start from a confirmed seed vulnerability before writing any pattern queries
NEVER broaden a query without first verifying it matches the known seed vulnerability
ALWAYS test pattern queries against at least one known-vulnerable instance before scanning broadly
NEVER report a variant finding without manual triage confirming reachability and exploitability
ALWAYS check all related repositories when a variant is confirmed in one codebase
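The second and third laws can be enforced mechanically by gating every broad scan on a seed check. The sketch below assumes findings have already been reduced to `{ file, line }` objects (for example from a SARIF report); `matchesSeed` and the line tolerance are hypothetical, and the seed location reuses the `src/api/users.js:42` example from the tracking template.

```javascript
// Returns true when the findings include the known seed location.
// `tolerance` allows for small line drift between the report and the checkout.
function matchesSeed(findings, seedLocation, tolerance = 2) {
  return findings.some(
    (f) =>
      f.file === seedLocation.file &&
      Math.abs(f.line - seedLocation.line) <= tolerance
  );
}

const seedLocation = { file: "src/api/users.js", line: 42 };
const scanFindings = [
  { file: "src/api/users.js", line: 42 },
  { file: "src/api/products.js", line: 78 },
];

if (!matchesSeed(scanFindings, seedLocation)) {
  throw new Error("query does not match the seed; fix it before broadening");
}
console.log("seed matched; safe to broaden the scan");
```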

Anti-Patterns

Anti-Pattern	Why It Fails	Correct Approach
Exact-match queries only	Misses refactored and syntactically different variants	Abstract the pattern and test all four abstraction levels
No seed verification step	Query may not match the known vulnerability	Test query against seed instance first
Overly broad patterns	High false positive rate wastes triage time	Narrow with `pattern-not` for known-safe patterns
Single-repo scan	Variant may exist in sibling repositories	Scan all related repos with the same framework
Stopping after first variant found	Leaves the bug class partially patched	Perform exhaustive search across the full codebase

Memory Protocol (MANDATORY)

Before starting: Read .claude/context/memory/learnings.md

After completing:

New pattern -> .claude/context/memory/learnings.md
Issue found -> .claude/context/memory/issues.md
Decision made -> .claude/context/memory/decisions.md

ASSUME INTERRUPTION: If it's not in memory, it didn't happen.
