skills/5dlabs/cto/semgrep

semgrep

SKILL.md

Semgrep Static Analysis

When to Use Semgrep

Ideal scenarios:

  • Quick security scans (minutes, not hours)
  • Pattern-based bug detection
  • Enforcing coding standards and best practices
  • Finding known vulnerability patterns
  • Single-file analysis without complex data flow
  • First-pass analysis before deeper tools

Consider CodeQL instead when:

  • Need interprocedural taint tracking across files
  • Complex data flow analysis required
  • Analyzing custom proprietary frameworks

Installation

# pip
pip install semgrep

# Homebrew
brew install semgrep

# Docker
docker run --rm -v "${PWD}:/src" returntocorp/semgrep semgrep --config auto /src

Core Workflow

1. Quick Scan

semgrep --config auto .                    # Auto-detect rules
semgrep --config auto --metrics=off .      # Disable telemetry

2. Use Rulesets

semgrep --config p/<RULESET> .             # Single ruleset
semgrep --config p/security-audit --config p/trailofbits .  # Multiple
Ruleset Description
p/default General security and code quality
p/security-audit Comprehensive security rules
p/owasp-top-ten OWASP Top 10 vulnerabilities
p/cwe-top-25 CWE Top 25 vulnerabilities
p/trailofbits Trail of Bits security rules
p/python Python-specific
p/javascript JavaScript-specific
p/rust Rust-specific

3. Output Formats

semgrep --config p/security-audit --sarif -o results.sarif .  # SARIF
semgrep --config p/security-audit --json -o results.json .    # JSON
semgrep --config p/security-audit --dataflow-traces .         # Show data flow

Writing Custom Rules

Basic Structure

rules:
  - id: hardcoded-password
    languages: [python]
    message: "Hardcoded password detected: $PASSWORD"
    severity: ERROR
    pattern: password = "$PASSWORD"

Pattern Syntax

Syntax Description Example
... Match anything func(...)
$VAR Capture metavariable $FUNC($INPUT)
<... ...> Deep expression match <... user_input ...>

Pattern Operators

Operator Description
pattern Match exact pattern
patterns All must match (AND)
pattern-either Any matches (OR)
pattern-not Exclude matches
pattern-inside Match only inside context
pattern-not-inside Match only outside context
metavariable-regex Regex on captured value

Combining Patterns

rules:
  - id: sql-injection
    languages: [python]
    message: "Potential SQL injection"
    severity: ERROR
    patterns:
      - pattern-either:
          - pattern: cursor.execute($QUERY)
          - pattern: db.execute($QUERY)
      - pattern-not:
          - pattern: cursor.execute("...", (...))
      - metavariable-regex:
          metavariable: $QUERY
          regex: .*\+.*|.*\.format\(.*|.*%.*

Taint Mode (Data Flow)

Taint mode tracks data through assignments and transformations:

rules:
  - id: command-injection
    languages: [python]
    message: "User input flows to command execution"
    severity: ERROR
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form[...]
      - pattern: request.json
    pattern-sinks:
      - pattern: os.system($SINK)
      - pattern: subprocess.call($SINK, shell=True)
      - pattern: subprocess.run($SINK, shell=True, ...)
    pattern-sanitizers:
      - pattern: shlex.quote(...)
      - pattern: int(...)

CI/CD Integration (GitHub Actions)

name: Semgrep
on:
  push:
    branches: [main]
  pull_request:
  schedule:
    - cron: '0 0 1 * *'  # Monthly

jobs:
  semgrep:
    runs-on: ubuntu-latest
    container:
      image: returntocorp/semgrep
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Run Semgrep
        run: |
          if [ "${{ github.event_name }}" = "pull_request" ]; then
            semgrep ci --baseline-commit ${{ github.event.pull_request.base.sha }}
          else
            semgrep ci
          fi
        env:
          SEMGREP_RULES: >-
            p/security-audit
            p/owasp-top-ten
            p/trailofbits

Configuration

.semgrepignore

tests/
fixtures/
**/testdata/
generated/
vendor/
node_modules/

Suppress False Positives

password = get_from_vault()  # nosemgrep: hardcoded-password
dangerous_but_safe()         # nosemgrep

Rationalizations to Reject

Shortcut Why It's Wrong
"Semgrep found nothing, code is clean" Semgrep is pattern-based; can't track complex data flow
"I wrote a rule, so we're covered" Rules need testing; false negatives are silent
"Too many findings = noisy tool" High finding count often means real problems

Resources

Attribution

Based on trailofbits/skills semgrep skill.

Weekly Installs
3
Repository
5dlabs/cto
First Seen
Jan 24, 2026
Installed on
claude-code2
windsurf1
trae1
opencode1
codex1
antigravity1