Semgrep Static Analysis
When to Use Semgrep
Ideal scenarios:
- Quick security scans (minutes, not hours)
- Pattern-based vulnerability detection
- Enforcing coding standards and best practices
- Finding known vulnerability patterns (OWASP Top 10, CWE Top 25)
- Intra-file taint analysis and data flow tracking
- Custom rule development for specific code patterns
- First-pass security analysis before deeper tools
- CI/CD security gates for fast feedback
- Multi-language security scanning
Complements other tools:
- Use before manual code review to catch common patterns
- Combine with SARIF Issue Reporter for detailed findings
- Use alongside CodeQL for comprehensive coverage
- Pair with dependency scanners (OSV-Scanner, Depscan)
Consider CodeQL instead when:
- Need interprocedural taint tracking across files
- Complex data flow analysis across modules required
- Analyzing custom proprietary frameworks with deep integration
When NOT to Use
Do NOT use this skill for:
- Complex interprocedural data flow analysis (use CodeQL instead)
- Binary analysis or compiled code without source
- Custom deep semantic analysis requiring AST/CFG traversal
- Tracking taint across many function boundaries and files
- Secrets detection (use Gitleaks)
- Dependency vulnerability scanning (use OSV-Scanner or Depscan)
- IaC security analysis (use KICS)
- API endpoint discovery (use Noir)
Installation
# pip
python3 -m pip install semgrep
# pipx (recommended)
pipx install semgrep
# Homebrew
brew install semgrep
# Docker
docker pull semgrep/semgrep:latest
docker run --rm -v "${PWD}:/src" semgrep/semgrep semgrep --config auto /src
# Update
pip install --upgrade semgrep
# Verify
semgrep --version
Core Workflow
1. Quick Scan
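semgrep --config auto . # Auto-detect rules (needs metrics enabled)
semgrep --config p/default --metrics=off . # Disable telemetry for proprietary code; use an explicit ruleset, since --config auto does not run with metrics off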
2. Use Rulesets
semgrep --config p/<RULESET> . # Single ruleset
semgrep --config p/security-audit --config p/trailofbits . # Multiple
| Ruleset | Description |
|---|---|
| `p/default` | General security and code quality |
| `p/security-audit` | Comprehensive security rules |
| `p/owasp-top-ten` | OWASP Top 10 vulnerabilities |
| `p/cwe-top-25` | CWE Top 25 vulnerabilities |
| `p/r2c-security-audit` | r2c security audit rules |
| `p/trailofbits` | Trail of Bits security rules |
| `p/python` | Python-specific |
| `p/javascript` | JavaScript-specific |
| `p/golang` | Go-specific |
3. Output Formats
# SARIF output (for CI/CD)
semgrep --config p/security-audit --sarif -o results.sarif .
# JSON output
semgrep --config p/security-audit --json -o results.json .
# Text output with dataflow traces
semgrep --config p/security-audit --dataflow-traces .
# JUnit XML
semgrep --config p/security-audit --junit-xml -o results.xml .
# GitLab SAST format
semgrep --config p/security-audit --gitlab-sast -o gl-sast-report.json .
# Vim quickfix
semgrep --config p/security-audit --vim .
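The JSON output is convenient for quick post-processing. The sketch below counts findings per severity; it assumes the report has a top-level `results` list whose entries carry `extra.severity`, and the `SAMPLE` report and `count_by_severity` helper are hypothetical, hand-written for illustration rather than taken from a real scan.

```python
import json
from collections import Counter
from typing import Dict


def count_by_severity(report: Dict) -> Counter:
    """Count findings per severity in a parsed `semgrep --json` report."""
    return Counter(
        finding.get("extra", {}).get("severity", "UNKNOWN")
        for finding in report.get("results", [])
    )


# Hypothetical sample in the assumed shape of `semgrep --json` output.
SAMPLE = json.loads("""
{"results": [
  {"check_id": "hardcoded-password", "path": "app.py",
   "start": {"line": 3}, "extra": {"severity": "ERROR", "message": "..."}},
  {"check_id": "weak-hash", "path": "util.py",
   "start": {"line": 9}, "extra": {"severity": "WARNING", "message": "..."}}
], "errors": []}
""")

if __name__ == "__main__":
    for severity, count in sorted(count_by_severity(SAMPLE).items()):
        print(f"{severity}: {count}")
```

In practice the dictionary would come from `json.load(open("results.json"))` after a scan with `--json -o results.json`.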
4. Scan Specific Paths
# Single file
semgrep --config p/python app.py
# Specific directory
semgrep --config p/javascript src/
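# Restrict the scan to test paths (--include narrows the scan to matching files; test directories ignored by default may also need a custom .semgrepignore)
semgrep --config auto --include='**/test/**' .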
# Exclude paths
semgrep --config auto --exclude='vendor' --exclude='node_modules' .
# Multiple languages
semgrep --config p/python --config p/javascript .
5. Advanced Features
# Enable Pro Engine features (requires license)
semgrep --config p/security-audit --pro .
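# Pro Engine cross-function analysis restricted to single files
semgrep --config p/security-audit --pro-intrafile .
# Disable telemetry (pair with an explicit ruleset; --config auto needs metrics enabled)
semgrep --config p/security-audit --metrics=off .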
# Verbose output
semgrep --config p/security-audit --verbose .
# Quiet mode (only show findings)
semgrep --config p/security-audit --quiet .
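When the commands above are driven from a script, it can help to assemble the argument list in one place. This is a minimal sketch using only flags shown in this section; `build_scan_argv` is a hypothetical helper name, not part of Semgrep.

```python
from typing import List, Optional


def build_scan_argv(
    rulesets: List[str],
    target: str = ".",
    sarif_output: Optional[str] = None,
    metrics_off: bool = True,
) -> List[str]:
    """Assemble a semgrep command line from rulesets and output options."""
    if not rulesets:
        raise ValueError("at least one ruleset or rules path is required")
    argv = ["semgrep"]
    for ruleset in rulesets:
        argv += ["--config", ruleset]  # one --config per ruleset
    if metrics_off:
        argv.append("--metrics=off")
    if sarif_output:
        argv += ["--sarif", "-o", sarif_output]
    argv.append(target)
    return argv


if __name__ == "__main__":
    argv = build_scan_argv(
        ["p/security-audit", "p/trailofbits"], sarif_output="results.sarif"
    )
    print(" ".join(argv))
```

The resulting list can be passed to `subprocess.run(argv)` without shell quoting concerns.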
Writing Custom Rules
Basic Structure
rules:
  - id: hardcoded-password
    languages: [python]
    message: "Hardcoded password detected: $PASSWORD"
    severity: ERROR
    pattern: password = "$PASSWORD"
Pattern Syntax
| Syntax | Description | Example |
|---|---|---|
| `...` | Match anything | `func(...)` |
| `$VAR` | Capture metavariable | `$FUNC($INPUT)` |
| `<... ...>` | Deep expression match | `<... user_input ...>` |
Pattern Operators
| Operator | Description |
|---|---|
| `pattern` | Match exact pattern |
| `patterns` | All must match (AND) |
| `pattern-either` | Any matches (OR) |
| `pattern-not` | Exclude matches |
| `pattern-inside` | Match only inside context |
| `pattern-not-inside` | Match only outside context |
| `pattern-regex` | Regex matching |
| `metavariable-regex` | Regex on captured value |
| `metavariable-comparison` | Compare values |
Combining Patterns
rules:
  - id: sql-injection
    languages: [python]
    message: "Potential SQL injection"
    severity: ERROR
    patterns:
      - pattern-either:
          - pattern: cursor.execute($QUERY)
          - pattern: db.execute($QUERY)
      - pattern-not: cursor.execute("...", (...))
      - metavariable-regex:
          metavariable: $QUERY
          regex: .*\+.*|.*\.format\(.*|.*%.*
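To get a feel for what the `metavariable-regex` in the rule above accepts, the regex can be tried against candidate `$QUERY` expressions. This sketch uses Python's `re.match` as a stand-in for Semgrep's regex engine, which is an assumption; the sample expressions and the `looks_dynamic` helper are hypothetical.

```python
import re

# Same regex as the metavariable-regex in the sql-injection rule.
QUERY_REGEX = re.compile(r".*\+.*|.*\.format\(.*|.*%.*")


def looks_dynamic(query_expr: str) -> bool:
    """Return True if the source text of the query argument matches the regex."""
    return QUERY_REGEX.match(query_expr) is not None


SAMPLES = [
    '"SELECT * FROM users WHERE id = " + user_id',          # concatenation
    '"SELECT * FROM users WHERE id = {}".format(user_id)',  # str.format
    '"SELECT * FROM users"',                                # constant query
    '"SELECT * FROM users WHERE name LIKE \'a%\'"',         # constant, but has %
]

if __name__ == "__main__":
    for expr in SAMPLES:
        print(looks_dynamic(expr), expr)
```

The last sample is a constant string, yet it matches because of the literal `%`. That is the kind of regex over-matching a rule author would want to cover with an `ok:` test case.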
Taint Mode (Data Flow)
Simple pattern matching finds obvious cases:
# Pattern `os.system($CMD)` catches this:
os.system(user_input) # Found
But misses indirect flows:
# Same pattern misses this:
cmd = user_input
processed = cmd.strip()
os.system(processed) # Missed - no direct match
Taint mode tracks data through assignments and transformations:
- Source: Where untrusted data enters (`user_input`)
- Propagators: How it flows (`cmd = ...`, `processed = ...`)
- Sanitizers: What makes it safe (`shlex.quote()`)
- Sink: Where it becomes dangerous (`os.system()`)
rules:
  - id: command-injection
    languages: [python]
    message: "User input flows to command execution"
    severity: ERROR
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form[...]
      - pattern: request.json
    pattern-sinks:
      - pattern: os.system($SINK)
      - pattern: subprocess.call($SINK, shell=True)
      - pattern: subprocess.run($SINK, shell=True, ...)
    pattern-sanitizers:
      - pattern: shlex.quote(...)
      - pattern: int(...)
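The source, propagator, sanitizer, and sink roles can be illustrated with a toy tracker over straight-line code. This is a deliberately simplified sketch for intuition only, not how Semgrep implements taint analysis; the `Assign` and `SinkCall` types and the `find_tainted_sinks` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass
class Assign:
    target: str              # variable being assigned
    uses: List[str]          # variables read on the right-hand side
    sanitized: bool = False  # True if the value passes through a sanitizer
    line: int = 0


@dataclass
class SinkCall:
    uses: List[str]          # variables passed to the dangerous call
    line: int = 0


def find_tainted_sinks(
    sources: Set[str], program: List[Union[Assign, SinkCall]]
) -> List[int]:
    """Return line numbers of sink calls that receive tainted data."""
    tainted = set(sources)
    hits = []
    for stmt in program:
        if isinstance(stmt, Assign):
            flows = any(var in tainted for var in stmt.uses)
            if flows and not stmt.sanitized:
                tainted.add(stmt.target)      # taint propagates
            else:
                tainted.discard(stmt.target)  # clean or sanitized value
        elif any(var in tainted for var in stmt.uses):
            hits.append(stmt.line)
    return hits


PROGRAM = [
    Assign("cmd", ["user_input"], line=1),            # cmd = user_input
    Assign("processed", ["cmd"], line=2),             # processed = cmd.strip()
    SinkCall(["processed"], line=3),                  # os.system(processed)
    Assign("safe", ["cmd"], sanitized=True, line=4),  # safe = shlex.quote(cmd)
    SinkCall(["safe"], line=5),                       # os.system(safe)
]

if __name__ == "__main__":
    print(find_tainted_sinks({"user_input"}, PROGRAM))  # [3]
```

Only the sink on line 3 is reported: the value reaching line 5 passed through the sanitizer, so its taint was dropped.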
Full Rule with Metadata
rules:
  - id: flask-sql-injection
    languages: [python]
    message: "SQL injection: user input flows to query without parameterization"
    severity: ERROR
    metadata:
      cwe: "CWE-89: SQL Injection"
      owasp: "A03:2021 - Injection"
      confidence: HIGH
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form[...]
      - pattern: request.json
    pattern-sinks:
      - pattern: cursor.execute($QUERY)
      - pattern: db.execute($QUERY)
    pattern-sanitizers:
      - pattern: int(...)
    fix: cursor.execute($QUERY, (params,))
Testing Rules
Test File Format
# test_rule.py
def test_vulnerable():
    user_input = request.args.get("id")
    # ruleid: flask-sql-injection
    cursor.execute("SELECT * FROM users WHERE id = " + user_input)

def test_safe():
    user_input = request.args.get("id")
    # ok: flask-sql-injection
    cursor.execute("SELECT * FROM users WHERE id = ?", (user_input,))
semgrep --test rules/
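The annotation convention in the test file can be made concrete with a small parser: each `# ruleid:` or `# ok:` comment refers to the line that follows it. The sketch below only illustrates that convention and is not Semgrep's own test runner; `parse_annotations` is a hypothetical helper.

```python
import re
from typing import List, Tuple

ANNOTATION = re.compile(r"#\s*(ruleid|ok):\s*([\w.-]+)")


def parse_annotations(source: str) -> List[Tuple[str, str, int]]:
    """Return (kind, rule_id, target_line) for each annotation comment.

    target_line is the line right after the comment, where a finding is
    expected (ruleid) or must not appear (ok).
    """
    found = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = ANNOTATION.search(line)
        if match:
            found.append((match.group(1), match.group(2), number + 1))
    return found


SAMPLE = "\n".join([
    'def test_vulnerable():',
    '    user_input = request.args.get("id")',
    '    # ruleid: flask-sql-injection',
    '    cursor.execute("SELECT * FROM users WHERE id = " + user_input)',
    '',
    'def test_safe():',
    '    user_input = request.args.get("id")',
    '    # ok: flask-sql-injection',
    '    cursor.execute("SELECT * FROM users WHERE id = ?", (user_input,))',
])

if __name__ == "__main__":
    for kind, rule_id, line in parse_annotations(SAMPLE):
        print(f"{kind:6} {rule_id} -> line {line}")
```

A rule that lacks at least one `ruleid` and one `ok` case has no guard against silent false negatives or over-matching.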
CI/CD Integration (GitHub Actions)
name: Semgrep
on:
  push:
    branches: [main]
  pull_request:
  schedule:
    - cron: '0 0 1 * *' # Monthly
jobs:
  semgrep:
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Required for diff-aware scanning
      - name: Run Semgrep
        run: |
          if [ "${{ github.event_name }}" = "pull_request" ]; then
            semgrep ci --baseline-commit ${{ github.event.pull_request.base.sha }}
          else
            semgrep ci
          fi
        env:
          SEMGREP_RULES: >-
            p/security-audit
            p/owasp-top-ten
            p/trailofbits
Configuration
.semgrepignore
tests/fixtures/
**/testdata/
generated/
vendor/
node_modules/
Suppress False Positives
password = get_from_vault() # nosemgrep: hardcoded-password
dangerous_but_safe() # nosemgrep
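Suppressions are easy to add and easy to forget, so it can be worth listing them periodically. The following sketch finds `nosemgrep` comments and separates rule-specific suppressions from blanket ones; the regex is a simplification and `find_suppressions` is a hypothetical helper, not a Semgrep feature.

```python
import re
from typing import List, Optional, Tuple

# Matches "nosemgrep" with an optional ": rule-id" suffix.
NOSEMGREP = re.compile(r"nosemgrep(?::\s*(\S+))?")


def find_suppressions(source: str) -> List[Tuple[int, Optional[str]]]:
    """Return (line_number, rule_id) pairs; rule_id is None for blanket use."""
    found = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = NOSEMGREP.search(line)
        if match:
            found.append((number, match.group(1)))
    return found


SAMPLE = "\n".join([
    "password = get_from_vault()  # nosemgrep: hardcoded-password",
    "dangerous_but_safe()  # nosemgrep",
    "print('unaffected line')",
])

if __name__ == "__main__":
    for line_no, rule_id in find_suppressions(SAMPLE):
        label = rule_id or "ALL RULES (blanket suppression)"
        print(f"line {line_no}: {label}")
```

Blanket suppressions silence every rule on that line, so they deserve closer review than suppressions that name a rule.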
Performance
semgrep --config rules/ --time . # Check rule performance
ulimit -n 4096 # Increase file descriptors for large codebases
Path Filtering in Rules
rules:
  - id: my-rule
    paths:
      include: [src/]
      exclude: [src/generated/]
Common Use Cases
1. Comprehensive Security Audit
# Multi-ruleset scan with SARIF output
semgrep scan \
--config p/security-audit \
--config p/owasp-top-ten \
--config p/cwe-top-25 \
--sarif -o security-audit.sarif \
.
2. Language-Specific Deep Scan
# Python with taint mode
semgrep scan \
--config p/python \
--config p/flask \
--config p/django \
--dataflow-traces \
--sarif -o python-security.sarif \
./backend
# JavaScript/TypeScript
semgrep scan \
--config p/javascript \
--config p/typescript \
--config p/react \
--sarif -o js-security.sarif \
./frontend
3. Custom Rules with Existing Rulesets
# Combine custom and community rules
semgrep scan \
--config ./custom-rules \
--config p/security-audit \
--sarif -o combined-scan.sarif \
.
4. CI/CD Diff Scanning
# Scan only changed files (PR context); --diff-filter=d skips deleted files that no longer exist on disk
git diff --name-only --diff-filter=d origin/main...HEAD | \
xargs semgrep scan --config p/security-audit --sarif -o diff-scan.sarif
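A pipeline like the one above needs care when the change list contains deleted paths or is empty. As a rough sketch, the same step can be scripted so that missing files are dropped and no scan is launched when nothing remains; `build_diff_scan_argv` and its `exists` parameter are hypothetical names introduced for illustration.

```python
import os
from typing import Callable, Iterable, List


def build_diff_scan_argv(
    changed_paths: Iterable[str],
    config: str = "p/security-audit",
    exists: Callable[[str], bool] = os.path.isfile,
) -> List[str]:
    """Build a semgrep scan command for changed files that still exist.

    Returns an empty list when there is nothing to scan, so the caller can
    skip the run instead of scanning the whole tree by accident.
    """
    targets = [path for path in changed_paths if path and exists(path)]
    if not targets:
        return []
    return ["semgrep", "scan", "--config", config, *targets]


if __name__ == "__main__":
    # Hypothetical change list; in CI this would come from `git diff --name-only`.
    changed = ["app.py", "deleted_module.py"]
    argv = build_diff_scan_argv(changed, exists=lambda p: p == "app.py")
    print(argv)
```

The `exists` callback defaults to `os.path.isfile` and is injectable only so the logic can be exercised without touching the filesystem.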
Understanding Output
SARIF Structure
Semgrep SARIF v2.1.0 includes:
- Rules: Each Semgrep rule with metadata
- Results: Specific code locations matching patterns
- Properties:
  - Severity: ERROR, WARNING, INFO
  - CWE and OWASP mappings
  - Confidence levels
  - Fix suggestions (if available)
  - Dataflow traces (if enabled)
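The structure listed above maps onto the SARIF 2.1.0 layout of `runs[].results[]`, each with a `ruleId`, a `level`, and a list of `locations`. The sketch below walks that layout; the `SAMPLE` document is hand-written for illustration, and `iter_sarif_findings` is a hypothetical helper.

```python
from typing import Dict, Iterator, Tuple


def iter_sarif_findings(sarif: Dict) -> Iterator[Tuple[str, str, str, int]]:
    """Yield (rule_id, level, file_uri, start_line) for each SARIF result."""
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            locations = result.get("locations") or []
            if not locations:
                continue  # results without a physical location are skipped
            physical = locations[0]["physicalLocation"]
            yield (
                result.get("ruleId", ""),
                result.get("level", "warning"),  # SARIF default level
                physical["artifactLocation"]["uri"],
                physical["region"]["startLine"],
            )


# Hypothetical minimal SARIF 2.1.0 document.
SAMPLE = {
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "Semgrep", "rules": []}},
        "results": [{
            "ruleId": "sql-injection",
            "level": "error",
            "message": {"text": "Potential SQL injection"},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": "app.py"},
                    "region": {"startLine": 12},
                }
            }],
        }],
    }],
}

if __name__ == "__main__":
    for rule_id, level, uri, line in iter_sarif_findings(SAMPLE):
        print(f"{level}: {rule_id} at {uri}:{line}")
```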
Result Categories
| Severity | Meaning |
|---|---|
| ERROR | High-confidence security vulnerability |
| WARNING | Potential security issue requiring review |
| INFO | Code smell or best practice violation |
Autofix
# Show available fixes
semgrep scan --config p/security-audit --autofix --dryrun .
# Apply fixes automatically
semgrep scan --config p/security-audit --autofix .
# Review fixes before applying
semgrep scan --config p/security-audit --autofix --dryrun . | less
Third-Party Rules
# Trail of Bits rules
git clone https://github.com/trailofbits/semgrep-rules.git
semgrep scan -f semgrep-rules/rules --sarif -o results.sarif .
# Semgrep Registry
semgrep scan --config "r/trailofbits" .
# Custom remote rules
semgrep scan --config https://example.com/custom-rules.yaml .
Advanced Rule Development
Using Taint Propagators
rules:
  - id: context-aware-xss
    languages: [javascript]
    message: "XSS: User input flows to innerHTML"
    severity: ERROR
    mode: taint
    pattern-sources:
      - pattern: req.query.$PARAM
    pattern-propagators:
      # `from` and `to` must each name a metavariable bound by `pattern`
      - pattern: $TO = $FROM.toString()
        from: $FROM
        to: $TO
      - pattern: "$TO = `${$FROM}`"
        from: $FROM
        to: $TO
    pattern-sinks:
      - pattern: $ELEMENT.innerHTML = $DATA
    pattern-sanitizers:
      - pattern: DOMPurify.sanitize($X)
Focus Metavariables
rules:
  - id: sql-injection-advanced
    languages: [python]
    message: "SQL injection via string formatting"
    severity: ERROR
    patterns:
      - pattern: $CURSOR.execute($QUERY)
      - focus-metavariable: $QUERY
      - metavariable-regex:
          metavariable: $QUERY
          regex: .*(\+|format|%).*
Performance Optimization
# Limit to specific file types
semgrep scan --include='*.py' --include='*.js' .
# Increase timeout for large files
semgrep scan --timeout 60 .
# Use baseline for faster incremental scans
semgrep scan --baseline-commit HEAD~1 .
# Parallel processing (default uses all CPUs)
semgrep scan --jobs 4 .
# Disable expensive rules
semgrep scan --config p/security-audit --exclude-rule 'expensive-rule-id' .
Supported Languages
Semgrep supports 30+ languages:
- Web: JavaScript, TypeScript, JSX, TSX, HTML
- Backend: Python, Go, Java, Kotlin, Scala, C#
- Systems: C, C++, Rust
- Mobile: Swift, Kotlin, Java, Objective-C
- Scripting: Ruby, PHP, Bash, Lua, Perl
- Infrastructure: Terraform, Dockerfile, YAML, JSON
- Data: SQL (generic)
- Other: Elixir, Clojure, Solidity, Apex, R
Semgrep Pro vs Community Edition
| Feature | Community | Pro |
|---|---|---|
| Pattern matching | ✓ | ✓ |
| Intra-file taint | ✓ | ✓ |
| Custom rules | ✓ | ✓ |
| SARIF output | ✓ | ✓ |
| Cross-file analysis | ✗ | ✓ |
| Interfile taint | ✗ | ✓ |
| Supply chain | ✗ | ✓ |
| Secrets detection | ✗ | ✓ |
| Assistant (AI) | ✗ | ✓ |
Troubleshooting
Common Issues
# Rule parsing errors
semgrep scan --validate --config custom-rules.yaml
# Timeout on large files
semgrep scan --timeout 120 .
# Memory issues
semgrep scan --max-memory 4000 . # MB
# Debug mode
semgrep scan --debug --config p/security-audit .
Rule Testing
# Test rules against test files
semgrep scan --test rules/
# Validate rule syntax
semgrep scan --validate --config rules/my-rule.yaml
# Benchmark rules
semgrep scan --time --config rules/ test-codebase/
Limitations
- Cross-file limited: Community Edition taint analysis stays within a single file; cross-file (interfile) analysis requires Pro
- Pattern-based: Can't understand complex business logic
- Performance: Large codebases with many rules can be slow
- False positives: Regex patterns may over-match
- Language gaps: Some languages have limited rule coverage
Rationalizations to Reject
| Shortcut | Why It's Wrong |
|---|---|
| "Semgrep found nothing, code is clean" | Semgrep is pattern-based; it can't track complex data flow across functions |
| "I wrote a rule, so we're covered" | Rules need testing with semgrep --test; false negatives are silent |
| "Taint mode catches injection" | Only if you defined all sources, sinks, AND sanitizers correctly |
| "Pro rules are comprehensive" | Pro rules are good but not exhaustive; supplement with custom rules for your codebase |
| "Too many findings = noisy tool" | High finding count often means real problems; tune rules, don't disable them |
References
- Registry: https://semgrep.dev/explore
- Playground: https://semgrep.dev/playground
- Documentation: https://semgrep.dev/docs/
- Rule Examples: https://semgrep.dev/docs/writing-rules/rule-ideas
- Pattern Syntax: https://semgrep.dev/docs/writing-rules/pattern-syntax
- Trail of Bits Rules: https://github.com/trailofbits/semgrep-rules
- OWASP Rules: https://semgrep.dev/p/owasp-top-ten
- Blog: https://semgrep.dev/blog/
- GitHub Action: https://github.com/returntocorp/semgrep-action
- SARIF Spec: https://docs.oasis-open.org/sarif/sarif/v2.1.0/sarif-v2.1.0.html
- Initial Source: Trail of Bits skills
Repository: igbuend/grimbard
First Seen: Feb 19, 2026