semgrep-rule-creator
Semgrep Rule Creator
Security Notice
AUTHORIZED USE ONLY: These skills are for DEFENSIVE security analysis and authorized research:
- Custom security rule development for owned codebases
- Coding standard enforcement via automated checks
- CI/CD security gate rule authoring
- Vulnerability pattern codification for prevention
- Educational purposes in controlled environments
NEVER use for:
- Creating rules to bypass security controls
- Scanning systems without authorization
- Any illegal activities
Step 1: Define the Detection Goal
Before writing a rule, clearly define:
- What to detect: The vulnerable or undesired code pattern
- Why it matters: The security impact or quality concern
- What languages: Which programming languages to target
- True positive example: Code that SHOULD match
- True negative example: Code that should NOT match (safe alternative)
- False positive risks: What similar-looking code is actually safe
Detection Goal Template
## Rule: [rule-id]
- **Detect**: [description of what to find]
- **Why**: [security impact / quality concern]
- **Languages**: [javascript, typescript, python, etc.]
- **CWE**: [CWE-XXX]
- **OWASP**: [A0X category]
- **True Positive**: [code example that should match]
- **True Negative**: [safe code that should NOT match]
Step 2: Write the Semgrep Rule
Basic Rule Structure
rules:
  - id: rule-id-here
    message: >
      Clear description of what was found and why it matters.
      Include remediation guidance in the message.
    severity: ERROR # ERROR, WARNING, INFO
    languages: [javascript, typescript]
    metadata:
      cwe:
        - CWE-89
      owasp:
        - A03:2021
      confidence: HIGH # HIGH, MEDIUM, LOW
      impact: HIGH # HIGH, MEDIUM, LOW
      category: security
      subcategory:
        - vuln
      technology:
        - express
        - node.js
      references:
        - https://owasp.org/Top10/A03_2021-Injection/
      source-rule-url: https://semgrep.dev/r/rule-id
    # Pattern goes here (see below)
Pattern Types
Simple Pattern Match
pattern: |
  eval($X)
Pattern with Alternatives (OR)
pattern-either:
- pattern: eval($X)
- pattern: new Function($X)
- pattern: setTimeout($X, ...)
- pattern: setInterval($X, ...)
Pattern with Exclusions (AND NOT)
patterns:
- pattern: $DB.query($QUERY)
- pattern-not: $DB.query($QUERY, $PARAMS)
- pattern-not: $DB.query($QUERY, [...])
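To make the effect of the exclusions concrete, the sketch below lists calls the pattern above would be expected to flag and calls it should skip, using the annotation style from Step 4. The `db` object is a hypothetical stub (not a real driver) so the snippet runs on its own, and `example-unparameterized-query` is an illustrative rule id.

```javascript
// Hypothetical stand-in for a database client; it only records what it receives.
const db = {
  calls: [],
  query(sql, params) {
    this.calls.push({ sql, params });
    return [];
  },
};

const userId = "42";
const params = [userId];

// ruleid: example-unparameterized-query
db.query("SELECT * FROM users WHERE id = " + userId);

// ok: example-unparameterized-query
db.query("SELECT * FROM users WHERE id = $1", [userId]);

// ok: example-unparameterized-query
db.query("SELECT * FROM users WHERE id = $1", params);
```

Note that a single-argument call with a constant string, such as `db.query("SELECT 1")`, would presumably also match. This is the kind of false positive risk worth recording in the detection goal.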
Pattern Inside Context
patterns:
  - pattern: $RES.send($DATA)
  - pattern-inside: |
      app.$METHOD($PATH, function($REQ, $RES) {
        ...
      })
  - pattern-not-inside: |
      app.$METHOD($PATH, authenticate, function($REQ, $RES) {
        ...
      })
Metavariable Constraints
# Constrain a metavariable with a regex
patterns:
  - pattern: crypto.createHash($ALGO)
  - metavariable-regex:
      metavariable: $ALGO
      regex: (md5|sha1|MD5|SHA1)
  - focus-metavariable: $ALGO

# Constrain a metavariable with a numeric comparison
patterns:
  - pattern: setTimeout($FUNC, $TIME)
  - metavariable-comparison:
      metavariable: $TIME
      comparison: $TIME > 60000
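As a rough illustration of what the two constraints are meant to separate, the sketch below pairs code that should satisfy each constraint with code that should not. The function names are hypothetical. Whether the regex constraint matches in practice depends on how Semgrep presents the string literal to the regex, so the comments state intent and should be confirmed with `semgrep --test`.

```javascript
const crypto = require("crypto");

// Intended match for the regex constraint: the algorithm argument is "md5".
function weakDigest(data) {
  return crypto.createHash("md5").update(data).digest("hex");
}

// Intended non-match: "sha256" is not in the regex alternation.
function strongDigest(data) {
  return crypto.createHash("sha256").update(data).digest("hex");
}

// Intended match for the comparison constraint: 120000 > 60000.
function scheduleSlow(task) {
  return setTimeout(task, 120000);
}

// Intended non-match: 5000 is below the 60000 threshold.
function scheduleFast(task) {
  return setTimeout(task, 5000);
}
```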
Taint Mode Rules (Advanced)
For tracking data flow from sources to sinks:
mode: taint
pattern-sources:
  - patterns:
      - pattern: $REQ.query.$PARAM
  - patterns:
      - pattern: $REQ.body.$PARAM
  - patterns:
      - pattern: $REQ.params.$PARAM
pattern-sinks:
  - patterns:
      - pattern: $DB.query($SINK, ...)
      - focus-metavariable: $SINK
pattern-sanitizers:
  - patterns:
      - pattern: escape($X)
  - patterns:
      - pattern: sanitize($X)
  - patterns:
      - pattern: $DB.query($QUERY, [...])
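The following sketch shows the kind of data flow the taint configuration above is meant to track. The `db` object and the `escape` helper are hypothetical stubs so the code runs without Express or a database driver. `escape` is illustrative only and is not a recommended sanitizer.

```javascript
// Hypothetical stubs so the sketch runs without Express or a database driver.
const db = { query: (sql, params) => ({ sql, params }) };
const escape = (value) => String(value).replace(/'/g, "''"); // illustrative only

// Tainted: req.query.id (a source) reaches the first argument of db.query
// (the sink) through intermediate variables, which a single-expression
// pattern would not connect.
function findUserUnsafe(req) {
  const id = req.query.id;
  const sql = "SELECT * FROM users WHERE id = '" + id + "'";
  return db.query(sql);
}

// Sanitized: the value passes through escape(), listed under pattern-sanitizers.
function findUserEscaped(req) {
  const id = escape(req.query.id);
  return db.query("SELECT * FROM users WHERE id = '" + id + "'");
}

// Parameterized: user input travels in the second argument, not in $SINK.
function findUserParameterized(req) {
  return db.query("SELECT * FROM users WHERE id = $1", [req.query.id]);
}
```

Only `findUserUnsafe` is expected to be reported. The other two show why the sanitizers and the `focus-metavariable` on `$SINK` matter for keeping findings precise.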
Step 3: Common Rule Templates
SQL Injection Detection
rules:
  - id: sql-injection-string-concat
    message: >
      Possible SQL injection via string concatenation. User input appears
      to be concatenated into a SQL query string. Use parameterized
      queries instead.
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-89]
      owasp: [A03:2021]
      confidence: HIGH
      impact: HIGH
      category: security
    patterns:
      - pattern-either:
          # Trailing concatenation, e.g. "... WHERE id = " + userId
          - pattern: $DB.query("..." + $VAR)
          - pattern: $DB.query("..." + $VAR + "...")
          - pattern: $DB.query(`...${$VAR}...`)
      - pattern-not: $DB.query("..." + $VAR + "...", [...])
    # No fix: field here. An autofix template cannot rebuild the original SQL
    # text with placeholders, so parameterization is left to the developer.
XSS Detection
rules:
  - id: xss-innerhtml-assignment
    message: >
      Direct assignment to innerHTML with potentially untrusted data.
      Use textContent for text or a sanitization library for HTML.
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-79]
      owasp: [A03:2021]
      confidence: MEDIUM
      impact: HIGH
      category: security
    pattern-either:
      - pattern: $EL.innerHTML = $DATA
      - pattern: document.getElementById($ID).innerHTML = $DATA
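A small test-file sketch for this rule might look like the following. The `el` object is a hypothetical stand-in for a DOM element so the snippet runs outside a browser.

```javascript
// Hypothetical stand-in for a DOM element so the sketch runs outside a browser.
const el = { innerHTML: "", textContent: "" };
const userComment = '<img src=x onerror="alert(1)">';

// ruleid: xss-innerhtml-assignment
el.innerHTML = userComment;

// ok: xss-innerhtml-assignment
el.textContent = userComment;
```

Because `$DATA` matches any expression, an assignment of a constant string would presumably be reported too. That is one reason the template sets confidence to MEDIUM rather than HIGH.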
Hardcoded Secrets
rules:
  - id: hardcoded-api-key
    message: >
      Hardcoded API key detected. Store secrets in environment
      variables or a secrets manager.
    severity: ERROR
    languages: [javascript, typescript, python]
    metadata:
      cwe: [CWE-798]
      owasp: [A02:2021]
      confidence: MEDIUM
      impact: HIGH
      category: security
    # A rule takes a single top-level pattern operator, so the key formats
    # are checked with metavariable-regex on the assigned string value
    # instead of combining pattern-either with pattern-regex.
    patterns:
      - pattern: $KEY = "$VALUE"
      - metavariable-regex:
          metavariable: $VALUE
          regex: ^(AKIA[0-9A-Z]{16}|sk-[a-zA-Z0-9]{48}|ghp_[a-zA-Z0-9]{36})$
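Before embedding a regex like this in a rule, it can help to sanity-check it against synthetic values. The sketch below does that in plain JavaScript, using the same alternation as the rule, anchored at both ends. `SECRET_RE`, `samples`, and `looksLikeSecret` are hypothetical names, and every sample is a made-up string built to the right length, not a real credential.

```javascript
// Same alternation as the rule, anchored so the whole value must match.
const SECRET_RE = /^(AKIA[0-9A-Z]{16}|sk-[a-zA-Z0-9]{48}|ghp_[a-zA-Z0-9]{36})$/;

// Synthetic values only; none of these are real credentials.
const samples = {
  awsLike: "AKIA" + "A".repeat(16),
  skLike: "sk-" + "a".repeat(48),
  ghpLike: "ghp_" + "b".repeat(36),
  tooShort: "AKIA1234",
  placeholder: "YOUR_API_KEY_HERE",
};

function looksLikeSecret(value) {
  return SECRET_RE.test(value);
}

// Map each sample name to whether the regex accepts it.
const results = Object.fromEntries(
  Object.entries(samples).map(([name, value]) => [name, looksLikeSecret(value)])
);
console.log(results);
```

The three well-formed samples should be accepted and the short and placeholder values rejected. The placeholder case is what keeps documentation snippets from being reported.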
Missing Authentication
rules:
  - id: express-route-missing-auth
    message: >
      Express route handler without authentication middleware.
      Add authentication middleware before the handler.
    severity: WARNING
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-306]
      owasp: [A07:2021]
      confidence: MEDIUM
      impact: HIGH
      category: security
    patterns:
      - pattern-either:
          - pattern: app.post($PATH, function($REQ, $RES) { ... })
          - pattern: app.put($PATH, function($REQ, $RES) { ... })
          - pattern: app.delete($PATH, function($REQ, $RES) { ... })
          - pattern: router.post($PATH, function($REQ, $RES) { ... })
          - pattern: router.put($PATH, function($REQ, $RES) { ... })
          - pattern: router.delete($PATH, function($REQ, $RES) { ... })
      - pattern-not-inside: |
          app.$METHOD($PATH, $AUTH, function($REQ, $RES) { ... })
      - pattern-not-inside: |
          router.$METHOD($PATH, $AUTH, function($REQ, $RES) { ... })
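The sketch below shows the two route shapes this rule is meant to tell apart. The `app` object is a hypothetical stub that only records registrations (it is not Express), and `requireAuth` is an illustrative middleware name.

```javascript
// Hypothetical minimal stand-in for an Express app: it only records registrations.
const routes = [];
const app = {
  post: (path, ...handlers) => routes.push({ method: "post", path, handlers }),
  delete: (path, ...handlers) => routes.push({ method: "delete", path, handlers }),
};

// Hypothetical authentication middleware.
function requireAuth(req, res, next) {
  if (!req.user) return res.status(401).end();
  next();
}

// ruleid: express-route-missing-auth
app.post("/api/transfer", function (req, res) {
  res.send("ok");
});

// ok: express-route-missing-auth
app.delete("/api/account", requireAuth, function (req, res) {
  res.send("deleted");
});
```

The rule only checks that some middleware argument precedes the handler. It cannot tell whether that middleware actually authenticates, which fits the MEDIUM confidence in the metadata.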
Insecure Randomness
rules:
  - id: insecure-random-for-security
    message: >
      Math.random() is not cryptographically secure. Use
      crypto.getRandomValues() or crypto.randomBytes() for
      security-sensitive random values.
    severity: WARNING
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-330]
      confidence: MEDIUM
      impact: MEDIUM
      category: security
    patterns:
      - pattern: Math.random()
      - pattern-inside: |
          function $FUNC(...) {
            ...
          }
      - metavariable-regex:
          metavariable: $FUNC
          regex: (generateToken|createSecret|randomPassword|generateKey|createSession|generateId|createNonce)
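To illustrate how the function-name filter scopes this rule, the sketch below contrasts an intended match with two intended non-matches. `pickBannerColor` is a hypothetical non-security helper. The other two function names come from the regex list in the rule.

```javascript
const crypto = require("crypto");

// Intended match: Math.random() inside a function whose name is in the regex list.
function generateToken() {
  return Math.random().toString(36).slice(2);
}

// Intended non-match: no Math.random() call; randomBytes draws from a CSPRNG.
function generateKey() {
  return crypto.randomBytes(32).toString("hex");
}

// Intended non-match: Math.random() is used, but the function name is not in
// the list, so the metavariable-regex filter should drop it.
function pickBannerColor() {
  const colors = ["red", "green", "blue"];
  return colors[Math.floor(Math.random() * colors.length)];
}
```

The trade-off is that a security-sensitive function with an unlisted name (say, `makeToken`) would slip through, so the name list needs to reflect the naming conventions of the codebase being scanned.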
Step 4: Write Rule Tests
Test File Format
Create a test file alongside the rule:
// ruleid: sql-injection-string-concat
db.query('SELECT * FROM users WHERE id = ' + userId);
// ruleid: sql-injection-string-concat
db.query(`SELECT * FROM users WHERE id = ${userId}`);
// ok: sql-injection-string-concat
db.query('SELECT * FROM users WHERE id = $1', [userId]);
// ok: sql-injection-string-concat
db.query('SELECT * FROM users WHERE id = ?', [userId]);
Running Tests
# Test a single rule
semgrep --test --config=rules/sql-injection.yml tests/
# Test all rules
semgrep --test --config=rules/ tests/
# Validate rule syntax
semgrep --validate --config=rules/
Step 5: Rule Optimization
Performance Best Practices
- Be specific with patterns: Avoid overly broad matches like $X($Y)
- Use pattern-inside to scope: Narrow the search context
- Use language-specific syntax: Leverage language features
- Avoid deep ellipsis nesting: ... ... ... is slow
- Use focus-metavariable: Narrow the reported location
- Test with large codebases: Verify performance at scale
Reducing False Positives
- Add pattern-not for safe patterns: Exclude known-safe alternatives
- Use metavariable-regex: Constrain metavariable values
- Use pattern-not-inside: Exclude safe contexts
- Set appropriate confidence: Be honest about detection certainty
- Add technology metadata: Help users filter relevant rules
- Provide fix suggestions: When possible, include a fix: field
Rule Validation Checklist
- Rule has unique, descriptive ID
- Message explains the issue AND remediation
- Severity matches actual risk
- Metadata includes CWE, OWASP, confidence, impact
- At least 2 true positive test cases
- At least 2 true negative test cases
- Rule validated with semgrep --validate
- Rule tested with semgrep --test
- Performance acceptable on large codebase
- Fix suggestion provided (if applicable)
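The two test-case items in the checklist can be checked mechanically. The sketch below is one possible helper, assuming test files use the `// ruleid:` and `// ok:` comment forms shown in Step 4. `countAnnotations` and `meetsMinimum` are hypothetical names and are not part of Semgrep.

```javascript
// Count Semgrep test annotations for one rule id in the text of a test file.
// Assumes the comment forms "// ruleid: <id>" and "// ok: <id>".
function countAnnotations(source, ruleId) {
  const counts = { ruleid: 0, ok: 0 };
  for (const line of source.split(/\r?\n/)) {
    const m = line.match(/^\s*\/\/\s*(ruleid|ok):\s*(\S+)\s*$/);
    if (m && m[2] === ruleId) counts[m[1]] += 1;
  }
  return counts;
}

// True when the file has at least `minimum` positive and negative cases.
function meetsMinimum(source, ruleId, minimum = 2) {
  const { ruleid, ok } = countAnnotations(source, ruleId);
  return ruleid >= minimum && ok >= minimum;
}

// The Step 4 test file, inlined as text for the demonstration.
const testFile = [
  "// ruleid: sql-injection-string-concat",
  "db.query('SELECT * FROM users WHERE id = ' + userId);",
  "// ruleid: sql-injection-string-concat",
  "db.query(`SELECT * FROM users WHERE id = ${userId}`);",
  "// ok: sql-injection-string-concat",
  "db.query('SELECT * FROM users WHERE id = $1', [userId]);",
  "// ok: sql-injection-string-concat",
  "db.query('SELECT * FROM users WHERE id = ?', [userId]);",
].join("\n");

console.log(countAnnotations(testFile, "sql-injection-string-concat"));
console.log(meetsMinimum(testFile, "sql-injection-string-concat"));
```

A check like this could run in CI before `semgrep --test`, so that a rule without enough cases fails early.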
Semgrep Pattern Syntax Reference
| Syntax | Meaning | Example |
|---|---|---|
| $X | Single metavariable | eval($X) |
| $...X | Multiple metavariable args | func($...ARGS) |
| ... | Ellipsis (any statements) | if (...) { ... } |
| <... $X ...> | Deep expression match | <... eval($X) ...> |
| pattern-either | OR operator | Match any of N patterns |
| pattern-not | NOT operator | Exclude specific patterns |
| pattern-inside | Context requirement | Must be inside this pattern |
| pattern-not-inside | Context exclusion | Must NOT be inside this |
| metavariable-regex | Regex constraint | Constrain $X to match regex |
| metavariable-comparison | Numeric constraint | $X > 100 |
| focus-metavariable | Narrow match location | Report only $X location |
Related Skills
- static-analysis - CodeQL and Semgrep with SARIF output
- variant-analysis - Pattern-based vulnerability discovery
- differential-review - Security-focused diff analysis
- insecure-defaults - Hardcoded credentials detection
- security-architect - STRIDE threat modeling
Agent Integration
- security-architect (primary): Custom rule development for security audits
- code-reviewer (primary): Automated code review rule authoring
- penetration-tester (secondary): Vulnerability detection rule creation
- qa (secondary): Quality enforcement rule authoring
Iron Laws
- NEVER publish a rule without at least 2 true positive and 2 true negative test cases
- ALWAYS validate rule syntax with semgrep --validate before committing
- NEVER set confidence to HIGH without testing the rule against a real codebase
- ALWAYS include WHAT was found, WHY it matters, and HOW to fix it in every rule message
- NEVER use pattern-regex as the primary matcher; use structural patterns and constrain with metavariable-regex
Anti-Patterns
| Anti-Pattern | Why It Fails | Correct Approach |
|---|---|---|
| Publishing untested rules | False positives erode developer trust and rules get ignored | Write test cases with // ruleid: and // ok: annotations and run semgrep --test |
| Setting HIGH confidence without validation | Overconfident rules mislead reviewers into trusting bad signal | Calibrate confidence based on measured false positive rate on real codebases |
| Vague rule messages | Developers cannot remediate without specific guidance | Include WHAT was found, WHY it matters, and HOW to fix it in every message |
| Overly broad patterns with no exclusions | High false positive rate causes rule fatigue | Add pattern-not clauses for all known-safe alternatives |
| Using pattern-regex as primary matcher | Regex is slower and less precise than structural pattern matching | Use structural patterns as primary; constrain with metavariable-regex only |
Memory Protocol (MANDATORY)
Before starting:
Read .claude/context/memory/learnings.md
After completing:
- New pattern -> .claude/context/memory/learnings.md
- Issue found -> .claude/context/memory/issues.md
- Decision made -> .claude/context/memory/decisions.md
ASSUME INTERRUPTION: If it's not in memory, it didn't happen.
Cross-Reference: Creator Ecosystem
This skill is part of the Creator Ecosystem. When research uncovers gaps, trigger the appropriate companion creator:
| Gap Discovered | Required Artifact | Creator to Invoke | When |
|---|---|---|---|
| Domain knowledge needs a reusable skill | skill | Skill({ skill: 'skill-creator' }) | Gap is a full skill domain |
| Existing skill has incomplete coverage | skill update | Skill({ skill: 'skill-updater' }) | Close skill exists but incomplete |
| Capability needs a dedicated agent | agent | Skill({ skill: 'agent-creator' }) | Agent to own the capability |
| Existing agent needs capability update | agent update | Skill({ skill: 'agent-updater' }) | Close agent exists but incomplete |
| Domain needs code/project scaffolding | template | Skill({ skill: 'template-creator' }) | Reusable code patterns needed |
| Behavior needs pre/post execution guards | hook | Skill({ skill: 'hook-creator' }) | Enforcement behavior required |
| Process needs multi-phase orchestration | workflow | Skill({ skill: 'workflow-creator' }) | Multi-step coordination needed |
| Artifact needs structured I/O validation | schema | Skill({ skill: 'schema-creator' }) | JSON schema for artifact I/O |
| User interaction needs a slash command | command | Skill({ skill: 'command-creator' }) | User-facing shortcut needed |
| Repeated logic needs a reusable CLI tool | tool | Skill({ skill: 'tool-creator' }) | CLI utility needed |
| Narrow/single-artifact capability only | inline | Document within this artifact only | Too specific to generalize |
Ecosystem Alignment Contract (MANDATORY)
This creator skill is part of a coordinated creator ecosystem. Any artifact created here must align with and validate against related creators:
- agent-creator for ownership and execution paths
- skill-creator for capability packaging and assignment
- tool-creator for executable automation surfaces
- hook-creator for enforcement and guardrails
- rule-creator and semgrep-rule-creator for policy and static checks
- template-creator for standardized scaffolds
- workflow-creator for orchestration and phase gating
- command-creator for user/operator command UX
Cross-Creator Handshake (Required)
Before completion, verify all relevant handshakes:
- Artifact route exists in .claude/CLAUDE.md and related routing docs.
- Discovery/registry entries are updated (catalog/index/registry as applicable).
- Companion artifacts are created or explicitly waived with reason.
- validate-integration.cjs passes for the created artifact.
- Skill index is regenerated when skill metadata changes.
Research Gate (Exa + arXiv — BOTH MANDATORY)
For new patterns, templates, or workflows, research is mandatory:
- Use Exa for implementation and ecosystem patterns:
  - mcp__Exa__web_search_exa({ query: '<topic> 2025 best practices' })
  - mcp__Exa__get_code_context_exa({ query: '<topic> implementation examples' })
- Search arXiv for academic research (mandatory for AI/ML, agents, evaluation, orchestration, memory/RAG, security):
  - Via Exa: mcp__Exa__web_search_exa({ query: 'site:arxiv.org <topic> 2024 2025' })
  - Direct API: WebFetch({ url: 'https://arxiv.org/search/?query=<topic>&searchtype=all&start=0' })
- Record decisions, constraints, and non-goals in artifact references/docs.
- Keep updates minimal and avoid overengineering.
arXiv is mandatory (not fallback) when topic involves: AI agents, LLM evaluation, orchestration, memory/RAG, security, static analysis, or any emerging methodology.
Regression-Safe Delivery
- Follow strict RED -> GREEN -> REFACTOR for behavior changes.
- Run targeted tests for changed modules.
- Run lint/format on changed files.
- Keep commits scoped by concern (logic/docs/generated artifacts).