Semgrep Rule Creator

Security Notice

AUTHORIZED USE ONLY: These skills are for DEFENSIVE security analysis and authorized research:

Custom security rule development for owned codebases
Coding standard enforcement via automated checks
CI/CD security gate rule authoring
Vulnerability pattern codification for prevention
Educational purposes in controlled environments

NEVER use for:

Creating rules to bypass security controls
Scanning systems without authorization
Any illegal activities

Step 1: Define the Detection Goal

Before writing a rule, clearly define:

What to detect: The vulnerable or undesired code pattern
Why it matters: The security impact or quality concern
What languages: Which programming languages to target
True positive example: Code that SHOULD match
True negative example: Code that should NOT match (safe alternative)
False positive risks: What similar-looking code is actually safe

Detection Goal Template

## Rule: [rule-id]

- **Detect**: [description of what to find]
- **Why**: [security impact / quality concern]
- **Languages**: [javascript, typescript, python, etc.]
- **CWE**: [CWE-XXX]
- **OWASP**: [A0X category]
- **True Positive**: [code example that should match]
- **True Negative**: [safe code that should NOT match]

Step 2: Write the Semgrep Rule

Basic Rule Structure

rules:
  - id: rule-id-here
    message: >
      Clear description of what was found and why it matters.
      Include remediation guidance in the message.
    severity: ERROR # ERROR, WARNING, INFO
    languages: [javascript, typescript]
    metadata:
      cwe:
        - CWE-89
      owasp:
        - A03:2021
      confidence: HIGH # HIGH, MEDIUM, LOW
      impact: HIGH # HIGH, MEDIUM, LOW
      category: security
      subcategory:
        - vuln
      technology:
        - express
        - node.js
      references:
        - https://owasp.org/Top10/A03_2021-Injection/
      source-rule-url: https://semgrep.dev/r/rule-id
    # Pattern goes here (see below)

Pattern Types

Simple Pattern Match

pattern: |
  eval($X)

Pattern with Alternatives (OR)

pattern-either:
  - pattern: eval($X)
  - pattern: new Function($X)
  - pattern: setTimeout($X, ...)
  - pattern: setInterval($X, ...)
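On the true-negative side of these alternatives, the usual remediation is to stop evaluating strings as code. The following is a minimal Node.js sketch. It assumes the string only ever needs to carry data or select from a fixed set of operations. `operations` and `runOperation` are hypothetical names.

```javascript
// Hypothetical input of the kind the eval-style patterns above are meant to catch.
const userInput = '{"role": "viewer", "limit": 10}';

// Flagged shape (left commented out so this file stays safe to run):
// const settings = eval("(" + userInput + ")");

// Safer equivalent: JSON.parse accepts data only, never executable code.
const settings = JSON.parse(userInput);

// Safer equivalent for "pick an operation by name": an explicit lookup table
// instead of building code at runtime with new Function(...).
const operations = new Map([
  ["double", (n) => n * 2],
  ["square", (n) => n * n],
]);

function runOperation(name, value) {
  const op = operations.get(name);
  if (!op) throw new Error(`unknown operation: ${name}`);
  return op(value);
}

console.log(settings.role, runOperation("square", settings.limit));
```

The sketch uses a `Map` rather than a plain object. With a plain object, names such as `constructor` would resolve to inherited prototype properties.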

Pattern with Exclusions (AND NOT)

patterns:
  - pattern: $DB.query($QUERY, ...)
  - pattern-not: $DB.query("...", ...)

Pattern Inside Context

patterns:
  - pattern: $RES.send($DATA)
  - pattern-inside: |
      app.$METHOD($PATH, function($REQ, $RES) {
        ...
      })
  - pattern-not-inside: |
      app.$METHOD($PATH, authenticate, function($REQ, $RES) {
        ...
      })

Metavariable Constraints

patterns:
  - pattern: crypto.createHash($ALGO)
  - metavariable-regex:
      metavariable: $ALGO
      regex: (md5|sha1|MD5|SHA1)
  - focus-metavariable: $ALGO

patterns:
  - pattern: setTimeout($FUNC, $TIME)
  - metavariable-comparison:
      metavariable: $TIME
      comparison: $TIME > 60000
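A compliant call under the weak-hash constraint above simply names an algorithm outside the regex. Below is a short sketch using Node's built-in `crypto` module. SHA-256 is still unsuitable for password storage; for that, a slow KDF such as `crypto.scrypt` is the usual choice.

```javascript
const crypto = require("node:crypto");

// Flagged by the weak-hash rule above: $ALGO binds to the 'md5' literal.
// crypto.createHash("md5").update(data).digest("hex");

// Not flagged: "sha256" does not match (md5|sha1|MD5|SHA1).
function sha256Hex(data) {
  return crypto.createHash("sha256").update(data).digest("hex");
}

console.log(sha256Hex("abc"));
```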

Taint Mode Rules (Advanced)

For tracking data flow from sources to sinks:

mode: taint
pattern-sources:
  - patterns:
      - pattern: $REQ.query.$PARAM
  - patterns:
      - pattern: $REQ.body.$PARAM
  - patterns:
      - pattern: $REQ.params.$PARAM
pattern-sinks:
  - patterns:
      - pattern: $DB.query($SINK, ...)
      - focus-metavariable: $SINK
pattern-sanitizers:
  - patterns:
      - pattern: escape($X)
  - patterns:
      - pattern: sanitize($X)
  - patterns:
      - pattern: $DB.query($QUERY, [...])
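The sketch below shows which flows this taint configuration separates. It uses a hypothetical recording stub in place of a real database client. The `?` placeholder is MySQL-style; drivers such as `pg` use `$1` instead. Only the first function passes tainted data into the focused first argument of `$DB.query`.

```javascript
// Hypothetical stub: records each query instead of talking to a database.
function makeRecordingDb() {
  const calls = [];
  return {
    calls,
    query(sql, params = []) {
      calls.push({ sql, params });
      return [];
    },
  };
}

// Would be reported: req.query.id (a source) flows through string
// concatenation into $SINK, the first argument of db.query.
function findUserUnsafe(req, db) {
  return db.query("SELECT * FROM users WHERE id = " + req.query.id);
}

// Not reported: the tainted value travels as a bound parameter, so the
// focused first argument is a constant string.
function findUserSafe(req, db) {
  return db.query("SELECT * FROM users WHERE id = ?", [req.query.id]);
}

const db = makeRecordingDb();
const req = { query: { id: "1 OR 1=1" } };
findUserUnsafe(req, db);
findUserSafe(req, db);
console.log(db.calls);
```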

Step 3: Common Rule Templates

SQL Injection Detection

rules:
  - id: sql-injection-string-concat
    message: >
      Possible SQL injection via string concatenation. User input appears
      to be concatenated into a SQL query string. Use parameterized
      queries instead.
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-89]
      owasp: [A03:2021]
      confidence: HIGH
      impact: HIGH
      category: security
    pattern-either:
      - pattern: $DB.query("..." + $VAR, ...)
      - pattern: $DB.query("..." + $VAR + "...", ...)
      - pattern: $DB.query(`...${$VAR}...`, ...)

XSS Detection

rules:
  - id: xss-innerhtml-assignment
    message: >
      Direct assignment to innerHTML with potentially untrusted data.
      Use textContent for text or a sanitization library for HTML.
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-79]
      owasp: [A03:2021]
      confidence: MEDIUM
      impact: HIGH
      category: security
    patterns:
      - pattern: $EL.innerHTML = $DATA
      - pattern-not: $EL.innerHTML = "..."
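A Semgrep test file for this rule could look like the following sketch. The `el` object is a hypothetical stub that stands in for a DOM element, so the file also runs under Node. Semgrep itself only reads the annotations and code shapes.

```javascript
// Hypothetical stub standing in for a DOM element.
const el = { innerHTML: "", textContent: "" };
const userComment = '<img src=x onerror="alert(1)">';

// ruleid: xss-innerhtml-assignment
el.innerHTML = userComment;

// ok: xss-innerhtml-assignment
el.textContent = userComment;

console.log(el.textContent);
```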

Hardcoded Secrets

rules:
  - id: hardcoded-api-key
    message: >
      Hardcoded API key detected. Store secrets in environment
      variables or a secrets manager.
    severity: ERROR
    languages: [javascript, typescript, python]
    metadata:
      cwe: [CWE-798]
      owasp: [A02:2021]
      confidence: MEDIUM
      impact: HIGH
      category: security
    patterns:
      - pattern: $KEY = $VALUE
      - metavariable-regex:
          metavariable: $VALUE
          regex: .*(AKIA[0-9A-Z]{16}|sk-[a-zA-Z0-9]{48}|ghp_[a-zA-Z0-9]{36})
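The key-format alternation can be sanity-checked in plain JavaScript before you run Semgrep. The samples are dummy values built for illustration. `AKIAIOSFODNN7EXAMPLE` is the placeholder access key ID used in AWS documentation. Key formats change over time, so treat the regex as a starting point.

```javascript
// Same alternation as the rule's key-format regex.
const SECRET_RE = /(AKIA[0-9A-Z]{16}|sk-[a-zA-Z0-9]{48}|ghp_[a-zA-Z0-9]{36})/;

function looksLikeSecret(value) {
  return SECRET_RE.test(value);
}

const samples = {
  awsExample: "AKIAIOSFODNN7EXAMPLE", // AWS documentation placeholder
  githubLike: "ghp_" + "a".repeat(36), // dummy value in GitHub token shape
  openaiLike: "sk-" + "x".repeat(48), // dummy value in sk- key shape
  tooShort: "AKIA1234", // prefix only, should not match
  envLookup: "process.env.API_KEY", // the recommended pattern, should not match
};

for (const [name, value] of Object.entries(samples)) {
  console.log(name, looksLikeSecret(value));
}
```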

Missing Authentication

rules:
  - id: express-route-missing-auth
    message: >
      Express route handler without authentication middleware.
      Add authentication middleware before the handler.
    severity: WARNING
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-306]
      owasp: [A07:2021]
      confidence: MEDIUM
      impact: HIGH
      category: security
    patterns:
      - pattern-either:
          - pattern: app.post($PATH, function($REQ, $RES) { ... })
          - pattern: app.put($PATH, function($REQ, $RES) { ... })
          - pattern: app.delete($PATH, function($REQ, $RES) { ... })
          - pattern: router.post($PATH, function($REQ, $RES) { ... })
          - pattern: router.put($PATH, function($REQ, $RES) { ... })
          - pattern: router.delete($PATH, function($REQ, $RES) { ... })
      - pattern-not-inside: |
          app.$METHOD($PATH, $AUTH, function($REQ, $RES) { ... })
      - pattern-not-inside: |
          router.$METHOD($PATH, $AUTH, function($REQ, $RES) { ... })
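A test file for this rule might pair an unauthenticated handler with an authenticated one, as in this sketch. `app` and `authenticate` are hypothetical stubs, so the file runs without Express. The rule's patterns use `function($REQ, $RES) { ... }` with exactly two arguments to the route call. Add test cases for arrow-function handlers as well, because these patterns may not cover them.

```javascript
// Hypothetical stubs so this file runs without Express installed.
const routes = [];
const app = {
  post(path, ...handlers) {
    routes.push({ method: "POST", path, handlers });
  },
};

function authenticate(req, res, next) {
  if (!req.user) return res.status(401).end();
  return next();
}

// ruleid: express-route-missing-auth
app.post("/orders", function (req, res) {
  res.send("created");
});

// ok: express-route-missing-auth
app.post("/admin/orders", authenticate, function (req, res) {
  res.send("created");
});
```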

Insecure Randomness

rules:
  - id: insecure-random-for-security
    message: >
      Math.random() is not cryptographically secure. Use
      crypto.getRandomValues() or crypto.randomBytes() for
      security-sensitive random values.
    severity: WARNING
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-330]
      confidence: MEDIUM
      impact: MEDIUM
      category: security
    patterns:
      - pattern: Math.random()
      - pattern-inside: |
          function $FUNC(...) {
            ...
          }
      - metavariable-regex:
          metavariable: $FUNC
          regex: (generateToken|createSecret|randomPassword|generateKey|createSession|generateId|createNonce)
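The compliant counterpart keeps the sensitive function names but draws randomness from a CSPRNG. The following is a minimal Node.js sketch using `crypto.randomBytes` and `crypto.randomUUID`. `crypto.randomUUID` requires Node 14.17 or later.

```javascript
const crypto = require("node:crypto");

// The name matches the rule's $FUNC regex, but the body uses Node's CSPRNG,
// so the Math.random() pattern has nothing to report here.
function generateToken(byteLength = 32) {
  return crypto.randomBytes(byteLength).toString("hex");
}

// CSPRNG-backed random (version 4) UUID for identifiers.
function generateId() {
  return crypto.randomUUID();
}

console.log(generateToken(16), generateId());
```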

Step 4: Write Rule Tests

Test File Format

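Create a test file with the same base name as the rule file (for example, tests/sql-injection.js for rules/sql-injection.yml). Put a `// ruleid: <rule-id>` comment on the line before each line that must match, and an `// ok: <rule-id>` comment before each line that must not: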

// ruleid: sql-injection-string-concat
db.query('SELECT * FROM users WHERE id = ' + userId);

// ruleid: sql-injection-string-concat
db.query(`SELECT * FROM users WHERE id = ${userId}`);

// ok: sql-injection-string-concat
db.query('SELECT * FROM users WHERE id = $1', [userId]);

// ok: sql-injection-string-concat
db.query('SELECT * FROM users WHERE id = ?', [userId]);
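A small script can count annotations per rule ID. This lets CI enforce the minimum of two true-positive and two true-negative cases before a rule is published. The helper below is hypothetical and not part of Semgrep. It ignores `todoruleid:` and `todook:` lines, and it assumes each annotation sits on its own comment line.

```javascript
// Hypothetical helper: counts `// ruleid:` and `// ok:` annotations that
// name ruleId (comma-separated ID lists are supported).
function countAnnotations(source, ruleId) {
  const counts = { ruleid: 0, ok: 0 };
  for (const line of source.split("\n")) {
    const m = line.match(/^\s*\/\/\s*(ruleid|ok):\s*(.+?)\s*$/);
    if (!m) continue;
    const ids = m[2].split(",").map((s) => s.trim());
    if (ids.includes(ruleId)) counts[m[1]] += 1;
  }
  return counts;
}

// True when the file has at least `minimum` positive and negative cases.
function meetsMinimumCoverage(source, ruleId, minimum = 2) {
  const { ruleid, ok } = countAnnotations(source, ruleId);
  return ruleid >= minimum && ok >= minimum;
}
```

In CI, the source text would come from a call such as `fs.readFileSync("tests/sql-injection.js", "utf8")`.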

Running Tests

# Test a single rule
semgrep --test --config=rules/sql-injection.yml tests/

# Test all rules
semgrep --test --config=rules/ tests/

# Validate rule syntax
semgrep --validate --config=rules/

Step 5: Rule Optimization

Performance Best Practices

Be specific with patterns: Avoid overly broad matches like $X($Y)
Use pattern-inside to scope: Narrow the search context
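Use language-specific syntax: Write patterns in the target language's real syntax so Semgrep matches code structure, not raw text
Avoid deep ellipsis nesting: Many consecutive or nested ... ellipses are expensive to match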
Use focus-metavariable: Narrow the reported location
Test with large codebases: Verify performance at scale

Reducing False Positives

Add pattern-not for safe patterns: Exclude known-safe alternatives
Use metavariable-regex: Constrain metavariable values
Use pattern-not-inside: Exclude safe contexts
Set appropriate confidence: Be honest about detection certainty
Add technology metadata: Help users filter relevant rules
Provide fix suggestions: When a safe mechanical rewrite exists, include a fix: field
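Setting confidence honestly requires a measurement. One approach is to triage the rule's findings on a real codebase and map the observed precision to a confidence level. The thresholds in this sketch are illustrative assumptions, not Semgrep conventions.

```javascript
// Illustrative thresholds (assumptions): tune to your team's tolerance for noise.
const THRESHOLDS = [
  { min: 0.9, level: "HIGH" },
  { min: 0.6, level: "MEDIUM" },
];

// truePositives / falsePositives: counts from manually triaged findings.
function suggestConfidence(truePositives, falsePositives) {
  const total = truePositives + falsePositives;
  if (total === 0) return null; // nothing triaged yet, so no basis for a level
  const precision = truePositives / total;
  for (const { min, level } of THRESHOLDS) {
    if (precision >= min) return level;
  }
  return "LOW";
}

console.log(suggestConfidence(19, 1)); // precision 0.95
```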

Rule Validation Checklist

Semgrep Pattern Syntax Reference

Syntax	Meaning	Example
`$X`	Single metavariable	`eval($X)`
`$...X`	Ellipsis metavariable (zero or more arguments)	`func($...ARGS)`
`...`	Ellipsis (any sequence of arguments or statements)	`if (...) { ... }`
`<... $X ...>`	Deep expression match	`<... eval($X) ...>`
`pattern-either`	OR operator	Match any of N patterns
`pattern-not`	NOT operator	Exclude specific patterns
`pattern-inside`	Context requirement	Must be inside this pattern
`pattern-not-inside`	Context exclusion	Must NOT be inside this
`metavariable-regex`	Regex constraint	Constrain $X to match regex
`metavariable-comparison`	Numeric constraint	`$X > 100`
`focus-metavariable`	Narrow match location	Report only $X location

Related Skills

static-analysis - CodeQL and Semgrep with SARIF output
variant-analysis - Pattern-based vulnerability discovery
differential-review - Security-focused diff analysis
insecure-defaults - Hardcoded credentials detection
security-architect - STRIDE threat modeling

Agent Integration

security-architect (primary): Custom rule development for security audits
code-reviewer (primary): Automated code review rule authoring
penetration-tester (secondary): Vulnerability detection rule creation
qa (secondary): Quality enforcement rule authoring

Iron Laws

NEVER publish a rule without at least 2 true positive and 2 true negative test cases
ALWAYS validate rule syntax with semgrep --validate before committing
NEVER set confidence to HIGH without testing the rule against a real codebase
ALWAYS include WHAT was found, WHY it matters, and HOW to fix it in every rule message
NEVER use pattern-regex as the primary matcher — use structural patterns and constrain with metavariable-regex

Anti-Patterns

Anti-Pattern	Why It Fails	Correct Approach
Publishing untested rules	False positives erode developer trust and rules get ignored	Write test cases with `// ruleid:` and `// ok:` annotations and run `semgrep --test`
Setting HIGH confidence without validation	Overconfident rules mislead reviewers into trusting bad signal	Calibrate confidence based on measured false positive rate on real codebases
Vague rule messages	Developers cannot remediate without specific guidance	Include WHAT was found, WHY it matters, and HOW to fix it in every message
Overly broad patterns with no exclusions	High false positive rate causes rule fatigue	Add `pattern-not` clauses for all known-safe alternatives
Using `pattern-regex` as primary matcher	Regex is slower and less precise than structural pattern matching	Use structural patterns as primary; constrain with `metavariable-regex` only

Memory Protocol (MANDATORY)

Before starting: Read .claude/context/memory/learnings.md

After completing:

New pattern -> .claude/context/memory/learnings.md
Issue found -> .claude/context/memory/issues.md
Decision made -> .claude/context/memory/decisions.md

ASSUME INTERRUPTION: If it's not in memory, it didn't happen.

Cross-Reference: Creator Ecosystem

This skill is part of the Creator Ecosystem. When research uncovers gaps, trigger the appropriate companion creator:

Gap Discovered	Required Artifact	Creator to Invoke	When
Domain knowledge needs a reusable skill	skill	`Skill({ skill: 'skill-creator' })`	Gap is a full skill domain
Existing skill has incomplete coverage	skill update	`Skill({ skill: 'skill-updater' })`	Close skill exists but incomplete
Capability needs a dedicated agent	agent	`Skill({ skill: 'agent-creator' })`	Agent to own the capability
Existing agent needs capability update	agent update	`Skill({ skill: 'agent-updater' })`	Close agent exists but incomplete
Domain needs code/project scaffolding	template	`Skill({ skill: 'template-creator' })`	Reusable code patterns needed
Behavior needs pre/post execution guards	hook	`Skill({ skill: 'hook-creator' })`	Enforcement behavior required
Process needs multi-phase orchestration	workflow	`Skill({ skill: 'workflow-creator' })`	Multi-step coordination needed
Artifact needs structured I/O validation	schema	`Skill({ skill: 'schema-creator' })`	JSON schema for artifact I/O
User interaction needs a slash command	command	`Skill({ skill: 'command-creator' })`	User-facing shortcut needed
Repeated logic needs a reusable CLI tool	tool	`Skill({ skill: 'tool-creator' })`	CLI utility needed
Narrow/single-artifact capability only	inline	Document within this artifact only	Too specific to generalize

Ecosystem Alignment Contract (MANDATORY)

This creator skill is part of a coordinated creator ecosystem. Any artifact created here must align with and validate against related creators:

agent-creator for ownership and execution paths
skill-creator for capability packaging and assignment
tool-creator for executable automation surfaces
hook-creator for enforcement and guardrails
rule-creator and semgrep-rule-creator for policy and static checks
template-creator for standardized scaffolds
workflow-creator for orchestration and phase gating
command-creator for user/operator command UX

Cross-Creator Handshake (Required)

Before completion, verify all relevant handshakes:

Artifact route exists in .claude/CLAUDE.md and related routing docs.
Discovery/registry entries are updated (catalog/index/registry as applicable).
Companion artifacts are created or explicitly waived with reason.
validate-integration.cjs passes for the created artifact.
Skill index is regenerated when skill metadata changes.

Research Gate (Exa + arXiv — BOTH MANDATORY)

For new patterns, templates, or workflows, research is mandatory:

Use Exa for implementation and ecosystem patterns:
- mcp__Exa__web_search_exa({ query: '<topic> 2025 best practices' })
- mcp__Exa__get_code_context_exa({ query: '<topic> implementation examples' })
Search arXiv for academic research (mandatory for AI/ML, agents, evaluation, orchestration, memory/RAG, security):
- Via Exa: mcp__Exa__web_search_exa({ query: 'site:arxiv.org <topic> 2024 2025' })
- Direct API: WebFetch({ url: 'https://arxiv.org/search/?query=<topic>&searchtype=all&start=0' })
Record decisions, constraints, and non-goals in artifact references/docs.
Keep updates minimal and avoid overengineering.

arXiv is mandatory (not fallback) when topic involves: AI agents, LLM evaluation, orchestration, memory/RAG, security, static analysis, or any emerging methodology.

Regression-Safe Delivery

Follow strict RED -> GREEN -> REFACTOR for behavior changes.
Run targeted tests for changed modules.
Run lint/format on changed files.
Keep commits scoped by concern (logic/docs/generated artifacts).
