skills/oimiragieo/agent-studio/semgrep-rule-creator

semgrep-rule-creator

SKILL.md

Semgrep Rule Creator

Security Notice

AUTHORIZED USE ONLY: These skills are for DEFENSIVE security analysis and authorized research:

  • Custom security rule development for owned codebases
  • Coding standard enforcement via automated checks
  • CI/CD security gate rule authoring
  • Vulnerability pattern codification for prevention
  • Educational purposes in controlled environments

NEVER use for:

  • Creating rules to bypass security controls
  • Scanning systems without authorization
  • Any illegal activities

Step 1: Define the Detection Goal

Before writing a rule, clearly define:

  1. What to detect: The vulnerable or undesired code pattern
  2. Why it matters: The security impact or quality concern
  3. What languages: Which programming languages to target
  4. True positive example: Code that SHOULD match
  5. True negative example: Code that should NOT match (safe alternative)
  6. False positive risks: What similar-looking code is actually safe

Detection Goal Template

## Rule: [rule-id]

- **Detect**: [description of what to find]
- **Why**: [security impact / quality concern]
- **Languages**: [javascript, typescript, python, etc.]
- **CWE**: [CWE-XXX]
- **OWASP**: [A0X category]
- **True Positive**: [code example that should match]
- **True Negative**: [safe code that should NOT match]

Step 2: Write the Semgrep Rule

Basic Rule Structure

rules:
  - id: rule-id-here
    message: >
      Clear description of what was found and why it matters.
      Include remediation guidance in the message.
    severity: ERROR # ERROR, WARNING, INFO
    languages: [javascript, typescript]
    metadata:
      cwe:
        - CWE-089
      owasp:
        - A03:2021
      confidence: HIGH # HIGH, MEDIUM, LOW
      impact: HIGH # HIGH, MEDIUM, LOW
      category: security
      subcategory:
        - vuln
      technology:
        - express
        - node.js
      references:
        - https://owasp.org/Top10/A03_2021-Injection/
      source-rule-url: https://semgrep.dev/r/rule-id
    # Pattern goes here (see below)

Pattern Types

Simple Pattern Match

pattern: |
  eval($X)

Pattern with Alternatives (OR)

pattern-either:
  - pattern: eval($X)
  - pattern: new Function($X)
  - pattern: setTimeout($X, ...)
  - pattern: setInterval($X, ...)

Pattern with Exclusions (AND NOT)

patterns:
  - pattern: $DB.query($QUERY)
  - pattern-not: $DB.query($QUERY, $PARAMS)
  - pattern-not: $DB.query($QUERY, [...])

Pattern Inside Context

patterns:
  - pattern: $RES.send($DATA)
  - pattern-inside: |
      app.$METHOD($PATH, function($REQ, $RES) {
        ...
      })
  - pattern-not-inside: |
      app.$METHOD($PATH, authenticate, function($REQ, $RES) {
        ...
      })

Metavariable Constraints

patterns:
  - pattern: crypto.createHash($ALGO)
  - metavariable-regex:
      metavariable: $ALGO
      regex: (md5|sha1|MD5|SHA1)
  - focus-metavariable: $ALGO
patterns:
  - pattern: setTimeout($FUNC, $TIME)
  - metavariable-comparison:
      metavariable: $TIME
      comparison: $TIME > 60000

Taint Mode Rules (Advanced)

For tracking data flow from sources to sinks:

mode: taint
pattern-sources:
  - patterns:
      - pattern: $REQ.query.$PARAM
  - patterns:
      - pattern: $REQ.body.$PARAM
  - patterns:
      - pattern: $REQ.params.$PARAM
pattern-sinks:
  - patterns:
      - pattern: $DB.query($SINK, ...)
      - focus-metavariable: $SINK
pattern-sanitizers:
  - patterns:
      - pattern: escape($X)
  - patterns:
      - pattern: sanitize($X)
  - patterns:
      - pattern: $DB.query($QUERY, [...])

Step 3: Common Rule Templates

SQL Injection Detection

rules:
  - id: sql-injection-string-concat
    message: >
      Possible SQL injection via string concatenation. User input appears
      to be concatenated into a SQL query string. Use parameterized
      queries instead.
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-089]
      owasp: [A03:2021]
      confidence: HIGH
      impact: HIGH
      category: security
    patterns:
      - pattern-either:
          - pattern: $DB.query("..." + $VAR + "...")
          - pattern: $DB.query(`...${$VAR}...`)
      - pattern-not: $DB.query("..." + $VAR + "...", [...])
    fix: |
      $DB.query("... $1 ...", [$VAR])

XSS Detection

rules:
  - id: xss-innerhtml-assignment
    message: >
      Direct assignment to innerHTML with potentially untrusted data.
      Use textContent for text or a sanitization library for HTML.
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-079]
      owasp: [A03:2021]
      confidence: MEDIUM
      impact: HIGH
      category: security
    pattern-either:
      - pattern: $EL.innerHTML = $DATA
      - pattern: document.getElementById($ID).innerHTML = $DATA

Hardcoded Secrets

rules:
  - id: hardcoded-api-key
    message: >
      Hardcoded API key detected. Store secrets in environment
      variables or a secrets manager.
    severity: ERROR
    languages: [javascript, typescript, python]
    metadata:
      cwe: [CWE-798]
      owasp: [A02:2021]
      confidence: MEDIUM
      impact: HIGH
      category: security
    pattern-either:
      - pattern: |
          $KEY = "AKIA..."
      - pattern: |
          $KEY = "sk-..."
      - pattern: |
          $KEY = "ghp_..."
    pattern-regex: (AKIA[0-9A-Z]{16}|sk-[a-zA-Z0-9]{48}|ghp_[a-zA-Z0-9]{36})

Missing Authentication

rules:
  - id: express-route-missing-auth
    message: >
      Express route handler without authentication middleware.
      Add authentication middleware before the handler.
    severity: WARNING
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-306]
      owasp: [A07:2021]
      confidence: MEDIUM
      impact: HIGH
      category: security
    patterns:
      - pattern-either:
          - pattern: app.post($PATH, function($REQ, $RES) { ... })
          - pattern: app.put($PATH, function($REQ, $RES) { ... })
          - pattern: app.delete($PATH, function($REQ, $RES) { ... })
          - pattern: router.post($PATH, function($REQ, $RES) { ... })
          - pattern: router.put($PATH, function($REQ, $RES) { ... })
          - pattern: router.delete($PATH, function($REQ, $RES) { ... })
      - pattern-not-inside: |
          app.$METHOD($PATH, $AUTH, function($REQ, $RES) { ... })
      - pattern-not-inside: |
          router.$METHOD($PATH, $AUTH, function($REQ, $RES) { ... })

Insecure Randomness

rules:
  - id: insecure-random-for-security
    message: >
      Math.random() is not cryptographically secure. Use
      crypto.getRandomValues() or crypto.randomBytes() for
      security-sensitive random values.
    severity: WARNING
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-330]
      confidence: MEDIUM
      impact: MEDIUM
      category: security
    patterns:
      - pattern: Math.random()
      - pattern-inside: |
          function $FUNC(...) {
            ...
          }
      - metavariable-regex:
          metavariable: $FUNC
          regex: (generateToken|createSecret|randomPassword|generateKey|createSession|generateId|createNonce)

Step 4: Write Rule Tests

Test File Format

Create a test file alongside the rule:

// ruleid: sql-injection-string-concat
db.query('SELECT * FROM users WHERE id = ' + userId);

// ruleid: sql-injection-string-concat
db.query(`SELECT * FROM users WHERE id = ${userId}`);

// ok: sql-injection-string-concat
db.query('SELECT * FROM users WHERE id = $1', [userId]);

// ok: sql-injection-string-concat
db.query('SELECT * FROM users WHERE id = ?', [userId]);

Running Tests

# Test a single rule
semgrep --test --config=rules/sql-injection.yml tests/

# Test all rules
semgrep --test --config=rules/ tests/

# Validate rule syntax
semgrep --validate --config=rules/

Step 5: Rule Optimization

Performance Best Practices

  1. Be specific with patterns: Avoid overly broad matches like $X($Y)
  2. Use pattern-inside to scope: Narrow the search context
  3. Use language-specific syntax: Leverage language features
  4. Avoid deep ellipsis nesting: ... ... ... is slow
  5. Use focus-metavariable: Narrow the reported location
  6. Test with large codebases: Verify performance at scale

Reducing False Positives

  1. Add pattern-not for safe patterns: Exclude known-safe alternatives
  2. Use metavariable-regex: Constrain metavariable values
  3. Use pattern-not-inside: Exclude safe contexts
  4. Set appropriate confidence: Be honest about detection certainty
  5. Add technology metadata: Help users filter relevant rules
  6. Provide fix suggestions: When possible, include fix: field

Rule Validation Checklist

  • Rule has unique, descriptive ID
  • Message explains the issue AND remediation
  • Severity matches actual risk
  • Metadata includes CWE, OWASP, confidence, impact
  • At least 2 true positive test cases
  • At least 2 true negative test cases
  • Rule validated with semgrep --validate
  • Rule tested with semgrep --test
  • Performance acceptable on large codebase
  • Fix suggestion provided (if applicable)

Semgrep Pattern Syntax Reference

Syntax Meaning Example
$X Single metavariable eval($X)
$...X Multiple metavariable args func($...ARGS)
... Ellipsis (any statements) if (...) { ... }
<... $X ...> Deep expression match <... eval($X) ...>
pattern-either OR operator Match any of N patterns
pattern-not NOT operator Exclude specific patterns
pattern-inside Context requirement Must be inside this pattern
pattern-not-inside Context exclusion Must NOT be inside this
metavariable-regex Regex constraint Constrain $X to match regex
metavariable-comparison Numeric constraint $X > 100
focus-metavariable Narrow match location Report only $X location

Related Skills

Agent Integration

  • security-architect (primary): Custom rule development for security audits
  • code-reviewer (primary): Automated code review rule authoring
  • penetration-tester (secondary): Vulnerability detection rule creation
  • qa (secondary): Quality enforcement rule authoring

Iron Laws

  1. NEVER publish a rule without at least 2 true positive and 2 true negative test cases
  2. ALWAYS validate rule syntax with semgrep --validate before committing
  3. NEVER set confidence to HIGH without testing the rule against a real codebase
  4. ALWAYS include WHAT was found, WHY it matters, and HOW to fix it in every rule message
  5. NEVER use pattern-regex as the primary matcher — use structural patterns and constrain with metavariable-regex

Anti-Patterns

Anti-Pattern Why It Fails Correct Approach
Publishing untested rules False positives erode developer trust and rules get ignored Write test cases with // ruleid: and // ok: annotations and run semgrep --test
Setting HIGH confidence without validation Overconfident rules mislead reviewers into trusting bad signal Calibrate confidence based on measured false positive rate on real codebases
Vague rule messages Developers cannot remediate without specific guidance Include WHAT was found, WHY it matters, and HOW to fix it in every message
Overly broad patterns with no exclusions High false positive rate causes rule fatigue Add pattern-not clauses for all known-safe alternatives
Using pattern-regex as primary matcher Regex is slower and less precise than structural pattern matching Use structural patterns as primary; constrain with metavariable-regex only

Memory Protocol (MANDATORY)

Before starting: Read .claude/context/memory/learnings.md

After completing:

  • New pattern -> .claude/context/memory/learnings.md
  • Issue found -> .claude/context/memory/issues.md
  • Decision made -> .claude/context/memory/decisions.md

ASSUME INTERRUPTION: If it's not in memory, it didn't happen.

Cross-Reference: Creator Ecosystem

This skill is part of the Creator Ecosystem. When research uncovers gaps, trigger the appropriate companion creator:

Gap Discovered Required Artifact Creator to Invoke When
Domain knowledge needs a reusable skill skill Skill({ skill: 'skill-creator' }) Gap is a full skill domain
Existing skill has incomplete coverage skill update Skill({ skill: 'skill-updater' }) Close skill exists but incomplete
Capability needs a dedicated agent agent Skill({ skill: 'agent-creator' }) Agent to own the capability
Existing agent needs capability update agent update Skill({ skill: 'agent-updater' }) Close agent exists but incomplete
Domain needs code/project scaffolding template Skill({ skill: 'template-creator' }) Reusable code patterns needed
Behavior needs pre/post execution guards hook Skill({ skill: 'hook-creator' }) Enforcement behavior required
Process needs multi-phase orchestration workflow Skill({ skill: 'workflow-creator' }) Multi-step coordination needed
Artifact needs structured I/O validation schema Skill({ skill: 'schema-creator' }) JSON schema for artifact I/O
User interaction needs a slash command command Skill({ skill: 'command-creator' }) User-facing shortcut needed
Repeated logic needs a reusable CLI tool tool Skill({ skill: 'tool-creator' }) CLI utility needed
Narrow/single-artifact capability only inline Document within this artifact only Too specific to generalize

Ecosystem Alignment Contract (MANDATORY)

This creator skill is part of a coordinated creator ecosystem. Any artifact created here must align with and validate against related creators:

  • agent-creator for ownership and execution paths
  • skill-creator for capability packaging and assignment
  • tool-creator for executable automation surfaces
  • hook-creator for enforcement and guardrails
  • rule-creator and semgrep-rule-creator for policy and static checks
  • template-creator for standardized scaffolds
  • workflow-creator for orchestration and phase gating
  • command-creator for user/operator command UX

Cross-Creator Handshake (Required)

Before completion, verify all relevant handshakes:

  1. Artifact route exists in .claude/CLAUDE.md and related routing docs.
  2. Discovery/registry entries are updated (catalog/index/registry as applicable).
  3. Companion artifacts are created or explicitly waived with reason.
  4. validate-integration.cjs passes for the created artifact.
  5. Skill index is regenerated when skill metadata changes.

Research Gate (Exa + arXiv — BOTH MANDATORY)

For new patterns, templates, or workflows, research is mandatory:

  1. Use Exa for implementation and ecosystem patterns:
    • mcp__Exa__web_search_exa({ query: '<topic> 2025 best practices' })
    • mcp__Exa__get_code_context_exa({ query: '<topic> implementation examples' })
  2. Search arXiv for academic research (mandatory for AI/ML, agents, evaluation, orchestration, memory/RAG, security):
    • Via Exa: mcp__Exa__web_search_exa({ query: 'site:arxiv.org <topic> 2024 2025' })
    • Direct API: WebFetch({ url: 'https://arxiv.org/search/?query=<topic>&searchtype=all&start=0' })
  3. Record decisions, constraints, and non-goals in artifact references/docs.
  4. Keep updates minimal and avoid overengineering.

arXiv is mandatory (not fallback) when topic involves: AI agents, LLM evaluation, orchestration, memory/RAG, security, static analysis, or any emerging methodology.

Regression-Safe Delivery

  • Follow strict RED -> GREEN -> REFACTOR for behavior changes.
  • Run targeted tests for changed modules.
  • Run lint/format on changed files.
  • Keep commits scoped by concern (logic/docs/generated artifacts).
Weekly Installs
38
GitHub Stars
16
First Seen
Feb 19, 2026
Installed on
github-copilot38
gemini-cli38
cursor38
codex37
kimi-cli37
opencode37