verify-evidence-loop

Pass

Audited by Gen Agent Trust Hub on Apr 19, 2026

Risk Level: SAFE | PROMPT_INJECTION | DATA_EXFILTRATION
Full Analysis
  • [PROMPT_INJECTION]: The skill processes untrusted user claims and external web content, presenting a surface for indirect prompt injection attacks.
    • Ingestion points: Technical claims via $ARGUMENTS and web content retrieved via WebSearch and WebFetch (SKILL.md).
    • Boundary markers: Employs specific delimiters like ---CLAIM-START--- and tags to isolate data from instructions.
    • Capability inventory: Uses WebSearch, WebFetch, Write, AskUserQuestion, and Agent calls to process data and generate reports.
    • Sanitization: Implements comprehensive hardening including stripping tag characters (<, >), truncating input to 500 characters, and replacing markdown headers in evidence with [H] to prevent prompt structure pollution. Subagents and reviewers are explicitly instructed to ignore commands found within input data.
  • [DATA_EXFILTRATION]: The skill performs network operations to external search engines which may transmit technical keywords from the claim to third-party providers.
    • This behavior is documented with a mandatory user warning in the Step 0: HITL section.
    • The skill implements Seed Hygiene rules to extract only general technical keywords for dissent searches rather than using verbatim claim text, minimizing sensitive data exposure.
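The sanitization steps and the Seed Hygiene rule described above can be illustrated with a minimal Python sketch. This is not the skill's actual implementation: the function names, the order of operations (strip, then truncate), the closing marker ---CLAIM-END---, and the keyword vocabulary are all assumptions made for illustration. Only the 500-character limit, the removal of < and >, the [H] replacement, and the ---CLAIM-START--- marker come from the audit text.

```python
import re

MAX_CLAIM_CHARS = 500  # truncation limit named in the audit

# Hypothetical vocabulary of "general technical keywords" (assumption).
GENERAL_TECH_TERMS = {"postgres", "tls", "latency", "kubernetes", "python"}


def sanitize_claim(raw: str) -> str:
    """Strip tag characters, then truncate (this order is an assumption)."""
    no_tags = raw.replace("<", "").replace(">", "")
    return no_tags[:MAX_CLAIM_CHARS]


def neutralize_headers(evidence: str) -> str:
    """Replace leading markdown header markers (# to ######) with [H]."""
    return re.sub(r"^[ \t]{0,3}#{1,6}(?=\s|$)", "[H]", evidence, flags=re.MULTILINE)


def wrap_claim(claim: str) -> str:
    """Isolate the sanitized claim between boundary markers.

    ---CLAIM-END--- is an assumed closing marker; the audit names
    only ---CLAIM-START---.
    """
    return f"---CLAIM-START---\n{sanitize_claim(claim)}\n---CLAIM-END---"


def seed_keywords(claim: str, vocabulary=GENERAL_TECH_TERMS) -> list[str]:
    """Keep only known general terms, in order, without duplicates."""
    tokens = re.findall(r"[a-z0-9]+", claim.lower())
    kept: list[str] = []
    for token in tokens:
        if token in vocabulary and token not in kept:
            kept.append(token)
    return kept
```

Under these assumptions, a claim such as "Acme Corp's Postgres cluster has 40ms latency under TLS" would be searched as the keywords postgres, latency, tls, so the organization name never reaches the search provider. Filtering against an allowlist rather than removing known-sensitive terms is a deliberate choice in this sketch: anything not recognized as general is dropped by default.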
Audit Metadata
Risk Level: SAFE
Analyzed: Apr 19, 2026, 04:28 AM