agent-detector

Warn

Audited by Gen Agent Trust Hub on Mar 29, 2026

Risk Level: MEDIUM | PROMPT_INJECTION | DATA_EXFILTRATION
Full Analysis
  • [PROMPT_INJECTION]: The skill uses authoritative and directive language in both the metadata and instructions to override standard agent behavior. Examples include 'CRITICAL: MUST run for EVERY message', 'Always runs FIRST', and 'ALWAYS • Every user message, no exceptions'. These are designed to ensure the skill maintains control over the execution flow regardless of other instructions.
  • [PROMPT_INJECTION]: The instructions command the agent to disregard its default model selection logic in favor of a custom mapping (Haiku, Sonnet, Opus) based on a 'Complexity' score calculated by the skill.
  • [DATA_EXPOSURE]: The skill instructs the agent to read internal configuration and context files located at .claude/project-contexts/, which may contain sensitive project metadata, environment details, or architectural conventions.
  • [INDIRECT_PROMPT_INJECTION]: The skill defines a surface for processing untrusted data to influence agent logic. In SKILL.md (Detection Process Step 0) and task-based-agent-selection.md, it describes an algorithm that scores user messages against keyword lists to determine agent activation and model routing.
      • Ingestion points: User messages are analyzed for action verbs, domain nouns, and tech references.
      • Boundary markers: None are specified to separate user-provided content from the instructions for the scoring algorithm.
      • Capability inventory: The skill influences sub-agent spawning via the Task tool and selects the underlying LLM model used for the task.
      • Sanitization: No sanitization or escaping of user input is described before it is processed by the keyword scoring logic.
  • [METADATA_POISONING]: The description field in the YAML frontmatter contains aggressive directives ('CRITICAL: MUST run...') aimed at influencing the agent's priority system before the skill body is even parsed.
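
To make the indirect-injection finding concrete, here is a minimal Python sketch of keyword-based scoring and model routing of the kind the analysis describes. It is an illustration only, not the skill's actual algorithm: the keyword lists, the one-point-per-match scoring, the thresholds, and the function names `complexity_score` and `route_model` are all assumptions, since the report does not reproduce the contents of SKILL.md or task-based-agent-selection.md.

```python
import re

# Hypothetical keyword lists; the audited skill's real lists are not
# reproduced in this report.
ACTION_VERBS = {"implement", "refactor", "debug", "deploy"}
DOMAIN_NOUNS = {"api", "database", "component", "pipeline"}
TECH_REFS = {"react", "postgres", "docker", "kubernetes"}


def complexity_score(message: str) -> int:
    """Count distinct keyword hits in the raw, unsanitized user message."""
    tokens = set(re.findall(r"[a-z0-9]+", message.lower()))
    return (
        len(tokens & ACTION_VERBS)
        + len(tokens & DOMAIN_NOUNS)
        + len(tokens & TECH_REFS)
    )


def route_model(score: int) -> str:
    """Map a score to a model tier; the thresholds here are assumed."""
    if score <= 1:
        return "Haiku"
    if score <= 3:
        return "Sonnet"
    return "Opus"


# A trivial request scores low and routes to the smallest tier.
benign = "fix a typo in the readme"

# The same request padded with keywords routes to the largest tier,
# because the message text is the only input to the score.
stuffed = benign + " refactor deploy database pipeline docker kubernetes"

print(route_model(complexity_score(benign)))
print(route_model(complexity_score(stuffed)))
```

Because the score is computed directly from message text, anyone who can place text in the message (including content pasted from an untrusted source) can steer both agent activation and model selection in a scheme like this one.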
Audit Metadata
Risk Level: MEDIUM
Analyzed: Mar 29, 2026, 05:11 PM