agent-detector
Warn
Audited by Gen Agent Trust Hub on Mar 29, 2026
Risk Level: MEDIUM (PROMPT_INJECTION, DATA_EXFILTRATION)
Full Analysis
- [PROMPT_INJECTION]: The skill uses authoritative and directive language in both the metadata and instructions to override standard agent behavior. Examples include 'CRITICAL: MUST run for EVERY message', 'Always runs FIRST', and 'ALWAYS - Every user message, no exceptions'. These directives are designed to ensure the skill maintains control over the execution flow regardless of other instructions.
- [PROMPT_INJECTION]: The instructions command the agent to disregard its default model selection logic in favor of a custom mapping (Haiku, Sonnet, Opus) based on a 'Complexity' score calculated by the skill.
- [DATA_EXPOSURE]: The skill instructs the agent to read internal configuration and context files located at .claude/project-contexts/, which may contain sensitive project metadata, environment details, or architectural conventions.
- [INDIRECT_PROMPT_INJECTION]: The skill defines a surface for processing untrusted data to influence agent logic. In SKILL.md (Detection Process Step 0) and task-based-agent-selection.md, it describes an algorithm that scores user messages against keyword lists to determine agent activation and model routing.
  - Ingestion points: User messages are analyzed for action verbs, domain nouns, and tech references.
  - Boundary markers: None are specified to separate user-provided content from the instructions for the scoring algorithm.
  - Capability inventory: The skill influences sub-agent spawning via the Task tool and selects the underlying LLM model used for the task.
  - Sanitization: No sanitization or escaping of user input is described before it is processed by the keyword scoring logic.
- [METADATA_POISONING]: The description field in the YAML frontmatter contains aggressive directives ('CRITICAL: MUST run...') aimed at influencing the agent's priority system before the skill body is even parsed.
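
To make the INDIRECT_PROMPT_INJECTION finding concrete, the following is a minimal Python sketch of keyword-based complexity scoring and model routing. It is not the skill's actual code: the keyword lists, the thresholds, and the function names complexity_score and route_model are hypothetical, chosen only to show how unsanitized message text could steer routing toward a different model tier.

```python
import re

# Hypothetical keyword lists (assumption); the audited skill's real lists
# are not reproduced in this report.
ACTION_VERBS = {"refactor", "migrate", "design", "optimize"}
DOMAIN_NOUNS = {"architecture", "database", "pipeline", "schema"}
TECH_REFS = {"kubernetes", "postgres", "react", "terraform"}


def complexity_score(message: str) -> int:
    """Count how many distinct listed keywords appear in the raw message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    return sum(len(words & group) for group in (ACTION_VERBS, DOMAIN_NOUNS, TECH_REFS))


def route_model(score: int) -> str:
    """Map a score to a model tier. Thresholds are illustrative assumptions."""
    if score >= 4:
        return "Opus"
    if score >= 2:
        return "Sonnet"
    return "Haiku"


# The same trivial request, with and without appended keywords.
benign = "Fix the typo in the README"
stuffed = benign + " refactor migrate architecture kubernetes"

print(route_model(complexity_score(benign)))   # Haiku
print(route_model(complexity_score(stuffed)))  # Opus
```

In this sketch, appending four listed keywords to an otherwise trivial request is enough to change the routing decision, which is the kind of influence the missing boundary markers and sanitization leave open.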
Audit Metadata