model-hierarchy
Pass
Audited by Gen Agent Trust Hub on Mar 13, 2026
Risk Level: SAFE
PROMPT_INJECTION
Full Analysis
- [PROMPT_INJECTION]: The task classification system is vulnerable to indirect prompt injection. By including keywords associated with complex tasks (such as 'debug', 'architect', or 'security') in their input, an attacker can intentionally trigger the use of Tier 3 (Premium) models, resulting in a 'denial of wallet' attack that inflates operational costs.
- Ingestion points: User-provided task descriptions are ingested through the classification prompts in `SKILL.md` and the `classify_task` logic in `tests/test_classification.py`.
- Boundary markers: No delimiters or isolation instructions are present to prevent the classification logic from acting on instructions embedded within the untrusted task data.
- Capability inventory: The skill manages model selection and routing for the session, as well as model configuration for spawned sub-agents.
- Sanitization: There is no filtering or validation of input strings before they are matched against classification keywords.
Audit Metadata