Architect
Warn
Audited by Gen Agent Trust Hub on Mar 12, 2026
Risk Level: MEDIUM (COMMAND_EXECUTION, PROMPT_INJECTION)
Full Analysis
- [COMMAND_EXECUTION]: The skill contains a self-evolution subsystem that allows the agent to autonomously modify its own instruction files (SKILL.md and reference files).
  - Evidence: File 'references/self-evolution.md' defines a 'MUTATE' phase where the agent is instructed to 'Update references/ first → then SKILL.md (bottom-up), take pre-mutation snapshot.'
  - Evidence: The 'Safety Level B' classification in 'references/self-evolution.md' permits autonomous updates to existing content with self-verification, allowing the agent to persist changes to its own operational logic across sessions.
- [PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection because its primary function is to generate new agents based on potentially untrusted requirements.
  - Ingestion points: The agent receives design requirements and gap signals from the User, Nexus, and Atlas through '_AGENT_CONTEXT' and handoffs.
  - Boundary markers: The agent parses input but lacks explicit sanitization or delimiters to prevent embedded instructions in requirements from influencing the generated skill's core logic.
  - Capability inventory: The agent possesses file-writing capabilities (GENERATE and MUTATE phases) used to create or modify skills on the file system.
  - Sanitization: Although a 'validation-checklist.md' is used, it focuses on structure and quality metrics rather than filtering malicious instruction patterns in generated outputs.
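
The boundary-marker and sanitization gaps noted above could be narrowed along the following lines. This is a minimal sketch, not part of the audited skill: the function names `wrap_untrusted` and `flag_suspicious`, the marker format, and the pattern list are all hypothetical, and a regex screen like this would only catch obvious cases.

```python
import re
import secrets

# Hypothetical patterns for illustration only; a real screen would need a
# broader, maintained list and would still miss paraphrased instructions.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"\b(update|modify|rewrite|overwrite)\b.*\bSKILL\.md\b", re.I),
    re.compile(r"\byou are now\b", re.I),
]


def flag_suspicious(text: str) -> list[str]:
    """Return the lines of `text` that match any suspicious pattern."""
    return [
        line.strip()
        for line in text.splitlines()
        if any(p.search(line) for p in SUSPICIOUS_PATTERNS)
    ]


def wrap_untrusted(text: str) -> str:
    """Wrap untrusted text in boundary markers carrying a random tag.

    The random tag makes it hard for embedded text to forge the closing
    marker and "escape" the delimited region.
    """
    tag = secrets.token_hex(8)
    return f"<<UNTRUSTED {tag}>>\n{text}\n<<END_UNTRUSTED {tag}>>"
```

In use, requirements arriving through '_AGENT_CONTEXT' or a handoff would be passed through `flag_suspicious` before the GENERATE or MUTATE phase, with any hits held for human review, and through `wrap_untrusted` before being placed in a prompt, so the generating agent can treat the delimited region as data rather than instructions.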
Audit Metadata