metaclaw-evolving-agent

Pass

Audited by Gen Agent Trust Hub on Mar 15, 2026

Risk Level: SAFE
Findings: PROMPT_INJECTION, DATA_EXFILTRATION, COMMAND_EXECUTION
Full Analysis
  • [INDIRECT_PROMPT_INJECTION]: The 'Skill evolution' feature summarizes conversation logs into persistent instructions that are automatically re-injected into the agent's system prompt in future sessions. This creates an architectural surface where malicious user input in a conversation could be persisted as a 'skill' and used to hijack the agent's behavior.
      • Ingestion points: Conversation logs captured via ConversationInterceptor and stored in ExperienceBuffer (SKILL.md).
      • Boundary markers: None explicitly mentioned in the skill definition for the skill injection phase.
      • Capability inventory: Operates as a network proxy; performs RL weight updates via external backends; modifies system prompts; writes to local storage (~/.metaclaw/skills).
      • Sanitization: No sanitization or validation of summarized content before re-injection is described.
  • [DATA_EXPOSURE_AND_EXFILTRATION]: The agent operates as a proxy that intercepts and records all conversation messages and responses. This data is transmitted to external RL training backends (Tinker and MinT). While this is the intended functionality of the agent, users should ensure these backends are trusted as they receive full unencrypted chat history.
  • [COMMAND_EXECUTION]: The skill requires the installation and execution of a custom CLI tool (metaclaw) and manages sensitive environment variables for multiple LLM providers and OAuth credentials for Google Calendar.
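The INDIRECT_PROMPT_INJECTION finding notes that no boundary markers or sanitization are described for the skill injection phase. As a rough illustration of what such a step could look like, here is a minimal Python sketch. It is not taken from the audited skill: the names `sanitize_skill` and `wrap_skill`, the delimiter strings, the length cap, and the phrase deny-list are all hypothetical, and a small deny-list like this would not be a sufficient defense on its own.

```python
import re
from typing import Optional

# Hypothetical delimiters; the audited skill defines no boundary markers.
SKILL_OPEN = "<<<UNTRUSTED_SKILL_SUMMARY>>>"
SKILL_CLOSE = "<<<END_UNTRUSTED_SKILL_SUMMARY>>>"

# Illustrative, deliberately small deny-list of instruction-like phrases.
SUSPICIOUS = re.compile(
    r"(ignore (all |any )?(previous|prior) instructions|system prompt|you are now)",
    re.IGNORECASE,
)

# Assumed upper bound on the size of a persisted skill summary.
MAX_SKILL_CHARS = 2000


def sanitize_skill(summary: str) -> Optional[str]:
    """Return a cleaned summary, or None if it should be rejected."""
    # Drop non-printable characters, keeping newlines and tabs.
    cleaned = "".join(ch for ch in summary if ch.isprintable() or ch in "\n\t")
    cleaned = cleaned.strip()[:MAX_SKILL_CHARS]
    if not cleaned:
        return None
    # Reject text that tries to forge or close the boundary markers.
    if SKILL_OPEN in cleaned or SKILL_CLOSE in cleaned:
        return None
    # Reject text that matches an instruction-like phrase.
    if SUSPICIOUS.search(cleaned):
        return None
    return cleaned


def wrap_skill(summary: str) -> Optional[str]:
    """Sanitize a summary and wrap it in explicit boundary markers."""
    cleaned = sanitize_skill(summary)
    if cleaned is None:
        return None
    return f"{SKILL_OPEN}\n{cleaned}\n{SKILL_CLOSE}"
```

In this sketch a rejected summary is simply not persisted. The markers only help if the system prompt also tells the model to treat the delimited text as untrusted data rather than as instructions.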
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 15, 2026, 11:56 PM