ralph-wiggum

Fail

Audited by Gen Agent Trust Hub on Feb 15, 2026

Risk Level: HIGHPROMPT_INJECTIONCOMMAND_EXECUTION
Full Analysis
  • [PROMPT_INJECTION] (HIGH): The skill implements an autonomous loop that reads and writes to state files in .ralph/.\n
  • Ingestion points: The agent reads instructions and state from .ralph/guardrails.md, .ralph/state.md, and RALPH_TASK.md within the workspace.\n
  • Boundary markers: No explicit delimiters or XML-tagging are used to separate user-provided or externally-sourced content from agent instructions.\n
  • Capability inventory: The skill is designed for autonomous software development, which requires writing files, running tests (subprocess calls), and executing bash/git commands.\n
  • Sanitization: None. The 'signs' or guardrails are accumulated based on 'observed failures' without validation, meaning a malicious error message or code comment encountered during development could be promoted to a persistent system instruction.\n- [COMMAND_EXECUTION] (MEDIUM): The skill's fundamental purpose is to autonomously execute arbitrary commands for software development.\n
  • Evidence: Documentation specifies usage of bash, jq, and git to 'implement, test, refine' projects via Cursor hooks.\n
  • Risk: While intended for productivity, the lack of human-in-the-loop verification for autonomous iterations creates a significant attack surface if the agent is directed to process untrusted codebases or external requirements.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 15, 2026, 12:04 AM