ml-experiment-loop
Verdict: Fail
Audited by Gen Agent Trust Hub on Apr 2, 2026
Risk Level: HIGH
Risk Tags: PROMPT_INJECTION, COMMAND_EXECUTION, REMOTE_CODE_EXECUTION
Full Analysis
- [PROMPT_INJECTION]: The skill contains multiple instructions designed to bypass human oversight and standard interaction protocols. Specifically, it commands the agent to 'NEVER STOP the loop to ask the human for permission to continue' and states 'You do not pause to ask for permission.' This is a direct attempt to override the human-in-the-loop safety model and standard agent behavioral constraints.
- [COMMAND_EXECUTION]: The skill makes extensive use of shell commands via the Bash tool to manage the environment, perform git operations, and execute training runs. It launches experiments using `uv run train.py` and executes a local JavaScript file using `node` for memory management, located in a hidden directory (`.claude/lib/memory/memory-search.cjs`).
- [REMOTE_CODE_EXECUTION]: The skill establishes a dynamic code generation and execution pipeline. The agent is instructed to autonomously formulate hypotheses, edit the `train.py` script, and then execute that modified code. This loop allows the agent to run arbitrary logic it has written itself, without any human review or safety verification between iterations.
- [PROMPT_INJECTION]: The 'Iron Laws' section reinforces this autonomous, unsupervised behavior by explicitly forbidding the agent from stopping to ask for permission, which is a persistent instruction to disregard human-in-the-loop requirements and safety guidelines.
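The risk in the edit-and-execute loop described above can be illustrated with a minimal stand-in. This is not the skill's actual code: `run_generated_code` is a hypothetical helper, and a temporary script substitutes for the skill's `train.py` / `uv` invocation. The point is that any string the agent authors is written to disk and executed with no review step in between.

```python
import pathlib
import subprocess
import sys
import tempfile

def run_generated_code(source: str) -> str:
    """Write agent-authored source to disk and execute it, returning stdout.

    This mirrors the trust boundary the audit flags: the audited skill lets
    the agent edit train.py and immediately re-run it each iteration. Here a
    temporary script stands in for train.py (an assumption for illustration).
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = pathlib.Path(tmp) / "train.py"
        script.write_text(source)
        # The generated code runs with the full privileges of this process;
        # nothing inspects `source` before execution.
        result = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()

# Whatever the "agent" produces becomes executed code, with no human gate.
print(run_generated_code("print('arbitrary agent-written code ran')"))
```

Because each iteration both rewrites and re-executes the script, a single injected or hallucinated edit propagates straight into execution, which is why the audit classifies the pattern as remote code execution rather than ordinary tool use.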
Recommendations
- AI analysis detected serious security threats in this skill; review the findings above before enabling it.