NYC

finishing-a-development-branch

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION] (HIGH): The skill explicitly instructs the agent to run project-specific test suites such as npm test, cargo test, and pytest. These commands often execute arbitrary code defined in configuration files (like package.json or tox.ini) within the repository being worked on.
  • [PROMPT_INJECTION] (HIGH): The skill is highly susceptible to indirect prompt injection. It processes untrusted data (source code and test outputs) and uses that information to make decisions in Step 3. A malicious codebase could generate crafted test failures or include instructions in code comments that hijack the agent's decision-making process.
  • Ingestion points: Project source code, test definitions, and shell command outputs in SKILL.md (Step 1 and Step 4).
  • Boundary markers: Absent. The agent is not instructed to ignore embedded instructions within the data it processes.
  • Capability inventory: Subprocess execution (test commands), file system modification (git merge/branch), and network operations (git push, gh pr create).
  • Sanitization: Absent. There is no escaping or validation of the content processed from the repository.
  • [DATA_EXFILTRATION] (MEDIUM): The skill performs network operations via git push and gh pr create. While these are standard developer tools, they provide a mechanism to exfiltrate data to remote servers if the agent is directed to a malicious origin.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 16, 2026, 05:13 AM