test-fixing

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGHCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [Indirect Prompt Injection] (HIGH): The skill is designed to ingest and act upon untrusted data from test failure logs and source code, creating a significant attack surface.
  • Ingestion points: The skill instructs the agent to run make test and pytest, then analyze the resulting output to identify root causes. It also reads source code and git diff output (File: SKILL.md).
  • Boundary markers: No boundary markers or instructions are provided to help the agent distinguish between legitimate error messages and malicious instructions embedded in test failures.
  • Capability inventory: The agent has the capability to execute shell commands (make, pytest) and modify files (Edit tool).
  • Sanitization: There is no evidence of sanitization or validation of the ingested test data; the agent is expected to interpret natural language error patterns directly.
  • [Command Execution] (MEDIUM): The skill frequently executes local commands.
  • Evidence: Use of make test and uv run pytest (File: SKILL.md).
  • Risk: If the repository contains a malicious Makefile or compromised test files, the agent will execute arbitrary code under the guise of running tests.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 16, 2026, 06:36 AM