systematic-debugging

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGHPROMPT_INJECTIONCOMMAND_EXECUTION
Full Analysis
  • PROMPT_INJECTION (HIGH): The skill creates a high-severity surface for Indirect Prompt Injection.
  • Ingestion points: External content such as error messages, logs, and stack traces are ingested in Phase 1 (SKILL.md) and during backward tracing (root-cause-tracing.md).
  • Boundary markers: Absent. There are no instructions to use delimiters or specific safety warnings when interpolating untrusted logs into the agent's context.
  • Capability inventory: The skill utilizes command execution via 'npm test' (find-polluter.sh) and instructs the agent to implement code fixes and diagnostic instrumentation (Phase 1.4, Phase 4).
  • Sanitization: Absent. The agent is expected to directly read and trace the flow of external data without filtering.
  • COMMAND_EXECUTION (LOW): The 'find-polluter.sh' script executes code locally using 'npm test'. This is a standard development tool pattern but involves dynamic assembly of commands from the local file system.
  • DYNAMIC_EXECUTION (LOW): Runtime execution of test files via shell script is used for bisection, which is low risk for this specific use case but noted as a dynamic execution pattern.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 16, 2026, 06:49 AM