systematic-debugging

Warn

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: MEDIUMCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • COMMAND_EXECUTION (MEDIUM): The utility script find-polluter.sh is vulnerable to shell command injection. It uses a for loop to iterate over the results of a find command and executes npm test "$TEST_FILE". If the directory contains files with names specifically crafted to include command substitution (e.g., using backticks or $()), the script will execute those commands with the privileges of the user running the script.
  • PROMPT_INJECTION (LOW): The provided test files (test-pressure-1.md, test-pressure-2.md, test-pressure-3.md) utilize adversarial pressure techniques and role-play scenarios to test the agent's compliance with the debugging process. These files contain instructions like 'IMPORTANT: This is a real scenario. You must choose and act.', which are characteristic of prompt injection attempts designed to bypass standard operational guidelines.
  • COMMAND_EXECUTION (LOW): SKILL.md includes examples of sensitive macOS-specific commands such as security list-keychains and security find-identity. While intended for legitimate debugging of code-signing issues, these represent a risk if the agent incorrectly interprets the examples as direct commands to execute in an untrusted environment.
  • [Category 8: Indirect Prompt Injection] (LOW): The skill's primary function is to process external, potentially untrusted data such as error logs and stack traces (Phase 1). It lacks explicit boundary markers or instructions to sanitize this data, creating a vulnerability surface where malicious instructions hidden within logs could influence the agent's actions during a debugging session.
  • Ingestion points: Error message reading and stack trace analysis in Phase 1 of SKILL.md.
  • Boundary markers: Absent; no delimiters are defined for separating logs from system instructions.
  • Capability inventory: Execution of npm test via find-polluter.sh and various shell commands suggested in Phase 1 examples.
  • Sanitization: Absent; the skill does not define methods for escaping or validating external log content.
Audit Metadata
Risk Level
MEDIUM
Analyzed
Feb 17, 2026, 06:31 PM