analyzing-test-effectiveness

Warn

Audited by Gen Agent Trust Hub on Mar 6, 2026

Risk Level: MEDIUM · COMMAND_EXECUTION · PROMPT_INJECTION · EXTERNAL_DOWNLOADS
Full Analysis
  • [COMMAND_EXECUTION]: The bash scripts provided in Phase 3 use xargs -I {} sh -c '...' and unquoted variables within a while read loop to process file names. If a repository contains files with names specifically crafted to include shell metacharacters (e.g., test.js;rm -rf $HOME;), the shell will interpret those characters as commands, leading to arbitrary code execution.
  • [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection because it ingests untrusted production and test code to perform its analysis.
      • Ingestion points: The skill reads file paths and code content using fd and rg in Phase 1 and Phase 3.
      • Boundary markers: None. External code content is not wrapped in delimiters or accompanied by instructions to ignore embedded commands.
      • Capability inventory: The agent can execute shell commands (sh, rg, fd) and create system tasks via bd create.
      • Sanitization: The skill does not perform any escaping or validation on the ingested content before processing it or including it in task descriptions.
  • [EXTERNAL_DOWNLOADS]: The analysis process recommends installing standard mutation testing tools such as Stryker (JavaScript/TypeScript), Pitest (Java), and Mutmut (Python). These are well-known tools for assessing how effectively a test suite detects faults.
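
The COMMAND_EXECUTION finding comes down to file names being spliced into a shell command string. The following is a minimal sketch of one way to avoid that, assuming the per-file work can be driven from Python instead of xargs and sh -c: each file name is passed as its own argv element, so no shell ever parses it. The function name run_per_file and the echo stand-in command are hypothetical and are not part of the audited skill.

```python
import subprocess
import sys


def run_per_file(argv_prefix, paths):
    """Run a command once per path, passing the path as one argv element.

    No shell is involved, so metacharacters in a file name (;, $, backticks)
    are never interpreted; the program receives them as literal text.
    """
    results = {}
    for path in paths:
        completed = subprocess.run(
            [*argv_prefix, str(path)],  # list form: no shell parsing
            capture_output=True,
            text=True,
            check=False,
        )
        results[str(path)] = completed.stdout
    return results


if __name__ == "__main__":
    # A hostile name in the style of the example in the finding above.
    hostile = "test.js;echo INJECTED;"
    # Hypothetical stand-in for a real tool: print back the argument received.
    echo_arg = [sys.executable, "-c", "import sys; print(sys.argv[1])"]
    print(run_per_file(echo_arg, [hostile]))
```

With a real tool in place of the stand-in, a name that begins with a dash could still be read as an option, so placing "--" before the path is usually worth considering where the tool supports it.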
Audit Metadata
Risk Level: MEDIUM
Analyzed: Mar 6, 2026, 05:18 PM