benchmark-models

Fail

Audited by Gen Agent Trust Hub on May 2, 2026

Risk Level: HIGH
CREDENTIALS_UNSAFE, COMMAND_EXECUTION, DATA_EXFILTRATION, PROMPT_INJECTION, REMOTE_CODE_EXECUTION
Full Analysis
  • [CREDENTIALS_UNSAFE]: The skill attempts to verify model availability in Step 3 by grepping for the string 'ANTHROPIC' inside the platform's sensitive credential store at ~/.claude/.credentials.json. Accessing private credential files outside the skill's own scope is a high-risk data exposure.
  • [COMMAND_EXECUTION]: In Step 4, the skill instructs the agent to construct a bash command by interpolating user-provided text verbatim into a shell string ("$BIN" --prompt "<text>"). This pattern is highly susceptible to shell injection if the input contains metacharacters (e.g., backticks or semicolons).
  • [DATA_EXFILTRATION]: The preamble and final telemetry sections execute gstack-telemetry-log to send usage data (skill name, duration, session ID, and repository name) to a remote server. Although presented as opt-in, this is an outbound network operation that transmits session and repository metadata.
  • [PROMPT_INJECTION]: The skill is vulnerable to Indirect Prompt Injection (Category 8).
      • Ingestion points: User input from AskUserQuestion in Step 1 (inline prompts) and the output of find commands identifying local skill files.
      • Boundary markers: None provided; the skill explicitly directs the agent to use input "verbatim" within shell arguments.
      • Capability inventory: Significant capabilities including full Bash access, file system writes to ~/.gstack/, and modification of project CLAUDE.md files.
      • Sanitization: None; input is not escaped or validated before being passed to shell tools.
  • [REMOTE_CODE_EXECUTION]: The preamble uses eval and source <(...) on the output of local binaries in the ~/.claude/skills/gstack/bin/ directory. This pattern allows for arbitrary code execution if the local binaries or the environment variables they rely on are compromised.
  • [COMMAND_EXECUTION]: The skill modifies the project's state by appending routing rules to CLAUDE.md and performing git add and git commit operations, which alters the long-term behavior of the agent within the repository.
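The shell-injection finding for Step 4 can be illustrated with a minimal Python sketch. It is not the skill's actual code: STAND_IN_BIN, build_argv, run_with_prompt, and quote_for_shell are hypothetical names introduced here, and the stand-in binary is just a Python one-liner that echoes its last argument. The sketch assumes the underlying fix is to pass user text as a single argument-vector element rather than interpolating it into a shell string, so metacharacters such as backticks and semicolons are never interpreted.

```python
import shlex
import subprocess
import sys

# Hypothetical stand-in for "$BIN": a Python one-liner that prints its last argument.
STAND_IN_BIN = [sys.executable, "-c", "import sys; print(sys.argv[-1])"]


def build_argv(bin_argv: list[str], prompt: str) -> list[str]:
    """Build an argument vector; the prompt remains one argv element."""
    return [*bin_argv, "--prompt", prompt]


def run_with_prompt(bin_argv: list[str], prompt: str) -> str:
    """Run the binary without a shell and return its stdout."""
    result = subprocess.run(
        build_argv(bin_argv, prompt),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def quote_for_shell(bin_path: str, prompt: str) -> str:
    """Fallback when a shell string is unavoidable: quote every piece."""
    return f"{shlex.quote(bin_path)} --prompt {shlex.quote(prompt)}"


if __name__ == "__main__":
    # Input containing the metacharacters named in the finding.
    hostile = 'hi"; echo INJECTED; `id` #'
    print(run_with_prompt(STAND_IN_BIN, hostile), end="")
    print(quote_for_shell("/hypothetical/bin", hostile))
```

Because no shell parses the command in run_with_prompt, the hostile text reaches the binary unchanged as data. The shlex.quote fallback is only appropriate for POSIX shells and is shown for cases where a shell string cannot be avoided.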
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level: HIGH
Analyzed: May 2, 2026, 01:07 AM