llm-challenge

Pass

Audited by Gen Agent Trust Hub on Mar 6, 2026

Risk Level: SAFECOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill instructs the agent to execute shell commands using the pnpm package manager. Evidence includes the use of pnpm -C packages/sdk build to compile the SDK and pnpm -C llm-challenge challenge to execute benchmarking scripts. These commands are necessary for the skill's function and are confined to the project's local directory.- [PROMPT_INJECTION]: The skill defines a surface for indirect prompt injection because it reads and processes instructions from external problem files. Ingestion points include problem.md and meta.json files within the problems/ directory. Boundary markers are not explicitly provided in the skill instructions to separate file content from agent instructions. Capability inventory includes the ability to modify local code and execute test scripts. Sanitization of input data is not explicitly mentioned, relying on the agent's core safety logic.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 6, 2026, 12:52 PM