llm-challenge
Pass
Audited by Gen Agent Trust Hub on Mar 6, 2026
Risk Level: SAFECOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION]: The skill instructs the agent to execute shell commands using the pnpm package manager. Evidence includes the use of
pnpm -C packages/sdk buildto compile the SDK andpnpm -C llm-challenge challengeto execute benchmarking scripts. These commands are necessary for the skill's function and are confined to the project's local directory.- [PROMPT_INJECTION]: The skill defines a surface for indirect prompt injection because it reads and processes instructions from external problem files. Ingestion points includeproblem.mdandmeta.jsonfiles within theproblems/directory. Boundary markers are not explicitly provided in the skill instructions to separate file content from agent instructions. Capability inventory includes the ability to modify local code and execute test scripts. Sanitization of input data is not explicitly mentioned, relying on the agent's core safety logic.
Audit Metadata