The Agent Skills Directory

[COMMAND_EXECUTION]: The skill uses the bun run benchmark command to execute its testing harness. This allows for local code execution within the agent's environment, which is necessary for the skill's stated purpose of benchmarking performance.
[PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection because it processes content from untrusted evals.json files and uses an LLM to judge the output, potentially causing the model to obey instructions embedded in the test data.
Ingestion points: Untrusted data enters the system context via user-provided evals/evals.json files referenced in the documentation.
Boundary markers: The documentation does not provide instructions for using delimiters or boundary markers to isolate the evaluation data from the agent's instructions.
Capability inventory: The skill can execute shell commands via bun and likely performs network requests to external LLM APIs for the judging process.
Sanitization: No sanitization, validation, or escaping of the user-provided prompt content or assertions is described.

benchmark-skills