The Agent Skills Directory

[COMMAND_EXECUTION]: The skill invokes python scripts/aggregate_benchmark.py using parameters from the benchmark configuration to summarize performance metrics.
[PROMPT_INJECTION]: The skill demonstrates an indirect prompt injection surface by consuming external data from evals/evals.json and various run logs.
Ingestion points: Reads evaluation sets and trial results (grading.json, timing.json) from the workspace.
Boundary markers: None; the skill lacks specific delimiters or instructions to ignore embedded prompts in the data.
Capability inventory: Executes shell-based Python scripts and orchestrates the activity of other sub-agents.
Sanitization: There is no mention of input validation or content filtering for the ingested JSON files.

skill-forge-benchmark