The Agent Skills Directory

[COMMAND_EXECUTION]: The skill automatically locates and executes Python scripts within the project structure, such as scripts/run_eval.py or eval/run.py. This capability is used to run the evaluation pipeline but could execute malicious code if a script in the repository is compromised.
[PROMPT_INJECTION]: The skill ingests external content from gold_standards/ and test_data/ to be processed by an LLM-as-judge, creating a surface for indirect prompt injection.
Ingestion points: Gold standard files and pipeline outputs discovered in Phase 1 (SKILL.md).
Boundary markers: None identified in the skill instructions to delimit untrusted data.
Capability inventory: Subprocess execution of Python scripts via Bash and file writing to timestamped directories (SKILL.md).
Sanitization: No validation or sanitization of input data before LLM evaluation is described.

sc-evaluate