sc-evaluate

Pass

Audited by Gen Agent Trust Hub on Mar 10, 2026

Risk Level: SAFECOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill automatically locates and executes Python scripts within the project structure, such as scripts/run_eval.py or eval/run.py. This capability is used to run the evaluation pipeline but could execute malicious code if a script in the repository is compromised.
  • [PROMPT_INJECTION]: The skill ingests external content from gold_standards/ and test_data/ to be processed by an LLM-as-judge, creating a surface for indirect prompt injection.
  • Ingestion points: Gold standard files and pipeline outputs discovered in Phase 1 (SKILL.md).
  • Boundary markers: None identified in the skill instructions to delimit untrusted data.
  • Capability inventory: Subprocess execution of Python scripts via Bash and file writing to timestamped directories (SKILL.md).
  • Sanitization: No validation or sanitization of input data before LLM evaluation is described.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 10, 2026, 06:41 AM