model-evaluation-metrics

Pass

Audited by Gen Agent Trust Hub on Feb 18, 2026

Risk Level: SAFECOMMAND_EXECUTIONEXTERNAL_DOWNLOADSPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION] (LOW): The skill requests permission for 'Bash(python:*)' which allows execution of arbitrary Python code. While no malicious scripts are present in the provided file, this capability increases the potential impact of an exploit.
  • [EXTERNAL_DOWNLOADS] (LOW): Permission for 'Bash(pip:*)' is requested, allowing the installation of external packages. Without version pinning or source verification, this presents a supply chain risk.
  • [PROMPT_INJECTION] (LOW): The skill is susceptible to Indirect Prompt Injection (Category 8) because it auto-activates on 'model evaluation metrics' tasks. 1. Ingestion points: Untrusted data entered via model evaluation requests or external dataset metrics. 2. Boundary markers: Absent; there are no instructions to ignore embedded commands in the processed data. 3. Capability inventory: The skill has access to Bash, Python, Pip, Read, Write, and Edit tools. 4. Sanitization: No input sanitization or validation logic is specified in the skill body.
Audit Metadata
Risk Level
SAFE
Analyzed
Feb 18, 2026, 03:03 PM